
proposal-array-grouping's Introduction

proposal-array-grouping

Note: this proposal is now at stage 4. See the spec PR here: tc39/ecma262#3176

A proposal to make grouping of items in an array (and iterables) easier.

const array = [1, 2, 3, 4, 5];

// `Object.groupBy` groups items by arbitrary key.
// In this case, we're grouping by even/odd keys
Object.groupBy(array, (num, index) => {
  return num % 2 === 0 ? 'even': 'odd';
});
// =>  { odd: [1, 3, 5], even: [2, 4] }

// `Map.groupBy` returns items in a Map, and is useful for grouping
// using an object key.
const odd  = { odd: true };
const even = { even: true };
Map.groupBy(array, (num, index) => {
  return num % 2 === 0 ? even: odd;
});
// =>  Map { {odd: true}: [1, 3, 5], {even: true}: [2, 4] }

Status

Current Stage: 4

Motivation

Array grouping is an extremely common operation, best exemplified by SQL's GROUP BY clause and MapReduce programming (which is better thought of as map-group-reduce). The ability to combine like data into groups allows developers to compute higher order datasets, like the average age of a cohort or daily LCP values for a webpage.

Two methods are offered, Object.groupBy and Map.groupBy. The first returns a null-prototype object, which allows ergonomic destructuring and prevents accidental collisions with properties inherited from Object.prototype. The second returns a regular Map instance, which allows grouping on complex key types (imagine a compound key or tuple).
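To make the two return types concrete, here is a minimal sketch of the described semantics in plain JavaScript. `objectGroupBy` and `mapGroupBy` are hypothetical helper names, not part of the proposal; on engines that ship it, the real `Object.groupBy` and `Map.groupBy` should be used instead.

```javascript
// Hypothetical helpers approximating Object.groupBy / Map.groupBy.
// Both accept any iterable and call the callback with (value, index).
function objectGroupBy(items, callback) {
  const groups = Object.create(null); // null prototype: no inherited keys
  let index = 0;
  for (const value of items) {
    const key = callback(value, index++);
    // Object keys are coerced to strings (symbols are kept as-is)
    const propertyKey = typeof key === "symbol" ? key : String(key);
    (groups[propertyKey] ??= []).push(value);
  }
  return groups;
}

function mapGroupBy(items, callback) {
  const groups = new Map(); // keys compared with SameValueZero, objects by identity
  let index = 0;
  for (const value of items) {
    const key = callback(value, index++);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return groups;
}

const array = [1, 2, 3, 4, 5];

// Null-prototype object: destructures cleanly
const { odd, even } = objectGroupBy(array, (num) => (num % 2 === 0 ? "even" : "odd"));

// Map: object keys are kept as-is and looked up by identity
const oddKey = { odd: true };
const evenKey = { even: true };
const byObject = mapGroupBy(array, (num) => (num % 2 === 0 ? evenKey : oddKey));
```

Note that a fresh `{ odd: true }` literal does not find a group in `byObject`, because Map keys are compared by identity rather than by shape.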

Why static methods?

We've found a web compatibility issue with the name Array.prototype.groupBy. The Sugar library until v1.4.0 conditionally monkey-patches Array.prototype with an incompatible method. By providing a native groupBy, these versions of Sugar would fail to install their implementation, and any sites that depend on their behavior would break. We've found some 660 origins that use these versions of the Sugar library.

We then attempted the name Array.prototype.group, but this ran into code that uses an array as an arbitrary hashmap. Because these bugs are exceptionally difficult to detect (developers have to notice the breakage and know how to report it to us), the committee didn't want to attempt another prototype method name. Instead we chose a static method, which we believe is unorthodox enough not to risk a web compatibility issue. This also gives us a nice way to support Records and Tuples in the future.

proposal-array-grouping's People

Contributors

abdelrahmanhafez, bakkot, jridgewell, legendecas, linusg, ljharb, michaelficarra, taupiqueur, tchetwin, zloirock

proposal-array-grouping's Issues

Output an Object vs a Map

With a Map, group keys don't have to be valid property keys, and no coercion happens:

const o = {};
o[1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000] = 1;
Object.keys(o); // ["1e+90"]

With an object, users need to know how keys are coerced, which can be surprising.

The downside of a Map is usability: reading data out of it is not simple property access, though you can always convert it with Object.fromEntries(map).
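The coercion difference can be seen without any grouping API at all. A minimal sketch using only standard `Object`, `Map`, and `Object.fromEntries`; it also shows that converting back to an object re-applies the coercion:

```javascript
// Object property keys are coerced to strings, so distinct values can collide.
const obj = {};
obj[1] = "number one";
obj["1"] = "string one"; // overwrites the entry above: both keys become "1"

// Map keys are not coerced, so 1 and "1" stay separate.
const map = new Map();
map.set(1, "number one");
map.set("1", "string one");

// Object.fromEntries coerces the keys again, so the two entries collapse into one
// (the later entry wins).
const collapsed = Object.fromEntries(map);
```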

Should arrays be created through Symbol.species?

Continuing discussion from here:

Should the arrays created to hold the sublists be created using ArraySpeciesCreate, so that if you subclass Array and call groupBy the resulting object will have values which are instances of your subclass?

I lean towards no. The existing uses of ArraySpeciesCreate are all straightforward transformations of the original array: you put an array (as this) in, you get an array out. By contrast, this method returns an object (which holds some number of arrays). I don't think it's obvious that ArraySpeciesCreate is appropriate here.
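For context, this is what ArraySpeciesCreate does for `map` today: a subclass instance in, a subclass instance out. A grouping function that builds its groups with plain array literals (the option the comment above leans toward) returns ordinary `Array`s even for subclass input. `MyArray` and `groupPlain` are illustrative names only.

```javascript
class MyArray extends Array {}

const input = MyArray.from([1, 2, 3, 4]);

// Array.prototype.map uses ArraySpeciesCreate, so the result is a MyArray.
const mapped = input.map((n) => n * 2);

// Hypothetical grouping helper that always creates plain arrays for its groups.
function groupPlain(items, callback) {
  const groups = Object.create(null);
  for (const item of items) {
    const key = callback(item);
    (groups[key] ??= []).push(item);
  }
  return groups;
}

const grouped = groupPlain(input, (n) => (n % 2 === 0 ? "even" : "odd"));
```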

Other lodash use cases where the method could be handy

Thanks for creating this function proposal, I am very excited to see it. I find this could be useful for many use cases; here are some of them from lodash.

Chunk

const chunk = (array, size = 1) => Object.values(array.groupBy((v, i) => Math.floor(i / size)))

Partition

// predicate returns boolean
const partition = (array, predicate) => Object.values(array.groupBy((v, i, a) => Number(predicate(v, i, a))))

Covered by proposal in #9


For simplicity, I left out guaranteeing a consistent Object.values() order.

If this is not worth adding to examples, feel free to close the issue. Thank you 🙂
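The two helpers above target the historical `array.groupBy` API. A runnable sketch of the same idea, assuming a hypothetical `groupBy(items, callback)` stand-in for `Object.groupBy`. Unlike the `Object.values` version of `partition`, this one returns `[passing, failing]` in lodash's order and always yields two arrays, even when one side is empty.

```javascript
// Hypothetical stand-in for Object.groupBy: callback receives (value, index).
function groupBy(items, callback) {
  const groups = Object.create(null);
  let index = 0;
  for (const value of items) {
    const key = callback(value, index++);
    (groups[key] ??= []).push(value);
  }
  return groups;
}

// Integer-like keys are enumerated in ascending numeric order,
// so Object.values returns the chunks in index order.
const chunk = (array, size = 1) =>
  Object.values(groupBy(array, (_, i) => Math.floor(i / size)));

// Key 1 holds values passing the predicate, key 0 the ones failing it.
// Reading the keys directly keeps the order fixed and fills in empty sides.
const partition = (array, predicate) => {
  const groups = groupBy(array, (v, i) => Number(Boolean(predicate(v, i, array))));
  return [groups[1] ?? [], groups[0] ?? []];
};
```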

Why not let the user provide a second argument to control how the values accumulate against a key?

Why not let the user provide a second argument to control how the values accumulate against a key? Basically, the second argument would look a bit like the function normally passed into Array#reduce.

e.g.

[1, 2, 3, 4, 5].groupToMap(
  (v) => (v % 2 === 0 ? "even" : "odd"),
  (accumulator = 0) => accumulator + 1
);
// Map {"odd" => 3, "even" => 2}

I wrote my own groupByMap the other day to show what I mean.

The way I see it is: how easily can somebody implement Python's Counter in JavaScript? It's such a basic thing that it'd be nice to be able to achieve it with a one-liner.

Of course, if you don't do this you could use a combination of .entries() and .map() to transform and then recreate your object/map, but this seems long-winded to me and maybe not a good use of memory.
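Without an extra reducer argument, a Counter-style tally is still short to write by hand, and it avoids building intermediate group arrays. A minimal sketch; `countBy` is a hypothetical name, not part of the proposal:

```javascript
// Hypothetical countBy: tallies how many items fall under each key,
// similar in spirit to Python's collections.Counter.
function countBy(items, keyFn) {
  const counts = new Map();
  for (const item of items) {
    const key = keyFn(item);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

const parity = countBy([1, 2, 3, 4, 5], (v) => (v % 2 === 0 ? "even" : "odd"));
// Map {"odd" => 3, "even" => 2}
```

With `Map.groupBy` available, `new Map([...Map.groupBy(items, fn)].map(([k, v]) => [k, v.length]))` gives the same counts in two steps, at the cost of materializing every group first.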

We can define another parameter named `accumulator`

The accumulator would store the grouped data and should default to {}:

['one', 'two', 'three'].groupBy(i => i.length); // => {3: Array(2), 5: Array(1)}
['one', 'two', 'three'].groupBy(i => i.length, {}); // => {3: Array(2), 5: Array(1)}
['one', 'two', 'three'].groupBy(i => i.length, []); // => [empty × 3, Array(2), empty, Array(1)]
['one', 'two', 'three'].groupBy(i => i.length === 3 ? 0 : 1, []); // => [Array(2), Array(1)]
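Since no such overload exists, the proposed second parameter can be sketched as a standalone function. `groupInto` is a hypothetical name; it reproduces the four results listed above, including the sparse array when numeric keys are used with an array target.

```javascript
// Hypothetical: group items into a caller-supplied container (object or array).
// The default target is a plain {} as suggested above, so a key such as
// "__proto__" is not safe with it.
function groupInto(items, keyFn, target = {}) {
  for (const item of items) {
    const key = keyFn(item);
    if (!Object.prototype.hasOwnProperty.call(target, key)) target[key] = [];
    target[key].push(item);
  }
  return target;
}

const words = ["one", "two", "three"];
const byLength = groupInto(words, (w) => w.length);      // keys "3" and "5"
const sparse = groupInto(words, (w) => w.length, []);    // holes at 0, 1, 2, 4
const dense = groupInto(words, (w) => (w.length === 3 ? 0 : 1), []);
```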

Subclassing

class MyMap extends Map { /* ... */ }
MyMap.groupBy(something, fn) instanceof MyMap; // -> false

Why? Object has always been a special case; moreover, Object.groupBy should create an object with a null prototype. But Map is different: unlike some new instance methods, where TC39 decided to break subclassing, new static methods support subclassing properly and create new instances from `this`: Promise.any, Array.fromAsync, Promise.withResolvers, etc.
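The contrast can be sketched with `Array.from`, an existing static method that constructs its result from `this`. The `groupByUsingThis` function below is hypothetical: it only illustrates what a `this`-respecting `Map.groupBy` could look like, not how the shipped method behaves.

```javascript
class MyArray extends Array {}
class MyMap extends Map {}

// Array.from constructs its result with `this`, so subclasses get subclass instances.
const fromSubclass = MyArray.from([1, 2, 3]);

// Hypothetical static helper that, like Array.from, constructs via `this`
// and falls back to Map when called without a constructor receiver.
function groupByUsingThis(items, callback) {
  const C = typeof this === "function" ? this : Map;
  const result = new C();
  let index = 0;
  for (const value of items) {
    const key = callback(value, index++);
    if (!result.has(key)) result.set(key, []);
    result.get(key).push(value);
  }
  return result;
}
MyMap.groupByUsingThis = groupByUsingThis;

const grouped = MyMap.groupByUsingThis([1, 2, 3], (n) => n % 2);
```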

Web Compatibility Issue: Sugar versions v0.9 through v1.3.9 (inclusive)

Hi,

Unfortunately, we have found a web compat issue with the sugar.js library. Details here: https://bugzilla.mozilla.org/show_bug.cgi?id=1750812#c5

I didn't check every version for breakage. It is definitely broken on 1.1.2, and from reading the code it looks like it is fixed by 1.5.

I will add a pref on nightly to allow our users to keep working, and see how broad this issue is. Maybe if it is just a few consumers we can get around it...

Static method naming

With the new static functions, I was wondering whether there are alternatives to the groupBy name. Object.groupBy and Map.groupBy sound a little like they operate on an Object or a Map, as opposed to building one.

Background

The majority of static Object methods operate on an object argument:

  • getOwnPropertyDescriptor
  • getOwnPropertyDescriptors
  • getOwnPropertyNames
  • getOwnPropertySymbols
  • hasOwn
  • preventExtensions
  • seal
  • assign
  • defineProperties
  • defineProperty
  • freeze
  • getPrototypeOf
  • setPrototypeOf
  • isExtensible
  • isFrozen
  • isSealed
  • keys
  • entries
  • values

The static Object methods that create an object, Object.create and Object.fromEntries, are in the minority but use words that strongly convey that a new value is being created.

Suggestions

Some alternatives we came up with at Bloomberg:

  • Object.byGrouping
  • Object.fromGrouping
  • Object.groupFrom

Thoughts?

groupByMap naming

flatMap means ".flat after .map". It kind of seems like groupByMap should mean ".groupBy after .map", by that precedent.

I'm not sure I have a better naming suggestion, but perhaps it's worth bikeshedding?

Types passed to callback

What is the type of the second parameter passed to the callback? Is it a String or a Number? The current implementation passes neither: it passes the mathematical value k.

Typed arrays creation optimization

TypedArraySpeciesCreate calls SpeciesConstructor (getting constructor and @@species) on each typed array creation. It seems this should be optimized to call SpeciesConstructor only once.

with target object

In the current group() function, the result object is created inside the function, and the user cannot supply a target object. This matters because in some use cases the user does not want to break references to an existing target object.

const targetObj = {};
export const getGroupedValue = () => targetObj;
export const setGroupedValue = (arr) => {
  const grouped = arr.group((val) => val.someId);
  // <== this extra copy step is what is needed today
  Object.keys(grouped).forEach((key) => { targetObj[key] = grouped[key]; });
};

If there were a way to pass the target object as an argument, this would be a lot easier:

...
export const setGroupedValue = (arr) => {
  // groupToTarget is the requested (hypothetical) method; it fills targetObj in place
  arr.groupToTarget((val) => val.someId, targetObj);
};
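Until something like `groupToTarget` exists (it is a hypothetical name from the request above), the copy step can be collapsed into a single `Object.assign`, which mutates the target in place and keeps every existing reference valid. A sketch with a hypothetical `groupBy` stand-in for `Object.groupBy`; clearing stale keys first is an added design choice, not part of the original request.

```javascript
// Hypothetical stand-in for Object.groupBy so the sketch is self-contained.
function groupBy(items, callback) {
  const groups = Object.create(null);
  for (const item of items) {
    const key = callback(item);
    (groups[key] ??= []).push(item);
  }
  return groups;
}

const targetObj = {};
const getGroupedValue = () => targetObj;

const setGroupedValue = (arr) => {
  // Drop groups from a previous call so stale keys do not linger.
  for (const key of Object.keys(targetObj)) delete targetObj[key];
  // Copy the new groups onto the same object; the reference never changes.
  Object.assign(targetObj, groupBy(arr, (val) => val.someId));
};
```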

Stage 4 Criteria

A polyfill

I made a polyfill and I hope this is useful:

if (!("groupBy" in Array.prototype)) {
  Object.defineProperty(Array.prototype, "groupBy", {
    configurable: true, writable: true, enumerable: false,
    value: function groupBy(fn, thisArg) {
      "use strict";
      if (this == null) {
        throw new TypeError("Array.prototype.groupBy called on null or undefined");
      }
      if (typeof fn !== "function") {
        throw new TypeError("callback must be a function");
      }
      var t = Object(this);
      var len = t.length >>> 0;
      // Null-prototype result, so keys such as "constructor" cannot collide
      // with properties inherited from Object.prototype.
      var res = Object.create(null);
      for (var i = 0; i < len; i++) {
        if (i in t) { // skip holes, like Array.prototype.map
          var kValue = t[i];
          // Honor thisArg (the original ignored it in both branches)
          var key = fn.call(thisArg, kValue, i, t);
          if (!(key in res)) { res[key] = []; }
          res[key].push(kValue);
        }
      }
      return res;
    }
  });
}

/* testcases */

var array = [1,2,3,4,5];

console.log( array.groupBy(i => (i % 2 === 0 ? "even": "odd")) );
// -> { odd: [1, 3, 5], even: [2, 4] }

console.log(
  Array.prototype.groupBy.call(array, i => (i % 2 === 0 ? "even": "odd") )
);
// -> { odd: [1, 3, 5], even: [2, 4] }

Consider alternative attachment on Object and Map

Given the recent webcompat issues that have been identified, has consideration been made about attaching this functionality as static methods on Object and Map instead of array instances?

This would probably conflict less with current web reality, and would match the existing practice of e.g. Array.from() and Object.fromEntries().

Maybe also have `groupByMap`?

In the interest of having both the convenience of regular objects and the additional utility of Maps available, albeit not in the same object, perhaps we can have a method which returns a plain object (coalescing return values with ToPropertyKey) as well as a method which returns a Map (coalescing only 0 and -0, and otherwise by identity).
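The two coalescing rules described here can be checked directly against `Map` and property-key coercion. A minimal sketch using only built-ins:

```javascript
// Map keys use SameValueZero: -0 and +0 coalesce, NaN matches NaN,
// and distinct objects stay distinct even if they look alike.
const m = new Map();
m.set(0, "zero");
m.set(NaN, "not a number");
const a = { id: 1 };
const b = { id: 1 };
m.set(a, "a");
m.set(b, "b");

// Property keys go through ToPropertyKey (string conversion for non-symbols):
// -0 becomes "0", and any two plain objects become "[object Object]".
const o = {};
o[-0] = "minus zero";
o[a] = "a";
o[b] = "b"; // overwrites the entry written for `a`
```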

Request to reconsider `groupBy` as a prototype function

As it stands at the moment, the proposal is to implement groupBy as a static function on Object and Map instead of implementing it as a prototype function. I want to express my severe disappointment with this.

The proposed API will

  1. differ from other general array functions
  2. not allow function chaining

(1) Implementing the function as a static function on Object will not be in line with existing utility functions on Array. I think this makes the feature very redundant as you could simply implement it yourself or just use any utility library with a similar function (like lodash). I also think it significantly decreases discoverability. You are able to find all the other nice utility functions on the array itself, so why should groupBy be different then? I know there is a reason (we will come back to that), but for developers this change in the API would be very weird and nonsensical.

products.find((product) => product.sku === "FOO42BAR");
products.map((product) => ({ name: product.name, expensive: product.price > 1000 }));
products.reduce((total, product) => total + product.price, 0);
products.some((product) => product.price > 1000);
products.toSorted((a, b) => a.name.localeCompare(b.name));

Object.groupBy(products, (product) => product.category);

(2) The main benefit of having these utility functions on the prototype rather than as static functions is that you can easily chain them. In my opinion (and I think this is a widely shared opinion) function chaining is way more readable than function nesting. And this is especially true if we need to mix the two approaches:

// Array.prototype.groupBy
products
  .filter((p) => p.price > 100)
  .groupBy((p) => p.category)
  .some((grp) => grp.length > 10);

// Object.groupBy
Object.groupBy(products.filter((p) => p.price > 100), (p) => p.category)
      .some((grp) => grp.length > 10);

In this example, you can easily follow the flow of the code when chaining, but with groupBy as a static function you need some effort to understand the same flow because the function calls are not written in the order they are executed.


This is the reasoning for using a static function instead of prototyping:

We've found (#37) a web compatibility issue with the name Array.prototype.groupBy. The Sugar library until v1.4.0 conditionally monkey-patches Array.prototype with an incompatible method. By providing a native groupBy, these versions of Sugar would fail to install their implementation, and any sites that depend on their behavior would break. We've found some 660 origins that use these versions of the Sugar library.

We then attempted the name Array.prototype.group, but this ran into code that uses an array as an arbitrary hashmap (#44). Because these bugs are exceptionally difficult to detect (developers have to notice the breakage and know how to report it to us), the committee didn't want to attempt another prototype method name. Instead we chose a static method, which we believe is unorthodox enough not to risk a web compatibility issue. This also gives us a nice way to support Records and Tuples in the future.

So because some web pages use an old version of a utility library which patches the prototypes of built-in types (which you are generally discouraged from doing, due to this exact potential issue), we now cannot implement the function properly for everyone else to use?

I am pretty sure that Sugar is disclaiming what it's doing, so developers using that library should definitely be aware of it. It seems like this is only an issue for versions lower than v1.4.0 which was released 10 years ago.

I understand that you would want to avoid breaking changes to the language, but these websites have been built using a library that advertises its use of a bad practice (monkey-patching built-in prototypes), and those websites have not updated their code for 10 years? And those websites are the reason we end up with a mediocre (or even redundant) feature? I do not have any sympathy for those websites and I think this strategy is too defensive and limiting.


To sum up, I was looking forward to this feature because it's something we cannot implement ourselves (unless you, like Sugar, resort to risky prototype patching). I think the language is moving in the right direction by adding these utility functions, but I think the feature as it is proposed now is not worth implementing. As mentioned, we get a sub-optimal version of the feature (no function chaining) that already exists in various utility libraries (or you can create it yourself very easily), and it has problems with discoverability. I think the function should definitely be implemented as a prototype function, or the proposal should be scrapped entirely.

I really hope that the implementation can be reconsidered to be implemented as a prototype function as originally planned.

How to convert an array of objects to a hash?

How would I use the proposed additions to achieve the following grouping?

 const objects = [{ identifier: 17, name: 'Karol' }, { identifier: 19, name: 'Henry' }];
 const objMap = objects.reduce((obj, c) => ({ ...obj, [c.identifier]: c }), {});
 console.log(objMap);
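When each identifier is unique, grouping is not really the right tool, since every group would be a one-element array. A sketch of two built-in alternatives, `Object.fromEntries` and a `Map`, for a one-to-one index:

```javascript
const objects = [{ identifier: 17, name: "Karol" }, { identifier: 19, name: "Henry" }];

// One object per key: build entries and let Object.fromEntries create the hash.
// Unlike spreading inside reduce, this does not copy the accumulator on every step.
const objMap = Object.fromEntries(objects.map((o) => [o.identifier, o]));

// The same index as a Map keeps the numeric identifiers as numbers.
const byId = new Map(objects.map((o) => [o.identifier, o]));

// For comparison, Object.groupBy(objects, (o) => o.identifier) would wrap each
// object in a one-element array: { 17: [ {...} ], 19: [ {...} ] }.
```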

Should HasProperty be called, the same way Array.prototype.map calls it?

Example:

const a = [1, , 3];
a.length = 5;

a.map((n) => { console.log("map callback " + n); return n; });
a.group((n) => { console.log("group callback " + n); return 0; });

Output:

map callback 1
map callback 3
group callback 1
group callback undefined
group callback 3
group callback undefined
group callback undefined

It would be more consistent if Array.prototype.group would do a HasProperty check before calling the callback. Is there a particular reason why it isn't?

Treatment of array holes?

I accidentally (then purposefully) changed the behavior of array holes when migrating the proposal to this repo. Before, we performed a Has check before Get, and if the key did not exist, we skipped this element. Now, we unconditionally call Get.

This has the following side-effect:

[1, 2, /* hole */, 4].groupBy(v => {
  return v <= 2;
});

// { true: [1, 2], false: [undefined, 4] }
// Previously, would have been { true: [1, 2], false: [4] }

We've explicitly transformed a hole into an undefined in the false grouping.
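The difference between the two behaviors comes down to how elements are read. A sketch contrasting plain iteration (holes read as `undefined`) with an index loop that performs a `HasProperty`-style check the way `map` does; `groupViaIteration` and `groupSkippingHoles` are illustrative names.

```javascript
// Visits every index, reading holes as undefined (what iteration does).
function groupViaIteration(items, callback) {
  const groups = Object.create(null);
  for (const value of items) {
    (groups[callback(value)] ??= []).push(value);
  }
  return groups;
}

// Skips indices that are absent, mirroring Array.prototype.map's HasProperty check.
function groupSkippingHoles(array, callback) {
  const groups = Object.create(null);
  for (let i = 0; i < array.length; i++) {
    if (!(i in array)) continue; // hole: no callback call, no entry
    (groups[callback(array[i])] ??= []).push(array[i]);
  }
  return groups;
}

const withHole = [1, 2, , 4];
const viaIteration = groupViaIteration(withHole, (v) => v <= 2);
const skipping = groupSkippingHoles(withHole, (v) => v <= 2);
```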

How about add `groupToArray` or `chunk`?

const array = [1, 2, 3, 4, 5];

array.groupToArray((num, index, array) => {
  return Math.floor(index / 2)
});

// or array.chunk(2);
// =>  Array [[1, 2], [3, 4], [5]]

Supporting Records/Tuples and Maps

tc39/proposal-record-tuple#275 brings up how groupBy could be supported on Tuples. Two open questions are:

  1. Should the groups be Array or Tuples?
  2. Should the output be an Object or a Record?

I think we can sidestep the first question for the time being (Array.p.groupBy groups into Arrays, and it wouldn't be weird for either choice to be chosen for Tuples).

But, I think we should address the second question now. We currently have Array.p.groupByToMap, and I don't think adding a new groupByToX is a great solution for new return types. I think it'll be a bit weird to have:

  • Array.p.groupBy
  • Array.p.groupByToMap
  • Array.p.groupByToRecord (this would probably be bad, because how likely is it that the array is all primitives?)
  • Tuple.p.groupBy
  • Tuple.p.groupByToMap
  • Tuple.p.groupByToRecord

@denk0403 suggests in #30 (comment) that we extend to new types by providing a static method on the constructor. Eg,

  • Array.p.groupBy(callback) returns a prototype-less Object
  • Tuple.p.groupBy(callback) returns a prototype-less Object
  • Map.groupBy(iterable, callback) returns a Map
  • Record.groupBy(iterable, callback) returns a Record

Design question

Personally, I find the arbitrary-key usage odd and easy to misuse in ways that could introduce bugs.

Trivial example:

function count_odds_and_evens(array) {
  const groups = array.groupBy(
    n => n % 2 === 0
      ? "even"
      : "odd"
  );

  console.log(
    "There were %i even numbers and %i odd numbers!",
    groups.even.length,
    groups.odd.length
  );
}

Assuming a normal number array, this should be fine, unless it's given an array with no odd numbers or no even numbers: then the missing key is never populated with an array, and accessing .length on it throws a TypeError. Could this be a real problem in practice?
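One way to guard against missing groups is to give each expected key a default while destructuring. A sketch with a hypothetical `groupBy` stand-in (the destructuring line works the same way with `Object.groupBy`):

```javascript
// Hypothetical stand-in for Object.groupBy.
function groupBy(items, callback) {
  const groups = Object.create(null);
  for (const item of items) {
    (groups[callback(item)] ??= []).push(item);
  }
  return groups;
}

function countOddsAndEvens(array) {
  // Default each expected group to [] so an absent key does not throw on .length.
  const { even = [], odd = [] } = groupBy(array, (n) => (n % 2 === 0 ? "even" : "odd"));
  return { even: even.length, odd: odd.length };
}
```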

don't prefer Object over Map

I suggest renaming groupBy and groupByToMap to groupByToObject and groupBy respectively or dropping support for an object altogether.

  1. there are various reasons to prefer Map over Object

    Reasons from MDN (Map vs. Object):

    • Accidental keys
      Map: A Map does not contain any keys by default. It only contains what is explicitly put into it.
      Object: An Object has a prototype, so it contains default keys that could collide with your own keys if you're not careful. Note: As of ES5, this can be bypassed by using Object.create(null), but this is seldom done.
    • Key types
      Map: A Map's keys can be any value (including functions, objects, or any primitive).
      Object: The keys of an Object must be either a String or a Symbol.
    • Key order
      Map: The keys in Map are ordered in a simple, straightforward way: A Map object iterates entries, keys, and values in the order of entry insertion.
      Object: Although the keys of an ordinary Object are ordered now, this was not always the case, and the order is complex. As a result, it's best not to rely on property order. The order was first defined for own properties only in ECMAScript 2015; ECMAScript 2020 defines order for inherited properties as well. See the OrdinaryOwnPropertyKeys and EnumerateObjectProperties abstract specification operations. But note that no single mechanism iterates all of an object's properties; the various mechanisms each include different subsets of properties. (for-in includes only enumerable string-keyed properties; Object.keys includes only own, enumerable, string-keyed properties; Object.getOwnPropertyNames includes own, string-keyed properties even if non-enumerable; Object.getOwnPropertySymbols does the same for just Symbol-keyed properties, etc.)
    • Size
      Map: The number of items in a Map is easily retrieved from its size property.
      Object: The number of items in an Object must be determined manually.
    • Iteration
      Map: A Map is an iterable, so it can be directly iterated.
      Object: Object does not implement an iteration protocol, and so objects are not directly iterable using the JavaScript for...of statement (by default). Note: An object can implement the iteration protocol, or you can get an iterable for an object using Object.keys or Object.entries. The for...in statement allows you to iterate over the enumerable properties of an object.
    • Performance
      Map: Performs better in scenarios involving frequent additions and removals of key-value pairs.
      Object: Not optimized for frequent additions and removals of key-value pairs.

  2. destructuring and transformation for JSON stringification can be accomplished from a Map easily enough using Object.fromEntries:

    const { even, odd } = Object.fromEntries(
      array.groupBy((num, index, array) => {
        return num % 2 === 0 ? "even" : "odd";
      })
    );
  3. groupBy (to a null-prototype object) may set a precedent for future features to avoid defaulting to more appropriate keyed collections like Map and Set and to use objects and arrays instead; these have been used historically and continue to be used prevalently (from what I can tell) even though these more optimized/appropriate collections are now available

In short, making it easy for developers to use a Map seems like a win to me: it paves the way for richer data structures and utilities built on them, and for the use cases where an Object is wanted (destructuring and JSON stringification) there is a built-in way to quickly, efficiently, and (IMO) ergonomically transform the returned Map into an Object. If further convenience is desired, a groupByToObject method could still be provided, while the shorter go-to method groupBy would return a Map as the "preferred" type to work with.

`Array#groupBy(...)` should take (value, index, array) as parameters

To be consistent with all the other array methods, the callback should also receive the index and the array: there will be use cases where we want to use the item's index, or look at other items, to decide the return value for this item.

const numbers = [1, 2, 3, 4];
numbers.groupBy((num, index, arr) => {
  return num % 2 === 0 ? 'even' : 'odd';
});

Also, here's a similar functionality that I'm relying on heavily in my projects which groups an array of objects that have a similar field:

import _ from 'lodash';
import * as Objects from './helpers/objects.js';

const { get } = _;

/**
 * groups by one or multiple fields
 *
 * @param {Array} items - The array of items to be grouped.
 * @param {Array<String>} fields - The array of fields to group items by.
 */
function groupBy (items, fields) {
  const groups = {};

  items.forEach((object) => {
    const groupValues = fields.map(field => get(object, field));
    const groupKey = Objects.safeJSONStringify(groupValues);

    groups[groupKey] = groups[groupKey] || [];

    groups[groupKey].push(object);
  });

  return Object.values(groups);
}

export default groupBy;
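A self-contained variant of the helper above, without lodash or the project-specific `safeJSONStringify`. It assumes the grouped field values are JSON-serializable and reads only top-level fields (no `get`-style paths); `groupByFields` is a hypothetical name.

```javascript
// Groups items by one or more top-level fields, returning an array of groups.
function groupByFields(items, fields) {
  const groups = new Map();
  for (const item of items) {
    // A JSON array of the field values serves as a compound key.
    // Note: undefined and null both serialize to null, so they share a group.
    const key = JSON.stringify(fields.map((field) => item[field]));
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(item);
  }
  return [...groups.values()];
}

const rows = [
  { team: "a", role: "dev", name: "Ann" },
  { team: "a", role: "qa", name: "Bob" },
  { team: "a", role: "dev", name: "Cy" },
];
const grouped = groupByFields(rows, ["team", "role"]);
```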

Should we create a null-prototype object?

Should the object returned by groupBy() have a null prototype? It could help add some safety when using the object returned by .groupBy() with unknown keys. Some of the value of this would be lessened if we provide a .groupByMap() function, as being discussed in #3, but it would still be useful either way.
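A quick illustration of why the prototype matters, using a hypothetical `groupInto` helper that differs only in the object it starts from:

```javascript
// Hypothetical helper: groups into whatever object `makeResult` returns.
function groupInto(items, keyFn, makeResult) {
  const groups = makeResult();
  for (const item of items) {
    const key = keyFn(item);
    if (!(key in groups)) groups[key] = [];
    groups[key].push(item);
  }
  return groups;
}

const words = [{ kind: "constructor" }, { kind: "method" }];

// With a null prototype, "constructor" is just another key.
const safe = groupInto(words, (w) => w.kind, () => Object.create(null));

// With {}, "constructor" is already present (inherited from Object.prototype),
// so the helper calls .push on the Object function and throws a TypeError.
let error = null;
try {
  groupInto(words, (w) => w.kind, () => ({}));
} catch (e) {
  error = e;
}
```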

Grouping via iterator methods?

Given how the input is processed, I’m wondering if the grouping operations wouldn’t make more sense as iterator methods. Then they could be used for non-Arrays too.
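Because grouping only needs to visit each element once, an iteration-based helper already accepts any iterable: Sets, generators, Map views. A minimal sketch (`groupIterable` is a hypothetical name; the shipped `Object.groupBy` and `Map.groupBy` also take an iterable as their first argument):

```javascript
function groupIterable(iterable, callback) {
  const groups = new Map();
  let index = 0;
  for (const value of iterable) {
    const key = callback(value, index++);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return groups;
}

// A generator is consumed lazily, one value at a time.
function* naturals(limit) {
  for (let n = 1; n <= limit; n++) yield n;
}

const fromGenerator = groupIterable(naturals(6), (n) => n % 3);
const fromSet = groupIterable(new Set(["apple", "avocado", "banana"]), (s) => s[0]);
```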
