Comments (12)
I should be able to do it. The hard part is not the code, now, but the tests :)
from medley.
That's not a bad idea. I'd change the default initializer, though, and replace get
with find
in order to account for nil
values.
(defn group-with
([kf cf coll]
(group-with kf cf (cf) coll))
([kf cf init coll]
(reduce (fn [m v]
(let [k (kf v)]
(assoc m k (if-some [kv (find m k)]
(cf init (val kv))
init))))
{}
coll)))
The reason is that aggregation functions typically have a initial value that can be determined by calling the function with zero arguments:
(conj) => []
(+) => 0
(*) => 1
I also wonder, should the kf
be before or after cf
?
(group-with first conj [] coll)
;; or
(group-with conj [] first coll)
I guess following transduce
it would make sense for init
to be third, which would mean cf
would be second, but I'd be interested in hearing your opinion.
from medley.
replace get with find in order to account for nil values
Agreed, that's an improvement.
change the default initializer...aggregation functions typically have a initial value that can be determined by calling the function with zero arguments
While I see the elegance in that approach, I think it would limit the utility of the function. Several of the examples in the top post would no longer work (or would require more complicated collation functions to work), such as the following:
;; Pick winner based on max-key
(group-with first (partial max-key count) data)
;; Pick first found rather than last found like index-by
(group-with first (fn [a _] a) data)
It's true that in most functions with multiple arities, the shorter arities act as shorthands for the longest one, but the proposed function more follows the model of reduce
, where the arities have slightly different functionality. One way to think about it is that the arity-4 form performs aggregation while the arity-3 form performs selection.
Though the arity-2 form of reduce
is often maligned, I think in this case it makes sense to follow that model. The reason reduce/2
is "bad" is because apply
can do everything it does better because all of the interesting binary-ish functions take varargs. With group-with/3
, the collation function is a little like a comparator: it takes 2 things of the same type and picks one. It's more like a classical binary reduction (reduce op [x1 x2 x3 ...] -> x1 op x2 op x3 op ...
)
I also wonder, should the kf be before or after cf?
I haven't thought about it deeply, but to me it seems natural for cf
to be next to init
, and for init
to be next-to-last, which implies the order kf cf init coll
. Which of these feels more natural to you?
- Index by
first
, collating byconj
with initial value[]
- Collate using
conj
, index byfirst
, for collation use the initial value[]
Even disregarding the awkwardness of having to say collate
twice, I feel like the first matches my mental information hierarchy better.
Finally, I'm interested in your thoughts on what to call this function. group-with
? index-with
? collate-by
?
from medley.
I've given it some thought, and I believe you're right regarding the 3-arity form. The parallels between the 3-arity group-with
and the 2-arity reduce
make sense.
Regarding what to call it, group-with
makes some sense. group-by-with
would perhaps be more accurate but doesn't read well. We could also call it collate
or collate-by
- the latter you suggested as well. My inclination is to call it collate-by
in order to fit with group-by
and index-by
, while giving the impression of a more generate purpose function.
from medley.
I agree collate-by
is an accurate description, and correctly puts it in a category with index-by
and group-by
. My only hesitation is, do most potential users know what "collate" means, and do they associate it with the problem they would use the function to solve? Like, if what I knew I was looking for was "a more general group-by" and I ctrl+f'd medley's docs, that wouldn't necessarily turn up collate-by
. But is this a real issue? ¯\_(ツ)_/¯
So what's the next step, should I put a PR together? I confess I don't have a ton of experience with optimizing functions like this, but I'm open to learning.
from medley.
If we add "group-by" to the docstring, it would turn up the right function if you Ctrl-F'd the docs.
Feel free to put together a PR. Take a quick look over the CONTRIBUTING.md document if you can - it's not very long.
In terms of optimising, reduce
is an efficient iterator, so the only efficiency gain I can see beyond that is to use a transient map:
(defn- collate-by* [kf cf initf coll]
(persistent!
(reduce (fn [m v]
(let [k (kf v)]
(assoc! m k (if-let [kv (find m k)]
(cf (val kv) v)
(initf v)))))
(transient {})
coll)))
(defn collate-by
([kf cf coll]
(collate-by* kf cf identity coll))
([kf cf init coll]
(collate-by* kf cf #(cf init %) coll)))
However, now that I write that out, I'm wondering if having an initf
function instead of an init
value would be a more general purpose function. It doesn't follow the pattern of reduce
, but unlike reduce
we can't easily get the first element of each group. That is, we know that (reduce f coll)
is the same as (reduce f (first coll) (rest coll))
, but we can't easily do this for collate-by
.
Transducing functions do have a 1 argument form, i.e. (conj x) => x
, but in a transducer this is for being called on the final result, rather than the initial value, so the purpose is different.
So would this be better?
(defn collate-by
([keyf collatef coll]
(collate-by keyf collatef identity coll))
([keyf collatef initf coll]
(persistent!
(reduce (fn [m v]
(let [k (keyf v)]
(assoc! m k ((if-let [kv (find m k)]
(collatef (val kv) v)
(initf v)))))
(transient {})
coll))))
The advantage of this design is that the 3-arity functionality is a subset of the 4-arity functionality, and it potentially opens up more use-cases (though I'm unsure what they would be).
The disadvantage is that you'd need to write:
(collate-by kf cf #(cf init %) coll)
But most collections have functions for this already, e.g.
(collate-by kf conj vector coll) ;; group in vectors
(collate-by kf conj hash-set coll) ;; group in sets
(collate-by kf conj list coll) ;; group in lists
from medley.
If we add "group-by" to the docstring, it would turn up the right function if you Ctrl-F'd the docs.
Good point, we should do the same with index-by
too.
However, now that I write that out, I'm wondering if having an initf function instead of an init value would be a more general purpose function...
To me it seems like the tradeoff here is between familiarity/clarity and generality. On one hand, people already know how reduce
works, and []
is a little easier on the eyes than vector
. Also, not having to use cf
in two places makes it easier to use anonymous functions. On the other, an init function is more general.
My instinct says that though the initf
approach is more general, if we can't come up with a single usecase for it then the demand for it would be pretty rare, which tips the balance towards familiarity/clarity in my view. The only thing I could come up with was that if there was some collection from a library or something that had no 0-arity constructor, then it would be useful. But that seems pretty exotic, and in every other case, using the 0-arity constructor would suffice.
from medley.
I'll give it some thought - there's no need for a hasty decision - but I'm currently leaning toward initf
.
In either case we'll need a function that uses initf
, as that's the common functionality between the two arities. The only question is whether or not we make that function public, or wrap it in another function that replaces initf
with init
.
My inclination is to avoid wrappers that make the API more familiar, at the cost of distancing ourselves from the underlying implementation. This seems a case where initf
would be "simple" and init
would be "easy", and in general we should favor the former over the latter.
from medley.
I was reminded of this issue today, and figured i'd check in! Shall we get the ball rolling again?
from medley.
Having thought more about the initf
vs init
question, I realized one material advantage of initf
over init
: it supports collating into mutable types (e.g. ArrayList, JS arrays, etc.). I think the use case would be rare, but when you needed it, it would be nice to have it there.
On the other hand, the major inconvenience of having to write out #(cf init %)
I think would come not from writing it out per se, but from having to bind cf
to a name so you could reference it in the cf
and initf
args. I do think that inconvenience is significant though, since it forces you to bind an otherwise small anonymous fn to a name and add another level of nesting to the code.
So I think the important question is: how often would it come up? If a type has a constructor that takes at least one argument, then you can always use the constructor rather than writing out an anonymous fn. All Clojure types have constructors like that (even numbers via *
and +
, yay monoids), and the only one I've seen in the wild that doesn't is this interval set library, which provides empty
and mark
fns but no constructor for non-empty sets. There very well might be more, but they're likely to also be somewhat exotic library-provided types.
So this leaves us with a tradeoff between an uncommon use case and another uncommon use case, but at least now I think the tradeoffs are concrete.
I think there are also two possible compromises in addition to the previously discussed approaches:
- Check the value of
init
withfn?
and wrap it in a fn if it isn't already in one. This could possibly create a moment of confusion if someone tried to aggregate fns (e.g. withcomp
) but this seems so exotic as to not be significant. It's slightly magic, but maybe not so bad? - Use kwargs, i.e.
:init []
or:initf vector
. The newish unified call syntax makes this really nice to do, though users of pre-1.11 Clojure wouldn't have that nicety. I think there's also a midway version of this approach: have a 3-arity that accepts an:initf
kwarg and a 4-arity with a staticinit
arg. That would preserve the convenience of a regular oldinit
arg for the vast majority of usecases, and let users break out theinitf
kwarg in the small number of cases where it was actually needed.
Thoughts?
from medley.
Thanks for the update, and apologies for the delay in getting back to you. After giving it some thought, my preference leans toward initf
implementation. This may be less convenient under certain circumstances, but under most use-cases there should be no difference, and it's the simpler implementation - insofar that the init
version would essentially be a wrapper over the initf
version. I hadn't considered that it would work better with mutable collections, but you're right, and that does seem like another minor point in favour of initf
.
from medley.
Sounds good, do you want me to submit a PR or would you prefer to do it? I think the code is pretty much already in this issue.
from medley.
Related Issues (20)
- Suggested addition: index-of HOT 3
- dissoc-in has unexpected behaviors HOT 3
- Minor Clojure 1.11 Alpha 4 compatibility issue HOT 1
- Release request, to prevent warnings for Clojure 1.11 users HOT 1
- Suggested addition: update-items HOT 9
- Suggested addition: ClojureCLR support HOT 2
- medley/join may occur StackOverflowError in rare cases HOT 3
- Suggestion: template HOT 3
- Suggested addition: mapmap HOT 2
- NPE from assoc-some if map is nil and there is more than one key supplied HOT 1
- `time` as data? HOT 1
- toggle in set HOT 2
- Add conj-some fn HOT 2
- Suggested addition: replace-subvec HOT 1
- Add flip function HOT 4
- Proposal: re-find-groups-seq HOT 3
- Incompatibility between medley/medley and dev.weavejester/medley HOT 1
- Suggested addition: assoc-conj
- insert-nth: ability to use negative numbers HOT 3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from medley.