Git Product home page Git Product logo

cats-collections's Introduction

Cats-Collections

Build Status codecov.io Maven Central Discord

It intends to be a library containing data structures which facilitate pure functional programming in the Scala programming language. Some of these are replacements for structures already present in the Scala standard library, but with improvements in safety, some are data structures for which there is no analogue in the Scala standard library.

Getting Started

Cats Collections is currently available for Scala 2.12, 2.13, and 3.

To get started with sbt, simply add the following to your build.sbt file (you may find the latest version via a badge on top of this page or on release page):

libraryDependencies += "org.typelevel" %% "cats-collections-core" % "<version>"

CONTRIBUTING

Contributions in all forms (including discussion, suggestions, bug reports, usage reports, pull requests, patches, etc) are welcome and encouraged from everyone!

The cats-collection project demands that all activities surrounding the project follow the Typelevel Code of Conduct. If you witness any behavior from any individual which you feel might be harmful in any way or might not be a violation of the code of conduct, please speak up. Please contact any individual named in the Typelevel Code of Conduct so that it can be addressed as soon as possible.

cats-collections's People

Contributors

alistair-johnson avatar anicolaspp avatar armanbilge avatar catostrophe avatar ceedubs avatar christopherdavenport avatar danicheg avatar djspiewak avatar durban avatar erwan avatar gemelen avatar gitter-badger avatar gyfarkas-test avatar jiminhsieh avatar johnynek avatar kailuowang avatar larsrh avatar ljoublanc avatar mkhl avatar non avatar osleonard avatar pfcoperez avatar ritschwumm avatar satorg avatar scala-steward avatar stew avatar typelevel-steward[bot] avatar vlachjosef avatar xplosunn avatar xuwei-k avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

cats-collections's Issues

Eq[AvlSet[A]]

Curiously, there appear to be no instances for AvlSet.

improve how we add ranges to Diet

right now in Diet, if you add a range of values, it adds them one by one. So adding a range is O(n) on the size of the range, but we should be able to make this O(log n) on the size of the tree.

If you think about it, adding 0 - 100000 to an empty diet should just be a single operation, right now it is 100001 separate operations

Adding Bounded to enhance Enum?

I was wondering whether Bounded like the one in monocle would be a typeclass which should belong within dogs since Enum exists and it would be useful to be able to call a variant of the succ method which returned an Option if at the maximum/minimum for user defined enumerated types.

It's worth noting that in scalaz the Bounded and Enum typeclass are one and the same which is another way of adding the bounds.

something is wrong with codecov

Our coverage is not as bad as codecov makes it seem like.

If you run the coverage locally with sbt coverage test coverageReport and look at the report in core/target you'll see that we're doing much better than the website leads on.

One obvious difference is that if you look at the website there are a bunch of files such as Set.scala that the web site says we have zero coverage for, but the local report says we have more like 90% coverage

Removing from Diets doesn't work

scala> Diet.empty[Int].addRange(Range(1,2)).addRange(Range(4,5)).removeRange(Range(0,6))
res4: cats.collections.Diet[Int] = DietNode(Range(4,5),EmptyDiet,EmptyDiet)

Correct result should be an empty diet.

Clarify LICENSE

It says "Apache 2" in the build.sbt, but there's no LICENSE file in the repository.

implement union in Diet

There is currently a merge in Diet, but it doesn't work for overlapping Diet's, it is only used internally when a range is deleted. There is currently a commented out test that shows that it won't work for overlapping ranges

convert Diet into a balanced tree.

I found that Scalaz Diev has lower inserts/deletions than our Diet, but is quite faster in searches. We can balance Diet so we improve search, yet, it might compromise inserts/deletions.

The original Diet paper doesn't talk about balancing it out, but will it be something we should consider?

efficient merges for Heaps

we can always merge two heaps by taking the smaller of the two and adding each element. This would be n log n where n is the size of the smaller heap. This totally discards the structure of the smaller one.

It seems like we should be able to leverage the heap property to more efficiently merge (say O(log n) would be ideal).

The original paper this is based on does not have a merge operation, but there may be other heap algorithms or small changes we can make to this implementation to support merge.

use discipline for law checking

Although we don't have things like Functor and Monad, we still have things like map and flatMap, we should be using discipline to encode and check laws that say, for example, the our map methods should still be compositional, and our flatMap methods should respect associativity.

Expose range as a first class concept

Diet has a Range type that it uses intenerally. We could move this out from inside Diet and expose it as a first class data type as a replacement for Range from the scala standard library

dogs becomes typelevel/cats-collections

@stew would you consider donating dogs to become a new cats-collections project under typelevel org? There have been interests in having more collection implementations available for the Cats ecosystem (latest typelevel/cats#2463). It would be amazing to have cats-collections start from dogs' code base. What do you think?

Heap add and mergeChildren up have dead code, likely bug

looking at the code coverage I noticed I can't reach this code:

else if (left.size < (1 >> right.height) - 1)

  def add(x: A)(implicit order: Order[A]): Heap[A] =
    if (isEmpty)
      Heap(x, Leaf(), Leaf())
    else if (left.size < (1 >> right.height) - 1)
      bubbleUp(min, left.add(x), right)
    else if (right.size < (1 >> right.height) - 1)
      bubbleUp(min, left, right.add(x))

note 1 >> x is either 0 or 1. So (1 >> x) - 1 is -1 or 0. But size >= 0 so size < (1 >> x) - 1 is never true. So those branches are dead.

Similarly with:

else if (l.size < (1 >> l.height) - 1) {
in mergeChildren.

It seems like have log N height (I added a test for that) currently, so one fix could just be to remove these branches, but it may be good to go back to the paper and see what they were intended to do.

looking at: https://arxiv.org/pdf/1312.4666.pdf

page 4, it seems to just be a mistranscription of math.pow(2, height) which should be 1 << height not 1 >> height.

cc @anicolaspp @larsrh

DietMap

I found Diet, which seems quite nice, but I'm particularly interested in a DietMap, which would allow you to store values associated with each range.

dietMap.addRange((1, 10), myCaseClass)
def intersection(range): Seq[V]

Traverse[Streaming] is eager

import cats._
import cats.implicits._

import dogs._

object TestStream extends App {
  val stream = Streaming.continually(1)
  val trav = stream.traverse[Id, Int](identity)
}

runs forever.

stdlib Stream in scalaz executes immediately.

Related: typelevel/cats#2075

complete migration

  • move root package to cats.collections
  • s/dogs/cats-collections/ in build and documentation
  • we might not want to start cats-collections with all existing dogs collections. e.g. DList (now we have Chain in cats), Streaming which we have more mature solution in the ecosystem , etc. cc @mpilquist. We can create a legacy package or module and move those that we are no so sure in.
  • make sure the new project website works

Heap could allow a Long size

Since we are building a tree, we could use a Long as the size, as we tend to do with cats collections that are not directly array backed (it also matches the Foldable size method).

This is binary breaking.

foldRight implementations should be properly lazy

Currently it looks like many of these types define something like:

def foldRight[B](b: B)(f: (A, B) => B): B = ???

We should update them to instead define:

import cats.Eval
def foldRight[B](lb: Eval[B])(f: (A, Eval[B]) => Eval[B]): Eval[B] = ???

Port scalaz `Tree` and `TreeLoc`

Is this something worthwile? Or would quiver cover this? I think circe ported this as well.
I am not sure I'd be able to get the typeclass dependencies and so on right.

EDIT: Or is it actually already there?

Adding into Diets doesn't work

scala> val d = Diet.empty[Int] + Range(572, 573) + 567 + 570 + Range(569, 574)
scala> d.toList
res30: List[Int] = List(567, 570, 569, 570, 571, 572, 573, 574)

Unused import warning in documentation

On all of the pages of the documentation, there are unused import warnings. For example, on binaryheap, the example looks like this:

scala> h.getMin
<console>:10: warning: Unused import
       import dogs._
                   ^
<console>:13: warning: Unused import
       import cats._
                   ^
<console>:16: warning: Unused import
       import cats.implicits._
                             ^
<console>:19: warning: Unused import
       import dogs.syntax.all._
                              ^
res2: Option[Int] = None
``

Disjoint-Set

Hi,

I re-raised a PR for a DisjointSet, I tidied up the branch, so there is a single commit for the set, with test and doc.

#86
I did an 'sbt clean test tut', and it all runs fine locally.

Hopefully there is home for this within dogs, I think there are a few others that may be useful (and don't go to heavily into the work done in quiver).

--Mike

missing Diet.removeRange

we have a def - which does this, we just need the removeRange method for symmetry with addRange. When we do this, lets make sure to add it to the docs project

Clean up DietSpec

There are a lot of commented-out tests and also it prints some messages which seem to indicate that some tests should be failing.

Heap can have an Order

If we have Order[A] we can make an Order[Heap[A]] in the worst case by linearizing to List, but I think just popping the min until we find the first difference should work.

A purely functional Set implementation using `Eq`

I noticed that the distinct implementations in cats data types are using on Scala's TreeSet and the Order typeclass. But what about a purely functional set based on Eq using only boolean logic?

Upsides:

Downsides:

  • Ironically, could not derive an Eq instance for the set itself since it's based on a function?
  • GC pressure due to copying of case classes
  • No way to inspect elements, since it is only optimized for checking containment
  • No way to check size

I wrote up a basic implementation (https://gist.github.com/nasadorian/8f10cc1f89225a932238026cf46e57a7) using just a case class and Eq.

Would like to improve it and use more lawful constructs - I am thinking a better impl could depend on some Contravariant Functor like Predicate since it is "containing" a function, but I'm not totally sure how to make that work.

The Set itself could have Monoid and Semigroup instances easily. I would also like to figure out if it's possible to make it a Functor.

Cheers!

Heap.heapify appears to be O(N^2)

https://github.com/typelevel/cats-collections/blob/master/core/src/main/scala/cats/collections/Heap.scala#L71

This method is claimed (without proof) to be O(N). But it is calling in a loop .length and accessing elements from a list, both of which are O(N) operations.

In the very best case I think this can be O(N log N), but I think in fact it is O(N^2).

I think converting the input List into a mutable java Array[A] would fix the performance issue since we don't need to update that list.

Incidentally, you could also widen the input type to Iterable[A] I think since we should copy into an O(1) indexable structure in the method.

That would regain the O(N) complexity I believe (or again, maybe O(N log N) in the worst case).

cc @anicolaspp

Diet Remove Range

The current implementation of Diet does not delete ranges, but item by item. I should be able to delete a range efficiently (without iterating for each item).

Disjoint-set

Hi,
I have an initial (requires some tidy up, more tests, more detail on the documentation) implementation of a disjoint-set, here: https://github.com/lewismj/dogs/tree/feature/disjoint-set
Whilst it is a very simple data-structure, it is useful as a building-block to some graph structures/functions. And, I thought it would be a good starting point as I'd like to see if I could add to the dogs library.

I'd appreciate it if you could do a quick code review and let me know if (with a bit of cleanup) this is something that I could raise a PR for ?

Thanks,
Mike

Create PDF documentation.

PDF documentation would be useful when there's no internet connection.

I've created some scripts which work well enough for cats, dogs and fetch, as shown below:
https://github.com/frgomes/debian-bin/blob/master/bash_30pdf.sh
https://github.com/frgomes/debian-bin/blob/master/bash_30httrack.sh
https://github.com/frgomes/debian-bin/blob/master/bash_31makepdf_cats.sh
https://github.com/frgomes/debian-bin/blob/master/bash_31makepdf_dogs.sh
https://github.com/frgomes/debian-bin/blob/master/bash_31makepdf_fetch.sh

I'm not suggesting you guys adopt these scripts.
However, anyone willing to do that may eventually find some pointers here.

Specialization warning message

This code:

object Discrete {
  implicit def integralDiscrete[@specialized(Specializable.Integral) I](
      implicit I: Integral[I]): Discrete[I] =
  new Discrete[I] {
    import Integral.Implicits._
    override def succ(x: I): I = x + I.one
    override def pred(x: I): I = x - I.one
  }
}

produces the following warning:

[warn] /home/lars/proj/typelevel/cats-collections/core/src/main/scala/cats/collections/Discrete.scala:26:16: type I is unused or used in non-specializable positions.
[warn]   implicit def integralDiscrete[@specialized(Specializable.Integral) I](
[warn]                ^

Some Refactorings

  • Remove DList as Chain supersedes it
  • Rename Enum as folks will think it means an enumeration ala Scala 3 enums
  • Rename Set and Map to AvlSet and AvlMap to avoid name collision with stdlib
  • Rename Range to avoid name collision with stdlib
  • Remove Streaming in favor of fs2, monix, and iteratees
  • Remove the birds package as thrush and kestrel aren't really related to collections

Originally posted by @mpilquist in #105 (comment)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.