Git Product home page Git Product logo

debox's Introduction

Debox

Overview

Debox provides specialized mutable collections that don't box.

For performance reasons, Debox's types are not compatible with Scala's collections framework (although conversions are possible). You may find that Debox's structures provide more reliable performance than Scala's mutable collections.

Debox is available for Scala 2.11 and 2.12.

(For Scala 2.10, use Debox 0.7.3 or older.)

Set up

If you use SBT to build your project, you can use Debox via the following snippet:

resolvers += Resolver.sonatypeRepo("releases"),

libraryDependencies += "org.spire-math" %% "debox" % "0.8.0"

Debox Types

Buffer

debox.Buffer is an indexed data structure backed by an array which can be appended to. It corresponds to collection.mutable.ArrayBuffer but has better performance due to specialization. It can also wrap instances of Array to provide specialized versions of foreach and map, which are a bit faster.

Buffers can grow internally. Appending, such as += and ++=, and removing and from the end of the buffer, as pop does, will be fast operations. Other operations (like adding to the middle of the buffer with insert) may require internal copying.

Large buffers which have most of their elements removed will still maintain a large underlying array. Use compact to reclaim unnecessary memory in these situations.

Example usage:

import debox.Buffer

val buf = Buffer.empty[Int]
buf += 1
buf += 1
buf += 2
buf.foreach { n => println(n) } // prints 1, 1, 2

val child = buf.copy
child += 3
child(0) = 999
child.toString // Buffer(999, 1, 2, 3)

val buf ++= child
buf.pop
buf.sum  // uses spire

Set

debox.Set corresponds to collection.mutable.Set, but without boxing. The hashing is done via an open addressing scheme with re-hashing, which is derived from the strategy used by Python. See Hashing Strategy for a more complete description.

Sets are required to maintain extra space to ensure fast average lookup times. Sets will tend to use 33-66% of the underlying storage.

Large sets which have most of their elements removed will still maintain a large underlying array. Use compact to reclaim unnecessary memory in these situations.

Example usage:

import debox.Set

val set = Set.empty[Int]
set += 1
set += 1
set += 2
set(0) // false
set(1) // true

val child = buf.copy
child += 3
child += 999
child.size == 4 // true

val other = Set(2, 3, 4)
set &= other
set(1) // false
set(2) // true
set.size == 1 // true

Map

debox.Map corresponds to collection.mutable.Map, but without boxing. As with debox.Set the hashing is done via an open addressing scheme with re-hashing, which is derived from the strategy used by Python. See Hashing Strategy for a more complete description.

Maps are required to maintain extra space to ensure fast average lookup times. Maps will tend to use 33-66% of the underlying storage.

Large maps which have most of their elements removed will still maintain a large underlying array. Use compact to reclaim unnecessary memory in these situations.

Unlike Scala Maps (which store a key and value together as a Tuple2), Debox stores keys and values in separate arrays. This makes iterating over keys or values separately faster, but means that operations which treat a map as a sequence of tuples are slower and/or not supported.

Example usage:

import debox.Map

val m = Map.empty[String, Int]
m("boris") = 1887
m("bela") = 1880
m("bela") = 1882
m.size // 2

m.contains("bela")   // true
m.contains("donald") // false
m.contains(12345)    // compile-time error!

m.get("bela") // Some(1882)
m.get("donald") // None

m("boris")  // 1887
m("bela")   // 1882
m("donald") // debox.KeyNotFoundException

m ++= Map("christopher" -> 1922, "vincent" -> 1911)
m.keysSet // Set(christopher, vincent, bela, boris), order is arbitrary

val b = m.mapValues(year => "born in %d" format year)
b // Map(boris -> born in 1887, bela -> born in 1882, ...)

Hashing Strategy

The hashing used in Set and Map works as follows:

  1. Get the item's hashcode (retrieved by the ## operator) as i.
  2. Mask this by the underlying array's max index to get j.
  3. If slot j is free, use it and return.
  4. Else, re-hash i and repeat.

The re-hashing strategy uses perturbation (initialized to the original hashcode) as well as the current i value. The transition can be expressed as:

i = (i << 2) + i + perturbation + 1
perturbation >>= 5

For a much more detailed treatment, you can read the comments on CPython's dictobject.c here.

Benchmarks

Most of Debox has been developed in tandem with aggressive benchmarking using Caliper and other tools.

The benchmarks can be run from SBT via benchmark/run.

Disclaimers and Provisos

Debox aims to achieve the best possible performance through use of features like specialization, macros, arrays, etc. All other concerns (such as visibility, subtyping relationships, type signatures, etc.) are secondary.

Unlike many Java (and Scala?) projects, Debox is not interested in hiding its internals beyond what is convenient. To aid inlining, most internals are public. This does not mean that users should modify them directly--attempting to manually update the structures could produce non-deterministic effects.

Debox chooses not to provide methods whose implementations are guaranteed to be slow. Rather than trying to provide every possibly useful method, the goal is to provide core functionality which can be implemented efficiently and which plays to the data structure's strengths.

It's possible that in the future Debox will use a system of imports to optionally add "slow" methods which have been left off the current API. Since Debox is not at a 1.0 release yet, the API may change from version to version. Debox makes no source- or binary-compatibility guarantees.

Criticisms, suggestions and patches are all welcome, as are benchmarking results (especially surprising ones)!

Copyright and License

All code is available to you under the MIT license, available at http://opensource.org/licenses/mit-license.php and also in the COPYING file.

Copyright Erik Osheim, 2012-2018.

debox's People

Contributors

adam-wyluda avatar erik-stripe avatar jd557 avatar non avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

debox's Issues

Set.foreach skips last added element

Consider following code:

object Main {
  def main(args: Array[String]) {
    var sz = 1l
    def sum(set: debox.set.Set[Long]) = {
      var r = 0l
      set.foreach(r += _)
      r
    }
    def str(set: debox.set.Set[Long]) = {
      val sb = new StringBuilder
      set.foreach(i => {
        sb.append(i)
        sb.append(' ')
      })
      sb.toString
    }

    do {
      val ts = debox.set.Set.empty[Long]
      var i = 0l
      while (i < sz) {
        ts.add(i)
        i += 1l
      }
      println(sz)
      assert(ts.length == sz)
      println(str(ts)) // missing element
      println("" + sum(ts) + " -- " + (0l until sz).sum + "\n\n") // sum is wrong
      sz *= 2l
    } while (sz < 100)
  }
}

Depend on non/algebra instead of non/spire

It seems that the main reason for the spire dependency is to provide basic typeclass instances so things work out of the box. None of the advanced features of spire are being used.

But spire is quite a heavy dependency, which prevents people from using debox. See https://twitter.com/li_haoyi/status/677986085360689153

I would not want to have the typeclass instances in a separate library. But once spire depends on algebra, debox would only have to depend on algebra? That would reduce the size of the dependencies to something more reasonable.

If the required typeclass instances end up in cats-kernel or whatever, that would be fine too.

concurrent versions

as efficient as the Java Concurrent collections to support large collections

debox.Map.update() speed can be improved

Hi, I've believe that it's possible to improve the speed of debox.Map.update by using a freeBlock accumulator (see my pull request #22) and only updating the map when it finds a block with status == 0, therefore avoiding the !contains check.

However, I think it would be better to remove the !contains check completely and move that responsibility to the remove operation, by stopping the loop only when it finds a block with status == 0. I believe this would be better, as updates are more common than removals IMO.

I'm okay with sending a pull request with this changes, but I would like some feedback on what's approach do you think it's better.

Get -Xmx working correctly for Caliper

I am pretty sure that the benchmark memory settings are no longer being passed to Caliper.

This is a problem because we'd like to test gigantic arrays--currently some of the benchmarks overflow the heap.

javaOptions used to work, but seems to no longer.

Add reset method to collections

Right now clear deallocates the internals. It would be nice to have reset which resets the size without changing allocations.

Map update takes lambda

this would allow potentially atomic updates to an existing value combined with withDefault.

Publish packages

Thanks for this library. Would you consider publishing it for easier consumption?

dedicated small collections

I am finding myself wishing that there were dedicated implementations for small collections, like in the std lib... I have a really big Map containing Sets as values and the values are all really small (average 2 entries)

Implement Buffer.toSeq like `WrappedArray` but with length cut-off

The contract of Iterable[A] does not prescribe an order for the elements; thus methods take Seq[A] parameters to document this fact --- and not especially to prescribe fast linear/indexed access.

As debox.Buffer has its elements ordered, a toSeq array wrapper would be a great addition. Right now, collection.mutable.WrappedArray cannot be used because the underlying debox.Buffer.elems array could be longer than the buffer itself.

Thus I propose to create an ArrayView class implementing Seq, with private fields array, offset and length and a few methods overridden for speed: I'm thinking of head, tail, apply, foreach, iterator, drop, take. The implementation trait for the other methods would have to be carefully chosen from the Scala collections to use linear access.

I can try a proof of concept PR: any interest ?

What about immutable?

a specialized immutable list might be of use. Likewise Vector and HashTrieMap/Set. Especially with concurrency, sometimes the immutable versions are better (when you need to provide snapshotting without copying).

debox.Map.contains boxes (!)

debox.Map[Int, AnyRef].contains(key: Int) boxes, I do not know exactly why, but the performance suffers quite a bit:

untitled

This is on Scala 2.11.2. I wonder if it is the inner tail recursive method that causes specialization to fail.

Compiler warnings in debox

====================== Debox compiler warnings ===============================================================================================================================================================================
| Description                                                                                                                                                                          | Resource     | Path       | Location|
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| method Select in trait Trees is deprecated: Use Select(tree, newTermName(name)) instead                                                                                              | Util.scala   | /debox     | line 58 |
| method Block in trait Trees is deprecated: Use the canonical Block constructor, explicitly specifying its expression if necessary. Flatten directly nested blocks manually if needed | Util.scala   | /debox     | line 62 |
| dead code following this construct                                                                                                                                                   | Macros.scala | /debox/map | line 34 |
| dead code following this construct                                                                                                                                                   | Macros.scala | /debox/set | line 17 |
| dead code following this construct                                                                                                                                                   | Macros.scala | /debox/map | line 34 |
| dead code following this construct                                                                                                                                                   | Unset.scala  | /debox     | line 21 |
| dead code following this construct                                                                                                                                                   | Macros.scala | /debox/map | line 16 |
| method Select in trait Trees is deprecated: Use Select(tree, newTermName(name)) instead                                                                                              | Util.scala   | /debox     | line 53 |
| dead code following this construct                                                                                                                                                   | Macros.scala | /debox/set | line 34 |
==============================================================================================================================================================================================================================

withFallback

that would be great, especially in combination with concurrency

Publish a scala 2.11 version

I am using scala 2.11-RC1 in my project, and would like to try debox.
Please publish a 2.11 compatible version.

Thanks!

Custom hashcode strategy

It is mentioned in the README that there is a ## operator for compute hashCode for the Map.

  1. How does this interoperate with the default hashCode method?
  2. If I wanted to override hashCode computation, for example for value classes which are just primitives underneath the hood, how do I do that?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.