Git Product home page Git Product logo

immutable's People

Stargazers

 avatar  avatar

Watchers

 avatar  avatar

immutable's Issues

Add NoPacking option for memory code

Aside from LargeBlock, SmallBlock, etc, which pack bits together, there should be a NoPacking option. Using this option, the memory array would be a full (uint32?) block for every item. This may be space-wasteful, but maybe speedy.

Using large memory block, read or write is getting corrupted

Example code; causes an issue with hashmaps with large data sets.

Width is set to 53, because large hashmaps contain significant low-order bytes, and we are attempting to use 64-bit hashes.

src := rand.NewSource(time.Now().UnixNano())
random := rand.New(src)

width := uint32(64 - 11)
max := int64(math.Pow(2.0, float64(width)))

count := uint64(8)
contents := make([]uint64, count)
for i := uint64(0); i < count; i++ {
	contents[i] = uint64(random.Int63n(max))
}

m := AllocateMemories(LargeBlock, width, 8)
for k, v := range contents {
	m.Assign(uint64(k), v)
}

for k, v := range contents {
	result := m.Read(uint64(k))
	if result != v {
		t.Fatalf("At %d\nexpected %064b\nreceived %064b\n", k, v, result)
	}
}

Improvements on Hashmap key-value-pair storage

Current hashmap uses a simple struct to store key-value-pairs:

type bucket struct {
  // ...
  entries []entry
}

type entry struct {
  key   Key
  value Value
}

Key and Value are both interfaces, which means that every entry contains a pointer to the key, rather than the value of the key itself. In the case of simple values like integers, that value could be stored directly in that memory space itself (or less).

If we replace the []entry property in bucket with a byte array, we can read & write directly into it, and even only as much as required (perhaps reducing memory usage).

To review:

  • How to get concrete key instances out of memory? Will we require a re-hydrate method on key?
  • If the key is actually a pointer to a struct or other thing, how does Go prevent the key from being garbage collected?

Sliding hash size

For small hash maps (under 1k? under 10k?), slide down to a 32bit hash value, instead of 64 bit. This might reduce memory footprint.

Introduce MediumBlock memory code

Code currently assumes that all memory allocation for hash map packed hob arrays should be done with 32-bit blocks (i.e, uint32). Introduce implementation for 16-bit (i.e., uint16) blocks.

Introduce List collection

HashMaps are only one type of immutable collection object. We want to allow for an array-like list as well.

Investigate hashing methods

Current performance for Get is between 2 and 7 times slower than native implementation for maps (depending on map size). There are many possible reasons, including the performance of the hashing mechanism.

The current code uses the built-in FNV32a hash methods, as supplied in the hash package. Using the benchmark tools, we can see that...

for i := 0; i < max; i++ {
  hasher := fnv.New32a()
  binary.Write(hasher, binary.LittleEndian, uint32(i))
  hash32 = hasher.Sum32()
}

...takes about 0.02ns / op, where max is 500,000.

Although it seems that there is little to gain here, we might find some slightly better performance by using a custom hashing mechanism.

Should a call which does not alter a collection return a new collection?

Given a hashmap with two elements, and a call to Remove with a key that does not match the collection, should the method return a reference to a new hashmap with identical contents, or a pointer to the same hashmap?

Same behavior for Add, when the key and value are the same as what already exists.

There is an ancillary question of consistency -- what happens if we cannot tell for certain that a collection has or has not changed? In the case of an add, if the value is a complex object, how to we determine its equivalence? Or do we assume that if the value is a pointer, than we simply compare the pointers?

What do other immutable libraries do?

Return list of keys for a hashmap

Given a Hashmap, should have function GetKeys which returns []Key. A nil hash map would return nil, and an empty hash map would return a 0-length array.

Optimize storage of keys and values

In a given bucket, the current code base stores all collection entries as entry, essentially a pair of interface{} objects. If the key or value is a simple data type, (int, bool, etc), storing a reference off to another memory block is a waste.

This can be optimized (for space) by detecting whether the key or value is a simple data type, and reading into and out of the memory block directly.

Ultimately, this could be implemented with a strategy for speed vs. space.

Implement methods of mutating Hashmaps

The current code starts with a given content collection, and the only mutations allowed are with Filter, Map, and Reduce. Should provide methods such as Set, Replace, etc.

Get fails with large data sets

The Get function of the hash map does not respect overflow buckets. If a hash results in more than 8 items in a bucket, then the Get method seems to just return the 8th entry.

Goroutine enabled iterative functions

Add varients of Map, Reduce, etc. such that predicates are called on Goroutines.

Note: must document to not use cross-dependant predicates; cannnot use a predicate which is dependant on the completion of a prior running of the same predicate.

Introduce Options to switch between strategies

Looking forward, it is important to evaluate different strategies for memory implementation, reallocation, etc., even if for purely academic reasons The collection objects should accept a pointer to an Options object which allows the user to select the strategy they want on a per-instance basis.

Investigate whether to remove `defer`

In the core iteration code, HashMap.iterate, we use defer close(ch), which has been shown to add some small execution time. We should perform some benchmarks to determine whether it is sensible to be explicit about the channel closing, rather than using defer.

Collection types

Prioritized collection types:

  • Hashmap

  • List

  • Set

  • Ordered Set

  • Ordered Hashmap

  • Tree?

Behavior for `nil` value

Given a hashmap, is it acceptable to have a key-value pair with a nil value?

If so, what needs to change to support the difference between a missed key and a valid key with a nil value (say, during a Get)?

Provide more out-of-the-box keys

The given IntKey is complete, but does not cover all usage scenarios. We should provide keys for more data types.

Examples:
StringKey (partially done)
DateKey
FloatKey

Improve performance for `GetKeys`

Given a wrapper hashmap returning the list of keys, it receives a list of unsafe.Pointer from the internal hashmap, which it then needs to translate. This can be improved by constructing one array, and passing a callback into the internal hashmap's GetKeys to return one key at a time.

Complete SmallBlock memory code

Code currently assumes that all memory allocation for hash map packed hob arrays should be done with 32-bit blocks (i.e, uint32). Complete (and fix) implementation for 8-bit (i.e., byte) blocks.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.