Comments (6)
Over in LensKit, we take a somewhat different design:
- The valid keys for a sparse vector are fixed when the vector is created
- O(lg n) access & update (keys & values stored in parallel arrays)
- Sorted by key
- Very fast dot products & other pairwise vector-vector operations
The big things that we need are performance-related:
- Low memory use
- Minimize on-the-fly memory allocation
- Fast access/update (we even have an API that allows constant-time updates while iterating over a vector)
- Fast vector-vector operations, even with non-identical key sets
- Handle any
long
as valid keys (including negative values) - Mutable and immutable versions — while it would be possible to get away from this, it is fairly foundational to how we've designed and documented our APIs.
from vectorz.
@elehack thanks for the comments - very interesting. I need to have a think about how much of this we can support in Vectorz. Initial thoughts:
- longs as valid keys are probably OK for sparse vectors, but dense vectors can't be made bigger than int size for obvious reasons
- negative indexes are problematic: we're quite strict on 0... n-1 index ranges in the API contracts. How much of a big deal is this and why is it needed?
- immutable vector versions should be OK, it hasn't been a priority so far but the API is designed to allow it. it does imply a slight performance overhead though - curious as to why you need this?
- all the performance stuff is OK (and matches the overall Vectorz philosophy)
from vectorz.
- Yep, we use int keys for our dense vectors too (although our dense vectors aren't used much yet). And sometimes we map user/item ids down to consecutive indexes to store them in an array or dense vector.
- We support negative keys because we use sparse vectors to store lots of user/item vectors; user/item IDs come from the client-configured data access layer, and we impose no restrictions on that interface aside from the existence of unique long IDs. Most data sources have non-negative IDs, but I've been known to take advantage of that to use negative IDs as synthetic IDs in some test/evaluation code. It might not be a huge deal, or used anywhere, but it would be a notable new restriction on the behavior of the data access abstraction (which has been unchanged in this regard since the earliest releases).
- We use immutable vectors a lot so that classes can be guaranteed that they're allowed to hang on to a vector without copying it and not worry about some other component modifying it. LensKit has lots of decoupled components that don't know what each other are doing, and immutable vectors allow us to statically prohibit a lot of undesirable interactions.
Our vectors have 3 versions with 2 concrete implementations:
- Base class
SparseVector
— abstract, provides the read-only API. Many components that act on sparse vectors take aSparseVector
. If you have a reference to aSparseVector
, the type system says you can't modify it, but does not prohibit another component from modifying it. MutableSparseVector
— concrete mutable implementation.ImmutableSparseVector
— concrete immutable implementation. If a component has one of these, not only is it prohibited from changing it, but it is guaranteed that no other code can change it.
from vectorz.
Got it. The immutable / mutable approach is consistent with what we do in Vectorz, we actually have two distinct tests:
- isMutable : i.e. can the vector be modified at all. If false, you can depend on the vector being immutable.
- isFullyMutable : i.e. does the vector support mutable setting to any value in any position (so e.g. your
MutableSparseVector
would not meet this criteria since the key domain is limited)
Also Vectorz is getting some adoption as the main pure-JVM matrix library in Clojure, so immutability is an important priority.
The negative indexes I think can't be supported with the current Vectorz AVector
API - this is for a whole bunch of reasons: performance, the need to guarantee certain type invariants, the fact that Vectorz supports composite / concatenated vector views etc.
This still leaves open the option of a sparse vector implemented as a wrapper over a Vectorz dense vector. That might actually be a good solution: a lot of performance would come from the underlying dense vector, while the wrapper could take care of the mapping and indexing. Could be something like an enhanced variant of the current SparseIndexedVector
:
from vectorz.
FWIW, I've now added immutable vectors and matrices to the latest development branch of Vectorz. Will be in the 0.25.0 release
from vectorz.
OK we now have a SparseHashedVector
, which takes care of the fully mutable sparse vector case. Should be in 0.27.0
release.
Closing this specific issue as I think the original scope is now covered. Hopefully this solves the mutable sparse case for LensKit as well - if not then should probably be a new issue.
from vectorz.
Related Issues (20)
- Support for long indices HOT 1
- Performance issue (4x) with indexed sparse vectors and distanceSquared HOT 5
- Can't create SparseColumnMatrix HOT 1
- includeIndices causes memory leak when adding many times HOT 4
- Different implementations of innerProduct for row vs column matrices? HOT 2
- Creating sparse matrix in clojure HOT 1
- Slow SparseColumnMatrix innerProduct HOT 5
- Add possibility for sparse default value different than zero HOT 1
- Strange exception with join-along HOT 1
- sparse matrix equality HOT 6
- Will there be future support for single-precision floating point numbers? HOT 1
- Mistake in JoinedArray.get(int x, int y) HOT 2
- How to serialize a mikera.arrayz.INDArray? HOT 2
- Real angles in Vector2 rotateInPlace(), rotate() to copy original HOT 1
- Unable to create NDArray for multidimensional arrays HOT 2
- Equivalent to Numpy amin, argsort? HOT 1
- AVector.angle returns NaN HOT 4
- Negative dimension in NDArray.reshape HOT 1
- `asElementList` on identity matrices stack overflows
- Question about efficiency HOT 5
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from vectorz.