Comments (6)
`NDArray` is not iterable (so `for (x of my_ndarray)` and `Array.from(myNdArray)` throw errors)
Yes, maybe we should expose something like this:
  [Symbol.iterator]() {
    return new NDIter(this)
  }
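A minimal sketch of what exposing `[Symbol.iterator]` provides. `TinyArray` is a hypothetical stand-in (flat data plus a shape), not vectorious's actual `NDArray`, and it delegates to the typed array's own iterator instead of `NDIter`:

```javascript
// Hypothetical stand-in for an n-d array: flat data plus a shape.
class TinyArray {
  constructor(data, shape) {
    this.data = Float64Array.from(data);
    this.shape = shape;
  }

  // Defining [Symbol.iterator] is what lets for-of and Array.from
  // consume the object. This sketch yields stored values in memory order.
  [Symbol.iterator]() {
    return this.data[Symbol.iterator]();
  }
}

const tiny = new TinyArray([1, 2, 3, 4], [2, 2]);
const collected = Array.from(tiny);

let total = 0;
for (const x of tiny) total += x;
```

Whether the default iterator should yield values, linear indices, or coordinates is exactly the design question discussed in the rest of the thread.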
`NDIter`'s implementation of the iterable protocol seems to iterate over only the linearized indices (i.e. `Array.from(new NDIter(my_ndarray))` gives `[0,1,...]`, even if the ndarray is multidimensional)
Yes, this is by design since most of the functions in core/ are using the linearised indices. The coordinates are currently accessible using `iter.coords`. This might not be intuitive. I am open to discussing changing the API if it can result in improved DX.
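For readers unfamiliar with the relationship between a linearized index and coordinates, here is a rough sketch. `unravelIndex` is a hypothetical helper, not part of vectorious, and it assumes a C-contiguous (row-major) layout with no custom strides:

```javascript
// Convert a linear row-major index into per-axis coordinates.
// Assumes C-contiguous data; strided views would need the strides too.
function unravelIndex(index, shape) {
  const coords = new Array(shape.length);
  let rest = index;
  // Walk from the last (fastest-varying) axis to the first.
  for (let axis = shape.length - 1; axis >= 0; axis--) {
    coords[axis] = rest % shape[axis];
    rest = Math.floor(rest / shape[axis]);
  }
  return coords;
}

// In a 2x3 array, linear index 4 is row 1, column 1.
const coordsOf4 = unravelIndex(4, [2, 3]); // [1, 1]
```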
`NDIter` has properties that are length 32 (`V_MAXDIMS`) even when the array it's iterating over has fewer dimensions.
Yeah, I actually meant to change this a while back, but if I remember correctly it caused issues with `NDMultiIter`. Might give it another shot soon.
`NDIter` has extraneous properties which obscure how to use it.
Yeah, the documentation can be improved here. It is heavily inspired by numpy (see https://numpy.org/doc/stable/dev/internals.code-explanations.html#n-d-iterators).
`NDIter`'s string representation uses the name `f` (I think because it is derived from minified code) and is excessively verbose.
Hmm, this is weird. Should be fixed.
from vectorious.
`NDIter`'s implementation of the iterable protocol seems to iterate over only the linearized indices (i.e. `Array.from(new NDIter(my_ndarray))` gives `[0,1,...]`, even if the ndarray is multidimensional)
Yes, this is by design since most of the functions in core/ are using the linearised indices. The coordinates are currently accessible using `iter.coords`. This might not be intuitive. I am open to discussing changing the API if can result in improved DX.
The linearized indices are not very useful. If that's what you really want, you can typically just iterate over `a.data.keys()`. I figure if your array has a shape, you probably intend to use it, not just treat the data as one-dimensional.
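As a small illustration of that point, typed arrays already provide `keys()` and `entries()`, so linear indices come for free from the underlying buffer. The `flat` array here is just example data standing in for `a.data`:

```javascript
// Example data standing in for an ndarray's underlying typed array.
const flat = new Float32Array([10, 20, 30, 40, 50, 60]);

// keys() yields the linear indices 0..length-1.
const linearIndices = Array.from(flat.keys());

// entries() yields [linearIndex, value] pairs; no shape information is involved.
const pairs = Array.from(flat.entries());
```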
Yeah, the documentation can be improved here. It is heavily inspired by numpy (see https://numpy.org/doc/stable/dev/internals.code-explanations.html#n-d-iterators).
In Python, `nditer` is awkward to use and it doesn't translate well at all to JavaScript.
I'm thinking something along the lines of:
// if you need something fancy, yield an object handle to each element.
// note I wrote this so that the handles remain valid after the iterator advances.
// It might make sense to keep one handle and update it destructively.
NDArray.prototype.iter = function* () {
  const ndim = this.shape.length
  const data = this.data
  const it = new NDIter(this)
  while (!it.done()) {
    const offset = it.pos
    yield {
      offset,
      byteOffset: offset * data.BYTES_PER_ELEMENT,
      index: it.index,
      coords: it.coords.slice(0, ndim),
      get: () => data[offset],
      set: (value) => { data[offset] = value },
    }
    it.next()
  }
}
// iterate over coordinates and values. I think this should be the default iterator:
NDArray.prototype[Symbol.iterator] = NDArray.prototype.entries = function* () {
  for (const v of this.iter()) {
    yield [v.coords, v.get()]
  }
}
I could also see an argument that iteration should be over a single axis, not all axes (e.g. should yield matrix rows).
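The single-axis alternative could look roughly like the sketch below. `rows` is a hypothetical helper, not an existing vectorious method, and it assumes a C-contiguous 2-D array stored in a typed array:

```javascript
// Yield each row of a C-contiguous 2-D array as a typed-array view.
// subarray() shares the underlying buffer, so no data is copied.
function* rows(data, shape) {
  const [nRows, nCols] = shape;
  for (let r = 0; r < nRows; r++) {
    yield data.subarray(r * nCols, (r + 1) * nCols);
  }
}

const matrixData = new Float64Array([1, 2, 3, 4, 5, 6]);
const rowSums = [];
for (const row of rows(matrixData, [2, 3])) {
  rowSums.push(row.reduce((acc, x) => acc + x, 0));
}
// rowSums is [6, 15]
```

Because each row is a view, writing to a yielded row also writes to the original data.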
from vectorious.
What is the performance of Symbol.iterator?
Iteration is a good programming idiom and can in theory perform well, but in practice I don't know what the latest benchmarks show. I don't know how much function call overhead iteration adds; if it is substantial, I expect that the JavaScript compilers in engines such as V8 and SpiderMonkey will improve over time and the overhead of this technique will shrink.
from vectorious.
What is the performance of Symbol.iterator?
I want to start out by saying this is the wrong question. Benchmarks don't matter if the API does not make it easy to write correct code.
There's a conceptual tension between an axis being:
- Geometric (e.g. i=0 is east, i=1 is north, i=2 is up. You might even have some covariant and some contravariant axes!)
- A grid-structured domain (images, GIS, cellular automata; i=0 is 0 meters, i=100 is 100 meters)
- A completely unstructured domain (particle simulation; i=0 is the 0th particle, i=1000 is the 1000th particle)
It's not clear which of these vectorious is trying to be good at, and what an iterator should expose depends heavily on this! It might even be some combination. For instance, a list of particles might each have some mass and some velocity.
Iteration is a good programming idiom and can in theory perform well, but in practice I don't know what the latest benchmarks show. I don't know how much function call overhead iteration adds; if it is substantial, I expect that the JavaScript compilers in engines such as V8 and SpiderMonkey will improve over time and the overhead of this technique will shrink.
Anyway, I spent way too much time microbenchmarking and here's what I found:
- The overhead of `Symbol.iterator` itself is negligible. A generator has more overhead, but probably nothing my code would notice.
On a 1000 element `Array` on my computer:
- Looping with `for() data[i]`, `for-of`, or `reduce` takes about 2 microseconds (must be some magic optimization with `reduce`!)
- Looping with `forEach` or an iterator object takes about 7 microseconds
- Looping with a generator (`function *`) takes about 40 microseconds.
- If instead of an `Array`, I loop over a `Float32Array` or `Float64Array`, `for-of` and `reduce` take about 12 microseconds, though `for() data[i]` still takes about 2 microseconds.
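A rough sketch of how such a comparison can be set up. This is not the code behind the numbers above; the function names and the `timeIt` helper are made up for illustration, and timings will vary widely by engine and machine:

```javascript
// Four ways to sum the same data.
function sumIndexed(data) {
  let s = 0;
  for (let i = 0; i < data.length; i++) s += data[i];
  return s;
}
function sumForOf(data) {
  let s = 0;
  for (const x of data) s += x;
  return s;
}
function sumReduce(data) {
  return data.reduce((acc, x) => acc + x, 0);
}
function* valuesOf(data) {
  for (let i = 0; i < data.length; i++) yield data[i];
}
function sumGenerator(data) {
  let s = 0;
  for (const x of valuesOf(data)) s += x;
  return s;
}

// Average time per call in microseconds (performance.now() returns ms).
function timeIt(fn, data, repeats = 1000) {
  fn(data); // one warm-up call before timing
  const start = performance.now();
  for (let i = 0; i < repeats; i++) fn(data);
  return ((performance.now() - start) * 1000) / repeats;
}

const sample = Float64Array.from({ length: 1000 }, (_, i) => i);
const timings = {
  indexed: timeIt(sumIndexed, sample),
  forOf: timeIt(sumForOf, sample),
  reduce: timeIt(sumReduce, sample),
  generator: timeIt(sumGenerator, sample),
};
console.log(timings);
```

Microbenchmarks like this are sensitive to JIT warm-up and dead-code elimination, so the results are best treated as rough indications.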
from vectorious.
To be fair, the iterator is written mainly for internal use. My plan was to write one version in JS and another with the same API in C/Rust/Wasm for performance-critical applications, similar to the nblas/nlapack bindings, but life (work) kind of got in the way.
It's an interesting idea to expose different iterators for different purposes.
Also, benchmarking with only 1000 elements might yield confusing results, try 1M for better comparisons.
from vectorious.
but life (work) kind of got in the way.
I kinda figured. It feels like NDArray was an attempt to generalize over vectors and matrices, but it doesn't seem like this library can do much with n>2!
It's an interesting idea to expose different iterators for different purposes.
Yeah. They also have very different memory patterns:
- linear algebra generally needs random access, since different axes mix when you apply a linear transformation
- raster data needs local access for computing things over some local neighborhood
- discrete particles need sequential access to evolve independently (and maybe to update some spatial index or local structure)
I'm sure there's a very rich theory of API design waiting to be discovered!
Also, benchmarking with only 1000 elements might yield confusing results, try 1M for better comparisons.
1M elements is not something I really use. If I had something that big and needed performance, I'd probably (1) use a plain array instead of a typed array or a wrapper class and (2) use the GPU or worker threads to squeeze out that performance. But this was just a curiosity-driven exploration on very synthetic code, and the results did not seem confusing: https://gist.github.com/rotu/799f0608bd3ede48b9f8f1280876fb05
Maybe I could see 1M elements if I had highly structured data (e.g. tensor networks, joint probability distributions, or particle-particle interaction simulations) but this library can't do some things that would be necessary there (in particular, tensor contractions!)
from vectorious.