I'm trying to loop over an NDArray coordinates, and it seems like iteration needs an o

Iteration is not usable about vectorious HOT 6 OPEN

rotu commented on June 2, 2024

Iteration is not usable

from vectorious.

Comments (6)

mateogianolio commented on June 2, 2024

NDArray is not iterable (so for (x of my_ndarray) and Array.from(myNdArray) throw errors)

Yes, maybe we should expose something like this:

[Symbol.iterator]() {
  return new NDIter(this)
}
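For readers unfamiliar with the iterable protocol, here is a minimal sketch of what defining [Symbol.iterator] buys you. TinyArray is a hypothetical stand-in, not the vectorious NDArray, and it yields stored values in memory order rather than returning an NDIter:

```javascript
// TinyArray is a hypothetical stand-in, not the vectorious NDArray.
class TinyArray {
  constructor(data, shape) {
    this.data = data;   // flat typed array
    this.shape = shape; // e.g. [2, 2]
  }

  // Delegate to the typed array's own iterator, so iteration
  // yields the stored values in memory order.
  [Symbol.iterator]() {
    return this.data[Symbol.iterator]();
  }
}

const tiny = new TinyArray(new Float64Array([1, 2, 3, 4]), [2, 2]);

// With [Symbol.iterator] defined, for-of, spread and Array.from all work.
const viaArrayFrom = Array.from(tiny);
const viaSpread = [...tiny];
let total = 0;
for (const value of tiny) total += value;
```

Whether the default iterator should yield values, indices, or coordinates is exactly the design question discussed in this thread; the sketch only picks one option.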

NDIter's implementation of the iterable protocol seems to iterate over only the linearized indices (i.e. Array.from(new NDIter(my_ndarray)) gives [0,1,...], even if the ndarray is multidimensional)

Yes, this is by design, since most of the functions in core/ use the linearised indices. The coordinates are currently accessible using iter.coords. This might not be intuitive. I am open to discussing changing the API if it can result in improved DX.
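To make the relationship between a linearised index and coordinates concrete, here is a rough sketch. The unravel helper is hypothetical (not part of vectorious) and assumes a contiguous, row-major array:

```javascript
// Hypothetical helper, not part of vectorious: convert a linear index into
// coordinates, assuming a contiguous row-major array of the given shape.
function unravel(index, shape) {
  const coords = new Array(shape.length);
  let rest = index;
  // Peel off the fastest-varying (last) axis first.
  for (let axis = shape.length - 1; axis >= 0; axis--) {
    coords[axis] = rest % shape[axis];
    rest = Math.floor(rest / shape[axis]);
  }
  return coords;
}

const shape = [2, 3];
const allCoords = [];
for (let i = 0; i < shape[0] * shape[1]; i++) {
  allCoords.push(unravel(i, shape));
}
// allCoords is [0,0], [0,1], [0,2], [1,0], [1,1], [1,2]
```

For strided or non-contiguous views this simple arithmetic would not hold, which is presumably why an iterator tracks coordinates separately.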

NDIter has properties that are length 32 (V_MAXDIMS) even when the array it's iterating over has fewer dimensions.

Yeah, I actually meant to change this a while back, but if I remember correctly it caused issues with NDMultiIter. I might give it another shot soon.

NDIter has extraneous properties which obscure how to use it.

Yeah, the documentation can be improved here. It is heavily inspired by numpy (see https://numpy.org/doc/stable/dev/internals.code-explanations.html#n-d-iterators).

NDIter's string representation uses the name f (I think because it is derived from minified code) and is excessively verbose.

Hmm, this is weird. Should be fixed.

rotu commented on June 2, 2024

NDIter's implementation of the iterable protocol seems to iterate over only the linearized indices (i.e. Array.from(new NDIter(my_ndarray)) gives [0,1,...], even if the ndarray is multidimensional)

Yes, this is by design, since most of the functions in core/ use the linearised indices. The coordinates are currently accessible using iter.coords. This might not be intuitive. I am open to discussing changing the API if it can result in improved DX.

The linearized indices are not very useful. If that's what you really want, you can typically just iterate over a.data.keys(). I figure if your array has a shape, you probably intend to use it, not just treat the data as one-dimensional.
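As a small illustration of that point, assuming a.data is a plain typed array (the name flat below is just a stand-in for it):

```javascript
// `flat` stands in for a.data; assumed to be a plain typed array.
const flat = new Float64Array([10, 20, 30, 40, 50, 60]);

// keys() is the standard typed-array method that yields 0..length-1.
const linearIndices = Array.from(flat.keys());

// entries() yields [index, value] pairs, still with no notion of shape.
const pairs = Array.from(flat.entries());
```

Neither call knows anything about the array's shape, which is the limitation being described.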

Yeah, the documentation can be improved here. It is heavily inspired by numpy (see https://numpy.org/doc/stable/dev/internals.code-explanations.html#n-d-iterators).

In Python, nditer is awkward to use and it doesn't translate well at all to JavaScript.

I'm thinking something along the lines of:

// If you need something fancy, yield an object handle to each element.
// Note: I wrote this so that the handles remain valid after the iterator advances.
// It might make sense to keep one handle and update it destructively.
NDArray.prototype.iter = function* () {
  const ndim = this.shape.length
  const data = this.data
  const it = new NDIter(this)
  while (!it.done()) {
    const offset = it.pos
    yield {
      offset,
      byteOffset: offset * data.BYTES_PER_ELEMENT,
      index: it.index,
      // NDIter's coords has V_MAXDIMS entries; keep only the ones in use.
      coords: it.coords.slice(0, ndim),
      get: () => data[offset],
      set: (value) => { data[offset] = value },
    }
    it.next()
  }
}

// Iterate over coordinates and values. I think this should be the default iterator:
NDArray.prototype[Symbol.iterator] = NDArray.prototype.entries = function* () {
  for (const v of this.iter()) {
    yield [v.coords, v.get()]
  }
}

I could also see an argument that iteration should be over a single axis, not all axes (e.g. should yield matrix rows).
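A rough sketch of the single-axis alternative, for the 2-D case only. The rows generator is hypothetical and works on a flat, contiguous, row-major typed array plus a shape rather than on a real NDArray:

```javascript
// Hypothetical sketch: iterate over the first axis of a 2-D array stored as
// flat row-major data. Not the vectorious API.
function* rows(data, shape) {
  const [nRows, nCols] = shape;
  for (let r = 0; r < nRows; r++) {
    // subarray returns a view on the same buffer, not a copy,
    // so writes through a row are visible in `data`.
    yield data.subarray(r * nCols, (r + 1) * nCols);
  }
}

const matrixData = new Float64Array([1, 2, 3, 4, 5, 6]);
const asNested = Array.from(rows(matrixData, [2, 3]), (row) => Array.from(row));
// asNested is [[1, 2, 3], [4, 5, 6]]
```

Yielding views keeps iteration cheap, at the cost that callers can mutate the underlying data through a row.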

metabench commented on June 2, 2024

What is the performance of Symbol.iterator?

Iteration is a good programming idiom and can theoretically perform well, but in practice I don't know how it fares in the latest benchmarks. I don't know how much function call overhead there is when using iteration. If it is substantial, I expect that over time the JavaScript compilers within engines such as V8 and SpiderMonkey will improve and there will be much less overhead for using this technique.

rotu commented on June 2, 2024

What is the performance of Symbol.iterator?

I want to start out by saying this is the wrong question. Benchmarks don't matter if the API does not make it easy to write correct code.

There's a conceptual tension between an axis being:

Geometric (e.g. i=0 is east, i=1 is north, i=2 is up. You might even have some covariant and some contravariant axes!)
A grid-structured domain (images, GIS, cellular automata; i=0 is 0 meters, i=100 is 100 meters)
A completely unstructured domain (particle simulation; i=0 is the 0th particle, i=1000 is the 1000th particle)

It's not clear which of these vectorious is trying to be good at, and what an iterator should expose depends heavily on this! It might even be some combination. For instance, a list of particles might each have some mass and some velocity.

Iteration is a good programming idiom and can theoretically perform well, but in practice I don't know how it fares in the latest benchmarks. I don't know how much function call overhead there is when using iteration. If it is substantial, I expect that over time the JavaScript compilers within engines such as V8 and SpiderMonkey will improve and there will be much less overhead for using this technique.

Anyway, I spent way too much time microbenchmarking and here's what I found:

The overhead of Symbol.iterator itself is negligible. A generator adds more overhead, but probably nothing my code would notice.

On a 1000-element Array on my computer:

Looping with for() data[i], for-of, or reduce is about 2 microseconds (must be some magic optimization with reduce!)
Looping with forEach or an iterator object is about 7 microseconds
Looping with a generator (function *) is about 40 microseconds.
If instead of an Array, I loop over a Float32Array or Float64Array, for-of and reduce take about 12 microseconds, though for() data[i] still takes about 2 microseconds.
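For anyone who wants to reproduce this kind of comparison, here is a rough sketch of such a microbenchmark. It is not the code behind the numbers above; the strategies and timeMicros names are made up for illustration, and results will vary by engine and machine:

```javascript
// Rough microbenchmark sketch; all names here are hypothetical.
function* values(data) {
  for (let i = 0; i < data.length; i++) yield data[i];
}

const strategies = {
  indexed(data) {
    let sum = 0;
    for (let i = 0; i < data.length; i++) sum += data[i];
    return sum;
  },
  forOf(data) {
    let sum = 0;
    for (const x of data) sum += x;
    return sum;
  },
  reduce(data) {
    return data.reduce((acc, x) => acc + x, 0);
  },
  generator(data) {
    let sum = 0;
    for (const x of values(data)) sum += x;
    return sum;
  },
};

// Average time per pass in microseconds, after one warm-up call.
function timeMicros(fn, data, reps = 1000) {
  fn(data);
  let sink = 0; // accumulate results so the work is not trivially dead code
  const start = performance.now();
  for (let i = 0; i < reps; i++) sink += fn(data);
  const elapsedMs = performance.now() - start;
  return { micros: (elapsedMs * 1000) / reps, sink };
}

const sample = Array.from({ length: 1000 }, (_, i) => i);
for (const [name, fn] of Object.entries(strategies)) {
  const { micros } = timeMicros(fn, sample);
  console.log(`${name}: ${micros.toFixed(2)} us per pass`);
}
```

Swapping sample for Float64Array.from(sample) gives the typed-array variant. Microbenchmarks like this are sensitive to JIT warm-up and inlining, so treat any numbers as indicative only.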

mateogianolio commented on June 2, 2024

To be fair, the iterator is written mainly for internal use. My plan was to write one version in JS and another with the same API in C/Rust/WASM for performance-critical applications, similar to the nblas/nlapack bindings, but life (work) kind of got in the way.

It's an interesting idea to expose different iterators for different purposes.

Also, benchmarking with only 1000 elements might yield confusing results; try 1M for better comparisons.

rotu commented on June 2, 2024

but life (work) kind of got in the way.

I kinda figured. It feels like NDArray was an attempt to generalize over vectors and matrices, but it doesn't seem like this library can do much with n>2!

It's an interesting idea to expose different iterators for different purposes.

Yeah. They also have very different memory patterns:

linear algebra generally needs random access, since different axes mix when you apply a linear transformation
raster data needs local access for computing things over some local neighborhood
discrete particles need sequential access to evolve independently (and maybe to update some spatial index or local structure)

I'm sure there's a very rich theory of API design waiting to be discovered!

Also, benchmarking with only 1000 elements might yield confusing results; try 1M for better comparisons.

1M elements is not something I really use. If I had something that big and needed performance, I'd probably (1) use a plain array instead of a typed array or a wrapper class and (2) use the GPU or worker threads to squeeze out that performance. But this was just a curiosity-driven exploration on very synthetic code, and the results did not seem confusing: https://gist.github.com/rotu/799f0608bd3ede48b9f8f1280876fb05

Maybe I could see 1M elements if I had highly structured data (e.g. tensor networks, joint probability distributions, or particle-particle interaction simulations) but this library can't do some things that would be necessary there (in particular, tensor contractions!)
