
Comments (6)

mateogianolio commented on June 2, 2024

NDArray is not iterable (so for (x of my_ndarray) and Array.from(myNdArray) throw errors)

Yes, maybe we should expose something like this:

[Symbol.iterator]() {
  return new NDIter(this)
}
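
As a rough illustration of what that would enable, here is a self-contained sketch. TinyND is a hypothetical stand-in for NDArray (not the library's class), and it delegates to the typed array's own iterator instead of NDIter, so it only shows the effect of defining Symbol.iterator:

```javascript
// Hypothetical stand-in for NDArray: a flat typed array plus a shape.
class TinyND {
  constructor(data, shape) {
    this.data = data;
    this.shape = shape;
  }

  // Defining Symbol.iterator is what makes for...of and Array.from work.
  // This sketch walks the underlying values in storage order.
  [Symbol.iterator]() {
    return this.data[Symbol.iterator]();
  }
}

const tiny = new TinyND(new Float64Array([1, 2, 3, 4]), [2, 2]);
const tinyValues = Array.from(tiny); // [1, 2, 3, 4]

let tinySum = 0;
for (const x of tiny) tinySum += x; // no longer throws
```

Whether the default iterator should yield values, offsets, or coordinates is a separate design question.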

NDIter's implementation of the iterable protocol seems to iterate over only the linearized indices (i.e. Array.from(new NDIter(my_ndarray)) gives [0,1,...], even if the ndarray is multidimensional)

Yes, this is by design since most of the functions in core/ use the linearised indices. The coordinates are currently accessible using iter.coords. This might not be intuitive. I am open to discussing changing the API if it can result in improved DX.

NDIter has properties that are length 32 (V_MAXDIMS) even when the array it's iterating over has fewer dimensions.

Yeah, I actually meant to change that a while back, but if I remember correctly it caused issues with NDMultiIter. Might give it another shot soon.

NDIter has extraneous properties which obscure how to use it.

Yeah, the documentation can be improved here. It is heavily inspired by numpy (see https://numpy.org/doc/stable/dev/internals.code-explanations.html#n-d-iterators).

NDIter's string representation uses the name f (I think because it is derived from minified code) and is excessively verbose.

Hmm, this is weird. Should be fixed.

from vectorious.

rotu commented on June 2, 2024

NDIter's implementation of the iterable protocol seems to iterate over only the linearized indices (i.e. Array.from(new NDIter(my_ndarray)) gives [0,1,...], even if the ndarray is multidimensional)

Yes, this is by design since most of the functions in core/ use the linearised indices. The coordinates are currently accessible using iter.coords. This might not be intuitive. I am open to discussing changing the API if it can result in improved DX.

The linearized indices are not very useful. If that's what you really want, you can typically just iterate over a.data.keys(). I figure if your array has a shape, you probably intend to use it, not just treat the data as one-dimensional.
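
To make the contrast concrete, here is a minimal sketch of recovering coordinates from a linear offset. It assumes a contiguous, row-major layout with no custom strides or offset; unravel is a hypothetical helper, not part of vectorious:

```javascript
// Convert a linear offset into coordinates for a contiguous, row-major array.
// Assumes no custom strides or offsets; `unravel` is a hypothetical helper.
function unravel(offset, shape) {
  const coords = new Array(shape.length);
  for (let axis = shape.length - 1; axis >= 0; axis--) {
    coords[axis] = offset % shape[axis];
    offset = Math.floor(offset / shape[axis]);
  }
  return coords;
}

const gridShape = [2, 3];
const gridData = new Float64Array([10, 11, 12, 20, 21, 22]);

// data.keys() yields only 0..5; the shape is needed to get back to [row, col].
const pairs = [];
for (const offset of gridData.keys()) {
  pairs.push([unravel(offset, gridShape), gridData[offset]]);
}
// pairs[4] is [[1, 1], 21]
```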

Yeah, the documentation can be improved here. It is heavily inspired by numpy (see https://numpy.org/doc/stable/dev/internals.code-explanations.html#n-d-iterators).

In Python, nditer is awkward to use and it doesn't translate well at all to JavaScript.

I'm thinking something along the lines of:

// if you need something fancy, yield an object handle to each element.
// note I wrote this so that the handles remain valid after the iterator advances.
// It might make sense to keep one handle and update it destructively.
NDArray.prototype.iter = function* () {
  const ndim = this.shape.length
  const data = this.data
  const it = new NDIter(this)
  while (!it.done()) {
    const offset = it.pos
    yield {
      offset,
      byteOffset: offset * data.BYTES_PER_ELEMENT,
      index: it.index,
      coords: it.coords.slice(0, ndim),
      get: () => data[offset],
      set: (value) => { data[offset] = value },
    }
    it.next()
  }
}

// iterate over coordinates and values. I think this should be the default iterator:
NDArray.prototype[Symbol.iterator] = NDArray.prototype.entries = function* () {
  for (const v of this.iter()) {
    yield [v.coords, v.get()]
  }
}

I could also see an argument that iteration should be over a single axis, not all axes (e.g. should yield matrix rows).
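
A minimal sketch of that single-axis variant, assuming a contiguous, row-major 2-D layout. rows is a hypothetical helper that works on a plain typed array and shape, not on NDArray itself:

```javascript
// Yield each row of a contiguous, row-major 2-D array as a typed-array view.
// `rows` is a hypothetical helper; subarray() shares memory with `data`.
function* rows(data, shape) {
  const [nrows, ncols] = shape;
  for (let r = 0; r < nrows; r++) {
    yield data.subarray(r * ncols, (r + 1) * ncols);
  }
}

const matrixData = new Float64Array([1, 2, 3, 4, 5, 6]);
const matrixRows = [];
for (const row of rows(matrixData, [2, 3])) {
  matrixRows.push(Array.from(row));
}
// matrixRows is [[1, 2, 3], [4, 5, 6]]
```

Because each row is a view rather than a copy, writing to a yielded row also writes to the original data.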


metabench commented on June 2, 2024

What is the performance of Symbol.iterator?

Iteration is a good programming idiom and can in theory perform well, but in practice I don't know how it fares in the latest benchmarks. I don't know how much function call overhead iteration adds. If it is substantial, I expect that the JavaScript compilers in engines such as V8 and SpiderMonkey will improve over time and the overhead of this technique will shrink.


rotu commented on June 2, 2024

What is the performance of Symbol.iterator?

I want to start out by saying this is the wrong question. Benchmarks don't matter if the API does not make it easy to write correct code.

There's a conceptual tension between an axis being:

  1. Geometric (e.g. i=0 is east, i=1 is north, i=2 is up. You might even have some covariant and some contravariant axes!)
  2. A grid-structured domain (images, GIS, cellular automata; i=0 is 0 meters, i=100 is 100 meters)
  3. A completely unstructured domain (particle simulation; i=0 is the 0th particle, i=1000 is the 1000th particle)

It's not clear which of these vectorious is trying to be good at, and what an iterator should expose depends heavily on this! It might even be some combination. For instance, a list of particles might each have some mass and some velocity.

Iteration is a good programming idiom and can in theory perform well, but in practice I don't know how it fares in the latest benchmarks. I don't know how much function call overhead iteration adds. If it is substantial, I expect that the JavaScript compilers in engines such as V8 and SpiderMonkey will improve over time and the overhead of this technique will shrink.

Anyway, I spent way too much time microbenchmarking and here's what I found:

  • The cost of Symbol.iterator itself is negligible. A generator adds more overhead, but probably nothing my code would notice.

On a 1000-element Array on my computer:

  • Looping with for() data[i], for-of, or reduce is about 2 microseconds (must be some magic optimization with reduce!)

  • Looping with forEach or an iterator object is about 7 microseconds

  • Looping with a generator (function *) is about 40 microseconds.

  • If instead of an Array, I loop over a Float32Array or Float64Array, for-of and reduce take about 12 microseconds, though for() data[i] still takes about 2 microseconds.
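
For anyone who wants to try this kind of comparison, here is a rough harness sketch. It is not the code behind the numbers above; the names strategies and bench are made up for illustration, and the timings it prints will vary by machine and engine:

```javascript
// Hypothetical micro-benchmark harness; absolute numbers vary by machine and engine.
function* gen(data) {
  for (let i = 0; i < data.length; i++) yield data[i];
}

// Each strategy sums the input using a different looping style.
const strategies = {
  indexed: (data) => { let s = 0; for (let i = 0; i < data.length; i++) s += data[i]; return s; },
  forOf: (data) => { let s = 0; for (const x of data) s += x; return s; },
  reduce: (data) => data.reduce((acc, x) => acc + x, 0),
  forEach: (data) => { let s = 0; data.forEach((x) => { s += x; }); return s; },
  generator: (data) => { let s = 0; for (const x of gen(data)) s += x; return s; },
};

function bench(fn, data, reps = 1000) {
  fn(data); // one warm-up call before timing
  const start = performance.now();
  let result = 0;
  for (let r = 0; r < reps; r++) result = fn(data);
  const microsPerCall = ((performance.now() - start) * 1000) / reps;
  return { result, microsPerCall };
}

const input = Array.from({ length: 1000 }, (_, i) => i);
const report = {};
for (const [name, fn] of Object.entries(strategies)) {
  report[name] = bench(fn, input);
}
console.log(report);
```

Swapping input for a Float32Array or Float64Array exercises the typed-array case, since all five strategies use methods that typed arrays also provide.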


mateogianolio commented on June 2, 2024

To be fair, the iterator is written mainly for internal use. My plan was to write one version in JS and another with the same API in C/Rust/wasm for performance-critical applications, similar to the nblas/nlapack bindings, but life (work) kind of got in the way.

It's an interesting idea to expose different iterators for different purposes.

Also, benchmarking with only 1000 elements might yield confusing results; try 1M for better comparisons.


rotu commented on June 2, 2024

but life (work) kind of got in the way.

I kinda figured. It feels like NDArray was an attempt to generalize over vectors and matrices, but it doesn't seem like this library can do much with n>2!

It's an interesting idea to expose different iterators for different purposes.

Yeah. They also have very different memory patterns:

  • linear algebra generally needs random access, since different axes mix when you apply a linear transformation
  • raster data needs local access for computing things over some local neighborhood
  • discrete particles need sequential access to evolve independently (and maybe to update some spatial index or local structure)

I'm sure there's a very rich theory of API design waiting to be discovered!

Also, benchmarking with only 1000 elements might yield confusing results, try 1M for better comparisons.

1M elements is not something I really use. If I had something that big and needed performance, I'd probably (1) use a plain array instead of a typed array or a wrapper class and (2) use the GPU or worker threads to squeeze out that performance. But this was just a curiosity-driven exploration on very synthetic code, and the results did not seem confusing: https://gist.github.com/rotu/799f0608bd3ede48b9f8f1280876fb05

Maybe I could see 1M elements if I had highly structured data (e.g. tensor networks, joint probability distributions, or particle-particle interaction simulations) but this library can't do some things that would be necessary there (in particular, tensor contractions!)

