Comments (10)
Numpy and Julia devs keep saying that pairwise summation is much faster than Kahan so I believe it, but this makes me think there's something funny going on. Maybe I need to break out d8 and see what's going on under the hood.
from ndarray-blas-level1.
Yeah, that'd be great! Was thinking of this. First things first, I'm going to add failing tests for generic
ndarrays. Then I think this will be a good addition. I think the time has come to overhaul this library and expand these simple functions with some robustness.
Considerations:
- I think I recall that numpy uses pairwise summation… Ah, there here is… I think… the source takes some effort to trace through, but it looks like they're unrolling into blocks and performing pairwise summation.
- Should the user get any choice in the matter? Maybe
require('ndarray-blas-level1/nrm2.kahan')
? Or maybe that's the default and naive is an option? - There's also the option to optimistically work simd into this. Ideally that's just a fancy extension that doesn't change the API.
At any rate, I think the solution is to add tests which unstable algorithms fail (see: numpy/numpy#2448 for some discussion) and a benchmark to quantify just how much of an impact this has.
from ndarray-blas-level1.
- Yeah, it seems numpy cares about speed, even though pairwise summation is almost as good as Kahan summation. For the "normal" summation, I think it would good to do pairwise summation.
- I do think the user needs the choice to opt for the more expensive Kahan summation. It's that whole "don't use a cannon to kill a mosquito" philosophy.
- I like the option to use SIMD, but I think that's something that could be its own little
ndarray-blas-level1/simd
folder. A separateindex.js
just to make it easier on people writing for non-SIMD-enabled environments.
My thoughts are directed to how we would integrate it into the library and I'm caught in a loop (see what I did there?). We could write a generic function kahanSum.js
and call it all the time, but I heard function calls are somewhat slow (citation needed). Baking it right into the specific function would be faster in some ways, but then it would be hard to use a Duff's device for more speed. I also don't like creating a separate array just to feed into a kahanSum()
function but then we'd have to save some sort of state, which means object creation and calling a whole bunch of reset()
and sum()
functions, which leads me back to the beginning.
from ndarray-blas-level1.
I was trying to do some benchmark tests with pairwise summation vs Kahan, but I got that Kahan is always faster. I'm sure something's wrong in my implementation, but I can't see it. Maybe you can check out @rreusser:
https://gist.github.com/tab58/d230a60a3da2f2ccff428579d28a42b9
from ndarray-blas-level1.
Awesome. Will look ASAP.
from ndarray-blas-level1.
Hmm… yeah, that's surprising. Changing Math.floor(steps / 2)
to steps >> 1
increased the ops/sec from about 45 to 60 for me, so I wonder if there are other spurious floating point ops or additions/subtractions that could be pulled out to speed things up. Other than that, I thought function calls were supposed to be blazing fast, but maybe there's enough overhead to offset the fewer floating point ops.
I wonder if it could be written in such a way to transform the recursion into a stack…
from ndarray-blas-level1.
It's also interesting/surprising that switching kahan from get/set to pointers has virtually no effect. I've seen a fairly dramatic speedup in other contexts (> 1D?).
from ndarray-blas-level1.
Also: I'm sure there are a bunch of node modules that'll do it, but bc
, the arbitrary precision calculator can be an easy way to check extended-precision arithmetic.
from ndarray-blas-level1.
The more I look at it, the more simple it looks and the more unexpected it is that it'd be so much slower…
from ndarray-blas-level1.
@tab58 — see: https://github.com/rreusser/summation-algorithms
Still some more exploration to do there, but some interesting results at least 😄
Edit: Order of operations is slightly wrong. Retooled. Pairwise splits the tree in two. My non-recursive version just groups terms pairwise and adds up the sums. That means the result is ever so slightly different, but it's not clear to me that the accuracy is really any different.
from ndarray-blas-level1.
Related Issues (13)
- Add complex support. HOT 1
- Use cwise as a transform HOT 2
- Implement iamax (argmax) HOT 2
- Prevent overflow in nrm2 HOT 1
- Check for simplifications HOT 1
- Replace cwise with explicit loops
- Oops. Images ended up in wrong directory.
- Split into files HOT 4
- Style Guide HOT 5
- Node v4.0 and Beyond HOT 7
- Polyfills HOT 8
- Make work for regular and generic ndarrays
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from ndarray-blas-level1.