Comments (6)
A relevant data point is how FFTW does this. They have the caller create a "plan" for FFTs based on the domain size, which then AFAIU does precompilation and preprocessing for the exact domains and FFT plan they end up using. http://www.fftw.org/fftw-paper.pdf
from algebra.
That sounds like a good idea to me.
- Even after fixing the above, there's still the issue that we only select the larger radix after exhausting all powers in radix-2. As suggested in scipr-lab/zexe#156 (comment), this can result in poor "resizing behaviour", which is demonstrated via the following example:
Required size: 10; GeneralEvaluationDomain: domain_size = 2^4 = 16 DynamicEvaluationDomain: domain_size = 2 * 5 = 10
One question regarding this part: I would imagine that a pure radix-2 FFT is more efficient than a mixed-radix one.
So, when the achievable domain sizes are close, a slightly larger radix-2 FFT might be preferable to a mixed-radix one, no?
I haven't had a look at the FFTW paper yet, but perhaps they do have some insights into the trade-offs here.
from algebra.
My five cents on the smoothness of the FFT domain and its approximation goodness for target sizes x
from a given interval I= [x_min, x_max]
.
Given an FFT domain of mixed order n = p_1^e_1 * ... * p_k^e_k
, with p_i
being the prime factors and e_i
their exponents, and let d(x,L) = min_{y in L} y/x
the minimal 'distance' of x
by numbers from the discrete set L
of all possible subdomain sizes y = p_ 1^{y_ 1}*...*p_k^{y_k}
with y_i
from {0,1,...,e_i}.
In our case the number of such combinations is small enough to find the best approximation via enumeration, and even plots of d(x,L)
varying over x are easily produced to get an impression for the uniform approximation goodness
delta = max_{x in I} d(x,L).
In general given I
one has to oversize the FFT domain to achieve a reasonable small approximation goodness (corresponding to a relative error < 5%, say) and the more prime factors of n
the better. However, having too large p_i
increases the computational complexity of the mixed FFT, which is
T(n) = n * Sum_i e_i (pi - 1) (M+A),
where M
and A
is for field multiplication and addition, respectively. In practice the increase is acceptable (typically a factor of 2 relative to the duadic case, see the table below) and is outweighted by the benefit of small delta
.
In connection with the task of searching suitable BLS24 parameter (with group sizes bits
of more than 256) I executed a script over a slightly larger set than the one of Pratyush (starting from parameter x = 2^32
). I filtered out those cases which allow sufficiently smooth FFT domains in both the base field as well the scalar field (having both log2(n_r)
and log(n_q) >= 30
bit and prime factors <= 17
). I further computed the goodnesses delta_r
and delta_q
for the interval I=[2^10, 2^25]
using an explicit algorithm to solve the discrete max-min problem, and determined the performances T_r
, T_q
of their FFT relative to the equivalent duadic case T(duadic)= T(2^log n)
using the above formula. These are the results:
x | bits | log2(n_r) |
log2(n_q) |
delta_r |
delta_q |
T(n_r)/T(duadic) |
T(n_q)/T(duadic) |
---|---|---|---|---|---|---|---|
-4346495999 | 257 | 41.8051 | 32.0172 | 1.00689 | 1.02262 | 2.00932 | 1.68659 |
-4505419775 | 257 | 43.3203 | 31.2246 | 1.02218 | 1.02602 | 1.66204 | 1.98561 |
-4518481967 | 257 | 30.9689 | 31.2908 | 1.02354 | 1.02963 | 1.74369 | 1.78966 |
4796508718 | 258 | 32.5132 | 31.1912 | 1.01425 | 1.02354 | 1.84541 | 1.82744 |
-4975851464 | 258 | 36.664 | 30.6266 | 1.0104 | 1.05488 | 1.85468 | 2.12234 |
-5032487474 | 258 | 48.4004 | 34.2411 | 1.01089 | 1.02001 | 2.0661 | 1.51864 |
-5131697570 | 259 | 39.3547 | 30.067 | 1.00782 | 1.04167 | 2.28689 | 2.46117 |
-5155487999 | 259 | 34.2635 | 32.2635 | 1.0111 | 1.02262 | 1.80951 | 1.85969 |
5372915626 | 259 | 41.4331 | 32.8062 | 1.01169 | 1.0242 | 2.26872 | 2.31663 |
-5482229024 | 259 | 49.8397 | 31.5071 | 1.01705 | 1.03822 | 2.00643 | 2.25346 |
-5712214364 | 260 | 32.277 | 32.1718 | 1.02262 | 1.02392 | 1.85891 | 2.33123 |
from algebra.
Thanks for the detailed analysis! It's a good point that we can easily compute the best fit for a given problem size. If I'm understanding correctly, the take away is that FFTs are at most 2x slower, right?
As an aside, there is a slight downside to consider fields of size > 255 bits: you have to add an another limb to all calculations on the scalar field, which includes constraint generation, witness generation, FFTs, etc. I don't know of a way to simplify this, beyond making a different BigInteger type (for example: BigInt264([u64; 4], u8)
, and even then, I don't know if it would make a concrete difference due to alignment issues). (I'm assuming that these curves still have base field < 320 bits, because otherwise one would lose the benefit of using BLS24 curves)
from algebra.
Yes, if one considers such small approximation goodness (2% rel. error) for that instance interval (with upper bound 2^25) one has to expect about a twice as slow FFT.
Actually it turned out that my script was starting at a too high x, searching only curves of min 257 bit. I now restarted it with modified range for x, and post my findings in few days.
About the sizes of the base field modulus, the first three rows in the table have 320 bit modulus, the others 321-323 bits.
from algebra.
Yes, if one considers such small approximation goodness (2% rel. error) for that instance interval (with upper bound 2^25) one has to expect about a twice as slow FFT.
Actually it turned out that my script was starting at a too high x, searching only curves of min 257 bit. I now restarted it with modified range for x, and post my findings in few days.About the sizes of the base field modulus, the first three rows in the table have 320 bit modulus, the others 321-323 bits.
first results from the running script:
In particular the last with smoothness bound 11 for both domains is interesting. It's approximation goodness comes with an rel. error < 1.7%, and the increase of FFT performance is only about 1.5.
from algebra.
Related Issues (20)
- Can arm64 also be optimized for field arithmetic operation? HOT 1
- Ark-ec pulls in full hashbrown without feature gate HOT 6
- New release? HOT 4
- Final exponentiation in BLS12-381not producing `(p ^ k - 1) / r` HOT 1
- Support mapping to all Weierstrass curves (Shallue-van de Woestijne method) HOT 1
- why is `Validate::No` hardcoded in some `impl`s? HOT 3
- know if a serialized piece of Arkworks structure has been compressed HOT 3
- Support for ristretto255 / sr25519 curve HOT 3
- `Affine - Projective` produces incorrect results
- `SparseMultillinearExtension::evaluate` is slow HOT 1
- Constant time curve arithmetic HOT 1
- Release with #794 HOT 1
- Necessity to retire "derivative" crate HOT 1
- Deserialization of array can panic HOT 3
- Decoding `BigUInt` causes unbound allocation HOT 3
- Easily set the size of a polynomial coefficient
- Add Mul for DenseMultilinearExtension * Scalar
- GF(2^128) operations HOT 1
- `HashToField` does not follow RFC9380 depending on the curve modulus
- Off one in the doc?
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from algebra.