
Comments (6)

ValarDragon avatar ValarDragon commented on September 19, 2024

A relevant data point is how FFTW does this. They have the caller create a "plan" for FFTs based on the domain size, which then AFAIU does precompilation and preprocessing for the exact domains and FFT plan they end up using. http://www.fftw.org/fftw-paper.pdf
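The "plan" idea can be sketched in Rust. This is a hypothetical illustration (not FFTW's actual API, and not arkworks code): a plan object fixes the domain size up front and caches per-size precomputation, here the bit-reversal permutation used by an iterative radix-2 FFT, so repeated transforms over the same domain skip the setup work. A field-arithmetic version would also cache the roots of unity.

```rust
/// Hypothetical FFTW-style "plan": fix the domain size once and cache
/// the precomputation tied to that size, then reuse it across calls.
struct FftPlan {
    size: usize,
    /// Bit-reversal permutation precomputed for `size` (a power of two).
    bit_rev: Vec<usize>,
}

impl FftPlan {
    fn new(size: usize) -> Self {
        // Keep size >= 2 so the shift below stays in range.
        assert!(size >= 2 && size.is_power_of_two());
        let bits = size.trailing_zeros();
        let bit_rev = (0..size)
            .map(|i: usize| i.reverse_bits() >> (usize::BITS - bits))
            .collect();
        FftPlan { size, bit_rev }
    }

    /// Reorder the input with the cached permutation (the first stage
    /// of an iterative radix-2 FFT). No recomputation per call.
    fn permute<T: Copy>(&self, data: &mut [T]) {
        assert_eq!(data.len(), self.size);
        for i in 0..self.size {
            let j = self.bit_rev[i];
            if i < j {
                data.swap(i, j);
            }
        }
    }
}
```

The point of the pattern is amortization: the plan is built once per domain size, and every subsequent FFT over that domain reuses the cached tables.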

from algebra.

paberr avatar paberr commented on September 19, 2024

That sounds like a good idea to me.

  • Even after fixing the above, there's still the issue that we only select the larger radix after exhausting all powers in radix-2. As suggested in scipr-lab/zexe#156 (comment), this can result in poor "resizing behaviour", which is demonstrated via the following example:
Required size: 10;
GeneralEvaluationDomain: domain_size = 2^4 = 16
DynamicEvaluationDomain: domain_size = 2 * 5 = 10
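The two selection strategies in the example above can be sketched as follows. Both function names are illustrative, not the actual arkworks API: the radix-2 strategy rounds up to the next power of two, while a mixed-radix strategy may pick any product of the allowed prime powers (here 2^a * 5^b, mirroring the 2 * 5 = 10 example).

```rust
/// Radix-2 selection: smallest power of two >= n.
fn radix2_domain_size(n: usize) -> usize {
    n.next_power_of_two()
}

/// Mixed-radix selection (illustrative): smallest y >= n of the form
/// 2^a * 5^b with a <= max_a and b <= max_b, found by enumeration.
fn mixed_domain_size(n: usize, max_a: u32, max_b: u32) -> Option<usize> {
    let mut best = None;
    for a in 0..=max_a {
        for b in 0..=max_b {
            let y = 2usize.pow(a) * 5usize.pow(b);
            if y >= n && best.map_or(true, |m| y < m) {
                best = Some(y);
            }
        }
    }
    best
}
```

For the required size 10 this reproduces the example: the radix-2 domain is 16, while the mixed-radix domain is exactly 10.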

One question regarding this part: I would imagine that a pure radix-2 FFT is more efficient than a mixed-radix one.
So, when the achievable domain sizes are close, a slightly larger radix-2 FFT might be preferable to a mixed-radix one, no?
I haven't had a look at the FFTW paper yet, but perhaps they do have some insights into the trade-offs here.


UlrichHaboeck75 avatar UlrichHaboeck75 commented on September 19, 2024

My five cents on the smoothness of the FFT domain and its approximation goodness for target sizes x from a given interval I = [x_min, x_max].

Given an FFT domain of mixed order n = p_1^e_1 * ... * p_k^e_k, with p_i the prime factors and e_i their exponents, let d(x,L) = min_{y in L} y/x be the minimal 'distance' of x to the numbers from the discrete set L of all possible subdomain sizes y = p_1^{y_1} * ... * p_k^{y_k} with y_i from {0, 1, ..., e_i}. In our case the number of such combinations is small enough to find the best approximation via enumeration, and even plots of d(x,L) varying over x are easily produced to get an impression of the uniform approximation goodness

delta = max_{x in I} d(x,L).
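The enumeration described above is cheap to write down. The sketch below (illustrative names, not code from the thread) lists all subdomain sizes y = p_1^{y_1} * ... * p_k^{y_k} from a (prime, exponent) factorization and computes d(x, L), reading the minimum over y >= x, which is consistent with the delta values in the table below all exceeding 1.

```rust
/// All subdomain sizes y = p_1^{y_1} * ... * p_k^{y_k} with 0 <= y_i <= e_i,
/// given the factorization of n as (prime, exponent) pairs.
fn subdomain_sizes(factors: &[(u64, u32)]) -> Vec<u64> {
    let mut sizes = vec![1u64];
    for &(p, e) in factors {
        let mut next = Vec::new();
        for &s in &sizes {
            let mut pe = 1u64;
            for _ in 0..=e {
                next.push(s * pe);
                pe *= p;
            }
        }
        sizes = next;
    }
    sizes.sort();
    sizes
}

/// d(x, L): the best relative oversizing achievable for target size x,
/// i.e. the minimum of y/x over subdomain sizes y >= x.
fn distance(x: u64, sizes: &[u64]) -> f64 {
    sizes
        .iter()
        .filter(|&&y| y >= x)
        .map(|&y| y as f64 / x as f64)
        .fold(f64::INFINITY, f64::min)
}
```

Taking the maximum of `distance` over a grid of x in I then yields delta, and plotting `distance` over x gives the picture of the uniform approximation goodness mentioned above.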

In general, given I, one has to oversize the FFT domain to achieve a reasonably small approximation goodness (corresponding to a relative error < 5%, say), and the more prime factors n has, the better. However, having too large p_i increases the computational complexity of the mixed-radix FFT, which is

T(n) = n * Sum_i e_i (p_i - 1) (M + A),

where M and A stand for field multiplication and addition, respectively. In practice the increase is acceptable (typically a factor of 2 relative to the duadic case, see the table below) and is outweighed by the benefit of a small delta.
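The cost formula gives a simple way to score a candidate n against the duadic (pure radix-2) case of the same size, T(2^log2(n)) = n * log2(n) * (M + A): the (M + A)-normalized ratio is Sum_i e_i (p_i - 1) / log2(n). A small sketch (illustrative function name, not from the thread):

```rust
/// Relative cost of a mixed-radix FFT of order n = prod p_i^e_i versus a
/// duadic (radix-2) FFT of equal size, per the formula
/// T(n) = n * sum_i e_i * (p_i - 1) * (M + A):
/// ratio = sum_i e_i * (p_i - 1) / log2(n).
fn cost_ratio(factors: &[(u64, u32)]) -> f64 {
    let n: f64 = factors
        .iter()
        .map(|&(p, e)| (p as f64).powi(e as i32))
        .product();
    let mixed: f64 = factors
        .iter()
        .map(|&(p, e)| e as f64 * (p - 1) as f64)
        .sum();
    mixed / n.log2()
}
```

A pure power of two gives ratio 1 by construction, and any odd prime factor pushes the ratio above 1, which matches the roughly 1.5x to 2.3x figures reported in the table below.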

In connection with the task of searching suitable BLS24 parameters (with group sizes of more than 256 bits) I executed a script over a slightly larger set than the one of Pratyush (starting from parameter x = 2^32). I filtered out those cases which allow sufficiently smooth FFT domains in both the base field and the scalar field (having both log2(n_r) and log2(n_q) >= 30 bits and prime factors <= 17). I further computed the goodnesses delta_r and delta_q for the interval I = [2^10, 2^25] using an explicit algorithm to solve the discrete max-min problem, and determined the costs T_r, T_q of their FFTs relative to the equivalent duadic case T(duadic) = T(2^log2(n)) using the above formula. These are the results:

| x | bits | log2(n_r) | log2(n_q) | delta_r | delta_q | T(n_r)/T(duadic) | T(n_q)/T(duadic) |
|---|------|-----------|-----------|---------|---------|------------------|------------------|
| -4346495999 | 257 | 41.8051 | 32.0172 | 1.00689 | 1.02262 | 2.00932 | 1.68659 |
| -4505419775 | 257 | 43.3203 | 31.2246 | 1.02218 | 1.02602 | 1.66204 | 1.98561 |
| -4518481967 | 257 | 30.9689 | 31.2908 | 1.02354 | 1.02963 | 1.74369 | 1.78966 |
| 4796508718 | 258 | 32.5132 | 31.1912 | 1.01425 | 1.02354 | 1.84541 | 1.82744 |
| -4975851464 | 258 | 36.664 | 30.6266 | 1.0104 | 1.05488 | 1.85468 | 2.12234 |
| -5032487474 | 258 | 48.4004 | 34.2411 | 1.01089 | 1.02001 | 2.0661 | 1.51864 |
| -5131697570 | 259 | 39.3547 | 30.067 | 1.00782 | 1.04167 | 2.28689 | 2.46117 |
| -5155487999 | 259 | 34.2635 | 32.2635 | 1.0111 | 1.02262 | 1.80951 | 1.85969 |
| 5372915626 | 259 | 41.4331 | 32.8062 | 1.01169 | 1.0242 | 2.26872 | 2.31663 |
| -5482229024 | 259 | 49.8397 | 31.5071 | 1.01705 | 1.03822 | 2.00643 | 2.25346 |
| -5712214364 | 260 | 32.277 | 32.1718 | 1.02262 | 1.02392 | 1.85891 | 2.33123 |


Pratyush avatar Pratyush commented on September 19, 2024

Thanks for the detailed analysis! It's a good point that we can easily compute the best fit for a given problem size. If I'm understanding correctly, the takeaway is that these FFTs are at most 2x slower, right?

As an aside, there is a slight downside to considering fields of size > 255 bits: you have to add another limb to all calculations on the scalar field, which includes constraint generation, witness generation, FFTs, etc. I don't know of a way to simplify this beyond making a different BigInteger type (for example: BigInt264([u64; 4], u8)), and even then I don't know if it would make a concrete difference due to alignment issues. (I'm assuming that these curves still have base field < 320 bits, because otherwise one would lose the benefit of using BLS24 curves.)


UlrichHaboeck75 avatar UlrichHaboeck75 commented on September 19, 2024

Yes, if one wants such a small approximation goodness (2% relative error) for that instance interval (with upper bound 2^25), one has to expect an FFT about twice as slow.
Actually it turned out that my script was starting at too high an x, searching only curves of at least 257 bits. I have now restarted it with a modified range for x, and will post my findings in a few days.

About the sizes of the base field modulus: the first three rows in the table have a 320-bit modulus, the others 321-323 bits.


UlrichHaboeck75 avatar UlrichHaboeck75 commented on September 19, 2024

first results from the running script:
(Screenshot of the first results table, taken 2020-11-04.)

In particular the last one, with smoothness bound 11 for both domains, is interesting. Its approximation goodness comes with a relative error < 1.7%, and the FFT cost increases only by a factor of about 1.5.

