Git Product home page Git Product logo

Comments (4)

baihacker avatar baihacker commented on August 19, 2024

Thanks. The fix is applied.

I also integrate ntl (Windows: WinNTL-11_4_1.zip) to pe: pe_ntt_ntl

A static library of ntl is built with toptions: Makefile

Here is the test result

from pe.

 avatar commented on August 19, 2024

Thank you. (... I don't know what makes a difference to NTL timings 😢 )

My environment:

  • OS: Linux version 4.15.0-65-generic
  • CPU: Ryzen 7 2700x
  • Compiler: g++ 7.4.0 (-std=c++14 -O3 -mtune=native -march=native -lntl)

I used NTL 11.3.3 and the following code:

#include <cstdio>
#include <NTL/lzz_pX.h>

using namespace std;

using i64 = int64_t;

int main() {
  for (const i64 mod : {i64(100019), i64(1000003), i64(1000000007), i64(100000000003), i64(316227766016779)}) {
    NTL::zz_p::init(mod);
    NTL::zz_pX f, g, h;
    for (int N : {int(1e6), (1 << 20) - 1, (1 << 20), int(1e7)}) {
      NTL::random(f, N + 1);
      NTL::random(g, N + 1);
      // for (int i = 0; i <= N; ++i) f[i] = NTL::zz_p(1);
      // for (int i = 0; i <= N; ++i) g[i] = NTL::zz_p(1);
      clock_t a = clock();
      NTL::mul(h, f, g);
      clock_t b = clock();
      // printf("%ld\n", NTL::rep(h[N]));
      printf("N: %d mod: %ld time: %.10f\n", N, mod, double(b - a) / CLOCKS_PER_SEC);
    }
  }
}

NTL timings:

N: 1000000 mod: 100019 time: 0.2667860000
N: 1048575 mod: 100019 time: 0.2642880000
N: 1048576 mod: 100019 time: 0.3078360000
N: 10000000 mod: 100019 time: 3.3209330000
N: 1000000 mod: 1000003 time: 0.2582430000
N: 1048575 mod: 1000003 time: 0.2583760000
N: 1048576 mod: 1000003 time: 0.2997540000
N: 10000000 mod: 1000003 time: 3.1903780000
N: 1000000 mod: 1000000007 time: 0.2548150000
N: 1048575 mod: 1000000007 time: 0.2521720000
N: 1048576 mod: 1000000007 time: 0.2927100000
N: 10000000 mod: 1000000007 time: 3.1535760000

N: 1000000 mod: 100000000003 time: 0.2718860000
N: 1048575 mod: 100000000003 time: 0.2825600000
N: 1048576 mod: 100000000003 time: 0.3074430000
N: 10000000 mod: 100000000003 time: 3.2640600000
N: 1000000 mod: 316227766016779 time: 0.3868790000
N: 1048575 mod: 316227766016779 time: 0.3805880000
N: 1048576 mod: 316227766016779 time: 0.4547470000
N: 10000000 mod: 316227766016779 time: 4.8863610000

My poly_mul_mod timings: (60-bit NTT x 2 + CRT)

N: 1000000 mod: 100019 time: 0.303038
N: 1048575 mod: 100019 time: 0.300431
N: 1048576 mod: 100019 time: 0.591726
N: 10000000 mod: 100019 time: 5.190098
N: 1000000 mod: 1000003 time: 0.290746
N: 1048575 mod: 1000003 time: 0.290084
N: 1048576 mod: 1000003 time: 0.573480
N: 10000000 mod: 1000003 time: 5.129372
N: 1000000 mod: 1000000007 time: 0.294748
N: 1048575 mod: 1000000007 time: 0.290667
N: 1048576 mod: 1000000007 time: 0.571745
N: 10000000 mod: 1000000007 time: 5.132078

from pe.

baihacker avatar baihacker commented on August 19, 2024

Reason:

  1. I choose a big integer based version in ntl, and your code is based on native word version.
  NTL::ZZ tmp(0);
  ZZFromBytes(tmp, reinterpret_cast<const unsigned char*>(&mod), sizeof(T));
  NTL::ZZ_p::init(tmp);

  NTL::ZZ_pX x, y, z;
  x.SetLength(n);
  y.SetLength(m);
  ...
  1. I use openmp to speed up in ntt_min25::poly_mul_ntt_internal

Some test result:

  1. The result of your code running on my machine (larger N or larger mod is not support due to building environment):
N: 1000000 mod: 100019 time: 0.2410000000
N: 1048575 mod: 100019 time: 0.2420000000
N: 1048576 mod: 100019 time: 0.2800000000
N: 1000000 mod: 1000003 time: 0.2400000000
N: 1048575 mod: 1000003 time: 0.2450000000
N: 1048576 mod: 1000003 time: 0.2770000000
N: 1000000 mod: 1000000007 time: 0.2400000000
N: 1048575 mod: 1000000007 time: 0.2420000000
N: 1048576 mod: 1000000007 time: 0.2760000000
  1. Modified ntt test (more moduli are added), openmp is enabled
mod = 100019
log2(n)  10     11     12     13     14     15     16     17     18     19     20
flint n  0.000  0.000  0.000  0.000  0.003  0.005  0.012  0.030  0.062  0.163  0.322
flint p  0.002  0.002  0.004  0.007  0.015  0.029  0.060  0.122  0.254  0.533  1.101
ntt32 s  0.001  0.002  0.003  0.007  0.016  0.033  0.050  0.105  0.217  0.422  0.862
ntt32 l  0.002  0.002  0.003  0.008  0.017  0.035  0.054  0.117  0.232  0.460  0.920
ntt64 s  0.001  0.003  0.004  0.012  0.026  0.055  0.082  0.165  0.337  0.682  1.383
ntt64 l  0.002  0.003  0.005  0.012  0.027  0.055  0.081  0.174  0.352  0.711  1.441
Min_25 s 0.000  0.000  0.000  0.001  0.002  0.004  0.006  0.012  0.026  0.054  0.115
Min_25 l 0.000  0.000  0.001  0.002  0.003  0.007  0.009  0.017  0.031  0.061  0.130
libbf    0.000  0.001  0.002  0.005  0.010  0.023  0.047  0.074  0.150  0.312  0.653
ntl      0.001  0.001  0.002  0.004  0.009  0.018  0.038  0.076  0.156  0.315  0.640
mod = 1000003
log2(n)  10     11     12     13     14     15     16     17     18     19     20
flint n  0.000  0.000  0.001  0.001  0.003  0.006  0.014  0.028  0.081  0.155  0.317
flint p  0.001  0.001  0.003  0.007  0.014  0.030  0.063  0.130  0.260  0.527  1.091
ntt32 s  0.000  0.002  0.004  0.008  0.015  0.033  0.050  0.106  0.211  0.421  0.853
ntt32 l  0.000  0.001  0.003  0.007  0.017  0.035  0.053  0.117  0.222  0.454  0.916
ntt64 s  0.002  0.003  0.006  0.011  0.026  0.053  0.081  0.166  0.336  0.676  1.380
ntt64 l  0.000  0.003  0.006  0.013  0.026  0.055  0.082  0.173  0.354  0.701  1.442
Min_25 s 0.000  0.000  0.001  0.001  0.001  0.003  0.006  0.013  0.026  0.054  0.113
Min_25 l 0.000  0.001  0.000  0.002  0.003  0.007  0.010  0.017  0.031  0.061  0.129
libbf    0.001  0.000  0.003  0.005  0.009  0.022  0.047  0.073  0.150  0.312  0.661
ntl      0.000  0.001  0.002  0.005  0.008  0.017  0.037  0.076  0.155  0.314  0.644
mod = 1000000007
log2(n)  10     11     12     13     14     15     16     17     18     19     20
flint n  0.000  0.000  0.001  0.002  0.004  0.010  0.018  0.046  0.101  0.244  0.519
flint p  0.002  0.003  0.005  0.011  0.021  0.044  0.095  0.191  0.399  0.858  1.749
ntt32 l  0.002  0.002  0.004  0.008  0.018  0.037  0.059  0.117  0.229  0.454  0.917
ntt64 l  0.001  0.003  0.006  0.012  0.026  0.057  0.085  0.178  0.354  0.709  1.441
Min_25 l 0.000  0.000  0.001  0.002  0.004  0.009  0.013  0.020  0.034  0.065  0.135
libbf    0.001  0.001  0.002  0.004  0.009  0.022  0.047  0.077  0.159  0.331  0.701
ntl      0.000  0.001  0.002  0.004  0.008  0.017  0.036  0.075  0.154  0.317  0.645
mod = 100000000003
log2(n)  10     11     12     13     14     15     16     17     18     19     20
flint n  0.000  0.000  0.001  0.002  0.005  0.010  0.022  0.045  0.098  0.302  0.648
flint p  0.001  0.003  0.005  0.011  0.020  0.044  0.091  0.191  0.400  0.840  1.742
ntt32 l  0.001  0.002  0.004  0.009  0.017  0.036  0.054  0.116  0.228  0.460  0.921
ntt64 l  0.001  0.003  0.006  0.013  0.027  0.056  0.085  0.175  0.354  0.710  1.440
Min_25 l 0.000  0.000  0.001  0.002  0.004  0.008  0.013  0.020  0.034  0.064  0.135
libbf    0.000  0.001  0.002  0.004  0.009  0.021  0.046  0.077  0.159  0.331  0.693
ntl      0.001  0.001  0.003  0.006  0.012  0.024  0.051  0.108  0.221  0.454  0.924
mod = 316227766016779
log2(n)  10     11     12     13     14     15     16     17     18     19     20
flint n  0.000  0.000  0.001  0.002  0.006  0.012  0.029  0.071  0.153  0.299  0.630
flint p  0.002  0.003  0.005  0.011  0.023  0.049  0.105  0.211  0.432  0.920  1.942
ntt64 l  0.002  0.003  0.006  0.013  0.027  0.057  0.086  0.176  0.354  0.712  1.448
Min_25 l 0.000  0.001  0.001  0.002  0.004  0.008  0.013  0.020  0.034  0.066  0.135
libbf    0.001  0.001  0.002  0.004  0.009  0.021  0.047  0.078  0.159  0.332  0.696
ntl      0.001  0.002  0.003  0.007  0.014  0.027  0.055  0.117  0.240  0.492  1.006

openmp is disabled

mod = 100019
log2(n)  10     11     12     13     14     15     16     17     18     19     20
flint n  0.000  0.001  0.001  0.001  0.002  0.006  0.012  0.030  0.062  0.162  0.321
flint p  0.001  0.002  0.003  0.007  0.014  0.028  0.060  0.124  0.255  0.534  1.101
ntt32 s  0.001  0.003  0.007  0.013  0.031  0.063  0.136  0.284  0.598  1.265  2.674
ntt32 l  0.002  0.004  0.010  0.021  0.044  0.096  0.203  0.426  0.893  1.886  4.008
ntt64 s  0.001  0.003  0.006  0.012  0.025  0.054  0.115  0.245  0.515  1.090  2.301
ntt64 l  0.002  0.005  0.011  0.024  0.052  0.110  0.235  0.495  1.044  2.208  4.664
Min_25 s 0.000  0.001  0.000  0.000  0.001  0.003  0.007  0.014  0.028  0.060  0.125
Min_25 l 0.000  0.000  0.000  0.002  0.003  0.007  0.014  0.029  0.061  0.129  0.269
libbf    0.000  0.001  0.002  0.004  0.010  0.022  0.047  0.073  0.151  0.314  0.668
ntl      0.001  0.001  0.002  0.005  0.009  0.019  0.037  0.078  0.157  0.319  0.650
mod = 1000003
log2(n)  10     11     12     13     14     15     16     17     18     19     20
flint n  0.000  0.000  0.000  0.001  0.002  0.006  0.014  0.029  0.082  0.153  0.319
flint p  0.000  0.001  0.004  0.006  0.014  0.030  0.062  0.129  0.262  0.529  1.087
ntt32 s  0.002  0.003  0.006  0.014  0.030  0.064  0.135  0.284  0.598  1.265  2.661
ntt32 l  0.002  0.004  0.010  0.021  0.044  0.095  0.203  0.426  0.896  1.898  4.009
ntt64 s  0.001  0.003  0.005  0.011  0.025  0.054  0.115  0.243  0.515  1.088  2.303
ntt64 l  0.002  0.005  0.011  0.024  0.051  0.110  0.234  0.494  1.045  2.208  4.677
Min_25 s 0.000  0.000  0.000  0.001  0.001  0.003  0.007  0.014  0.028  0.060  0.125
Min_25 l 0.000  0.000  0.001  0.001  0.003  0.006  0.014  0.030  0.061  0.129  0.269
libbf    0.000  0.001  0.002  0.005  0.009  0.022  0.047  0.073  0.150  0.313  0.659
ntl      0.000  0.001  0.002  0.004  0.009  0.019  0.037  0.077  0.156  0.321  0.649
mod = 1000000007
log2(n)  10     11     12     13     14     15     16     17     18     19     20
flint n  0.000  0.000  0.000  0.002  0.003  0.010  0.019  0.047  0.101  0.243  0.521
flint p  0.001  0.002  0.005  0.010  0.021  0.044  0.097  0.189  0.399  0.854  1.761
ntt32 l  0.002  0.005  0.010  0.021  0.045  0.096  0.205  0.431  0.900  1.914  4.042
ntt64 l  0.002  0.005  0.011  0.024  0.052  0.112  0.238  0.502  1.054  2.240  4.726
Min_25 l 0.000  0.000  0.001  0.001  0.004  0.009  0.018  0.037  0.077  0.161  0.332
libbf    0.000  0.001  0.002  0.005  0.009  0.022  0.047  0.078  0.161  0.333  0.698
ntl      0.001  0.001  0.002  0.005  0.009  0.017  0.036  0.076  0.158  0.319  0.654
mod = 100000000003
log2(n)  10     11     12     13     14     15     16     17     18     19     20
flint n  0.000  0.001  0.001  0.002  0.004  0.010  0.022  0.046  0.098  0.303  0.651
flint p  0.001  0.002  0.005  0.010  0.020  0.044  0.092  0.191  0.401  0.844  1.741
ntt32 l  0.002  0.005  0.010  0.021  0.045  0.096  0.204  0.431  0.905  1.914  4.045
ntt64 l  0.003  0.005  0.011  0.024  0.052  0.112  0.238  0.502  1.059  2.239  4.724
Min_25 l 0.000  0.000  0.001  0.002  0.004  0.008  0.018  0.038  0.077  0.160  0.332
libbf    0.000  0.001  0.002  0.005  0.009  0.022  0.052  0.078  0.160  0.333  0.696
ntl      0.001  0.001  0.003  0.006  0.012  0.025  0.051  0.108  0.222  0.455  0.925
mod = 316227766016779
log2(n)  10     11     12     13     14     15     16     17     18     19     20
flint n  0.001  0.001  0.001  0.002  0.006  0.012  0.029  0.072  0.155  0.301  0.636
flint p  0.001  0.003  0.005  0.011  0.023  0.048  0.104  0.209  0.435  0.926  1.934
ntt64 l  0.002  0.005  0.012  0.025  0.052  0.112  0.238  0.503  1.057  2.241  4.734
Min_25 l 0.000  0.000  0.001  0.002  0.004  0.008  0.019  0.038  0.077  0.162  0.340
libbf    0.001  0.000  0.002  0.005  0.009  0.021  0.047  0.078  0.160  0.332  0.699
ntl      0.001  0.001  0.004  0.007  0.013  0.027  0.055  0.119  0.242  0.495  1.010

from pe.

baihacker avatar baihacker commented on August 19, 2024

I changed the compiling flags of ntl, and the performance isimproved by ~5% to ~10%.

Changes:

CXXFLAGS=-O3 --std=c++14
# Flags for the C++ compiler

CXXAUTOFLAGS= -pthread -march=k8-sse3 -mtune=skylake
# Flags for the C++ compiler, automatically generated by configuration script

from pe.

Related Issues (1)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.