supranational / sppark

Zero-knowledge template library

License: Apache License 2.0

C++ 28.94% Cuda 64.01% Rust 6.94% Shell 0.12%

cuda bls12-377 bls12-381 pasta-curves zero-knowledge zero-knowledge-proofs zk-snarks zk-starks ntt

sppark's Issues

failed to run custom build command for `sppark v0.1.5`

error occurred: Command "nvcc" "-ccbin=C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\HostX64\x64\cl.exe" "-Xcompiler" "-nologo" "-Xcompiler" "-MD" "-G" "-Xcompiler" "-Z7" "-Xcompiler" "-Brepro" "-I" "D:\hash\aleo\sppark-0.1.5" "-Xcompiler" "-W4" "-o" "D:\hash\aleo\sppark-0.1.5\rust\target\debug\build\sppark-9d1ea8dff2ab2e4d\out\src/lib.o" "-c" "src/lib.cpp" with args "nvcc" did not execute successfully (status code exit code: 2).
Environment: Windows 10, nvcc 12.3, gcc 13.2, rustc 1.75.0, GPU RTX 4060.

NTT curve support

// Maximum domain size supported. Can be adjusted at will, but with the
// target field in mind. Most fields handle up to 2^32 elements, BLS12-377
// can handle up to 2^47, alt_bn128 - 2^28...

I noticed that the comments mention NTT support for the alt_bn128 curve, but it does not appear to be implemented yet. Are there any special parameters to handle compared with BLS12-377 and BLS12-381? Will this part of the implementation be published soon?

missing headers in `ff` module

`#include <blst_t.hpp>` in sppark/ff/bls12-381.hpp fails because this file is not defined anywhere in the repository.

A new compile error in gpu_t.cuh with CUDA 11.8

The following warnings were emitted during compilation:

warning: In file included from /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/sppark-0.1.6/sppark/util/gpu_t.cuh:13,
warning: from src/lib.cpp:2:
warning: /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/sppark-0.1.6/sppark/util/gpu_t.cuh: In member function ‘void event_t::wait(cudaStream_t)’:
warning: /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/sppark-0.1.6/sppark/util/gpu_t.cuh:45:50: error: too few arguments to function ‘cudaError_t cudaStreamWaitEvent(cudaStream_t, cudaEvent_t, unsigned int)’
warning: { CUDA_OK(cudaStreamWaitEvent(stream, event)); }
warning: ^
warning: /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/sppark-0.1.6/sppark/util/exception.cuh:13:24: note: in definition of macro ‘CUDA_OK’
warning: cudaError_t code = expr;
warning: ^~~~
warning: In file included from /usr/include/channel_descriptor.h:61,
warning: from /usr/include/cuda_runtime.h:95,
warning: from src/lib.cpp:1:
warning: /usr/include/cuda_runtime_api.h:2240:58: note: declared here
warning: extern host cudart_builtin cudaError_t CUDARTAPI cudaStreamWaitEvent(cudaStream_t stream, cudaEvent_t event, unsigned int flags);
warning: ^~~~~~~~~~~~~~~~~~~
warning: In file included from /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/sppark-0.1.6/sppark/util/gpu_t.cuh:13,
warning: from src/lib.cpp:2:
warning: /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/sppark-0.1.6/sppark/util/gpu_t.cuh: In member function ‘void* stream_t::Dmalloc(size_t) const’:
warning: /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/sppark-0.1.6/sppark/util/gpu_t.cuh:72:17: error: ‘cudaMallocAsync’ was not declared in this scope
warning: CUDA_OK(cudaMallocAsync(&d_ptr, sz, stream));
warning: ^~~~~~~~~~~~~~~
warning: /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/sppark-0.1.6/sppark/util/exception.cuh:13:24: note: in definition of macro ‘CUDA_OK’
warning: cudaError_t code = expr;
warning: ^~~~
warning: /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/sppark-0.1.6/sppark/util/gpu_t.cuh:72:17: note: suggested alternative: ‘cudaMallocArray’
warning: CUDA_OK(cudaMallocAsync(&d_ptr, sz, stream));
warning: ^~~~~~~~~~~~~~~
warning: /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/sppark-0.1.6/sppark/util/exception.cuh:13:24: note: in definition of macro ‘CUDA_OK’
warning: cudaError_t code = expr;
warning: ^~~~
warning: /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/sppark-0.1.6/sppark/util/gpu_t.cuh: In member function ‘void stream_t::Dfree(void*) const’:
warning: /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/sppark-0.1.6/sppark/util/gpu_t.cuh:76:17: error: ‘cudaFreeAsync’ was not declared in this scope
warning: { CUDA_OK(cudaFreeAsync(d_ptr, stream)); }

missing definition of basic types

For instance, the type vec384 has no definition:

sppark/ff/bls12-381.hpp

Line 82 in fffd734

static const vec384 BLS12_381_P = {

fork safety issue

I am trying to use Ray (https://github.com/ray-project/ray) to wrap the RISC Zero prover, which internally uses sppark. An error that I encountered is as follows:

(async_join_segment_receipts pid=2928) *** SIGSEGV received at time=1706179489 on cpu 0 *** [repeated 3x across cluster]
(async_join_segment_receipts pid=2928) PC: @     0x7f79a0fd870c  (unknown)  select_gpu() [repeated 3x across cluster]
(async_join_segment_receipts pid=2928)     @     0x7f7edbcb2090  (unknown)  (unknown) [repeated 2x across cluster]
(async_join_segment_receipts pid=2928)     @     0x7f79a0fd57d0        160  sppark_init [repeated 3x across cluster]
(async_join_segment_receipts pid=2928)     @     0x7f79a09a6529  (unknown)  risc0_zkp::hal::cuda::CudaHal$LT$CH$GT$::new::hfb16181478125328 [repeated 4x across cluster]
(async_join_segment_receipts pid=2928) [2024-01-25 02:44:49,148 E 2928 2928] logging.cc:361: *** SIGSEGV received at time=1706179489 on cpu 0 *** [repeated 3x across cluster]
(async_join_segment_receipts pid=2928) [2024-01-25 02:44:49,148 E 2928 2928] logging.cc:361: PC: @     0x7f79a0fd870c  (unknown)  select_gpu() [repeated 3x across cluster]
(async_join_segment_receipts pid=2928) [2024-01-25 02:44:49,148 E 2928 2928] logging.cc:361:     @     0x7f7edbcb2090  (unknown)  (unknown) [repeated 2x across cluster]
(async_join_segment_receipts pid=2928) [2024-01-25 02:44:49,148 E 2928 2928] logging.cc:361:     @     0x7f79a0fd57d0        160  sppark_init [repeated 3x across cluster]
(async_join_segment_receipts pid=2928) [2024-01-25 02:44:49,148 E 2928 2928] logging.cc:361:     @     0x7f79a09a6529  (unknown)  risc0_zkp::hal::cuda::CudaHal$LT$CH$GT$::new::hfb16181478125328 [repeated 4x across cluster]
(async_join_segment_receipts pid=2928) Fatal Python error: Segmentation fault [repeated 3x across cluster]
(async_join_segment_receipts pid=2928)  [repeated 3x across cluster]
(async_join_segment_receipts pid=2928) Stack (most recent call first): [repeated 3x across cluster]
(async_join_segment_receipts pid=2928)   File "ray_test.py", line 29 in async_join_segment_receipts [repeated 3x across cluster]
(async_join_segment_receipts pid=2928)   File "/home/ubuntu/.pyenv/versions/pyo3/lib/python3.8/site-packages/ray/_private/worker.py", line 847 in main_loop [repeated 3x across cluster]
(async_join_segment_receipts pid=2928)   File "/home/ubuntu/.pyenv/versions/pyo3/lib/python3.8/site-packages/ray/_private/workers/default_worker.py", line 282 in <module> [repeated 3x across cluster]

My understanding is that Ray forks its worker processes. If the parent process has already initialized the GPUs, the child would therefore try to reuse the state that the parent initialized. However, the NVIDIA runtime does not work in this situation: contexts and many other data structures are specific to the process that created them.

I am still checking whether this is the cause of the error.
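One way to make this kind of problem visible early is to record which process created a handle and refuse to hand it out from any other process. The following is only a minimal sketch under stated assumptions: `ProcessBound` is a hypothetical wrapper introduced here for illustration, it is not part of sppark, RISC Zero, or Ray, and it does not touch CUDA at all.

```rust
use std::process;

/// Hypothetical wrapper that remembers which process created a GPU-related handle.
struct ProcessBound<T> {
    owner_pid: u32,
    inner: T,
}

impl<T> ProcessBound<T> {
    /// Records the creating process alongside the wrapped value.
    fn new(inner: T) -> Self {
        ProcessBound { owner_pid: process::id(), inner }
    }

    /// Returns the handle only when called from the process that created it.
    fn get(&self) -> Option<&T> {
        if self.owner_pid == process::id() {
            Some(&self.inner)
        } else {
            // A forked child lands here and should initialize its own state.
            None
        }
    }
}
```

With such a guard, a forked worker would get `None` instead of silently reusing state inherited from the parent, which turns a potential segfault into an explicit re-initialization path.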

Upcoming error in rust crate

Revealed in 06564db, though the error might have been there before (perhaps the macro was unused).

huitseeker@tencendur➜tmp/sppark/rust(main)» cargo +1.60.0 check
   Compiling libc v0.2.147
   Compiling either v1.8.1
   Compiling cc v1.0.79
   Compiling which v4.4.0
   Compiling sppark v0.1.3 (/home/huitseeker/tmp/sppark/rust)
error[E0308]: mismatched types
  --> src/lib.rs:26:59
   |
26 |             let c_str = unsafe { std::ffi::CStr::from_ptr(str.as_ptr()) };
   |                                                           ^^^^^^^^^^^^ expected `u8`, found `i8`
   |
   = note: expected raw pointer `*const u8`
              found raw pointer `*mut i8`

For more information about this error, try `rustc --explain E0308`.
error: could not compile `sppark` due to previous error
huitseeker@tencendur➜tmp/sppark/rust(main)» cargo +1.68.2 check
   Compiling sppark v0.1.3 (/home/huitseeker/tmp/sppark/rust)
error[E0308]: mismatched types
   --> src/lib.rs:26:59
    |
26  |             let c_str = unsafe { std::ffi::CStr::from_ptr(str.as_ptr()) };
    |                                  ------------------------ ^^^^^^^^^^^^ expected `u8`, found `i8`
    |                                  |
    |                                  arguments to this function are incorrect
    |
    = note: expected raw pointer `*const u8`
               found raw pointer `*mut i8`
note: associated function defined here
   --> /home/huitseeker/.rustup/toolchains/1.68.2-aarch64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ffi/c_str.rs:256:25
    |
256 |     pub const unsafe fn from_ptr<'a>(ptr: *const c_char) -> &'a CStr {
    |                         ^^^^^^^^

For more information about this error, try `rustc --explain E0308`.
error: could not compile `sppark` due to previous error
huitseeker@tencendur➜tmp/sppark/rust(main)» cargo +1.70.0 check
   Compiling sppark v0.1.3 (/home/huitseeker/tmp/sppark/rust)
error[E0308]: mismatched types
  --> src/lib.rs:26:59
   |
26 |             let c_str = unsafe { std::ffi::CStr::from_ptr(str.as_ptr()) };
   |                                  ------------------------ ^^^^^^^^^^^^ expected `*const u8`, found `*mut i8`
   |                                  |
   |                                  arguments to this function are incorrect
   |
   = note: expected raw pointer `*const u8`
              found raw pointer `*mut i8`
note: associated function defined here
  --> /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/ffi/c_str.rs:260:25

For more information about this error, try `rustc --explain E0308`.
error: could not compile `sppark` (lib) due to previous error
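The mismatch looks like the usual `c_char` portability issue: `CStr::from_ptr` takes a `*const c_char`, and `c_char` is `u8` on aarch64 Linux (the toolchain path in the log) but `i8` on x86_64. A minimal sketch of a portable pattern follows, assuming the message lives in a NUL-terminated `i8` buffer. The names `message_from_i8` and `demo_buffer` are hypothetical and are not taken from the sppark crate.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Converts a NUL-terminated buffer declared as `i8` into an owned `String`.
/// The cast to `*const c_char` compiles whether `c_char` is `i8` or `u8`.
fn message_from_i8(buf: &[i8]) -> Option<String> {
    // Refuse buffers without a terminator so `from_ptr` cannot read past the end.
    if !buf.contains(&0) {
        return None;
    }
    let c_str = unsafe { CStr::from_ptr(buf.as_ptr() as *const c_char) };
    Some(c_str.to_string_lossy().into_owned())
}

/// Hypothetical helper: builds a NUL-terminated `i8` buffer for demonstration.
fn demo_buffer() -> Vec<i8> {
    b"out of memory\0".iter().map(|&b| b as i8).collect()
}
```

Casting at the call site keeps the struct layout untouched; declaring the field as `c_char` in the first place would be an alternative, if the C side allows it.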

MSM results differ between VariableBaseMSM::multi_scalar_mul and multi_scalar_mult_arkworks when the number of generated points and scalars is not divisible by 32

@dot-asm

I came across the sppark code for accelerating MSM on the GPU, and it's great. However, I ran into a problem: the msm_correctness test fails after changing the number of generated points.

The command is: cargo test --features=bls12_377

The result is:

The code is:

fn msm_correctness() {
    let test_npow = std::env::var("TEST_NPOW").unwrap_or("15".to_string());
    let npoints_npow = i32::from_str(&test_npow).unwrap();

    let (points, scalars) =
        // util::generate_points_scalars::<G1Affine>(1usize << npoints_npow);
        util::generate_points_scalars::<G1Affine>(237);

    let msm_result = multi_scalar_mult_arkworks(points.as_slice(), unsafe {
        std::mem::transmute::<&[_], &[BigInteger256]>(scalars.as_slice())
    })
    .into_affine();

    let arkworks_result =
        VariableBaseMSM::multi_scalar_mul(points.as_slice(), unsafe {
            std::mem::transmute::<&[_], &[BigInteger256]>(scalars.as_slice())
        })
        .into_affine();

    assert_eq!(msm_result, arkworks_result);
}

Implementing Display for error

Currently the CUDA error type implements From<Error> for String. If it implemented Display instead (see https://rust-lang.github.io/api-guidelines/interoperability.html?highlight=display#error-types-are-meaningful-and-well-behaved-c-good-err for more information), you could also call err.to_string() or use it directly, as in println!("error was: {}", err).

If I should provide a pull request, let me know.
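A minimal sketch of what the proposal could look like is shown below. `CudaError` and its fields are hypothetical stand-ins for illustration; the real error type in the crate may be shaped differently.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical stand-in for the error type returned by the CUDA bindings.
#[derive(Debug)]
struct CudaError {
    code: i32,
    message: String,
}

impl fmt::Display for CudaError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "CUDA error {}: {}", self.code, self.message)
    }
}

/// Implementing `Error` lets the type work with `?` and `Box<dyn Error>`.
impl Error for CudaError {}

/// The existing conversion can be kept, now expressed through `Display`.
impl From<CudaError> for String {
    fn from(err: CudaError) -> String {
        err.to_string()
    }
}
```

Keeping the `From` impl next to `Display` means existing callers that convert into `String` would continue to compile.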

Clean up MSM poc

This repository contains a proof of concept for using the MSM functionality from Rust. It's in a directory called blst-cuda, although the crate is called blst-msm. That poc only works with arkworks types, not with blst(rs) types. All this is confusing. Hence I propose:

renaming blst-cuda to arkworks-msm
renaming the crate from blst-msm to arkworks-msm
creating a new crate called blst-msm in a directory of the same name, which would work with blst types.

This issue was triggered by #4.

A question about Montgomery mult (from ff/mont_t.cuh)

While reading the implementation of Montgomery multiplication over finite fields, I was confused by the following function:

static inline void mad_n_redc(uint32_t even[], uint32_t odd[],
                                    const uint32_t a[], uint32_t bi, bool first=false)
        {
            if (first) {
                mul_n(odd, a+1, bi);
                mul_n(even, a,  bi);
            } else {
                asm("add.cc.u32 %0, %0, %1;" : "+r"(even[0]) : "r"(odd[1]));
                madc_n_rshift(odd, a+1, bi);
                cmad_n(even, a, bi);
                asm("addc.u32 %0, %0, 0;" : "+r"(odd[n-1]));
            }

            uint32_t mi = even[0] * M0;

            cmad_n(odd, MOD+1, mi);
            cmad_n(even, MOD,  mi);
            asm("addc.u32 %0, %0, 0;" : "+r"(odd[n-1]));
        }

Specifically, the third line from the end:
cmad_n(odd, MOD+1, mi);

The comment on the last line of the cmad_n function says "// return carry flag". However, the carry produced by the first cmad_n call is apparently not consumed by the second cmad_n call. Why is this overflow not taken into account?

If overflow cannot happen here, please explain why as well. Thanks.

how to understand the implementation of `reduce(uint32_t temp[4])`

The Rust code below is quite easy to understand: it reduces x_hi_hi first, then x_hi_lo.

fn reduce128(x: u128) -> GoldilocksField {
    let (x_lo, x_hi) = split(x); // This is a no-op
    let x_hi_hi = x_hi >> 32;
    let x_hi_lo = x_hi & EPSILON;

    let (mut t0, borrow) = x_lo.overflowing_sub(x_hi_hi);
    if borrow {
        branch_hint(); // A borrow is exceedingly rare. It is faster to branch.
        t0 -= EPSILON; // Cannot underflow.
    }
    let t1 = x_hi_lo * EPSILON;
    let t2 = unsafe { add_no_canonicalize_trashing_input(t0, t1) };
    GoldilocksField(t2)
}

However, the CUDA inline-assembly code below for the 128-bit reduction is not easy to understand. Would it be possible to provide some documentation? Thanks.

inline void reduce(uint32_t temp[4])
{
    uint32_t carry;
# if __CUDA_ARCH__ >= 700
    asm("sub.cc.u32 %0, %0, %3; subc.cc.u32 %1, %1, %4; subc.u32 %2, 0, 0;"
        : "+r"(temp[0]), "+r"(temp[1]), "=r"(carry)
        : "r"(temp[2]), "r"(temp[3]));
    asm("add.cc.u32 %0, %0, %2; addc.u32 %1, %1, %3;"
        : "+r"(temp[1]), "+r"(carry)
        : "r"(temp[2]), "r"(temp[3]));

    asm("mad.lo.cc.u32 %0, %3, %4, %0; madc.hi.cc.u32 %1, %3, %4, %1; addc.u32 %2, 0, 0;"
        : "+r"(temp[0]), "+r"(temp[1]), "=r"(temp[2])
        : "r"(carry), "r"(gl64_device::W));
    asm("mad.lo.cc.u32 %0, %2, %3, %0; madc.hi.u32 %1, %2, %3, %1;"
        : "+r"(temp[0]), "+r"(temp[1])
        : "r"(temp[2]), "r"(gl64_device::W));
# else
    uint32_t b0, b1;
    asm("add.cc.u32 %0, %2, %3; addc.u32 %1, 0, 0;"
        : "=r"(b0), "=r"(b1)
        : "r"(temp[2]), "r"(temp[3]));
    asm("sub.cc.u32 %0, %0, %3; subc.cc.u32 %1, %1, %4; subc.u32 %2, 0, 0;"
        : "+r"(temp[0]), "+r"(temp[1]), "=r"(carry)
        : "r"(b0), "r"(b1));
    asm("add.cc.u32 %0, %0, %2; addc.u32 %1, %1, %3;"
        : "+r"(temp[0]), "+r"(temp[1])
        : "r"(-carry), "r"(carry));
    asm("add.cc.u32 %0, %0, %1; addc.u32 %1, 0, 0;"
        : "+r"(temp[1]), "+r"(temp[2]));

#  if __CUDA_ARCH__ >= 700
    asm("mad.lo.cc.u32 %0, %2, %3, %0; madc.hi.u32 %1, %2, %3, %1;"
        : "+r"(temp[0]), "+r"(temp[1])
        : "r"(temp[2]), "r"(gl64_device::W));
#  else
    asm("add.cc.u32 %0, %0, %2; addc.u32 %1, %1, 0;"
        : "+r"(temp[0]), "+r"(temp[1])
        : "r"(-temp[2]));
#  endif
# endif
    asm("mov.b64 %0, {%1, %2};" : "=l"(val) : "r"(temp[0]), "r"(temp[1]));
}
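For reference, both snippets rest on two identities modulo the Goldilocks prime p = 2^64 - 2^32 + 1: 2^64 ≡ 2^32 - 1 and 2^96 ≡ -1. The following is a self-contained sketch of the Rust routine above with its helpers replaced by plain integer operations. It is an illustration of the arithmetic only, not a description of how the CUDA code schedules its instructions, and `canonical` is a helper name introduced here.

```rust
const P: u64 = 0xFFFF_FFFF_0000_0001; // 2^64 - 2^32 + 1
const EPSILON: u64 = 0xFFFF_FFFF; // 2^32 - 1, which equals 2^64 mod P

/// Reduces a 128-bit value to 64 bits that are congruent to it modulo P.
/// The result may still be >= P; see `canonical`.
fn reduce128(x: u128) -> u64 {
    let x_lo = x as u64;
    let x_hi = (x >> 64) as u64;
    let x_hi_hi = x_hi >> 32; // carries weight 2^96, congruent to -1
    let x_hi_lo = x_hi & EPSILON; // carries weight 2^64, congruent to EPSILON

    // x_lo - x_hi_hi: a borrow means we wrapped by 2^64, so subtract EPSILON.
    let (mut t0, borrow) = x_lo.overflowing_sub(x_hi_hi);
    if borrow {
        t0 -= EPSILON; // cannot underflow: t0 >= 2^64 - 2^32 + 1 here
    }
    // (2^32 - 1)^2 < 2^64, so this product cannot overflow.
    let t1 = x_hi_lo * EPSILON;
    // A carry means we dropped 2^64, so add EPSILON back.
    let (mut t2, carry) = t0.overflowing_add(t1);
    if carry {
        t2 += EPSILON; // cannot overflow: t2 <= 2^64 - 2^33 here
    }
    t2
}

/// Brings a 64-bit representative into the range [0, P).
fn canonical(v: u64) -> u64 {
    if v >= P { v - P } else { v }
}
```

The 32-bit limbs in `temp[0..4]` correspond to `x_lo` (limbs 0 and 1), `x_hi_lo` (limb 2) and `x_hi_hi` (limb 3) in this sketch.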

scalar is not constant

Hi!
I have noticed that the scalars argument of the pippenger function has the type const scalar_t*:

sppark/msm/pippenger.cuh

Lines 44 to 45 in d967007

 void pippenger(const affine_t* points, size_t npoints,
                const scalar_t* scalars, bool mont,

However, the elements of scalars are converted to scalar_T and written back in place inside the pippenger function:

sppark/msm/pippenger.cuh

Lines 158 to 176 in d967007

 if (mont) {
     uint32_t np = (npoints+WARP_SZ-1) & (0U-WARP_SZ);
     #pragma unroll 1
     for (uint32_t i = NTHREADS*bid + tid; i < np; i += NTHREADS*NWINS) {
         scalar_t s = scalars_[i];
         s.from();
         scalars[i] = s;
     }
     cooperative_groups::this_grid().sync();
 } else { // if (typeid(scalars) != typeid(scalars_)) {
     uint32_t np = (npoints+WARP_SZ-1) & (0U-WARP_SZ);
     #pragma unroll 1
     for (uint32_t i = NTHREADS*bid + tid; i < np; i += NTHREADS*NWINS) {
         scalar_t s = scalars_[i];
         __syncwarp();
         scalars[i] = s;
     }
     cooperative_groups::this_grid().sync();
 }

After the pippenger function has executed, the scalars are no longer the same as before. The const qualifier on scalars is therefore a little misleading.

Support halo2curves

May I ask whether sppark plans to support halo2curves (https://github.com/privacy-scaling-explorations/halo2curves)?

Race condition in msm/sort.cuh

There is a race condition between lines 267 and 278 in the msm/sort.cuh file. If a warp with a larger warpid executes line 278 before the warp with warpid=0 executes line 267, the calculation result will be wrong. In practice, on the Ampere architecture, the scheduling strategy of the warp scheduler makes it impossible for a warp with id > 0 to execute line 278 before warp 0 executes line 267, so the test always passes. Nevertheless, this is a logical error.

Compilation error (observed from pasta-msm)

TL;DR: compilation error on a `cargo check`

I upgraded nvcc to 12.3 according to the hints in supranational/pasta-msm@8ccdc45 and got a compilation error on a simple cargo check:

The output consists of several instances of the following warning:

warning: sppark@0.1.5: /usr/include/c++/13.2.1/bits/std_function.h: In instantiation of ‘std::function<_Res(_ArgTypes ...)>::_Requires<std::function<_Res(_ArgTypes ...)>::_Callable<_Functor>, std::function<_Res(_ArgTypes ...)>&> std::function<_Res(_ArgTypes ...)>::operator=(_Functor&&) [with _Functor = std::function<void()>&; _Res = void; _ArgTypes = {}; _Requires<_Callable<_Functor>, std::function<_Res(_ArgTypes ...)>&> = std::function<void()>&; typename std::enable_if<(! std::is_same<typename std::remove_cv<typename std::remove_reference<_Tuple>::type>::type, std::function<_Res(_ArgTypes ...)> >::value), std::decay<_Func> >::type::type = std::function<void()>; typename std::enable_if<(! std::is_same<typename std::remove_cv<typename std::remove_reference<_Tuple>::type>::type, std::function<_Res(_ArgTypes ...)> >::value), std::decay<_Func> >::type = std::decay<std::function<void()>&>; typename std::remove_cv<typename std::remove_reference<_Tuple>::type>::type = std::function<void()>&; typename std::remove_reference<_Tuple>::type = std::function<void()>&]’:
warning: sppark@0.1.5: /home/huitseeker/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sppark-0.1.5/sppark/util/thread_pool_t.hpp:164:24:   required from here
warning: sppark@0.1.5: /usr/include/c++/13.2.1/bits/std_function.h:534:44: error: call of overloaded ‘forward<std::function<void()>&>(std::function<void()>&)’ is ambiguous
warning: sppark@0.1.5:   534 |           function(std::forward<_Functor>(__f)).swap(*this);
warning: sppark@0.1.5:       |                   ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
warning: sppark@0.1.5: /usr/include/c++/13.2.1/bits/move.h:70:1: note: candidate: ‘constexpr _Tp&& std::forward(typename remove_reference<_Functor>::type&) [with _Tp = function<void()>&; typename remove_reference<_Functor>::type = function<void()>&]’
warning: sppark@0.1.5:    70 |     forward(typename std::remove_reference<_Tp>::type& __t) noexcept
warning: sppark@0.1.5:       | ^   ~~~
warning: sppark@0.1.5: /usr/include/c++/13.2.1/bits/move.h:82:1: note: candidate: ‘constexpr _Tp&& std::forward(typename remove_reference<_Functor>::type&&) [with _Tp = function<void()>&; typename remove_reference<_Functor>::type = function<void()>&]’
warning: sppark@0.1.5:    82 |     forward(typename std::remove_reference<_Tp>::type&& __t) noexcept
warning: sppark@0.1.5:       | ^   ~~~
warning: sppark@0.1.5: /usr/include/c++/13.2.1/bits/vector.tcc: In instantiation of ‘std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::emplace_back(_Args&& ...) [with _Args = {std::thread&}; _Tp = std::thread; _Alloc = std::allocator<std::thread>; reference = std::thread&]’:
warning: sppark@0.1.5: /usr/include/c++/13.2.1/bits/stl_vector.h:1296:15:   required from ‘void std::vector<_Tp, _Alloc>::push_back(value_type&&) [with _Tp = std::thread; _Alloc = std::allocator<std::thread>; value_type = std::thread]’
warning: sppark@0.1.5: /home/huitseeker/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sppark-0.1.5/sppark/util/thread_pool_t.hpp:79:20:   required from here
...

Full output below.

on pasta-msm main at 8ccdc45ca5938168b0a3f67462765926a0da9b40
Full (cargo) compilation output at https://gist.github.com/huitseeker/e22b6f7f21057bfd3f45601724f4c04d

Version info

huitseeker@sleipnir➜huitseeker/tmp/pasta-msm(main)» nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Sep__8_19:17:24_PDT_2023
Cuda compilation tools, release 12.3, V12.3.52
Build cuda_12.3.r12.3/compiler.33281558_0
huitseeker@sleipnir➜huitseeker/tmp/pasta-msm(main)» gcc --version
gcc (GCC) 13.2.1 20230730
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
huitseeker@sleipnir➜huitseeker/tmp/pasta-msm(main)» uname -a
Linux sleipnir 6.1.24-1-lts #1 SMP PREEMPT_DYNAMIC Thu, 13 Apr 2023 17:22:35 +0000 x86_64 GNU/Linux

what is the definition of `const uint32_t& M0` in mont_t.cuh

Is there any documentation explaining M0 in the Montgomery reduction, or stating which Montgomery reduction algorithm is implemented? In my admittedly shallow understanding of the standard Montgomery implementation, no M0 is involved. Are you perhaps using a different trick or algorithm here?
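Judging from the line `uint32_t mi = even[0] * M0;` quoted in the Montgomery multiplication question, M0 appears to play the role of the usual word-level Montgomery constant, -p^{-1} mod 2^32, where p is the modulus. This is an inference from that snippet and has not been confirmed by the authors. Under that assumption, the sketch below computes the constant for a single 32-bit limb and shows why multiplying the lowest limb by it clears that limb. The function names are hypothetical.

```rust
/// Computes -p0^{-1} mod 2^32 for an odd p0 (the least-significant limb of the modulus).
fn neg_inv_mod_2_32(p0: u32) -> u32 {
    assert!(p0 & 1 == 1, "modulus must be odd");
    // For odd p0, p0 * p0 == 1 (mod 8), so p0 is its own inverse to 3 bits.
    let mut inv = p0;
    // Each Newton step doubles the number of correct bits: 3, 6, 12, 24, 48.
    for _ in 0..4 {
        inv = inv.wrapping_mul(2u32.wrapping_sub(p0.wrapping_mul(inv)));
    }
    inv.wrapping_neg()
}

/// One word-wise reduction step for a single-limb modulus:
/// picks m so that t + m*p is divisible by 2^32, then divides by 2^32.
fn redc_step(t: u64, p: u32, m0: u32) -> u64 {
    let m = (t as u32).wrapping_mul(m0);
    let sum = t as u128 + (m as u128) * (p as u128);
    debug_assert_eq!(sum as u32, 0); // the lowest limb cancels by construction
    (sum >> 32) as u64
}
```

In a multi-limb implementation the same step is repeated once per limb, which would match the pattern of computing `mi` from `even[0]` and then accumulating `MOD * mi`.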

Linker issue with blst-msm

I get an error like

  = note: /usr/bin/ld: /tmp/blstmsmlinkerissue/target/debug/deps/libblst_msm-f154847bd3acdcfa.rlib(blst_msm-f154847bd3acdcfa.blst_msm.14511a1f-cgu.6.rcgu.o): in function `blst_msm::multi_scalar_mult':
          /home/vmx/.cargo/git/checkouts/sppark-766e0a6fee265121/f939f36/poc/blst-cuda/src/lib.rs:49: undefined reference to `mult_pippenger'
          collect2: error: ld returned 1 exit status
          
  = note: some `extern` functions couldn't be found; some native libraries may need to be installed or have their path specified
  = note: use the `-l` flag to specify native libraries to link
  = note: use the `cargo:rustc-link-lib` directive to specify the native libraries to link with Cargo (see https://doc.rust-lang.org/cargo/reference/build-scripts.html#cargorustc-link-libkindname)

when trying to use blst-msm as a dependency. Steps to reproduce (you can copy&paste that into your terminal and execute it directly):

cargo init blstmsmlinkerissue
cd blstmsmlinkerissue
cargo add blst
cargo add blst-msm --git https://github.com/supranational/sppark --features bls12_381
cat > src/main.rs << EOF
use blst::{blst_scalar, blst_p1_affine, blst_fp};

fn main() {                                      
    let bases = vec![blst_scalar{ b: [0; 32] }];
    let exponents = vec![blst_p1_affine{ x: blst_fp { l: [0; 6] }, y: blst_fp { l: [0; 6]} }];
    let point = blst_msm::multi_scalar_mult(&exponents, &bases);                              
    println!("point: {:?}", point);                             
}                                  
EOF
cargo build

I'm on Debian.

WebGPU support

Instead of building for CUDA alone, WebGPU provides a wrapper that would be compatible with Vulkan, Metal, and more. It would also allow the library to be called from the web, enabling more web-based GPU acceleration and opening up more edge-processing setups.

Is this something that would be worthwhile to build for sppark?

Sppark error

I am pretty new to this. I was trying to set up a repo that apparently has sppark as a dependency, and I am getting the error below:

error: failed to run custom build command for `sppark v0.1.2 (https://github.com/supranational/sppark?rev=48f6ac6#48f6ac60)`

Caused by:
  process didn't exit successfully: `/home/aditya/aleo-prover/target/release/build/sppark-329937d95720d5f1/build-script-build` (exit status: 1)
  --- stdout
  cargo:rerun-if-changed=/home/aditya/.cargo/git/checkouts/sppark-766e0a6fee265121/48f6ac6/ec
  cargo:rerun-if-changed=/home/aditya/.cargo/git/checkouts/sppark-766e0a6fee265121/48f6ac6/ff
  cargo:rerun-if-changed=/home/aditya/.cargo/git/checkouts/sppark-766e0a6fee265121/48f6ac6/ntt
  cargo:rerun-if-changed=/home/aditya/.cargo/git/checkouts/sppark-766e0a6fee265121/48f6ac6/msm
  cargo:rerun-if-changed=/home/aditya/.cargo/git/checkouts/sppark-766e0a6fee265121/48f6ac6/util
  cargo:ROOT=/home/aditya/.cargo/git/checkouts/sppark-766e0a6fee265121/48f6ac6
  TARGET = Some("x86_64-unknown-linux-gnu")
  OPT_LEVEL = Some("3")
  HOST = Some("x86_64-unknown-linux-gnu")
  cargo:rerun-if-env-changed=CXX_x86_64-unknown-linux-gnu
  CXX_x86_64-unknown-linux-gnu = None
  cargo:rerun-if-env-changed=CXX_x86_64_unknown_linux_gnu
  CXX_x86_64_unknown_linux_gnu = None
  cargo:rerun-if-env-changed=HOST_CXX
  HOST_CXX = None
  cargo:rerun-if-env-changed=CXX
  CXX = None
  cargo:rerun-if-env-changed=NVCC_x86_64-unknown-linux-gnu
  NVCC_x86_64-unknown-linux-gnu = None
  cargo:rerun-if-env-changed=NVCC_x86_64_unknown_linux_gnu
  NVCC_x86_64_unknown_linux_gnu = None
  cargo:rerun-if-env-changed=HOST_NVCC
  HOST_NVCC = None
  cargo:rerun-if-env-changed=NVCC
  NVCC = None
  cargo:rerun-if-env-changed=CXXFLAGS_x86_64-unknown-linux-gnu
  CXXFLAGS_x86_64-unknown-linux-gnu = None
  cargo:rerun-if-env-changed=CXXFLAGS_x86_64_unknown_linux_gnu
  CXXFLAGS_x86_64_unknown_linux_gnu = None
  cargo:rerun-if-env-changed=HOST_CXXFLAGS
  HOST_CXXFLAGS = None
  cargo:rerun-if-env-changed=CXXFLAGS
  CXXFLAGS = None
  cargo:rerun-if-env-changed=CRATE_CC_NO_DEFAULTS
  CRATE_CC_NO_DEFAULTS = None
  DEBUG = Some("false")
  CARGO_CFG_TARGET_FEATURE = Some("fxsr,sse,sse2")
  running: "nvcc" "-ccbin=c++" "-Xcompiler" "-O3" "-Xcompiler" "-ffunction-sections" "-Xcompiler" "-fdata-sections" "-Xcompiler" "-fPIC" "-m64" "-I" "/home/aditya/.cargo/git/checkouts/sppark-766e0a6fee265121/48f6ac6" "-Xcompiler" "-Wall" "-Xcompiler" "-Wextra" "-o" "/home/aditya/aleo-prover/target/release/build/sppark-ce7f4abd18d08224/out/src/lib.o" "-c" "--device-c" "src/lib.cpp"
  cargo:warning=In file included from /home/aditya/.cargo/git/checkouts/sppark-766e0a6fee265121/48f6ac6/util/gpu_t.cuh:9,
  cargo:warning=                 from src/lib.cpp:2:
  cargo:warning=/home/aditya/.cargo/git/checkouts/sppark-766e0a6fee265121/48f6ac6/util/gpu_t.cuh: In member function ‘void* stream_t::Dmalloc(size_t) const’:
  cargo:warning=/home/aditya/.cargo/git/checkouts/sppark-766e0a6fee265121/48f6ac6/util/gpu_t.cuh:29:17: error: ‘cudaMallocAsync’ was not declared in this scope; did you mean ‘cudaMallocArray’?
  cargo:warning=   29 |         CUDA_OK(cudaMallocAsync(&d_ptr, sz, stream));
  cargo:warning=      |                 ^~~~~~~~~~~~~~~
  cargo:warning=/home/aditya/.cargo/git/checkouts/sppark-766e0a6fee265121/48f6ac6/util/exception.cuh:32:24: note: in definition of macro ‘CUDA_OK’
  cargo:warning=   32 |     cudaError_t code = expr;                                \
  cargo:warning=      |                        ^~~~
  cargo:warning=/home/aditya/.cargo/git/checkouts/sppark-766e0a6fee265121/48f6ac6/util/gpu_t.cuh: In member function ‘void stream_t::Dfree(void*) const’:
  cargo:warning=/home/aditya/.cargo/git/checkouts/sppark-766e0a6fee265121/48f6ac6/util/gpu_t.cuh:33:17: error: ‘cudaFreeAsync’ was not declared in this scope; did you mean ‘cudaFreeArray’?
  cargo:warning=   33 |     {   CUDA_OK(cudaFreeAsync(d_ptr, stream));   }
  cargo:warning=      |                 ^~~~~~~~~~~~~
  cargo:warning=/home/aditya/.cargo/git/checkouts/sppark-766e0a6fee265121/48f6ac6/util/exception.cuh:32:24: note: in definition of macro ‘CUDA_OK’
  cargo:warning=   32 |     cudaError_t code = expr;                                \
  cargo:warning=      |                        ^~~~
  cargo:warning=/home/aditya/.cargo/git/checkouts/sppark-766e0a6fee265121/48f6ac6/util/gpu_t.cuh: In constructor ‘pin_t<T>::pin_t(const T*, size_t)’:
  cargo:warning=/home/aditya/.cargo/git/checkouts/sppark-766e0a6fee265121/48f6ac6/util/gpu_t.cuh:230:59: error: ‘cudaHostRegisterReadOnly’ was not declared in this scope; did you mean ‘cudaHostRegisterDefault’?
  cargo:warning=  230 |                                  cudaHostRegisterPortable|cudaHostRegisterReadOnly));
  cargo:warning=      |                                                           ^~~~~~~~~~~~~~~~~~~~~~~~
  cargo:warning=/home/aditya/.cargo/git/checkouts/sppark-766e0a6fee265121/48f6ac6/util/exception.cuh:32:24: note: in definition of macro ‘CUDA_OK’
  cargo:warning=   32 |     cudaError_t code = expr;                                \
  cargo:warning=      |                        ^~~~
  exit status: 1

  --- stderr


  error occurred: Command "nvcc" "-ccbin=c++" "-Xcompiler" "-O3" "-Xcompiler" "-ffunction-sections" "-Xcompiler" "-fdata-sections" "-Xcompiler" "-fPIC" "-m64" "-I" "/home/aditya/.cargo/git/checkouts/sppark-766e0a6fee265121/48f6ac6" "-Xcompiler" "-Wall" "-Xcompiler" "-Wextra" "-o" "/home/aditya/aleo-prover/target/release/build/sppark-ce7f4abd18d08224/out/src/lib.o" "-c" "--device-c" "src/lib.cpp" with args "nvcc" did not execute successfully (status code exit status: 1).


warning: build failed, waiting for other jobs to finish...
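The undeclared symbols in this log are comparatively recent additions to the CUDA runtime (to my knowledge, `cudaMallocAsync` and `cudaFreeAsync` arrived with CUDA 11.2), so one plausible cause is that an older toolkit is being picked up. A minimal sketch of checking the toolkit release from the text printed by `nvcc --version` follows. The function names and the `(11, 2)` threshold are assumptions made for illustration and are not part of sppark's build script.

```rust
/// Extracts (major, minor) from text containing a line such as
/// "Cuda compilation tools, release 12.3, V12.3.52".
fn parse_nvcc_release(output: &str) -> Option<(u32, u32)> {
    let rest = output.split("release ").nth(1)?;
    let version = rest.split(|c: char| c == ',' || c.is_whitespace()).next()?;
    let mut parts = version.split('.');
    let major: u32 = parts.next()?.parse().ok()?;
    let minor: u32 = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// Assumed minimum release for cudaMallocAsync / cudaFreeAsync.
const MIN_ASYNC_ALLOC: (u32, u32) = (11, 2);

/// Returns true when the reported release is at least the assumed minimum.
fn supports_async_alloc(output: &str) -> bool {
    parse_nvcc_release(output).map_or(false, |v| v >= MIN_ASYNC_ALLOC)
}
```

Comparing against the headers that are actually included matters as well: a new `nvcc` combined with old headers on the include path would still fail in the same way.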

gl64, the + result is incorrect when both operands > p

For example, when GL64_PARTIALLY_REDUCED is not defined:

let mut a: u64 = 0xffffffff_f0000fff;
let mut b: u64 = 0xffffffff_f0000fff;

The expected output is 0x1e0001ffc. However, the actual result is 0xffffffffe0001ffe, which becomes 0xe0001ffd after reduction, so the results do not match.
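The numbers are consistent with the carry out of the 64-bit addition being dropped: 2^64 ≡ 2^32 - 1 (mod p), and 0xe0001ffd + 0xffffffff = 0x1e0001ffc. A minimal sketch of an addition that tolerates operands above p is shown below, written in plain Rust for illustration. The name `add_any` is hypothetical, and this is not the library's implementation.

```rust
const P: u64 = 0xFFFF_FFFF_0000_0001; // 2^64 - 2^32 + 1
const EPSILON: u64 = 0xFFFF_FFFF; // 2^32 - 1, which equals 2^64 mod P

/// Adds two arbitrary 64-bit representatives and returns the canonical sum mod P.
fn add_any(a: u64, b: u64) -> u64 {
    let (sum, carry) = a.overflowing_add(b);
    // A carry means 2^64 was dropped; add its residue EPSILON back.
    let (mut sum, carry2) = sum.overflowing_add(if carry { EPSILON } else { 0 });
    if carry2 {
        // The correction itself wrapped; sum is now below 2^32, so this cannot overflow.
        sum += EPSILON;
    }
    if sum >= P { sum - P } else { sum }
}
```

With the operands from this report, the first addition carries, the correction carries once more, and the final value comes out as 0x1e0001ffc.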

pasta_curve parameters

I am wondering how I can obtain the pasta_curve parameters such as forward_roots_of_unity, inverse_roots_of_unity, and domain_size_inverse, since my work is based on the pasta curves.

Rust binding planned?

This is more of a question than an issue, but I ask it here to prevent double efforts.

Are there plans to create a Rust binding for sppark, similar to the binding blst has? Or is the plan that Rust users would always create their own custom crate similar to the PoCs or pasta-msm?

Documentation of CUDA MSM limitations

When I call mult_pippenger() from Rust with <= 512 bases I get a segfault: signal: 11, SIGSEGV: invalid memory reference. It would be helpful if such limitations were documented.

vec256 type declaration

Hi,
I am trying to build the NTT part of sppark as an independent C/C++ project. However, the declaration/definition of the vec256 type is nowhere to be found in the sppark project. I even searched the blst codebase (https://github.com/supranational/blst/tree/master) and could not find it there either.

Building sppark through cargo works without any problem. However, as mentioned, I am trying to build the NTT portion independently as a C/C++/CUDA project, and without the vec256 declaration I cannot build it successfully.

I would greatly appreciate your response.

sppark/ff/mont_t.cuh(721): error: identifier "i" is undefined

c++ -DSECTOR_SIZE_512MiB -g -O2 -Ideps/spdk-v22.09/include -Ideps/spdk-v22.09/isa-l/.. -Ideps/spdk-v22.09/dpdk/build/include -D__ADX__ -fno-omit-frame-pointer -Wall -Wextra -Wno-unused-parameter -Wno-missing-field-initializers -fno-strict-aliasing -march=native -Wformat -Wformat-security -D_GNU_SOURCE -fPIC -fstack-protector -fno-common -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -DSPDK_GIT_COMMIT=4be6d3043 -pthread -Ideps/spdk-v22.09/include -Ideps/spdk-v22.09/isa-l/.. -Ideps/spdk-v22.09/dpdk/build/include -Ideps/sppark -Ideps/sppark/util -Ideps/blst/src -c sealing/supra_seal.cpp -o obj/supra_seal.o -Wno-subobject-linkage
deps/sppark/ff/mont_t.cuh(721): error: identifier "i" is undefined
even[i] = even[i+1] = 0;
^

deps/sppark/ff/mont_t.cuh(729): error: expected a ">"
for (i = 0; i < n; i += 2) {
^

deps/sppark/ff/mont_t.cuh(735): error: expression must be a modifiable lvalue
: "+r"(odd[n+i-1]), "+r"(carry));
^

deps/sppark/ff/mont_t.cuh(735): error: an asm operand must have scalar type
: "+r"(odd[n+i-1]), "+r"(carry));
^

deps/sppark/ff/mont_t.cuh(742): error: expression must be a modifiable lvalue
: "+r"(even[n+i+1]), "+r"(carry));
^

deps/sppark/ff/mont_t.cuh(742): error: an asm operand must have scalar type
: "+r"(even[n+i+1]), "+r"(carry));
^

6 errors detected in the compilation of "pc2/cuda/pc2.cu".
deps/sppark/ff/mont_t.cuh(721): error: identifier "i" is undefined
even[i] = even[i+1] = 0;
^