marton78 / pffft Goto Github PK


A fork of Julien Pommier's Pretty Fast FFT (PFFFT) library, with several additions

License: Other

C 74.94% CMake 6.38% Shell 0.32% C++ 18.36%

cpp c fft pffft fft-library

pffft's Issues

Algorithm verification

Dear People

Thanks for all the effort; I like this project. I have built the project in VS2019 with CMake (C++11).
I updated some of the project's CMake files locally to make them work with Visual Studio (MSVC compiler):

target_optimizations.cmake:
line 62: set(TARGET_C_ARCH "none" CACHE STRING "msvc target C architecture (/arch): SSE2/AVX/AVX2/AVX512")
changed to: ADD_DEFINITIONS(/arch:AVX2)
The compiler accepts only one value: AVX, AVX2 or AVX512.
The same applies to line 109:
set(TARGET_CXX_ARCH "none" CACHE STRING "msvc target C++ architecture (/arch): SSE2/AVX/AVX2/AVX512")
ADD_DEFINITIONS(/arch:AVX2)

Peter

My code works without SIMD, but gives the wrong answer when enabled

Hi, thanks for a cool library.

I've been testing my code with TCC (Tiny C Compiler), which has no SIMD headers, so I disabled SIMD there. It's a YIN pitch detection algorithm, and I get a fairly accurate pitch when SIMD is disabled, but once I enable it in GCC and MSVC, the detected pitch is quite far from the non-SIMD pitch. No other code uses SIMD, and I've verified that the values computed in the function difference are... different with and without SIMD.

I'm using doubles, but it's the same with floats.

The result is supposed to be as close as possible to the frequency (3398) and without SIMD I get 3443.696594 while SIMD gets me 3793.724228 in both GCC and MSVC. All compilers return 3443.696594 without SIMD.

Can someone tell me what I'm doing wrong?

The code is attached below and is compiled like so:

tcc -Ilib lib/pffft/*.c main.c -o main.exe -D_USE_MATH_DEFINES -DPFFFT_SIMD_DISABLE -D__GNUC__ -D__MINGW32__

gcc -Ilib lib/pffft/*.c main.c -o main.exe -march=native -D_USE_MATH_DEFINES

cl main.c lib/pffft/*.c /Fe:main.exe /I lib

#include <stdio.h>
#include <stdlib.h>
#include <float.h>
#include <math.h>

#include "pffft/pffft_double.h"

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define YIN_THRESHOLD 0.20

void sinewave(double frequency, int samplerate, int size, double *output)
{
    int lut_size = size;
    double delta_phi = frequency * lut_size * 1.0 / samplerate;
    double phase = 0.0;
    double min = DBL_MAX;
    double max = -DBL_MAX;
    int *lut = malloc(lut_size * sizeof(int));

    for (int i = 0; i < lut_size; ++i)
        lut[i] = (int)roundf(0x7FFF * sin(2.0 * M_PI * i / lut_size));

    for (int i = 0; i < size; ++i)
    {
        int val = (double)lut[(int)phase];
        max = fmax(max, val);
        min = fmin(min, val);
        output[i] = val;
        phase += delta_phi;
        if (phase >= lut_size)
            phase -= lut_size;
    }

    free(lut);
}

void difference(double *audio_buffer, int audio_buffer_size, double *yin_buffer)
{
    int yin_buffer_size = audio_buffer_size / 2;
    PFFFTD_Setup *setup = pffftd_new_setup(audio_buffer_size, PFFFT_COMPLEX);
    double *data = pffftd_aligned_malloc(2 * audio_buffer_size * sizeof(double));
    double *power_terms = malloc(yin_buffer_size * sizeof(double));
    double *kernel = pffftd_aligned_malloc(2 * audio_buffer_size * sizeof(double));

    for (int j = 0; j < yin_buffer_size; ++j)
        power_terms[0] += audio_buffer[j] * audio_buffer[j];
    
    for (int tau = 1; tau < yin_buffer_size; ++tau)
        power_terms[tau] =
            power_terms[tau-1] - audio_buffer[tau-1] * audio_buffer[tau-1] +
            audio_buffer[tau+yin_buffer_size] * audio_buffer[tau+yin_buffer_size];

    for (int i = 0; i < audio_buffer_size; ++i)
    {
        data[2*i+0] = audio_buffer[i];
        data[2*i+1] = 0;
    }

    pffftd_transform(setup, data, data, 0, PFFFT_FORWARD);

    for (int j = 0; j < yin_buffer_size; ++j)
    {
        kernel[2*j+0] = audio_buffer[(audio_buffer_size / 2 - 1) - j];
        kernel[2*j+1] = 0;
        kernel[2*j+audio_buffer_size+0] = 0;
        kernel[2*j+audio_buffer_size+1] = 0;
    }
    
    pffftd_transform(setup, kernel, kernel, 0, PFFFT_FORWARD);
    
    for (int j = 0; j < audio_buffer_size; ++j)
    {
        data[2*j+0] = data[2*j+0] * kernel[2*j] - data[2*j+1] * kernel[2*j+1];
        data[2*j+1] = data[2*j+1] * kernel[2*j] + data[2*j+0] * kernel[2*j+1];
    }

    pffftd_transform(setup, data, data, 0, PFFFT_BACKWARD);

    for (int j = 0; j < yin_buffer_size; ++j)
        yin_buffer[j] =
            power_terms[0] + power_terms[j] - 2 * data[2 * (yin_buffer_size - 1 + j)];

    free(power_terms);
    pffftd_aligned_free(data);
    pffftd_aligned_free(kernel);
    pffftd_destroy_setup(setup);
}

void cumulative_mean_normalized_difference(double *yin_buffer, int yin_buffer_size)
{
    double running_sum = 0.0;

    yin_buffer[0] = 1;

    for (int tau = 1; tau < yin_buffer_size; tau++) {
        running_sum += yin_buffer[tau];
        yin_buffer[tau] *= tau / running_sum;
    }
}

int absolute_threshold(double *yin_buffer, int yin_buffer_size)
{
    int tau;

    for (tau = 2; tau < yin_buffer_size; tau++)
        if (yin_buffer[tau] < YIN_THRESHOLD)
        {
            while (tau + 1 < yin_buffer_size && yin_buffer[tau + 1] < yin_buffer[tau])
                tau++;
            break;
        }

    return (tau == yin_buffer_size || yin_buffer[tau] >= YIN_THRESHOLD) ? -1 : tau;
}

double parabolic_interpolation(int tau_estimate, double *yin_buffer, int yin_buffer_size)
{
    double better_tau;
    int x0;
    int x2;

    if (tau_estimate < 1)
        x0 = tau_estimate;
    else
        x0 = tau_estimate - 1;
    if (tau_estimate + 1 < yin_buffer_size)
        x2 = tau_estimate + 1;
    else
        x2 = tau_estimate;

    if (x0 == tau_estimate)
        if (yin_buffer[tau_estimate] <= yin_buffer[x2])
            better_tau = tau_estimate;
        else
            better_tau = x2;
    else if (x2 == tau_estimate)
        if (yin_buffer[tau_estimate] <= yin_buffer[x0])
            better_tau = tau_estimate;
        else
            better_tau = x0;
    else
    {
        double s0, s1, s2;
        s0 = yin_buffer[x0];
        s1 = yin_buffer[tau_estimate];
        s2 = yin_buffer[x2];
        better_tau = tau_estimate + (s2 - s0) / (2 * (2 * s1 - s2 - s0));
    }

    return better_tau;
}

double yin_pitch(double *audio_buffer, int audio_buffer_size, int samplerate)
{
    int yin_buffer_size = audio_buffer_size / 2;
    double *yin_buffer = malloc(yin_buffer_size * sizeof(double));
    int tau_estimate;
    double better_tau;

    difference(audio_buffer, audio_buffer_size, yin_buffer);
    cumulative_mean_normalized_difference(yin_buffer, yin_buffer_size);
    tau_estimate = absolute_threshold(yin_buffer, yin_buffer_size);
    better_tau = parabolic_interpolation(tau_estimate, yin_buffer, yin_buffer_size);

    free(yin_buffer);

    return samplerate / better_tau;
}

int main()
{
    int audio_buffer_size = 8192;
    double frequency = 3398.0;
    int samplerate = 48000;
    double *audio_buffer = malloc(audio_buffer_size * sizeof(double));
    double pitch;

    sinewave(frequency, samplerate, audio_buffer_size, audio_buffer);

    pitch = yin_pitch(audio_buffer, audio_buffer_size, samplerate);

    printf("result=%f\n", pitch);
    printf("Success.\n");

    free(audio_buffer);

    return 0;
}

(Code is influenced by https://github.com/JorenSix/TarsosDSP/blob/master/core/src/main/java/be/tarsos/dsp/pitch/FastYin.java and https://github.com/sevagh/pitch-detection/blob/master/src/yin.cpp)
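
A few things in the listing above look suspicious regardless of SIMD and may be worth ruling out first. `power_terms` comes from `malloc`, so `power_terms[0] += ...` accumulates onto an indeterminate value. The spectral product writes `data[2*j+0]` and then reads it again to form the imaginary part. In addition, as far as I can tell from the comments in pffft.h, the plain transform call leaves its output in an internal layout meant for the library's own convolution and backward transform (the ordered variant gives canonical interleaved bins), and transforms are unscaled, so a forward/backward round trip returns N times the input. That layout point could explain a difference between SIMD and scalar builds, but I have not verified it against this program. The sketch below covers only the first two points in plain C; `complex_multiply_inplace` and `sum_of_squares` are hypothetical helper names, not part of pffft.

```c
#include <stddef.h>

/* Hypothetical helper: a[k] *= b[k] for n interleaved complex values
 * (2*n doubles each, laid out as re0, im0, re1, im1, ...). */
void complex_multiply_inplace(double *a, const double *b, size_t n)
{
    for (size_t k = 0; k < n; ++k) {
        /* Read both parts of each operand before writing either result,
         * so the imaginary part is computed from the original real part. */
        double ar = a[2 * k], ai = a[2 * k + 1];
        double br = b[2 * k], bi = b[2 * k + 1];
        a[2 * k]     = ar * br - ai * bi;
        a[2 * k + 1] = ai * br + ar * bi;
    }
}

/* Hypothetical helper: sum of squares with an explicitly zeroed accumulator,
 * instead of accumulating into uninitialized heap memory. */
double sum_of_squares(const double *x, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i)
        acc += x[i] * x[i];
    return acc;
}
```

In the program above, this would correspond to `power_terms[0] = sum_of_squares(audio_buffer, yin_buffer_size);` and a single `complex_multiply_inplace(data, kernel, audio_buffer_size);` in place of the multiplication loop.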

have C examples - besides the C++ ones

dsp code taken from csdr without attribution

Looking over a lot of the DSP code it is very obviously taken from libcsdr without any attribution to the original project. There is even some mention of the original variable names in a comment here:

pffft/pf_mixer.h

Line 119 in e0bf595

* size must be multiple of CSDR_SHIFT_LIMITED_SIMD (= 4)

apple support

I have an experimental branch with support for Apple M1 and Raspberry Pi 400.
It is here: https://github.com/unevens/pffft/tree/m1
I'm not sure about the parts that I commented regarding "-mfpu=neon", which may need to be conditionally enabled on other platforms.

There is an SSE instruction in pf_neon_float.h.

/* reverse/flip all floats */
#  define VREV_S(a)    _mm_shuffle_ps(a, a, _MM_SHUFFLE(0,1,2,3))
/* reverse/flip complex floats */
#  define VREV_C(a)    _mm_shuffle_ps(a, a, _MM_SHUFFLE(1,0,3,2))

Perhaps the following is correct.

/* reverse/flip all floats */
#  define VREV_S(a)    vcombine_f32(vrev64_f32(vget_high_f32(a)), vrev64_f32(vget_low_f32(a)))
/* reverse/flip complex floats */
#  define VREV_C(a)    vextq_f32(a, a, 2)

I consulted the following site.

https://stackoverflow.com/questions/32536265/how-to-convert-mm-shuffle-ps-sse-intrinsic-to-neon-intrinsic
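
One way to sanity-check the proposed replacement without ARM hardware is to model the lane movement of each intrinsic on plain arrays and compare the results. The sketch below does that; every `model_*` function is a hypothetical scalar stand-in written for this check, following my understanding of the documented lane semantics of `_mm_shuffle_ps`, `vget_low_f32`, `vget_high_f32`, `vrev64_f32`, `vcombine_f32` and `vextq_f32`. It says nothing about performance.

```c
/* Scalar models of 4-lane and 2-lane float vectors. */
typedef struct { float lane[4]; } vec4;
typedef struct { float lane[2]; } vec2;

/* Same bit layout as the SSE _MM_SHUFFLE(z, y, x, w) macro. */
#define MODEL_SHUFFLE(z, y, x, w) (((z) << 6) | ((y) << 4) | ((x) << 2) | (w))

/* Models _mm_shuffle_ps(a, b, imm): low two lanes picked from a, high two from b. */
vec4 model_shuffle_ps(vec4 a, vec4 b, unsigned imm)
{
    vec4 r;
    r.lane[0] = a.lane[imm & 3];
    r.lane[1] = a.lane[(imm >> 2) & 3];
    r.lane[2] = b.lane[(imm >> 4) & 3];
    r.lane[3] = b.lane[(imm >> 6) & 3];
    return r;
}

/* Models of the NEON intrinsics used in the proposed macros. */
vec2 model_vget_low(vec4 a)  { vec2 r = {{ a.lane[0], a.lane[1] }}; return r; }
vec2 model_vget_high(vec4 a) { vec2 r = {{ a.lane[2], a.lane[3] }}; return r; }
vec2 model_vrev64(vec2 a)    { vec2 r = {{ a.lane[1], a.lane[0] }}; return r; }

vec4 model_vcombine(vec2 lo, vec2 hi)
{
    vec4 r = {{ lo.lane[0], lo.lane[1], hi.lane[0], hi.lane[1] }};
    return r;
}

/* Models vextq_f32(a, b, n): lanes n..3 of a followed by lanes 0..n-1 of b. */
vec4 model_vextq(vec4 a, vec4 b, int n)
{
    vec4 r;
    for (int i = 0; i < 4; ++i)
        r.lane[i] = (i + n < 4) ? a.lane[i + n] : b.lane[i + n - 4];
    return r;
}

/* Returns 1 when all four lanes match exactly. */
int same_lanes(vec4 a, vec4 b)
{
    for (int i = 0; i < 4; ++i)
        if (a.lane[i] != b.lane[i])
            return 0;
    return 1;
}
```

With lanes {0, 1, 2, 3} as input, the model of the SSE `VREV_S` and the proposed NEON `VREV_S` both give {3, 2, 1, 0}, and both `VREV_C` variants give {2, 3, 0, 1}, so under these assumptions the suggested macros move lanes the same way.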

Building on M1 computer failed: CMake warning: unsupported CMAKE_SYSTEM_PROCESSOR 'arm64'

When I run this command: CC=/usr/bin/clang CXX=/usr/bin/clang++ cmake -DCMAKE_BUILD_TYPE=Debug ../
I get the following warning:

CMake Warning at cmake/target_optimizations.cmake:57 (message):
  unsupported CMAKE_SYSTEM_PROCESSOR 'arm64'

Does that mean the arm64 architecture is not supported by the makefile? I have zero experience with modifying makefiles, so what changes are necessary to make it build for arm64? Thank you!

clang-cl (windows) confused by std::complex

Hi,

clang-cl (Windows) is confused by std::complex<>. You need to add :: in front to fix the issue:

@@ -492,8 +492,8 @@
 template<>
-class Setup< std::complex<float> >
+class Setup< ::std::complex<float> >
 {
   PFFFT_Setup* self;
 
 public:
@@ -496,8 +496,8 @@
 {
   PFFFT_Setup* self;
 
 public:
-  typedef std::complex<float> value_type;
+  typedef ::std::complex<float> value_type;
   typedef Types< value_type >::Scalar Scalar;
 
   Setup()

Anonymous namespace causing warning

I'm getting the following warning (or error in this case, as I have warnings as errors enabled) while compiling pffft with GCC on ARM64, with strict warnings enabled:

[build] /home/pi/convolution-thing/source/../externals/pffft/pffft.hpp: In instantiation of ‘class pffft::Fft<float>’:
[build] /home/pi/convolution-thing/source/./math/FFTpffft.h:25:20:   required from here
[build] /home/pi/convolution-thing/source/../externals/pffft/pffft.hpp:124:7: error: ‘pffft::Fft<float>’ has a field ‘pffft::Fft<float>::setup’ whose type uses the anonymous namespace [-Werror=subobject-linkage]

I looked into the issue, and it looks like the problem is that each source file including the header gets its own copy of the anonymous namespace, which means the type has a different definition in different compile units. I was unable to fix the issue myself as I don't have a good grasp of the structure of pffft, so I can't attach a pull request either.

Here's some discussion on the issue:
https://stackoverflow.com/questions/37722850/how-to-silence-whose-type-uses-the-anonymous-namespace-werror-gcc-version-4

benchmarking with cmake

The idea is to get the benchmarks running with MSVC on Windows too, without the shell script bench_all.sh.

Please have a look at https://github.com/hayguen/pffft/tree/bench_with_cmake
It is a work in progress, but perhaps there are already comments on how to improve it?

Missing September 2016 commit

So something really weird is going on... First, thanks for hosting a mirror of sorts.

A piece of software called VCVRack requires the pffft library.
I've got a GitHub Action which builds that software from source for a few versions across all three major OSes.
It ran successfully in 2020, fetching https://bitbucket.org/jpommier/pffft/get/29e4f76ac53b.zip, which no longer seems to exist. The files in that archive are from September 22, 2016, which is not a commit I can see on Bitbucket or here.

Why does it seem like the original author and this repo have missing commits?

Question about real forward transform output size

Hi, I tried to use pffft in my project and found that the output size differs between pffft and FFTW.
For an N-sample real forward FFT, the pffft output has length N, with real and imaginary parts interleaved, while in FFTW the output is an fftw_complex array that holds N real and N imaginary samples.
Could anyone explain the difference?
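
For a real input of length N the spectrum is conjugate-symmetric, so only bins 0 to N/2 are independent: N/2+1 complex values, of which bins 0 and N/2 have a zero imaginary part. That is exactly N real numbers of information. FFTW's real-to-complex interface returns those N/2+1 complex bins, while pffft.h, as I read its comments, describes the ordered real output as N floats in which the real parts of bin 0 and bin N/2 share the first complex slot. The sketch below converts between the two layouts under that assumption; `pack_half_spectrum` and `unpack_half_spectrum` are hypothetical names, and the packing convention should be checked against the header before relying on it.

```c
/* bins:   n/2+1 complex values, interleaved (re, im), FFTW r2c style.
 * packed: n floats; slot 0 holds Re X[0], slot 1 holds Re X[n/2]
 *         (assumed pffft-style packing), then (re, im) for bins 1..n/2-1.
 * n must be even and >= 2. */
void pack_half_spectrum(const float *bins, int n, float *packed)
{
    packed[0] = bins[0];             /* Re X[0]; Im X[0] is zero for real input   */
    packed[1] = bins[2 * (n / 2)];   /* Re X[n/2]; Im X[n/2] is zero as well      */
    for (int k = 1; k < n / 2; ++k) {
        packed[2 * k]     = bins[2 * k];
        packed[2 * k + 1] = bins[2 * k + 1];
    }
}

/* Inverse of pack_half_spectrum: restores the n/2+1 interleaved complex bins. */
void unpack_half_spectrum(const float *packed, int n, float *bins)
{
    bins[0] = packed[0];
    bins[1] = 0.0f;
    for (int k = 1; k < n / 2; ++k) {
        bins[2 * k]     = packed[2 * k];
        bins[2 * k + 1] = packed[2 * k + 1];
    }
    bins[2 * (n / 2)]     = packed[1];
    bins[2 * (n / 2) + 1] = 0.0f;
}
```

For example, the real signal {1, 2, 3, 4} has the bins X[0] = 10, X[1] = -2 + 2i and X[2] = -2, which this packing stores as the four floats {10, -2, -2, 2}.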

benchmark/compare non-power-of-2 transforms against next-power-of-2 FFT sizes

Where is the pffft_nearest_transform_size and pffft_is_valid_size implementation?

Note that I only looked at the code. I'm planning to use the original pffft and wanted to pick these functions from this version (without the extra baggage). I searched the code but couldn't find an implementation for these functions.

double precision support with avx instructions

I've a fork of this library in which I added support for double precision floating point numbers using AVX instructions.
I'd be happy to send you a pull request if you are interested.

Build fails on powerpc64le

LLVM 11.0.1 on FreeBSD 13.0-RELEASE

/usr/bin/cc -DPFFFT_EXPORTS -DPFFFT_SCALVEC_ENABLED=1 -D_USE_MATH_DEFINES  -O2 -pipe  -fstack-protector-strong -fno-strict-aliasing -O2 -pipe  -fstack-protector-strong -fno-strict-aliasing -fPIC -std=c99 -MD -MT CMakeFiles/PFFFT.dir/pffft.c.o -MF CMakeFiles/PFFFT.dir/pffft.c.o.d -o CMakeFiles/PFFFT.dir/pffft.c.o -c /wrkdirs/usr/ports/math/pffft/work/pffft-9603871/pffft.c
In file included from /wrkdirs/usr/ports/math/pffft/work/pffft-9603871/pffft.c:98:
In file included from /wrkdirs/usr/ports/math/pffft/work/pffft-9603871/simd/pf_float.h:64:
/wrkdirs/usr/ports/math/pffft/work/pffft-9603871/simd/pf_altivec_float.h:41:9: warning: /wrkdirs/usr/ports/math/pffft/work/pffft-9603871/simd/pf_altivec_float.h: ALTIVEC float macros are defined [-W#pragma-messages]
#pragma message( __FILE__ ": ALTIVEC float macros are defined" )
        ^
In file included from /wrkdirs/usr/ports/math/pffft/work/pffft-9603871/pffft.c:132:
/wrkdirs/usr/ports/math/pffft/work/pffft-9603871/pffft_priv_impl.h:1937:15: warning: implicit declaration of function 'VLOAD_ALIGNED' is invalid in C99 [-Wimplicit-function-declaration]
        C.v = VLOAD_ALIGNED( ptr );
              ^
/wrkdirs/usr/ports/math/pffft/work/pffft-9603871/pffft_priv_impl.h:1937:13: error: assigning to 'v4sf' (vector of 4 'float' values) from incompatible type 'int'
        C.v = VLOAD_ALIGNED( ptr );
            ^ ~~~~~~~~~~~~~~~~~~~~
/wrkdirs/usr/ports/math/pffft/work/pffft-9603871/pffft_priv_impl.h:1943:15: warning: implicit declaration of function 'VLOAD_UNALIGNED' is invalid in C99 [-Wimplicit-function-declaration]
        C.v = VLOAD_UNALIGNED( ptr );
              ^
/wrkdirs/usr/ports/math/pffft/work/pffft-9603871/pffft_priv_impl.h:1943:13: error: assigning to 'v4sf' (vector of 4 'float' values) from incompatible type 'int'
        C.v = VLOAD_UNALIGNED( ptr );
            ^ ~~~~~~~~~~~~~~~~~~~~~~
/wrkdirs/usr/ports/math/pffft/work/pffft-9603871/pffft_priv_impl.h:2186:11: warning: implicit declaration of function 'VREV_S' is invalid in C99 [-Wimplicit-function-declaration]
    C.v = VREV_S(A.v);
          ^
/wrkdirs/usr/ports/math/pffft/work/pffft-9603871/pffft_priv_impl.h:2186:9: error: assigning to 'v4sf' (vector of 4 'float' values) from incompatible type 'int'
    C.v = VREV_S(A.v);
        ^ ~~~~~~~~~~~
/wrkdirs/usr/ports/math/pffft/work/pffft-9603871/pffft_priv_impl.h:2206:11: warning: implicit declaration of function 'VREV_C' is invalid in C99 [-Wimplicit-function-declaration]
    C.v = VREV_C(A.v);
          ^
/wrkdirs/usr/ports/math/pffft/work/pffft-9603871/pffft_priv_impl.h:2206:9: error: assigning to 'v4sf' (vector of 4 'float' values) from incompatible type 'int'
    C.v = VREV_C(A.v);
        ^ ~~~~~~~~~~~
5 warnings and 4 errors generated.

confusion about memory alignment.

Hello, I have some questions about this part of the code related to memory alignment.

static void * Valigned_malloc(size_t nb_bytes) {
  void *p, *p0 = malloc(nb_bytes + MALLOC_V4SF_ALIGNMENT);
  if (!p0) return (void *) 0;
  p = (void *) (((size_t) p0 + MALLOC_V4SF_ALIGNMENT) & (~((size_t) (MALLOC_V4SF_ALIGNMENT-1))));
  *((void **) p - 1) = p0;
  return p;
}

If malloc returned an address ending in ...63 for p0, the aligned address p would end in ...64, and the write through *((void **) p - 1) would fall outside the block allocated at p0.
This is my understanding; is it correct?
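
The arithmetic in the question is right, but the premise may not occur in practice: `malloc` returns storage suitably aligned for any fundamental type, so `p0` is at least pointer-aligned and cannot end in an odd offset such as 63. With `p0` a multiple of 8 on a typical 64-bit platform, the gap `p - p0` is `64 - (p0 % 64)`, which lies between 8 and 64 bytes, so the stored pointer always fits in front of `p`. The sketch below reproduces the scheme with an assumed alignment of 64 and exposes that gap so it can be inspected; the `sketch_*` names are hypothetical, not the library's.

```c
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_ALIGNMENT 64  /* assumed value of MALLOC_V4SF_ALIGNMENT */

/* Over-allocate, round up to the next multiple of the alignment, and stash
 * the original pointer in the slot just before the aligned address. */
void *sketch_aligned_malloc(size_t nb_bytes)
{
    void *p0 = malloc(nb_bytes + SKETCH_ALIGNMENT);
    if (!p0)
        return NULL;
    uintptr_t aligned = ((uintptr_t)p0 + SKETCH_ALIGNMENT)
                        & ~(uintptr_t)(SKETCH_ALIGNMENT - 1);
    void *p = (void *)aligned;
    ((void **)p)[-1] = p0;  /* safe as long as p - p0 >= sizeof(void *) */
    return p;
}

/* Number of bytes between the malloc'ed block start and the aligned pointer. */
size_t sketch_header_gap(const void *p)
{
    const void *p0 = ((void *const *)p)[-1];
    return (size_t)((uintptr_t)p - (uintptr_t)p0);
}

/* Release a block obtained from sketch_aligned_malloc. */
void sketch_aligned_free(void *p)
{
    if (p)
        free(((void **)p)[-1]);
}
```

On a platform where `malloc` only guaranteed an alignment smaller than `sizeof(void *)`, the concern in the question would be valid, so the scheme relies on that guarantee.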

Support for 2D?

Thanks for this great library! Is there a plan to support also 2D transforms? It would be quite helpful.

github's about text

GitHub's main page for https://github.com/marton78/pffft shows the following About text in the upper right corner:

A GitHub mirror of Julien Pommier's PFFFT: a pretty fast FFT.

Given the many changes and additions, I would suggest changing it to something like

A fork of pretty fast FFT (PFFFT) with several additions

ARM compiler options are wrong

16:37:39 [ 8%] Building C object PFFFT/CMakeFiles/PFFFT.dir/pffft_double.c.o
16:37:39 arm-buildroot-linux-gnueabihf-gcc.br_real: error: unrecognized command-line option '-msse2'

Starting from line 170 in CMakeLists.txt there is no check for arm platforms, like this:

elseif(CMAKE_COMPILER_IS_GNUCC AND NOT USE_SIMD_NEON)

Comparison with KissFFT

Hello, I just discovered this library thanks to the GitHub Explore feature.
It reminds me of https://github.com/mborgerding/kissfft, which was developed with the same approach: simple code and good performance. Have you ever tried it and compared some benchmarks?

Automatic Build with Travis

It looks like Travis supports more platforms than GitHub Actions.
In particular, ppc64le and arm64 sound interesting; they are listed at https://docs.travis-ci.com/user/multi-cpu-architectures/.
ppc64le is necessary for issue #55 anyway.

Prime decomposition

It seems that this implementation does not support decomposing N into primes > 5, which was initially supported by FFTPACK.
The ifac variable in the decompose output is incorrect and only contains primes <= 5.
Example: N=55
Legacy FFTPACK can decompose 55 as ifac={5,11}.
PFFFT's decompose reports ifac={5}.
No return code or assertion reports this limitation, and it is only documented in the pffft.h file. Could you add the limitation to the README?
Or add Bluestein support?
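
Until the library reports this itself, a caller can check a size before creating a setup. The sketch below divides out the factors 2, 3 and 5 and looks at what is left, which is the N = 55 case from this report. All three function names are hypothetical, and the check is deliberately incomplete: pffft.h also documents a minimum power of two in N, which this sketch ignores.

```c
/* Divide out every factor of 2, 3 and 5 and return what remains.
 * A result of 1 means n has no prime factor larger than 5.
 * Returns 0 for n < 1, which is never a valid size. */
int leftover_after_235(int n)
{
    static const int primes[] = {2, 3, 5};
    if (n < 1)
        return 0;
    for (int i = 0; i < 3; ++i)
        while (n % primes[i] == 0)
            n /= primes[i];
    return n;
}

/* 1 if n factors into 2s, 3s and 5s only, else 0. */
int is_235_smooth(int n)
{
    return leftover_after_235(n) == 1;
}

/* Smallest size >= n with only the factors 2, 3 and 5.
 * Linear search; assumes n is far below INT_MAX. */
int next_235_smooth(int n)
{
    if (n < 1)
        n = 1;
    while (!is_235_smooth(n))
        ++n;
    return n;
}
```

For N = 55 the leftover is 11, so the size is rejected, and the next candidate found by the search is 60. Zero-padding to such a size changes the transform, so whether that is acceptable depends on the application.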

add OOURA FFT to benchmarks?

The performance of the OOURA FFT, as commented in
avaneev/r8brain-free-src#3 (comment),
sounds quite good.

Should we add it to the benchmarks?

Maybe give another name to the library?

Hi,

It seems that your project diverged from the original and your changes didn't make it back into the main.
At this point would it make sense to rename your library to avoid confusion?

Cheers,
Alex

ZCONVOLVE_USING_INLINE_NEON_ASM is bugged

There are two bugs involving ZCONVOLVE_USING_INLINE_NEON_ASM.

First of all, there's a typo which results in the hand-written assembler version never getting used. See

# ifndef __clang__
#   define ZCONVOLVE_USING_INLINE_NEON_ASM
# endif

#ifdef ZCONVOLVE_USING_INLINE_ASM

However, if that's fixed, we get a lot of complaints from GCC:

[build] /tmp/ccN5565L.s: Assembler messages:
[build] /tmp/ccN5565L.s:5723: Error: operand 1 must be an integer register -- `mov r8,x7'
[build] /tmp/ccN5565L.s:5724: Error: unknown mnemonic `vdup.f32' -- `vdup.f32 q15,x1'
[build] /tmp/ccN5565L.s:5726: Error: unknown mnemonic `pld' -- `pld [x5,#64]'
[build] /tmp/ccN5565L.s:5727: Error: unknown mnemonic `pld' -- `pld [x6,#64]'
[build] /tmp/ccN5565L.s:5728: Error: unknown mnemonic `pld' -- `pld [x7,#64]'
[build] /tmp/ccN5565L.s:5729: Error: unknown mnemonic `pld' -- `pld [x5,#96]'
[build] /tmp/ccN5565L.s:5730: Error: unknown mnemonic `pld' -- `pld [x6,#96]'
[build] /tmp/ccN5565L.s:5731: Error: unknown mnemonic `pld' -- `pld [x7,#96]'
[build] /tmp/ccN5565L.s:5732: Error: unknown mnemonic `vld1.f32' -- `vld1.f32 {q0,q1},[x5,:128]!'
[build] /tmp/ccN5565L.s:5733: Error: unknown mnemonic `vld1.f32' -- `vld1.f32 {q4,q5},[x6,:128]!'
[build] /tmp/ccN5565L.s:5734: Error: unknown mnemonic `vld1.f32' -- `vld1.f32 {q2,q3},[x5,:128]!'
[build] /tmp/ccN5565L.s:5735: Error: unknown mnemonic `vld1.f32' -- `vld1.f32 {q6,q7},[x6,:128]!'
[build] /tmp/ccN5565L.s:5736: Error: unknown mnemonic `vld1.f32' -- `vld1.f32 {q8,q9},[r8,:128]!'
[build] /tmp/ccN5565L.s:5737: Error: unknown mnemonic `vmul.f32' -- `vmul.f32 q10,q0,q4'
[build] /tmp/ccN5565L.s:5738: Error: unknown mnemonic `vmul.f32' -- `vmul.f32 q11,q0,q5'
[build] /tmp/ccN5565L.s:5739: Error: unknown mnemonic `vmul.f32' -- `vmul.f32 q12,q2,q6'
[build] /tmp/ccN5565L.s:5740: Error: unknown mnemonic `vmul.f32' -- `vmul.f32 q13,q2,q7'
[build] /tmp/ccN5565L.s:5741: Error: unknown mnemonic `vmls.f32' -- `vmls.f32 q10,q1,q5'
[build] /tmp/ccN5565L.s:5742: Error: unknown mnemonic `vmla.f32' -- `vmla.f32 q11,q1,q4'
[build] /tmp/ccN5565L.s:5743: Error: unknown mnemonic `vld1.f32' -- `vld1.f32 {q0,q1},[r8,:128]!'
[build] /tmp/ccN5565L.s:5744: Error: unknown mnemonic `vmls.f32' -- `vmls.f32 q12,q3,q7'
[build] /tmp/ccN5565L.s:5745: Error: unknown mnemonic `vmla.f32' -- `vmla.f32 q13,q3,q6'
[build] /tmp/ccN5565L.s:5746: Error: unknown mnemonic `vmla.f32' -- `vmla.f32 q8,q10,q15'
[build] /tmp/ccN5565L.s:5747: Error: unknown mnemonic `vmla.f32' -- `vmla.f32 q9,q11,q15'
[build] /tmp/ccN5565L.s:5748: Error: unknown mnemonic `vmla.f32' -- `vmla.f32 q0,q12,q15'
[build] /tmp/ccN5565L.s:5749: Error: unknown mnemonic `vmla.f32' -- `vmla.f32 q1,q13,q15'
[build] /tmp/ccN5565L.s:5750: Error: unknown mnemonic `vst1.f32' -- `vst1.f32 {q8,q9},[x7,:128]!'
[build] /tmp/ccN5565L.s:5751: Error: unknown mnemonic `vst1.f32' -- `vst1.f32 {q0,q1},[x7,:128]!'
[build] /tmp/ccN5565L.s:5752: Error: operand 2 must be an integer or stack pointer register -- `subs x4,#2'

This is on a Raspberry Pi Compute Module 4 with the beta 64-bit operating system and GCC 8.3.0:
Linux convolutionpi 5.10.17-v8+ #1414 SMP PREEMPT Fri Apr 30 13:23:25 BST 2021 aarch64 GNU/Linux
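
The mnemonics in the log (`vld1.f32`, `vmla.f32`, `q` registers) are 32-bit ARM NEON syntax, which an AArch64 assembler does not accept, so fixing the macro name alone would enable code that cannot assemble on this target. One possible guard, sketched below, enables the hand-written path only on 32-bit ARM with a non-Clang compiler. It uses the compiler-predefined macros `__arm__`, `__aarch64__` and `__clang__`; the helper `zconvolve_uses_inline_asm` is hypothetical and exists only to make the outcome of the guard visible.

```c
/* Enable the inline NEON assembly only where its ARMv7 syntax is valid:
 * 32-bit ARM, and not Clang (matching the original exclusion). */
#if defined(__arm__) && !defined(__aarch64__) && !defined(__clang__)
#  define ZCONVOLVE_USING_INLINE_NEON_ASM
#endif

/* Hypothetical probe: 1 if the inline-assembly path would be compiled in. */
int zconvolve_uses_inline_asm(void)
{
#ifdef ZCONVOLVE_USING_INLINE_NEON_ASM   /* same spelling as the #define */
    return 1;
#else
    return 0;
#endif
}
```

On the aarch64 system described here the probe returns 0, so the intrinsics-based fallback would be used. Whether the assembly path is worth keeping for 32-bit targets is a separate question.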

Support for unaligned arrays

Do I understand correctly that the library only supports aligned arrays as input (and output)? Is there a way to make it work with a non-aligned array?

Background: I'm trying to wrap the library in Java Native Interface, but it keeps crashing with a segfault at a vmovapd instruction. Inspecting the register dump, I can see that the library is indeed doing an aligned load into an AVX register from an address that's only aligned to 16 bytes. Java's double arrays are normally aligned to 32 bytes, but they also have a 16-byte header.
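
If the library cannot be made to accept unaligned pointers, one workaround is a bounce buffer: copy the caller's data into storage with the required alignment, transform there, and copy the result back. The sketch below shows only the copy-in step using C11 `aligned_alloc`, which is not available on every toolchain (MSVC, for one, does not provide it); with pffft itself, the library's own aligned allocation function would be the natural choice. `is_aligned` and `copy_to_aligned` are hypothetical helper names, and the 32-byte figure is taken from the AVX case described in this report.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* 1 if p is a multiple of alignment (a power of two), else 0. */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* Copy count doubles into a freshly allocated buffer with the requested
 * alignment. Returns NULL on allocation failure; release with free(). */
double *copy_to_aligned(const double *src, size_t count, size_t alignment)
{
    size_t bytes = count * sizeof(double);
    /* aligned_alloc requires the size to be a multiple of the alignment. */
    size_t rounded = (bytes + alignment - 1) / alignment * alignment;
    if (rounded == 0)
        rounded = alignment;
    double *dst = aligned_alloc(alignment, rounded);
    if (dst)
        memcpy(dst, src, bytes);
    return dst;
}
```

The extra copy costs O(N) per transform, which is usually small next to the FFT itself, and a JNI wrapper could allocate the aligned buffers once and reuse them across calls.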

add 'convolution' and/or 'fast-convolution' to topics list below About

@marton78: it looks like I don't have the rights to do so.

crashes on macOS in threads with buffer size > 64k

not sure if you want to deal with issues since this is technically a mirror, but you are ahead of the bitbucket repo for fixes ...

I've managed to find an issue where pffft crashes with an address boundary error when pffft is used in a thread with a buffer size > 64k:

#include <pthread.h>
#include <stdio.h>

#include "pffft.h"

void *thread_test(void *f) {
  PFFFT_Setup* setup = pffft_new_setup(65536, PFFFT_REAL);

  float *in = pffft_aligned_malloc(sizeof(float) * 65536 * 2);
  float *out = pffft_aligned_malloc(sizeof(float) * 65536 * 2);
  pffft_transform_ordered(setup, in, out, NULL, PFFFT_FORWARD);
  pffft_transform_ordered(setup, in, out, NULL, PFFFT_BACKWARD);

  pffft_destroy_setup(setup);

  return NULL;
}

int main() {
  fprintf(stderr, "does not crash\n");
  thread_test(NULL);
  fprintf(stderr, "does crash\n");
  pthread_t thread_id;
  pthread_create(&thread_id, NULL, thread_test, NULL);

  pthread_join(thread_id, NULL);

  return 0;
}

It seems to work on Linux.

$ clang --version
Apple LLVM version 10.0.1 (clang-1001.0.46.4)
Target: x86_64-apple-darwin18.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
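
A possible explanation, offered as a guess rather than a diagnosis: the comments in pffft.h say that when the `work` argument is NULL the scratch buffer is taken from the stack, which they recommend only for small transforms, and secondary threads on macOS get a much smaller default stack (512 KiB, per Apple's threading documentation) than the main thread. A 65536-float scratch area is already 256 KiB. Passing a heap-allocated `work` buffer of the size the header asks for would avoid the stack entirely. The alternative is to give the thread a larger stack, which the sketch below shows without pffft; `worker`, `run_with_stack` and the 256 KiB local array are stand-ins chosen for illustration.

```c
#include <pthread.h>
#include <stddef.h>

#define SCRATCH_FLOATS 65536  /* 256 KiB of floats, standing in for FFT scratch */

/* Uses a large local array, as a stack-based scratch buffer would. */
static void *worker(void *arg)
{
    volatile float scratch[SCRATCH_FLOATS];
    for (size_t i = 0; i < SCRATCH_FLOATS; ++i)
        scratch[i] = (float)i;
    *(float *)arg = scratch[SCRATCH_FLOATS - 1];
    return NULL;
}

/* Run worker on a thread with an explicit stack size.
 * Returns 0 on success, -1 on any pthread error. */
int run_with_stack(size_t stack_bytes, float *result)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc;

    if (pthread_attr_init(&attr) != 0)
        return -1;
    if (pthread_attr_setstacksize(&attr, stack_bytes) != 0) {
        pthread_attr_destroy(&attr);
        return -1;
    }
    rc = pthread_create(&tid, &attr, worker, result);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return -1;
    return pthread_join(tid, NULL) == 0 ? 0 : -1;
}
```

Calling `run_with_stack(4u << 20, &value)` requests a 4 MiB stack, which leaves ample room for the 256 KiB array. If the crash in the report disappears with a larger stack or an explicit work buffer, that would support the stack-size explanation.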
