mpip / pfft
Parallel fast Fourier transforms
License: GNU General Public License v3.0
Is it possible to do 2D transforms on a 2D procmesh?
I tried this and ran into a divide-by-zero error; the stack trace is below. There are test cases in the source code doing 3D transforms on a 3D procmesh, so I imagine it may not be very difficult to extend the library to support 2D on 2D?
#0 0x00007fffe2bc744a in pfft_num_blocks (global_block_size=0,
global_array_size=0) at ../../pfft-1.0.8-alpha2-fftw3/kernel/block.c:84
#1 pfft_local_block_offset (which_block=0, global_block_size=0,
global_array_size=0) at ../../pfft-1.0.8-alpha2-fftw3/kernel/block.c:58
#2 pfft_local_block_size_and_offset (global_array_size=0,
global_block_size=0, which_block=0,
local_block_size=local_block_size@entry=0x108f230,
local_block_start=local_block_start@entry=0x108f250)
at ../../pfft-1.0.8-alpha2-fftw3/kernel/block.c:37
#3 0x00007fffe2bbdc98 in pfft_decompose_1d (local_n_start=0x108f250,
local_n=0x108f230, which_block=<optimized out>,
block_size=<optimized out>, pn=<optimized out>)
at ../../pfft-1.0.8-alpha2-fftw3/util/util.c:58
#4 pfft_decompose (pn=<optimized out>, block=<optimized out>,
rnk_pm=<optimized out>, coords_pm=<optimized out>,
local_n=<optimized out>, local_start=<optimized out>)
at ../../pfft-1.0.8-alpha2-fftw3/util/util.c:50
#5 0x00007fffe2bd4f33 in decompose_nontransposed (
local_start=<optimized out>, local_n=<optimized out>,
trafo_flag=<optimized out>, coords_pm=<optimized out>,
rnk_pm=<optimized out>, blk=<optimized out>, n=<optimized out>,
rnk_n=<optimized out>)
at ../../pfft-1.0.8-alpha2-fftw3/kernel/partrafo-transposed.c:381
#6 local_size_transposed (rnk_n=2, ni=<optimized out>, no=0x10db6f0, iblock=0x10db7f0, oblock=0x10db810, rnk_pm=2, coords_pm=0x10db850, trafo_flag=8194,
transp_flag=2, local_ni=0x108f210, local_i_start=0x108f230, local_no=0x108f220, local_o_start=0x108f240)
at ../../pfft-1.0.8-alpha2-fftw3/kernel/partrafo-transposed.c:358
#7 0x00007fffe2bd54cf in pfft_local_size_partrafo_transposed (rnk_n=rnk_n@entry=2, n=n@entry=0x10db630, ni=ni@entry=0x109ccc0, no=no@entry=0x10db6f0,
howmany=howmany@entry=1, iblock=iblock@entry=0x10db7f0, oblock=0x10db810, rnk_pm=2, comms_pm=0x109cc60, transp_flag=2, trafo_flags=0x10db770,
local_ni=0x108f210, local_i_start=0x108f230, local_no=0x108f220, local_o_start=0x108f240) at ../../pfft-1.0.8-alpha2-fftw3/kernel/partrafo-transposed.c:108
#8 0x00007fffe2bc9906 in pfft_local_size_partrafo (rnk_n=2, n=0x10af590, ni=0x10af590, no=0x10af590, howmany=howmany@entry=1,
iblock_user=iblock_user@entry=0x0, oblock_user=0x0, comm=-2080374780, trafo_flag_user=8194, pfft_flags=2050, local_ni=0x108f210, local_i_start=0x108f230,
local_no=<optimized out>, local_o_start=0x108f240) at ../../pfft-1.0.8-alpha2-fftw3/kernel/partrafo.c:261
#9 0x00007fffe2bd1551 in pfft_local_size_many_dft_r2c (rnk_n=<optimized out>, n=<optimized out>, ni=<optimized out>, no=<optimized out>,
howmany=howmany@entry=1, iblock=iblock@entry=0x0, oblock=0x0, comm_cart=-2080374780, pfft_flags=2050, local_ni=0x108f210, local_i_start=0x108f230,
local_no=0x108f220, local_o_start=0x108f240) at ../../pfft-1.0.8-alpha2-fftw3/api/api-adv.c:54
#10 0x00007fffe2bc3008 in pfft_local_size_dft_r2c (rnk_n=<optimized out>, n=<optimized out>, comm_cart=<optimized out>, pfft_flags=<optimized out>,
local_ni=<optimized out>, local_i_start=<optimized out>, local_no=0x108f220, local_o_start=0x108f240) at ../../pfft-1.0.8-alpha2-fftw3/api/api-basic.c:910
The documentation seems to suggest that in-place r2c/c2r is possible without PFFT_PADDED_R2C.
I don't think it is possible, or is it?
Hi,
I have been testing the scaling of PFFT on our cluster (a few thousand Broadwell nodes with 28 cores each). Although the scaling for my problem (a 3D grid of 128^3, quite a small grid indeed!) is satisfying, its overall performance is poor compared to a code such as fftwpp (from Bowman's group), which uses a 2D data decomposition. Up to 64 CPUs, I find that PFFT is consistently 10 times slower than fftwpp. I am not surprised that a 3D data decomposition loses some performance to a 2D one because of the extra communication (as explained in the original paper), but the size of the loss baffles me, and I frankly think I might be doing something wrong in compiling PFFT or in linking it to the system fftw3-mpi. Could you give me some clue on this?
Thank you in advance
Max.
I have been trying to compile PFFT with the MKL version of FFTW, which is supposed to be better optimised than an FFTW3 compiled by me. The compilation fails with
"""
checking for fftw_mpi_init in -lfftw3_mpi... no
configure: error: You do not seem to have the MPI part of the FFTW-3.3 library installed.
"""
I tried to tweak the configure script to change the library name (configure line 18201)
fftw3_mpi_LIBS="-lmkl_core -lmkl_intel_lp64 -lmkl_sequential -lpthread -lm"
but this didn't work.
Is there something I should be doing that I'm not?
It is due to line 953 in api-basic.c:
complex_conjugate(conj_in, conj_out, ths->rnk_n, ths->local_ni);
This line overwrites the input with the output.
This wasn't caught by the test case due to pull request #3
The Fortran interface to pfft_plan_with_nthreads is missing. I use this:
interface
subroutine pfft_plan_with_nthreads(nthreads) bind(C, name="pfft_plan_with_nthreads")
import
integer(C_INT), value :: nthreads
end subroutine
end interface
Hi, you have a broken symlink:
$ find -xtype l
./doc/code/manual_min_c2c.c
Looks like it has a whole history:
$ git log -- $(find -xtype l)
commit 5b625ef4454eaafe4df38b8ead339de5e6b4d6f5
Author: Michael Pippig <[email protected]>
Date: Wed Jul 17 11:00:36 2013 +0200
add pfft manual to build system
$ git log --stat -- $(readlink -f $(find -xtype l))
commit 3c1246470290add168dc9b4965d282471ef94a12
Author: Michael Pippig <[email protected]>
Date: Thu Jun 12 16:04:12 2014 +0200
create 1d procmesh for non-Cartesian communicator per default
tests/manual_min_c2c.c | 52 ----------------------------------------------------
1 file changed, 52 deletions(-)
commit a938fdc7c86f96cc8de2c85ec8ab6e44bff99d73
Author: Michael Pippig <[email protected]>
Date: Wed Jun 11 16:28:54 2014 +0200
manual: introduction, tutorial
tests/manual_min_c2c.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 52 insertions(+)
commit f9b86da68433c8cba545c7ee6f060a6fef807d2c
Author: Benedikt Morbach <[email protected]>
Date: Tue Jan 21 10:56:37 2014 +0100
pfft: add c2r / c2c comparison tests
tests/manual_min_c2c.c | 52 ----------------------------------------------------
1 file changed, 52 deletions(-)
commit 6f45f1c3972fa25e1b4cd2950246e34d9f795ac9
Author: Michael Pippig <[email protected]>
Date: Fri Jan 17 15:45:46 2014 +0100
remove padding of r2c inputs from interface and testcases
tests/manual_min_c2c.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
commit ad68b4e4d6126952aa21c1d2ced4ed94223aa6c1
Author: Michael Pippig <[email protected]>
Date: Thu Jul 18 02:40:57 2013 +0200
PFFT manual: save intermediate state
tests/manual_min_c2c.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
commit 5b625ef4454eaafe4df38b8ead339de5e6b4d6f5
Author: Michael Pippig <[email protected]>
Date: Wed Jul 17 11:00:36 2013 +0200
add pfft manual to build system
tests/manual_min_c2c.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 50 insertions(+)
The next generation of Intel processors (Knights Landing) will have something like 70+ cores; mass deployment to major computing facilities will be next year.
It would be a good test case if PFFT can both scale out across nodes and scale within a node.
For example, the library built from easybuild produces:
$ readelf -d $EBROOTPFFT/lib/libpfft.so
Dynamic section at offset 0x2ed38 contains 30 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libm.so.6]
0x0000000000000001 (NEEDED) Shared library: [libpthread.so.0]
0x0000000000000001 (NEEDED) Shared library: [libmpi.so.40]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x000000000000000e (SONAME) Library soname: [libpfft.so.0]
0x000000000000001d (RUNPATH) Library runpath: [/opt/EasyBuild/2022a/software/OpenMPI/4.1.4-GCC-11.3.0/lib:/opt/EasyBuild/2022a/software/hwloc/2.7.1-GCCcore-11.3.0/lib:/opt/EasyBuild/2022a/software/libevent/2.1.12-GCCcore-11.3.0/lib]
0x000000000000000c (INIT) 0x8000
0x000000000000000d (FINI) 0x26abc
0x0000000000000019 (INIT_ARRAY) 0x2fd28
0x000000000000001b (INIT_ARRAYSZ) 8 (bytes)
0x000000000000001a (FINI_ARRAY) 0x2fd30
0x000000000000001c (FINI_ARRAYSZ) 8 (bytes)
0x0000000000000004 (HASH) 0x200
0x000000006ffffef5 (GNU_HASH) 0xcb0
0x0000000000000005 (STRTAB) 0x3e40
0x0000000000000006 (SYMTAB) 0x1710
0x000000000000000a (STRSZ) 9421 (bytes)
0x000000000000000b (SYMENT) 24 (bytes)
0x0000000000000003 (PLTGOT) 0x30000
0x0000000000000002 (PLTRELSZ) 5832 (bytes)
0x0000000000000014 (PLTREL) RELA
0x0000000000000017 (JMPREL) 0x68f8
0x0000000000000007 (RELA) 0x66b8
0x0000000000000008 (RELASZ) 576 (bytes)
0x0000000000000009 (RELAENT) 24 (bytes)
0x000000006ffffffe (VERNEED) 0x6658
0x000000006fffffff (VERNEEDNUM) 2
0x000000006ffffff0 (VERSYM) 0x630e
0x000000006ffffff9 (RELACOUNT) 3
0x0000000000000000 (NULL) 0x0
But pfft itself calls functions specific to fftw3-mpi.so:
/opt/EasyBuild/2022a/software/binutils/2.38-GCCcore-11.3.0/bin/ld: /opt_buildbot/linux-debian11/sandybridge/EasyBuild/2022a/software/PFFT/1.0.8-alpha-foss-2022a/lib64/libpfft.so: undefined reference to `fftw_mpi_init'
/opt/EasyBuild/2022a/software/binutils/2.38-GCCcore-11.3.0/bin/ld: /opt_buildbot/linux-debian11/sandybridge/EasyBuild/2022a/software/PFFT/1.0.8-alpha-foss-2022a/lib64/libpfft.so: undefined reference to `fftw_mpi_execute_r2r'
/opt/EasyBuild/2022a/software/binutils/2.38-GCCcore-11.3.0/bin/ld: /opt_buildbot/linux-debian11/sandybridge/EasyBuild/2022a/software/PFFT/1.0.8-alpha-foss-2022a/lib64/libpfft.so: undefined reference to `fftw_mpi_plan_many_transpose'
/opt/EasyBuild/2022a/software/binutils/2.38-GCCcore-11.3.0/bin/ld: /opt_buildbot/linux-debian11/sandybridge/EasyBuild/2022a/software/PFFT/1.0.8-alpha-foss-2022a/lib64/libpfft.so: undefined reference to `fftw_mpi_local_size_many_transposed'
/opt/EasyBuild/2022a/software/binutils/2.38-GCCcore-11.3.0/bin/ld: /opt_buildbot/linux-debian11/sandybridge/EasyBuild/2022a/software/PFFT/1.0.8-alpha-foss-2022a/lib64/libpfft.so: undefined reference to `fftw_mpi_cleanup'
On FreeBSD I am getting:
configure:20297: checking for fftw_mpi_init in -lfftw3_mpi
configure:20330: cc -o conftest -O2 -pipe -fno-omit-frame-pointer -fstack-protector-strong -isystem /usr/local/include -fno-strict-aliasing -fno-omit-frame-pointer -isystem /usr/local/include -fstack-protector-strong -L/usr/local/lib conftest.c -lfftw3_mpi -lfftw3 -lm -lmpi >&5
ld: error: /usr/local/lib/libfftw3_mpi.so: undefined reference to ompi_mpi_op_sum
ld: error: /usr/local/lib/libfftw3_mpi.so: undefined reference to ompi_mpi_comm_null
ld: error: /usr/local/lib/libfftw3_mpi.so: undefined reference to ompi_mpi_unsigned
ld: error: /usr/local/lib/libfftw3_mpi.so: undefined reference to MPI_Comm_f2c
ld: error: /usr/local/lib/libfftw3_mpi.so: undefined reference to ompi_mpi_op_land
ld: error: /usr/local/lib/libfftw3_mpi.so: undefined reference to ompi_mpi_unsigned_long
ld: error: /usr/local/lib/libfftw3_mpi.so: undefined reference to ompi_mpi_op_lor
ld: error: /usr/local/lib/libfftw3_mpi.so: undefined reference to ompi_mpi_char
ld: error: /usr/local/lib/libfftw3_mpi.so: undefined reference to ompi_mpi_int
ld: error: /usr/local/lib/libfftw3_mpi.so: undefined reference to ompi_mpi_double
ld: error: /usr/local/lib/libfftw3_mpi.so: undefined reference to ompi_mpi_op_max
cc: error: linker command failed with exit code 1 (use -v to see invocation)
The Ghost Cell part of the API never explicitly confirms that the ghost cell data is appended to the end of the allocated storage space.
The local_start of an empty rank is always set to 0. This causes unnecessary branching in downstream code. The logical model is simpler if we think of these slabs as having a size of zero but an offset continuing the same pattern as the others.
For example, the local_i_start of a 3D r2c transform on a 2x53 domain decomposition (this set-up is sub-optimal) is currently:
([ 0, 512]),
([ 0, 20, 40, 60, 80, 100, 120, 140, 160, 180, 200,
220, 240, 260, 280, 300, 320, 340, 360, 380, 400, 420,
440, 460, 480, 500, 520, 540, 560, 580, 600, 620, 640,
660, 680, 700, 720, 740, 760, 780, 800, 820, 840, 860,
880, 900, 920, 940, 960, 980, 1000, 1020, 0]
I would suggest changing the last 0 to 1020.
The pfft.pc file is created but not installed. It should probably go to $PREFIX/lib/pkgconfig, so that pkg-config finds it out of the box.
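Assuming the standard automake conventions for installing pkg-config files (untested against the PFFT build system), the fix could look like this in the top-level Makefile.am:

```makefile
# Hypothetical Makefile.am addition: install the generated pfft.pc
# into the directory pkg-config searches by default.
pkgconfigdir = $(libdir)/pkgconfig
pkgconfig_DATA = pfft.pc
```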
Hello,
I am trying to compile PFFT for Android. I was expecting to build it for the multiple architectures that I have to support (armeabi-v7a and arm64-v8a) by adding parameters to the configure command like this:
./configure CXX="/Users/XXX/Library/Android/sdk/ndk/25.1.8937393/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang++ -target aarch64-linux-android" --prefix=/Users/XXX/Downloads/pfft-master/arm64-v8a
This command replaces gcc with clang, since gcc cannot compile for the two architectures that I need. The configure step works, but when I run make it complains that symbols are not found for architecture x86_64.
I looked into the generated Makefile and other files, and it seems that gcc is used anyway, even if I pass clang in the CC parameter of the configure command.
Any idea on how I could generate the library for the two mentioned architectures?
My LaTeX compiler (Fedora 19) complains about:
1: dsfont: Fedora doesn't package it either.
2: subfigure is deprecated; new documents should use subfig.
With a 16x16x16 mesh divided over 4 processes (1x4, 4x1, 2x2),
the roundtrip error (r->c, c->r) on a Gaussian initial field can be as big as 0.001.
The error is large even on 1 process (comparing with numpy.fft): typically around 2e-5 in the forward transform, accumulating to ~0.002 after the backward.
I wonder how this compares with FFTWF, and whether there is anything we can do about it.
The guru FFTW interface allows arbitrarily strided input and output arrays. PFFT does not.
This is a useful feature in a particle-mesh code where the local mesh contains a 'ghost region' that is shared with other processes but does not participate in the FFT.
If I use the patched version of FFTW3 directly, I get this error with an r2c in-place transformation:
PMPI_Alltoall(925): Buffers must not be aliased. Consider using MPI_IN_PLACE or setting MPICH_NO_BUFFER_ALIAS_CHECK
Rank 118 [Mon Sep 14 16:51:53 2015] [c2-3c2s10n2] Fatal error in PMPI_Alltoall: Invalid buffer pointer, error stack:
PMPI_Alltoall(966): MPI_Alltoall(sbuf=0x2aab0cf9a040, scount=524800, MPI_FLOAT, rbuf=0x2aab0cf9a040, rcount=524800, MPI_FLOAT, comm=0xc4000002) failed
I wonder why we are doing an Alltoall from the same address at all. Is it safe to just skip it if I == O?
The attached program for a 2D c2c transform with a 1D parallel decomposition crashes. Encountered on several computers and slightly different versions of PFFT; the last test was done with 1.0.6-alpha. In this form it crashes with PFFT_PRESERVE_INPUT. When I use an in-place transform, it also crashes with PFFT_DESTROY_INPUT.
> mpicc -Wall -std=c99 pfft_crash.c -lpfft -lfftw3 -lfftw3_mpi -g
> mpirun -n 12 valgrind ./a.out
==13960== Invalid read of size 8
==13960== at 0x5092B69: fftw_plan_destroy_internal (in /usr/lib64/libfftw3.so.3.4.4)
==13960== by 0x5472453: ??? (in /usr/lib64/libfftw3_mpi.so.3.4.4)
==13960== by 0x5094172: ??? (in /usr/lib64/libfftw3.so.3.4.4)
==13960== by 0x5162969: ??? (in /usr/lib64/libfftw3.so.3.4.4)
==13960== by 0x5162B63: fftw_mkapiplan (in /usr/lib64/libfftw3.so.3.4.4)
==13960== by 0x546ED5B: fftw_mpi_plan_many_transpose (in /usr/lib64/libfftw3_mpi.so.3.4.4)
==13960== by 0x4E4DF2F: pfft_plan_global_transp (in /usr/local/lib64/libpfft.so.0.0.0)
==13960== by 0x4E41B03: pfft_plan_partrafo_transposed (in /usr/local/lib64/libpfft.so.0.0.0)
==13960== by 0x4E47536: pfft_plan_partrafo (in /usr/local/lib64/libpfft.so.0.0.0)
==13960== by 0x4E5472F: pfft_plan_many_dft (in /usr/local/lib64/libpfft.so.0.0.0)
==13960== by 0x4E540AB: pfft_plan_dft (in /usr/local/lib64/libpfft.so.0.0.0)
==13960== by 0x400E66: main (pfft_crash.c:44)
==13960== Address 0xb98c240 is 0 bytes inside a block of size 168 free'd
==13960== at 0x4C2A37C: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==13960== by 0x5471EFF: fftw_mpi_mkplans_posttranspose (in /usr/lib64/libfftw3_mpi.so.3.4.4)
==13960== by 0x54720C1: ??? (in /usr/lib64/libfftw3_mpi.so.3.4.4)
==13960== by 0x5094172: ??? (in /usr/lib64/libfftw3.so.3.4.4)
==13960== by 0x5162969: ??? (in /usr/lib64/libfftw3.so.3.4.4)
==13960== by 0x5162B63: fftw_mkapiplan (in /usr/lib64/libfftw3.so.3.4.4)
program:
#include <complex.h>
#include <pfft.h>
int main(int argc, char **argv)
{
int np[1];
ptrdiff_t n[2];
ptrdiff_t alloc_local;
ptrdiff_t local_ni[2], local_i_start[2];
ptrdiff_t local_no[2], local_o_start[2];
double err;
pfft_complex *in, *out;
pfft_plan plan_forw=NULL, plan_back=NULL;
MPI_Comm comm_cart_1d;
/* Set size of FFT and process mesh */
n[0] = 200; n[1] = 200;
np[0] = 12;
/* Initialize MPI and PFFT */
MPI_Init(&argc, &argv);
pfft_init();
/* Create one-dimensional process grid of size np[0], if possible */
if( pfft_create_procmesh(1, MPI_COMM_WORLD, np, &comm_cart_1d) ){
pfft_fprintf(MPI_COMM_WORLD, stderr, "Error: This test file only works with %d processes.\n", np[0]);
MPI_Finalize();
return 1;
}
/* Get parameters of data distribution */
alloc_local = pfft_local_size_dft(2, n, comm_cart_1d, PFFT_TRANSPOSED_NONE,
local_ni, local_i_start, local_no, local_o_start);
/* Allocate memory */
in = pfft_alloc_complex(alloc_local);
out = pfft_alloc_complex(alloc_local);
/* Plan parallel forward FFT */
plan_forw = pfft_plan_dft(
2, n, in, out, comm_cart_1d, PFFT_FORWARD, PFFT_TRANSPOSED_NONE| PFFT_ESTIMATE| PFFT_PRESERVE_INPUT);
/* Plan parallel backward FFT */
plan_back = pfft_plan_dft(
2, n, out, in, comm_cart_1d, PFFT_BACKWARD, PFFT_TRANSPOSED_NONE| PFFT_ESTIMATE| PFFT_PRESERVE_INPUT);
/* Initialize input with random numbers */
pfft_init_input_complex(2, n, local_ni, local_i_start,
in);
/* execute parallel forward FFT */
pfft_execute(plan_forw);
/* clear the old input */
pfft_clear_input_complex(2, n, local_ni, local_i_start,
in);
/* execute parallel backward FFT */
pfft_execute(plan_back);
/* Scale data */
for(ptrdiff_t l=0; l < local_ni[0] * local_ni[1]; l++)
in[l] /= (n[0]*n[1]);
/* Print error of back transformed data */
err = pfft_check_output_complex(2, n, local_ni, local_i_start, in, comm_cart_1d);
pfft_printf(comm_cart_1d, "Error after one forward and backward trafo of size n=(%td, %td):\n", n[0], n[1]);
pfft_printf(comm_cart_1d, "maxerror = %6.2e;\n", err);
/* free mem and finalize */
pfft_destroy_plan(plan_forw);
pfft_destroy_plan(plan_back);
MPI_Comm_free(&comm_cart_1d);
pfft_free(in); pfft_free(out);
MPI_Finalize();
return 0;
}
There is already a GPU-enabled parallel FFT library: https://github.com/amirgholami/accfft/ .
AccFFT is written in C++. I wonder if we can port the GPU-related code to C and use it in PFFT as a GPU backend.
Here is the log:
[avmo@kthxps pfft-1.0.8-alpha]$
[avmo@kthxps pfft-1.0.8-alpha]$ ls
api/ doc/ kernel/ tests/ aclocal.m4 bootstrap.sh* config.h.in configure.ac COPYING INSTALL Makefile.in pfft.pc.in TODO
build-aux/ gcell/ m4/ util/ AUTHORS ChangeLog configure* CONVENTIONS fconfig.h.in Makefile.am NEWS README
[avmo@kthxps pfft-1.0.8-alpha]$ export LANG=C
[avmo@kthxps pfft-1.0.8-alpha]$ ./bootstrap.sh
PLEASE IGNORE WARNINGS AND ERRORS
libtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, 'build-aux'.
libtoolize: linking file 'build-aux/ltmain.sh'
libtoolize: putting macros in AC_CONFIG_MACRO_DIRS, 'm4'.
libtoolize: linking file 'm4/libtool.m4'
libtoolize: linking file 'm4/ltoptions.m4'
libtoolize: linking file 'm4/ltversion.m4'
autoreconf: Entering directory `.'
autoreconf: configure.ac: not using Gettext
autoreconf: running: aclocal --force -I m4
autoreconf: configure.ac: tracing
autoreconf: running: libtoolize --copy --force
libtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, 'build-aux'.
libtoolize: copying file 'build-aux/ltmain.sh'
libtoolize: putting macros in AC_CONFIG_MACRO_DIRS, 'm4'.
libtoolize: copying file 'm4/libtool.m4'
libtoolize: copying file 'm4/ltoptions.m4'
libtoolize: copying file 'm4/ltsugar.m4'
libtoolize: copying file 'm4/ltversion.m4'
libtoolize: copying file 'm4/lt~obsolete.m4'
autoreconf: running: /usr/bin/autoconf --force
autoreconf: running: /usr/bin/autoheader --force
autoreconf: running: automake --add-missing --copy --force-missing
configure.ac:139: installing 'build-aux/compile'
configure.ac:55: installing 'build-aux/missing'
api/Makefile.am: installing 'build-aux/depcomp'
autoreconf: Leaving directory `.'
[avmo@kthxps pfft-1.0.8-alpha]$ ./configure --prefix=/home/avmo/src/spack/opt/spack/linux-archrolling-x86_64/gcc-7.3.0/pfft-1.0.8-alpha-vg4mvddn4ybvvasdceoodnlxh3xfxv4d CC=/usr/bin/gcc MPICC=/usr/bin/mpicc FC=/usr/bin/gfortran MPIFC=/usr/bin/mpif90
configure: ****************************************************************
configure: * Configuring in common/pfft *
configure: ****************************************************************
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking for style of include used by make... GNU
checking for gcc... /usr/bin/gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether /usr/bin/gcc accepts -g... yes
checking for /usr/bin/gcc option to accept ISO C89... none needed
checking whether /usr/bin/gcc understands -c and -o together... yes
checking dependency style of /usr/bin/gcc... gcc3
checking for function MPI_Init... no
checking for function MPI_Init in -lmpi... no
checking for function MPI_Init in -lmpich... no
configure: error: in `/home/avmo/tmp/pfft-1.0.8-alpha':
configure: error: PFFT requires an MPI C compiler.
See `config.log' for more details
[avmo@kthxps pfft-1.0.8-alpha]$ pacman -Qi openmpi
Name : openmpi
Version : 3.0.0-1
Description : High performance message passing library (MPI)
Architecture : x86_64
URL : https://www.open-mpi.org
Licenses : custom:OpenMPI
Groups : None
Provides : None
Depends On : libltdl hwloc openssh
Optional Deps : gcc-fortran: fortran support [installed]
Required By : arpack hdf5-openmpi icet ospray python-mpi4py python2-mpi4py
Optional For : boost-libs valgrind vtk vtk-visit vtk6
Conflicts With : None
Replaces : None
Installed Size : 9.52 MiB
Packager : Levente Polyak <[email protected]>
Build Date : Wed 20 Dec 2017 10:26:45 AM CET
Install Date : Mon 08 Jan 2018 02:31:37 PM CET
Install Reason : Installed as a dependency for another package
Install Script : No
Validated By : Signature
I have problems compiling the library using the Cray compiler. It is first detected as a gcc compiler, and then configure has trouble finding the correct flags and options.
I was able to fix the unknown flag for extending the line width in Fortran by supplying -N 255 in FCFLAGS.
However, configure then tries to link a C program with a Fortran object using -lgfortran, which fails.
The actual way how to link them with Cray is just
ftn sub.f90 -c -o sub.o
cc sub.o main.c
The configure fails in this step:
configure: error: linking to Fortran libraries from C fails
See `config.log' for more details.
and config.log contains several variations of:
configure:6540: checking for dummy main to link with Fortran libraries
configure:6574: cc -o conftest -g conftest.c -L/opt/cray/cce/8.3.7/CC/x86-64/lib/x86-64 -L/opt/gcc/4.8.1/snos/lib64 /opt/cray/cce/8.3.7/craylibs/x86-64/libmodules.a /opt/cray/cce/8.3.7/craylibs/x86-64/libomp.a
/opt/cray/cce/8.3.7/craylibs/x86-64/libopenacc.a -L/opt/cray/fftw/3.3.4.1/sandybridge/lib -L/opt/cray/dmapp/default/lib64 -L/opt/cray/mpt/7.1.1/gni/mpich2-cray/83/lib -L/opt/cray/libsci/13.0.1/CRAY/83/sandybridge
/lib -L/opt/cray/rca/1.0.0-2.0501.48090.7.46.ari/lib64 -L/opt/cray/alps/5.1.1-2.0501.8507.1.1.ari/lib64 -L/opt/cray/xpmem/0.1-2.0501.48424.3.3.ari/lib64 -L/opt/cray/dmapp/7.0.1-1.0501.8315.8.4.ari/lib64 -L/opt/cra
y/pmi/5.0.6-1.0000.10439.140.2.ari/lib64 -L/opt/cray/ugni/5.0-1.0501.8253.10.22.ari/lib64 -L/opt/cray/udreg/2.3.2-1.0501.7914.1.13.ari/lib64 -L/opt/cray/atp/1.7.5/lib -L/opt/cray/cce/8.3.7/craylibs/x86-64 -L/opt/c
ray/wlm_detect/1.0-1.0501.47908.2.2.ari/lib64 -lfftw3f_mpi -lfftw3f_threads -lfftw3f -lfftw3_mpi -lfftw3_threads -lfftw3 -lAtpSigHandler -lAtpSigHCommData -lsci_cray_mpi_mp -lsci_cray_mp -lmpichf90_cray -lmpich_cr
ay -lpgas-dmapp -lcray-c++-rts -lcraystdc++ -lxpmem -ldmapp -lpmi -ludreg -lalpslli -lalpsutil -lrca -lwlm_detect -lugni -lomp -lcraymp -lmodules -lfi -lf -lpthread -lcraymath -lm -lgfortran -lquadmath -lu -lrt -l
csup -ltcmalloc_minimal -lstdc++ -L/opt/gcc/4.8.1/snos/lib/gcc/x86_64-suse-linux/4.8.1 -L/opt/cray/cce/8.3.7/cray-binutils/x86_64-unknown-linux-gnu/lib -L//usr/lib64 >&5
CC-1254 craycc: WARNING
The environment variable "CPATH" is not supported.
/opt/cray/cce/8.3.7/craylibs/x86-64/libtcmalloc_minimal.a(tcmalloc.o):(.bss+0x13): multiple definition of `FLAG__namespace_do_not_use_directly_use_DECLARE_bool_instead::FLAGS_tcmalloc_abort_on_large_alloc'
/opt/cray/cce/8.3.7/craylibs/x86-64/libtcmalloc_minimal.a(tcmalloc.o):(.bss+0x13): first defined here
/opt/cray/cce/8.3.7/craylibs/x86-64/libtcmalloc_minimal.a(tcmalloc.o): In function `TCMallocImplementation::GetAllocatedSize(void*)':
tcmalloc.cc:(.text+0x1f0): multiple definition of `TCMallocImplementation::GetAllocatedSize(void*)'
/opt/cray/cce/8.3.7/craylibs/x86-64/libtcmalloc_minimal.a(tcmalloc.o):tcmalloc.cc:(.text+0x1f0): first defined here
/opt/cray/cce/8.3.7/craylibs/x86-64/libtcmalloc_minimal.a(tcmalloc.o): In function `TCMallocImplementation::MarkThreadBusy()':
tcmalloc.cc:(.text+0xe40): multiple definition of `TCMallocImplementation::MarkThreadBusy()'
/opt/cray/cce/8.3.7/craylibs/x86-64/libtcmalloc_minimal.a(tcmalloc.o):tcmalloc.cc:(.text+0xe40): first defined here
/opt/cray/cce/8.3.7/craylibs/x86-64/libtcmalloc_minimal.a(tcmalloc.o):(.bss+0x11): multiple definition of `FLAG__namespace_do_not_use_directly_use_DECLARE_bool_instead::FLAGS_tcmalloc_pad_cacheline'
/opt/cray/cce/8.3.7/craylibs/x86-64/libtcmalloc_minimal.a(tcmalloc.o):(.bss+0x11): first defined here
/opt/cray/cce/8.3.7/craylibs/x86-64/libtcmalloc_minimal.a(tcmalloc.o): In function `tc_version':
tcmalloc.cc:(.text+0x4e40): multiple definition of `tc_version'
/opt/cray/cce/8.3.7/craylibs/x86-64/libtcmalloc_minimal.a(tcmalloc.o):tcmalloc.cc:(.text+0x4e40): first defined here
/opt/cray/cce/8.3.7/craylibs/x86-64/libtcmalloc_minimal.a(tcmalloc.o): In function `tc_set_new_mode':
tcmalloc.cc:(.text+0x4e70): multiple definition of `tc_set_new_mode'
/opt/cray/cce/8.3.7/craylibs/x86-64/libtcmalloc_minimal.a(tcmalloc.o):tcmalloc.cc:(.text+0x4e70): first defined here
/opt/cray/cce/8.3.7/craylibs/x86-64/libtcmalloc_minimal.a(tcmalloc.o): In function `tc_malloc':
How can I fix configure to accept the Cray compilers?
I was trying to install this library on a supercomputing cluster. However, I can only install it locally in my home directory, since I do not have authorisation to install it elsewhere.
After loading the modules intel/2020.4, intelmpi/2020.4, openucx/1.13.1 and fftw/3.3.10, I cloned the git repository, followed the installation instructions, and ran ./bootstrap.sh followed by ./configure and make install.
The Makefile does its thing for some time, then it fails with the following error:
make[2]: Nothing to be done for 'install-exec-am'.
/usr/bin/mkdir -p '/usr/local/include'
/usr/bin/install -c -m 644 pfft.h '/usr/local/include'
/usr/bin/install: cannot create regular file '/usr/local/include/pfft.h': Permission denied
make[2]: *** [Makefile:506: install-includeHEADERS] Error 1
Please advise what I must do to install it without needing admin privileges. Thank you.
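The 'Permission denied' comes from the default installation prefix /usr/local. For a standard autoconf package like this one, pointing --prefix at a directory you own at configure time should be enough (the path below is illustrative):

```shell
# Reconfigure with a prefix inside $HOME, then rebuild and install there.
./configure --prefix=$HOME/.local
make
make install
# Then point the compiler and linker at it, e.g.
#   -I$HOME/.local/include and -L$HOME/.local/lib
```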
Running a 4x4x4 transform on a 2x1 decomposition, valgrind gives the following error.
==18470== Uninitialised byte(s) found during client check request
==18470== at 0x3CE045CFE1: ??? (in /usr/lib64/openmpi/lib/libopen-pal.so.6.2.1)
==18470== by 0x3CE1075F74: PMPI_Sendrecv (in /usr/lib64/openmpi/lib/libmpi.so.1.6.0)
==18470== by 0x4301FE: transpose_chunks (in /home/yfeng1/source/cola_halo/a.out)
==18470== by 0x4303F3: apply (in /home/yfeng1/source/cola_halo/a.out)
==18470== by 0x407222: execute_transposed.isra.1 (in /home/yfeng1/source/cola_halo/a.out)
==18470== by 0x407D4C: pfft_execute_full (in /home/yfeng1/source/cola_halo/a.out)
==18470== by 0x402A61: pm_c2r (pmpfft.c:159)
==18470== by 0x4032E5: main (pmpfft.c:279)
==18470== Address 0xa3a53e0 is 0 bytes inside a block of size 192 alloc'd
==18470== at 0x4A08D84: memalign (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==18470== by 0x436824: fftw_malloc_plain (in /home/yfeng1/source/cola_halo/a.out)
==18470== by 0x4300CD: transpose_chunks (in /home/yfeng1/source/cola_halo/a.out)
==18470== by 0x4303F3: apply (in /home/yfeng1/source/cola_halo/a.out)
==18470== by 0x407222: execute_transposed.isra.1 (in /home/yfeng1/source/cola_halo/a.out)
Is there a PFFT way to globally transpose a pencil-decomposed 3D array?
Is using a pfft_plan_many_*_skipped function and skipping all three transforms a viable option?
The current manual is written strictly in LaTeX, and it may be hard to compile it to HTML pages.
HTML documentation is easier to access than PDF documentation, because 1) it doesn't need a PDF reader, 2) it is easier to search, and 3) it is paginated by sections rather than by pages.
For HTML generation, I would recommend looking into reStructuredText and Sphinx, which can produce both HTML and PDF documents.
If this is desired, I can start a PR porting the TeX sources to .rst.
Do you run valgrind on pfft to check for leaks?
It looks like C99 syntax (for-loop declarations) doesn't work out of the box on some compilers. I ran into this after messing around with Travis for a while.
An in-place transform fails on edison.nersc.gov with Intel 14.0.2 20140120.
I'll look deeper into this. Could it be a problem with FFTW, since the domain decomposition is 1D in this simple test?
#include <complex.h>
#include <string.h> /* for memcpy */
#include <pfft.h>
int main(int argc, char **argv)
{
int np[2];
ptrdiff_t n[3];
ptrdiff_t alloc_local;
ptrdiff_t local_ni[3], local_i_start[3];
ptrdiff_t local_no[3], local_o_start[3];
double err;
pfft_complex *in, *out;
pfft_plan plan_forw=NULL, plan_back=NULL;
MPI_Comm comm_cart_2d;
double data[] = {
-0.51503939, 0.59189672, 0.0478734, -0.48840469, -0.35495284, -0.39181335,
1.86426106, -1.37148975, 2.22627536, -0.11810965, 0.11984837, 0.18259889,
-0.65773926, -1.64623164, -1.14158407, -1.43908939,
} ;
/* Set size of FFT and process mesh */
// n[0] = 29; n[1] = 27; n[2] = 31;
n[0] = 2; n[1] = 3; n[2] = 2;
np[0] = 2; np[1] = 1;
/* Initialize MPI and PFFT */
MPI_Init(&argc, &argv);
pfft_init();
int rank;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
/* Create one-dimensional process grid of size np[0], if possible */
if( pfft_create_procmesh_1d(MPI_COMM_WORLD, np[0], &comm_cart_2d) ){
pfft_fprintf(MPI_COMM_WORLD, stderr, "Error: This test file only works with %d processes.\n", np[0]*np[1]);
MPI_Finalize();
return 1;
}
/* Get parameters of data distribution */
alloc_local = pfft_local_size_dft_3d(n, comm_cart_2d, PFFT_TRANSPOSED_NONE,
local_ni, local_i_start, local_no, local_o_start);
/* Allocate memory */
in = pfft_alloc_complex(alloc_local);
out = pfft_alloc_complex(alloc_local);
out = in;
/* Plan parallel forward FFT */
plan_forw = pfft_plan_dft_3d(
n, in, out, comm_cart_2d, PFFT_FORWARD, PFFT_TRANSPOSED_NONE| PFFT_MEASURE| PFFT_DESTROY_INPUT);
/* Plan parallel backward FFT */
plan_back = pfft_plan_dft_3d(
n, out, in, comm_cart_2d, PFFT_BACKWARD, PFFT_TRANSPOSED_NONE| PFFT_MEASURE| PFFT_DESTROY_INPUT);
/* Initialize input with random numbers */
pfft_init_input_complex_3d(n, local_ni, local_i_start,
in);
memcpy(in, data, sizeof(double) * alloc_local * 2);
/* execute parallel forward FFT */
pfft_execute(plan_forw);
ptrdiff_t l;
int r;
for (r = 0; r < 2; r++) {
MPI_Barrier(MPI_COMM_WORLD);
if (r != rank) continue;
printf ("out on rank %d :", rank);
for(l=0; l < alloc_local * 2; l ++) {
printf("%g\n", ((double*)out)[l]);
}
}
/* clear the old input */
// pfft_clear_input_complex_3d(n, local_ni, local_i_start,
// in);
/* execute parallel backward FFT */
pfft_execute(plan_back);
/* Scale data */
for (r = 0; r < 2; r++) {
MPI_Barrier(MPI_COMM_WORLD);
if (r != rank) continue;
printf ("on rank %d :", rank);
for(l=0; l < alloc_local * 2; l ++) {
printf("%g %g\n", ((double*)in)[l], data[l]);
}
}
for(l=0; l < local_ni[0] * local_ni[1] * local_ni[2]; l++)
in[l] /= (n[0]*n[1]*n[2]);
/* Print error of back transformed data */
err = pfft_check_output_complex_3d(n, local_ni, local_i_start, in, comm_cart_2d);
pfft_printf(comm_cart_2d, "Error after one forward and backward trafo of size n=(%td, %td, %td):\n", n[0], n[1], n[2]);
pfft_printf(comm_cart_2d, "maxerror = %6.2e;\n", err);
/* free mem and finalize */
pfft_destroy_plan(plan_forw);
pfft_destroy_plan(plan_back);
MPI_Comm_free(&comm_cart_2d);
// pfft_free(in); pfft_free(out);
MPI_Finalize();
return 0;
}
Is it possible to plan 1D transforms with PFFT using a size-1 communicator?
I am seeing a divide-by-zero error in the pfft_local_size_dft functions if I pass in a procmesh constructed with rnk_n=1.