
parallelproj's Introduction


OpenMP and CUDA libraries + Python interface for 3D Joseph non-TOF and TOF forward and back projectors.

This project provides OpenMP and CUDA implementations, and a Python interface, of 3D Joseph non-TOF and TOF forward and back projectors that can be used, e.g., for image reconstruction. The input to the projectors is a list of start and end points of lines of response (LORs), which makes them very flexible and suitable for both sinogram and listmode processing.
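As a small illustration of this endpoint-based interface, the NumPy sketch below builds the (nlors, 3) start/end arrays for a few LORs of a made-up ring scanner; the geometry and variable names are purely illustrative and are not the parallelproj API:

```python
import numpy as np

# Endpoints of 4 LORs crossing a hypothetical ring of radius 300 mm.
# The projectors only see such (nlors, 3) start/end arrays, so sinogram
# and listmode data can be fed through the exact same interface.
nlors = 4
phis = np.linspace(0, np.pi, nlors, endpoint=False)
radius = 300.0

xstart = np.stack(
    [radius * np.cos(phis), radius * np.sin(phis), np.zeros(nlors)], axis=1
).astype(np.float32)
xend = -xstart  # detector on the opposite side of the ring
```

Whether these rows come from a sinogram's geometric ordering or from a stream of listmode events makes no difference to the projector.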


If you are using parallelproj, we recommend reading and citing our publication

  • G. Schramm, K. Thielemans: "PARALLELPROJ - an open-source framework for fast calculation of projections in tomography", Front. Nucl. Med., vol. 3, 2023, doi: 10.3389/fnume.2023.1324562 (also available as an arXiv preprint)

Installation, Documentation & Examples

Please refer to the official documentation here.


Building the OpenMP and CUDA libraries from source (developers only)

Dependencies

  • cmake >= 3.16 (3.16 is needed to detect CUDA correctly); we recommend cmake >= 3.23
  • a recent C compiler with OpenMP support (tested with gcc, msvc, and clang)
  • the CUDA toolkit (optional, tested with versions >= 10)

Notes

  • If CUDA is not available on the build system, the build of the CUDA library is skipped (only the C/OpenMP library is built)
  • If you are building with CUDA on a recent version of Ubuntu, we recommend using CUDA >= 11.7. See here why.

Building using cmake

We use CMake to auto-generate a Makefile or Visual Studio project file, which is used to compile the libraries. Make sure that cmake and the desired C compiler are on the PATH. The CMakeLists.txt is configured to search for CUDA. If CUDA is not present, compilation of the CUDA lib is skipped.

To build and install the libraries execute:

cd my_project_dir
mkdir build
cd build
cmake ..
cmake --build . --target install --config release

where my_project_dir is the directory that contains this file and the CMakeLists.txt file. Note that for the default installation directory, you usually need admin privileges. To change the install directory, replace the first call to cmake with

cmake -DCMAKE_INSTALL_PREFIX=/foo/bar/myinstalldir ..

To build the documentation (doxygen required) run

cmake --build . --target docs

To run all unit tests execute:

ctest -VV

Setting CMAKE_CUDA_ARCHITECTURES

If you have CUDA available on your system (even if there is no physical CUDA GPU), the default for CMAKE_CUDA_ARCHITECTURES depends on the cmake version you are using.

  • cmake version >= 3.23: by default, CMAKE_CUDA_ARCHITECTURES=all, which means that the code is built for all CUDA architectures.

  • 3.16 <= cmake version < 3.23: the default CMAKE_CUDA_ARCHITECTURES is the architecture present on your system. This means that if you are compiling on a system without a physical CUDA GPU with cmake < 3.23, you have to set it manually, e.g. via -DCMAKE_CUDA_ARCHITECTURES=75.

parallelproj's People

Contributors

gschramm, KrisThielemans

Stargazers

Heng Jiang, FerMoncada, Kun Tian, Minamiki, Connor_He, Carlos F. Uribe, nutellaBear, Luke Polson, zjZhao, Zach Chalampalakis, Gao Size, dsm, Liu Hui, Kuang Gong, Imraj Singh, Eyal Rozenberg, Jeff Fessler, Rui Hu, Biguri, and several unnamed accounts

Watchers

Zach Chalampalakis and two unnamed accounts

parallelproj's Issues

Separate malloc, compute and free

parallelproj is now in STIR (yay!).

However, due to STIR's design, it is still quite slow within it.
This is because STIR breaks the problem up a lot before it is time to call the GPU projectors, and by then it only requests small chunks of, e.g., forward-projection views.

At the current stage, this means that on each of these calls we copy the entire image to the GPU, forward project the entire detector (it is a STIR issue that it asks too much of the GPU code), and finally, once back on the CPU, copy out only the chunks that are needed.

Of course, this is super wasteful: most of the time we make copies that are not needed and request computation that is wasted. We have ideas on how to improve this. The part where we ask for more computation than required is more of a STIR problem, but the unnecessary copies need to be handled in this code. Ideally, we'd like STIR to be able to ask for a few forward projections from an image that is already stored on the GPU. An easy solution would be for STIR to store the pointer to the GPU image and then call the GPU code with an already allocated and copied GPU image.

All this to say: it would be great to have the forward and backprojection codes separated into 3 functions: allocate+copy, compute, free. And to have all these 3 functions exposed.

All this also applies to the backprojection.
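The shape of the proposed three-phase split can be sketched in Python, with a NumPy array standing in for device memory (all names here are hypothetical, and the projection itself is a placeholder, not Joseph's method):

```python
import numpy as np

class DeviceImage:
    """Sketch of the proposed allocate+copy / compute / free split."""

    def __init__(self, host_img):
        # phase 1: allocate "device" memory and copy the image once
        # (stands in for cudaMalloc + cudaMemcpyHostToDevice)
        self._dev_img = np.array(host_img, dtype=np.float32, copy=True)

    def forward_project(self, xstart, xend):
        # phase 2: project only the requested chunk of LORs against the
        # already-resident image (placeholder computation)
        assert self._dev_img is not None, "image was freed"
        return np.full(xstart.shape[0], self._dev_img.sum(), dtype=np.float32)

    def free(self):
        # phase 3: release the "device" buffer explicitly
        self._dev_img = None

# a caller (e.g. STIR) can now project many small chunks after one upload
img = DeviceImage(np.ones((4, 4, 4)))
p = img.forward_project(np.zeros((5, 3)), np.zeros((5, 3)))
img.free()
```

With the three phases exposed separately, the caller controls when the image crosses the host/device boundary instead of paying for a copy on every chunk.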

OpenMP parallel for allocates new variable at each loop

Here (but also in all other forward and backward methods) a large number of variables is defined inside the #pragma omp parallel for section, forcing them to be reallocated at each iteration of a supposedly very large for loop.

I did not have the chance to check the impact on execution time, but I presume that with many iterations it will be noticeable. I suggest splitting the #pragma omp into two sections:

#pragma omp parallel
{
    float d0, d1, d2, d0_sq, d1_sq, d2_sq; 
    float lsq, cos0_sq, cos1_sq, cos2_sq;
    unsigned short direction; 
    int i0, i1, i2;
    int i0_floor, i1_floor, i2_floor;
    int i0_ceil, i1_ceil, i2_ceil;
    float x_pr0, x_pr1, x_pr2;
    float tmp_0, tmp_1, tmp_2;
   
    float u0, u1, u2, d_norm;
    float x_m0, x_m1, x_m2;    
    float x_v0, x_v1, x_v2;    

    short it = tof_bin[0];
    float dtof, tw;

    // correction factor for cos(theta) and voxsize
    float cf;
    float toAdd;

    float sig_tof   = sigma_tof[0];
    float tc_offset = tofcenter_offset[0];

    float xstart0 = xstart[0];
    float xstart1 = xstart[1];
    float xstart2 = xstart[2];

    float xend0 = xend[0];
    float xend1 = xend[1];
    float xend2 = xend[2];

    float voxsize0 = voxsize[0];
    float voxsize1 = voxsize[1];
    float voxsize2 = voxsize[2];

    float img_origin0 = img_origin[0];
    float img_origin1 = img_origin[1];
    float img_origin2 = img_origin[2];

    unsigned char intersec;
    float t1, t2;
    float istart_f, iend_f, tmp;
    int   istart, iend;
    float istart_tof_f, iend_tof_f;
    int   istart_tof, iend_tof;

#pragma omp for schedule(static)
  for (long long i = 0; i < nlors; i++)
  {
    it = tof_bin[i];
    
    sig_tof   = sigma_tof[i];
    tc_offset = tofcenter_offset[i];

    xstart0 = xstart[i*3 + 0];
    xstart1 = xstart[i*3 + 1];
    xstart2 = xstart[i*3 + 2];

    xend0 = xend[i*3 + 0];
    xend1 = xend[i*3 + 1];
    xend2 = xend[i*3 + 2];

    ...
  }
}

Like this, the allocation of the variables happens number-of-threads times rather than nlors times.

Also, I do not think redefining variables like xstart0 really simplifies the code for the reader. I'd suggest creating a counter variable for the location in the array, e.g. pos0 = i * 3 + 0.

error when import parallelproj

I'm running parallelproj on an M1 Mac with Python 3.8.
The error showed up when I updated parallelproj to version 1.7.1. Is it because of my Python version or my OS?

The error message:

Traceback (most recent call last):
  File "/opt/anaconda3/envs/py4work/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3508, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-9-1e3692be8fc6>", line 20, in <module>
    import parallelproj
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/opt/anaconda3/envs/py4work/lib/python3.8/site-packages/parallelproj/__init__.py", line 1, in <module>
    from .backend import (
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/opt/anaconda3/envs/py4work/lib/python3.8/site-packages/parallelproj/backend.py", line 25, in <module>
    cuda_present = distutils.spawn.find_executable("nvidia-smi") is not None
AttributeError: module 'distutils' has no attribute 'spawn'
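In recent Python versions `distutils.spawn` is no longer implicitly importable (and `distutils` is removed entirely in Python 3.12). Assuming the intent of this line in backend.py is just to locate `nvidia-smi`, `shutil.which` from the standard library is a drop-in replacement:

```python
import shutil

# shutil.which returns the full path to the executable, or None if absent
cuda_present = shutil.which("nvidia-smi") is not None
```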

Torch example: "requires_grad=True" and CUDA out of memory

Hello!

In the current version of examples/07_torch/01_run_projection_layer.py the error
"ValueError: gradcheck expects at least one input tensor to require gradient, but none of the them have requires_grad=True."
occurs when a ListModePETProjector is used instead of RegularPolygonPETProjector.

Is this behavior to be expected?
Thank you in advance for your answer.

My environment (Anaconda):
Python 3.9
Pytorch 2.2.1
Pytorch CUDA 11.8
Installation of parallelproj via "conda install -c conda-forge parallelproj"

add const in most places

We need `forward_proj(const float *image, float *sino, ...)`, i.e. declare objects pointed to as `const` as much as possible. Best practice, but it will also prevent ugly copies and `const_cast` in STIR.

Installing parallelproj for STIR errors

Hi,

This is related to UCL/STIR#840.

I am trying to install parallelproj for use in the new STIR GPU projector on a cluster system (note I do not have admin privileges).

Copy/Paste:

I have installed parallelproj using the method detailed here, by calling:

python build_libs_for_python.py

I set parallelproj_DIR=/wherever/lib_Linux_64bit/cmake (it seems it creates lib_Linux_64bit instead of lib) and added it to my $PATH. The cmake configuration reports:

 Found parallelproj (will use its CUDA support)
 Parallelproj projector support enabled.

STIR installs as normal but then I get the following errors when calling many STIR executables.

[******@login13 build]$ forward_project 
forward_project: error while loading shared libraries: libparallelproj_c.so: cannot open shared object file: No such file or directory

Should I be attempting to install with python, or should I do my own custom cmake install? Note, I do not have admin privileges, so I need to install everything to a custom location.

moving doxygen to .h files

I suggest moving the doxygen comments to the .h files. The C files are not installed, so not seen by a user.

Issue Regarding Installation Conflicts Between parallelproj and PyTorch

Dear developer,

I tried to install both parallelproj and pytorch-CUDA on Linux, but I encountered some issues.

If I follow the instructions in the documentation using conda install -c conda-forge parallelproj pytorch, torch will be the CPU version.

However, when I try to install them separately (regardless of whether I install parallelproj first or pytorch first), the installation process of the second package always prompts dependency conflicts and fails to install after checking.

The anaconda environment I tried to install is as follows: python==3.9, cudatoolkit==11.8, pytorch==2.2.1 / 2.3.1.

I would like to know if parallelproj and pytorch-CUDA need to be in specific versions to coexist, and if so, what the compatible version combination is?

Not reasonable result

Hi, I have a follow-up question.

Currently I have a 3D PET volume with size (344, 127, 344) from BrainWeb and want to get sinograms from it. Do you have recommended settings for the scanner and LOR descriptor? I tried some, but the results do not look reasonable. Code here:

num_rings = 5
scanner = parallelproj.RegularPolygonPETScannerGeometry(
    xp,
    dev,
    radius=400.0,
    num_sides=32,
    num_lor_endpoints_per_side=16,
    lor_spacing=4.3,
    ring_positions=xp.linspace(-70, 70, 36),
    symmetry_axis=1,
)

lor_desc = parallelproj.RegularPolygonPETLORDescriptor(
    scanner,
    radial_trim=10,
    max_ring_difference=2,
    sinogram_order=parallelproj.SinogramSpatialAxisOrder.RVP,
)

proj = parallelproj.RegularPolygonPETProjector(
    lor_desc, img_shape=(344, 127, 344), voxel_size=(2.0, 2.0, 2.0)
)

And get this result:

(attached sinogram image)

Thanks for your time!

Questions regarding the positive and negative signs of the TOF-bin numbering

Dear Professor Georg Schramm,

I have some questions regarding the PET TOF listmode projector and the simulation of listmode events with TOF in the convert_sinogram_to_listmode function. I noticed that the TOF-bins for these listmode events are arranged as [-k, ..., 0, ..., k]. Could you please clarify the convention for the positive and negative signs of the TOF-bin numbering? Specifically, does TOF-bin[-k] represent the bin closer to one end of the LOR? For instance, does TOF-bin[-k] indicate the bin is closer to the xstart end, and TOF-bin[k] closer to the xend end?

Thank you for your assistance.

Number of bins in TOF Projectors

I may have missed something in the documentation, but I suspect the following may be valuable to specify:

  • The argument tof_bins in joseph3d_fwd_tof_lm (and other corresponding functions) should be explicitly specified as Array[int]. In particular, it should be noted that only integer values can be passed here.

The reason I say this is that I was using float values for the tof_bins argument (namely ..., -1.5, -0.5, 0.5, 1.5, ...) because I was binning with an even number of bins, so there was no "central" bin. In the C implementation, I see that the tof_bins must be converted to shorts. So I think an error should be thrown if the datatype of tof_bins is not integer.
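A validation along these lines could be sketched as follows (a hypothetical helper, not part of the parallelproj API):

```python
import numpy as np

def check_tof_bins(tof_bins):
    """Raise instead of silently truncating non-integer TOF bin indices."""
    arr = np.asarray(tof_bins)
    if not np.issubdtype(arr.dtype, np.integer):
        raise TypeError(f"tof_bins must have an integer dtype, got {arr.dtype}")
    # the C implementation stores TOF bin indices as shorts
    return arr.astype(np.int16)
```

With such a check, passing `[-1.5, -0.5, 0.5, 1.5]` raises a TypeError instead of being silently truncated toward zero to `[-1, 0, 0, 1]`.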

Installation of parallelproj fails

Hello,
I am trying to install parallelproj on an ubuntu 22.03 machine with an NVIDIA GPU and CUDA libraries installed. I have followed the installation instructions in the docs (using conda) to install parallelproj, cupy, and pytorch. Everything is fine, but the first example fails.

>>> import array_api_compat.cupy as xp
>>> import parallelproj
/home/investigator/anaconda3/envs/epprj/lib/python3.12/site-packages/parallelproj/backend.py:18: UserWarning: The numpy.array_api submodule is still experimental. See NEP 47.
  from numpy.array_api._array_object import Array
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/investigator/anaconda3/envs/epprj/lib/python3.12/site-packages/parallelproj/__init__.py", line 1, in <module>
    from .backend import (
  File "/home/investigator/anaconda3/envs/epprj/lib/python3.12/site-packages/parallelproj/backend.py", line 171, in <module>
    raise ImportError(
ImportError: Cannot find parallelproj cuda lib. Consider settting the environment variable PARALLELPROJ_CUDA_LIB.

I have checked that, indeed, the library libparallelproj_c is installed but not libparallelproj_cuda (this is the reason why the error appears).

Thanks for your consideration, and congratulations on what appears to be a very nice (and necessary) piece of software.

Issue regarding the negative Poisson log-likelihood function using sinogram data and list-mode data

Dear Professor Georg Schramm,

Thank you very much for your previous response regarding my question about the log-likelihood function! Based on my understanding of your reply, the Ote version of the list-mode negative Poisson log-likelihood function can be rewritten as:

$\log L = \sum_i \bar{y}_i(x) - y_i \log(\bar{y}_i(x))$

However, I noticed that this form is consistent with the negative Poisson log-likelihood function using sinogram data mentioned in this example(https://parallelproj.readthedocs.io/en/stable/auto_examples/05_algorithms/01_run_mlem_basic.html#sphx-glr-auto-examples-05-algorithms-01-run-mlem-basic-py), but differs from the list-mode negative Poisson log-likelihood function described here(https://parallelproj.readthedocs.io/en/stable/auto_examples/06_listmode_algorithms/01_listmode_mlem.html#sphx-glr-auto-examples-06-listmode-algorithms-01-listmode-mlem-py):

$\log L = \sum_i \bar{y}_i(x) - \bar{y}_i(x) \log(y_i)$

Could you please clarify whether these two formulas are equivalent? If not, which one is the correct list-mode negative Poisson log-likelihood function?

Thank you again for your assistance.

Sincerely,
Kun Tian
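As general background (and not a statement about the specific documentation pages cited above): the standard negative Poisson log-likelihood, dropping the log(y_i!) term that does not depend on x, and its listmode counterpart can be evaluated as below. The two agree because bins with y_i = 0 contribute nothing to the data term, so summing over all bins or only over detected events is equivalent:

```python
import numpy as np

def neg_poisson_logl(ybar, y):
    """Sinogram form: sum_i ybar_i - y_i * log(ybar_i)."""
    ybar = np.asarray(ybar, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum(ybar - y * np.log(ybar)))

def neg_poisson_logl_lm(ybar, event_bins):
    """Listmode form: total expected counts minus log(ybar) at each event."""
    ybar = np.asarray(ybar, dtype=float)
    return float(np.sum(ybar) - np.sum(np.log(ybar[event_bins])))
```

The two functions return the same value whenever `event_bins` lists each sinogram bin i exactly y_i times.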

Linking issue with libpthread

This is something that apparently occurs on modern Ubuntu systems.
Understanding of this issue has tested my knowledge but I think the fix/workaround is worth documenting.

I am building on the following system (WSL Ubuntu 22.04 LTS):

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.2 LTS
Release:        22.04
Codename:       jammy

Installing with CUDA installed via sudo apt-get install nvidia-cuda-toolkit

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0

I get the following error.

[ 52%] Linking CUDA device code CMakeFiles/parallelproj_cuda.dir/cmake_device_link.o
nvlink warning : Skipping incompatible '/lib/x86_64-linux-gnu/librt.a' when searching for -lrt
nvlink warning : Skipping incompatible '/usr/lib/x86_64-linux-gnu/librt.a' when searching for -lrt
nvlink warning : Skipping incompatible '/lib/x86_64-linux-gnu/libpthread.a' when searching for -lpthread
nvlink warning : Skipping incompatible '/usr/lib/x86_64-linux-gnu/libpthread.a' when searching for -lpthread
nvlink warning : Skipping incompatible '/lib/x86_64-linux-gnu/libdl.a' when searching for -ldl
nvlink warning : Skipping incompatible '/usr/lib/x86_64-linux-gnu/libdl.a' when searching for -ldl
nvlink fatal   : Could not open input file '/usr/lib/x86_64-linux-gnu/libpthread.a'
make[2]: *** [cuda/CMakeFiles/parallelproj_cuda.dir/build.make:211: cuda/CMakeFiles/parallelproj_cuda.dir/cmake_device_link.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:985: cuda/CMakeFiles/parallelproj_cuda.dir/all] Error 2
make: *** [Makefile:146: all] Error 2

I do not fully understand this issue, but it seems to be related to /usr/lib/x86_64-linux-gnu/libpthread.a being only 8 bytes with modern versions of libc.

The best description I found was here: https://matsci.org/t/lammps-users-kokkos-linker-error-nvidia-libdl-a/41050


The fix I implemented for this was to change OpenMP_pthread_LIBRARY from /usr/lib/x86_64-linux-gnu/libpthread.a to /usr/lib/x86_64-linux-gnu/libpthread.so.0

This allows parallelproj to build and the tests pass.

Any input from @gschramm or @KrisThielemans would be appreciated.

Issue regarding the list-mode negative Poisson log-likelihood function

Dear Professor Georg Schramm,

I recently reviewed the various algorithm examples in the Iterative Listmode Algorithm Examples and noticed your expression for the list-mode negative Poisson log-likelihood function:

(screenshot of the expression)

I would like to confirm with you whether this expression aligns with the Listmode re-formulation of the sinogram-based minimization problem in your paper "Fast and memory-efficient reconstruction of sparse Poisson data in listmode with non-smooth priors with application to time-of-flight PET."

Additionally, I observed that your expression for the list-mode negative Poisson log-likelihood function appears to differ from those mentioned in other literature, such as:

(screenshot of the expression from the reference below)
K. Ote, F. Hashimoto, Y. Onishi, T. Isobe and Y. Ouchi, "List-Mode PET Image Reconstruction Using Deep Image Prior," IEEE Transactions on Medical Imaging, vol. 42, no. 6, pp. 1822-1834, June 2023, doi: 10.1109/TMI.2023.3239596.

Can these two expressions be considered fundamentally equivalent?

I look forward to your response. Thank you!

Sincerely,
Kun Tian

build failure without CUDA

I tried to build v1.2.9 (admittedly on an Ubuntu 16.04 VM, but I don't think that matters) and get

Compiler requires the CUDA toolkit.  Please set the CUDAToolkit_ROOT
  variable.

It looks like I need to set SKIP_CUDA_LIB. That's ok I guess, but it creates some difficulty for the SIRF-SuperBuild. We'll have to detect CUDA first and then set the variable accordingly. Not too hard I suppose (we probably do this already a bit).

It's not so easy to decide the best way forward for this. @gschramm, what do you think?

STIR now has linking errors when using parallelproj

One of the recent changes has broken something as we now have
https://github.com/SyneRBI/SIRF-SuperBuild/runs/2313382843?check_suite_focus=true#step:6:13025

/home/runner/install/lib/librecon_buildblock.a(ForwardProjectorByBinParallelproj.cxx.o): In function `stir::ForwardProjectorByBinParallelproj::set_input(stir::DiscretisedDensity<3, float> const&)':
ForwardProjectorByBinParallelproj.cxx:(.text+0xf94): undefined reference to `joseph3d_fwd(float const*, float const*, float const*, float const*, float const*, float*, long long, int const*)'

correct TOF start/end ranges and normalize TOF kernel

  • right now we calculate the relevant TOF bins (sinogram mode) and the start and stop positions (listmode) based on "n * sigma_tof"
  • this is not correct if the TOF bin width is >> sigma_tof
  • the TOF kernel is also not normalized (because it is truncated)

proposed fix

  1. use "effective sigma" when calculating start and stop ranges -> sqrt(sig_tof^2 + (binwidth^2) / 12)
  2. calculate missing part of Gaussian integral due to truncation:
    truncated_int = erf((num_sig * sig_eff + 0.5 * binwidth) / (sqrt(2) * sig_eff))
    or
    truncated_int = erf((num_sig * sig_eff) / (sqrt(2) * sig_eff)) = erf(num_sig/sqrt(2))
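The proposed quantities can be sketched numerically (a sketch of the proposal above, not the shipped implementation):

```python
import math

def tof_truncation(sig_tof, binwidth, num_sig=3.0):
    """Effective sigma and the fraction of the Gaussian integral retained
    when the TOF kernel is truncated at +/- (num_sig * sig_eff + binwidth/2)."""
    sig_eff = math.sqrt(sig_tof ** 2 + binwidth ** 2 / 12.0)
    retained = math.erf((num_sig * sig_eff + 0.5 * binwidth)
                        / (math.sqrt(2.0) * sig_eff))
    return sig_eff, retained
```

Dividing the truncated kernel values by `retained` would renormalize the kernel to unit integral; for binwidth = 0 the expression reduces to erf(num_sig / sqrt(2)), matching the second form above.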

proposed new tests

  • fwd project a point in non-TOF and TOF sinogram mode + check whether the sum over TOF bins equals the non-TOF projection

import parallelproj error

Hi, after installing with "conda install -c conda-forge parallelproj",

I got the following error when running the code at https://parallelproj.readthedocs.io/en/stable/auto_examples/02_pet_sinogram_projections/01_run_pet_non_tof_sinogram_projector.html#sphx-glr-auto-examples-02-pet-sinogram-projections-01-run-pet-non-tof-sinogram-projector-py:

ImportError: Cannot find parallelproj cuda lib. Consider settting the environment variable PARALLELPROJ_CUDA_LIB.

Do you know why this happened? Thanks!

My system: Ubuntu 20.04.6 LTS (GNU/Linux 5.15.0-91-generic x86_64)
