
CEED Library: Code for Efficient Extensible Discretizations

Home Page: https://libceed.org

License: BSD 2-Clause "Simplified" License


libCEED: Efficient Extensible Discretization

Summary and Purpose

libCEED provides fast algebra for element-based discretizations, designed for performance portability, run-time flexibility, and clean embedding in higher-level libraries and applications. It offers a C99 interface as well as bindings for Fortran, Python, Julia, and Rust. While our focus is on high-order finite elements, the approach is mostly algebraic and thus applicable to other discretizations in factored form, as explained in the user manual and the API implementation portion of the documentation.

One of the challenges with high-order methods is that a global sparse matrix is no longer a good representation of a high-order linear operator, both with respect to the FLOPs needed for its evaluation, as well as the memory transfer needed for a matvec. Thus, high-order methods require a new "format" that still represents a linear (or more generally non-linear) operator, but not through a sparse matrix.

The goal of libCEED is to propose such a format, as well as supporting implementations and data structures, that enable efficient operator evaluation on a variety of computational device types (CPUs, GPUs, etc.). This new operator description is based on algebraically factored form, which is easy to incorporate in a wide variety of applications, without significant refactoring of their own discretization infrastructure.
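The factored form can be made concrete with a toy example. The sketch below is plain Python, not the libCEED API: it applies a 1D mass operator as the composition restriction, basis, pointwise quadrature scaling, and their transposes (schematically A = P^T B^T D B P), and never assembles a matrix. All sizes and names are illustrative.

```python
# Toy sketch (not the libCEED API): apply a 1D mass operator in factored
# form A = P^T B^T D B P, where P gathers element dofs, B interpolates
# to quadrature points, and D holds quadrature weights times the
# Jacobian determinant.
n_elem, h = 4, 0.25                       # 4 linear elements on [0, 1]
# P: element e uses global dofs [e, e+1]
elem_dofs = [[e, e + 1] for e in range(n_elem)]
# B: linear basis evaluated at 2-point Gauss quadrature on [-1, 1]
g = 1.0 / 3**0.5
B = [[(1 - x) / 2, (1 + x) / 2] for x in (-g, g)]
# D: quadrature weight (1.0 each) times Jacobian h/2
D = [h / 2, h / 2]

def apply_mass(u):
    v = [0.0] * (n_elem + 1)
    for dofs in elem_dofs:
        ue = [u[i] for i in dofs]                      # P: gather
        uq = [sum(B[q][j] * ue[j] for j in range(2))   # B: interpolate
              for q in range(2)]
        vq = [D[q] * uq[q] for q in range(2)]          # D: pointwise scale
        for j, i in enumerate(dofs):                   # B^T, then P^T: scatter
            v[i] += sum(B[q][j] * vq[q] for q in range(2))
    return v

# Applying the mass operator to u = 1 integrates the basis functions,
# so the entries of v sum to the domain length.
v = apply_mass([1.0] * (n_elem + 1))
print(sum(v))  # ~1.0, up to rounding
```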

The repository is part of the CEED software suite, a collection of software benchmarks, miniapps, libraries and APIs for efficient exascale discretizations based on high-order finite element and spectral element methods. See http://github.com/ceed for more information and source code availability.

The CEED research is supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of two U.S. Department of Energy organizations (Office of Science and the National Nuclear Security Administration) responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering and early testbed platforms, in support of the nation’s exascale computing imperative.

For more details on the CEED API see the user manual.

% gettingstarted-inclusion-marker

Building

The CEED library, libceed, is a C99 library with no required dependencies, and with Fortran, Python, Julia, and Rust interfaces. It can be built using:

$ make

or, with optimization flags:

$ make OPT='-O3 -march=skylake-avx512 -ffp-contract=fast'

These optimization flags are used by all languages (C, C++, Fortran) and this makefile variable can also be set for testing and examples (below).

The library attempts to automatically detect support for the AVX instruction set using gcc-style compiler options for the host. Support may need to be manually specified via:

$ make AVX=1

or:

$ make AVX=0

if your compiler does not support gcc-style options, if you are cross compiling, etc.

To enable CUDA support, add CUDA_DIR=/opt/cuda or an appropriate directory to your make invocation. To enable HIP support, add ROCM_DIR=/opt/rocm or an appropriate directory. To enable SYCL support, add SYCL_DIR=/opt/sycl or an appropriate directory. Note that SYCL backends require building with oneAPI compilers as well:

$ . /opt/intel/oneapi/setvars.sh
$ make SYCL_DIR=/opt/intel/oneapi/compiler/latest/linux SYCLCXX=icpx CC=icx CXX=icpx

The library can be configured for host applications that use OpenMP parallelism via:

$ make OPENMP=1

which allows operators to be created and applied from different threads inside an omp parallel region.

To store these or other arguments as defaults for future invocations of make, use:

$ make configure CUDA_DIR=/usr/local/cuda ROCM_DIR=/opt/rocm OPT='-O3 -march=znver2'

which stores these variables in config.mk.

WebAssembly

libCEED can be built for WASM using Emscripten. For example, one can build the library and run a standalone WASM executable using

$ emmake make build/ex2-surface.wasm
$ wasmer build/ex2-surface.wasm -- -s 200000

Additional Language Interfaces

The Fortran interface is built alongside the library automatically.

Python users can install using:

$ pip install libceed

or, from a clone of the repository:

$ pip install .

Julia users can install using:

$ julia
julia> ]
pkg> add LibCEED

See the LibCEED.jl documentation for more information.

Rust users can include libCEED via Cargo.toml:

[dependencies]
libceed = "0.12.0"

See the Cargo documentation for details.

Testing

The test suite produces TAP output and is run by:

$ make test

or, using the prove tool distributed with Perl (recommended):

$ make prove

Backends

There are multiple supported backends, which can be selected at runtime in the examples:

CEED resource              Backend                                            Deterministic Capable
CPU Native
  /cpu/self/ref/serial     Serial reference implementation                    Yes
  /cpu/self/ref/blocked    Blocked reference implementation                   Yes
  /cpu/self/opt/serial     Serial optimized C implementation                  Yes
  /cpu/self/opt/blocked    Blocked optimized C implementation                 Yes
  /cpu/self/avx/serial     Serial AVX implementation                          Yes
  /cpu/self/avx/blocked    Blocked AVX implementation                         Yes
CPU Valgrind
  /cpu/self/memcheck/*     Memcheck backends, undefined value checks          Yes
CPU LIBXSMM
  /cpu/self/xsmm/serial    Serial LIBXSMM implementation                      Yes
  /cpu/self/xsmm/blocked   Blocked LIBXSMM implementation                     Yes
CUDA Native
  /gpu/cuda/ref            Reference pure CUDA kernels                        Yes
  /gpu/cuda/shared         Optimized pure CUDA kernels using shared memory    Yes
  /gpu/cuda/gen            Optimized pure CUDA kernels using code generation  No
HIP Native
  /gpu/hip/ref             Reference pure HIP kernels                         Yes
  /gpu/hip/shared          Optimized pure HIP kernels using shared memory     Yes
  /gpu/hip/gen             Optimized pure HIP kernels using code generation   No
SYCL Native
  /gpu/sycl/ref            Reference pure SYCL kernels                        Yes
  /gpu/sycl/shared         Optimized pure SYCL kernels using shared memory    Yes
MAGMA
  /gpu/cuda/magma          CUDA MAGMA kernels                                 No
  /gpu/cuda/magma/det      CUDA MAGMA kernels                                 Yes
  /gpu/hip/magma           HIP MAGMA kernels                                  No
  /gpu/hip/magma/det       HIP MAGMA kernels                                  Yes
OCCA
  /*/occa                  Selects backend based on available OCCA modes      Yes
  /cpu/self/occa           OCCA backend with serial CPU kernels               Yes
  /cpu/openmp/occa         OCCA backend with OpenMP kernels                   Yes
  /cpu/dpcpp/occa          OCCA backend with DPC++ kernels                    Yes
  /gpu/cuda/occa           OCCA backend with CUDA kernels                     Yes
  /gpu/hip/occa            OCCA backend with HIP kernels                      Yes

The /cpu/self/*/serial backends process one element at a time and are intended for meshes with a smaller number of high-order elements. The /cpu/self/*/blocked backends process blocked batches of eight interlaced elements and are intended for meshes with larger numbers of elements.
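The interlacing can be pictured with a small sketch. This is illustrative only, not libCEED's exact internal layout: a hypothetical to_blocked helper transposes a batch of 8 elements so that node i of all 8 elements is contiguous, letting one SIMD lane handle one element.

```python
# Illustrative layout sketch (not libCEED's actual internal format):
# "blocked" backends interleave a batch of BLK elements so node i of
# all BLK elements sits contiguously, which is SIMD-friendly.
BLK = 8
elemsize = 3

def to_blocked(e_vec):
    """e_vec[e][i] -> blocked[b][i][lane], where lane = element in block."""
    nblk = len(e_vec) // BLK
    return [[[e_vec[b * BLK + lane][i] for lane in range(BLK)]
             for i in range(elemsize)]
            for b in range(nblk)]

# 16 elements, each with 3 nodes; value 100*e + i marks (element, node)
e_vec = [[100 * e + i for i in range(elemsize)] for e in range(16)]
blocked = to_blocked(e_vec)
# Node 0 of elements 0..7 is now one contiguous vector:
print(blocked[0][0])  # [0, 100, 200, 300, 400, 500, 600, 700]
```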

The /cpu/self/ref/* backends are written in pure C and provide basic functionality.

The /cpu/self/opt/* backends are written in pure C and use partial e-vectors to improve performance.

The /cpu/self/avx/* backends rely upon AVX instructions to provide vectorized CPU performance.

The /cpu/self/memcheck/* backends rely upon the Valgrind Memcheck tool to help verify that user QFunctions have no undefined values. To use, run your code with Valgrind and the Memcheck backends, e.g., valgrind ./build/ex1 -ceed /cpu/self/memcheck/serial. A 'development' or 'debugging' version of Valgrind with headers is required to use this backend. This backend can be run in serial or blocked mode and defaults to serial mode if /cpu/self/memcheck is selected at runtime.

The /cpu/self/xsmm/* backends rely upon the LIBXSMM package to provide vectorized CPU performance. If linking MKL and LIBXSMM is desired but the Makefile is not detecting MKLROOT, linking libCEED against MKL can be forced by setting the environment variable MKL=1. The LIBXSMM main development branch from 7 April 2024 or newer is required.

The /gpu/cuda/* backends provide GPU performance strictly using CUDA.

The /gpu/hip/* backends provide GPU performance strictly using HIP. They are based on the /gpu/cuda/* backends. ROCm version 4.2 or newer is required.

The /gpu/sycl/* backends provide GPU performance strictly using SYCL. They are based on the /gpu/cuda/* and /gpu/hip/* backends.

The /gpu/*/magma/* backends rely upon the MAGMA package. To enable the MAGMA backends, the environment variable MAGMA_DIR must point to the top-level MAGMA directory, with the MAGMA library located in $(MAGMA_DIR)/lib/. By default, MAGMA_DIR is set to ../magma; to build the MAGMA backends with a MAGMA installation located elsewhere, create a link to magma/ in libCEED's parent directory, or set MAGMA_DIR to the proper location. MAGMA version 2.5.0 or newer is required. Currently, each MAGMA library installation is only built for either CUDA or HIP. The corresponding set of libCEED backends (/gpu/cuda/magma/* or /gpu/hip/magma/*) will automatically be built for the version of the MAGMA library found in MAGMA_DIR.

Users can specify a device for all CUDA, HIP, and MAGMA backends by adding :device_id=# after the resource name. For example:

  • /gpu/cuda/gen:device_id=1

The /*/occa backends rely upon the OCCA package to provide cross-platform performance. To enable the OCCA backends, the environment variable OCCA_DIR must point to the top-level OCCA directory, with the OCCA library located in ${OCCA_DIR}/lib (by default, OCCA_DIR is set to ../occa). OCCA version 1.4.0 or newer is required.

Users can pass specific OCCA device properties after setting the CEED resource. For example:

  • "/*/occa:mode='CUDA',device_id=0"
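Resource strings like the ones above follow a simple shape: a backend path, an optional colon, then comma-separated key=value options. A hypothetical helper (not libCEED's actual parser) can split them apart:

```python
# Hypothetical helper (not libCEED's actual parser) showing how a CEED
# resource string with trailing options can be split apart.
def parse_resource(spec):
    backend, _, opts = spec.partition(":")
    options = {}
    for item in filter(None, opts.split(",")):
        key, _, val = item.partition("=")
        options[key] = val.strip("'\"")   # drop optional quoting
    return backend, options

print(parse_resource("/gpu/cuda/gen:device_id=1"))
# ('/gpu/cuda/gen', {'device_id': '1'})
print(parse_resource("/*/occa:mode='CUDA',device_id=0"))
# ('/*/occa', {'mode': 'CUDA', 'device_id': '0'})
```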

Bit-for-bit reproducibility is important in some applications. However, some libCEED backends use non-deterministic operations, such as atomicAdd for increased performance. The backends which are capable of generating reproducible results, with the proper compilation options, are highlighted in the list above.
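The root cause is that floating-point addition is not associative, so a backend using atomicAdd may accumulate element contributions in a different order on each run, perturbing the low-order bits. A two-line demonstration:

```python
# Floating-point addition is not associative: the same terms summed in
# a different grouping (as atomicAdd-based accumulation can do from run
# to run) give results that differ in the last bits.
a = (0.1 + 0.2) + 0.3   # one accumulation order
b = 0.1 + (0.2 + 0.3)   # same terms, different order
print(a == b)           # False
print(a, b)             # 0.6000000000000001 0.6
```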

Examples

libCEED comes with several examples of its usage, ranging from standalone C codes in the /examples/ceed directory to examples based on external packages, such as MFEM, PETSc, and Nek5000. Nek5000 v18.0 or greater is required.

To build the examples, set the MFEM_DIR, PETSC_DIR (and optionally PETSC_ARCH), and NEK5K_DIR variables and run:

$ cd examples/

% running-examples-inclusion-marker

# libCEED examples on CPU and GPU
$ cd ceed/
$ make
$ ./ex1-volume -ceed /cpu/self
$ ./ex1-volume -ceed /gpu/cuda
$ ./ex2-surface -ceed /cpu/self
$ ./ex2-surface -ceed /gpu/cuda
$ cd ..

# MFEM+libCEED examples on CPU and GPU
$ cd mfem/
$ make
$ ./bp1 -ceed /cpu/self -no-vis
$ ./bp3 -ceed /gpu/cuda -no-vis
$ cd ..

# Nek5000+libCEED examples on CPU and GPU
$ cd nek/
$ make
$ ./nek-examples.sh -e bp1 -ceed /cpu/self -b 3
$ ./nek-examples.sh -e bp3 -ceed /gpu/cuda -b 3
$ cd ..

# PETSc+libCEED examples on CPU and GPU
$ cd petsc/
$ make
$ ./bps -problem bp1 -ceed /cpu/self
$ ./bps -problem bp2 -ceed /gpu/cuda
$ ./bps -problem bp3 -ceed /cpu/self
$ ./bps -problem bp4 -ceed /gpu/cuda
$ ./bps -problem bp5 -ceed /cpu/self
$ ./bps -problem bp6 -ceed /gpu/cuda
$ cd ..

$ cd petsc/
$ make
$ ./bpsraw -problem bp1 -ceed /cpu/self
$ ./bpsraw -problem bp2 -ceed /gpu/cuda
$ ./bpsraw -problem bp3 -ceed /cpu/self
$ ./bpsraw -problem bp4 -ceed /gpu/cuda
$ ./bpsraw -problem bp5 -ceed /cpu/self
$ ./bpsraw -problem bp6 -ceed /gpu/cuda
$ cd ..

$ cd petsc/
$ make
$ ./bpssphere -problem bp1 -ceed /cpu/self
$ ./bpssphere -problem bp2 -ceed /gpu/cuda
$ ./bpssphere -problem bp3 -ceed /cpu/self
$ ./bpssphere -problem bp4 -ceed /gpu/cuda
$ ./bpssphere -problem bp5 -ceed /cpu/self
$ ./bpssphere -problem bp6 -ceed /gpu/cuda
$ cd ..

$ cd petsc/
$ make
$ ./area -problem cube -ceed /cpu/self -degree 3
$ ./area -problem cube -ceed /gpu/cuda -degree 3
$ ./area -problem sphere -ceed /cpu/self -degree 3 -dm_refine 2
$ ./area -problem sphere -ceed /gpu/cuda -degree 3 -dm_refine 2

$ cd fluids/
$ make
$ ./navierstokes -ceed /cpu/self -degree 1
$ ./navierstokes -ceed /gpu/cuda -degree 1
$ cd ..

$ cd solids/
$ make
$ ./elasticity -ceed /cpu/self -mesh [.exo file] -degree 2 -E 1 -nu 0.3 -problem Linear -forcing mms
$ ./elasticity -ceed /gpu/cuda -mesh [.exo file] -degree 2 -E 1 -nu 0.3 -problem Linear -forcing mms
$ cd ..

For the last example shown, sample meshes to be used in place of [.exo file] can be found at https://github.com/jeremylt/ceedSampleMeshes.

The above code assumes a GPU-capable machine with the CUDA backends enabled. Depending on the available backends, other CEED resource specifiers can be provided with the -ceed option. Other command line arguments can be found in examples/petsc.

% benchmarks-marker

Benchmarks

A sequence of benchmarks for all enabled backends can be run using:

$ make benchmarks

The results from the benchmarks are stored in the benchmarks/ directory and can be viewed using the following commands (requires Python with matplotlib):

$ cd benchmarks
$ python postprocess-plot.py petsc-bps-bp1-*-output.txt
$ python postprocess-plot.py petsc-bps-bp3-*-output.txt

Using the benchmarks target runs a comprehensive set of benchmarks, which may take some time. Subsets of the benchmarks can be run using the scripts in the benchmarks/ folder.

For more details about the benchmarks, see the benchmarks/README.md file.

Install

To install libCEED, run:

$ make install prefix=/path/to/install/dir

or (e.g., if creating packages):

$ make install prefix=/usr DESTDIR=/packaging/path

To build and install in separate steps, run:

$ make for_install=1 prefix=/path/to/install/dir
$ make install prefix=/path/to/install/dir

The usual variables like CC and CFLAGS are used, and optimization flags for all languages can be set using the likes of OPT='-O3 -march=native'. Use STATIC=1 to build static libraries (libceed.a).

To install libCEED for Python, run:

$ pip install libceed

with the desired setuptools options, such as --user.

pkg-config

In addition to the library and headers, libCEED provides a pkg-config file that can be used to easily compile and link. For example, if $prefix is a standard location or you set the environment variable PKG_CONFIG_PATH:

$ cc `pkg-config --cflags --libs ceed` -o myapp myapp.c

will build myapp with libCEED. This can be used with the source or installed directories. Most build systems have support for pkg-config.

Contact

You can reach the libCEED team by emailing [email protected] or by leaving a comment in the issue tracker.

How to Cite

If you use libCEED, please cite:

@article{libceed-joss-paper,
  author       = {Jed Brown and Ahmad Abdelfattah and Valeria Barra and Natalie Beams and Jean Sylvain Camier and Veselin Dobrev and Yohann Dudouit and Leila Ghaffari and Tzanio Kolev and David Medina and Will Pazner and Thilina Ratnayaka and Jeremy Thompson and Stan Tomov},
  title        = {{libCEED}: Fast algebra for high-order element-based discretizations},
  journal      = {Journal of Open Source Software},
  year         = {2021},
  publisher    = {The Open Journal},
  volume       = {6},
  number       = {63},
  pages        = {2945},
  doi          = {10.21105/joss.02945}
}

The archival copy of the libCEED user manual is maintained on Zenodo. To cite the user manual:

@misc{libceed-user-manual,
  author       = {Abdelfattah, Ahmad and
                  Barra, Valeria and
                  Beams, Natalie and
                  Brown, Jed and
                  Camier, Jean-Sylvain and
                  Dobrev, Veselin and
                  Dudouit, Yohann and
                  Ghaffari, Leila and
                  Grimberg, Sebastian and
                  Kolev, Tzanio and
                  Medina, David and
                  Pazner, Will and
                  Ratnayaka, Thilina and
                  Shakeri, Rezgar and
                  Thompson, Jeremy L and
                  Tomov, Stanimire and
                  Wright III, James},
  title        = {{libCEED} User Manual},
  month        = nov,
  year         = 2023,
  publisher    = {Zenodo},
  version      = {0.12.0},
  doi          = {10.5281/zenodo.10062388}
}

For libCEED's Python interface please cite:

@InProceedings{libceed-paper-proc-scipy-2020,
  author    = {{V}aleria {B}arra and {J}ed {B}rown and {J}eremy {T}hompson and {Y}ohann {D}udouit},
  title     = {{H}igh-performance operator evaluations with ease of use: lib{C}{E}{E}{D}'s {P}ython interface},
  booktitle = {{P}roceedings of the 19th {P}ython in {S}cience {C}onference},
  pages     = {85 - 90},
  year      = {2020},
  editor    = {{M}eghann {A}garwal and {C}hris {C}alloway and {D}illon {N}iederhut and {D}avid {S}hupe},
  doi       = {10.25080/Majora-342d178e-00c}
}

The BibTeX entries for these references can be found in the doc/bib/references.bib file.

Copyright

The following copyright applies to each file in the CEED software suite, unless otherwise stated in the file:

Copyright (c) 2017-2024, Lawrence Livermore National Security, LLC and other CEED contributors. All rights reserved.

See files LICENSE and NOTICE for details.

libceed's Issues

Create gallery of QFunctions (mass, Laplace, etc.)

Public interface something like

int CeedQFunctionCreateFromGallery(Ceed ceed, const char *spec, CeedQFunction *qf);

where spec is something like "Mass" or "Mass:dim=3,rho=10". (Many possible syntax choices for options.)

Doxygen Image

The image in the API documentation page is not showing up on Doxygen. It should be an easy fix.

Add debug-mode checks that every GetArray has a matching RestoreArray

CeedVectorGetArray and CeedVectorGetArrayRead are supposed to always be paired with CeedVectorRestoreArray and CeedVectorRestoreArrayRead respectively, but sometimes users and even backend developers such as myself accidentally forget. In debug mode, the interface should track access so that it can warn in case of non-exclusive write access or if CeedVectorDestroy is called while some caller has access. When available, this functionality could print a stack trace using backtrace from glibc or libbacktrace.
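A minimal sketch of such access tracking, in Python with names that mirror the C API (get_array/restore_array and friends are illustrative, not the actual libCEED implementation):

```python
# Sketch of the proposed debug-mode check: track outstanding array
# accesses so an unmatched GetArray, non-exclusive write access, or a
# destroy-while-borrowed can be reported. Names are illustrative.
class DebugVector:
    def __init__(self, data):
        self.data = data
        self.readers = 0       # outstanding read accesses
        self.writer = False    # outstanding write access

    def get_array(self):       # exclusive write access
        if self.writer or self.readers:
            raise RuntimeError("non-exclusive write access")
        self.writer = True
        return self.data

    def get_array_read(self):  # shared read access
        if self.writer:
            raise RuntimeError("read while write access is held")
        self.readers += 1
        return self.data

    def restore_array(self):
        self.writer = False

    def restore_array_read(self):
        self.readers -= 1

    def destroy(self):
        if self.writer or self.readers:
            raise RuntimeError("destroyed while an array access is held")

vec = DebugVector([0.0] * 4)
arr = vec.get_array()
try:
    vec.destroy()              # forgot restore_array: flagged
except RuntimeError as err:
    print(err)
vec.restore_array()
vec.destroy()                  # properly paired: no complaint
```

A real implementation would live behind a debug-build flag and, as suggested above, could record a backtrace at each get to point at the unmatched call site.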

Add CeedVectorSyncArray

We need to add a new frontend function CeedVectorSyncArray to signal to GPU backends that CeedVector data needs to be pushed to the host.

Test OPApply 2x

I found a bug in my active-passive code for ref that was not covered in our tests. I think we should have a test that covers repeated application of the same operator to help future users who are building backends of their own.

Add Nek example

Add simple Nek5000 example for CEED's BP1 and maybe BP3 that use the library for the operator evaluation.

CeedOklPath Error On CU Summit

On CU's Summit, using Intel compilers, I get the following error when running the Fortran q-function and operator tests, for both ocl and omp.
+CEED-OCCA error @ /home/jeth8984/libceed/backends/occa/ceed-occa-okl.c:58 CeedOklPath_Occa

Strangely, there is no similar error on ex1 for ocl or omp.

Full text of the error is here: Error Output
Modules loaded is here: Modules List

This appears to be related to the issue discussed in PR #68

Proposed API improvements

After going through the new active/passive changes, it seems that some parts of the code expect an emode to be a single CeedEvalMode while others still accept multiple values ANDed together. When a user calls CeedQFunctionAddInput or CeedQFunctionAddOutput, it seems reasonable to expect only a single emode to be provided. If so, I think the basis API can be cleaned up a bit.

Proposed changes:
Replace

int (*Apply)(CeedBasis, CeedTransposeMode, CeedEvalMode, const CeedScalar *,
CeedScalar *);

with function pointers for interp, grad, weight, div, curl, etc. Then, have CeedBasisApply be the single function that switches on emode and calls the appropriate function pointer. Currently there are many switch statements scattered across the basis and operator files, which makes the code fragile. If a new feature like a Laplacian were added, an unnecessarily large number of files would need to be updated. This should reduce code duplication and allow for simplifications in backend operators.
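A minimal sketch of the proposed dispatch, in Python with stand-in mode implementations (illustrative only; the real change would use C function pointers in the backend's CeedBasis struct):

```python
# Sketch of the proposed refactor (illustrative, not the libCEED API):
# one table of per-mode functions replaces a switch on emode in every
# backend file; CeedBasisApply-style dispatch goes through the table.
def interp(u):  return list(u)            # stand-in: identity interpolation
def grad(u):    return [0.0] * len(u)     # stand-in: zero gradient
def weight(u):  return [0.25] * len(u)    # stand-in: quadrature weights

class Basis:
    def __init__(self):
        # adding a new mode (e.g. a Laplacian) means adding one table
        # entry, not editing N scattered switch statements
        self.modes = {"interp": interp, "grad": grad, "weight": weight}

    def apply(self, emode, u):
        return self.modes[emode](u)

basis = Basis()
print(basis.apply("interp", [1.0, 2.0]))  # [1.0, 2.0]
```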

Add MAGMA continuous integration

Lack of MAGMA CI is significantly slowing other project activities and runs a risk of the MAGMA backend ceasing to function due to other refactoring. The current backend only supports GPUs, but the MAGMA library is capable of targeting CPU code, so it should be possible to make it run on Travis-CI. Alternatively, we could use this gitlab-ci infrastructure that is being developed for ECP:
https://www.onyxpoint.com/account-level-ci-access-management-with-gitlab-setuid-runners/

t17-basis-f.f Failing

Although the Travis build seems to be fine, I am getting a segfault when I run t17-basis-f.f. I believe the culprit is

call ceedbasiscreatetensorh1lagrange(ceed,dimn,2,p,q,
$ ceed_gauss,bug,err)

The 2 should be changed to 1 to match the C version:
CeedBasisCreateTensorH1Lagrange(ceed, dim, 1, P, Q, CEED_GAUSS, &bug);

OCCA With Ceed Ex1

When running ex1 using an OCCA backend, there is a segfault when creating the restriction identity when 'prob_size = 256*1024' or larger. When 'prob_size' is half as big, there is no issue.

Apparent race condition when creating OCCA kernels

To reproduce, break (^C) while a kernel is compiling:

$ rm -r ~/.occa/
$ CEED_DEBUG=1 ./bp1-petsc -ceed_resource /cpu/occa
Process decomposition: 1 1 1                                                                     
Local elements: 1000 = 10 10 10                                                                            
Owned dofs: 1331 = 11 11 11                                                   
[CeedInit] resource: /cpu/occa                                                                                                                                                                 
[CeedInit] deviceID: 0                                                                                            
[CeedInit] mode: mode: 'Serial'
Finding compiler vendor: g++ /home/jed/.occa/cache/b00dc65d91a54606/compilerVendorTest.cpp -o /home/jed/.occa/cache/b00dc65d91a54606/binary > /home/jed/.occa/cache/b00dc65d91a54606/build.log
2>&1                                                                                          
[CeedInit] returned deviceMode: Serial                                                        
[CeedElemRestriction][Create]                                                                                                                                                                  
[CeedElemRestriction][Create] Allocating                                                                                                                                                       
[CeedElemRestriction][Create] Building kRestrict                                                                                                                                               
[CeedElemRestriction][Create] filename=/home/jed/src/libceed/backends/occa/ceed-occa-restrict.okl                                                                                              
Compiling [kRestrict0]                                                                                                                                                                         
g++ -x c++ -fPIC -shared -g /home/jed/.occa/cache/da672fb4983a24d2/launchSource.cpp -o /home/jed/.occa/cache/da672fb4983a24d2/launch-binary -I/home/jed/src/occa/src/tools//../../include -L/ho
me/jed/src/occa/src/tools//../../lib -locca                                                                                                                                                    
                                                                                                                                                                                               
^C                                           

then try to run again

$ gdb -ex r --args ./bp1-petsc -ceed_resource /cpu/occa
No symbol table is loaded.  Use the "file" command.
Breakpoint 1 (PetscError) pending.
Reading symbols from ./bp1-petsc...done.
Starting program: /tmp/jed/petsc/bp1-petsc -ceed_resource /cpu/occa
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[New Thread 0x7fffeea8b700 (LWP 616)]
Process decomposition: 1 1 1
Local elements: 1000 = 10 10 10
Owned dofs: 1331 = 11 11 11

Thread 1 "bp1-petsc" received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00007ffff1f43a8c in occa::serial::kernel::runFromArguments(int, occa::kernelArg const*) const () from /home/jed/src/occa/lib/libocca.so
#2  0x00007ffff1ee9af7 in occa::kernel::runFromArguments() const () from /home/jed/src/occa/lib/libocca.so
#3  0x00007ffff1ecbc99 in occaKernelRunN () from /home/jed/src/occa/lib/libocca.so
#4  0x00007ffff69e859a in CeedElemRestrictionApply_Occa (r=<optimized out>, tmode=CEED_NOTRANSPOSE, ncomp=3, lmode=CEED_NOTRANSPOSE, u=0x555555cc3120, v=0x555555bd3a60, request=0x7ffff6bf7448 <ceed_request_immediate>) at /home/jed/src/libceed/backends/occa/ceed-occa-restrict.c:60
#5  0x00007ffff69e6d5a in CeedOperatorApply_Occa (op=0x555555e5cd30, qdata=0x555555bd42a0, ustate=0x555555cc3120, residual=0x0, request=0x7ffff6bf7448 <ceed_request_immediate>) at /home/jed/src/libceed/backends/occa/ceed-occa-operator.c:79
#6  0x000055555555797c in main (argc=<optimized out>, argv=<optimized out>) at bp1-petsc.c:306

This as of a820fbc.

Development Plan

I have laid out some efforts that we are working on and some suggested smaller PRs that would get us there. My intention is to coordinate our efforts and foster smaller, less disruptive, faster PRs. I really would appreciate discussion on this.

Combined List of Current Thrusts

Thrust: Performant backends
PRs / Tasks:

  • Merge CUDA backend done
  • Document performance results to share with apps
  • Compare with libParanumal

Thrust: QFunction Gallery
PRs:

  • Add QFunction names
  • Add QFunction lookup from gallery

Thrust: Create libParanumal Backend
PRs:

  • Add non tensor bases to interface done
  • Add non tensor bases to GPU backends
  • Add QFunction names/QFunction gallery
  • Add GPU template backend that dispatches to OCCA done, can delegate to any backend
  • Add libParanumal backend

Thrust: Provide easy way to benchmark performance
PRs / Tasks:

  • Make sure PETSc BPs are performant on CPUs done
  • Make sure PETSc BPs are performant on GPUs
  • Import running and plotting scripts from CEED/benchmarks repo done

Thrust: First wave of libCEED app integration
PRs / Tasks:

  • Add libCEED support in MFEM
  • Add libCEED support in Nek5000
  • Develop NS/more sophisticated apps than the BP based on libCEED

Thrust: More Compelling Examples

  • Add Navier-Stokes example in PETSc done
  • Update example to handle more complex boundary conditions
  • Update example to handle more complex meshes with DMPLEX
  • Add face integrals to interface
  • Add face integrals to backends
  • Update example to handle even more complex boundary conditions

Thrust: Single Source QFunctions
PRs:

  • Add common CUDA/OCCA/CPU QFunction file

Thrust: Strengthen Test Suite
PRs:

  • Add whitelist/blacklist done
  • Add Nek CI done
  • Add MAGMA CI
  • Add GPU CI
  • Add OSX CI done
  • Add GPU versions of tests?

Thrust: Support Complex Topology
PRs:

  • Add non tensor bases to interface done
  • Add non tensor bases to more backends
  • Add non conforming mesh restrictions
  • Add composite operators to interface done

Thrust: Documentation Improvement
PRs:

  • Document BP examples with mathematical formulation of problem done

Thrust: API Consistency
PRs:

  • Update for variable consistency across functions/objects done

@tzanio @jedbrown @v-dobrev @camierjs @stomov @Steven-Roberts @dmed256 @thilinarmtb

API Documentation

The documentation in /doc needs to be updated with active/passive sample code.

Additional backends

  • Improve OCCA backend
  • Add MFEM backend — how to support backends that don’t support JIT and don’t run on the host?
  • Add MAGMA backend?
  • Add OpenMP 4.5 backend?
  • Add pure CUDA backend?
  • Add HIP backend?

Choice of tiling and striding in OKL for kRestrict1

This code partitions the element list with one element per thread. This means that accesses to vv will be strided, with stride ncomp*elemsize. Ideally, the thread block of TILE_SIZE threads should access the data in contiguous chunks of size TILE_SIZE. The code as-is will potentially suffer significant cache misses. Does this code achieve a significant fraction of peak dram_read/write_throughput according to nvprof?

@kernel void kRestrict1(const int ncomp,
                        const int *indices,
                        const double* uu,
                        double* vv) {
  for (int e = 0; e < nelem; e++; @tile(TILE_SIZE,@outer,@inner)){
    for (int d = 0; d < ncomp; d++){
      for (int i=0; i<elemsize; i++) {
        vv[i+elemsize*(d+ncomp*e)] =
          uu[indices[i+elemsize*e]+ndof*d];
      }
    }
  }
}
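For reference, a plain-Python transcription of the same index map (with small illustrative sizes) makes the write pattern explicit: for fixed (d, i), consecutive elements e write vv at stride ncomp*elemsize, which is the strided access the question is about.

```python
# Plain-Python transcription of the kRestrict1 index map above, with
# small illustrative sizes. For fixed (d, i), consecutive elements e
# touch vv at stride elemsize*ncomp -- the strided pattern in question.
nelem, ncomp, elemsize, ndof = 2, 2, 3, 4
indices = [0, 1, 2,   1, 2, 3]            # per-element global dof numbers
uu = [float(10 * j) for j in range(ndof * ncomp)]
vv = [0.0] * (nelem * ncomp * elemsize)

for e in range(nelem):
    for d in range(ncomp):
        for i in range(elemsize):
            vv[i + elemsize * (d + ncomp * e)] = \
                uu[indices[i + elemsize * e] + ndof * d]

# Element 1, component 0 gathered global dofs 1, 2, 3:
print(vv[elemsize * ncomp * 1 : elemsize * ncomp * 1 + elemsize])
# [10.0, 20.0, 30.0]
```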

Fortran Tests with ifort

CU's Summit won't compile the newest Fortran basis tests (non-tensor basis), giving the following error:

/home/jeth8984/libceed/tests/t310-basis-f.f(5): error #5149: Illegal character in statement label field  [i]
include 't310-basis-f.h'
^
/home/jeth8984/libceed/tests/t310-basis-f.f(5): error #5149: Illegal character in statement label field  [n]
include 't310-basis-f.h'
-^
/home/jeth8984/libceed/tests/t310-basis-f.f(5): error #5149: Illegal character in statement label field  [c]
include 't310-basis-f.h'
--^
/home/jeth8984/libceed/tests/t310-basis-f.f(5): error #5149: Illegal character in statement label field  [l]
include 't310-basis-f.h'
---^
/home/jeth8984/libceed/tests/t310-basis-f.f(5): error #5149: Illegal character in statement label field  [u]
include 't310-basis-f.h'
----^
/home/jeth8984/libceed/tests/t310-basis-f.f(5): error #5118: First statement in file must not be continued
include 't310-basis-f.h'
-----^
/home/jeth8984/libceed/tests/t310-basis-f.f(5): error #5144: Invalid character_kind_parameter. No underscore
include 't310-basis-f.h'
------------------------^
/home/jeth8984/libceed/tests/t310-basis-f.f(5): error #5082: Syntax error, found CHARACTER_CONSTANT 't310-basis-f.h' when expecting one of: => = . [ % ( :
include 't310-basis-f.h'
--------^
/home/jeth8984/libceed/tests/t310-basis-f.f(5): error #6236: A specification statement cannot appear in the executable section.
include 't310-basis-f.h'
--------^

The cluster is running intel/17.4. The errors suggest the include line starts in column 1 of the fixed-form .f file, so ifort parses its first characters as statement-label text (fixed-form statements must begin in column 7).

Fortran interface

Should we support an F77 and/or F90 interface? (PETSc recently switched to the latter.)

CeedVectorSetValue

Add a utility function `CeedVectorSetValue(CeedVector vec, CeedScalar val)`. This will clean up code that zeroes out vectors.

Variable Use Consistency

The variable naming in the frontend needs to be reviewed and made consistent across functions; for example, we use ndof in conflicting ways. I want to open a PR for this after the current wave of backend PRs wraps up.

t17-basis-f 3D Failing

All of the values in sum1 are slightly off in the 3D case, leading to an error of ~200.

Making QFunction Context Const

The context for a qfunction callback is documented as an in/out parameter.

libCEED/ceed-qfunction.c

Lines 46 to 47 in 535acf4

[void *ctx][in/out] - user data, this is the 'ctx' pointer stored in
the CeedQFunction, set by calling CeedQFunctionSetContext

For the cuda and occa backends, the goal is to apply the qfunction in parallel. If the context is modified inside a qfunction, however, this could lead to race conditions and other issues. Is there any reason the context cannot be declared const?

[fortran] truncated __FILE__ line

We get an "Unterminated character constant" error when the expansion of __FILE__ is longer than ~70 characters in the Fortran QFunction files (fixed-form source is limited to 72 columns).
Is there a portable line-length option for the compilers we use?

OCCA backend

Consider adding an OCCA backend, in particular one with GPU support.

CEED_EVAL_WEIGHT in t20-qfunction.c

t20-qfunction.c passes CEED_EVAL_WEIGHT in the setup CeedQFunctionCreateInterior, but then uses u[0][i] in the function. Should I switch to CEED_EVAL_INTERP instead?

t10-output -0.0 vs 0.0

t10-output produces +/-0.0 values; diffing them against output/t10-basis.out produces errors.
Do you want me to add a rounding procedure?

Add MFEM example

Add a simple MFEM example for CEED's BP1 (and maybe BP3) that uses the library for the operator evaluation.

CeedOperator constructor

In the CeedOperator constructor:

libCEED/ceed.h

Lines 140 to 142 in 4c2750d

CEED_EXTERN int CeedOperatorCreate(Ceed ceed, CeedElemRestriction r,
                                   CeedBasis b, CeedQFunction qf,
                                   CeedQFunction dqf, CeedQFunction dqfT,
                                   CeedOperator *op);

I don't think the parameters dqf and dqfT are needed, since we have the CeedEvalMode parameters in the CeedQFunction.
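Dropping those two parameters, the simplified constructor might look like the following (a hypothetical signature for discussion, not the actual API):

```c
CEED_EXTERN int CeedOperatorCreate(Ceed ceed, CeedElemRestriction r,
                                   CeedBasis b, CeedQFunction qf,
                                   CeedOperator *op);
```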

/ocl/occa without GPU

Currently, if OpenCL libraries are available, OCCA is built with OpenCL support and /ocl/occa attempts to run on a GPU that may not exist, causing tests to fail or crash. OpenCL can also target other devices, including CPUs. I think we need a better naming scheme, and the implementation needs to handle the lack of a GPU gracefully.

Create template backend that dispatches to `ref`

This would simplify creating new backends. The idea would be that every backend function would delegate through to a ref Ceed that it holds. Then a backend author could override only the functions that they find interesting, without duplicating ref implementation code that would then need to be maintained.

make install targets

Install to a prefix, recognize DESTDIR, and install the ceed.h header plus the library. Also install the Fortran components if applicable (cf. #19).

Setting up Travis CI with OCCA staging branch

Regarding libocca/occa#198, would you guys be open to adding an extra Travis CI job that pulls OCCA from the staging branch?

Ideally all commits in OCCA should be going through either

  • staging → master
  • branch → staging → master

allowing us to hopefully catch issues before I merge things to master. If yes, I'll set up a PR for this.

Add PETSc example

Add a simple PETSc example for CEED's BP1 (and maybe BP3) that uses the library for the operator evaluation.

Proposed improved API?

Below is a mockup of some proposed "improvements" to the current API. (This is all up for debate, of course.)

Some of the rationale for these changes:

  • The Create + Set + Complete workflow is self-documenting and stays flexible if we decide to introduce additional options in the future (e.g. a rectangular operator).
  • CeedBasis separates the dof and quadrature-point descriptions, allowing e.g. Lagrange elements with nodes at Chebyshev–Lobatto points.
  • The G and B operators are not independent, so bundling them in a space makes sense.
  • CeedOperator is an abstraction for just the operator action (as opposed to action and assembly); the D matrix (qdata) is stored internally in it.

@jedbrown, what do you think? Are you open to any of these changes?

{
   // FE Basis = Operator B, B^t for a set of Eval modes
   CeedBasis basis;
   CeedBasisCreate(ceed, &basis);
   CeedBasisSetElement(basis, CEED_HEX); // implies 3D
   // Nodal scalar Lagrange basis, H1 or L2
   CeedBasisSetType(basis, CEED_LAGRANGE_BASIS, 3, CEED_GAUSS_LOBATTO);
   CeedBasisSetQuadrature(basis, 7, CEED_GAUSS); // order 7 = 4 points in 1D
   ierr = CeedBasisComplete(basis); // constructs B1d, etc., calls backend

   // Element restriction = Operator G, G^t
   CeedRestriction restriction;
   CeedRestrictionCreate(ceed, &restriction);
   CeedRestrictionSetElemSize(restriction, 64);
   CeedRestrictionSetNumElems(restriction, 8);
   CeedRestrictionSetNumDofs(restriction, 343);
   CeedInt *element_indices;
   // Set element_indices from mesh topology
   CeedRestrictionSetIndexType(restriction, element_indices,
                               CEED_MEM_HOST, CEED_USE_POINTER); // calls backend
   ierr = CeedRestrictionComplete(restriction); // error handling

   // Isoparametric case, otherwise define mesh-specific restriction & basis
   CeedSpace mesh_space;
   CeedSpaceCreate(ceed, &mesh_space);
   CeedSpaceSetRestriction(mesh_space, &restriction);
   CeedSpaceSetBasis(mesh_space, &basis);
   CeedSpaceSetNumComponents(mesh_space, 3);
   ierr = CeedSpaceComplete(mesh_space); // can call backend
   CeedVector mesh_nodes;
   // Set mesh nodes...

   // FE Space = Operators G, B, ...
   CeedSpace space;
   CeedSpaceCreate(ceed, &space);
   CeedSpaceSetRestriction(space, &restriction);
   CeedSpaceSetBasis(space, &basis);
   ierr = CeedSpaceComplete(space); // can call backend

   // FE Operator = Combines the mesh, the space, and kernels for assembly and
   // action of Operator D
   CeedOperator poisson;
   CeedOperatorCreate(ceed, &poisson);
   CeedOperatorSetMesh(poisson, mesh_space, mesh_nodes);
   CeedOperatorSetSpace(poisson, space);

   // Default kernels (set of Q-functions provided by libCEED)
   CeedOperatorSetKernels(poisson, "poisson");

   // Alternative: Custom kernels based on Q-functions
   CeedQFunction qf_poisson3d, qf_buildcoeffs;
   CeedQFunctionCreateInterior(ceed, 8, 1, 10*sizeof(CeedScalar), CEED_EVAL_GRAD,
                               CEED_EVAL_GRAD, f_poisson3d, "ex1.c:f_poisson3d",
                               &qf_poisson3d);
   CeedQFunctionCreateInterior(ceed, 1, 3, 10*sizeof(CeedScalar),
                               CEED_EVAL_INTERP | CEED_EVAL_GRAD, CEED_EVAL_NONE,
                               f_buildcoeffs,
                               "ex1.c:f_buildcoeffs", &qf_buildcoeffs);
   CeedOperatorSetAssemblyKernel(poisson, qf_buildcoeffs);
   CeedOperatorSetActionKernel(poisson, qf_poisson3d);

   ierr = CeedOperatorComplete(poisson); // can call backend

   ierr = CeedOperatorSetup(poisson, CEED_REQUEST_IMMEDIATE); // calls backend

   ierr = CeedOperatorApply(poisson, u, r, CEED_REQUEST_IMMEDIATE); // calls backend
}
