sw4lite's Introduction

Sw4lite

Sw4lite is a bare-bones version of SW4 (GitHub), intended for testing performance optimizations in a few important numerical kernels of SW4.

To build

The Makefiles are suited for our systems at LLNL and LBNL; you will have to modify them to suit your system.

Type:

make

to build the code with OpenMP. The executable will be named optimize_mp_hostname/sw4lite.

A debug version with OpenMP can be built by:

make debug=yes

which will be located at debug_mp_hostname/sw4lite.

To build with only C code (no Fortran) and with OpenMP, type:

make ckernel=yes

The executable will be optimize_mp_c_hostname/sw4lite.

To build without OpenMP, type:

make openmp=no

The executable will be optimize_hostname/sw4lite.

The CUDA version is built by:

make -f Makefile.cuda

and the executable will be under optimize_cuda_hostname/sw4lite.

More options are described in the Makefile.

An experimental CMake build is available for the CUDA version:

mkdir build
cd build
cmake ..   # optionally add -DCMAKE_PREFIX_PATH=$PWD/../../lapack_build/ if LAPACK is not found by default
make

To run

To run sw4lite with OpenMP threading, you need to assign the number of threads per MPI-task by setting the environment variable OMP_NUM_THREADS, e.g.,

setenv OMP_NUM_THREADS 4
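
The setenv form above is csh/tcsh syntax. As a small sketch, assuming a Bourne-style shell (sh, bash, zsh) instead, the same setting is written with export:

```shell
# Bourne-style shells use export rather than csh's setenv.
# The value 4 is the same example thread count as above.
export OMP_NUM_THREADS=4
```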

An example input file is provided under tests/pointsource/pointsource.in. This case solves the elastic wave equation for a single point source in a whole space or a half space. The input file is given as an argument to the executable, as in the example:

mpirun -np 16 sw4lite pointsource.in

Output from a run is provided at tests/pointsource/pointsource.out. For this point source example, the analytical solution is known. The error is printed at the end:

Errors at time 0.6 Linf = 0.569416 L2 = 0.0245361 norm of solution = 3.7439

When modifying the code, it is important to verify that these numbers have not changed.
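
One way to automate this check is sketched below. It is not part of sw4lite: the helper name check_errors and the output file name run.out are hypothetical, and the sketch assumes the run's standard output was redirected to a file. It compares the last "Errors at time" line against the reference line quoted above, ignoring differences in spacing.

```shell
#!/bin/sh
# Reference line copied from the README text above.
REFERENCE='Errors at time 0.6 Linf = 0.569416 L2 = 0.0245361 norm of solution = 3.7439'

# Hypothetical helper: $1 is a file holding the captured output of a run.
check_errors() {
    # Keep the last matching line; "$1 = $1" makes awk rebuild the record
    # with single spaces, so leading or repeated blanks do not matter.
    actual=$(awk '/Errors at time/ { $1 = $1; line = $0 } END { print line }' "$1")
    if [ "$actual" = "$REFERENCE" ]; then
        echo "PASS"
    else
        echo "FAIL: got '$actual'"
        return 1
    fi
}
```

For example, after `mpirun -np 16 sw4lite pointsource.in > run.out`, calling `check_errors run.out` prints PASS only when all three numbers match the reference exactly.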

Some timings are also output. The average execution times (in seconds) over all MPI processes are reported as follows:

  1. Total execution time for the time stepping loop,
  2. Communication between MPI-tasks (BC comm),
  3. Imposing boundary conditions (BC phys),
  4. Evaluating the difference scheme for divergence of the stress tensor (Scheme),
  5. Evaluating supergrid damping terms (Supergrid), and
  6. Evaluating the forcing functions (Forcing).

The code under tests/testil is a standalone single-core program that only exercises the computational kernel (Scheme).

sw4lite's People

Contributors

andersp, bjorn2, peihunglin, rwvo, spakin, tjesser-ucdavis-edu

sw4lite's Issues

Lack of Explicit Fortran Module Dependencies Causes Race Condition

While making an initial spackage (spack/spack#5917), I ran into issues building the Fortran version: some files were compiled before type_defs.f90 had been used to generate its .mod file. It happened maybe one time in ten (and I have not been able to reproduce it in just a terminal with make -j), so I disabled the Fortran version and only built the ckernels as a result.

I am not an expert on Fortran, but a quick search suggests this is a common issue, because Make has no knowledge of these module dependencies.
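
A common remedy for this kind of race is to give Make the module dependencies explicitly. The sketch below is not taken from the sw4lite Makefiles. It is a hypothetical helper, gen_fortran_deps, that scans free-form Fortran sources for "module NAME" and "use NAME" lines and prints "user.o: provider.o" rules that a Makefile could include. It assumes one statement per line and does not handle continuation lines or preprocessor conditionals.

```shell
#!/bin/sh
# Hypothetical helper: $@ are Fortran source files.
# Prints one make dependency line per (user object, provider object) pair.
gen_fortran_deps() {
    awk '
        # Map a source file name to its object file name.
        function objname(f) { sub(/\.[A-Za-z0-9]+$/, ".o", f); return f }

        # "module NAME" defines a module ("module procedure" does not).
        tolower($1) == "module" && tolower($2) != "procedure" {
            provider[tolower($2)] = objname(FILENAME)
        }

        # "use NAME" or "use NAME, only: ..." consumes a module.
        tolower($1) == "use" {
            m = tolower($2)
            sub(/,.*/, "", m)
            uses[objname(FILENAME) SUBSEP m] = 1
        }

        END {
            for (key in uses) {
                split(key, part, SUBSEP)
                # Skip intrinsic or external modules and self-references.
                if ((part[2] in provider) && provider[part[2]] != part[1])
                    print part[1] ": " provider[part[2]]
            }
        }
    ' "$@" | sort
}
```

With such rules in place, Make would finish compiling the file that defines a module, and so write its .mod file, before starting any file that uses it, even under make -j.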

ERROR: developer option corder, must be zero when fortran routines are used

Hi,
I am new to this proxy app.
I built it according to the readme and tried to run it with:

mpirun -n 4 ./sw4lite ../tests/pointsource/pointsource.in

It would say:

ERROR: developer option corder, must be zero when fortran routines are used

I modified the content of tests/pointsource/pointsource.in to make corder=0 and it will give the same output as the readme:

...
Errors at time 0.6 Linf = 0.569416 L2 = 0.0245361 norm of solution = 3.7439

Am I doing this right? What is corder?

Thanks in advance.

Regards,
Chen

Suspicious array indices in device-routines.C

When compiling sw4lite with a clang-based compiler for AMD GPUs, I'm getting warnings for device-routines.C, lines 1841 and 1843, e.g.:

src/device-routines.C:1841:26: warning: array index 3 is past the end of the array (which contains 3 elements)
      [-Warray-bounds]
            (qum[c][4]-2*qum[c][3]+sum(c,ith,jth,k  ))
                         ^      ~
src/device-routines.C:1744:3: note: array 'qum' declared here
  float_sw4 qu[DIAMETER][3], qum[DIAMETER][3];
  ^

Since DIAMETER has the value 5, and in the loop, c is at most 3, the array accesses are not really out of bounds, but I can't tell whether this is an actual mistake, or if something clever is going on. Could you please double-check, and perhaps add a comment if the code is in fact correct?
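
The arithmetic behind that observation can be sketched as follows, assuming the usual row-major layout of qum[DIAMETER][3] and taking c to range from 0 to 3 as stated above. The helper flat_offset is hypothetical and only computes element offsets; it does not touch the real array.

```shell
#!/bin/sh
DIAMETER=5   # value stated in the issue
NCOMP=3      # inner dimension of qum[DIAMETER][3]

# Hypothetical helper: row-major element offset of qum[$1][$2].
flat_offset() {
    echo $(( $1 * NCOMP + $2 ))
}

for c in 0 1 2 3; do
    echo "c=$c: qum[c][3] -> offset $(flat_offset "$c" 3)," \
         "qum[c][4] -> offset $(flat_offset "$c" 4)," \
         "storage holds $(( DIAMETER * NCOMP )) elements"
done
```

Under these assumptions qum[c][3] lands on the same offset as qum[c+1][0], and qum[c][4] on the same offset as qum[c+1][1], so every access stays inside the 15-element storage. Indexing past the inner bound is still outside what the declared type permits, which is why the compiler warns.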

The relevant lines were added by Anders Petersson in Oct last year:

[rvanoo@snell:~/repos/sw4lite] $ git blame -L 1840,+5 src/device-routines.C                                            
^dfbecf0 (Anders Petersson 2017-10-19 16:07:10 -0700 1840)             -rho(i,j,k+1)*dcz(k+1)*                         
^dfbecf0 (Anders Petersson 2017-10-19 16:07:10 -0700 1841)             (qum[c][4]-2*qum[c][3]+sum(c,ith,jth,k  ))      
^dfbecf0 (Anders Petersson 2017-10-19 16:07:10 -0700 1842)             +2*rho(i,j,k)*dcz(k)  *                         
^dfbecf0 (Anders Petersson 2017-10-19 16:07:10 -0700 1843)             (qum[c][3]-2*sum(c,ith,jth,k  )+qum[c][1])      
^dfbecf0 (Anders Petersson 2017-10-19 16:07:10 -0700 1844)             -rho(i,j,k-1)*dcz(k-1)*                         

[Edit] I just noticed that the commit mentioned by 'git blame' above was the initial commit.

Incorrect output for CUDA version?

When building the latest CUDA version, and running it with the "pointsource.in" example input, as described in README.md, the expected output is

Errors at time 0.6 Linf = 0.569416 L2 = 0.0245361 norm of solution = 3.7439

Instead, I'm getting

Errors at time 0.6 Linf = 3.7439 L2 = 1.01524 norm of solution = 3.7439

(I.e., Linf and L2 are different, norm of solution is identical).

I interpret the statement in README.md "When modifying the code, it is important to verify that these numbers have not changed" to say that all three values should match, and they don't. Does this indicate an error in the code?
