Linux/OpenMP for Nautilus/RTK and Nautilus/CCK
To construct Linux/OpenMP, with the same flags as Nautilus/RTK and Nautilus/CCK:
git checkout rtk-cck-flags
make clean;make suite
Now to run them for testing, go to the directory ./bin
; the executables are located there.
Linux/OpenMP for Nautilus/PIK
To construct Linux/OpenMP, with the same flags as Nautilus/PIK:
git checkout pik-flags
make clean;make suite
Now to run them for testing, still go to the directory ./bin
; the executables are located there.
/*********************************************************************** *
-
NAS Parallel Benchmarks 3.0
-
Unofficial OpenMP C Version
- Copyright 2014 University of Versailles Saint Quentin en Yvlines
- Copyright 2000 Omni OpenMP Compiler Project
- Copyright 1991-2014 NASA Advanced Supercomputing Division
-
November 08, 2014
***********************************************************************/
- Introduction
This package contains an unofficial C version of the NAS Parallel Benchmarks OpenMP 3.0. The benchmarks are derived from the Omni OpenMP Compiler Project 2.3 unofficial C version of NPB 2.3.
The benchmarks were modified to match the new official 3.0 Fortran NPB. benchmarks. In particular, benchmarks in this release follow the same parallel region structure than the official 3.0 Fortran NPB.
- Change Log
This section tracks all the modifications that were made to update the Omni OpenMP 2.3 unofficial C version to the 3.0 version.
Each modification includes an annotation of the form XXX:YYY, where XXX is the line number in the 2.3 version and YYY is the line number in the 3.0 version.
BT
- Transforms the initialization process into distinct parallel regions:
129:127 - initialize
131:128 - lhsinit 133:129 - exact_rhs - Splits adi function into separated parallel regions: 211:206 - compute_rhs 213:209 - x_solve 215:212 - y_solve 217:215 - z_solve 219:218 - add
- TODO - Initialization part different
CG
- Updates sparse in makea 761:756 - add preload data pages parallel region loop
- Splits the first main parallel region 187:195 - initialization parallel region in main are reduced and single regions are replaced by serial parts 356:347 - conj grad (see below) 209:219 - three loops parallelization
- Decompose Conj grad function into parallel regions 402:395 - initialize conj grad algorithm 413:405 - conj grad iteration loop 551:551 - compute residual norm explicitly
- Split the second main parallel region 261:260 - conj grad 275:271 - post conj grad two loops parallelization
- 78:78 - Remove temporarry array w
- 375:367 - Remove static variables in conj grad since they are initialized at each iteration
- 500:494 - Add barrier after reduction because LLVM OpenMP does not support implicit synchronization after a reduction
- 504:498 - Remove single because all the variables are private due to parallel regions updates
EP
- Parallelizes x-array initialization 109:110 - main
FT
- Splits the first main parallel region
123:123 - compute_indexmap 131:130 - fft (see below) - Splits the second main parallel region
176:168 - evolve 164:158 - fft 187:177 - fft 845:849 - checksum - Decompose fft into parallel regions 514:501 - cffts1 562:556 - cffts2 607:606 - cffts3
- define y0 and y1 in cffts[123] on the stack because there is no pointer in fortran
- TODO - need to insert init_ui to touch the initial data
IS
- Already in C
LU
- Splits into parallel regions boundaries and initialization of dependent variables and also forcing term computing 125:123 - setbv 130:128 - setiv 135:133 - erhs
- Produces setparte parallel regions for ssor 3073:3092 - l2norm 3068:3087 - rhs 3064:3083 - SSOR initialization 3094:3112 - SSOR iteration
- TODO - Parallelize post computation part
- error
- pintgr
MG
- Turns omp parallel region into omp parallel loop for 1239:1211 - zero3
- Split first big parallel region
238:233 - norm2u3 257:249 - mg3P 258:243 - resid - Splits the main iteration big parallel region 273:262- resid 274:263 - norm2u3 277:266 - mg3P
- Transforms the omp parallel region into omp parallel loops from main and mg3P
846:826 - norm2u3
527:516 - resid 608:595 - rprj3 1245:1217 - zero3
463:454 - psinv 684:669 - interep (produces two parallel regions) - 820:835 - Remove static because of algorithmic changes
- 836:862 - Algorithmic changes
SP
- Splits adi function into separated parallel regions: 205:204 - compute_rhs 208:207 - txinvr 211:210 - x_solve and ninvr 214:213 - y_solve and tzetar 216:216 - z_solve and pinvr 220:219 - add
- TODO - Initialize has to be updated
- 654:659 parallelize the function
- TODO - Post computation verify has to be parallelized
221:226 - error_norm
- Installation
THe package should contain the following files/directories:
README - this file README.omc - Readme of the Omni OpenMP Compiler release LOG.omc - Change log of the Omni OpenMP Compiler release Makefile - makefile for the suite (not modified from NPB2.3-omp) Doc - documentations (not modified from NPB2.3-serial) BT, CG, EP, FT, IS, LU, MG, SP - directory for each program bin - directory to put executable files common - common routines (only change version display from NPB2.3-omp) config - configuration files used by 'make' (not modified from NPB2.3-omp) sys - utilities (only change version display from NPB2.3-omp)
To use the suite, edit file 'make.def' in directory 'config'.
You must specify the name of compiler and linker, and compiler options.
For more details, refer to file "README.install" in subdirectory "Doc" and to "README.omc".
- Information
-
Original benchmark suite
Information on the NAS Parallel Benchmarks is available at http://www.nas.nasa.gov/NAS/NPB/.
-
C translation
Information on the OpenMP C versions and the Omni OpenMP compiler is available at http://pdplab.trc.rwcp.or.jp/pdperf/Omni/.
-
3.0 NAS update
Information on the NAS Parallel Benchmarks translation from 2.3 to 3.0 is available at http://benchmark-subsetting.github.io/cNPB/ Note that NAS does not support the OpenMP C versions nor the Omni OpenMP compiler supports the 3.0 structured NAS. License informations about the NAS NPB benchmarks can be found at http://opensource.org/licenses/nasa1.3.php