
io500

This is the C version of the IO500 benchmark.

Preparation

The program interfaces directly with IOR/MDtest and pfind. To retrieve the required packages and compile the library version, run

$ ./prepare.sh

Then you can compile the io500 application by running make.

Usage

The benchmark requires a .ini file containing the options. The .ini file is structured into sections, one per benchmark phase.
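For orientation, here is a minimal illustrative sketch of that section-per-phase layout. The exact option names are assumptions; generate the authoritative list with ./io500 --list as described below.

```ini
; Illustrative sketch only -- generate the full, authoritative option
; list with:  ./io500 --list > config-all.ini
[debug]
; runs with a shorter stonewall are marked INVALID for submission
stonewall-time = 300

[ior-easy]
; options for the ior-easy phase go here; a phase can also be
; disabled entirely, e.g.:
; RUN = FALSE
```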

Detailed help for the available options is provided when running:

$ ./io500 -h
Synopsis: ./io500 <INI file> [-v=<verbosity level>] [--dry-run]

The benchmark outputs the commands it would run (the equivalent command-line invocations of ior/mdtest). Use --dry-run to print these commands without actually invoking them.

To create a new INI file containing all available options, run:

$ ./io500 --list > config-all.ini

The config-some file illustrates how to set various options. For more details, run ./io500 -h.

To see the currently active options, run:

$ ./io500 <file.ini> -h

Integrity check

After a run completes, the obtained score and the configuration file can be verified to ensure that they were not accidentally modified.

You can either use the full-featured io500 application:

$ ./io500 config-test-run.ini  --verify result.txt
config-hash = 1065C0D
score-hash  = C97CC873
[OK] But this is an invalid run!

Or the lightweight verification tool, which has fewer dependencies:

$ ./io500-verify config-test-run.ini result.txt

Output

  • The benchmark will output a default set of information in the INI format to simplify parsing. When setting verbosity to 5, you will receive more information.
  • It also stores the output files from IOR and MDTest in the results/ subdirectory with the timestamp of the run.

Example output on the command line

The following is the minimal output when setting the verbosity to 0:

$ mpiexec -np 2   ./io500 config-minimal.ini
[RESULT]       ior-easy-write        0.186620 GiB/s  : time 0.027 seconds
[RESULT]    mdtest-easy-write      103.300821 kIOPS : time 1.121 seconds
[RESULT]       ior-hard-write        0.001313 GiB/s  : time 0.067 seconds
[RESULT]    mdtest-hard-write       58.939081 kIOPS : time 1.021 seconds
[RESULT]                 find     1486.435084 kIOPS : time 0.118 seconds
[RESULT]        ior-easy-read        1.575557 GiB/s  : time 0.005 seconds
[RESULT]     mdtest-easy-stat      839.392805 kIOPS : time 0.138 seconds
[RESULT]        ior-hard-read        2.272671 GiB/s  : time 0.000 seconds
[RESULT]     mdtest-hard-stat     1212.558124 kIOPS : time 0.050 seconds
[RESULT]   mdtest-easy-delete      160.765642 kIOPS : time 0.753 seconds
[RESULT]     mdtest-hard-read      275.011939 kIOPS : time 0.219 seconds
[RESULT]   mdtest-hard-delete      132.015851 kIOPS : time 0.474 seconds
[SCORE INVALID] Bandwidth 0.172092 GB/s : IOPS 292.625029 kiops : TOTAL 7.096374

This information is also saved in the file result_summary.txt in the respective results directory.

In the same directory, you will also find the result.txt file that contains more information and is stored using the INI file format.

version         = SC20-testing
config-hash     = 25C33C96
result-dir      = ./results/
; START 2020-01-06 10:23:49
; ERROR INVALID stonewall-time != 300


[ior-easy-write]
t_start         = 2020-01-06 10:23:49
exe             = ./ior -C -Q 1 -g -G 271 -k -e -o ./out//ior-easy/ior_file_easy -O stoneWallingStatusFile=./out//ior-easy/stonewall -O stoneWallingWearOut=1 -t 2m -b 2m -F -w -D 1 -a POSIX
; ERROR INVALID Write phase needed 0.020932s instead of stonewall 1s. Stonewall was hit at 0.0s
throughput-stonewall = 0.37
score           = 0.186620
; ERROR INVALID Runtime of phase (0.027088) is below stonewall time. This shouldn't happen!
t_delta         = 0.0271
t_end           = 2020-01-06 10:23:49

[mdtest-easy-write]
t_start         = 2020-01-06 10:23:49
exe             = ./mdtest -n 1000000 -u -L -F -N 1 -d ./out//mdtest-easy -x ./out//mdtest-easy-stonewall -C -W 1 -a POSIX
rate-stonewall  = 109.492799
score           = 103.300821
t_delta         = 1.1207
t_end           = 2020-01-06 10:23:50

[timestamp]
t_start         = 2020-01-06 10:23:50
t_delta         = 0.0000
t_end           = 2020-01-06 10:23:50

[ior-hard-write]
t_start         = 2020-01-06 10:23:50
exe             = ./ior -C -Q 1 -g -G 27 -k -e -o ./out//ior-hard/file -O stoneWallingStatusFile=./out//ior-hard/stonewall -O stoneWallingWearOut=1 -t 47008 -b 47008 -s 1 -w -D 1 -a POSIX
; ERROR INVALID Write phase needed 0.066709s instead of stonewall 1s. Stonewall was hit at 0.0s
throughput-stonewall = 0.00
score           = 0.001313
; ERROR INVALID Runtime of phase (0.067146) is below stonewall time. This shouldn't happen!
t_delta         = 0.0671
t_end           = 2020-01-06 10:23:50

[mdtest-hard-write]
t_start         = 2020-01-06 10:23:50
exe             = ./mdtest -n 1000000 -t -w 3901 -e 3901 -N 1 -F -d ./out//mdtest-hard -x ./out//mdtest-hard-stonewall -C -W 1 -a POSIX
rate-stonewall  = 59.263301
score           = 58.939081
t_delta         = 1.0207
t_end           = 2020-01-06 10:23:51

[find]
t_start         = 2020-01-06 10:23:51
exe             = ./pfind ./out/ -newer ./results//timestampfile -size 3901c -name *01* -C -H 1 -q 10000
found           = 1596
total-files     = 175761
score           = 1486.435084
t_delta         = 0.1184
t_end           = 2020-01-06 10:23:51

[ior-easy-read]
t_start         = 2020-01-06 10:23:51
exe             = ./ior -C -Q 1 -g -G 271 -k -e -o ./out//ior-easy/ior_file_easy -O stoneWallingStatusFile=./out//ior-easy/stonewall -O stoneWallingWearOut=1 -t 2m -b 2m -F -r -R -a POSIX
score           = 1.575557
t_delta         = 0.0054
t_end           = 2020-01-06 10:23:51

[mdtest-easy-stat]
t_start         = 2020-01-06 10:23:51
exe             = ./mdtest -n 1000000 -u -L -F -N 1 -d ./out//mdtest-easy -x ./out//mdtest-easy-stonewall -T -a POSIX
score           = 839.392805
t_delta         = 0.1381
t_end           = 2020-01-06 10:23:51

[ior-hard-read]
t_start         = 2020-01-06 10:23:51
exe             = ./ior -C -Q 1 -g -G 27 -k -e -o ./out//ior-hard/file -O stoneWallingStatusFile=./out//ior-hard/stonewall -O stoneWallingWearOut=1 -t 47008 -b 47008 -s 1 -r -R -a POSIX
score           = 2.272671
t_delta         = 0.0004
t_end           = 2020-01-06 10:23:51

[mdtest-hard-stat]
t_start         = 2020-01-06 10:23:51
exe             = ./mdtest -n 1000000 -t -w 3901 -e 3901 -N 1 -F -d ./out//mdtest-hard -x ./out//mdtest-hard-stonewall -T -a POSIX
score           = 1212.558124
t_delta         = 0.0499
t_end           = 2020-01-06 10:23:51

[mdtest-easy-delete]
t_start         = 2020-01-06 10:23:51
exe             = ./mdtest -n 1000000 -u -L -F -N 1 -d ./out//mdtest-easy -x ./out//mdtest-easy-stonewall -r -a POSIX
score           = 160.765642
t_delta         = 0.7534
t_end           = 2020-01-06 10:23:52

[mdtest-hard-read]
t_start         = 2020-01-06 10:23:52
exe             = ./mdtest -n 1000000 -t -w 3901 -e 3901 -N 1 -F -d ./out//mdtest-hard -x ./out//mdtest-hard-stonewall -E -X -a POSIX
score           = 275.011939
t_delta         = 0.2191
t_end           = 2020-01-06 10:23:52

[mdtest-hard-delete]
t_start         = 2020-01-06 10:23:52
exe             = ./mdtest -n 1000000 -t -w 3901 -e 3901 -N 1 -F -d ./out//mdtest-hard -x ./out//mdtest-hard-stonewall -r -a POSIX
score           = 132.015851
t_delta         = 0.4742
t_end           = 2020-01-06 10:23:53

[SCORE]
MD              = 292.625029
BW              = 0.172092
SCORE           = 7.096374  [INVALID]
hash            = 884F29B
; END 2020-01-06 10:23:53


Contributors

adilger, blastmaster, carlilek, frankgad, gflofst, gmarkomanolis, johnbent, juliankunkel, jyvet, mchaarawi, sihara, wangrunji0408

Issues

do not unlink ior-rnd file if the phase was not run

We can probably just add a check here:

diff --git a/src/phase_ior_rnd_write.c b/src/phase_ior_rnd_write.c
index 0ca1e63..5772a60 100644
--- a/src/phase_ior_rnd_write.c
+++ b/src/phase_ior_rnd_write.c
@@ -30,7 +30,8 @@ static void cleanup(void){
     unlink(filename);
   }
   if(opt.rank == 0){
-    u_purge_file("ior-rnd/file");
+    if (opt.mode != IO500_MODE_STANDARD)
+      u_purge_file("ior-rnd/file");
     u_purge_datadir("ior-rnd");
   }
 }

mdworkbench fails when dataPacketType=random is specified

Example:

[mdworkbench-bench]
t_start         = 2021-10-22 08:39:00
filesPerProc    = 359
precreatePerSet = 359
exe             = ./md-workbench --dataPacketType=random --process-reports -a POSIX -o=./datadir/2021.10.22-07.34.40/mdworkbench -t=0.000000 -O=1 --run-info-file=./results/2021.10.22-07.34.40/mdworkbench.status -D=10 -G=-676760310 -P=359 -I=359 -2 -R=2 -X
result-file     = ./results/2021.10.22-07.34.40/mdworkbench-bench.txt
; ERROR INVALID Errors (2584800) occured during the md-workbench phase. This invalidates your run.
maxOpTime       = 0.535336
scoreIteration0 = 0.000000
maxOpTime0      = 0.532502
; ERROR INVALID Resulting score shouldn't be 0.0
score           = 0.000000
t_delta         = 228.7599
t_end           = 2021-10-22 08:42:49

Without dataPacketType set, it works just fine. The storage system is VAST Data 3.6sp7 in this case.

Verify score hash does not match generated one

When running the verify phase over the generated result.txt file, I found that the score hash always fails with the isc21 tag.
After some digging, I found the issue to be that the hash generation in the verifier program includes more items in the hash than the original program uses when computing the hash during the benchmark run.
Specifically, these three items are included by the verifier but were not used when computing the score hash in the benchmark:

  • timestamp score
  • BW score
  • MD score

Random 1MB Test

According to the survey, we are also adding the 1 MB random test to the "extended" mode of the benchmark for testing.

Aborted (core dumped) io500: aiori-POSIX.c:769: POSIX_Xfer: Assertion `rc >= 0' failed.

The main branch aborts (core dumped) on aarch64 (arm64); OS: openEuler 22.03 LTS SP3.
Built and ran io500 with config-minimal.ini (unchanged):

[jenkins@lustre-tzifycl3-01 io500]$ ./prepare.sh && make
[jenkins@lustre-tzifycl3-01 io500]$ ./io500 config-minimal.ini                                                                                                                                 
IO500 version io500-sc23_v1 (standard)                                                                                                                                                         
[RESULT]       ior-easy-write        0.113594 GiB/s : time 310.403 seconds                                                                                                                     
ERROR INVALID (src/main.c:437) Runtime of phase (101.183177) is below stonewall time. This shouldn't happen!                                                              
ERROR INVALID (src/main.c:443) Runtime is smaller than expected minimum runtime                                                                                                                
[RESULT]    mdtest-easy-write       10.052514 kIOPS : time 101.183 seconds [INVALID]                                                                                                           
[      ]            timestamp        0.000000 kIOPS : time 0.000 seconds                                                                                                                       
io500: aiori-POSIX.c:769: POSIX_Xfer: Assertion `rc >= 0' failed.                                                                                                                              
[lustre-tzifycl3-01:21666] *** Process received signal ***                                                                                                                                     
[lustre-tzifycl3-01:21666] Signal: Aborted (6)                                                                                                                                                 
[lustre-tzifycl3-01:21666] Signal code:  (-6)                                                                                                                                                  
[lustre-tzifycl3-01:21666] [ 0] linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0xffffb936693c]                                                                                                     
[lustre-tzifycl3-01:21666] [ 1] /usr/lib64/libc.so.6(+0x83dc0)[0xffffb902bdc0]                                                                                                                 
[lustre-tzifycl3-01:21666] [ 2] /usr/lib64/libc.so.6(raise+0x1c)[0xffffb8fe4f7c]                                                                                                               
[lustre-tzifycl3-01:21666] [ 3] /usr/lib64/libc.so.6(abort+0xe4)[0xffffb8fd2d30]                                                                                                               
[lustre-tzifycl3-01:21666] [ 4] /usr/lib64/libc.so.6(+0x368a8)[0xffffb8fde8a8]
[lustre-tzifycl3-01:21666] [ 5] /usr/lib64/libc.so.6(+0x3690c)[0xffffb8fde90c]                                                                                                                 
[lustre-tzifycl3-01:21666] [ 6] ./io500[0x441eac]                                                                                                                                              
[lustre-tzifycl3-01:21666] [ 7] ./io500[0x427f44]                                                                                                                                              
[lustre-tzifycl3-01:21666] [ 8] ./io500[0x429fe4]                                                                                                                                              
[lustre-tzifycl3-01:21666] [ 9] ./io500[0x42b1a8]                                                                                                                                              
[lustre-tzifycl3-01:21666] [10] ./io500[0x42bd74]                                                                                                                                              
[lustre-tzifycl3-01:21666] [11] ./io500[0x40ad04]                                                                                                                                              
[lustre-tzifycl3-01:21666] [12] ./io500[0x40b9e0]                                                                                                                                              
[lustre-tzifycl3-01:21666] [13] ./io500[0x405f28]                                                                                                                                              
[lustre-tzifycl3-01:21666] [14] /usr/lib64/libc.so.6(+0x2afc0)[0xffffb8fd2fc0]
[lustre-tzifycl3-01:21666] [15] /usr/lib64/libc.so.6(__libc_start_main+0x94)[0xffffb8fd3098]                                                                                                   
[lustre-tzifycl3-01:21666] [16] ./io500[0x403d30]                                                                                                                                              
[lustre-tzifycl3-01:21666] *** End of error message ***                                                                                                                                        
Aborted (core dumped)                                 

Error in result verification

./io500 config-all.ini --verify ./results/2021.06.08-21.50.34/result.txt
[run]
config-hash = 61320D6
score-hash = 3529390A

ERROR: Score hash expected: "3529C077" read: "3529390A"

I tried several times and I always get a similar error. Any suggestions? Thank you!

-lei

Error while running tests under Extended mode

I am running the IO-500 benchmark (https://github.com/VI4IO/io500.git) in extended mode. However, when it reaches the ior-rnd-read tests, it shows the following error:

ERROR: the stoneWallingWearOut is only sensible when setting a stonewall deadline with -D, (ior.c:1450)
ERROR: the stoneWallingWearOut is only sensible when setting a stonewall deadline with -D, (ior.c:1450)
ERROR: the stoneWallingWearOut is only sensible when setting a stonewall deadline with -D, (ior.c:1450)
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD 
with errorcode -1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[node1.mrashid2-qv98443.dirr-pg0.wisc.cloudlab.us:27115] 3 more processes have sent help message help-mpi-api.txt / mpi-abort
[node1.mrashid2-qv98443.dirr-pg0.wisc.cloudlab.us:27115] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

In ior.c:1450 (https://github.com/hpc/ior/blob/8475c7d30025dd5e39147c251bf84e1ed24b9858/src/ior.c#L1449), the following condition check is defined:

        if (test->deadlineForStonewalling == 0 && test->stoneWallingWearOut > 0)
          ERR("the stoneWallingWearOut is only sensible when setting a stonewall deadline with -D");

Upon further investigation, it can be seen that the command used to run the ior-rnd-read phase is the following:

./ior -Q=1 -g -G=-1313584709 -z --random-offset-seed=11 -e -o=./datafiles/2021.05.10-01.26.15/ior-rnd/file -O stoneWallingStatusFile=./results/2021.05.10-01.26.15/ior-rnd.stonewall -O stoneWallingWearOut=1 -k -t=4096 -b=1073741824 -s=10000000 -r -R -a POSIX -O saveRankPerformanceDetailsCSV=./results/2021.05.10-01.26.15/ior-rnd-read.csv

From the command, it can be seen that even though stoneWallingWearOut is set to 1, no parameter defines deadlineForStonewalling, which triggers the error and causes the experiment to be aborted.

FAILED in TestIoSys, Cannot create file

Hello, may I ask why the following error occurs when I use io500 to connect to S3?
(screenshot of the error)
My config-s3.ini configuration is as follows:
(screenshot of the configuration)

Can the configuration of the config-s3.ini file for interconnecting with S3 be provided?

find_hard spams stdout

When using Extended mode, the find_hard section spams stdout with the found files, e.g.

/nrs/scicompsys/io500-sc21/datadir/2021.10.22-12.17.03/mdtest-hard//test-dir.0-0/mdtest_tree.0/file.mdtest.99.701
/nrs/scicompsys/io500-sc21/datadir/2021.10.22-12.17.03/mdtest-hard//test-dir.0-0/mdtest_tree.0/file.mdtest.97.1010
/nrs/scicompsys/io500-sc21/datadir/2021.10.22-12.17.03/mdtest-hard//test-dir.0-0/mdtest_tree.0/file.mdtest.99.601
/nrs/scicompsys/io500-sc21/datadir/2021.10.22-12.17.03/mdtest-hard//test-dir.0-0/mdtest_tree.0/file.mdtest.99.801
/nrs/scicompsys/io500-sc21/datadir/2021.10.22-12.17.03/mdtest-hard//test-dir.0-0/mdtest_tree.0/file.mdtest.98.1011
/nrs/scicompsys/io500-sc21/datadir/2021.10.22-12.17.03/mdtest-hard//test-dir.0-0/mdtest_tree.0/file.mdtest.0.1001
/nrs/scicompsys/io500-sc21/datadir/2021.10.22-12.17.03/mdtest-hard//test-dir.0-0/mdtest_tree.0/file.mdtest.97.1012
/nrs/scicompsys/io500-sc21/datadir/2021.10.22-12.17.03/mdtest-hard//test-dir.0-0/mdtest_tree.0/file.mdtest.97.1011
/nrs/scicompsys/io500-sc21/datadir/2021.10.22-12.17.03/mdtest-hard//test-dir.0-0/mdtest_tree.0/file.mdtest.98.801
/nrs/scicompsys/io500-sc21/datadir/2021.10.22-12.17.03/mdtest-hard//test-dir.0-0/mdtest_tree.0/file.mdtest.99.501

This makes it hard to check scores while a run is in progress.

Failure to hit stonewall while running IO-500 benchmark in extended mode.

In my experiment, when I ran the IO-500 benchmark in extended mode, it didn't hit the stonewall time of 300s in the ior-easy-write phase: the write operations continued until the cluster storage space was about to be exhausted. The same thing occurred in the ior-hard-write phase. I did not face this problem when running in standard mode.

Below is one example instance from my local experiment where ior-easy-write took a very long time (the benchmark configuration for the ior-easy-write phase was the default):

result_summary.txt:

[ior-easy-write]
t_start         = 2021-05-10 01:26:16
exe             = ./ior -C -Q 1 -g -G 761181371 -k -e -o ./datafiles/2021.05.10-01.26.15/ior-easy/ior_file_easy -O stoneWallingStatusFile=./results/2021.05.10-01.26.15/ior-easy.stonewall -t 2m -b 9920000m -F -w -D 300 -O stoneWallingWearOut=1 -a POSIX -O saveRankPerformanceDetailsCSV=./results/2021.05.10-01.26.15/ior-easy-write.csv
throughput-stonewall = 1.27
score           = 0.396580
t_delta         = 3233.3158
t_end           = 2021-05-10 02:20:09

ior-easy-write.txt:

IOR-3.4.0+dev: MPI Coordinated Test of Parallel I/O
Began               : Mon May 10 01:26:16 2021
Command line        : ./ior -C -Q 1 -g -G 761181371 -k -e -o ./datafiles/2021.05.10-01.26.15/ior-easy/ior_file_easy -O stoneWallingStatusFile=./results/2021.05.10-01.26.15/ior-easy.stonewall -t 2m -b 9920000m -F -w -D 300 -O stoneWallingWearOut=1 -a POSIX -O saveRankPerformanceDetailsCSV=./results/2021.05.10-01.26.15/ior-easy-write.csv
Machine             : Linux node1.mrashid2-qv98443.dirr-pg0.wisc.cloudlab.us
TestID              : 0
StartTime           : Mon May 10 01:26:16 2021
Path                : ./datafiles/2021.05.10-01.26.15/ior-easy/ior_file_easy.00000000
FS                  : 1.5 TiB   Used FS: 0.0%   Inodes: 23.6 Mi   Used Inodes: 0.0%

Options: 
api                 : POSIX
apiVersion          : 
test filename       : ./datafiles/2021.05.10-01.26.15/ior-easy/ior_file_easy
access              : file-per-process
type                : independent
segments            : 1
ordering in a file  : sequential
ordering inter file : constant task offset
task offset         : 1
nodes               : 4
tasks               : 4
clients per node    : 1
repetitions         : 1
xfersize            : 2 MiB
blocksize           : 9.46 TiB
aggregate filesize  : 37.84 TiB
stonewallingTime    : 300
stoneWallingWearOut : 1

Results: 

access    bw(MiB/s)  IOPS       Latency(s)  block(KiB) xfer(KiB)  open(s)    wr/rd(s)   close(s)   total(s)   iter
------    ---------  ----       ----------  ---------- ---------  --------   --------   --------   --------   ----
stonewalling pairs accessed min: 9763 max: 164128 -- min data: 19.1 GiB mean data: 95.5 GiB time: 301.4s
WARNING: Expected aggregate file size       = 41607495680000
WARNING: Stat() of aggregate file size      = 1376805453824
WARNING: Using actual aggregate bytes moved = 1376805453824
WARNING: Maybe caused by deadlineForStonewalling
write     406.10     203.05     0.000061    10158080000 2048.00    0.001297   3233.26    0.003247   3233.27    0   

ior-easy.stonewall:

164128

build fails with newer automake/autoconf

When using autoconf/2.71 (and automake/1.16.5), the io500 build fails with errors like the following (autoconf/2.69 and automake/1.15.1 works fine):

configure.ac:13: warning: The macro `AC_CONFIG_HEADER' is obsolete.
configure.ac:13: You should run autoupdate.
./lib/autoconf/status.m4:719: AC_CONFIG_HEADER is expanded from...
configure.ac:13: the top level
configure.ac:42: warning: The macro `AC_PROG_CC_C99' is obsolete.
configure.ac:42: You should run autoupdate.
./lib/autoconf/c.m4:1659: AC_PROG_CC_C99 is expanded from...
configure.ac:42: the top level
configure.ac:89: error: possibly undefined macro: AC_DEFINE
      If this token and others are legitimate, please use m4_pattern_allow.
      See the Autoconf documentation.
configure.ac:132: error: possibly undefined macro: AC_SUBST
autoreconf: error: ..../autoconf failed with exit status: 1

compile-time error with INI_UNSET_FLOAT

With the Intel compiler (2020.4.304), the following causes a compile-time error (occurred during io500-isc21_v1 prepare):

$ ./prepare.sh
...
src/ini-parse.c(285): error #264: floating-point value does not fit in required floating-point type
              *(float*) o->var = INI_UNSET_FLOAT;
                                 ^
 compilation aborted for src/ini-parse.c (code 2)
make: *** [build/ini-parse.o] Error 2

The culprit seems to be this #define:

include/io500-util.h:#define INI_UNSET_FLOAT 1e307

Segmentation fault while trying to run single phase test

I am facing this error when trying to run only a single phase of the test. For example, to run only the ior-easy phase (both read and write), I set the noRun variables to TRUE for every phase except ior-easy. After executing the command to run the benchmark (with stonewall-time set to 90 seconds), I got the following error:

ERROR INVALID (src/phase_dbg.c)stonewall-time != 300s
IO500 version io500-sc20_v3-6-gd25ea80d54c7
[RESULT-invalid]       ior-easy-write        0.346598 GiB/s : time 90.190 seconds
[RESULT-invalid]    mdtest-easy-write        0.000000 kIOPS : time 0.000 seconds
ERROR INVALID (src/main.c)Runtime of phase (0.000056) is below stonewall time. This shouldn't happen!
[RESULT-invalid]       ior-hard-write        0.000000 GiB/s : time 0.000 seconds
ERROR INVALID (src/main.c)Runtime of phase (0.000054) is below stonewall time. This shouldn't happen!
[RESULT-invalid]    mdtest-hard-write        0.000000 kIOPS : time 0.000 seconds
ERROR INVALID (src/main.c)Runtime of phase (0.000045) is below stonewall time. This shouldn't happen!
[RESULT-invalid]                 find        0.000000 kIOPS : time 0.000 seconds
[RESULT]        ior-easy-read        2.520806 GiB/s : time 12.406 seconds
[RESULT-invalid]     mdtest-easy-stat        0.000000 kIOPS : time 0.000 seconds
[RESULT-invalid]        ior-hard-read        0.000000 GiB/s : time 0.000 seconds
[RESULT-invalid]     mdtest-hard-stat        0.000000 kIOPS : time 0.000 seconds
[RESULT-invalid]   mdtest-easy-delete        0.000000 kIOPS : time 0.000 seconds
[RESULT-invalid]     mdtest-hard-read        0.000000 kIOPS : time 0.000 seconds
[RESULT-invalid]   mdtest-hard-delete        0.000000 kIOPS : time 0.000 seconds
[SCORE-invalid] Bandwidth 0.000000 GiB/s : IOPS 0.000000 kiops : TOTAL 0.000000

The result files are stored in the directory: ./results/2021.04.20-02.44.45
Segmentation fault

After setting the verbosity level to 10 in the configuration, the following logs were reported:

; [I] Creating dir ./results/2021.04.20-02.40.07/
; [I] Creating dir ./datafiles/2021.04.20-02.40.07/
ERROR INVALID (src/phase_dbg.c)stonewall-time != 300s
; [I] Creating dir ./datafiles/2021.04.20-02.40.07/ior-easy/
; [I] Creating dir ./datafiles/2021.04.20-02.40.07/ior-hard/
IO500 version io500-sc20_v3-6-gd25ea80d54c7
[RESULT-invalid]       ior-easy-write        0.346252 GiB/s : time 90.180 seconds
[RESULT-invalid]    mdtest-easy-write        0.000000 kIOPS : time 0.000 seconds
ERROR INVALID (src/main.c)Runtime of phase (0.000081) is below stonewall time. This shouldn't happen!
[RESULT-invalid]       ior-hard-write        0.000000 GiB/s : time 0.000 seconds
ERROR INVALID (src/main.c)Runtime of phase (0.000037) is below stonewall time. This shouldn't happen!
[RESULT-invalid]    mdtest-hard-write        0.000000 kIOPS : time 0.000 seconds
ERROR INVALID (src/main.c)Runtime of phase (0.000031) is below stonewall time. This shouldn't happen!
[RESULT-invalid]                 find        0.000000 kIOPS : time 0.000 seconds
[RESULT]        ior-easy-read        2.516911 GiB/s : time 12.410 seconds
[RESULT-invalid]     mdtest-easy-stat        0.000000 kIOPS : time 0.000 seconds
[RESULT-invalid]        ior-hard-read        0.000000 GiB/s : time 0.000 seconds
[RESULT-invalid]     mdtest-hard-stat        0.000000 kIOPS : time 0.000 seconds
[RESULT-invalid]   mdtest-easy-delete        0.000000 kIOPS : time 0.000 seconds
[RESULT-invalid]     mdtest-hard-read        0.000000 kIOPS : time 0.000 seconds
[RESULT-invalid]   mdtest-hard-delete        0.000000 kIOPS : time 0.000 seconds
; [I]  MD = (0.00000000 * 0.00000000 * 0.00000000 * 0.00000000 * 0.00000000 * 0.00000000 * 0.00000000 * 0.00000000)^0.125000
; [I]  BW = (0.34625199 * 0.00000000 * 2.51691134 * 0.00000000)^0.250000
[SCORE-invalid] Bandwidth 0.000000 GiB/s : IOPS 0.000000 kiops : TOTAL 0.000000

The result files are stored in the directory: ./results/2021.04.20-02.40.07
; [I] Removing file ./datafiles/2021.04.20-02.40.07/ior-easy/ior_file_easy.00000000
; [I] Removing dir ./datafiles/2021.04.20-02.40.07/ior-easy
; [I] Removing dir ./datafiles/2021.04.20-02.40.07/mdtest-easy
; [I] Removing file ./datafiles/2021.04.20-02.40.07/ior-hard/file
Fatal error: glibc detected an invalid stdio handle
[node0:157537] *** Process received signal ***
[node0:157537] Signal: Aborted (6)
[node0:157537] Signal code:  (-6)
[node0:157537] [ 0] /lib64/libpthread.so.0(+0xf630)[0x7fd14514e630]
[node0:157537] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x7fd144da7387]
[node0:157537] [ 2] /lib64/libc.so.6(abort+0x148)[0x7fd144da8a78]
[node0:157537] [ 3] /lib64/libc.so.6(+0x78ed7)[0x7fd144de9ed7]
[node0:157537] [ 4] /lib64/libc.so.6(__libc_fatal+0x1e)[0x7fd144de9fbe]
[node0:157537] [ 5] /lib64/libc.so.6(+0x79333)[0x7fd144dea333]
[node0:157537] [ 6] /lib64/libc.so.6(fflush+0xf8)[0x7fd144ddf718]
[node0:157537] [ 7] ./io500[0x405f7d]
[node0:157537] [ 8] ./io500[0x409a4e]
[node0:157537] [ 9] ./io500[0x405bff]
[node0:157537] [10] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fd144d93555]
[node0:157537] [11] ./io500[0x403c09]
[node0:157537] *** End of error message ***
Aborted

rmdir is called from all ranks, and should be called only from rank 0

We probably need something like:

diff --git a/src/util.c b/src/util.c
index fc3b449..ec4f5a6 100644
--- a/src/util.c
+++ b/src/util.c
@@ -66,11 +66,12 @@ void u_call_cmd(char const * str){
   }
 }
 void u_purge_datadir(char const * dir){
-  char d[2048];
-  sprintf(d, "%s/%s", opt.datadir, dir);
-  DEBUG_INFO("Removing dir %s\n", d);
-
-  opt.aiori->rmdir(d, opt.backend_opt);
+  if( ! opt.dry_run && opt.rank == 0){
+    char d[2048];
+    sprintf(d, "%s/%s", opt.datadir, dir);
+    DEBUG_INFO("Removing dir %s\n", d);
+    opt.aiori->rmdir(d, opt.backend_opt);
+  }
 }
 
 void u_purge_file(char const * file){

Disabling one step of the benchmark zeroes out the score

I'm trying to run the benchmark with one particular step disabled, with the key

RUN = FALSE

but while the step indeed does not run, it is assigned a score of 0, which in turn zeroes out the geometric mean used for score aggregation. I guess that's mathematically correct, but it doesn't make much sense as long as I'm not trying to submit a score but rather benchmarking a system to test improvements.

aiori driver initialization is called without setting the MPI rank

If one runs ior manually (outside of io500), the ior main function sets the global rank variable here:
https://github.com/hpc/ior/blob/main/src/ior.c#L205
before calling the init function for each backend driver (POSIX, DFS, etc.).
However, with the io500 app, this global rank variable is not set until later (after the backend driver initialization is called).
That means drivers like DFS, which take advantage of collective operations to initialize the pool and container for the workflow, will see every process as rank 0, and thus every process performs the expensive operations to connect to the pool and the container individually, instead of one rank connecting and sharing the handles.

We can get around this in the DFS driver with a small patch:

diff --git a/src/aiori-DFS.c b/src/aiori-DFS.c
index 23741e1..8c204e6 100755
--- a/src/aiori-DFS.c
+++ b/src/aiori-DFS.c
@@ -199,8 +199,10 @@ static int DFS_check_params(aiori_mod_opt_t * options){
         if (o->pool == NULL || o->cont == NULL)
                 ERR("Invalid pool or container options\n");
 
-        if (testComm == MPI_COMM_NULL)
+        if (testComm == MPI_COMM_NULL) {
                 testComm = MPI_COMM_WORLD;
+                MPI_CHECK(MPI_Comm_rank(testComm, &rank), "cannot get rank");
+        }
 
         return 0;
 }

or we can fix this in the io500 app with:

diff --git a/src/main.c b/src/main.c
index ad23285..389f507 100644
--- a/src/main.c
+++ b/src/main.c
@@ -24,6 +24,8 @@ static char const * io500_phase_str[IO500_SCORE_LAST] = {
   "MD",
   "BW"};
 
+extern int rank;
+
 static void prepare_aiori(void){
   // check selected API, might be followed by API options
   char * api = strdup(opt.api);
@@ -204,6 +206,7 @@ int main(int argc, char ** argv){
   MPI_Init(& argc, & argv);
   MPI_Comm_rank(MPI_COMM_WORLD, & opt.rank);
   MPI_Comm_size(MPI_COMM_WORLD, & opt.mpi_size);
+  rank = opt.rank;
 
   int verbosity_override = -1;
   int print_help = 0;

I pushed a PR for io500 with the latter:
#60
but if that is not acceptable, I can push the other one to ior/dfs.

Stonewalling: IOR Hard on Lustre and HDD takes extremely long

Opened this issue to track the situation where IOR hard takes many hours when running on spinning disks.
On Lustre systems with spinning disks, such as DKRZ and Archer 2, IOR hard takes unbearably long.

Inspecting ior-hard-write.txt for a test with
[debug]
stonewall-time = 1

For example, running on 10 nodes with 8 procs each leads to:
stonewalling pairs accessed min: 21 max: 3195 -- min data: 0.0 GiB mean data: 0.0 GiB time: 1.2s
The overall runtime here was then 16.5s.

The imbalance on 5-minute runs can stretch the runtime further, causing the hard phase to take many hours (often to be killed by 8-hour deadlines).

The only suitable workaround is to reduce the segment count to a bearable number, e.g.,
[ior-hard]
segmentCount = 10000

question about the ior-hard-write result when using MPIIO+collective

When I run ior-hard-write with the MPIIO API and collective I/O (OpenMPI ROMIO):

[ior-hard-write]
# The API to be used
API = MPIIO
# Collective operation (for supported backends)
collective = TRUE
## io500.sh
io500_mpiargs="-hostfile /root/io500test/mpi-hosts --map-by node -np $np \
        -mca pml ucx \
        -mca btl ^openib \
        -mca io romio321 \
        -x ROMIO_FSTYPE_FORCE=lustre: \
         --allow-run-as-root "

With the same np=144, I find that although the running time is reduced a lot, the bandwidth is lower than with the POSIX API.
Result (API: MPIIO + collective):

[RESULT]       ior-hard-write        2.423454 GiB/s : time 300.345 seconds

Result (API: POSIX):

[RESULT]       ior-hard-write        3.770933 GiB/s : time 1843.400 seconds

Why is the running time better but the bandwidth worse?

SC21 BoF Checklist

Here are the todo items for the SC21 Birds of a Feather:

  • Update BoF Submission
  • Submit BoF Request to the Conference
  • Update Call for Submissions
  • Send Call for Submissions to various email lists via io500 committee email account
  • Review submissions for quality (separate process)
  • Update general sections of BoF Presentation
  • Update and generate awards (and get trophies if we are doing this again). Includes printing certificates
  • Update trends graphs and slides
  • Update discussion materials
  • Finalize data post BoF
  • Create Zenodo DOI for final list (post-BoF if we can't change the data blob)
  • Publish the DOI on the IO500 website with the list

MDTest hard on Lustre

This issue is to document and discuss the known problem of running MDtest hard on Lustre, which may abort with a "No space left on device" error. The reason is that by default, with LDiskFS:
"We use the ext3 hashed directory code, which has a theoretical limit of about 15 million files per directory".
With DNE2 deployed or a specific mount option, this limit can be avoided.

hintsFileName in config.ini doesn't work

My config.ini contained this configuration:

[ior-hard]
API = mpiio
hintsFileName = ./hints.gerty

which resulted in a runtime error of

Error invalid argument: -U
Error invalid argument: ./hints.gerty
Error invalid argument: mpiio
Invalid options
Synopsis ./ior

It looks like IOR dropped the -U flag when it was turned into a module-specific parameter, but IO500 wasn't updated accordingly.
