While casually observing a scripts/run_OQStandard.sh run, I noticed that OpenQuake itself happily uses all available CPU cores to do calculations in parallel (which is awesome), but some other processing steps are single-threaded and can take over 12 hours. For example:
From ps auxwww, nearing the end of the python3 scripts/consequences-v3.10.0.py -2 run:
user 2151 0.0 0.0 8756 3792 pts/0 S+ 07:51 0:00 bash scripts/run_OQStandard.sh SCM6p5_Montreal_conv -h -r -d -o
user 2225 0.0 0.0 3065888 101008 ? Sl 07:51 0:01 oq-dbserver
user 6603 100 0.0 2836080 263132 pts/0 Rl+ 09:53 759:05 python3 scripts/consequences-v3.10.0.py -2
From top:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6603 user 20 0 2836080 263132 50196 R 100.0 0.0 286:58.05 python3
From free -h:
total used free shared buff/cache available
Mem: 749Gi 1.4Gi 739Gi 1.0Mi 8.6Gi 744Gi
Swap: 0B 0B 0B
So, in this particular case, the calculations before python3 scripts/consequences-v3.10.0.py -2 took just over 2 hours, but python3 scripts/consequences-v3.10.0.py -2 alone took 12.65 hours (759 minutes), running single-threaded, using little RAM, and writing to CSV files at about 200 lines/second (487,211 lines per CSV file in this scenario):
-rw-rw-r-- 1 user group 96764235 May 2 10:40 consequences-rlz-000_-2.csv
-rw-rw-r-- 1 user group 96262336 May 2 11:28 consequences-rlz-001_-2.csv
-rw-rw-r-- 1 user group 96978159 May 2 12:15 consequences-rlz-002_-2.csv
-rw-rw-r-- 1 user group 97646335 May 2 13:03 consequences-rlz-003_-2.csv
-rw-rw-r-- 1 user group 98016335 May 2 13:50 consequences-rlz-004_-2.csv
-rw-rw-r-- 1 user group 83709311 May 2 14:31 consequences-rlz-005_-2.csv
The same applies to the python3 scripts/consequences-v3.10.0.py -1 command, which is expected to take another 12 hours.
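As a rough cross-check of the write rate, the modification times in the file listing above can be turned into a per-file duration and a lines-per-second figure. This is only a sketch: it assumes each timestamp marks the moment a file was finished, and it leaves out consequences-rlz-005_-2.csv because that file is smaller and may have been incomplete when listed.

```python
from datetime import datetime

# Modification times copied from the listing above (all on May 2).
# rlz-005 is left out on the assumption that it was not yet complete.
mtimes = ["10:40", "11:28", "12:15", "13:03", "13:50"]
LINES_PER_FILE = 487_211  # per-file line count quoted above

stamps = [datetime.strptime(t, "%H:%M") for t in mtimes]

# Minutes between consecutive files, i.e. the time to write one full CSV.
gaps_min = [(b - a).total_seconds() / 60 for a, b in zip(stamps, stamps[1:])]

mean_gap_s = 60 * sum(gaps_min) / len(gaps_min)
rate = LINES_PER_FILE / mean_gap_s

print(gaps_min)     # [48.0, 47.0, 48.0, 47.0]
print(round(rate))  # 171
```

That works out to roughly 47 to 48 minutes per file and about 170 lines/second, the same order of magnitude as the figure estimated by eye.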
It would be an interesting exercise to profile this script, see where it spends most of its time, and find ways to speed it up.
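One possible starting point is Python's built-in cProfile module. The sketch below is not taken from consequences-v3.10.0.py: write_rows is a hypothetical stand-in for whatever per-row work the script does, used only to show how a profile can be collected and sorted by cumulative time.

```python
import cProfile
import io
import pstats


def write_rows(n):
    """Hypothetical stand-in for the script's per-row CSV work."""
    buf = io.StringIO()
    for i in range(n):
        buf.write(f"{i},{i * 0.5:.3f}\n")
    return buf.getvalue()


# Collect timing data only for the code between enable() and disable().
profiler = cProfile.Profile()
profiler.enable()
output = write_rows(50_000)
profiler.disable()

# Sort by cumulative time and print the 10 most expensive entries.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(10)
print(report.getvalue())
```

The real script could probably be profiled without code changes by running it as python3 -m cProfile -o consequences.prof scripts/consequences-v3.10.0.py -2 (consequences.prof is an arbitrary output name) and loading the result with pstats.Stats. Given the 12-hour runtime, a much smaller scenario would be a more practical first target.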
(Low priority, could have)
P.S. A quick-and-dirty script that I am using to record basic metrics:
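#!/bin/bash
# Append a timestamped snapshot of load, memory and this user's processes every 15 seconds.
LOGFILE=~/logs/log_2022-05-02_cpu-ram-process.log
mkdir -p "$(dirname "${LOGFILE}")"  # tee cannot open the log if the directory is missing
while true; do
    ( date; uptime; free -h; ps auxwww | grep '^user'; echo ) | tee -a "${LOGFILE}"
    sleep 15
done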