Benchmark repository for tracking progress on memory pressure.
Install the dependencies with pip:
python3 -m pip install -r dependencies.txt
You can also install the DuckDB command-line client: https://duckdb.org/docs/installation/?version=latest&environment=cli&installer=binary&platform=macos
Then create the data with the following commands:
~/memory-pressure-benchmarks$ duckdb tpch-sf100.duckdb -c "install httpfs; load httpfs; call dbgen(sf=100);"
~/memory-pressure-benchmarks$ duckdb tpch-sf100.duckdb -c "export database 'tpch_data' (FORMAT CSV, HEADER 1);"
~/memory-pressure-benchmarks$ python3 hyper/load.py
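Before loading, it can help to confirm that the export step wrote a CSV file for every TPC-H table. The snippet below is a minimal sketch, not part of this repo. It assumes the export directory is tpch_data (as in the command above) and that DuckDB names each exported file after its table, e.g. lineitem.csv. The helper name missing_tpch_csvs is hypothetical.

```python
from pathlib import Path

# The eight tables of the TPC-H schema.
TPCH_TABLES = [
    "customer", "lineitem", "nation", "orders",
    "part", "partsupp", "region", "supplier",
]


def missing_tpch_csvs(export_dir="tpch_data"):
    """Return the TPC-H tables that have no <table>.csv file in export_dir.

    Assumption: the export writes one CSV per table, named after the table.
    """
    root = Path(export_dir)
    return [t for t in TPCH_TABLES if not (root / f"{t}.csv").is_file()]


if __name__ == "__main__":
    missing = missing_tpch_csvs()
    if missing:
        print("missing CSV files for:", ", ".join(missing))
    else:
        print("all TPC-H tables exported")
```

An empty list means every table has a CSV file. It does not check that the files are complete.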
Give the benchmark a name. The name determines the directory inside the benchmarks directory where the data is stored.
If you are running the benchmark on AWS, use instance storage where possible to reduce the variability introduced by EBS. To set this up, run:
./utils/format_and_mount.sh
This formats and mounts a copy of the benchmarks at ~/memory-pressure-benchmarks-metal on instance storage. Change into this directory before running the benchmarks.
To run the benchmark:
python3 utils/run_benchmark.py --benchmark_name={benchmark_name} --benchmark={benchmark} --system=all
For example:
python3 utils/run_benchmark.py --benchmark_name=jan-1-duckdb-dev --benchmark=tpch --system=duckdb
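If you launch runs from a script, the command can be assembled from its three parameters. This is a minimal sketch that uses only the flags shown above. The function build_run_command is hypothetical and not part of this repo. The check on the benchmark name is an assumption based on the name being used as a directory name.

```python
import shlex


def build_run_command(benchmark_name, benchmark, system="all"):
    """Build the argument list for utils/run_benchmark.py.

    Assumption: the benchmark name becomes a directory name, so it must be
    non-empty and must not contain a path separator.
    """
    if not benchmark_name or "/" in benchmark_name:
        raise ValueError(f"invalid benchmark name: {benchmark_name!r}")
    return [
        "python3",
        "utils/run_benchmark.py",
        f"--benchmark_name={benchmark_name}",
        f"--benchmark={benchmark}",
        f"--system={system}",
    ]


if __name__ == "__main__":
    cmd = build_run_command("jan-1-duckdb-dev", "tpch", "duckdb")
    # Print the command in a form that can be pasted into a shell.
    print(shlex.join(cmd))
```

The returned list can be passed directly to subprocess.run(cmd, check=True) from the repo root.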
To plot a summary of the benchmark, run:
Rscript graph_utils/plot_summary.R {benchmark_name}