
2022-benchmark's Introduction

About

Focused on bioinformatically deriving biomedically relevant answers from public data in the population health field. I find purpose in helping and mentoring others. Broadly, I aim to teach esoteric and under-taught topics in an eloquent way.


Current projects

  1. Discovery and re-discovery of genomic and metabolomic biomarkers in case-versus-control studies.
  2. Teaching inclusive, equitable, and sustainable practices in scholarly knowledge management

GitHub Statistics

My latest open-source contributions:

  1. 🗣 Commented on #3183 in sourmash-bio/sourmash
  2. 🗣 Commented on #70 in sourmash-bio/sourmash_plugin_directsketch
  3. 🗣 Commented on #70 in sourmash-bio/sourmash_plugin_directsketch
  4. 🗣 Commented on #70 in sourmash-bio/sourmash_plugin_directsketch
  5. ❗ Opened issue #70 in sourmash-bio/sourmash_plugin_directsketch
[Colton's dynamically generated GitHub stats]

2022-benchmark's People

Contributors

snair09


Forkers

ctb bryshalm

2022-benchmark's Issues

What should we benchmark?

@ccbaumler and I have developed a benchmarking workflow to analyze various metrics of sourmash commands. This will help us identify parts of the program that can be further optimized.

The input sig files vary in sample location, instrumentation, base quantity, and file size. They are listed below:

| Sample | Sampling location | Instrument | Base quantity (gigabases) | File size (gigabytes) |
| --- | --- | --- | --- | --- |
| SRR1976948 | USA: Alaska, North Slope, Schrader Bluff formation | Illumina MiSeq | 8.65 | 4.96 |
| SRR1977249 | USA: Alaska, North Slope, Schrader Bluff formation | Illumina MiSeq | 9.61 | 5.65 |
| SRR1977296 | USA: Alaska, North Slope, Ivishak formation | Illumina HiSeq 2500 | 15.49 | 8.47 |
| SRR1977304 | USA: Alaska, North Slope, Ivishak formation | Illumina HiSeq 2500 | 14.50 | 8.45 |
| SRR1977357 | USA: Alaska, North Slope, Kuparuk formation | Illumina HiSeq 2500 | 19.00 | 11.10 |
| SRR1977365 | USA: Alaska, North Slope, Kuparuk formation | Illumina HiSeq 2500 | 18.48 | 10.82 |

(Metadata found here and Project information here)

Metrics are measured in two different ways:

  1. Flamegraphs constructed using the Python sampling profiler py-spy. An example of a flamegraph for the gather command:
    [flamegraph of the gather command]

  2. Computational metrics measured by Snakemake's benchmark directive. The benchmark outputs a TSV file with the following columns (a minimal rule sketch combining both approaches is shown after the table):

| colname | type (unit) | description |
| --- | --- | --- |
| s | float (seconds) | Running time in seconds |
| h:m:s | string (-) | Running time in hours:minutes:seconds format |
| max_rss | float (MB) | Maximum "Resident Set Size": the non-swapped physical memory the process has used |
| max_vms | float (MB) | Maximum "Virtual Memory Size": the total amount of virtual memory used by the process |
| max_uss | float (MB) | "Unique Set Size": the memory which is unique to the process and which would be freed if the process were terminated right now |
| max_pss | float (MB) | "Proportional Set Size": the amount of memory shared with other processes, accounted so that the amount is divided evenly between the processes that share it (Linux only) |
| io_in | float (-) | the number of read operations performed (cumulative) |
| io_out | float (-) | the number of write operations performed (cumulative) |
| mean_load | float (-) | CPU usage over time, divided by the total running time (first row) |
| cpu_time | float (-) | CPU time summed for user and system |
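A minimal Snakefile sketch of how a benchmarked and profiled rule might look. The rule name, signature and database paths, and output locations are placeholders for illustration, not the repository's actual workflow, and it assumes sourmash is invoked via `python -m sourmash`:

```python
# Hypothetical Snakemake rule; all paths and names are placeholders.
rule gather:
    input:
        sig="sigs/{sample}.sig",          # query signature (assumed layout)
        db="databases/db.sbt.zip",        # search database (assumed)
    output:
        csv="results/{sample}.gather.csv",
        svg="flamegraphs/{sample}.gather.svg",
    # Snakemake appends one row of the metrics above to this TSV per run.
    benchmark:
        "benchmarks/{sample}.gather.tsv"
    # py-spy samples the Python process and renders a flamegraph SVG.
    shell:
        "py-spy record -o {output.svg} -- "
        "python -m sourmash gather {input.sig} {input.db} -o {output.csv}"
```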
We can compare the metrics among different samples with line graphs (example below) to check for inconsistencies.

[example line graph comparing metrics across samples]
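A rough sketch of how such a line plot could be produced with pandas and matplotlib; the benchmark file layout and the choice of `max_rss` as the metric are assumptions:

```python
# Sketch: plot one benchmark metric per sample; paths and columns are assumed.
import glob
import os

import matplotlib.pyplot as plt
import pandas as pd

frames = []
for path in sorted(glob.glob("benchmarks/*.gather.tsv")):
    df = pd.read_csv(path, sep="\t")
    df["sample"] = os.path.basename(path).split(".")[0]  # e.g. SRR1976948
    frames.append(df)

bench = pd.concat(frames, ignore_index=True)

plt.plot(bench["sample"], bench["max_rss"], marker="o")
plt.xlabel("sample")
plt.ylabel("max_rss (MB)")
plt.tight_layout()
plt.savefig("max_rss_by_sample.png")
```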

Possible additions to the benchmarking workflow:

  1. Running the command on each sample a fixed, arbitrary number of times (say, 3) and aggregating the results. This will prevent one-time events and outliers from influencing our analysis (see the sketch after this list).
  2. Using the perf command to gather more robust metrics. See the flamegraph off-shoots here.
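For the repeated runs in item 1, Snakemake's `repeat()` helper can wrap the benchmark path (e.g. `benchmark: repeat("benchmarks/{sample}.gather.tsv", 3)`), which writes one row per repetition to the same TSV. A small sketch of averaging those rows, with an assumed file path:

```python
# Sketch: summarize repeated benchmark rows; the path is a placeholder.
import pandas as pd

bench = pd.read_csv("benchmarks/SRR1976948.gather.tsv", sep="\t")
print(bench[["s", "max_rss", "cpu_time"]].mean())  # mean over the repetitions
```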

Benchmarking documentation

Snakefile benchmark needs an id column

The benchmark files created by Snakemake do not include an internal id column, only the benchmark metrics. This could become an issue when running the same input file multiple times (a workaround sketch is shown below).
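Until the benchmark files carry an id natively, one workaround is to tag each row while aggregating them. A minimal sketch, assuming the TSVs live under benchmarks/:

```python
# Sketch: add id/run columns while concatenating benchmark TSVs (assumed layout).
import glob

import pandas as pd

frames = []
for path in glob.glob("benchmarks/*.tsv"):
    df = pd.read_csv(path, sep="\t")
    df["id"] = path                    # record which file (sample) each row came from
    df["run"] = range(1, len(df) + 1)  # distinguish repeated runs within one file
    frames.append(df)

pd.concat(frames, ignore_index=True).to_csv(
    "benchmarks/combined.tsv", sep="\t", index=False
)
```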
