Analyzing gene flow in the deep biosphere via "multi-omics" integrated analysis.
config/
: configuration files
data/
: raw data goes here
resources/
: databases, references etc.
workflow/
: main directory for the snakemake workflow
data/testdata/
: This folder contains two small test datasets:
- test
  This dataset is comprised of 25k paired-end reads from the SCAPP test data + 25k paired-end reads each from three plasmids, generated using randomreads.sh from BBMap:
  - CEX4 plasmid pCEX4 (LC556220.1, Enterobacter cloacae)
  - unnamed plasmid (NC_012780.1, [Eubacterium] eligens ATCC 27750)
  - plasmid pBPSE01 (NZ_KF418775.1, Burkholderia pseudomallei strain MSHR1950)
- mock
  This dataset is comprised of 50k reads from the minced test data of the Aquifex aeolicus VF5 genome (generated with randomreads.sh) + 50k reads subsampled from a synthetic mock metagenome (using seqtk).
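A quick sanity check on the test data is to count the reads in a file and compare the result with the sizes given for each dataset. The sketch below assumes uncompressed FASTQ with exactly four lines per record; `count_reads` and the example file name are hypothetical and not part of the workflow.

```shell
#!/bin/sh
# count_reads: hypothetical helper, not part of the workflow.
# Prints the number of records in an uncompressed FASTQ file,
# assuming exactly four lines per record. A non-integer result
# indicates a truncated or malformed file.
count_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Example call (file name is an assumption):
# count_reads data/testdata/test_R1.fastq
```

For paired-end data the count should match between the forward and reverse files.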
snakemake --use-conda -j 10 -rpk --profile slurm --configfile config/deepbio_config.yaml
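Before submitting jobs, it can help to confirm that the directories and the config file named in this README are in place. This is a minimal sketch, assuming it is run from the repository root; `check_layout` is a hypothetical helper and not part of the workflow.

```shell
#!/bin/sh
# check_layout: hypothetical pre-flight helper, not part of the workflow.
# Checks that the directories and the config file named in this README
# exist under the given root (default: current directory).
# Returns non-zero if anything is missing.
check_layout() {
    root="${1:-.}"
    status=0
    for d in config data resources workflow; do
        if [ ! -d "$root/$d" ]; then
            echo "missing directory: $d/" >&2
            status=1
        fi
    done
    if [ ! -f "$root/config/deepbio_config.yaml" ]; then
        echo "missing config file: config/deepbio_config.yaml" >&2
        status=1
    fi
    return "$status"
}
```

If the check passes, adding snakemake's `-n` (dry run) flag to the command lists the planned jobs without executing them.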
- Benchmarking shows that assembly + binning underperform for plasmids and genomic islands (GIs): Maguire et al. 2020