fricktobias / dbs-pro

DBSpro Analysis

License: MIT License

Python 4.41% Jupyter Notebook 95.59%

dbs-pro's Issues

Include option to create DAG in run script.
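
One possible shape for this, assuming the run script can call the snakemake command line: Snakemake's `--dag` flag prints the job graph in Graphviz dot format instead of running any jobs. The function name `build_snakemake_command` and its `dag` argument are hypothetical, not part of the existing script.

```python
def build_snakemake_command(snakefile, targets=(), dag=False):
    """Return the snakemake command line as a list of arguments (sketch)."""
    cmd = ["snakemake", "--snakefile", str(snakefile)]
    cmd.extend(targets)
    if dag:
        # --dag prints the DAG in dot format instead of executing the workflow.
        # It is appended last so that targets are never read as its argument.
        cmd.append("--dag")
    return cmd
```

The printed graph can then be piped through Graphviz, for example `snakemake --snakefile Snakefile --dag | dot -Tsvg > dag.svg`.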

Weird construct file path input

Currently, the path to the construct file apparently has to be given relative to the output directory rather than the working directory.
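
A common way around this kind of problem is to turn user-supplied paths into absolute paths before the workflow switches directory. The sketch below assumes that the cause is the workflow running with the output directory as its working directory, which the issue does not confirm. `resolve_input_path` is a hypothetical helper.

```python
from pathlib import Path


def resolve_input_path(path, start_dir=None):
    """Resolve a user-supplied path against the directory the user started in."""
    base = Path(start_dir) if start_dir is not None else Path.cwd()
    # Joining with an absolute path keeps the absolute path unchanged,
    # so both relative and absolute inputs are handled.
    return (base / Path(path).expanduser()).resolve()
```

Calling this on the construct path at the top of the run script, before any change of directory, would let the rest of the pipeline use the absolute path.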

Make snakemake loop over inputs (abc clustering)

Currently snakemake uses all inputs and outputs as one globbed list rather than looping over them.

More options in run.py

Include options to pass to the snakefile. Use config in the snakemake module?

Make dict with arguments in run.py --> pass to config.
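
A minimal sketch of that idea, assuming run.py parses its options with `argparse`. The option names `--threads` and `--construct` and both function names are hypothetical. Only options the user actually set end up in the dict, so they do not overwrite defaults defined elsewhere.

```python
import argparse


def build_config(argv=None):
    """Parse command-line options and return them as a config dict."""
    parser = argparse.ArgumentParser(description="Run the pipeline (sketch).")
    # Hypothetical options; the real run.py may expose different ones.
    parser.add_argument("--threads", type=int, default=None)
    parser.add_argument("--construct", default=None)
    args = parser.parse_args(argv)
    # Drop unset options so that they cannot mask config defaults.
    return {key: value for key, value in vars(args).items() if value is not None}


def to_config_args(config):
    """Format the dict as KEY=VALUE pairs for snakemake's --config option."""
    if not config:
        return []
    return ["--config"] + [f"{key}={value}" for key, value in config.items()]
```

The resulting list can be appended to a snakemake command line, where `--config` accepts `KEY=VALUE` pairs.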

Create config.yml for all parameters for pipeline

We should have a config file containing all relevant parameters for running the pipeline that are prone to change. For example:

Handle sequences
ABC sequences
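
As a rough illustration of how such parameters could be handled once loaded, the sketch below keeps defaults in one place and merges user overrides on top. The key names and the placeholder values are assumptions; the real handle and ABC sequences are not given in this issue.

```python
# Placeholder values only; the actual sequences are not stated in the issue.
DEFAULT_CONFIG = {
    "handles": {"h1": "<handle sequence>"},
    "abc_sequences": {"ABC1": "<ABC sequence>"},
}


def merge_config(defaults, overrides):
    """Return defaults updated with overrides, rejecting unknown keys."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        # Failing early catches misspelled parameter names.
        raise KeyError(f"Unknown config keys: {sorted(unknown)}")
    merged = dict(defaults)
    merged.update(overrides)
    return merged
```

Rejecting unknown keys is a design choice: a typo in the config file then fails loudly instead of silently falling back to a default.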

Reduce memory usage of `integrate`

Currently the entire DBS FASTQ is read into memory.
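
One way to avoid holding the whole file is to stream it record by record with a generator. This is a sketch that assumes plain four-line FASTQ records (no wrapped sequence lines); `read_fastq` is a hypothetical name, and whether `integrate` can work without random access to the DBS reads is not something this issue states.

```python
from itertools import islice


def read_fastq(handle):
    """Yield (name, sequence, quality) one record at a time from an open file."""
    while True:
        record = list(islice(handle, 4))  # one FASTQ record is four lines
        if not record:
            return
        if len(record) < 4:
            raise ValueError("Truncated FASTQ record")
        header, sequence, _, quality = (line.rstrip("\n") for line in record)
        # Strip the leading '@' from the header line.
        yield header[1:], sequence, quality
```

Because only one record is held at a time, memory use stays roughly constant regardless of file size.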

Make snakefile loop-structure for ABCs

Instead of individual rules for each case a general case should be developed

Implement pytest

Use pytest to run tests that cover different settings and running from remote directories.

Change pipeline order

Just an idea I had about how we might want to change the order in our pipeline.

I have found the following issue. For UMIs we cluster them for each ABC target but do not separate on DBS. This could mean that we are merging UMIs that should in fact be separate. My proposal would be to separate all UMIs by ABC and DBS before clustering. This would better represent the actual conditions in the experiment.

I am, however, unsure about the benefits in the end. Possibly this would only be a lot of work for nothing, but I wanted to raise the idea anyway to see what you think.

Current pipeline

START. Input = Fastq file

1. Separate for DBS
1.1 Extract DBS
1.2 Cluster DBS
1.3 Correct DBS fastq
2. Separate for ABCs
2.1 Extract ABC-UMI
2.2 Split ABC-UMI by ABC
2.3 Cluster ABCs independently
2.4 Correct ABC fastqs
3. Analysis of corrected DBS and ABC files

END.

Proposed pipeline outline

START. Fastq file

1. Extract DBS
2. Extract ABC-UMI
3. Cluster DBS
4. Correct DBS fastq
5. Split/Tag ABC-UMI by DBS // This represents separate droplets.
6. Split/Tag ABC-UMI by ABC // This represents splitting within droplets for different targets.
7. Cluster for each DBS-ABC pair independently
8. Correct DBS-ABC pairs
9. Analysis

END.
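
The split steps of the proposed outline can be sketched as grouping UMIs by their (DBS, ABC) pair before any clustering. This is only an illustration: `group_umis` is a hypothetical helper, the input is assumed to be already-corrected (dbs, abc, umi) tuples, and the labels in the usage example are made up.

```python
from collections import defaultdict


def group_umis(reads):
    """Group UMIs by (DBS, ABC) pair. `reads` yields (dbs, abc, umi) tuples."""
    groups = defaultdict(list)
    for dbs, abc, umi in reads:
        # UMIs are kept apart per droplet (DBS) and per target (ABC).
        groups[(dbs, abc)].append(umi)
    return dict(groups)


reads = [
    ("DBS1", "ABC1", "AACC"),
    ("DBS2", "ABC1", "AACC"),
    ("DBS1", "ABC1", "AACG"),
]
groups = group_umis(reads)
```

Here the identical UMI `AACC` ends up in two separate groups, one per DBS, so clustering each group on its own can never merge UMIs that come from different droplets.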