dbs-pro's People

Contributors

fricktobias, pontushojer

Forkers

afshinlab

dbs-pro's Issues

Check for & filter chimeric reads

The UMI sequences can be used to identify chimeric sequences by looking for UMIs linked to several different ABC or DBS sequences.
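A minimal sketch of this check, assuming each read has already been reduced to a (DBS, ABC, UMI) tuple. The tuple layout and the function names are hypothetical and not taken from the dbs-pro code base. Short UMIs can also recur by chance, so a real filter would probably need a read-count threshold rather than this strict rule.

```python
from collections import defaultdict


def find_chimeric_umis(reads):
    """Return the set of UMIs linked to more than one DBS or more than one ABC.

    `reads` is an iterable of (dbs, abc, umi) tuples (assumed layout).
    """
    dbs_by_umi = defaultdict(set)
    abc_by_umi = defaultdict(set)
    for dbs, abc, umi in reads:
        dbs_by_umi[umi].add(dbs)
        abc_by_umi[umi].add(abc)
    return {
        umi
        for umi in dbs_by_umi
        if len(dbs_by_umi[umi]) > 1 or len(abc_by_umi[umi]) > 1
    }


def filter_chimeric(reads):
    """Drop every read whose UMI was flagged as chimeric."""
    reads = list(reads)
    chimeric = find_chimeric_umis(reads)
    return [read for read in reads if read[2] not in chimeric]
```

Calling `filter_chimeric(reads)` keeps only reads whose UMI maps to a single DBS and a single ABC.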

Change pipeline order

Just an idea I had about how we might want to change the order in our pipeline.

I have found the following issue: we cluster UMIs for each ABC target but do not separate them by DBS. This could mean that we are merging UMIs that should in fact be separate. My proposal would be to separate all UMIs by ABC and DBS before clustering. This would better represent the actual conditions in the experiment.

I am, however, unsure about the benefits in the end. Possibly this would only be a lot of work for nothing, but I wanted to raise the idea anyway to see what you think.

Current pipeline

START. Input = Fastq file

  1. Separate for DBS
    1.1 Extract DBS
    1.2 Cluster DBS
    1.3 Correct DBS fastq

  2. Separate for ABCs
    2.1 Extract ABC-UMI
    2.2 Split ABC-UMI by ABC
    2.3 Cluster ABCs independently
    2.4 Correct ABC fastqs

  3. Analysis of corrected DBS and ABC files.

END.

Proposed pipeline outline

START. Fastq file

  1. Extract DBS
  2. Extract ABC-UMI
  3. Cluster DBS
  4. Correct DBS fastq
  5. Split/Tag ABC-UMI by DBS // This represents separate droplets.
  6. Split/Tag ABC-UMI by ABC // This represents splitting within droplets for different targets.
  7. Cluster each DBS-ABC pair independently
  8. Correct DBS-ABC pairs
  9. Analysis

END.
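The difference between the two orderings can be illustrated with a small sketch. It assumes reads are (DBS, ABC, UMI) tuples with DBS already corrected, and it uses a simple greedy Hamming-distance clusterer as a stand-in for whatever UMI clustering the pipeline actually uses. All names here are hypothetical.

```python
from collections import Counter, defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def cluster_umis(counts, max_dist=1):
    """Greedy stand-in clusterer: map each UMI to the most abundant
    UMI within `max_dist` mismatches (or to itself)."""
    canonical = {}
    centers = []
    # Visit UMIs from most to least abundant so abundant UMIs become centers.
    for umi, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for center in centers:
            if len(center) == len(umi) and hamming(center, umi) <= max_dist:
                canonical[umi] = center
                break
        else:
            centers.append(umi)
            canonical[umi] = umi
    return canonical


def count_molecules(reads, key):
    """Group reads by `key(dbs, abc)`, cluster UMIs per group,
    and return the number of UMI clusters in each group."""
    groups = defaultdict(Counter)
    for dbs, abc, umi in reads:
        groups[key(dbs, abc)][umi] += 1
    return {g: len(set(cluster_umis(c).values())) for g, c in groups.items()}


reads = [
    ("DBS1", "ABC1", "AAAA"),
    ("DBS1", "ABC1", "AAAA"),
    ("DBS2", "ABC1", "AAAT"),  # different droplet, UMI one mismatch away
]

# Current order: cluster per ABC only, so the two UMIs are merged.
per_abc = count_molecules(reads, key=lambda dbs, abc: abc)
# Proposed order: cluster per DBS-ABC pair, so they stay separate.
per_pair = count_molecules(reads, key=lambda dbs, abc: (dbs, abc))
```

In this toy input, clustering per ABC yields one molecule for ABC1, while clustering per DBS-ABC pair yields one molecule in each of the two droplets.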

Outdated dependency file

These modules are currently not included in the environment file:

  • pandas
  • seaborn
  • dnaio
  • snakemake
  • umi_tools
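One way to spot such gaps is to check which modules can be imported in the active environment. The sketch below assumes the import names match the package names listed above; the helper name is hypothetical.

```python
from importlib.util import find_spec

# Import names assumed to match the packages listed in this issue.
REQUIRED_MODULES = ["pandas", "seaborn", "dnaio", "snakemake", "umi_tools"]


def missing_modules(names):
    """Return the top-level module names that cannot be found
    in the current Python environment."""
    return [name for name in names if find_spec(name) is None]
```

Running `missing_modules(REQUIRED_MODULES)` in a freshly created environment lists whatever the environment file still lacks.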

Weird construct file path input

Currently, the path to the construct apparently has to be given relative to the output directory rather than to your working directory.
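A possible fix is to convert the construct path to an absolute path at the point where the command is invoked, before anything runs inside the output directory. This is a sketch of that idea only; the function name is hypothetical and where dbs-pro would call it is not known from this issue.

```python
from pathlib import Path


def resolve_construct_path(construct, cwd=None):
    """Return an absolute path for `construct`.

    Relative paths are interpreted against `cwd`, which defaults to the
    directory the command was invoked from.
    """
    base = Path(cwd) if cwd is not None else Path.cwd()
    path = Path(construct).expanduser()
    return path if path.is_absolute() else (base / path).resolve()
```

Passing the resolved path on to the workflow means it no longer matters which directory the later steps run in.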

Jupyter notebook example run

Provide an example of how to edit and run the pipeline in a linked, publicly available Google Colab Jupyter notebook.

Implement pytest

Use pytest, with tests covering the settings and running from remote directories.

Modular ABC file system

The ABC file system is currently not very adaptable, and it would be nice to be able to keep several ABC-sequence files for different setups.

I'd suggest adding functionality for registering a new construct file using dbspro set, plus a separate command, dbspro config (or something like that), for changing which file is used.
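As a rough illustration of the idea, the sketch below keeps named construct files in a small JSON file and tracks which one is active. Everything here is hypothetical: the class, the file format, and the method names are assumptions and do not describe existing dbspro behaviour. The `add` method would correspond to the suggested dbspro set command and `use` to dbspro config.

```python
import json
from pathlib import Path


class ConstructRegistry:
    """Hypothetical store of named ABC construct files plus the active one."""

    def __init__(self, config_file):
        self.config_file = Path(config_file)
        if self.config_file.exists():
            self.data = json.loads(self.config_file.read_text())
        else:
            self.data = {"constructs": {}, "active": None}

    def add(self, name, fasta_path):
        """Register a construct file under `name` (stored as an absolute path)."""
        self.data["constructs"][name] = str(Path(fasta_path).resolve())
        self._save()

    def use(self, name):
        """Select which registered construct file is active."""
        if name not in self.data["constructs"]:
            raise KeyError(f"Unknown construct file: {name}")
        self.data["active"] = name
        self._save()

    def active_path(self):
        """Return the path of the active construct file, or None if unset."""
        name = self.data["active"]
        return None if name is None else self.data["constructs"][name]

    def _save(self):
        self.config_file.write_text(json.dumps(self.data, indent=2))
```

Because the registry is written to disk after every change, a later run can read the active construct file without it being passed on the command line.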
