
makemytranscriptome's People

Contributors

aswalters, bluegenes, gitter-badger

makemytranscriptome's Issues

KEGG mapping

update the KEGG scripts to get (and map to) an arbitrary list of KEGG maps
re-integrate into the main pipeline
grab KO from the annotation table, not the simple conversion file
optional user-specified KO map numbers?

Annotation subset: Transrate "good" contigs

Continue to annotate the full Trinity transcriptome, but also provide annotation info for the subset of contigs called "good" by transrate. This just requires grepping the relevant contig info out of the annotation table (or printing the subset as part of the annotation table script).
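The subsetting step could be sketched roughly as below. This is only an illustration under assumptions: the annotation table is taken to be tab-delimited with a header row and the contig ID in the first column, and the set of "good" contig IDs is assumed to have been collected from transrate's output already. The function name `subset_annotation_table` and its parameters are hypothetical, not part of the pipeline.

```python
import csv


def subset_annotation_table(table_lines, good_contigs, id_column=0, delimiter="\t"):
    """Yield the header row, then every row whose contig ID is in good_contigs.

    table_lines  : any iterable of text lines (an open file works)
    good_contigs : set of contig IDs that transrate called "good" (assumed input)
    id_column    : column holding the contig ID (assumed to be the first one)
    """
    reader = csv.reader(table_lines, delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return  # empty table: nothing to yield
    yield header
    for row in reader:
        if row and row[id_column] in good_contigs:
            yield row
```

Because the function is a generator over lines, it could be called from inside an annotation table script or used on its own against a finished table.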

extend diamond blast to include stitle

Use a general database conversion file (ID: Name) to add the 'Name' / stitle field to the blast tabular output from diamond. This is not necessary when using blast+, as that output is already supported.
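A minimal sketch of that step, assuming the standard 12-column blast tabular layout (subject ID at index 1) and an ID-to-name mapping that has already been loaded into a dict. The function name `add_stitle` and the `missing` default are hypothetical.

```python
def add_stitle(blast_lines, id_to_name, subject_column=1, missing=""):
    """Append the subject's name as one extra tab-separated field per hit.

    blast_lines : iterable of blast tabular lines
    id_to_name  : dict mapping subject ID -> name (from the conversion file)
    missing     : value used when a subject ID has no entry in the mapping
    """
    for line in blast_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        fields.append(id_to_name.get(fields[subject_column], missing))
        yield "\t".join(fields)
```

With 12 input columns, the appended name lands at index 12 of each output row.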

databases

allow user input of pre-downloaded databases

diamond: annotation table

Move the NR "best words" / query coverage calculation into the main annotation table script.
Calculate query coverage = (alignment length / query length):
- alignment length = index [3] in the blast file
- query length = from the input fasta
- "name" / words can come from index 12; Tessa will extend the diamond blast output to contain this field
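The calculation above could look something like the following sketch. It applies the formula exactly as written (alignment length at index 3 divided by the query length taken from the fasta). The function names are hypothetical. One point worth checking before relying on it: for translated searches the alignment length may be reported in amino acids while the query length is in nucleotides, and this sketch does not reconcile those units.

```python
def read_fasta_lengths(fasta_lines):
    """Return {sequence ID: sequence length} for a fasta given as lines."""
    lengths = {}
    current = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # ID is the header text up to the first whitespace
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def query_coverage(blast_lines, query_lengths):
    """Yield (query ID, alignment length / query length) for each hit."""
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # not a complete tabular row
        query_id = fields[0]
        alignment_length = int(fields[3])
        yield query_id, alignment_length / query_lengths[query_id]
```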

csv input for assembly + quality tools

  • enable users to provide CSV input when using the assembly tool and quality tool (in addition to "full" and "expression")

(useful for the quality tool because transrate assesses assemblies based on their input reads, if given)

assembly_scrape

  • edit to include BUSCO & Transrate output
  • wrap + integrate into pipeline

Manage_tools output

When executing manage_tools, stdout should go to the console rather than being captured and hidden.
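How manage_tools currently launches its commands is not shown here, so the following is only a generic sketch of the requested behavior using the standard library: when `subprocess.run` is called without `stdout=` or `stderr=` arguments, the child process inherits the parent's streams and its output appears on the console. The helper name `run_visible` is hypothetical.

```python
import subprocess
import sys


def run_visible(cmd):
    """Run cmd and let its stdout/stderr flow straight to the console.

    No stdout=/stderr= arguments are passed, so nothing is captured.
    Returns the command's exit code.
    """
    completed = subprocess.run(cmd)
    return completed.returncode


if __name__ == "__main__":
    # The printed text shows up directly in the terminal.
    run_visible([sys.executable, "-c", "print('installing tools...')"])
```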

Salmon

  • command is already wrapped, but needs to be integrated into pipeline

gene-trans map

remove reliance on Trinity gene-trans map for assembly:

  1. allow no gene-trans map (optional input into all downstream steps)
  2. allow optional gene-trans-map input from user.

configurable databases

Refactor databases to support user-configurable install locations and user-defined read locations.

chmod bug

chmod commands don't normally create new targets, so they don't work well with the supervisor interface. Needs a fix.
Potential fix: just chmod using subprocess instead of the Task system.
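A minimal sketch of that potential fix, assuming a POSIX system with the `chmod` executable on the PATH. The helper name `make_executable` and the default mode `u+x` are hypothetical; the actual mode the pipeline needs is not stated here.

```python
import subprocess


def make_executable(path, mode="u+x"):
    """Change permissions on path by calling chmod directly.

    This bypasses any target-based task tracking: the command always runs,
    and check=True raises CalledProcessError if chmod fails.
    """
    subprocess.run(["chmod", mode, path], check=True)
```

Calling chmod directly sidesteps the target problem, since the call does not depend on a new file appearing.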

Transrate setup task is always skipped

The transrate setup task installs dependencies for transrate, so it has no obvious targets of its own. As a result the task gets skipped, because the declared targets are created by downloading transrate, not by running the setup command.

manage databases: modifications

Low priority, but important:

  1. change default behavior of manage_databases:
    • new default: check available databases and install any missing ones
    • make "--update" an option instead (it is currently the default)
  2. make manage_db a module, so it can be called with:
    "mmt databases" instead of having to use ./scripts/manage_databases.py
  3. add an option for which busco db to download (to go with our current option in pipeline.py)
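The proposed behavior could be sketched with `argparse` as below. This is an illustration only: the `databases` subcommand and `--update` come from the list above, while the `--busco-db` flag name, `build_parser`, and `plan_database_actions` are hypothetical and do not describe the existing scripts.

```python
import argparse


def build_parser():
    """Build a parser where 'databases' is a subcommand of 'mmt'."""
    parser = argparse.ArgumentParser(prog="mmt")
    subparsers = parser.add_subparsers(dest="command", required=True)
    db = subparsers.add_parser("databases", help="check and install databases")
    # --update is opt-in; the default is to install only what is missing
    db.add_argument("--update", action="store_true",
                    help="re-download every database")
    # hypothetical flag name for choosing the busco database
    db.add_argument("--busco-db", default=None,
                    help="which busco database to download")
    return parser


def plan_database_actions(installed, wanted, update=False):
    """Return the sorted list of databases to download.

    Default: only the wanted databases that are not yet installed.
    With update=True: every wanted database.
    """
    if update:
        return sorted(wanted)
    return sorted(set(wanted) - set(installed))
```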

diamond integration: post-diamond blast processing

  1. in manage_databases:
    • use fastaID2Names to get a tabular lookup dictionary for each fasta
  2. In annotator:
    • addStitleToBlastTab.py
      • takes in output of fastaID2Names + diamond blast output; modifies blast output in-place
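The lookup-building half of this could be sketched as follows. The real fastaID2Names is not shown here, so this is only a guess at the idea: it assumes each fasta header has the form `>ID description`, with the ID ending at the first whitespace. The function names below are hypothetical.

```python
def fasta_id_to_name(fasta_lines):
    """Return {sequence ID: description} parsed from fasta header lines.

    Sequences whose header has no description map to an empty string.
    """
    lookup = {}
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence data, not a header
        parts = line[1:].strip().split(None, 1)
        if not parts:
            continue  # bare ">" with no ID
        lookup[parts[0]] = parts[1] if len(parts) > 1 else ""
    return lookup


def write_lookup(lookup, handle):
    """Write the lookup as a two-column, tab-separated table."""
    for seq_id, name in lookup.items():
        handle.write(f"{seq_id}\t{name}\n")
```

Writing the lookup once per fasta means the downstream step only needs to read a small two-column table rather than re-parse the full database.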
