ngi-exoseq's People

Contributors

apeltzer, ewels, senthil10

ngi-exoseq's Issues

JointDiscovery Workflow: Open Points

According to the Best Practices (BP), VQSR requires a total of more than 30 exomes or 1 WGS sample.

  • Acquire 35 exome BAMs from 1000G
  • Generate gVCFs for these
  • Use them for jointDiscovery workflow (VQSR)

Try to generalise pipeline more

At the moment some of the commands are fairly tied to the NGI infrastructure (e.g. regexes that assume sample names look like P1234). It would be good to generalise this as much as possible, moving such values out into param variables where required.
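A minimal sketch of the idea, written in Python for illustration rather than in the pipeline's own language. The function name, the default pattern, and the example file names are all hypothetical; the only thing taken from this issue is that sample names currently look like P1234. The point is that the pattern becomes a parameter instead of being fixed in the command:

```python
import re

# Hypothetical default that mimics the NGI-style names mentioned above (P1234).
DEFAULT_SAMPLE_PATTERN = r"^(P\d+)"


def extract_sample_id(filename, pattern=DEFAULT_SAMPLE_PATTERN):
    """Return the sample ID captured by group 1 of `pattern`, or None if it does not match."""
    match = re.search(pattern, filename)
    return match.group(1) if match else None


if __name__ == "__main__":
    # NGI-style name, default pattern
    print(extract_sample_id("P1234_101_R1.fastq.gz"))
    # Other naming scheme, pattern supplied by the user as a parameter
    print(extract_sample_id("sampleA_R1.fastq.gz", pattern=r"^([^_]+)_R[12]"))
```

Users outside NGI would then only override the pattern, not edit the commands themselves.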

To be implemented

The following still have to be implemented:

  • Documentation
  • Travis test
  • Reports (MultiQC)
  • Possible standalone images (Docker/Singularity)
  • Final code review (configs, logging, mailing)

Build a fat container

  • Create Dockerfile
  • Try pushing that to quay.io in nf-core repository
  • Adapt config files to utilize this instead of different containers
  • Rename GATK 4 to gatk-launch (and all other tools, too)

Benefits: We can have a single process collecting metrics instead of having to do that in individual processes, which is a "cleaner" approach.

Waiting for @ewels to allow quay.io access to nf-core....

Documentation update

Ideally, we should have the same structure as in the NGI-RNAseq repository and as proposed by the CookieCutter module @ewels recently provided.

Integrate Reports

I submitted a pull request to integrate MultiQC support in ExoSeq. It covers all tools in the pipeline, including:

  • FastQC
  • Picard MarkDuplicates
  • GATK VariantEval
  • QualiMap
  • SnpEff

Indel realignment obsolete

Hi!

Took some inspiration from your project for another exome pipeline I am working on. I noticed that you are still using the indel realignment step, as in the older GATK best practices. I just wanted to point out that this step is no longer considered best practice when HaplotypeCaller is used:
https://software.broadinstitute.org/gatk/blog?id=7847

Will cut down on the processing time significantly, I reckon.

Cheers,
Marc

Implement GATK 4 Support

GATK 4 will be published on January 9th 2018. We should consider moving most of the calls to support GATK 4 directly as it both speeds up important parts of the analysis and achieves higher sensitivity and specificity.

This mostly involves generating a new container for GATK 4, setting it up, and checking whether the calls have changed (which they most certainly have not, at least not by much).

Support for mixed Capture Methods

The current version of the pipeline supports "only" a single kit for all samples. We have the situation that people come here with "mixed" datasets (e.g. Agilent v3, v4, v5) and then need to analyze everything together. While this is not optimal, there is certainly at least a partial overlap between samples, and we should support setting things up.

I assume something like a tab-separated file with <ID>\t<kit_type> per line should work as input if a mixed input is used.
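As a rough sketch of how such a sheet could be read, assuming two tab-separated columns (sample ID, then kit type) and no header line. The function name, the kit labels, and the sample IDs are hypothetical and only serve the illustration:

```python
import csv
import io
from collections import defaultdict


def group_samples_by_kit(handle):
    """Read <ID>\\t<kit_type> lines and return {kit_type: [sample IDs]}."""
    groups = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines
        if not row or not row[0].strip() or row[0].startswith("#"):
            continue
        if len(row) != 2:
            raise ValueError(f"expected 2 tab-separated columns, got {row!r}")
        sample_id, kit = (field.strip() for field in row)
        groups[kit].append(sample_id)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical sheet content; in practice this would be an open file.
    sheet = io.StringIO("P1234_1\tagilent_v5\nP1234_2\tagilent_v4\nP1234_3\tagilent_v5\n")
    for kit, samples in group_samples_by_kit(sheet).items():
        print(kit, samples)
```

Grouping by kit first would let each group be processed with its own target BED files before the results are brought together.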

Evaluate if an optional process can prepare Exome-Kit files

We should evaluate whether we can automatically prepare Exome-Kit files for the pipeline if users don't specify explicitly which BED files to use for a certain exome kit. Currently, we only have documentation up that suggests how to achieve that, but this process could potentially be automatically performed, too.

Polish MultiQC Report

Ideally, do something similar to CAW: collect more information in a separate process, then create the appropriate YAML file for MultiQC.
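A possible shape for that collecting step, sketched in Python. It assumes the keys `id`, `section_name`, `plot_type` and `data` follow MultiQC's custom-content convention, which should be checked against the MultiQC documentation before relying on it. The function name and the version numbers are hypothetical:

```python
import json


def render_versions_yaml(versions):
    """Render a {tool: version} mapping as a small YAML document (returned as a string)."""
    quote = json.dumps  # JSON string quoting is also valid YAML double-quoted style
    lines = [
        'id: "software_versions"',
        'section_name: "Software Versions"',
        'plot_type: "table"',
        "data:",
    ]
    for tool in sorted(versions):
        lines.append(f"  {quote(tool)}:")
        lines.append(f"    version: {quote(versions[tool])}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical version numbers, for illustration only.
    print(render_versions_yaml({"FastQC": "0.11.5", "GATK": "3.8"}), end="")
```

The resulting text would be written to a file that MultiQC picks up alongside the other tool outputs.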

Automatically check Reference Files are compatible

The pipeline should be able to check certain files for consistency, e.g. determine whether the reference genome is in the same order as the selected dbSNP files and exome BED file(s). Otherwise the pipeline will break at a later point, confusing users and annoying developers too ;-)
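One way such a check could look, as a sketch only. It assumes the contig name is the first tab-separated column, which holds for a FASTA index (.fai), BED, and VCF body lines. It compares the order in which contigs first appear, so it does not detect a file that returns to an earlier contig later on. The function names are hypothetical:

```python
def contig_order(lines):
    """Contig names in order of first appearance (column 1 of tab-separated lines)."""
    order = []
    seen = set()
    for line in lines:
        # Skip blank lines, VCF/BED comments, and BED track/browser lines
        if not line.strip() or line.startswith(("#", "track ", "browser ")):
            continue
        name = line.split("\t", 1)[0].strip()
        if name not in seen:
            seen.add(name)
            order.append(name)
    return order


def find_problems(reference_contigs, other_contigs):
    """Return a list of human-readable problems; an empty list means compatible."""
    problems = []
    rank = {name: i for i, name in enumerate(reference_contigs)}
    unknown = [c for c in other_contigs if c not in rank]
    if unknown:
        problems.append("contigs missing from reference: " + ", ".join(unknown))
    ranks = [rank[c] for c in other_contigs if c in rank]
    if ranks != sorted(ranks):
        problems.append("contig order differs from reference")
    return problems


if __name__ == "__main__":
    fai = ["chr1\t1000\t6\t60\t61\n", "chr2\t900\t1030\t60\t61\n"]
    bed = ["chr2\t10\t20\n", "chr1\t5\t9\n"]
    print(find_problems(contig_order(fai), contig_order(bed)))
```

Running this before the first alignment step would let the pipeline fail early with a clear message instead of breaking later.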

Move kits to a base params definition (similar to igenomes)

I'll move the description of all kits and genome files to a base params definition and document this.

E.g. you only specify -genome "blagarbl" and the pipeline looks in the cluster-specific configuration / base configuration to find where the files for the requested genome are located.

  • move paths to a certain "base dir" (and not hard link them)
  • documentation update to specify what is expected in such a path
  • test things
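The lookup described above could be sketched as follows. This is Python for illustration, not the pipeline's actual configuration syntax. The registry contents, the key "GRCh37", the relative paths, and the base directory are hypothetical placeholders:

```python
import posixpath

# Hypothetical registry: paths are relative to a base dir, not absolute.
GENOMES = {
    "GRCh37": {
        "fasta": "GRCh37/genome.fa",
        "dbsnp": "GRCh37/dbsnp.vcf.gz",
    },
}


def resolve_reference(key, base_dir, registry=GENOMES):
    """Return {file role: full path} for the requested key, joined onto base_dir."""
    if key not in registry:
        known = ", ".join(sorted(registry))
        raise KeyError(f"unknown genome '{key}' (known: {known})")
    return {role: posixpath.join(base_dir, rel) for role, rel in registry[key].items()}


if __name__ == "__main__":
    # The base dir would come from the cluster-specific configuration.
    for role, path in resolve_reference("GRCh37", "/data/references").items():
        print(role, path)
```

Keeping only relative paths in the registry means a different cluster needs to change just the base dir.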
