
ampseeker's Introduction

Hello there 👋

I'm Sanjay Curtis Nagi, a researcher studying the major malaria mosquito Anopheles gambiae 🦟

ampseeker's People

Contributors

chabbytmd, eddug, ericrlucas, sanjaynagi


ampseeker's Issues

Add quality control steps after variant calling

  • Things like filtering on depth

  • Heterozygosity (Hardy-Weinberg)

  • Need inspiration for other thresholds; worth looking back at the Ag1000G QC steps, or at any other AmpSeq pipelines/analyses.
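A minimal sketch of the two checks listed above, assuming genotype calls and per-sample read depths have already been pulled out of the VCF for one biallelic site (e.g. with scikit-allel or cyvcf2, not shown). The function names and the depth threshold of 10 are hypothetical, and the chi-square test is a simple stand-in; an exact Hardy-Weinberg test is usually preferred when sample sizes are small.

```python
import math
from typing import Optional, Sequence


def mask_low_depth(genotypes: Sequence[str], depths: Sequence[int],
                   min_depth: int = 10) -> list[Optional[str]]:
    """Replace genotype calls with None where read depth is below min_depth (hypothetical threshold)."""
    return [gt if dp >= min_depth else None for gt, dp in zip(genotypes, depths)]


def hwe_chi2_pvalue(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Chi-square (1 df) test for departure from Hardy-Weinberg equilibrium at a biallelic site."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        return float("nan")  # no called genotypes at this site
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1 - p
    expected = [p * p * n, 2 * p * q * n, q * q * n]
    if min(expected) == 0:
        return 1.0  # monomorphic site: nothing to test
    observed = [n_hom_ref, n_het, n_hom_alt]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2))
```

Sites with a very small p-value (a strong heterozygote deficit or excess) could then be flagged; where to put that cut-off is exactly the kind of threshold the Ag1000G QC steps could inform.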

AgamDao modules

We will need to implement a species ID script for AgamDao.

This will be the first protocol-specific bit of analysis, and so we will need to think about the best way to approach this. There could simply be a series of options in the config for each protocol AmpSeeker supports.

Modules:
  AgamDao: True #anopheles
  GRC1: False #plasmodium

TODO

  • Species ID
  • KDR haplotypes/diplotypes
  • ???
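One way to make the per-protocol options concrete: a sketch that reads the Modules block from the parsed config (a plain dict, which is how Snakemake exposes config) and returns the enabled modules, rejecting names AmpSeeker doesn't know about. SUPPORTED_MODULES and enabled_modules are hypothetical names.

```python
from typing import Any, Mapping

# Hypothetical set of protocol modules AmpSeeker knows how to analyse
SUPPORTED_MODULES = {"AgamDao", "GRC1"}


def enabled_modules(config: Mapping[str, Any]) -> list[str]:
    """Return the protocol modules switched on in the config's Modules block."""
    modules = config.get("Modules", {}) or {}
    unknown = set(modules) - SUPPORTED_MODULES
    if unknown:
        raise ValueError(f"Unknown module(s) in config: {sorted(unknown)}")
    # Require a real YAML boolean, so a quoted "False" is not treated as enabled
    return [name for name, enabled in modules.items() if enabled is True]
```

Protocol-specific rules (species ID, kdr haplotypes) could then be requested only when "AgamDao" appears in this list.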

Implement BCL to FASTQ conversion

We need a rule at the start of the workflow to convert the BCL files in the Illumina MiSeq output directory to FASTQ and demultiplex them.
The command should be something like:

bcl-convert --bcl-input-directory {illumina_out_dir} --output-directory resources/reads --sample-sheet {illumina_out_dir}/SampleSheet.csv
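As a sketch, the command above could be assembled in Python (for instance in a helper in common.smk) so the Illumina run folder comes from the config. The function names are hypothetical; the three bcl-convert options are exactly the ones in the command above.

```python
import shlex
import subprocess
from pathlib import Path


def bcl_convert_cmd(illumina_out_dir: str, output_dir: str = "resources/reads") -> list[str]:
    """Build the bcl-convert argument list for one MiSeq run folder."""
    run_dir = Path(illumina_out_dir)
    return [
        "bcl-convert",
        "--bcl-input-directory", str(run_dir),
        "--output-directory", output_dir,
        # The sample sheet drives demultiplexing and sits inside the run folder
        "--sample-sheet", str(run_dir / "SampleSheet.csv"),
    ]


def run_bcl_convert(illumina_out_dir: str, output_dir: str = "resources/reads") -> None:
    """Run bcl-convert, logging the exact command and failing loudly on error."""
    cmd = bcl_convert_cmd(illumina_out_dir, output_dir)
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Passing an argument list rather than one shell string avoids quoting problems if the run folder path contains spaces.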

Assess index read quality

This will involve getting bcl-convert to produce FASTQs for the index reads. However, I could only get bcl2fastq, the older software, to do this, not bcl-convert, so we might want to switch to bcl2fastq.

We can then run FastQC or fastp on the index reads.
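Before running FastQC or fastp, a quick sanity check on the index reads could look like this. It assumes Phred+33 quality encoding (standard for current Illumina output) and an index-read FASTQ produced by bcl2fastq; the Q30 cut-off and the function names are hypothetical.

```python
import gzip
from typing import Iterable, Iterator


def quality_strings(lines: Iterable[str]) -> Iterator[str]:
    """Yield the quality line (4th line) of each FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 3:
            yield line.rstrip("\n")


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of one read, assuming Phred+33 encoding."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def fraction_above(lines: Iterable[str], min_q: float = 30.0) -> float:
    """Fraction of reads whose mean quality is at least min_q."""
    means = [mean_phred(q) for q in quality_strings(lines) if q]
    return sum(m >= min_q for m in means) / len(means) if means else 0.0


def index_read_summary(fastq_gz: str, min_q: float = 30.0) -> float:
    """Apply fraction_above to a gzipped index-read FASTQ."""
    with gzip.open(fastq_gz, "rt") as fh:
        return fraction_above(fh, min_q)
```

A per-read mean hides drops at individual cycles, so the per-position plots from FastQC or fastp remain the better diagnostic.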

Heatmap of reads per well

A script that, after demultiplexing, counts the reads per sample and makes a heatmap of reads per well for each input plate.
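A sketch of the reads-per-well layout, assuming standard 96-well plates (rows A to H, columns 1 to 12) and a sample metadata table recording each sample's plate and well (e.g. "B07"); the sample_id, plate and well keys are hypothetical. It builds one 8 x 12 grid per plate, which could then be drawn with matplotlib's imshow or seaborn's heatmap.

```python
from collections import defaultdict
from typing import Iterable, Mapping

ROWS = "ABCDEFGH"  # 96-well plate: 8 rows x 12 columns
N_COLS = 12


def well_to_index(well: str) -> tuple[int, int]:
    """Convert a well name like 'B07' or 'B7' to zero-based (row, col)."""
    row = ROWS.index(well[0].upper())  # raises ValueError for an unknown row letter
    col = int(well[1:]) - 1
    if not 0 <= col < N_COLS:
        raise ValueError(f"Bad well: {well}")
    return row, col


def plate_grids(samples: Iterable[Mapping[str, str]],
                read_counts: Mapping[str, int]) -> dict[str, list[list[int]]]:
    """Return {plate: 8x12 grid of read counts}; wells without a sample stay at 0."""
    grids: dict[str, list[list[int]]] = defaultdict(lambda: [[0] * N_COLS for _ in ROWS])
    for s in samples:
        r, c = well_to_index(s["well"])
        grids[s["plate"]][r][c] = read_counts.get(s["sample_id"], 0)
    return dict(grids)
```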

igv-notebook

Use a Jupyter notebook, executed with papermill, to explore read data in IGV.

BCL conversion failing

The workflow throws a ChildIOException ("File/directory is a child to another output") when provided with an Illumina data folder.
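Snakemake raises this error when one declared output lies inside another output path, typically a directory output (such as a bcl-convert read folder) alongside per-sample files declared within it. A small sketch that finds such pairs among a list of output paths, to help pinpoint the rules involved; the function name and example paths are hypothetical.

```python
from itertools import combinations
from pathlib import PurePosixPath


def nested_outputs(outputs: list[str]) -> list[tuple[str, str]]:
    """Return (parent, child) pairs where one output path lies inside another."""
    pairs = []
    for a, b in combinations(outputs, 2):
        pa, pb = PurePosixPath(a), PurePosixPath(b)
        if pa in pb.parents:
            pairs.append((a, b))
        elif pb in pa.parents:
            pairs.append((b, a))
    return pairs
```

One likely fix is to have the conversion rule write to a directory that no other rule declares outputs in, and let downstream rules only read from it.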

Introduce a Snakemake checkpoint for samples with no data

Sometimes samples will have zero data, which makes the pipeline fail.

To handle this, the pipeline should start with a checkpoint that determines which samples actually have data, and then run the rest of the pipeline on only those samples.
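The checkpoint logic could be sketched as two plain Python helpers: count the reads in each sample's gzipped FASTQ, then keep only samples above a minimum. The function names and the min_reads default are hypothetical. In the workflow, a checkpoint rule would write the surviving sample list to a file, and downstream input functions would read it via checkpoints.<rule_name>.get(...) so that Snakemake re-evaluates the DAG after the checkpoint has run.

```python
import gzip
import os
from typing import Mapping


def count_reads(fastq_gz: str) -> int:
    """Number of FASTQ records in a gzipped file; missing or empty files count as 0."""
    if not os.path.exists(fastq_gz) or os.path.getsize(fastq_gz) == 0:
        return 0
    with gzip.open(fastq_gz, "rt") as fh:
        return sum(1 for _ in fh) // 4  # four lines per FASTQ record


def samples_with_data(read_counts: Mapping[str, int], min_reads: int = 1) -> list[str]:
    """Samples with at least min_reads reads, in their original order."""
    return [s for s, n in read_counts.items() if n >= min_reads]
```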

Build a private web page of all results with Jupyter Book

@ChabbyTMD @eddUG

I was having a think, and it should be possible to use Jupyter Book within Snakemake to build a private website containing all of the workflow's results for each user's analysis.

This would be really cool: you wouldn't even need to look in the results folder, you could simply open the web page and explore all the results there. If we write the analyses as papermill-executed Jupyter notebooks, this shouldn't be too complicated!

(I'll explain in our next meeting, but Jupyter Book can basically take a set of Jupyter notebooks as input and build a website from them.)

fix logo

Add Plasmodium, bacteria, virus and mosquito to the logo.

Automatic merging of VCF files depending on sample size

Currently, the pipeline has a rule that splits the bcftools merge step into two groups, because bcftools merge fails when run on more than 1000 samples. However, this split also means the pipeline can fail when there are fewer than 1000 samples.

We should automate this, i.e. merging should only be done in two steps when there are more than 1000 samples.
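A sketch of the automated decision, assuming the per-sample VCFs are bgzip-compressed and indexed (bcftools merge normally needs indexed inputs). It returns the bcftools merge commands to run: a single merge when there are at most max_per_merge files, otherwise one merge per chunk of at most max_per_merge files followed by a final merge of the intermediates (so two groups for 1001 to 2000 samples). The 1000 limit comes from the issue above; the function names and intermediate file names are hypothetical.

```python
from typing import Sequence


def chunk(items: Sequence[str], size: int) -> list[list[str]]:
    """Split items into consecutive groups of at most `size`."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def plan_merge(vcfs: Sequence[str], out_vcf: str, max_per_merge: int = 1000,
               tmp_prefix: str = "results/vcfs/merge_chunk") -> list[list[str]]:
    """Return the bcftools merge command(s) needed to combine all per-sample VCFs."""
    if len(vcfs) <= max_per_merge:
        # Small cohort: a single merge step is enough
        return [["bcftools", "merge", "-O", "z", "-o", out_vcf, *vcfs]]
    cmds, intermediates = [], []
    for i, group in enumerate(chunk(vcfs, max_per_merge)):
        part = f"{tmp_prefix}_{i}.vcf.gz"  # hypothetical intermediate name
        intermediates.append(part)
        cmds.append(["bcftools", "merge", "-O", "z", "-o", part, *group])
    # Intermediates hold disjoint sample sets; each must be indexed
    # (e.g. with bcftools index) before this final merge
    cmds.append(["bcftools", "merge", "-O", "z", "-o", out_vcf, *intermediates])
    return cmds
```

Compared with a fixed two-way split, this never creates an empty second group below the limit (which may be what causes the failure described above) and still works past 2000 samples.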

Make documentation webpage

As well as the results Jupyter Book, we also want a public Jupyter Book containing documentation on things like:

  • setting up AmpSeeker (config, input files)
  • contributing
  • troubleshooting

This will be hosted within the AmpSeeker repo using GitHub Pages.

Qualimap wrapper error

I'm getting a conda error from the Qualimap wrapper when trying to run the workflow.

Dynamically generate toc.yaml

As different parts of the analysis are optional, the table of contents will also need to be generated dynamically within the workflow.
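A sketch of generating the table of contents inside the workflow, assuming the Jupyter Book format (format: jb-book, a root page, then a list of chapters) and a hypothetical mapping from config switches to the notebook each optional step produces. The YAML is written as plain text to avoid an extra dependency.

```python
from typing import Mapping

# Hypothetical: which results notebook each optional analysis produces
OPTIONAL_NOTEBOOKS = {
    "qc": "notebooks/reads-per-well",
    "species_id": "notebooks/species-id",
    "kdr": "notebooks/kdr-haplotypes",
}


def build_toc(config: Mapping[str, bool], root: str = "intro") -> str:
    """Return the text of a Jupyter Book table of contents for the enabled analyses."""
    chapters = [nb for key, nb in OPTIONAL_NOTEBOOKS.items() if config.get(key, False)]
    lines = ["format: jb-book", f"root: {root}"]
    if chapters:
        lines.append("chapters:")
        lines.extend(f"- file: {nb}" for nb in chapters)
    return "\n".join(lines) + "\n"
```

A rule could write this string to the book's _toc.yml just before the jupyter-book build step.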

move input file list to common.smk

To keep the Snakefile tidy, we should have an input function for rule all that lives in a rule file called common.smk. This is good practice in Snakemake.

This function will determine which output files we want to produce, based on config.yaml.
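A sketch of such an input function, written as plain Python so it could sit in common.smk. The config keys and result paths are hypothetical placeholders; in the real file Snakemake's expand() would usually do the per-sample templating, but list comprehensions keep this example self-contained.

```python
from typing import Any, Mapping


def get_final_output(config: Mapping[str, Any], samples: list[str]) -> list[str]:
    """Collect every target rule all should request, based on config switches."""
    # Always produced (hypothetical paths)
    outputs = [f"results/alignments/{s}.bam" for s in samples]
    outputs.append("results/vcfs/merged.vcf.gz")
    # Optional steps, switched on in config.yaml (hypothetical keys)
    if config.get("qc", False):
        outputs += [f"results/qc/{s}_fastp.html" for s in samples]
    if config.get("jupyter_book", False):
        outputs.append("results/book/_build/html/index.html")
    return outputs
```

The Snakefile's rule all would then only need to declare its input as get_final_output(config, samples).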
