Git Product home page Git Product logo

miplithon's Introduction

miplithon

This is a snakemake pipeline for analyzing drug resistance prevalences. The pipeline expects 3 tables of read counts similar to the ones output by the variant calling step of miptools. Each table consists of columns that are mutations and rows that are samples, and a given (column, row) value is the count of reads associated with a mutation in a given sample.

The tables are:

  • coverage_AA_table.csv: the total number of reads that span the genomic region.
  • reference_AA_table.csv: the count of reads that did not support the mutation.
  • alternate_AA_table.csv: the count of reads that supported the mutation.

The user also provides a list of mutations of interest (in a yaml file) along with coverage and alternate thresholds to apply to each mutation, and a metadata file that assigns categorical values to each sample in a column (or columns) of interest.

The program outputs a table of categorical values for the column of interest, as well as how many samples passed the thresholds for each category.

For example, a user might have counts for samples collected from several geographic regions, and metadata that links individual samples to individual regions. The user might be interested in 10 mutations at thresholds of coverage 3 and alternate count of 1. The program would output how many samples in each geographic region had counts that passed these thresholds for each of the 10 mutations.

Installation

To install this program, first download snakemake (e.g. pip install snakemake).

Next, cd to a folder of interest and git clone the pipeline, e.g.

git clone https://github.com/bailey-lab/miplithon.git

Usage

To use this pipeline, edit the yaml file using the comments in the yaml file. Some example values have been filled in.

Next, run the code with

snakemake -s DR_analysis.smk --cores 4

You can optionally change the number of cores to a higher value for slightly faster running times (this increases the number of rules that can run simultaneously but does not affect the running time for any individual rule).

Results

alternate_freqs.csv gives the unfiltered within sample allele frequencies of each mutation in each sample.

This is graphed as a plotly heatmap in frequency_heatmap.html

The main results can be found in the prevalences/{summarize_by}:all_{cov}_{alt}_summary.tsv file, where {summarize_by} is the column of interest, {cov} is the threshold coverage, and {alt} is the threshold alternate count (number of times the mutant has been seen. Mutations in this table will match those provided by the user in the yaml file.

prevalences_by_threshold/{cov}{alt}{count}_summary.tsv gives all mutations (even those not explicitly asked for) that passed coverage, alternate count, and minimum sample count thresholds from the yaml file.

background_mutations/561_on_Asian_backgrounds.tsv is currently hardcoded to give the number of samples that have 561H and Asian mutations. This will be generalized in the future.

prevalences/mdr1-NFD_{cov}{alt}cov.txt and prevalences/mdr1-NFD{cov}{alt}_alt.txt give tables that show the number of samples that contain the mdr1-Asn86, mdr1-Tyr184Phe, and mdr1-Asp1246 mutations (number of samples that pass coverage thresholds and number of samples that pass coverage and alternate thresholds, respectively).

miplithon's People

Contributors

alfredsimkin avatar

Watchers

Michael Purcaro avatar Nicholas Hathaway avatar Christian Parobek avatar Jeff Bailey avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.