Git Product home page Git Product logo

protein-inference's Introduction

Protein Inference - Mass Dynamics (MD)

This codebase is being developed at Mass Dynamics https://www.massdynamics.com/ to facilitate work on the Protein Inference Problem.

It has been developed using the output of crux comet/percolator (eg: percolator.target.psms.txt), but requires only that psm tables contain q-value and score columns, although more information about psms could theoretically be integrated into scoring methods.

This codebase could be used for the following purposes:

* to develop or benchmark protein inference algorithms using shared data processing and  parallelization infrastructure
* to visualize the peptide-protein networks that are being reasoned about by a protein inference algorithm
* to opensource an algorithm that is implimented inside this package

The REPRISAL algorithm, currently unpublished, was the first novel algorithm implemented inside the protein_inference framework and is contained in the distinct subpackage "reprisal".

REPRISAL stands for "REcursive PRotein Inference Scoring ALgorithm" and works by iteratively scoring proteins by their associated peptides and grouping all associated and unallocated peptides with the highest scoring protein at each iteration. FDR's are then calculated as per the score distributions for proteins in the target and decoy tables.

Installation

pip install protein-inference

Build Docs

sphinx-build -b html docs/source/ docs/build

Development Set up

conda create -y --name protein-inference python=3.8 --file requirements.txt

Dependencies

Many of these are just used in the benchmarking suite for visualization.

pandas

networkx

pyvis

matplotlib

plotly

seaborn

upsetplot

Quick Start

If you have a psms list from percolator (or otherwise) ready to go, you can run the default pipeline easily:

import protein_inference as pi

target_path = "path/to/your/target.psms.txt"
decoy_path =  "path/to/your/decoy.psms.txt"
output_directory = "path/to/your/output_directory"
pi.ProteinInferenceRunner().run(target_path, 
                            decoy_path,
                            output_directory)

You can also run the package via the commandline if you can specify the path to the python directory where you have protein_inference installed.

python -m protein_inference.main \
  --output-directory path/to/out/folder \
  --target-path path/to/your/target.psms.txt \
  --decoy-path path/to/your/decoy.psms.txt

Streamlit App

If you'd like to play around with some processed data/visualization, install streamlit and run the protein inference playground app.

pip install streamlit
cd streamlit
streamlit run pi_playground.py -- --folder /path/to/your/pi/output/

You can also start the app without a path to an upload folder, and upload your output files after the fact.

Using Docker:

A docker file has been created which creates an environment to run the protein_inference package via a command line interface.

You can build the image from the docker file as so:

docker build --pull --rm -f "Dockerfile" -t proteininference:latest "." 

Then you can run the main file from the command line by using a /data volume and passing arguments with paths to files in that folder.

docker run -it --rm -v /data:/data \
    proteininference conda run -n protein-inference -c python -m protein_inference.main 
    --output-directory /data/output \
    --target-path /data/input/percolator.target.psms.txt \
    --decoy-path /data/input/percolator.decoy.psms.txt

Future Plans:

The benchmarking suite is currently undocumented as it is still being developed. I'll get around to this as I work towards publishing REPRISAL as a paper.

While not currently in development, possible additions to this codebase are:

  • Writing file converters to accept mzIdentML
  • Integrating Baysian methods such as Fido or EPIFANY.
  • Integrating an optimisation method, maybe using a package like PULP.
  • Further development of benchmarking suite to allow easy comparison of existing algorithms.

ASMS 2022 Poster

ASMS 2022 Poster

Feel free to download the png or pdf of the ASMS 2022 poster on protein inference in python.

Click to Download the poster pdf

NB: Table 1 was removed when trying to improve the app asthetics. I've provided it here for reference

Table 1

App Link!

https://share.streamlit.io/massdynamics/protein-inference/main/pi_playground.py

protein-inference's People

Contributors

jbloomaus avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.