Git Product home page Git Product logo

metapuf's Introduction

MetaPUF


An approach to integrate metagenomics, metatranscriptomics and metaproteomics data found in public resources such as MGnify (for metagenomics/metatranscriptomics) and the PRIDE database (for metaproteomics). When these omics techniques are applied to the same sample, their integration offers new opportunities to understand the structure (metagenome) and functional expression (metatranscriptome and metaproteome) of the microbiome.

Installation

You need a working installation of Snakemake. Then:

git clone <this-repo>

The pipeline uses conda environments to manage dependencies, which are handled automatically if you run snakemake with the --use-conda flag.

It also relies on some tools (ThermoRawFileParser, SearchGui and PeptideShaker) which do not have conda packages or docker images available for the versions we used. These tools are downloaded on-the-fly by snakemake, so you do not need to install them separately.

Example usage (test data-set)

There is a small test-data set, using a few assemblies from MGnify and two RAW files from PRIDE. To fetch the (~GB size) RAW files, which are too big for this git repository:

./test-data/pride/fetch-pride-test-data.sh

This downloads two RAW files into test-data/pride/.

Then:

conda activate snakemake # (assuming you installed snakemake with conda, into an env called snakemake)
cd MetaPUF
snakemake --cores 4 --use-conda

This will run the pipeline on the small dataset, and put results into ../test-run.

Real usage (configuration)

Edit the config/config.proteomics.yaml and sample_info.csv files to point the pipeline at real data. sample_info.csv is the mapping of MGnify to PRIDE datasets, and in the config parameters.input_dir and parameters.raw_dir refer to the MGnify and PRIDE data folders respectively.

Tips for running Snakemake

  • You can run a dry-run to check for any syntax errors
 Snakemake  -np
  • To run the workflow
 Snakemake --cores 4 --use-conda
  • Using LSF on an HPC cluster:
bsub -n 4 -R "rusage[mem=4096]" -J metapuf -u $USER -o job.log -e job.err snakemake --cores 4 --use-conda
  • Tips: IF the pipeline got collapsed during running, you can always try to run a dry-run Snakemake -np first to check how many rules have been successful executed, and if you are sure that some files are generated correctly, you can use snakemake --cleanup-metadata <filenames> to skip these files to be re-generated. However, sometimes snakemake --cleanup-metadata <filenames> doesn't work, you can also try to manually delete the .snakemake/incomplete directory.

Development installation

git clone https://github.com/PRIDE-reanalysis/MetaPUF.git
cd MetaPUF
conda activate snakemake  # or another conda/venv if you prefer
pip install ".[dev,docs]"
pre-commit install

This installs the development requirements, and installs the pre-commit hooks which format the code correctly while commiting changes. You can also manually format the code using black .. It also installs mkdocs, which is used to build the documentation.

Editing docs

Change the markdown files in the docs/ folder. Then mkdocs serve to view the documentation site locally.

Core contributors and collaborators

Code of Conduct

As part of our efforts toward delivering open and inclusive science, we follow the Contributor Covenant Code of Conduct for Open Source Projects.

How to cite

Copyright notice

metapuf's People

Contributors

casrssk avatar kaurs-ebi avatar sandyrogers avatar shengbokw avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.