denovopipeline's Introduction

denovopipeline

denovopipeline uses multiple de novo sequencing algorithms (pNovo3, SMSNet, Novor, DeepNovo, PointNovo, Casanovo, ALPS) for identification and assembly of peptide sequences by tandem mass spectrometry.

denovopipeline was used to evaluate multiple de novo sequencing algorithms on monoclonal antibody data: Denis Beslic, Georg Tscheuschner, Bernhard Y Renard, Michael G Weller, Thilo Muth, Comprehensive evaluation of peptide de novo sequencing tools for monoclonal antibody assembly, Briefings in Bioinformatics, 2022, bbac542, https://doi.org/10.1093/bib/bbac542

Citations for the individual tools:
  • Novor:

  • pNovo 3:

    • Yang, H., Chi, H., Zeng, W. F., Zhou, W. J., & He, S. M. (2019). pNovo 3: precise de novo peptide sequencing using a learning-to-rank framework. Bioinformatics (Oxford, England), 35(14), i183–i190. https://doi.org/10.1093/bioinformatics/btz366
  • DeepNovo:

    • Tran, N. H., Zhang, X., Xin, L., Shan, B., & Li, M. (2017). De novo peptide sequencing by deep learning. Proceedings of the National Academy of Sciences of the United States of America, 114(31), 8247–8252. https://doi.org/10.1073/pnas.1705691114
  • SMSNet:

    • Karunratanakul, K., Tang, H. Y., Speicher, D. W., Chuangsuwanich, E., & Sriswasdi, S. (2019). Uncovering Thousands of New Peptides with Sequence-Mask-Search Hybrid De Novo Peptide Sequencing Framework. Molecular & cellular proteomics : MCP, 18(12), 2478–2491. https://doi.org/10.1074/mcp.TIR119.001656
  • PointNovo:

    • Qiao, R., Tran, N.H., Xin, L. et al. Computationally instrument-resolution-independent de novo peptide sequencing for high-resolution devices. Nat Mach Intell 3, 420–425 (2021). https://doi.org/10.1038/s42256-021-00304-3
  • Casanovo:

    • Yilmaz, M., Fondrie, W. E., Bittremieux, W., Oh, S. & Noble, W. S. De novo mass spectrometry peptide sequencing with a transformer model. in Proceedings of the 39th International Conference on Machine Learning - ICML '22 vol. 162 25514–25522 (PMLR, 2022). https://proceedings.mlr.press/v162/yilmaz22a.html
  • ALPS:

    • Tran, N. H., Rahman, M. Z., He, L., Xin, L., Shan, B., & Li, M. (2016). Complete De Novo Assembly of Monoclonal Antibody Sequences. Scientific reports, 6, 31730. https://doi.org/10.1038/srep31730

How to use

Download pre-trained models

To download the pre-trained models for PointNovo, SMSNet and DeepNovo, use the following link: https://drive.google.com/drive/folders/1LFmez1yq7eXNTNs7IWhYy9vQpLzD8rLI?usp=sharing

Move each corresponding model to the resources/ directory of each de novo sequencing tool. Move knapsack.npy to resources/PointNovo and resources/DeepNovo.
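Before running the tools, it can be worth checking that knapsack.npy ended up in both places. The following is a minimal sketch, not part of denovopipeline; the function name missing_knapsack_files is hypothetical, and it only assumes the two directory names given above.

```python
from pathlib import Path

# Directories that, per the instructions above, each need a copy of knapsack.npy.
KNAPSACK_DIRS = ("resources/PointNovo", "resources/DeepNovo")


def missing_knapsack_files(repo_root="."):
    """Return the expected knapsack.npy paths that do not exist yet.

    Hypothetical helper: it checks only for the file's presence,
    not whether its content is valid.
    """
    root = Path(repo_root)
    expected = [root / directory / "knapsack.npy" for directory in KNAPSACK_DIRS]
    return [path for path in expected if not path.is_file()]


if __name__ == "__main__":
    for path in missing_knapsack_files():
        print(f"missing: {path}")
```

Run it from the repository root; no output means both copies are in place.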

Format raw data

De novo sequencing requires mgf (Mascot generic format) files.

You can use ProteoWizard msconvert to convert your .raw/.mzXML/.mzML files to .mgf format. ProteoWizard can be installed using conda.

msconvert preformatted_spectra.raw --mgf --filter "peakPicking vendor" --filter "zeroSamples removeExtra" --filter "titleMaker Run: <RunId>, Index: <Index>, Scan: <ScanNumber>"

Your .mgf file should now look like this

BEGIN IONS
TITLE=Run: antibody_tryptic_1, Index: 745, Scan: 756
RTINSECONDS=199.46399
PEPMASS=593.2639
CHARGE=2+
145.9389801 163.0
145.9406433 490.0
145.9423218 762.0
...
END IONS
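
A small reader can help confirm that a converted file has the expected layout. This is a minimal sketch, not part of denovopipeline; the function name read_mgf is hypothetical, and it assumes spectra delimited by BEGIN IONS and END IONS as in the Mascot generic format.

```python
def read_mgf(lines):
    """Yield one dict per spectrum from an iterable of MGF text lines.

    Each dict has 'params' (KEY=value header lines) and 'peaks'
    (a list of (m/z, intensity) tuples). Hypothetical helper for inspection only.
    """
    spectrum = None
    for raw in lines:
        line = raw.strip()
        if line == "BEGIN IONS":
            spectrum = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if spectrum is not None:
                yield spectrum
            spectrum = None
        elif spectrum is None or not line:
            # Skip blank lines and anything outside a spectrum block.
            continue
        elif "=" in line and not line[0].isdigit():
            key, value = line.split("=", 1)
            spectrum["params"][key] = value.strip()
        else:
            mz, intensity = line.split()[:2]
            spectrum["peaks"].append((float(mz), float(intensity)))


if __name__ == "__main__":
    with open("YOURDATA.mgf") as handle:
        for spectrum in read_mgf(handle):
            print(spectrum["params"].get("TITLE"), len(spectrum["peaks"]))
```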

Reformat mgf file

Another reformatting operation is necessary, because certain tools ignore the old indexing and do not work with predefined IDs. Spectrum indices and scan IDs are changed to integers from 1 to N. Information on old IDs is preserved in the TITLE line.

python src/main.py reformatMGF --input YOURDATA.mgf --output YOURDATA_reformatted.mgf

This will produce two .mgf files. One called YOURDATA_reformatted_deepnovo.mgf for DeepNovo and PointNovo and another one called YOURDATA_reformatted.mgf for all other tools. The file for DeepNovo includes the 'SEQ=' line, which is necessary for DeepNovo to run.

Your final *_reformatted.mgf file should now look like this.

BEGIN IONS
TITLE=Run: Light-Chain-Trypsin-1, Index: 1, Old index: 3175, Old scan: 3176
PEPMASS=404.12344
CHARGE=2+
SCANS=1
RTINSECONDS=852.852
SEQ=AAAAAA
145.9389801 163.0
...
END IONS

It is very important that your files use the same formatting. Otherwise, the de novo sequencing and summary step will not work correctly.
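
The renumbering of the TITLE line can be illustrated as follows. This is only a sketch, assuming titles in the "Run: ..., Index: ..., Scan: ..." form produced by the titleMaker filter; the actual logic behind reformatMGF may differ, and TITLE_RE, renumber_title and renumber_titles are hypothetical names.

```python
import re

# Matches titles such as "Run: antibody_tryptic_1, Index: 745, Scan: 756".
TITLE_RE = re.compile(
    r"Run:\s*(?P<run>.+?),\s*Index:\s*(?P<index>\d+),\s*Scan:\s*(?P<scan>\d+)"
)


def renumber_title(title, new_index):
    """Return a TITLE value with a new index that keeps the old IDs."""
    match = TITLE_RE.search(title)
    if match is None:
        raise ValueError(f"unexpected TITLE format: {title!r}")
    return (
        f"Run: {match['run']}, Index: {new_index}, "
        f"Old index: {match['index']}, Old scan: {match['scan']}"
    )


def renumber_titles(titles):
    """Assign consecutive indices 1..N in file order."""
    return [renumber_title(title, i) for i, title in enumerate(titles, start=1)]


if __name__ == "__main__":
    print(renumber_title("Run: antibody_tryptic_1, Index: 745, Scan: 756", 1))
```

In a full reformatting step the same new index would also be written to the SCANS line, so that tools which ignore the TITLE still see integers from 1 to N.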

Build Conda Environments

Since DeepNovo, SMSNet and PointNovo have different requirements regarding their Python version and other dependencies, we recommend using conda to build virtual environments.

conda env create -n deepnovo -f envs/requirements_deepnovo.yml
conda env create -n smsnet -f envs/requirements_smsnet.yml
conda env create -n pointnovo -f envs/requirements_pointnovo.yml
conda env create -n casanovo -f envs/requirements_casanovo.yml
conda env create -n denovopipeline -f envs/requirements_denovopipeline.yml

Run de novo sequencing tools

Novor will be executed by using DeNovoCLI from DeNovoGUI. It is necessary to provide a parameter file. We recommend using the instructions from DeNovoCLI.

DeepNovo, SMSNet and PointNovo need to be run separately, because they have different dependencies. We provide some pre-trained models, but we recommend training models yourself. You can change the model used by each deep learning tool with the command line arguments --smsnet_model, --deepnovo_model and --pointnovo_model.

Important: PointNovo and DeepNovo require the *_reformatted_deepnovo.mgf file, while SMSNet uses the *_reformatted.mgf file as input for the prediction.

pNovo3 can only run on Windows and does not work within the pipeline. You can run it separately by following the instructions on its website and put its final output in the results directory.

Use the following commands:

conda activate smsnet
python src/main.py denovo --input example_dataset/YOURDATA_reformatted.mgf --output example_dataset/results --denovogui 1 --smsnet 1


conda activate deepnovo
python src/main.py denovo --input example_dataset/YOURDATA_reformatted_deepnovo.mgf --output example_dataset/results --deepnovo 1


conda activate pointnovo
python src/main.py denovo --input example_dataset/YOURDATA_reformatted_deepnovo.mgf --output example_dataset/results --pointnovo 1


conda activate casanovo
python src/main.py denovo --input example_dataset/YOURDATA_reformatted.mgf --output example_dataset/results --casanovo 1
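
The four runs can also be scripted. The sketch below is an assumption-laden convenience wrapper, not part of denovopipeline: it uses `conda run -n <env>` in place of `conda activate`, assumes the environment names from the build step, and the names RUNS, build_commands and run_all are hypothetical. The flags are the ones shown in the commands above.

```python
import subprocess

# (conda environment, tool flags, True if the *_deepnovo.mgf variant is required)
RUNS = [
    ("smsnet", ["--denovogui", "1", "--smsnet", "1"], False),
    ("deepnovo", ["--deepnovo", "1"], True),
    ("pointnovo", ["--pointnovo", "1"], True),
    ("casanovo", ["--casanovo", "1"], False),
]


def build_commands(reformatted_mgf, results_dir):
    """Build one command per tool, picking the matching input file."""
    if not reformatted_mgf.endswith(".mgf"):
        raise ValueError("expected a path ending in .mgf")
    deepnovo_mgf = reformatted_mgf[: -len(".mgf")] + "_deepnovo.mgf"
    commands = []
    for env, flags, needs_deepnovo_file in RUNS:
        mgf = deepnovo_mgf if needs_deepnovo_file else reformatted_mgf
        commands.append(
            ["conda", "run", "-n", env, "python", "src/main.py", "denovo",
             "--input", mgf, "--output", results_dir, *flags]
        )
    return commands


def run_all(reformatted_mgf, results_dir):
    """Run every tool in turn and stop at the first failure."""
    for command in build_commands(reformatted_mgf, results_dir):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    for command in build_commands(
        "example_dataset/YOURDATA_reformatted.mgf", "example_dataset/results"
    ):
        print(" ".join(command))
```

Printing the commands first, as the main block does, makes it easy to verify that DeepNovo and PointNovo receive the *_reformatted_deepnovo.mgf file before anything is executed.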

Postprocessing

After running all desired de novo tools, use denovo_summary (the summary step of src/main.py) to generate the summary file. You need to specify the directory where all the de novo results are stored and provide your initial reformatted mgf file, so that the predictions are correctly assigned to each spectrum. Additionally, you can specify a feature file from PeptideShaker to compare your de novo sequencing results with database search results. We recommend using DeNovoGUI and PeptideShaker and exporting the "Default PSM Report with non-validated matches".

conda activate denovopipeline
python src/main.py summary --input example_dataset/YOURDATA_reformatted.mgf --results example_dataset/results/

The summary file is written to your results directory and includes the spectrum title, peptide prediction, peptide score and single amino acid scores for each tool. If database results are supplied, it also contains information about missing cleavages and the noise factor of each spectrum.

Assembly results

To generate additional statistics and assemble peptide predictions into contigs using ALPS:

conda activate denovopipeline
python src/main.py assembly --input example_dataset/results/summary.csv

The command will split up the summary file and generate contigs for each tool in results/ALPS_Assembly. Additionally, it will also generate CSVs with information about the Peptide Recall, AA Recall, AA Precision.
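
To make the three reported metrics concrete, the sketch below computes them for pairs of predicted and reference peptides. It is a simplified illustration under stated assumptions, not the pipeline's implementation: residues are compared position by position as plain characters, whereas the exact matching rules used by denovopipeline (for example mass tolerances or the treatment of I/L) are not described here. The function names are hypothetical.

```python
def count_matching_residues(predicted, reference):
    """Count positions at which both sequences carry the same residue."""
    return sum(p == r for p, r in zip(predicted, reference))


def evaluate(pairs):
    """Compute AA precision, AA recall and peptide recall.

    pairs: iterable of (predicted, reference) peptide strings.
    AA precision   = matched residues / predicted residues
    AA recall      = matched residues / reference residues
    Peptide recall = fully correct peptides / number of reference peptides
    """
    matched = predicted_total = reference_total = correct = count = 0
    for predicted, reference in pairs:
        count += 1
        matched += count_matching_residues(predicted, reference)
        predicted_total += len(predicted)
        reference_total += len(reference)
        correct += predicted == reference
    return {
        "aa_precision": matched / predicted_total if predicted_total else 0.0,
        "aa_recall": matched / reference_total if reference_total else 0.0,
        "peptide_recall": correct / count if count else 0.0,
    }


if __name__ == "__main__":
    print(evaluate([("PEPTIDE", "PEPTIDE"), ("PEPSIDEK", "PEPTIDE")]))
```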

denovopipeline's People

Contributors

denisbeslic

denovopipeline's Issues

pNovo and search parameters

Hey Denis,

Thank you very much for providing the pipeline! I have a couple of *.mgf files and am trying to run the different de novo algorithms on them. Regarding the de novo algorithms, I have a couple of questions:

  • How did you use pNovo? I installed it from the pFind website, and the installation worked as usual. After opening the tool (pNovo3_Search.exe), I tried to run a small toy example, following exactly the steps in the pFind user guide: choosing the input file and selecting the output directory path (all parameters kept at their defaults). After a couple of seconds the progress output window showed that the process was completed, but the search button kept loading for a couple of hours without writing any results (only empty result files, e.g. result.res, etc.). I tested the tool with DeNovoGUI and everything worked well and fast, but the only output file is pNovo.res, which contains all search results. Your pipeline expects the file result.res, which records the top-one PSM per spectrum. Is there any way to filter pNovo.res or to use it in your pipeline?
  • As I am using different files, there are different search parameters/modifications. When running Novor, Casanovo and pNovo3 (using DeNovoGUI), I had no problem selecting these parameters. Is there any way to set parameters in SMSNet and DeepNovo? For PointNovo, should I just change the parameters in resources/PointNovo/config.py?

Best,
