
mhcquant's Introduction

nf-core/mhcquant

Introduction

nf-core/mhcquant is a bioinformatics analysis pipeline for the quantitative processing of data-dependent acquisition (DDA) peptidomics data.

It was specifically designed to analyse immunopeptidomics data, which comprise affinity-purified, non-specifically cleaved peptides that have recently been discussed intensively in the context of cancer vaccines.

The workflow is based on the OpenMS C++ framework for computational mass spectrometry. Mass spectrometry runs in mzML format serve as inputs, and a database search (Comet) is performed against a given input protein database. FDR rescoring is applied with Percolator using a competitive target-decoy approach (reversed decoys). For label-free quantification, all input files undergo identification-based retention time alignment (MapAlignerIdentification) and targeted feature extraction that matches identifications between runs (FeatureFinderIdentification). In addition, a variant call file (VCF) can be specified to translate variants into proteins that are included in the database search. Binding predictions for specified alleles (alleles.tsv) using MHCflurry (class 1) or MHCnuggets (class 2) can be run directly on the output peptide lists. If a VCF file was specified, neoepitopes are determined automatically and binding predictions can be run for them as well.

The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from nf-core/modules in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!

On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from the full-sized test can be viewed on the nf-core website.

Usage

Note

If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

First, prepare a samplesheet with your input data that looks as follows:

samplesheet.tsv:

ID	Sample	Condition	ReplicateFileName
1	msrun	tumor	/path/to/msrun.raw|mzML|d

Each row represents one mass spectrometry run in one of the supported formats: Thermo raw (.raw), mzML (.mzML) or Bruker tdf (.d).
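
As a rough pre-flight check before launching the pipeline, the samplesheet can be validated locally. The sketch below is not part of nf-core/mhcquant: `validate_samplesheet` is a hypothetical helper, and it assumes the four tab-separated columns shown above and that a run's format can be read from its file extension (.raw, .mzML, or a Bruker .d folder).

```python
import csv
import io

# Header assumed from the example samplesheet above.
EXPECTED_COLUMNS = ["ID", "Sample", "Condition", "ReplicateFileName"]
# Thermo raw files, mzML files and Bruker tdf (.d) folders; compared case-insensitively.
ALLOWED_SUFFIXES = (".raw", ".mzml", ".d")


def validate_samplesheet(text: str) -> list[dict]:
    """Parse a tab-separated samplesheet and reject rows with unsupported run formats."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = []
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        # .d folders may be given with a trailing slash.
        path = (row["ReplicateFileName"] or "").rstrip("/")
        if not path.lower().endswith(ALLOWED_SUFFIXES):
            raise ValueError(f"line {line_no}: unsupported run format: {path}")
        rows.append(row)
    return rows


sheet = "ID\tSample\tCondition\tReplicateFileName\n1\tmsrun\ttumor\t/path/to/msrun.mzML\n"
print(validate_samplesheet(sheet)[0]["Sample"])  # msrun
```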

Now, you can run the pipeline using:

nextflow run nf-core/mhcquant \
    -profile <docker/singularity/.../institute> \
    --input 'samplesheet.tsv' \
    --fasta 'SWISSPROT_2020.fasta' \
    --outdir ./results

Note

Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.

For more details and further functionality, please refer to the usage documentation and the parameter documentation.

Pipeline summary

Default Steps

By default, the pipeline currently performs the following steps:

Identification

  • Identification of peptides in the MS/MS spectra using Comet (CometAdapter)
  • Refreshes the protein references for all peptide hits and adds target/decoy information (PeptideIndexer)
  • Filters peptide/protein identification results on ID based alignment (IDFilter)
  • Merges idXML files of a sample-condition group into one idXML file (IDMerger)
  • Defines extra features for Percolator (PSMFeatureExtractor)
  • Runs Percolator and integrates its output (PercolatorAdapter)
  • Filters peptide/protein identification results based on the Percolator q-value (IDFilter)
  • Splits the merged idXML file back into its individual runs (IDRipper)
  • Uses the Comet XCorr instead of the Percolator q-value as the primary score for downstream purposes (IDScoreSwitcher)
  • Keeps peptides observed after FDR filtering in each run and selects the best peptide per run (Pyopenms_IDFilter)
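
The q-value filtering above relies on the competitive target-decoy approach mentioned in the introduction. As a simplified illustration only (Percolator itself first rescores PSMs with a semi-supervised model, which is not reproduced here), q-values can be estimated from a score-ranked list of target and decoy PSMs as below. The `PSM` class and the example peptides are assumptions made for the sketch, and score ties are not treated specially.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    peptide: str
    score: float  # higher is better
    is_decoy: bool


def assign_q_values(psms: list[PSM]) -> list[tuple[PSM, float]]:
    """Estimate q-values by simple target-decoy competition (sketch)."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    fdrs = []
    targets = decoys = 0
    for psm in ranked:
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        # FDR estimate when accepting everything down to this score: decoys / targets.
        fdrs.append(decoys / max(targets, 1))
    # q-value: the lowest FDR of any threshold at which this PSM is still accepted,
    # i.e. the running minimum taken from the bottom of the ranking upwards.
    q_values = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        q_values[i] = running_min
    return list(zip(ranked, q_values))


def filter_targets(psms: list[PSM], threshold: float) -> list[PSM]:
    """Keep target PSMs whose q-value is at or below the threshold."""
    return [psm for psm, q in assign_q_values(psms) if not psm.is_decoy and q <= threshold]


psms = [
    PSM("SIINFEKL", 10.0, False),
    PSM("GILGFVFTL", 9.0, False),
    PSM("LKEFNIIS", 8.0, True),  # reversed decoy
    PSM("NLVPMVATV", 7.0, False),
    PSM("YLQPRTFLL", 6.0, False),
]
print([p.peptide for p in filter_targets(psms, 0.05)])  # ['SIINFEKL', 'GILGFVFTL']
```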

Map alignment

  • Corrects retention time distortions between runs, using information from peptides identified in different runs (MapAlignerIdentification)
  • Applies retention time transformations to runs (MapRTTransformer)
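
Identification-based alignment works because peptides identified in several runs act as landmarks: their retention times in one run are matched against a reference, and a transformation is fitted. The sketch below fits a linear model (the same model type as the `-model:type linear` option visible in one of the issue logs on this page) by ordinary least squares. It is a simplification, not the OpenMS algorithm: it assumes one retention time per peptide and run, and `fit_linear_rt_transform` is a hypothetical name.

```python
def fit_linear_rt_transform(run_rts: dict[str, float], ref_rts: dict[str, float]) -> tuple[float, float]:
    """Fit ref_rt = slope * run_rt + intercept on peptides identified in both runs."""
    shared = sorted(run_rts.keys() & ref_rts.keys())
    if len(shared) < 2:
        # Without shared identifications there is nothing to anchor the alignment on.
        raise ValueError("need at least two peptides shared with the reference")
    xs = [run_rts[p] for p in shared]
    ys = [ref_rts[p] for p in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("shared peptides all elute at the same retention time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def apply_rt_transform(rt: float, slope: float, intercept: float) -> float:
    """Map a retention time of the aligned run onto the reference time scale."""
    return slope * rt + intercept


run = {"SIINFEKL": 100.0, "GILGFVFTL": 200.0, "NLVPMVATV": 300.0}
ref = {"SIINFEKL": 110.0, "GILGFVFTL": 210.0, "NLVPMVATV": 310.0, "YLQPRTFLL": 50.0}
slope, intercept = fit_linear_rt_transform(run, ref)
print(slope, intercept)  # 1.0 10.0
```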

Process features

  • Detects features in MS1 data based on peptide identifications (FeatureFinderIdentification)
  • Groups corresponding features across label-free experiments (FeatureLinkerUnlabeledKD)
  • Resolves ambiguous annotations of features with peptide identifications (IDConflictResolver)

Output

  • Converts XML format to text files (TextExporter)
  • Converts XML format to mzTab files (MzTabExporter)

Additional Steps

Additional functionality currently included in the pipeline:

Input

  • Inclusion of proteins in the reference database (mhcnuggets, mhcflurry, fred2)
  • Create a decoy peptide database from standard FASTA databases (DecoyDatabase)
  • Conversion of raw to mzML files (ThermoRawFileParser)
  • Conversion of tdf (.d) to mzML files (tdf2mzml)
  • Peak picking (centroiding) with the high-resolution algorithm (PeakPickerHiRes)
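
DecoyDatabase appends decoy proteins to the target database so that the target-decoy FDR estimation has something to compete against. A minimal sketch of reversed decoys, assuming whole-sequence reversal and the `DECOY_` prefix that appears in one of the DecoyDatabase commands in the issue logs on this page; DecoyDatabase offers further decoy options that are not reproduced here, and the helper names and example record are made up.

```python
def add_reversed_decoys(records: list[tuple[str, str]], prefix: str = "DECOY_") -> list[tuple[str, str]]:
    """Append one reversed decoy per (header, sequence) target record."""
    decoys = [(prefix + header, sequence[::-1]) for header, sequence in records]
    return records + decoys


def to_fasta(records: list[tuple[str, str]], width: int = 60) -> str:
    """Serialise records as FASTA, wrapping sequences at `width` residues."""
    lines = []
    for header, sequence in records:
        lines.append(f">{header}")
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


targets = [("sp|P00001|TEST_HUMAN", "PEPTIDEK")]
print(to_fasta(add_reversed_decoys(targets)), end="")
# >sp|P00001|TEST_HUMAN
# PEPTIDEK
# >DECOY_sp|P00001|TEST_HUMAN
# KEDITPEP
```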

Additional features for rescoring

  • Retention time prediction (DeepLC)
  • Peak intensity prediction (MS2PIP)

Warning

The refine FDR feature will be evaluated on a large benchmark dataset in upcoming releases. Consider it an experimental feature.

Refine FDR

  • Converts OpenMS XML formats to mzTab (MzTabExporter)
  • Predicts binding of the PSMs using MHCflurry to shrink the search space (mhcflurry)
  • Facilitates the input to, the call of and output integration of Percolator (PercolatorAdapter)

Warning

The HLA prediction feature is outdated and will be reworked in upcoming releases.

Prediction of HLA class 1 peptides

  • Predict peptides (mhcnuggets, mhcflurry, fred2)
  • Predict possible neoepitopes when a VCF file is provided (mhcnuggets, mhcflurry, fred2)
  • Predict neoepitopes based on the peptide hits (mhcnuggets, mhcflurry, fred2)
  • Resolve found neoepitopes (mhcnuggets, mhcflurry, fred2)

Output

  • Annotates final list of peptides with their respective ions and charges (IonAnnotator)

Documentation

To see the results of a test run with a full-size dataset, refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation.

  1. Nextflow installation
  2. Pipeline configuration
  3. Running the pipeline
    • This includes tutorials, FAQs, and troubleshooting instructions
  4. Output and how to interpret the results

Credits

nf-core/mhcquant was originally written by Leon Bichmann from the Kohlbacher Lab. The pipeline was re-written in Nextflow DSL2 and is primarily maintained by Marissa Dubbelaar and Jonas Scheid from Peptide-based Immunotherapy and Quantitative Biology Center in Tübingen.

Helpful contributors:

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

For further information or help, don't hesitate to get in touch on the Slack #mhcquant channel (you can join with this invite).

Citations

If you use nf-core/mhcquant for your analysis, please cite it using the following doi: 10.5281/zenodo.1569909 and the corresponding manuscript:

MHCquant: Automated and Reproducible Data Analysis for Immunopeptidomics

Leon Bichmann, Annika Nelde, Michael Ghosh, Lukas Heumos, Christopher Mohr, Alexander Peltzer, Leon Kuchenbecker, Timo Sachsenberg, Juliane S. Walz, Stefan Stevanović, Hans-Georg Rammensee & Oliver Kohlbacher

Journal of Proteome Research 2019 18 (11), 3876-3884. doi: 10.1021/acs.jproteome.9b00313

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

In addition, references of tools and data used in this pipeline are as follows:

Fred2 Immunoinformatics Toolbox

Schubert B. et al, Bioinformatics 2016 Jul 1;32(13):2044-6. doi: 10.1093/bioinformatics/btw113. Epub 2016 Feb 26

Comet Search Engine

Eng J.K. et al, J Am Soc Mass Spectrom. 2015 Nov;26(11):1865-74. doi: 10.1007/s13361-015-1179-x. Epub 2015 Jun 27.

Percolator

Käll L. et al, Nat Methods 2007 Nov;4(11):923-5. doi: 10.1038/nmeth1113. Epub 2007 Oct 21.

Retention time prediction

Bouwmeester R. et al, Nature Methods 2021 Oct;18(11):1363-1369. doi: 10.1038/s41592-021-01301-5

MS2 Peak intensity prediction

Gabriels R. et al, Nucleic Acids Research 2019 Jul;47(W1):W295-9. doi: 10.1093/nar/gkz299

Identification based RT Alignment

Weisser H. et al, J Proteome Res. 2013 Apr 5;12(4):1628-44. doi: 10.1021/pr300992u. Epub 2013 Feb 22.

Targeted peptide quantification

Weisser H. et al, J Proteome Res. 2017 Aug 4;16(8):2964-2974. doi: 10.1021/acs.jproteome.7b00248. Epub 2017 Jul 19.

MHC affinity prediction

O'Donnell T.J., Cell Syst. 2018 Jul 25;7(1):129-132.e4. doi: 10.1016/j.cels.2018.05.014. Epub 2018 Jun 27.

Shao X.M., Cancer Immunol Res. 2020 Mar;8(3):396-408. doi: 10.1158/2326-6066.CIR-19-0464. Epub 2019 Dec 23.

mhcquant's People

Contributors

apeltzer, ewels, fu, ggabernet, jonasscheid, kevinmenden, leon-bichmann, lkuchenb, marissadubbelaar, maxulysse, nf-core-bot, steffenlem, susijo, sven1103, wackero, zethson

mhcquant's Issues

Variable modifications

Specifying multiple variable modifications results in an error, since the OpenMS CometAdapter expects a space-separated list as input parameter, whereas Nextflow provides a single string.
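
One way around this is to split the parameter into individual modifications and shell-quote each one before it reaches the command line, which also keeps names containing spaces and parentheses, such as 'Carbamidomethyl (C)', intact. A minimal sketch; the comma separator for the Nextflow parameter and the `modification_args` helper are assumptions, not the pipeline's current interface.

```python
import shlex


def modification_args(flag: str, mods: str) -> str:
    """Build "-flag 'Mod A' 'Mod B'" from a comma-separated modification string."""
    entries = [m.strip() for m in mods.split(",") if m.strip()]
    if not entries:
        return ""  # omit the flag entirely when no modification is requested
    # shlex.quote wraps an entry in single quotes if it contains spaces or shell metacharacters.
    return " ".join([flag] + [shlex.quote(m) for m in entries])


args = modification_args("-variable_modifications", "Oxidation (M), Phospho (S)")
print(args)               # -variable_modifications 'Oxidation (M)' 'Phospho (S)'
print(shlex.split(args))  # ['-variable_modifications', 'Oxidation (M)', 'Phospho (S)']
```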

Missing output documentation

For example, an output_docs file is opened, but no documentation is actually created.
Also make sure to use a channel for the output_docs feature if possible to allow staging of the documentation template to different storage providers by nextflow.

problem --fixed_modification

Description of the bug

Something went wrong with the conversion, so the --fixed_modification parameter defaults to Carbamidomethyl (C).

Command used and terminal output

-

Relevant files

System information

U / B / X in fasta

mhcflurry will fail if any of the following letters are included in a FASTA file: U / B / X
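
Until the predictor handles these letters, one option is to set such peptides aside before calling MHCflurry. A minimal sketch with a hypothetical helper that keeps only peptides made of the 20 standard amino acids; note that it also excludes other non-standard codes such as Z, O or J, which is more conservative than the issue requires.

```python
# The 20 standard amino acids. U (selenocysteine), B (Asx), X (unknown) and
# other non-standard codes are deliberately absent.
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def split_supported(peptides: list[str]) -> tuple[list[str], list[str]]:
    """Separate peptides the predictor can score from those with non-standard residues."""
    supported, skipped = [], []
    for peptide in peptides:
        if peptide and set(peptide.upper()) <= STANDARD_AA:
            supported.append(peptide)
        else:
            skipped.append(peptide)
    return supported, skipped


ok, skipped = split_supported(["SIINFEKL", "SIUNFEKL", "XLGFVFTL"])
print(ok, skipped)  # ['SIINFEKL'] ['SIUNFEKL', 'XLGFVFTL']
```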

Remove second environment

Hi @Leon-Bichmann !

we need to get all tools into a single environment, and thus need to recompile all tools required by this pipeline against the most current CXX environment in bioconda/conda-forge. Can you start listing the tools here? Then we could, for example, start coming up with a strategy on how to achieve this.

This second environment causes so many downstream issues, that we should get rid of this asap:

  • Nextflow Conda environment support is broken this way
  • Reproducibility for users is quite difficult, as they need two environments set up

....

We should get this rolling for 1.3.X already...

boolean parameters default == false

All boolean parameters such as include_proteins_from_vcf should be false by default, since setting them to false is not possible at the moment.

Changes to the params documentation

Description of feature

I looked at the parameter documentation and noted some things that I would propose to change. I will do a bunch of commits in the following pull request to implement these changes.

  • --skip_decoy_generation: If enabled, how is own decoy database provided to the pipeline? Maybe add an example of the file format which the DB should follow? Also: "will then prevent appending variants..."

  • --pick_ms_levels: What exactly does this mean? Out of all the levels (whatever that means), you have to provide the one that is not centroided so that it can be centroided?

  • --run_centroidisation: This is confusing a bit; should ms_level not by definition be a level that is not centroided? Also: Rephrase the sentence to indicate what is happening, e.g. "Set if the specified ms_level is not ctrd. to centroid it" or sth like that; Also, default=false?

  • --peptide_min_length to --fragment_bin_offset: "Specify .. TO BE considered after..." (or not? Maybe you have to specify the settings used in previous programs?) -->this also applies to the variant options

  • --number_mods: maybe rename to max_number_mods or max_accepted_mods or sth like that?

  • --num_hits: maybe rename to --number_reported_hits (or at least to number_hits to make it analogous to number_mods)

  • --digest_mass_range: maybe rename to accepted_mass_range? Unless the digest thing is relevant

  • --prec_charge: rename accepted_prec_charge_range or accepted_charge_range?

  • --fixed_mods AND --variable_mods: is this a choice parameter? If so, possibly list options?

  • --remove_precursor_peak: Description states something else...

  • --description_correct_features: Does this require the summed up descriptor values? I.e. 1+2+4+8?

  • --variant_indel/fs/snp: Change to: "Set this option to not consider..."

Make ion annotation optional

Description of feature

Since the ion annotation module outputs potentially large TSV files (especially *_all_peaks.tsv), it would be better to make this output optional and disabled by default.

PASS Filter for Variants in vcf reader

The vcf reader function in MHCquant currently doesn't filter provided vcfs by PASS.

Hence, this has to be done manually by the user at the moment but should be easily fixed in a new release.
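
Until the reader filters by itself, a manual pre-filter takes only a few lines. In VCF 4.x the FILTER value is the seventh tab-separated column of each data line, and `PASS` marks records that passed all filters. The sketch below (hypothetical helper) keeps header lines and PASS records; records whose FILTER is `.` (filters not applied) are dropped as well, which is a choice you may want to change.

```python
from typing import Iterable, Iterator


def pass_filtered(lines: Iterable[str]) -> Iterator[str]:
    """Yield VCF header lines and only those data lines whose FILTER column is PASS."""
    for line in lines:
        if line.startswith("#"):
            yield line  # meta-information and column header lines
            continue
        fields = line.rstrip("\n").split("\t")
        # Columns: CHROM POS ID REF ALT QUAL FILTER INFO ...; FILTER has index 6.
        if len(fields) > 6 and fields[6] == "PASS":
            yield line


vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t1000\t.\tA\tG\t50\tPASS\t.\n",
    "1\t2000\t.\tC\tT\t10\tLowQual\t.\n",
]
print(len(list(pass_filtered(vcf))))  # 3
```

For an uncompressed file, `dst.writelines(pass_filtered(src))` with both files opened in text mode writes the filtered copy.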

Inconsistent annotation fdr_level parameters

Description of feature

It might be nothing major, but I noticed that the following three parameters use inconsistent separators:

  • peptide_level_fdrs
  • psm-level-fdrs
  • protein_level_fdrs

The suggestion would be to change psm-level-fdrs > psm_level_fdrs and test this in the pipeline

[CI] Travis build doesn't find Java

ERROR: Cannot find Java or it's a wrong version -- please make sure that Java 8 is installed

NOTE: Nextflow is trying to use the Java VM defined by the following environment variables:

 JAVA_CMD: /usr/local/lib/jvm/openjdk11/bin/java

 JAVA_HOME: /usr/local/lib/jvm/openjdk11

The command "wget -qO- get.nextflow.io | bash" failed and exited with 1 during .

I assume that we need to bump the minimum Nextflow version to at least 19.04.0.

Has been suggested by @ewels on nf-core slack on October 11th.

Default runtime limit for predicting possible neoepitopes is too short

The default runtime limit for the processes:
"predict_possible_neoepitopes" and "predict_possible_classII_neoepitopes"

is too low since querying the Biomart API for large vcf files can exceed this limit.

However, it is currently possible to raise the runtime limit by providing a custom configuration via the -c parameter.

vcf support

The next release should integrate VCF support for mutated neoantigen search based on the Fred2 bioconda package.

Error in db_search_comet: Profile data provided but centroided MS2 spectra expected

Hi,
I am trying to run MHCquant but I am getting the following error:

Caused by: Process db_search_comet (1) terminated with an error exit status (8)
Progress of 'loading chromatogram list':
-- done [took 0.01 s (CPU), 0.00 s (Wall)] --
Error: Unexpected internal error (Error: Profile data provided but centroided MS2 spectra expected. To enforce processing of the data set the -force flag.)
Command error:
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap

The full error log can be found here: error_log.txt

Any idea?
Thank you very much!
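
The message means that the MS2 spectra in the mzML file are stored in profile mode, while Comet expects centroided spectra. Profile data presumably needs to go through the pipeline's peak-picking step (PeakPickerHiRes), which the --run_centroidisation and --pick_ms_levels parameters discussed in the parameter-documentation issue above appear to control. To check a file before running, the spectrum representation can be read from the PSI-MS controlled-vocabulary terms in the mzML itself: MS:1000127 (centroid spectrum), MS:1000128 (profile spectrum) and MS:1000511 (ms level). A minimal standard-library sketch; it does not resolve referenceableParamGroupRef indirections, so spectra annotated that way are counted as unknown.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

CENTROID = "MS:1000127"  # PSI-MS: centroid spectrum
PROFILE = "MS:1000128"   # PSI-MS: profile spectrum
MS_LEVEL = "MS:1000511"  # PSI-MS: ms level


def _local(tag: str) -> str:
    """Strip the XML namespace, e.g. '{http://psi.hupo.org/ms/mzml}spectrum' -> 'spectrum'."""
    return tag.rsplit("}", 1)[-1]


def spectrum_modes(source, ms_level: int = 2) -> Counter:
    """Count centroid/profile/unknown spectra of one MS level in an mzML path or bytes."""
    if isinstance(source, bytes):
        source = io.BytesIO(source)
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if _local(elem.tag) != "spectrum":
            continue
        level, mode = None, "unknown"
        for param in elem.iter():
            if _local(param.tag) != "cvParam":
                continue
            accession = param.get("accession")
            if accession == MS_LEVEL:
                level = int(param.get("value"))
            elif accession == CENTROID:
                mode = "centroid"
            elif accession == PROFILE:
                mode = "profile"
        if level == ms_level:
            counts[mode] += 1
        elem.clear()  # free the parsed spectrum (including its binary data) as we go
    return counts


example = b"""<mzML xmlns="http://psi.hupo.org/ms/mzml"><run><spectrumList count="2">
<spectrum index="0" id="scan=1"><cvParam accession="MS:1000511" value="1"/><cvParam accession="MS:1000127"/></spectrum>
<spectrum index="1" id="scan=2"><cvParam accession="MS:1000511" value="2"/><cvParam accession="MS:1000128"/></spectrum>
</spectrumList></run></mzML>"""
print(spectrum_modes(example))  # Counter({'profile': 1})
```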

Problem with step 11, run Percolator

I would like to thank the team for the amazing work and the great software, however, I have tried to run the pipeline, everything went well, until step 11 where execution ends with the following error:
"""
Error executing process > 'run_percolator (1)'

Caused by:
Process run_percolator (1) terminated with an error exit status (8)

Command executed:

OMP_NUM_THREADS=6 \
PercolatorAdapter -in s1_all_ids_merged_psm.idXML \
-out s1_all_ids_merged_psm_perc.idXML \
-seed 4711 \
-trainFDR 0.05 \
-testFDR 0.05 \
-threads 6 \
-enzyme no_enzyme \
-peptide-level-fdrs \
-subset-max-train 0 \
-doc 0 \

Command exit status:
8

Command output:
Loading input file: s1_all_ids_merged_psm.idXML
Merging peptide ids.
Merging protein ids.
Error: Unexpected internal error (Prefix of string '6file=4959' successfully converted to an integer value. Additional characters found at position 2)

Command wrapper:
Loading input file: s1_all_ids_merged_psm.idXML
Merging peptide ids.
Merging protein ids.
Error: Unexpected internal error (Prefix of string '6file=4959' successfully converted to an integer value. Additional characters found at position 2)
"""
I have also tried to run the PercolatorAdapter from the OpenMS Docker container on the s1_all_ids_merged_psm.idXML file from the intermediate results, which ends with the same error.

I would truly appreciate your help, thanks a lot

MapAlignerIdentification step error.

-[nf-core/mhcquant] Pipeline completed with errors-
Error executing process > 'align_ids'

Caused by:
  Process `align_ids` terminated with an error exit status (8)

Command executed:

  MapAlignerIdentification -in train_sample_66_ms_run_4.mzML_idx_fdr_filtered.idXML \
                           -trafo_out train_sample_66_ms_run_4.mzML_idx_fdr_filtered.trafoXML \
                           -model:type linear \
                           -algorithm:max_rt_shift 300

Command exit status:
  8

Command output:
  Progress of 'loading input files':

  -- done [took 0.00 s (CPU), 0.01 s (Wall)] --
  Warning: Value of parameter 'min_run_occur' (here: 2) is higher than the number of runs incl. reference (here: 1). Using 1 instead.
  Progress of 'aligning maps':
  Error: Unexpected internal error (No reference RT information left after filtering)

Command wrapper:
  Progress of 'loading input files':

  -- done [took 0.00 s (CPU), 0.01 s (Wall)] --
  Warning: Value of parameter 'min_run_occur' (here: 2) is higher than the number of runs incl. reference (here: 1). Using 1 instead.
  Progress of 'aligning maps':
  Error: Unexpected internal error (No reference RT information left after filtering)

Work dir:
  /projectsp/f_jx76_1/xiaolong/temp/MSV000082648/20200326comet/train_sample_66_ms_run_4.mod1/work/5e/dce3d87a55eac31852aad2c4ad3398

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

Add Zenodo DOI for release to main README on master

Would be good to add the Zenodo DOI for the release to the main README of the pipeline in order to make it citable. You will have to do this via a branch pushed to the repo in order to directly update master. See PR below for example and file changes:
nf-core/atacseq#38

See https://zenodo.org/record/3359618#.XVZ0bOhKhPY

Web-hooks are already set up for this repo to have a unique Zenodo DOI generated every time a new version of the pipeline is released. Would be good to add this in after every release 👍

Issue annotation of the HLA alleles in parameters

Description of the bug

The current annotation A 03:01;A 68:01;B 27:05;B 35:03;C 02:02;C 04:01 gives a misleading impression of how the HLA alleles should be annotated.
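
Part of the confusion is that the example omits the asterisk used in standard HLA nomenclature (e.g. A*03:01). As a sketch of how such input could be normalised before it reaches the predictors, the hypothetical helper below accepts loosely written, semicolon-separated alleles and rewrites them as GENE*XX:YY. Whether the pipeline expects an additional `HLA-` prefix is an assumption to check against the parameter documentation.

```python
import re

# Accepts e.g. "A 03:01", "A*03:01", "A03:01" or "HLA-A*03:01" (case-insensitive).
_ALLELE = re.compile(r"^(?:HLA-)?([A-Z][A-Z0-9]*)[\s*]?(\d{2,3}):(\d{2,3})$")


def normalize_alleles(spec: str) -> list[str]:
    """Rewrite a semicolon-separated allele list into GENE*XX:YY notation."""
    result = []
    for raw in spec.split(";"):
        raw = raw.strip()
        if not raw:
            continue
        match = _ALLELE.match(raw.upper())
        if match is None:
            raise ValueError(f"cannot parse HLA allele: {raw!r}")
        gene, group, protein = match.groups()
        result.append(f"{gene}*{group}:{protein}")
    return result


print(normalize_alleles("A 03:01;A 68:01;B 27:05"))  # ['A*03:01', 'A*68:01', 'B*27:05']
```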

Command used and terminal output

No response

Relevant files

No response

System information

No response

Warn if alleles which are not supported by MHCFlurry are specified

Hi,

I think that it would be a good idea to warn users if a specific allele or multiple alleles are not supported by MHCFlurry.
I propose that we enable "echo true" for processes: predict_psms and predict peptides. I could then adapt the python scripts to print a warning if an unsupported allele was detected.

Alternative 1: Instead of enabling echo we could redirect stdout, but imo this is overkill here.
Alternative 2: We validate the alleles inline in the nextflow script. I think this unnecessarily clutters the nextflow script.

What do you think @Leon-Bichmann ?
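
The check itself is small. A sketch of what the prediction scripts could do, using a hypothetical `check_alleles` helper that prints a warning to stdout (so it would surface with `echo true`). The set of supported alleles is passed in; with MHCflurry it could presumably come from the loaded predictor (we believe `Class1AffinityPredictor` exposes a `supported_alleles` property, but treat that as an assumption). Allele names are compared verbatim here, so they should be normalised to MHCflurry's naming first.

```python
def check_alleles(requested: list[str], supported: list[str]) -> tuple[list[str], list[str]]:
    """Split requested alleles into supported and unsupported ones, warning about the latter."""
    supported_set = set(supported)
    kept = [a for a in requested if a in supported_set]
    unsupported = [a for a in requested if a not in supported_set]
    for allele in unsupported:
        # Printed to stdout so that Nextflow's `echo true` would forward it to the console.
        print(f"WARNING: allele {allele} is not supported by MHCflurry and will be skipped")
    return kept, unsupported


# In practice `supported` might come from the MHCflurry predictor (an assumption, see above).
kept, unsupported = check_alleles(["HLA-A*02:01", "HLA-A*99:99"], ["HLA-A*02:01", "HLA-B*07:02"])
# prints: WARNING: allele HLA-A*99:99 is not supported by MHCflurry and will be skipped
```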

Process labeled data

Description of feature

To process labeled data, we could let Comet search also for labeled peptides by specifying them as variable modifications using the unimod nomenclature. A large set of modifications is supported by the CometAdapter, however they are still fixed to distinct accessions and not customizable afaik.

Suggestions welcome

Pipeline crashes when setting the --skip_quantification flag

Check Documentation

I have checked the following places for your error:

Description of the bug

Pipeline does not run any process when setting the --skip_quantification flag.

Steps to reproduce

Steps to reproduce the behaviour:

  1. Command line:
nextflow run nf-core/mhcquant -r 2.0.0 \
--input 'samples.tsv' \
--fasta 'uniprot-proteome_UP000005640.fasta' \
--allele_sheet 'alleles_new.tsv'  \
--predict_class_1  \
--skip_quantification \
--max_time '240.h' \
-profile cfc \
-resume \
-c config.conf
  2. See error:
------------------------------------------------------
WARN: There's no process matching config selector: get_software_versions
[-        ] process > NFCORE_MHCQUANT:MHCQUANT:INPUT_CHECK:SAMPLESHEET_CHECK -
[-        ] process > NFCORE_MHCQUANT:MHCQUANT:OPENMS_DECOYDATABASE          -
[-        ] process > NFCORE_MHCQUANT:MHCQUANT:OPENMS_THERMORAWFILEPARSER    -
[-        ] process > NFCORE_MHCQUANT:MHCQUANT:OPENMS_COMETADAPTER           -
[-        ] process > NFCORE_MHCQUANT:MHCQUANT:OPENMS_PEPTIDEINDEXER         -
WARN: Access to undefined parameter `singularity_pull_docker_container` -- Initialise it to a default value eg. `params.singularity_pull_docker_container = some_value`
No such variable: Exception evaluating property 'idXML' for nextflow.script.ChannelOut, Reason: groovy.lang.MissingPropertyException: No such property: idXML for class: groovyx.gpars.dataflow.DataflowBroadcast

 -- Check script '/home-link/iizle01/.nextflow/assets/nf-core/mhcquant/./workflows/mhcquant.nf' at line: 287 or see '.nextflow.log' file for more details

Log files

Have you provided the following extra information/files:

  • The command used to run the pipeline
  • The .nextflow.log file

System

  • Hardware: HPC
  • Executor: slurm
  • OS: CentOS Linux
  • Version: CentOS Linux release 7.9.2009

Nextflow Installation

  • Version: 21.04.3

Container engine

  • Engine: singularity version 3.7.4-1.el7

Ion identification within a mass spectrum

Hello !

I am very new to MS/MS and MHCquant.
I would like to get the peak identification in my mass spectrum when I visualize them, to know which peaks were considered as significant for peptide identification and which ion corresponds to which peak (cf image).
I guess there is something within the pipeline that identifies such peaks: is there a way to retrieve this information somewhere ?

For now, I just retrieve the scan ID for each identified peptide, and I visualize the associated mass spectrum on XCalibur.

Best,
Paul

Missing mzTab files

Description of the bug

.mztab files were missing for me in the Intermediate_Results directory

Command used and terminal output

.

Relevant files

.

System information

.

bioconda dependency conflict

When trying to install openms and percolator in the same conda environment, I get a number of conflicts:

If openms 2.3 and percolator 3.1 are selected:
libxerces-c-3.1 will be installed, but openms requires libxerces-c-3.2
the build works, but all tools fail

If openms 2.4 and percolator 3.1 are selected:
the build fails because of a dependency conflict
conflict:
openms=2.4 -> boost[version='>=1.64.0,<1.64.1.0a0']
percolator -> boost==1.62
percolator -> xerces-c==3.1.2

Some discussions or attempts to fix this can be found here:
bioconda/bioconda-recipes#11871
bioconda/bioconda-recipes#12060

Comet: theoretical_fragment_ion parameter missing

Description of feature

The “theoretical_fragment_ion” parameter instructs Comet whether or not to include signal from the flanking bins in the cross-correlation calculation. See Comet paper

They also state that high-resolution runs should use theoretical_fragment_ion = 0. However, this parameter is not specified in the nextflow.config, which leads to running Comet with theoretical_fragment_ion = 1, since that is the default. See Comet doc.

Problem with Mhcquant

Hi, I'm trying to use mhcquant with the docker profile, but when I try to start the pipeline, this error message appears:
WARN: unknown format for entry /Users/presta/null in provided sample sheet. ignoring line.
Someone could help me?
Thank you

protein fasta file with empty first line

ERROR ~ Error executing process > 'generate_decoy_database (1)'

Caused by:
Process generate_decoy_database (1) terminated with an error exit status (3)

Command executed:

DecoyDatabase -in UP000005640_9606_reviewed_added_vcf.fasta \
-out UP000005640_9606_reviewed_added_vcf_decoy.fasta \
-decoy_string DECOY_ \
-decoy_string_position prefix

Command exit status:
3

Command output:
Version 2.4.0-HEAD-2018-10-26 of DecoyDatabase is available at www.OpenMS.de
Warning: Only one FASTA input file was provided, which might not contain contaminants.You probably want to have them! Just add the contaminant file to the input file list 'in'.
Error: Unable to read file (Error while parsing FASTA file! The first entry could not be read! Please check the file! in: )

Command wrapper:
Version 2.4.0-HEAD-2018-10-26 of DecoyDatabase is available at www.OpenMS.de
Warning: Only one FASTA input file was provided, which might not contain contaminants.You probably want to have them! Just add the contaminant file to the input file list 'in'.
Error: Unable to read file (Error while parsing FASTA file! The first entry could not be read! Please check the file! in: )

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

-- Check '.nextflow.log' file for details
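
As the issue title suggests, DecoyDatabase fails here because the FASTA file starts with an empty line. A possible pre-processing step, sketched as a hypothetical helper, drops blank lines (including leading ones), strips stray whitespace such as carriage returns, and checks that the file then starts with a '>' header:

```python
def normalize_fasta(text: str) -> str:
    """Remove blank lines and surrounding whitespace, then verify the first line is a header."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA input must start with a '>' header line")
    return "\n".join(lines) + "\n"


print(repr(normalize_fasta("\n>sp|P00001|TEST_HUMAN\nPEPTIDEK\n")))
# '>sp|P00001|TEST_HUMAN\nPEPTIDEK\n'
```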

RT alignment mixup

If the preceding database search processes do not finish synchronously, the order of the ID files can be switched, leading to the wrong retention time transformation being applied to each file.
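
The robust fix is to pair files by a run identifier instead of relying on emission order; in Nextflow, keyed channel operators such as `join` serve this purpose. The idea in plain Python, assuming (hypothetically) that the run name is the file name up to its first dot:

```python
from pathlib import PurePath


def run_key(path: str) -> str:
    """Hypothetical convention: the run name is the file name up to the first dot."""
    return PurePath(path).name.split(".", 1)[0]


def pair_by_run(id_files: list[str], trafo_files: list[str]) -> list[tuple[str, str]]:
    """Match each idXML file with the trafoXML of the same run, regardless of input order."""
    ids = {run_key(p): p for p in id_files}
    trafos = {run_key(p): p for p in trafo_files}
    if len(ids) != len(id_files) or len(trafos) != len(trafo_files):
        raise ValueError("duplicate run names")
    if ids.keys() != trafos.keys():
        raise ValueError(f"unmatched runs: {sorted(ids.keys() ^ trafos.keys())}")
    return [(ids[run], trafos[run]) for run in sorted(ids)]


# The trafoXML list arrives in a different order than the idXML list.
print(pair_by_run(["run_b.idXML", "run_a.idXML"], ["run_a.trafoXML", "run_b.trafoXML"]))
# [('run_a.idXML', 'run_a.trafoXML'), ('run_b.idXML', 'run_b.trafoXML')]
```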

Include OpenMS FileInfo to check for valid mzML files

Description of feature

We keep having issues with malformed mzML files and receive non-descriptive errors from the CometAdapter. Therefore, it would be beneficial to check the input (or converted) mzML files before processing them further. For this, FileInfo in OpenMS comes in handy.

Remove TOC `Usage.md`

Description of feature

The Table of contents can be removed from the Usage.md to adhere to the structure of other pipeline docs.

parsing of VEP annotated VCF files fails

Description of the bug

VEP-annotated VCF files caused an error during parsing in variants2fasta.py here. In more detail: the field corresponding to CDS_position is filled with the string {position/length}, but the program expects a parsable integer here.

Command used and terminal output

nextflow run nf-core/mhcquant -r 2.2.0 -profile cfc \
 --input '/sfs/7/workspace/ws/qeasc01-QLFGB-mhcquant-0/samplesheet_class1_mhcquant.tsv' \
 --fasta '/sfs/7/workspace/ws/qeasc01-QLFGB-mhcquant-0/UP000005640_9606.fasta' \
 --allele_sheet '/sfs/7/workspace/ws/qeasc01-QLFGB-mhcquant-0/allele_sheet_mhcquant.tsv' \
 --peptide_max_length 12 \
 --predict_class1 \
 --fdr_threshold 0.1 \
 --predict_RT \
 --include_proteins_from_vcf \
 --vcf_sheet '/sfs/7/workspace/ws/qeasc01-QLFGB-mhcquant-0/vcf_sheet_mhcquant.tsv' \
 --variant_annotation_style 'VEP' \
 --variant_indel_filter \
 --variant_reference 'GRCH37' \
 --max_time '240.h'


Command error:
  Using TensorFlow backend.
  Traceback (most recent call last):
    File "/home-link/qeasc01/.nextflow/assets/nf-core/mhcquant/bin/variants2fasta.py", line 228, in <module>
      sys.exit(main())
    File "/home-link/qeasc01/.nextflow/assets/nf-core/mhcquant/bin/variants2fasta.py", line 172, in main
      variants = read_variant_effect_predictor(args.vcf, gene_filter=protein_ids)
    File "/home-link/qeasc01/.nextflow/assets/nf-core/mhcquant/bin/variants2fasta.py", line 76, in read_variant_effect_predictor
      coding[transcript_id] = MutationSyntax(transcript_id, int(transcript_pos)-1,
  ValueError: invalid literal for int() with base 10: '144/3999'
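
The traceback shows `int()` being applied to the raw VEP value '144/3999' (position/length), so a more tolerant parser only needs the part before the slash. A sketch with a hypothetical helper; the handling of ranges such as '143-145/3999' (which, as far as we know, VEP reports for variants spanning several bases) is an assumption, and the start of the range is used.

```python
def parse_cds_position(value: str) -> int:
    """Return the 1-based CDS start from VEP-style values like '144', '144/3999' or '143-145/3999'."""
    position = value.split("/", 1)[0]     # drop the '/length' suffix
    position = position.split("-", 1)[0]  # keep the start of a range (assumption)
    if not position.isdigit():
        raise ValueError(f"cannot parse CDS_position: {value!r}")
    return int(position)


print(parse_cds_position("144/3999") - 1)  # 143, the 0-based offset the script computes
```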

Relevant files

No response

System information

No response

Issue ion annotator feature

Description of the bug

Normal run of the pipeline leads to the following error
Process NFCORE_MHCQUANT:MHCQUANT:PYOPENMS_IONANNOTATOR input file name collision -- There are multiple input files for each of the following file names

Command used and terminal output

nextflow run nf-core/mhcquant -r dev -profile cfc --input  --outdir ./results --fasta --digest_mass_range 800:2500 --activation_method CID --prec_charge 2:3 --fdr_threshold 0.05 --number_mods 3 --precursor_mass_tolerance 5 --fragment_mass_tolerance 0.02 --num_hits 1 --peptide_min_length 8 --peptide_max_length 12 --max_rt_alignment_shift 300 --max_time '240.h' --email [email protected]

Relevant files

No response

System information

  • Nextflow version: 22.04.4
  • Hardware: HPC
  • Container engine: Singularity
  • Version of nf-core/mhcquant: dev

Fixed modifications

Specifying a fixed modification such as 'Carbamidomethyl (C)' has failed since quotes are missing surrounding the parameter:

     CometAdapter  -in ${mzml_file} \\

	                   (...)

	                   -variable_modifications '${params.variable_mods}' \\
	                   -fixed_modifications ${params.fixed_mods} \\

	                   (...)

--> .command.sh: line 17: syntax error near unexpected token `('

Profile test and test_full cannot access ref database

Description of the bug

When running the test or test_full profile mhcquant terminates with

Error: File not found (the file 'test.fasta' could not be found)

The accession of the reference database in the test data repository is erroneous.

Furthermore, we need to increase the number of tests, since the current ones do not cover all of the implemented features (e.g. ion annotation).

Command used and terminal output

nextflow run nf-core/mhcquant -profile test,cfc --outdir test

Output documentation

Description of the bug

There are some typos in the output doc and other things I would suggest to change.

Command used and terminal output

No response

Relevant files

No response

System information

No response

Issue in the MzTabExporter

Description of the bug

I encountered the following issue on the dev branch and would like to resolve it before the release, but I have no idea where to look. I noticed a similar issue reported on OpenMS.
The major difference is that I tried to update OpenMS from 2.6.0 to 2.8.0 on the dev branch. I will go over the changes between these two versions, but any input would be highly appreciated.

The following error is returned:

Caused by:
  Process `NFCORE_MHCQUANT:MHCQUANT:PROCESS_FEATURE:OPENMS_MZTABEXPORTER_QUANT (QAMTL477AO_J314_Pre_T39L243_J314_Pre_T39L243)` terminated with an error exit status (8)

Command executed:

  MzTabExporter -in QAMTL477AO_J314_Pre_T39L243_J314_Pre_T39L243_resolved.consensusXML \
      -out QAMTL477AO_J314_Pre_T39L243_J314_Pre_T39L243.mzTab \
      -threads 2 \
  
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_MHCQUANT:MHCQUANT:PROCESS_FEATURE:OPENMS_MZTABEXPORTER_QUANT":
      openms: $(echo $(FileInfo --help 2>&1) | sed 's/^.*Version: //; s/-.*$//' | sed 's/ -*//; s/ .*$//')
  END_VERSIONS

Command exit status:
  8

Command output:
  No fractions annotated in consensusXML. Assuming unfractionated.
  Error: Unexpected internal error (PSM controllerType=0 controllerNumber=1 scan=9440 does not map to an MS file registered in the quantitative metadata. Check your merging and filtering steps and/or report the issue, please.)
  <No fractions annotated in consensusXML. Assuming unfractionated.> occurred 4 times

Command wrapper:
  nxf-scratch-dir node003:/scratch/117012/nxf.8kITDHhTPh
  No fractions annotated in consensusXML. Assuming unfractionated.
  Error: Unexpected internal error (PSM controllerType=0 controllerNumber=1 scan=9440 does not map to an MS file registered in the quantitative metadata. Check your merging and filtering steps and/or report the issue, please.)
  <No fractions annotated in consensusXML. Assuming unfractionated.> occurred 4 times
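The spectrum reference in the error, `controllerType=0 controllerNumber=1 scan=9440`, uses the Thermo native ID format, which is a space-separated list of `key=value` pairs. The small Python helper below is hypothetical and is not part of OpenMS or the pipeline. It pulls that reference out of the log line and turns it into a dictionary, so the scan number can be compared with the identifications in the consensusXML:

```python
import re

# Error line copied from the log above (shortened after the relevant part)
ERROR_LINE = (
    "Error: Unexpected internal error (PSM controllerType=0 controllerNumber=1 "
    "scan=9440 does not map to an MS file registered in the quantitative metadata."
)


def parse_native_id(native_id):
    """Split a Thermo-style native ID into {key: int value}."""
    fields = {}
    for token in native_id.split():
        key, _, value = token.partition("=")
        fields[key] = int(value)
    return fields


# Grab the text between "PSM " and " does not map"
match = re.search(r"PSM (.+?) does not map", ERROR_LINE)
spectrum_ref = match.group(1)
print(parse_native_id(spectrum_ref))
```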

I also looked into the file in question, and I could find the spectrum_reference annotated:

		<consensusElement id="e_11275396928539264273" quality="3.574097" charge="5">
			<centroid rt="2886.454795973814726" mz="527.123682987430925" it="3.764833e06"/>
			<groupedElementList>
				<element map="0" id="28923268865551816" rt="2897.00668984044114" mz="527.123682987430925" it="4.085203e06" charge="5"/>
				<element map="1" id="14924012554702823270" rt="2882.901894744373749" mz="527.123682987430925" it="3.478242e06" charge="5"/>
				<element map="2" id="13437987073385272035" rt="2880.484798142305408" mz="527.123682987430925" it="3.642787e06" charge="5"/>
				<element map="3" id="1164487179193034968" rt="2885.425801168138605" mz="527.123682987430925" it="3.8531e06" charge="5"/>
			</groupedElementList>
			<PeptideIdentification identification_run_ref="PI_0" score_type="q-value" higher_score_better="false" significance_threshold="0" MZ="527.1254486319" RT="2874.61804275669" spectrum_reference="controllerType=0 controllerNumber=1 scan=9440" >
				<PeptideHit score="0.0126048" sequence="ANANSRQQIRKLIKDGLIIRKPV" charge="5" aa_before="I I I I I I I" aa_after="T T T T T T T" start="32 32 32 29 52 52 32" end="54 54 54 51 74 74 54" protein_refs="PH_219 PH_1709 PH_1272 PH_1265 PH_1253 PH_937 PH_927">
					<UserParam type="string" name="target_decoy" value="target"/>
					<UserParam type="string" name="MS:1002258" value="11"/>
					<UserParam type="string" name="MS:1002259" value="176"/>
					<UserParam type="string" name="num_matched_peptides" value="4252"/>
					<UserParam type="int" name="isotope_error" value="0"/>
					<UserParam type="float" name="MS:1002252" value="2.234"/>
					<UserParam type="float" name="MS:1002253" value="1.0"/>
					<UserParam type="float" name="MS:1002254" value="0.0"/>
					<UserParam type="float" name="MS:1002255" value="25.699999999999999"/>
					<UserParam type="float" name="MS:1002256" value="1.0"/>
					<UserParam type="float" name="MS:1002257" value="0.0494"/>
					<UserParam type="string" name="protein_references" value="non-unique"/>
					<UserParam type="float" name="COMET:deltCn" value="1.0"/>
					<UserParam type="float" name="COMET:deltLCn" value="0.0"/>
					<UserParam type="float" name="COMET:lnExpect" value="-3.00780485478826"/>
					<UserParam type="float" name="COMET:lnNumSP" value="8.355144739461839"/>
					<UserParam type="float" name="COMET:lnRankSP" value="0.0"/>
					<UserParam type="float" name="COMET:IonFrac" value="0.0625"/>
					<UserParam type="float" name="MS:1001492" value="0.412529"/>
					<UserParam type="float" name="MS:1001491" value="0.0126048"/>
					<UserParam type="float" name="MS:1001493" value="0.245321"/>
				</PeptideHit>
				<UserParam type="int" name="id_merge_index" value="4"/>
				<UserParam type="string" name="FFId_category" value="internal"/>
				<UserParam type="int" name="map_index" value="0"/>
				<UserParam type="string" name="feature_id" value="11275396928539264273"/>
			</PeptideIdentification>
			<UserParam type="string" name="feature_id" value="11275396928539264273"/>
		</consensusElement>
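To check which run each identification is attributed to, you can list the `PeptideIdentification` entries together with their `spectrum_reference`, `id_merge_index` and `map_index` user parameters. The sketch below uses Python's standard `xml.etree.ElementTree` on a trimmed copy of the element above. It assumes the file declares no default XML namespace, and it only reads attributes visible in the excerpt. It does not reproduce how MzTabExporter builds its quantitative metadata. For a full file, the same function could be applied to `ET.parse(path).getroot()`.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the consensusElement shown above (PeptideHit details omitted)
SNIPPET = """
<consensusElement id="e_11275396928539264273" quality="3.574097" charge="5">
  <groupedElementList>
    <element map="0" id="28923268865551816" charge="5"/>
    <element map="1" id="14924012554702823270" charge="5"/>
    <element map="2" id="13437987073385272035" charge="5"/>
    <element map="3" id="1164487179193034968" charge="5"/>
  </groupedElementList>
  <PeptideIdentification identification_run_ref="PI_0"
      spectrum_reference="controllerType=0 controllerNumber=1 scan=9440">
    <UserParam type="int" name="id_merge_index" value="4"/>
    <UserParam type="int" name="map_index" value="0"/>
  </PeptideIdentification>
</consensusElement>
"""


def summarize_identifications(root):
    """Yield one dict per PeptideIdentification with the fields relevant to the error."""
    for pep_id in root.iter("PeptideIdentification"):
        # Only UserParams that are direct children of this PeptideIdentification
        params = {p.get("name"): p.get("value") for p in pep_id.findall("UserParam")}
        yield {
            "run_ref": pep_id.get("identification_run_ref"),
            "spectrum_reference": pep_id.get("spectrum_reference"),
            "id_merge_index": params.get("id_merge_index"),
            "map_index": params.get("map_index"),
        }


root = ET.fromstring(SNIPPET.strip())
# Map indices referenced by the grouped features of this consensus element
grouped_maps = {el.get("map") for el in root.iter("element")}
for summary in summarize_identifications(root):
    print(summary, "| map_index among grouped maps:", summary["map_index"] in grouped_maps)
```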

Command used and terminal output

nextflow run nf-core/mhcquant \
-r dev \
-profile cfc \
--input input.tsv \
--outdir ./results \
--fasta *.fasta \
--digest_mass_range 800:2500 \
--activation_method CID \
--prec_charge 2:3 \
--fdr_threshold 0.05 \
--number_mods 3 \
--precursor_mass_tolerance 5 \
--fragment_mass_tolerance 0.02 \
--num_hits 1 \
--peptide_min_length 8 \
--peptide_max_length 12 \
--max_rt_alignment_shift 300 \
--max_time '240.h' \
--email [email protected]

Relevant files

nextflow.log

System information

  • Nextflow version: 22.04.4
  • Hardware: HPC
  • Container engine: Singularity
  • Version of nf-core/mhcquant: dev
