nf-core / mhcquant
Identify and quantify MHC eluted peptides from mass spectrometry raw data
Home Page: https://nf-co.re/mhcquant
License: MIT License
The default runtime limit for the processes
"predict_possible_neoepitopes" and "predict_possible_classII_neoepitopes"
is too low, since querying the BioMart API for large VCF files can exceed this limit.
As a workaround, the runtime can currently be raised by specifying a custom configuration with the -c parameter.
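Such a custom configuration passed via -c could look like the following sketch (process names are taken from the issue; the 48 h value is only an example and should be adjusted to the data size):

```groovy
// custom.config -- example only; raise the per-process runtime limit
process {
    withName: 'predict_possible_neoepitopes' { time = '48.h' }
    withName: 'predict_possible_classII_neoepitopes' { time = '48.h' }
}
```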
Pipeline does not run any process when setting the --skip_quantification flag.
Steps to reproduce the behaviour:
nextflow run nf-core/mhcquant -r 2.0.0 \
--input 'samples.tsv' \
--fasta 'uniprot-proteome_UP000005640.fasta' \
--allele_sheet 'alleles_new.tsv' \
--predict_class_1 \
--skip_quantification \
--max_time '240.h' \
-profile cfc \
-resume \
-c config.conf
------------------------------------------------------
WARN: There's no process matching config selector: get_software_versions
[- ] process > NFCORE_MHCQUANT:MHCQUANT:INPUT_CHECK:SAMPLESHEET_CHECK -
[- ] process > NFCORE_MHCQUANT:MHCQUANT:OPENMS_DECOYDATABASE -
[- ] process > NFCORE_MHCQUANT:MHCQUANT:OPENMS_THERMORAWFILEPARSER -
[- ] process > NFCORE_MHCQUANT:MHCQUANT:OPENMS_COMETADAPTER -
[- ] process > NFCORE_MHCQUANT:MHCQUANT:OPENMS_PEPTIDEINDEXER -
WARN: Access to undefined parameter `singularity_pull_docker_container` -- Initialise it to a default value eg. `params.singularity_pull_docker_container = some_value`
No such variable: Exception evaluating property 'idXML' for nextflow.script.ChannelOut, Reason: groovy.lang.MissingPropertyException: No such property: idXML for class: groovyx.gpars.dataflow.DataflowBroadcast
-- Check script '/home-link/iizle01/.nextflow/assets/nf-core/mhcquant/./workflows/mhcquant.nf' at line: 287 or see '.nextflow.log' file for more details
Have you provided the following extra information/files:
.nextflow.log file
Change \t to real tabs in the samplesheet.
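If the samplesheet was saved with literal backslash-t sequences instead of tab characters, one possible cleanup is the following one-liner (this assumes GNU sed, which expands \t in the replacement text):

```shell
# Replace every literal "\t" with a real tab character (GNU sed assumed)
sed 's/\\t/\t/g' samplesheet.tsv > samplesheet_fixed.tsv
```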
Specifying multiple variable modifications results in an error since the OpenMS CometAdapter takes space separated lists as input parameter whereas nextflow provides a single string.
Since the ion annotation module outputs potentially big TSV files (especially *_all_peaks.tsv), it would be better to make this output optional and disabled by default.
A normal run of the pipeline leads to the following error:
Process NFCORE_MHCQUANT:MHCQUANT:PYOPENMS_IONANNOTATOR input file name collision -- There are multiple input files for each of the following file names
nextflow run nf-core/mhcquant -r dev -profile cfc --input --outdir ./results --fasta --digest_mass_range 800:2500 --activation_method CID --prec_charge 2:3 --fdr_threshold 0.05 --number_mods 3 --precursor_mass_tolerance 5 --fragment_mass_tolerance 0.02 --num_hits 1 --peptide_min_length 8 --peptide_max_length 12 --max_rt_alignment_shift 300 --max_time '240.h' --email [email protected]
Hello!
I am very new to MS/MS and MHCquant.
I would like to get the peak identification in my mass spectra when I visualize them, to know which peaks were considered significant for peptide identification and which ion corresponds to which peak (cf. image).
I guess there is something within the pipeline that identifies such peaks: is there a way to retrieve this information somewhere?
For now, I just retrieve the scan ID for each identified peptide and visualize the associated mass spectrum in Xcalibur.
Best,
Paul
Use the -S flag for PercolatorAdapter to make results reproducible.
It might be nothing major, but I noticed that the three parameters are inconsistent in the dividers used.
The suggestion would be to change psm-level-fdrs to psm_level_fdrs and test this in the pipeline.
I looked at the parameter documentation and noted some things that I would propose to change. I will do a bunch of commits in the following pull request to implement these changes.
--skip_decoy_generation: If enabled, how is one's own decoy database provided to the pipeline? Maybe add an example of the file format the DB should follow. Also: "will then prevent appending variants..."
--pick_ms_levels: What exactly does this mean? Out of all the levels (whatever that means), you have to provide the one that is not centroided so that it can be centroided?
--run_centroidisation: This is a bit confusing; should ms_level not by definition be a level that is not centroided? Also: rephrase the sentence to indicate what is happening, e.g. "Set if the specified ms_level is not centroided, to centroid it" or something like that. Also, default=false?
--peptide_min_length to --fragment_bin_offset: "Specify ... TO BE considered after..." (or not? Maybe you have to specify the settings used in previous programs?) -- this also applies to the variant options.
--number_mods: maybe rename to max_number_mods or max_accepted_mods or something like that?
--num_hits: maybe rename to --number_reported_hits (or at least to number_hits, to make it analogous to number_mods).
--digest_mass_range: maybe rename to accepted_mass_range? Unless the digest thing is relevant.
--prec_charge: rename to accepted_prec_charge_range or accepted_charge_range?
--fixed_mods and --variable_mods: is this a choice parameter? If so, possibly list the options?
--remove_precursor_peak: the description states something else...
--description_correct_features: does this require the summed-up descriptor values, i.e. 1+2+4+8?
--variant_indel/fs/snp: change to "Set this option to not consider...".
Currently, there is no annotation of the different samples in the data that is generated with the --skip_quantification parameter.
Specifying a fixed modification such as 'Carbamidomethyl (C)' fails since quotes are missing around the parameter:
CometAdapter -in ${mzml_file} \\
(...)
-variable_modifications '${params.variable_mods}' \\
-fixed_modifications ${params.fixed_mods} \\
(...)
--> .command.sh: line 17: syntax error near unexpected token `('
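One possible fix (a sketch only; the Groovy helper name is hypothetical) would be to split the comma-separated parameter and quote each modification before handing it to the CometAdapter, which expects a space-separated list:

```groovy
// Hypothetical helper: turn "Oxidation (M),Carbamidomethyl (C)" into
// 'Oxidation (M)' 'Carbamidomethyl (C)' so that both the shell and the
// CometAdapter see a properly quoted, space-separated list.
def quoteMods(String mods) {
    mods.tokenize(',').collect { "'${it.trim()}'" }.join(' ')
}
```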
Would be good to add the Zenodo DOI for the release to the main README of the pipeline in order to make it citable. You will have to do this via a branch pushed to the repo in order to directly update master. See the PR below for an example and the file changes:
nf-core/atacseq#38
See https://zenodo.org/record/3359618#.XVZ0bOhKhPY
Web-hooks are already set up for this repo so that a unique Zenodo DOI is generated every time a new version of the pipeline is released. Would be good to add this in after every release 👍
The Pipeline summary needs to be updated.
There are some typos in the output docs and other things I would suggest changing.
Hi,
I think that it would be a good idea to warn users if a specific allele or multiple alleles are not supported by MHCFlurry.
I propose that we enable "echo true" for the processes predict_psms and predict_peptides. I could then adapt the Python scripts to print a warning if an unsupported allele was detected.
Alternative 1: Instead of enabling echo we could redirect stdout, but imo this is overkill here.
Alternative 2: We validate the alleles inline in the nextflow script. I think this unnecessarily clutters the nextflow script.
What do you think @Leon-Bichmann ?
The next release should integrate VCF support for mutated neoantigen search, based on the Fred2 Bioconda package.
For example, the score column contains q-values, but this is not included in the documentation.
I experienced the following issue on the dev branch and would like to resolve it before the release (of course), but I have no idea where to look. I noticed a similar issue reported on OpenMS.
The major difference is that I tried to update OpenMS from 2.6.0 to 2.8.0 on the dev branch. I will go over the changes between these two versions, but some input would be highly appreciated.
The following error is returned:
Caused by:
Process `NFCORE_MHCQUANT:MHCQUANT:PROCESS_FEATURE:OPENMS_MZTABEXPORTER_QUANT (QAMTL477AO_J314_Pre_T39L243_J314_Pre_T39L243)` terminated with an error exit status (8)
Command executed:
MzTabExporter -in QAMTL477AO_J314_Pre_T39L243_J314_Pre_T39L243_resolved.consensusXML \
-out QAMTL477AO_J314_Pre_T39L243_J314_Pre_T39L243.mzTab \
-threads 2 \
cat <<-END_VERSIONS > versions.yml
"NFCORE_MHCQUANT:MHCQUANT:PROCESS_FEATURE:OPENMS_MZTABEXPORTER_QUANT":
openms: $(echo $(FileInfo --help 2>&1) | sed 's/^.*Version: //; s/-.*$//' | sed 's/ -*//; s/ .*$//')
END_VERSIONS
Command exit status:
8
Command output:
No fractions annotated in consensusXML. Assuming unfractionated.
Error: Unexpected internal error (PSM controllerType=0 controllerNumber=1 scan=9440 does not map to an MS file registered in the quantitative metadata. Check your merging and filtering steps and/or report the issue, please.)
<No fractions annotated in consensusXML. Assuming unfractionated.> occurred 4 times
Command wrapper:
nxf-scratch-dir node003:/scratch/117012/nxf.8kITDHhTPh
No fractions annotated in consensusXML. Assuming unfractionated.
Error: Unexpected internal error (PSM controllerType=0 controllerNumber=1 scan=9440 does not map to an MS file registered in the quantitative metadata. Check your merging and filtering steps and/or report the issue, please.)
<No fractions annotated in consensusXML. Assuming unfractionated.> occurred 4 times
I also looked into the file of interest, and I could find a spectrum_reference annotated.
<consensusElement id="e_11275396928539264273" quality="3.574097" charge="5">
<centroid rt="2886.454795973814726" mz="527.123682987430925" it="3.764833e06"/>
<groupedElementList>
<element map="0" id="28923268865551816" rt="2897.00668984044114" mz="527.123682987430925" it="4.085203e06" charge="5"/>
<element map="1" id="14924012554702823270" rt="2882.901894744373749" mz="527.123682987430925" it="3.478242e06" charge="5"/>
<element map="2" id="13437987073385272035" rt="2880.484798142305408" mz="527.123682987430925" it="3.642787e06" charge="5"/>
<element map="3" id="1164487179193034968" rt="2885.425801168138605" mz="527.123682987430925" it="3.8531e06" charge="5"/>
</groupedElementList>
<PeptideIdentification identification_run_ref="PI_0" score_type="q-value" higher_score_better="false" significance_threshold="0" MZ="527.1254486319" RT="2874.61804275669" spectrum_reference="controllerType=0 controllerNumber=1 scan=9440" >
<PeptideHit score="0.0126048" sequence="ANANSRQQIRKLIKDGLIIRKPV" charge="5" aa_before="I I I I I I I" aa_after="T T T T T T T" start="32 32 32 29 52 52 32" end="54 54 54 51 74 74 54" protein_refs="PH_219 PH_1709 PH_1272 PH_1265 PH_1253 PH_937 PH_927">
<UserParam type="string" name="target_decoy" value="target"/>
<UserParam type="string" name="MS:1002258" value="11"/>
<UserParam type="string" name="MS:1002259" value="176"/>
<UserParam type="string" name="num_matched_peptides" value="4252"/>
<UserParam type="int" name="isotope_error" value="0"/>
<UserParam type="float" name="MS:1002252" value="2.234"/>
<UserParam type="float" name="MS:1002253" value="1.0"/>
<UserParam type="float" name="MS:1002254" value="0.0"/>
<UserParam type="float" name="MS:1002255" value="25.699999999999999"/>
<UserParam type="float" name="MS:1002256" value="1.0"/>
<UserParam type="float" name="MS:1002257" value="0.0494"/>
<UserParam type="string" name="protein_references" value="non-unique"/>
<UserParam type="float" name="COMET:deltCn" value="1.0"/>
<UserParam type="float" name="COMET:deltLCn" value="0.0"/>
<UserParam type="float" name="COMET:lnExpect" value="-3.00780485478826"/>
<UserParam type="float" name="COMET:lnNumSP" value="8.355144739461839"/>
<UserParam type="float" name="COMET:lnRankSP" value="0.0"/>
<UserParam type="float" name="COMET:IonFrac" value="0.0625"/>
<UserParam type="float" name="MS:1001492" value="0.412529"/>
<UserParam type="float" name="MS:1001491" value="0.0126048"/>
<UserParam type="float" name="MS:1001493" value="0.245321"/>
</PeptideHit>
<UserParam type="int" name="id_merge_index" value="4"/>
<UserParam type="string" name="FFId_category" value="internal"/>
<UserParam type="int" name="map_index" value="0"/>
<UserParam type="string" name="feature_id" value="11275396928539264273"/>
</PeptideIdentification>
<UserParam type="string" name="feature_id" value="11275396928539264273"/>
</consensusElement>
nextflow run nf-core/mhcquant \
-r dev \
-profile cfc \
--input input.tsv \
--outdir ./results \
--fasta *.fasta \
--digest_mass_range 800:2500 \
--activation_method CID \
--prec_charge 2:3 \
--fdr_threshold 0.05 \
--number_mods 3 \
--precursor_mass_tolerance 5 \
--fragment_mass_tolerance 0.02 \
--num_hits 1 \
--peptide_min_length 8 \
--peptide_max_length 12 \
--max_rt_alignment_shift 300 \
--max_time '240.h' \
--email [email protected]
ERROR ~ Error executing process > 'generate_decoy_database (1)'
Caused by:
Process `generate_decoy_database (1)` terminated with an error exit status (3)
Command executed:
DecoyDatabase -in UP000005640_9606_reviewed_added_vcf.fasta
-out UP000005640_9606_reviewed_added_vcf_decoy.fasta
-decoy_string DECOY_
-decoy_string_position prefix
Command exit status:
3
Command output:
Version 2.4.0-HEAD-2018-10-26 of DecoyDatabase is available at www.OpenMS.de
Warning: Only one FASTA input file was provided, which might not contain contaminants.You probably want to have them! Just add the contaminant file to the input file list 'in'.
Error: Unable to read file (Error while parsing FASTA file! The first entry could not be read! Please check the file! in: )
Command wrapper:
Version 2.4.0-HEAD-2018-10-26 of DecoyDatabase is available at www.OpenMS.de
Warning: Only one FASTA input file was provided, which might not contain contaminants.You probably want to have them! Just add the contaminant file to the input file list 'in'.
Error: Unable to read file (Error while parsing FASTA file! The first entry could not be read! Please check the file! in: )
Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out
-- Check '.nextflow.log' file for details
All boolean parameters, such as include_proteins_from_vcf, should be false by default, since setting them to false on the command line is not possible at the moment.
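One reason for this: Nextflow passes CLI values through as strings, and at least at the time `--some_flag false` did not reliably disable an option (a non-empty string such as "false" is truthy in Groovy). With false defaults, users simply opt in with the bare flag. A sketch of the intended defaults (parameter name from the issue):

```groovy
// nextflow.config sketch: boolean options default to false so that
// passing `--include_proteins_from_vcf` on the CLI switches them on.
params {
    include_proteins_from_vcf = false
}
```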
We keep having issues with malformed mzML files and retrieve non-descriptive errors from the CometAdapter. Therefore it would be beneficial to check the input (or converted) mzML files before processing them further. For this, FileInfo in OpenMS comes in handy.
The VCF reader function in MHCquant currently doesn't filter provided VCFs by PASS.
Hence, this has to be done manually by the user at the moment, but it should be easily fixed in a new release.
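As an interim manual workaround (bcftools `view -f PASS` achieves the same), a one-liner can keep the header lines plus only those records whose FILTER column is PASS:

```shell
# Keep VCF header lines and only records with FILTER == PASS (column 7)
awk -F'\t' '/^#/ || $7 == "PASS"' input.vcf > input.pass.vcf
```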
The Table of contents can be removed from the Usage.md
to adhere to the structure of other pipeline docs.
mhcflurry will fail if any of these residues are included in a FASTA file: U, B, or X.
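A hedged pre-filter sketch (the function name and record format are made up for illustration) that drops such sequences before they reach MHCflurry:

```python
# Drop FASTA records whose sequence contains residues MHCflurry cannot
# score (U, B, X). `records` is a list of (header, sequence) pairs.
def drop_unsupported(records, bad=frozenset("UBX")):
    return [(h, s) for h, s in records if not (set(s.upper()) & bad)]
```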
Hi,
I am trying to run MHCquant but I am getting the following error:
Caused by: Process db_search_comet (1) terminated with an error exit status (8)
Progress of 'loading chromatogram list':
-- done [took 0.01 s (CPU), 0.00 s (Wall)] --
Error: Unexpected internal error (Error: Profile data provided but centroided MS2 spectra expected. To enforce processing of the data set the -force flag.)
Command error:
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap
The full error log can be found here: error_log.txt
Any idea?
Thank you very much!
The Quick Start run command needs backslashes.
Hi, I'm trying to use mhcquant with the workflow Docker profile, but when I try to start the pipeline this error message appears:
WARN: unknown format for entry /Users/presta/null in provided sample sheet. ignoring line.
Could someone help me?
Thank you
VEP-annotated VCF files caused an error during parsing in variants2fasta.py here. In more detail: the field corresponding to CDS_position is filled with the string {position/length}, but the program expects a parsable integer here.
nextflow run nf-core/mhcquant -r 2.2.0 -profile cfc \
--input '/sfs/7/workspace/ws/qeasc01-QLFGB-mhcquant-0/samplesheet_class1_mhcquant.tsv' \
--fasta '/sfs/7/workspace/ws/qeasc01-QLFGB-mhcquant-0/UP000005640_9606.fasta' \
--allele_sheet '/sfs/7/workspace/ws/qeasc01-QLFGB-mhcquant-0/allele_sheet_mhcquant.tsv' \
--peptide_max_length 12 \
--predict_class1 \
--fdr_threshold 0.1 \
--predict_RT \
--include_proteins_from_vcf \
--vcf_sheet '/sfs/7/workspace/ws/qeasc01-QLFGB-mhcquant-0/vcf_sheet_mhcquant.tsv' \
--variant_annotation_style 'VEP' \
--variant_indel_filter \
--variant_reference 'GRCH37' \
--max_time '240.h'
Command error:
Using TensorFlow backend.
Traceback (most recent call last):
File "/home-link/qeasc01/.nextflow/assets/nf-core/mhcquant/bin/variants2fasta.py", line 228, in <module>
sys.exit(main())
File "/home-link/qeasc01/.nextflow/assets/nf-core/mhcquant/bin/variants2fasta.py", line 172, in main
variants = read_variant_effect_predictor(args.vcf, gene_filter=protein_ids)
File "/home-link/qeasc01/.nextflow/assets/nf-core/mhcquant/bin/variants2fasta.py", line 76, in read_variant_effect_predictor
coding[transcript_id] = MutationSyntax(transcript_id, int(transcript_pos)-1,
ValueError: invalid literal for int() with base 10: '144/3999'
Something went wrong with the conversion, so the --fixed_modification parameter is set to the default, Carbamidomethyl (C).
-[nf-core/mhcquant] Pipeline completed with errors-
Error executing process > 'align_ids'
Caused by:
Process `align_ids` terminated with an error exit status (8)
Command executed:
MapAlignerIdentification -in train_sample_66_ms_run_4.mzML_idx_fdr_filtered.idXML \
-trafo_out train_sample_66_ms_run_4.mzML_idx_fdr_filtered.trafoXML \
-model:type linear \
-algorithm:max_rt_shift 300
Command exit status:
8
Command output:
Progress of 'loading input files':
-- done [took 0.00 s (CPU), 0.01 s (Wall)] --
Warning: Value of parameter 'min_run_occur' (here: 2) is higher than the number of runs incl. reference (here: 1). Using 1 instead.
Progress of 'aligning maps':
Error: Unexpected internal error (No reference RT information left after filtering)
Command wrapper:
Progress of 'loading input files':
-- done [took 0.00 s (CPU), 0.01 s (Wall)] --
Warning: Value of parameter 'min_run_occur' (here: 2) is higher than the number of runs incl. reference (here: 1). Using 1 instead.
Progress of 'aligning maps':
Error: Unexpected internal error (No reference RT information left after filtering)
Work dir:
/projectsp/f_jx76_1/xiaolong/temp/MSV000082648/20200326comet/train_sample_66_ms_run_4.mod1/work/5e/dce3d87a55eac31852aad2c4ad3398
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
If the preceding database search processes are not synchronous, the order of the id files can be switched, leading to the application of the wrong retention time transformation to each file.
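A sketch of one way to make the pairing robust (the channel names and key extraction are hypothetical): join the id files and their transformations on a shared sample key instead of relying on emission order:

```groovy
// Hypothetical DSL2 sketch: derive a sample key from each file name and
// join the two channels on it, so channel ordering no longer matters.
ids_ch
    .map { f -> [ f.baseName, f ] }
    .join( trafos_ch.map { f -> [ f.baseName.replace('_trafo', ''), f ] } )
```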
I would like to thank the team for the amazing work and the great software. However, when I tried to run the pipeline, everything went well until step 11, where execution ends with the following error:
"""
Error executing process > 'run_percolator (1)'
Caused by:
Process `run_percolator (1)` terminated with an error exit status (8)
Command executed:
OMP_NUM_THREADS=6
PercolatorAdapter -in s1_all_ids_merged_psm.idXML
-out s1_all_ids_merged_psm_perc.idXML
-seed 4711
-trainFDR 0.05
-testFDR 0.05
-threads 6
-enzyme no_enzyme
-peptide-level-fdrs
-subset-max-train 0
-doc 0 \
Command exit status:
8
Command output:
Loading input file: s1_all_ids_merged_psm.idXML
Merging peptide ids.
Merging protein ids.
Error: Unexpected internal error (Prefix of string '6file=4959' successfully converted to an integer value. Additional characters found at position 2)
Command wrapper:
Loading input file: s1_all_ids_merged_psm.idXML
Merging peptide ids.
Merging protein ids.
Error: Unexpected internal error (Prefix of string '6file=4959' successfully converted to an integer value. Additional characters found at position 2)
"""
I have tried to run the PercolatorAdapter from the OpenMS Docker container with the s1_all_ids_merged_psm.idXML from the intermediate results, also ending up with the same result.
I would truly appreciate your help, thanks a lot.
Hi @Leon-Bichmann !
we need to get all tools into a single environment, and thus need to recompile all tools required by this pipeline against the most current CXX environment in bioconda/conda-forge. Can you start listing the tools here? Then we could, for example, start coming up with a strategy for how to achieve this.
This second environment causes so many downstream issues that we should get rid of it asap:
...
We should get this rolling for 1.3.X already...
In the long run, it would be nice to have an experimental design file specifying which samples are replicates, so that the whole batch can be processed at once.
ERROR: Cannot find Java or it's a wrong version -- please make sure that Java 8 is installed
NOTE: Nextflow is trying to use the Java VM defined by the following environment variables:
JAVA_CMD: /usr/local/lib/jvm/openjdk11/bin/java
JAVA_HOME: /usr/local/lib/jvm/openjdk11
The command "wget -qO- get.nextflow.io | bash" failed and exited with 1 during .
I assume that we need to bump the minimum Nextflow version to at least 19.04.0.
Has been suggested by @ewels on nf-core slack on October 11th.
.mztab files were missing for me in the Intermediate_Results directory.
In the Readme.md, the additional references are lacking links to the papers.
E.g. there is an output_docs file opened, but no documentation is created in general. Also make sure to use a channel for the output_docs feature if possible, to allow staging of the documentation template to different storage providers by Nextflow.
The current annotation of A*03:01;A*68:01;B*27:05;B*35:03;C*02:02;C*04:01 gives a misconception of how the annotation of the HLA alleles should look.
When running the test or test_full profile, mhcquant terminates with
Error: File not found (the file 'test.fasta' could not be found)
The accession of the reference database in the test data repository is erroneous.
Furthermore, we need to increase the number of tests, since the current ones do not cover all of the features implemented (e.g. ion annotation).
nextflow run nf-core/mhcquant -profile test,cfc --outdir test
When trying to install openms and percolator in the same conda environment, I get a number of conflicts:
If openms 2.3 and percolator 3.1 are selected:
libxerces-c-3.1 will be installed, but openms requires libxerces-c-3.2;
the build works, but all tools fail.
If openms 2.4 and percolator 3.1 are selected:
the build fails because of a dependency conflict:
openms=2.4 -> boost[version='>=1.64.0,<1.64.1.0a0']
percolator -> boost==1.62
percolator -> xerces-c==3.1.2
Some discussions or attempts to fix this can be found here:
bioconda/bioconda-recipes#11871
bioconda/bioconda-recipes#12060
To process labeled data, we could let Comet also search for labeled peptides by specifying them as variable modifications using the Unimod nomenclature. A large set of modifications is supported by the CometAdapter; however, they are still fixed to distinct accessions and not customizable afaik.
Suggestions welcome
The "theoretical_fragment_ion" parameter instructs Comet whether or not to include signal from the flanking bins in the cross-correlation calculation (see the Comet paper). They also state that high-resolution runs should have theoretical_fragment_ion = 0. However, in the nextflow.config this parameter is not specified, which leads to running Comet with theoretical_fragment_ion = 1, since that is the default (see the Comet docs).
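A minimal sketch of pinning this in nextflow.config (assuming the pipeline forwards a params value of this name to the CometAdapter; the parameter name mirrors Comet's own):

```groovy
// nextflow.config sketch: use flanking-bin setting 0 for high-resolution data
params {
    theoretical_fragment_ion = 0
}
```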