medbioinf / pia
:books: :microscope: PIA - Protein Inference Algorithms
Home Page: https://github.com/medbioinf/pia
License: Other
We should implement a way to provide parameters in a JSON or YAML file and move away from the XML config file.
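A sketch of what such a file could look like, loosely mirroring the nodes of the current XML parameter pipeline. All key names here are hypothetical illustrations, not an existing PIA format:

```yaml
# hypothetical PIA parameter file -- key names are illustrative only
psm:
  createPSMSets: true
  decoyPattern: "DECOY_.*"
  preferredFDRScores:
    - "X!Tandem Expect"
  calculateAllFDR: true
  calculateCombinedFDRScore: true
protein:
  inference: spectrum_extractor
  scoring: multiplicative
  filters:
    - name: psm_score_filter_psm_combined_fdr_score
      comparison: LEQ
      value: 0.01
```

Such a format would be much easier to hand-edit than the current XML, and libraries like SnakeYAML or Jackson could map it directly onto a settings object.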
@julianu:
Some comments about the mztab example:
MTD fixed_mod[1] [MS, MS:1002453, No fixed modifications searched, ]
MTD variable_mod[1] [MS, MS:1002454, No variable modifications searched, ]
However, the PSMs contain modifications:
MTD software[9]-setting[10] base score for FDR calculation for file 8 = mascot_score
search_engine search_engine_score[1] search_engine_score[2]
Hello,
I continue my self training using PIA in Knime.
Thanks for your help, it helped me a lot.
Now, I have a simple workflow from mzid files, to PIA compiler and PIA analysis.
I would like to export the result of the grouping algorithm to something like TSV, XLS or other table format.
I've tried to connect the 3 output links to the XLS Writer or XLS sheet appender:
This gives me 3 tables, but I don't find Protein Accessions, protein groups and protein clusters.
The last output is clearly marked as unimplemented. I've tried but this gave me no results (mzTab, csv, mzIdentML).
So I'm wondering how to get some table... Is there some other node to use?
Perhaps using the embedded view?
Is it possible with other tools, with CLI or the web interface ?
Cheers
Olivier
Sorry, it's me again ;-)
I have some difficulties using the PIA Analysis module and I have some questions about how to use it.
I've set a filter at the PSM level.
I've chosen "X!Tandem expect < 0.01"
then saved the resulting tables with the XLS sheet appender (after a column rename).
Then I did the same with :
"X!Tandem expect < 0.05"
I get the same protein list and the same peptide list after an "Occam's Razor" inference.
That's strange to me: I was expecting a different protein/group list.
The PSM results are different, as I was expecting: there are fewer PSMs in the list with the 0.01 filter than with 0.05.
So the filter is taken into account...
Do you have an idea of what I'm missing?
Cheers,
Olivier
@julianu most of the classes do not have proper test classes.
Creating custom parameter files for command line use is difficult. A Knime node which writes out a param file for command line use would make it easy to transfer an analysis from Knime to the command line.
Edit: I am not sure how to tag this as an enhancement issue?
Hi, I am trying to use PIA in Knime for analysis of LC-MS data, but on startup the node with ID suffix 570 (PIA Compiler) crashes, and I receive the following error log:
ERROR 01-PIA_first_analysis 0 Unable to load node with ID suffix 570 into workflow, skipping it: javax/xml/bind/JAXBException
ERROR LoadWorkflowRunnable Errors during load: Status: Error: 01-PIA_first_analysis 0 loaded with errors
ERROR LoadWorkflowRunnable Status: Error: 01-PIA_first_analysis 0
ERROR LoadWorkflowRunnable Status: Error: Unable to load node with ID suffix 570 into workflow, skipping it: javax/xml/bind/JAXBException
How should I solve this problem? All necessary items are installed. PC with Win10 x64.
Thank you!
Implement an alternative export for sequences and modifications together in one string.
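A minimal sketch of such a combined export string, interleaving modification names into the sequence. The class name, the position-to-name map, and the `PEPT(Phospho)IDE` style shown are illustrative assumptions, not an existing PIA convention:

```java
import java.util.Map;
import java.util.TreeMap;

public class ModifiedSequence {

    /**
     * Interleaves modification names into the sequence,
     * e.g. "PEPTIDE" with Phospho at position 4 -> "PEPT(Phospho)IDE".
     * Modification positions are assumed to be 1-based residue indices.
     */
    public static String format(String sequence, Map<Integer, String> mods) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < sequence.length(); i++) {
            sb.append(sequence.charAt(i));
            String mod = mods.get(i + 1);
            if (mod != null) {
                sb.append('(').append(mod).append(')');
            }
        }
        return sb.toString();
    }
}
```

One string per PSM like this is trivially splittable again (sequence is everything outside parentheses), so no information is lost compared to separate columns.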
Hi, my name is Trent and I have been using PIA for the past few weeks in an attempt to generate protein inference results from MS2 searches using Mascot and Comet.
I have mostly been using the Knime workspace for protein inference, but I would like to move to the command line version in the future with a custom param file, as I would like to apply PIA to thousands of searches.
Anyway, I was wondering if you could provide any details on why I might be getting a NullPointerException in the Knime PIA Analysis node, and perhaps how I can access the more detailed error messages. I have attached the mzIdentML from a Comet search that generates the error to this issue, called 130707-comet-edited.mzid (I am attaching a reduced version of this file, as it is too large to attach here). I have also attached a Mascot mzIdentML which is able to generate good and expected protein inference results, called 109691.mzid.
I guess my question is: why am I getting a NullPointerException in Knime when I use 130707-comet-edited.mzid? It seems formatted properly; it was converted from pepXML to mzIdentML using ProteoWizard. Also, through Knime I am not able to see the full Java stack trace; the error message simply says:
ERROR PIA Analysis 6:11 Execute failed: ("NullPointerException"): null
So, I am really not able to see the exact issue going on. Am I able to see the full Java Stack Trace error in Knime?
Finally, I was wondering if it is possible to get a PIA Parameter file from the Knime nodes? This would make it substantially easier to run PIA via command line as it seems daunting to create a custom parameter file from scratch. Debugging from the command line would also be easier than debugging through Knime as I would be able to see the full stdout and stderr.
I hope to hear back, thank you for your time.
Trent
Comet PIA Options:
Mascot PIA Options
comet_mzid.zip
mascot_mzid.zip
Hi !
I am experiencing a problem with the PIA node: it fails when the idXML export format is selected.
With the other options it works fine. That is still fine when working on protein inference only, but it hampers the connection with downstream nodes requiring the idXML format.
Thanks in advance!
When merging to mzTab, an error occurs while reading the identification protocols.
Hi @julianu the filtering is not working in this pipeline https://github.com/bigbio/nf-workflows/tree/master/xt-msgf-nf .
Here are the mzid files from X!Tandem and the mgf, plus the PIA config.
output-mzids.zip
The exported mzTab contains 50% TP and 50% FP. It is not filtering.
Regards
Yasset
Hi Julian,
I am wondering if PIA can handle imaging MS data like imzML. This format is used by the new Bruker MALDI-TOF instruments, and it seems there is no way to convert the imzML file into another format that can be used by PIA?
Reference: https://ms-imaging.org/wp/
Thanks,
Xiwei
Tide support is not working properly when generating the compiler file with this example.
test.mgf.tide.txt
It is set to use only m/z and RT by default right now.
Hi @julianu
I think we should discuss the inspector-mz-graph dependency in PIA. It introduces a lot of redundant dependencies that are not needed. This part of PIA could be removed and moved into a desktop component. In addition, it adds a lot of Swing classes to a backend algorithm.
What do you think?
The idXML exporter is still missing any filtering functionality.
Include the filtering options on the PSM and protein levels.
The following command:
pia inference -infile ${pia_xml} -paramFile ${pia_config} -proteinExport -psmExport ${pia_xml}.mztab mzTab
exports only the option given last before ${pia_xml}.mztab, in this case the PSMs. If I swap -psmExport for -proteinExport, it exports only the proteins.
Regards
Yasset
Exporting both proteins and PSMs to mzTab is not working; only one of them works at a time.
Hello,
I encountered this problem when I used the command line to run PIA:
2021-03-22 23:13:47,855 ERROR PIAIntermediateJAXBHandler - Error while parsing PIA XML file
com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected character 'f' (code 102) in prolog; expected '<'
at [row,col {unknown-source}]: [1,1]
at com.ctc.wstx.sr.StreamScanner.throwUnexpectedChar(StreamScanner.java:653)
at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2133)
at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1181)
at com.ctc.wstx.sr.BasicStreamReader.nextTag(BasicStreamReader.java:1204)
at de.mpc.pia.intermediate.xmlhandler.PIAIntermediateJAXBHandler.parseXMLFile(PIAIntermediateJAXBHandler.java:188)
at de.mpc.pia.intermediate.xmlhandler.PIAIntermediateJAXBHandler.parse(PIAIntermediateJAXBHandler.java:162)
at de.mpc.pia.modeller.PIAModeller.parseIntermediate(PIAModeller.java:296)
at de.mpc.pia.modeller.PIAModeller.loadFileName(PIAModeller.java:175)
at de.mpc.pia.modeller.PIAModeller.(PIAModeller.java:119)
at de.mpc.pia.modeller.PIAModeller.processExecuteXMLFile(PIAModeller.java:799)
at de.mpc.pia.modeller.PIAModeller.parseParameterXMLFile(PIAModeller.java:786)
at de.mpc.pia.modeller.PIAModeller.main(PIAModeller.java:733)
I had just generated the parameter file with the "-paramOutFile" option.
Thanks for your help.
The current code uses HashCodeBuilder as the mechanism to implement hashCode and equals. This is not efficient, because an object needs to be created every time a comparison is performed.
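A lighter-weight alternative is the JDK's own `java.util.Objects` utilities, which avoid the builder object entirely (`Objects.hash` still allocates a small varargs array; a hand-rolled `31 * a + b` formula avoids even that). `Peptide` and its fields here are a hypothetical stand-in for the PIA model classes:

```java
import java.util.Objects;

// hypothetical example class; PIA's real model classes would follow the same pattern
public class Peptide {
    private final String sequence;
    private final int charge;

    public Peptide(String sequence, int charge) {
        this.sequence = sequence;
        this.charge = charge;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Peptide)) return false;
        Peptide other = (Peptide) o;
        // field-by-field comparison, no intermediate builder object
        return charge == other.charge && Objects.equals(sequence, other.sequence);
    }

    @Override
    public int hashCode() {
        // allocates only a varargs array, not a HashCodeBuilder
        return Objects.hash(sequence, charge);
    }
}
```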
Hello,
First of all, I'm a newbie using PIA and KNIME. Perhaps the problem I have is in fact entirely due to my poor understanding of either PIA or KNIME.
So, I would like to use the Protein Inference Algorithms software to benchmark it.
My identification engine is X!Tandem VENGEANCE (2015.12.15).
The output files are in mzIdentML produced by X!Tandem, so I've made a workflow with :
"List Files" node (mzIdentML)
"PIA Compiler" node
"PIA Analysis" node
The first 2 steps are OK: green bullets.
The last step fails: red cross with the message:
ERROR PIA Analysis 0:4 Execute failed: No PIA XML file given! Provide either by datatable (e.g. from PIA Compiler or List Files) or port (Input File)
I've exported the PIA intermediate file with the "Binary Objects to Files" to inspect its content : it is indeed containing PIA xml data.
<ns3:jPiaXML date="2016-09-06T08:50:22.748+02:00" name="compilation" xmlns:ns2="http://psidev.info/psi/pi/mzIdentML/1.1" xmlns:ns3="http://www.medizinisches-proteom-center.de/PIA/piaintermediate">
...
So I don't understand the error message. What should I do to make it work?
Do you have some insights that could help me to analyze my data ?
Thanks a lot
Olivier
When the input files for the compiler are mzTab without proteins, PIA fails with a null pointer exception in the mzTab reader.
@julianu can we do a new release including the new filters for multiple search engines?
We need to include in PIA a phosphorylation localization score and FDR.
So far I see two popular localization scores, A-score and PhosphoRS. I will provide some code.
For some reason the PSM ms_run[] entries get repeated in the mzTab export. Here is an example:
PSM FGIAAK 1 P21796 0 databaseName null [MS, MS:1002387, PIA, 1.3.10]|[PSI-MS, MS:1001476, X!Tandem, X! Tandem Alanine (2017.2.1.4)]|[PSI-MS, MS:1002048, MS-GF+, Release (v2017.07.21)] 0.003638683087973093 0.0075 20.0 0.004483837330552659 115.0 1.7413855E-8 0.34254366 null 1729.1622 2 303.68479405403644 303.683456328125 ms_run[1]:index=1433|ms_run[1]:index=1433|ms_run[1]:index=1433|ms_run[2]:index=1433|ms_run[2]:index=1433|ms_run[2]:index=1433 R Y 219 224 0 0 1
For batch processing it would be great if the name of the exported file were derived from the file name of the searched spectra.
I get the following error when PIA tries to write its XML output:
[05-Dec-2019 14:00:56 - INFO] "Writing PIA XML file to /home/wout/Downloads/b10000_ZNF230.xml" (de.mpc.pia.intermediate.compiler.PIACompiler:918)
[05-Dec-2019 14:00:56 - INFO] "Stream open, writing PIA XML" (de.mpc.pia.intermediate.compiler.PIACompiler:942)
[05-Dec-2019 14:00:56 - ERROR] "JAXBException while writing XML file" (de.mpc.pia.intermediate.compiler.PIACompiler:989)
javax.xml.bind.JAXBException
- with linked exception:
[java.lang.ClassNotFoundException: com.sun.xml.internal.bind.v2.ContextFactory]
at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:241)
at javax.xml.bind.ContextFinder.find(ContextFinder.java:455)
at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:652)
at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:599)
at de.mpc.pia.intermediate.compiler.PIACompiler.createMarshallerForPiaXML(PIACompiler.java:1007)
at de.mpc.pia.intermediate.compiler.PIACompiler.marshalToFormattedFragmentMarshaller(PIACompiler.java:1030)
at de.mpc.pia.intermediate.compiler.PIACompiler.writeOutJaxbFilesList(PIACompiler.java:1080)
at de.mpc.pia.intermediate.compiler.PIACompiler.writeOutXML(PIACompiler.java:961)
at de.mpc.pia.intermediate.compiler.PIACompiler.writeOutXML(PIACompiler.java:919)
at de.mpc.pia.intermediate.compiler.PIACompiler.writeOutXML(PIACompiler.java:932)
at de.mpc.pia.intermediate.compiler.PIACompiler.main(PIACompiler.java:1273)
Caused by: java.lang.ClassNotFoundException: com.sun.xml.internal.bind.v2.ContextFactory
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:602)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
at javax.xml.bind.ContextFinder.safeLoadClass(ContextFinder.java:573)
at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:239)
... 10 more
[05-Dec-2019 14:00:56 - INFO] "Writing of PIA XML file finished." (de.mpc.pia.intermediate.compiler.PIACompiler:996)
Could this be due to an outdated version of JAXB that's not compatible with my Java version? For example, see jakartaee/jaxb-api#78 and https://stackoverflow.com/a/43574427.
I'm using Java version 13.0.1, while the pom file seems to indicate a PRIDE version 1.0.22 of JAXB is used. Is this even the standard JAXB version number? If so, it seems a pretty old one and an update to JAXB version 2.4 seems useful to support current Java versions.
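If the cause is indeed JAXB being removed from the JDK (it was dropped from the default classpath in Java 11), adding an explicit JAXB API and runtime to the pom usually resolves this ClassNotFoundException. The version numbers below are illustrative, not taken from the PIA pom:

```xml
<dependency>
    <groupId>jakarta.xml.bind</groupId>
    <artifactId>jakarta.xml.bind-api</artifactId>
    <version>2.3.3</version>
</dependency>
<dependency>
    <groupId>org.glassfish.jaxb</groupId>
    <artifactId>jaxb-runtime</artifactId>
    <version>2.3.3</version>
</dependency>
```

The `com.sun.xml.internal.bind.v2.ContextFactory` class in the stack trace is the old JDK-internal implementation, which an external `jaxb-runtime` replaces.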
Hi all, it's me again...
I was wondering if it is possible to include "FDR q values" in the -proteinExport output from the command line. I suppose I can calculate the FDR myself by simply counting decoys based on a score threshold. However, I just wanted to point this out. The output from the command line I see is as follows:
"accessions","score","#peptides","#PSMs","#spectra"
"USP9X_HUMAN|Q93008","374.75325142812","155","450","438"
"USP9Y_HUMAN|O00507","141.30843342618374","66","138","138"
"Q6P468_HUMAN|Q6P468","139.16527025922363","59","170","164"
"HSP7C_HUMAN|P11142,V9HW22_HUMAN|V9HW22","110.67590820388949","47","111","111"
"GRP78_HUMAN|P11021,V9HWB4_HUMAN|V9HWB4","99.65650591097625","41","90","90"
"A0A0G2JIW1_HUMAN|A0A0G2JIW1,A8K5I0_HUMAN|A8K5I0,HS71A_HUMAN|P0DMV8,HS71B_HUMAN|P0DMV9","95.12141332736277","41","103","103"
"B7Z4V2_HUMAN|B7Z4V2,GRP75_HUMAN|P38646,V9HW84_HUMAN|V9HW84","87.67906153604937","40","71","71"
"A0A087WZG9_HUMAN|A0A087WZG9,B4DSP0_HUMAN|B4DSP0,PEG10_HUMAN|Q86TG7-2","78.44328812851143","31","358","187"
"A0A087WUL4_HUMAN|A0A087WUL4,A0A087WX23_HUMAN|A0A087WX23,A0A087WXK2_HUMAN|A0A087WXK2,PEG10_HUMAN|Q86TG7","77.50461134847883","31","358","187"
Within Knime this information is output as follows:
Proteins | Score | Coverages | nrPeptides | nrPSM | nrSpectra | ClusterID | Description | Decoy | FDR q value
[USP9X_HUMAN\|Q93008] | 433.7986176 | [?] | 152 | 363 | 426 | 11547 | [Probable ubiquitin carboxyl-terminal hydrolase FAF-X OS=Homo sapiens GN=USP9X PE=1 SV=3] | FALSE | 0
[USP9Y_HUMAN\|O00507] | 175.9273045 | [?] | 66 | 128 | 136 | 11547 | [Probable ubiquitin carboxyl-terminal hydrolase FAF-Y OS=Homo sapiens GN=USP9Y PE=2 SV=2] | FALSE | 0
[Q6P468_HUMAN\|Q6P468] | 160.3044466 | [?] | 56 | 137 | 159 | 11547 | [USP9X protein (Fragment) OS=Homo sapiens GN=USP9X PE=2 SV=1] | FALSE | 0
[GRP78_HUMAN\|P11021, V9HWB4_HUMAN\|V9HWB4] | 122.2464199 | [?, ?] | 42 | 80 | 86 | 68 | [78 kDa glucose-regulated protein OS=Homo sapiens GN=HSPA5 PE=1 SV=2, Epididymis secretory sperm binding protein Li 89n OS=Homo sapiens GN=HEL-S-89n PE=2 SV=1] | FALSE | 0
[HSP7C_HUMAN\|P11142, V9HW22_HUMAN\|V9HW22] | 121.8969185 | [?, ?] | 44 | 97 | 105 | 68 | [Heat shock cognate 71 kDa protein OS=Homo sapiens GN=HSPA8 PE=1 SV=1, Epididymis luminal protein 33 OS=Homo sapiens GN=HEL-S-72p PE=2 SV=1] | FALSE | 0
[A0A0G2JIW1_HUMAN\|A0A0G2JIW1, A8K5I0_HUMAN\|A8K5I0, HS71A_HUMAN\|P0DMV8, HS71B_HUMAN\|P0DMV9] | 110.6020159 | [?, ?, ?, ?] | 39 | 86 | 98 | 68 | [Heat shock 70 kDa protein 1B OS=Homo sapiens GN=HSPA1B PE=1 SV=1, Epididymis secretory protein Li 103 OS=Homo sapiens GN=HSPA1A PE=2 SV=1, Heat shock 70 kDa protein 1A OS=Homo sapiens GN=HSPA1A PE=1 SV=1, Heat shock 70 kDa protein 1B OS=Homo sapiens GN=HSPA1B PE=1 SV=1] | FALSE | 0
[B7Z4V2_HUMAN\|B7Z4V2, GRP75_HUMAN\|P38646, V9HW84_HUMAN\|V9HW84] | 109.0415013 | [?, ?, ?] | 38 | 62 | 67 | 68 | [cDNA FLJ51907, highly similar to Stress-70 protein, mitochondrial OS=Homo sapiens PE=2 SV=1, Stress-70 protein, mitochondrial OS=Homo sapiens GN=HSPA9 PE=1 SV=2, Epididymis secretory sperm binding protein Li 124m OS=Homo sapiens GN=HEL-S-124m PE=2 SV=1] | FALSE | 0
[B4DNT8_HUMAN\|B4DNT8] | 100.6255528 | [?] | 36 | 77 | 88 | 68 | [cDNA FLJ54370, highly similar to Heat shock 70 kDa protein 1 OS=Homo sapiens PE=2 SV=1] | FALSE | 0
Where the ending column is the FDR q value.
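The interim workaround mentioned above (computing q-values by counting decoys against a score threshold) can be sketched as follows. `ProteinHit` and its fields are hypothetical; higher scores are assumed better, and the standard target-decoy estimate FDR = #decoys / #targets is used:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class QValues {

    /** hypothetical holder for one row of the exported protein list */
    public static class ProteinHit {
        final double score;
        final boolean decoy;
        double qValue;
        public ProteinHit(double score, boolean decoy) {
            this.score = score;
            this.decoy = decoy;
        }
    }

    /**
     * Sorts by descending score, computes FDR = decoys/targets at each
     * threshold, then converts to q-values (minimal FDR at or below a score).
     */
    public static void compute(List<ProteinHit> hits) {
        hits.sort(Comparator.comparingDouble((ProteinHit h) -> h.score).reversed());
        int targets = 0;
        int decoys = 0;
        double[] fdr = new double[hits.size()];
        for (int i = 0; i < hits.size(); i++) {
            if (hits.get(i).decoy) decoys++; else targets++;
            fdr[i] = targets == 0 ? 1.0 : (double) decoys / targets;
        }
        double min = 1.0;
        for (int i = hits.size() - 1; i >= 0; i--) {
            min = Math.min(min, fdr[i]);   // enforce monotonicity from the bottom up
            hits.get(i).qValue = min;
        }
    }
}
```

Having this inside the -proteinExport output directly would of course be preferable, as PIA already knows the decoy status of each protein group.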
Also, how would it be possible to get coverage results? I believe I have seen a workflow that uses OpenMS PeptideIndexer to do this?
Thanks,
Trent
Some pipelines need to filter PSMs taking into account the search engine provenance (which search engines have identified the corresponding spectrum); we see two cases here:
Filter Peptides that are not identified by all search engines. PR https://github.com/mpc-bioinformatics/pia/pull/123
Filter peptide identifications if the corresponding spectrum was identified with different sequences by the search engines. For example, spectrum A is identified by search engine A with sequence A and by search engine B with sequence B. We should be able to remove that case with a filter.
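The second filter could be sketched as follows: group PSMs by spectrum ID and drop every spectrum on which the engines disagree about the sequence. `Psm` and its fields are hypothetical stand-ins for the PIA PSM model:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class AgreementFilter {

    /** hypothetical minimal PSM: one identification of one spectrum */
    public static class Psm {
        final String spectrumId;
        final String sequence;
        public Psm(String spectrumId, String sequence) {
            this.spectrumId = spectrumId;
            this.sequence = sequence;
        }
    }

    /** Keeps only PSMs whose spectrum was assigned a single sequence across all engines. */
    public static List<Psm> filterAgreeing(List<Psm> psms) {
        // collect the distinct sequences reported for each spectrum
        Map<String, Set<String>> seqsPerSpectrum = new HashMap<>();
        for (Psm p : psms) {
            seqsPerSpectrum.computeIfAbsent(p.spectrumId, k -> new HashSet<>())
                           .add(p.sequence);
        }
        // a spectrum passes only if all engines agreed on one sequence
        return psms.stream()
                   .filter(p -> seqsPerSpectrum.get(p.spectrumId).size() == 1)
                   .collect(Collectors.toList());
    }
}
```

In a real implementation the comparison would probably need to ignore I/L ambiguity and modification notation differences between engines.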
We need to be able to parse pepXML files, which would enable support for original results from TPP and other search engines.
@julianu testPIACompilerNativeFiles takes a long time to run.
Many programs parse accessions in FASTAs just like:
"everything before the first blank is the accession" (e.g. OpenMS does this).
To make PIA more compatible, make this an optional accession parsing mode, which can be set and overrides all other parsing options.
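The rule itself is a one-liner; a sketch (class and method names are hypothetical):

```java
public class FastaAccession {

    /**
     * "Everything before the first blank is the accession."
     * Strips a leading '>' if the full FASTA header line is passed in.
     */
    public static String parse(String headerLine) {
        String header = headerLine.startsWith(">") ? headerLine.substring(1) : headerLine;
        int blank = header.indexOf(' ');
        return blank < 0 ? header : header.substring(0, blank);
    }
}
```

Making this mode override all other parsers would guarantee that PIA's accessions match those produced by OpenMS on the same database.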
@julianu here is how we decided to export metadata from mzIdentML -> mzTab:
and also from PRIDE XML to mztab:
We should check how the current version of PIA converts PRIDE XML and mzTab back to mzIdentML, especially the metadata. Some of the information can be redundant, like the software entries, etc.
Can you have a look?
Hi,
Does PIA accept PSM result input in simple TSV / CSV format? If yes, what does the format look like?
Thanks,
Yasset requested to add an export of the protein sequences, if available.
It should be in an optional column with the CV MS:1001344 (AA sequence).
I have two score columns in mzTab files generated by ANN-SoLo:
MTD psm_search_engine_score[1] [MS, MS:1001143, search engine specific score for PSMs,]
MTD psm_search_engine_score[2] [MS, MS:1002354, PSM-level q-value,]
PIA doesn't seem to know how to handle a search engine specific score, which makes sense because it can be anything. However, I haven't been able to figure out how to tell PIA to ignore the psm_search_engine_score[1] column in the mzTab file and use the psm_search_engine_score[2] column instead. As a result, when I run PIA on these mzTab files I get the following error:
Exception in thread "main" java.lang.IllegalArgumentException: Type must not be null or of type UNKNOWN_SCORE: [MS, MS:1001143, search engine specific score for PSMs, ]
at de.mpc.pia.intermediate.compiler.parser.MzTabParser.getBasicScoreModelForParam(MzTabParser.java:752)
at de.mpc.pia.intermediate.compiler.parser.MzTabParser.parsePSMScore(MzTabParser.java:721)
at de.mpc.pia.intermediate.compiler.parser.MzTabParser.lambda$parsePSMScores$9(MzTabParser.java:696)
at java.base/java.util.TreeMap.forEach(TreeMap.java:1002)
at de.mpc.pia.intermediate.compiler.parser.MzTabParser.parsePSMScores(MzTabParser.java:695)
at de.mpc.pia.intermediate.compiler.parser.MzTabParser.parsePSM(MzTabParser.java:556)
at de.mpc.pia.intermediate.compiler.parser.MzTabParser.parsePSMs(MzTabParser.java:521)
at de.mpc.pia.intermediate.compiler.parser.MzTabParser.parseFile(MzTabParser.java:240)
at de.mpc.pia.intermediate.compiler.parser.MzTabParser.getDataFromMzTabFile(MzTabParser.java:186)
at de.mpc.pia.intermediate.compiler.parser.InputFileParserFactory$InputFileTypes$3.parseFile(InputFileParserFactory.java:119)
at de.mpc.pia.intermediate.compiler.parser.InputFileParserFactory.getDataFromFile(InputFileParserFactory.java:450)
at de.mpc.pia.intermediate.compiler.PIACompiler.getDataFromFile(PIACompiler.java:257)
at de.mpc.pia.intermediate.compiler.PIACompiler.parseCommandLineInfile(PIACompiler.java:1347)
at de.mpc.pia.intermediate.compiler.PIACompiler.parseCommandLineInfiles(PIACompiler.java:1302)
at de.mpc.pia.intermediate.compiler.PIACompiler.main(PIACompiler.java:1256)
I've used the following command to run PIA version 1.3.10:
java -cp pia-1.3.10/pia-1.3.10.jar de.mpc.pia.intermediate.compiler.PIACompiler -infile b10000_ZNF230.mztab -name pia_test -outfile b10000_ZNF230.xml
I've attached the mzTab file for reference (renamed to .txt to appease GitHub): b10000_ZNF230.txt
How can I process this file using PIA? Thanks.
Hello,
I tried the -proteinExport function for processing the sample data and found problems in the result. I used the command line options -infile yeast-gold-015-filtered.pia.xml -paramFile parameter.xml -proteinExport yeast-gold-015-filtered.csv csv. The parameter file is the sample provided at https://github.com/mpc-bioinformatics/pia/wiki/parameters-XML-file, with the score names changed:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<tool docurl="http://www.medizinisches-proteom-center.de" name="pipeline" version="0.1.23">
  <description>This file will contains a pipeline execution for PIA</description>
  <PARAMETERS>
    <NODE description="Sets whether PSM sets should be built to combine search results from different search engines / runs." name="PSMCreatePSMSets">
      <ITEM value="yes" type="string" name="create sets"/>
    </NODE>
    <NODE description="Adds the given score name to the list of preferred scores for FDR calculation." name="PSMAddPreferredFDRScore">
      <ITEM value="PSM q-value" type="string" name="score name"/>
    </NODE>
    <NODE description="Adds the given score name to the list of preferred scores for FDR calculation." name="PSMAddPreferredFDRScore">
      <ITEM value="X!Tandem Expect" type="string" name="score name"/>
    </NODE>
    <NODE description="Sets the number of top identifications per spectrum used for all further FDR calculations, 0 meaning all identifications are used." name="PSMSetAllTopidentificationsForFDR">
      <ITEM value="1" type="string" name="number of top identifications"/>
    </NODE>
    <NODE description="Sets the regular expression used for decoy detection or if 'searchengine' is given as pattern, assumes a decoy search directly performed by the search engine." name="PSMSetAllDecoyPattern">
      <ITEM value="s.*" type="string" name="decoy pattern"/>
    </NODE>
    <NODE description="Calculates the FDR scores for all files." name="PSMCalculateAllFDR"/>
    <NODE description="Calculates the combined FDR score. The FDR scores for the single files should be calculated before." name="PSMCalculateCombinedFDRScore"/>
    <NODE description="Sets whether modifications should be considered while inferring the peptides from the PSMs. Defaults to false" name="PeptideConsiderModifications">
      <ITEM value="no" type="string" name="consider modifications"/>
    </NODE>
    <NODE description="Adds a filter used by the protein inference. A filter is added by its name, an abbreviation for the comparison, the compared value and (optional), whether the comparison should be negated, e.g. &quot;AddInferenceFilter=charge_filter,EQ,2,no&quot;" name="ProteinAddInferenceFilter">
      <ITEM value="psm_score_filter_psm_combined_fdr_score" type="string" name="filtername"/>
      <ITEM value="LEQ" type="string" name="comparison"/>
      <ITEM value="0.01" type="string" name="value"/>
      <ITEM value="no" type="string" name="negate"/>
    </NODE>
    <NODE description="Inferes the proteins with the given inference method. Any inference filters should be set before this call with calls of AddInferenceFilter. The scoring method is set with the second argument. The scoring settings can be given by a third argument containing setting=value[;setting=value]* (usual settings are used_score and used_spectra)." name="ProteinInfereProteins">
      <ITEM value="inference_spectrum_extractor" type="string" name="inference"/>
      <ITEM value="scoring_multiplicative" type="string" name="scoring"/>
      <ITEM value="combined_fdr_score" type="string" name="used score"/>
      <ITEM value="best" type="string" name="used spectra"/>
    </NODE>
  </PARAMETERS>
</tool>
In the generated result file, the score column is "NaN".
`accessions | score | #peptides | #PSMs | #spectra
P36071 | NaN | 1 | 1 | 1
P25294 | NaN | 3 | 12 | 12
P48415 | NaN | 1 | 1 | 1
P38249 | NaN | 6 | 12 | 12`
But there were score values in the sample result file yeast-gold-015-filtered-proteins.csv, and the three columns "isDecoy", "FDR" and "q-value" were missing from my result. I'm not sure at which step the process failed.
Thank you for your help.
Kai Cheng
Does PIA work also with DIA data?
Is there a way to integrate the OpenSwath pipeline with PIA?
We need to have a way of computing the FDR for classes of PSMs, peptides and proteins. These classes can be:
In principle, we should be able to get the list of peptides at 1% FDR and remove all the peptides with 2 miscleavages at 0.001% (Misscleavage FDR).
This is really relevant to perform studies like this: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4974352/
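Splitting peptides into such classes needs, among other things, a missed-cleavage count per peptide. A sketch, assuming standard trypsin specificity (cleavage after K/R, but not before P); the class name is hypothetical:

```java
public class MissedCleavages {

    /**
     * Counts internal K/R residues not followed by P (trypsin rule).
     * The last residue is excluded: it is the cleavage site itself.
     */
    public static int count(String peptide) {
        int n = 0;
        for (int i = 0; i < peptide.length() - 1; i++) {
            char c = peptide.charAt(i);
            if ((c == 'K' || c == 'R') && peptide.charAt(i + 1) != 'P') {
                n++;
            }
        }
        return n;
    }
}
```

With this, class-wise FDRs could be computed by running the usual target-decoy estimate separately on each missed-cleavage stratum.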
Currently, the only way to apply statistics on top of a standard file format is with PIACompiler, and the result needs to be written to a PIA file to be consumed by the modeller, rather than being passed as a data structure. This needs to be changed so that the user can run PIAModeller directly on top of a standard file format.