medbioinf / pia

20.0 8.0 9.0 50.7 MB

PIA - Protein Inference Algorithms

Home Page: https://github.com/medbioinf/pia

License: Other

Java 99.95% Dockerfile 0.05%

proteomics protein-inference protein spectrum-identification search-engine inference

pia's People

Contributors

Stargazers

Watchers

Forkers

bioinformaticsarchive bigbio julianu kevinphu sdf1444 bajramibishi kheshav mpc-bioinformatics

pia's Issues

JSON or YAML config file.

We should implement a way to provide parameters in a JSON or YAML file and move away from the XML config file.

mztab issues for the PRIDE XML example

@julianu:
Some comments about the mztab example:

In the metadata modifications are annotated like:

MTD fixed_mod[1] [MS, MS:1002453, No fixed modifications searched, ]
MTD variable_mod[1] [MS, MS:1002454, No variable modifications searched, ]

However, the PSMs contain modifications.

The theoretical mass is not present for any PSM.
We add some comments in the document saying that this is annotated like a merge.
We need to check how you define uniqueness at the PSM level.
Is it possible to generate statistics at the peptide level? For example, if we say 1% at the peptide level, is that possible?
Some of the redundant annotations should be removed, for example:

MTD software[9]-setting[10] base score for FDR calculation for file 8 = mascot_score

In the PSM section, only Mascot is listed as a search engine in the search_engine column. However, PIA is also providing a search engine score but is not annotated like that:

search_engine search_engine_score[1] search_engine_score[2]

how to export PIA analysis results ?

Hello,

I am continuing my self-training with PIA in KNIME.
Thanks for your help, it helped me a lot.

Now, I have a simple workflow from mzid files, to PIA compiler and PIA analysis.

I would like to export the result of the grouping algorithm to something like TSV, XLS or other table format.

I've tried to connect the 3 output link to XLS Writer or XLS sheet appender :
This gives me 3 tables, but I don't find Protein Accessions, protein groups and protein clusters.

The last output is clearly marked as unimplemented. I've tried but this gave me no results (mzTab, csv, mzIdentML).

So I'm wondering how to get a table... Is there some other node to use?
Perhaps using the embedded view?
Is it possible with other tools, with the CLI or the web interface?

Cheers
Olivier

problem trying to apply a PSM filter on X!Tandem Expect value

Sorry, it's me again ;-)

I'm having some difficulties using the PIA Analysis module and I have some questions about how to use it.

I've set a filter at the PSM level.
I've chosen "X!Tandem expect < 0.01"
saved the resulting tables in XLS sheet adapter (after a column rename).
Then I did the same with :
"X!Tandem expect < 0.05"
I get the same protein list and the same peptide list after an "Occam's razor" inference.
That's strange to me: I was expecting a different protein/group list.
The PSM results are different, as I was expecting: there are fewer PSMs in the list with the 0.01 filter than with 0.05.
So the filter is taken into account...
Do you have an idea of what I'm missing?

Cheers,
Olivier

Testing files are missing

@julianu most of the classes do not have proper test classes.

Make a Knime node that creates a PIA Parameter File

Creating custom parameter files for command line use is difficult. A KNIME node that writes out a parameter file for command line use would make it easier to transfer an analysis from KNIME to the command line.

Edit: I am not sure how to tag this as an Enhancement Issue?

PIA error in KNIME

Hi, I am trying to use PIA in KNIME for the analysis of LC/MS data, but when PIA starts, the node with ID suffix 570 (PIA Compiler) crashes and I receive the following error log:
ERROR 01-PIA_first_analysis 0 Unable to load node with ID suffix 570 into workflow, skipping it: javax/xml/bind/JAXBException
ERROR LoadWorkflowRunnable Errors during load: Status: Error: 01-PIA_first_analysis 0 loaded with errors
ERROR LoadWorkflowRunnable Status: Error: 01-PIA_first_analysis 0
ERROR LoadWorkflowRunnable Status: Error: Unable to load node with ID suffix 570 into workflow, skipping it: javax/xml/bind/JAXBException

How should I solve this problem? All necessary items are installed. PC with Win10 x64.
Thank you!

Alternative sequence and modification encoding

Implement an alternative export for sequences and modifications together in one string.

the very simple lowercase-encoding for any modification (GQLTDLGAVNVmTGIyTGR)
or a better explaining one like GQLTDLGAVNVM(Oxidation)TGIY(Phosphorylation)TGR
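
Both encodings can be sketched in a few lines. The example below is only an illustration with hypothetical names (`ModifiedSequenceEncoder`, a map from 1-based residue position to modification name); it does not reflect how PIA stores modifications internally.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of the two proposed sequence + modification encodings. */
class ModifiedSequenceEncoder {

    /** Lowercases every residue that carries a modification (positions are 1-based). */
    static String encodeLowercase(String sequence, Map<Integer, String> modsByPosition) {
        StringBuilder sb = new StringBuilder(sequence.length());
        for (int i = 0; i < sequence.length(); i++) {
            char residue = sequence.charAt(i);
            sb.append(modsByPosition.containsKey(i + 1) ? Character.toLowerCase(residue) : residue);
        }
        return sb.toString();
    }

    /** Appends the modification name in parentheses after each modified residue. */
    static String encodeWithNames(String sequence, Map<Integer, String> modsByPosition) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < sequence.length(); i++) {
            sb.append(sequence.charAt(i));
            String modification = modsByPosition.get(i + 1);
            if (modification != null) {
                sb.append('(').append(modification).append(')');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<Integer, String> mods = new TreeMap<>();
        mods.put(12, "Oxidation");
        mods.put(16, "Phosphorylation");
        System.out.println(encodeLowercase("GQLTDLGAVNVMTGIYTGR", mods));
        System.out.println(encodeWithNames("GQLTDLGAVNVMTGIYTGR", mods));
    }
}
```

The lowercase form loses the modification type, so it only round-trips when each residue has at most one plausible modification. The parenthesised form keeps the name at the cost of a longer string. Terminal modifications (position 0 or length + 1) are not handled in this sketch.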

Null Pointer Exception in Knime PIA analysis using mzIdentml from Comet search results

Hi, my name is Trent and I have been using PIA for the past few weeks in an attempt to generate some protein inference results from MS2 searches using Mascot and Comet.
I have mostly been using the Knime workspace for protein inference but I would like to move to the command line version in the future with a custom param file as I would like to apply PIA to thousands of searches.

Anyway, I was wondering if you could provide any details on maybe why I am getting a Null Pointer Exception in Knime PIA Analysis and perhaps how I can access the more detailed error messages. I have attached the mzidentml from a comet search that generates the error to this issue called 130707-comet-edited.mzid (I am attaching a reduced version of this file as it is too large to attach here). I have also attached a Mascot mzidentml which is able to generate good and expected protein inference results called 109691.mzid.

I guess my question is why am I getting a Null Pointer Exception in Knime when I use the 130707-comet-edited.mzid (as it seems formatted properly - it was converted from a pepxml to a mzidentml using ProteoWizard). Also, through Knime I am not able to see the full Java Stack Trace error message, it simply says:
ERROR PIA Analysis 6:11 Execute failed: ("NullPointerException"): null
So, I am really not able to see the exact issue going on. Am I able to see the full Java Stack Trace error in Knime?

Finally, I was wondering if it is possible to get a PIA Parameter file from the Knime nodes? This would make it substantially easier to run PIA via command line as it seems daunting to create a custom parameter file from scratch. Debugging from the command line would also be easier than debugging through Knime as I would be able to see the full stdout and stderr.

I hope to hear back, thank you for your time.

Trent

Comet PIA Options:

Mascot PIA Options

comet_mzid.zip
mascot_mzid.zip

Latest version of PIA fails on idXML format export file

Hi!

I am experiencing a problem with the PIA node: it fails when the export format idXML is selected.

With the other options it works fine. It is still useful when working on protein inference only, but this hampers its connection with downstream nodes that require the idXML format.

Thanks in advance!

Error in the mzTab parser when reading the identification protocols affects the generation of the compiler file

When merging to mzTab, an error occurs while reading the identification protocols.

Filtering not working

Hi @julianu the filtering is not working in this pipeline https://github.com/bigbio/nf-workflows/tree/master/xt-msgf-nf .

Here are the mzid files from XTandem and mgf plus, together with the PIA config.
output-mzids.zip

The exported mzTab contains 50% TP and 50% FP, so it is not filtering.

Regards
Yasset

Data processing with MSI dataset

Hi Julian,

I am wondering if PIA can handle imaging MS data such as imzML. This format is used by new Bruker MALDI-TOF instruments, and there seems to be no way to convert an imzML file into another format that can be used with PIA.

Reference: https://ms-imaging.org/wp/

Thanks,
Xiwei

The support to Tide is not working properly.

Tide support is not working properly when generating the compiler file with this example.
test.mgf.tide.txt

Make a better choice of how the PSM sets are created

It is set to use only m/z and RT by default right now.

If there is only one file, more fields (especially sourceID, if given) could be used without problems.
If multiple files should be compared, make the choice possible in KNIME (the command line has this anyway), and also show recommendations for this in the comparison node.
Maybe add something to the command line which allows an educated guess on what to use.

Review the dependency of inspector-ms-graph

Hi @julianu

I think we should discuss the dependency on inspector-mz-graph in PIA. It introduces a lot of redundant dependencies that are not needed. This part of PIA can be removed and moved into a desktop component. In addition, it adds a lot of Swing classes to a backend algorithm.

What do you think?

Allow filtering in idXML export

The idXML exporter is still missing any filtering functionality.
Include the filtering options on the PSM and protein levels.

Export not working combined PSM + Proteins in the mzTab

The following command:

pia inference -infile ${pia_xml} -paramFile ${pia_config} -proteinExport -psmExport ${pia_xml}.mztab mzTab

exports only the last option given before ${pia_xml}.mztab, in this case the PSMs. If I swap -psmExport and -proteinExport, only the proteins are exported.

Regards
Yasset

Export to mztab with Protein + PSMs is not working

The export to mztab using Proteins and PSMs is not working. Only one of them at a time.

Problem when running CMD

Hello,

I encountered this problem when running PIA from the command line:

2021-03-22 23:13:47,855 ERROR PIAIntermediateJAXBHandler - Error while parsing PIA XML file
com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected character 'f' (code 102) in prolog; expected '<'
at [row,col {unknown-source}]: [1,1]
at com.ctc.wstx.sr.StreamScanner.throwUnexpectedChar(StreamScanner.java:653)
at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2133)
at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1181)
at com.ctc.wstx.sr.BasicStreamReader.nextTag(BasicStreamReader.java:1204)
at de.mpc.pia.intermediate.xmlhandler.PIAIntermediateJAXBHandler.parseXMLFile(PIAIntermediateJAXBHandler.java:188)
at de.mpc.pia.intermediate.xmlhandler.PIAIntermediateJAXBHandler.parse(PIAIntermediateJAXBHandler.java:162)
at de.mpc.pia.modeller.PIAModeller.parseIntermediate(PIAModeller.java:296)
at de.mpc.pia.modeller.PIAModeller.loadFileName(PIAModeller.java:175)
at de.mpc.pia.modeller.PIAModeller.<init>(PIAModeller.java:119)
at de.mpc.pia.modeller.PIAModeller.processExecuteXMLFile(PIAModeller.java:799)
at de.mpc.pia.modeller.PIAModeller.parseParameterXMLFile(PIAModeller.java:786)
at de.mpc.pia.modeller.PIAModeller.main(PIAModeller.java:733)

I just generated the parameter file by the "-paramOutFile" command.
Thanks for your help.

Code refactoring for HashCode generation

The current code uses HashCodeBuilder to create the hashCode and equals implementations. This is inefficient because an object needs to be created every time a comparison is performed.

@julianu
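
One possible direction for the refactoring is a hand-written `equals`/`hashCode` pair that combines the fields directly, so that no builder instance is allocated per call. The sketch below uses a hypothetical `PeptideKey` class with made-up fields (`sequence`, `charge`, `fileId`); it is not taken from the PIA code base and only illustrates the pattern.

```java
import java.util.Objects;

/** Hypothetical value class, used only to illustrate builder-free equals/hashCode. */
final class PeptideKey {
    private final String sequence;
    private final int charge;
    private final long fileId;

    PeptideKey(String sequence, int charge, long fileId) {
        this.sequence = Objects.requireNonNull(sequence);
        this.charge = charge;
        this.fileId = fileId;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof PeptideKey)) {
            return false;
        }
        PeptideKey other = (PeptideKey) obj;
        // compare the cheap primitive fields first, the string last
        return charge == other.charge
                && fileId == other.fileId
                && sequence.equals(other.sequence);
    }

    @Override
    public int hashCode() {
        // classic 31-multiplier combination, no helper objects allocated
        int result = sequence.hashCode();
        result = 31 * result + charge;
        result = 31 * result + Long.hashCode(fileId);
        return result;
    }
}
```

Whether this is measurably faster than the builder in PIA's hot paths would need profiling; the point is only that the comparison itself no longer allocates.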

PIA Analysis error : No PIA XML file given

Hello,

First of all, I'm a newbie with PIA and KNIME. Perhaps the problem I have is in fact entirely due to my poor understanding of either PIA or KNIME.

So, I would like to use the Protein Inference Algorithms software to benchmark it.
My identification engine is X!Tandem VENGEANCE (2015.12.15).
The output files are in mzIdentML produced by X!Tandem, so I've made a workflow with :
"List Files" node (mzIdentML)
"PIA Compiler" node
"PIA Analysis" node
The first 2 steps are OK: green bullets on the right.
The last step fails: red cross in the middle, with the message:
ERROR PIA Analysis 0:4 Execute failed: No PIA XML file given! Provide either by datatable (e.g. from PIA Compiler or List Files) or port (Input File)

I've exported the PIA intermediate file with the "Binary Objects to Files" to inspect its content : it is indeed containing PIA xml data.
<ns3:jPiaXML date="2016-09-06T08:50:22.748+02:00" name="compilation" xmlns:ns2="http://psidev.info/psi/pi/mzIdentML/1.1" xmlns:ns3="http://www.medizinisches-proteom-center.de/PIA/piaintermediate">
...

So I don't understand the error message or what I should do to make it work.
Do you have any insights that could help me analyze my data?

Thanks a lot
Olivier

Fix error when the input mzTab does not contain proteins.

When the input files of the compiler are mzTab files without proteins, PIA fails with a null pointer exception in the mzTab reader.

Can we do a new release.

@julianu can we do a new release including the new filters for the multi-search engine.

Phosphorylation localisation score, fdr.

We need to include a phosphorylation localisation score and FDR in PIA.
So far I see two popular localisation scores, A-score and PhosphoRS. I will provide some code.

PSM ms_run[] get repeated in mztab export.

For some reason the PSM ms_run[] references get repeated in the mzTab export. Here is an example:

PSM FGIAAK 1 P21796 0 databaseName null [MS, MS:1002387, PIA, 1.3.10]|[PSI-MS, MS:1001476, X!Tandem, X! Tandem Alanine (2017.2.1.4)]|[PSI-MS, MS:1002048, MS-GF+, Release (v2017.07.21)] 0.003638683087973093 0.0075 20.0 0.004483837330552659 115.0 1.7413855E-8 0.34254366 null 1729.1622 2 303.68479405403644 303.683456328125 ms_run[1]:index=1433|ms_run[1]:index=1433|ms_run[1]:index=1433|ms_run[2]:index=1433|ms_run[2]:index=1433|ms_run[2]:index=1433 R Y 219 224 0 0 1
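
Until the exporter is fixed, the repeated references could be collapsed in a post-processing step. The following is a minimal sketch, assuming the repeats in the `spectra_ref` column are truly redundant and that entries are separated by `|` as in the row above. The class and method names are hypothetical and not part of PIA.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper that collapses repeated entries of an mzTab spectra_ref value. */
class SpectraRefDeduplicator {

    /** Removes duplicate references while keeping the order of first occurrence. */
    static String deduplicate(String spectraRef) {
        Set<String> unique = new LinkedHashSet<>();
        for (String ref : spectraRef.split("\\|")) {
            String trimmed = ref.trim();
            if (!trimmed.isEmpty()) {
                unique.add(trimmed);
            }
        }
        return String.join("|", unique);
    }

    public static void main(String[] args) {
        String repeated = "ms_run[1]:index=1433|ms_run[1]:index=1433|ms_run[1]:index=1433"
                + "|ms_run[2]:index=1433|ms_run[2]:index=1433|ms_run[2]:index=1433";
        System.out.println(deduplicate(repeated));
    }
}
```

Applied to the value from the example row, this keeps one reference per run.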

Naming of exported file

For batch processing it would be great, if the name of the exported file would be related to the file name of the searched spectrum.

XML export error

I get the following error when PIA tries to write its XML output:

[05-Dec-2019 14:00:56 - INFO] "Writing PIA XML file to /home/wout/Downloads/b10000_ZNF230.xml" (de.mpc.pia.intermediate.compiler.PIACompiler:918)
[05-Dec-2019 14:00:56 - INFO] "Stream open, writing PIA XML" (de.mpc.pia.intermediate.compiler.PIACompiler:942)
[05-Dec-2019 14:00:56 - ERROR] "JAXBException while writing XML file" (de.mpc.pia.intermediate.compiler.PIACompiler:989)
javax.xml.bind.JAXBException
- with linked exception:
[java.lang.ClassNotFoundException: com.sun.xml.internal.bind.v2.ContextFactory]
at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:241)
at javax.xml.bind.ContextFinder.find(ContextFinder.java:455)
at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:652)
at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:599)
at de.mpc.pia.intermediate.compiler.PIACompiler.createMarshallerForPiaXML(PIACompiler.java:1007)
at de.mpc.pia.intermediate.compiler.PIACompiler.marshalToFormattedFragmentMarshaller(PIACompiler.java:1030)
at de.mpc.pia.intermediate.compiler.PIACompiler.writeOutJaxbFilesList(PIACompiler.java:1080)
at de.mpc.pia.intermediate.compiler.PIACompiler.writeOutXML(PIACompiler.java:961)
at de.mpc.pia.intermediate.compiler.PIACompiler.writeOutXML(PIACompiler.java:919)
at de.mpc.pia.intermediate.compiler.PIACompiler.writeOutXML(PIACompiler.java:932)
at de.mpc.pia.intermediate.compiler.PIACompiler.main(PIACompiler.java:1273)
Caused by: java.lang.ClassNotFoundException: com.sun.xml.internal.bind.v2.ContextFactory
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:602)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
at javax.xml.bind.ContextFinder.safeLoadClass(ContextFinder.java:573)
at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:239)
... 10 more
[05-Dec-2019 14:00:56 - INFO] "Writing of PIA XML file finished." (de.mpc.pia.intermediate.compiler.PIACompiler:996)

Could this be due to an outdated version of JAXB that's not compatible with my Java version? For example, see jakartaee/jaxb-api#78 and https://stackoverflow.com/a/43574427.

I'm using Java version 13.0.1, while the pom file seems to indicate a PRIDE version 1.0.22 of JAXB is used. Is this even the standard JAXB version number? If so, it seems a pretty old one and an update to JAXB version 2.4 seems useful to support current Java versions.

Include FDR q value in command line output

Hi all, it's me again...

I was wondering if it is possible to include "FDR q values" in the -proteinExport output from command line. I suppose I can calculate FDR myself by just simply counting decoys based on a score threshold. However, I just wanted to point this out. The output from command line I see is as follows:

"accessions","score","#peptides","#PSMs","#spectra"
"USP9X_HUMAN|Q93008","374.75325142812","155","450","438"
"USP9Y_HUMAN|O00507","141.30843342618374","66","138","138"
"Q6P468_HUMAN|Q6P468","139.16527025922363","59","170","164"
"HSP7C_HUMAN|P11142,V9HW22_HUMAN|V9HW22","110.67590820388949","47","111","111"
"GRP78_HUMAN|P11021,V9HWB4_HUMAN|V9HWB4","99.65650591097625","41","90","90"
"A0A0G2JIW1_HUMAN|A0A0G2JIW1,A8K5I0_HUMAN|A8K5I0,HS71A_HUMAN|P0DMV8,HS71B_HUMAN|P0DMV9","95.12141332736277","41","103","103"
"B7Z4V2_HUMAN|B7Z4V2,GRP75_HUMAN|P38646,V9HW84_HUMAN|V9HW84","87.67906153604937","40","71","71"
"A0A087WZG9_HUMAN|A0A087WZG9,B4DSP0_HUMAN|B4DSP0,PEG10_HUMAN|Q86TG7-2","78.44328812851143","31","358","187"
"A0A087WUL4_HUMAN|A0A087WUL4,A0A087WX23_HUMAN|A0A087WX23,A0A087WXK2_HUMAN|A0A087WXK2,PEG10_HUMAN|Q86TG7","77.50461134847883","31","358","187"

Within Knime this information is output as follows:

Proteins | Score | Coverages | nrPeptides | nrPSM | nrSpectra | ClusterID | Description | Decoy | FDR q value
[USP9X_HUMAN\|Q93008] | 433.7986176 | [?] | 152 | 363 | 426 | 11547 | [Probable ubiquitin carboxyl-terminal hydrolase FAF-X OS=Homo sapiens   GN=USP9X PE=1 SV=3] | FALSE | 0
[USP9Y_HUMAN\|O00507] | 175.9273045 | [?] | 66 | 128 | 136 | 11547 | [Probable ubiquitin carboxyl-terminal hydrolase FAF-Y OS=Homo sapiens   GN=USP9Y PE=2 SV=2] | FALSE | 0
[Q6P468_HUMAN\|Q6P468] | 160.3044466 | [?] | 56 | 137 | 159 | 11547 | [USP9X protein (Fragment) OS=Homo sapiens GN=USP9X PE=2 SV=1] | FALSE | 0
[GRP78_HUMAN\|P11021, V9HWB4_HUMAN\|V9HWB4] | 122.2464199 | [?, ?] | 42 | 80 | 86 | 68 | [78 kDa glucose-regulated protein OS=Homo sapiens GN=HSPA5 PE=1 SV=2,   Epididymis secretory sperm binding protein Li 89n OS=Homo sapiens   GN=HEL-S-89n PE=2 SV=1] | FALSE | 0
[HSP7C_HUMAN\|P11142, V9HW22_HUMAN\|V9HW22] | 121.8969185 | [?, ?] | 44 | 97 | 105 | 68 | [Heat shock cognate 71 kDa protein OS=Homo sapiens GN=HSPA8 PE=1 SV=1,   Epididymis luminal protein 33 OS=Homo sapiens GN=HEL-S-72p PE=2 SV=1] | FALSE | 0
[A0A0G2JIW1_HUMAN\|A0A0G2JIW1,   A8K5I0_HUMAN\|A8K5I0, HS71A_HUMAN\|P0DMV8, HS71B_HUMAN\|P0DMV9] | 110.6020159 | [?, ?, ?, ?] | 39 | 86 | 98 | 68 | [Heat shock 70 kDa protein 1B OS=Homo sapiens GN=HSPA1B PE=1 SV=1,   Epididymis secretory protein Li 103 OS=Homo sapiens GN=HSPA1A PE=2 SV=1, Heat   shock 70 kDa protein 1A OS=Homo sapiens GN=HSPA1A PE=1 SV=1, Heat shock 70   kDa protein 1B OS=Homo sapiens GN=HSPA1B PE=1 SV=1] | FALSE | 0
[B7Z4V2_HUMAN\|B7Z4V2, GRP75_HUMAN\|P38646,   V9HW84_HUMAN\|V9HW84] | 109.0415013 | [?, ?, ?] | 38 | 62 | 67 | 68 | [cDNA FLJ51907, highly similar to Stress-70 protein, mitochondrial   OS=Homo sapiens PE=2 SV=1, Stress-70 protein, mitochondrial OS=Homo sapiens   GN=HSPA9 PE=1 SV=2, Epididymis secretory sperm binding protein Li 124m   OS=Homo sapiens GN=HEL-S-124m PE=2 SV=1] | FALSE | 0
[B4DNT8_HUMAN\|B4DNT8] | 100.6255528 | [?] | 36 | 77 | 88 | 68 | [cDNA FLJ54370, highly similar to Heat shock 70 kDa protein 1 OS=Homo   sapiens PE=2 SV=1] | FALSE | 0

Where the ending column is the FDR q value.
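
For reference, the "counting decoys" approach mentioned above can be sketched as a generic target-decoy calculation. This is not PIA's implementation: it assumes higher scores are better, estimates the FDR at each rank as decoys divided by targets, and ignores tied scores. The class name and the `Entry` type are made up for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Generic target-decoy q-value sketch; not PIA's own FDR code. */
class DecoyQValues {

    static final class Entry {
        final String accession;
        final double score;
        final boolean decoy;
        double fdr;
        double qValue;

        Entry(String accession, double score, boolean decoy) {
            this.accession = accession;
            this.score = score;
            this.decoy = decoy;
        }
    }

    /** Returns the entries sorted by descending score with fdr and qValue filled in. */
    static List<Entry> compute(List<Entry> input) {
        List<Entry> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparingDouble((Entry e) -> e.score).reversed());

        int targets = 0;
        int decoys = 0;
        for (Entry e : sorted) {
            if (e.decoy) {
                decoys++;
            } else {
                targets++;
            }
            // FDR estimate at this rank: decoys / targets, capped at 1
            e.fdr = targets == 0 ? 1.0 : Math.min(1.0, (double) decoys / targets);
        }

        // q-value: the smallest FDR at this rank or any rank below it
        double minFdr = 1.0;
        for (int i = sorted.size() - 1; i >= 0; i--) {
            minFdr = Math.min(minFdr, sorted.get(i).fdr);
            sorted.get(i).qValue = minFdr;
        }
        return sorted;
    }
}
```

The decoy flag has to come from somewhere, for example a decoy pattern matched against the accessions. The CSV shown above does not contain it, so it would need to be derived before such a calculation can be applied.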

Also, How would it be possible to get coverage results? I believe I have seen a workflow that uses OpenMS PeptideIndexer to do this?

Thanks,

Trent

Filters for PSM-Peptides using Search Engine provisioning

Some pipelines need to filter PSMs taking into account the search engine provisioning (which search engines have identified the corresponding spectrum). We see two cases here:

Filter peptides that are not identified by all search engines. PR https://github.com/mpc-bioinformatics/pia/pull/123
Filter peptide identifications if the corresponding spectrum was identified with different sequences by each search engine. For example, spectrum A is identified by search engine A with sequence A and by search engine B with sequence B. We should be able to remove that case with a filter.
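
The two cases can be sketched as simple predicates over a map from search engine name to identified sequence, one map per spectrum. This is only an illustration of the intended logic with hypothetical names; it is not the filter API from the linked PR.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the two search-engine-based filter cases. */
class SearchEngineConsensusFilter {

    /** Case 1: true if every expected search engine identified the spectrum. */
    static boolean identifiedByAll(Map<String, String> sequenceByEngine, Set<String> allEngines) {
        return sequenceByEngine.keySet().containsAll(allEngines);
    }

    /** Case 2: true if the search engines reported more than one distinct sequence. */
    static boolean hasConflictingSequences(Map<String, String> sequenceByEngine) {
        return new HashSet<>(sequenceByEngine.values()).size() > 1;
    }

    /** Keeps the spectrum IDs that pass both filters. */
    static List<String> keepConsistentSpectra(Map<String, Map<String, String>> idsBySpectrum,
                                              Set<String> allEngines) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : idsBySpectrum.entrySet()) {
            Map<String, String> sequenceByEngine = entry.getValue();
            if (identifiedByAll(sequenceByEngine, allEngines)
                    && !hasConflictingSequences(sequenceByEngine)) {
                kept.add(entry.getKey());
            }
        }
        return kept;
    }
}
```

In practice the two filters would probably be selectable independently, since some pipelines only want to drop conflicting identifications without requiring all engines to agree.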

Parsing the PepXML files

We need to be able to parse pepXML files, which would allow us to support original results from TPP and other search engines.

Tests take a long time to run

@julianu testPIACompilerNativeFiles takes a long time to run.

Make accession parsing optional

Many programs parse accessions in FASTAs just like:
"everything before the first blank is the accession" (e.g. OpenMS does this)

To make PIA more compatible, make this an optional accession parsing, which can be set and overrides all other parsing options.
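
The rule itself is small enough to sketch. The helper below is hypothetical (not PIA code) and assumes that "blank" means any whitespace character and that an optional leading `>` of the FASTA header line should be dropped.

```java
/** Hypothetical helper implementing "everything before the first blank is the accession". */
class SimpleAccessionParser {

    static String parseAccession(String fastaHeader) {
        String header = fastaHeader.startsWith(">") ? fastaHeader.substring(1) : fastaHeader;
        for (int i = 0; i < header.length(); i++) {
            if (Character.isWhitespace(header.charAt(i))) {
                return header.substring(0, i);
            }
        }
        // no blank at all: the whole header is the accession
        return header;
    }

    public static void main(String[] args) {
        System.out.println(parseAccession(">sp|P11142|HSP7C_HUMAN Heat shock cognate 71 kDa protein"));
    }
}
```

Note that with this rule a UniProt-style header yields the full `sp|P11142|HSP7C_HUMAN` token rather than just `P11142`, which is exactly why it needs to override the other parsing options when enabled.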

metadata in PRIDE XML and mztab

@julianu here is how we decided to export metadata from mzIdentML -> mzTab:

https://github.com/PRIDE-Utilities/ms-data-core-api/blob/master/src/main/java/uk/ac/ebi/pride/utilities/data/exporters/MzIdentMLMzTabConverter.java

and also from PRIDE XML to mztab:
- https://github.com/PRIDE-Utilities/ms-data-core-api/blob/master/src/main/java/uk/ac/ebi/pride/utilities/data/exporters/PRIDEMzTabConverter.java

We should check how the current version of PIA converts PRIDE XML and mzTab back to mzIdentML, especially the metadata. Some of the information can be redundant, such as the software entries.

Can you have a look?

test issue

CSV format for PSM results as input file

Hi,

Does PIA accept PSM result input in simple TSV / CSV format? If yes, what does the format look like?

Thanks,

mzTab: add an (optional) export of the protein sequences

Yasset requested to add an export of the protein sequences, if available.

It should be in an optional column with the CV MS:1001344 (AA sequence).

Specify score column in mzTab file

I have two score columns in mzTab files generated by ANN-SoLo:

The shifted dot product score, accession MTD psm_search_engine_score[1] [MS, MS:1001143, search engine specific score for PSMs,]
The (subgroup) FDR, accession MTD psm_search_engine_score[2] [MS, MS:1002354, PSM-level q-value,]

PIA doesn't seem to know how to handle a search engine specific score, which makes sense because it can be anything. However, I haven't been able to figure out how to tell PIA to ignore the psm_search_engine_score[1] column in the mzTab file and use the psm_search_engine_score[2] column instead. As a result, when I run PIA on these mzTab files I get the following error:

Exception in thread "main" java.lang.IllegalArgumentException: Type must not be null or of type UNKNOWN_SCORE: [MS, MS:1001143, search engine specific score for PSMs, ]
at de.mpc.pia.intermediate.compiler.parser.MzTabParser.getBasicScoreModelForParam(MzTabParser.java:752)
at de.mpc.pia.intermediate.compiler.parser.MzTabParser.parsePSMScore(MzTabParser.java:721)
at de.mpc.pia.intermediate.compiler.parser.MzTabParser.lambda$parsePSMScores$9(MzTabParser.java:696)
at java.base/java.util.TreeMap.forEach(TreeMap.java:1002)
at de.mpc.pia.intermediate.compiler.parser.MzTabParser.parsePSMScores(MzTabParser.java:695)
at de.mpc.pia.intermediate.compiler.parser.MzTabParser.parsePSM(MzTabParser.java:556)
at de.mpc.pia.intermediate.compiler.parser.MzTabParser.parsePSMs(MzTabParser.java:521)
at de.mpc.pia.intermediate.compiler.parser.MzTabParser.parseFile(MzTabParser.java:240)
at de.mpc.pia.intermediate.compiler.parser.MzTabParser.getDataFromMzTabFile(MzTabParser.java:186)
at de.mpc.pia.intermediate.compiler.parser.InputFileParserFactory$InputFileTypes$3.parseFile(InputFileParserFactory.java:119)
at de.mpc.pia.intermediate.compiler.parser.InputFileParserFactory.getDataFromFile(InputFileParserFactory.java:450)
at de.mpc.pia.intermediate.compiler.PIACompiler.getDataFromFile(PIACompiler.java:257)
at de.mpc.pia.intermediate.compiler.PIACompiler.parseCommandLineInfile(PIACompiler.java:1347)
at de.mpc.pia.intermediate.compiler.PIACompiler.parseCommandLineInfiles(PIACompiler.java:1302)
at de.mpc.pia.intermediate.compiler.PIACompiler.main(PIACompiler.java:1256)

I've used the following command to run PIA version 1.3.10:

java -cp pia-1.3.10/pia-1.3.10.jar de.mpc.pia.intermediate.compiler.PIACompiler -infile b10000_ZNF230.mztab -name pia_test -outfile b10000_ZNF230.xml

I've attached the mzTab file for reference (renamed to .txt to appease GitHub): b10000_ZNF230.txt

How can I process this file using PIA? Thanks.

Question about the -proteinExport function

Hello,
I tried the -proteinExport function for processing the sample data and found problems in the result. I used the command line -infile yeast-gold-015-filtered.pia.xml -paramFile parameter.xml -proteinExport yeast-gold-015-filtered.csv csv, the parameter file is the sample provided at https://github.com/mpc-bioinformatics/pia/wiki/parameters-XML-file and I changed the score names.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<tool docurl="http://www.medizinisches-proteom-center.de" name="pipeline" version="0.1.23">
  <description>This file will contains a pipeline execution for PIA</description>
  <PARAMETERS>
    <NODE description="Sets whether PSM sets should be built to combine search results from different search engines / runs." name="PSMCreatePSMSets">
      <ITEM value="yes" type="string" name="create sets"/>
    </NODE>
    <NODE description="Adds the given score name to the list of preferred scores for FDR calculation." name="PSMAddPreferredFDRScore">
      <ITEM value="PSM q-value" type="string" name="score name"/>
    </NODE>
    <NODE description="Adds the given score name to the list of preferred scores for FDR calculation." name="PSMAddPreferredFDRScore">
      <ITEM value="X!Tandem Expect" type="string" name="score name"/>
    </NODE>
    <NODE description="Sets the number of top identifications per spectrum used for all further FDR calculations, 0 meaning all identifications are used." name="PSMSetAllTopidentificationsForFDR">
      <ITEM value="1" type="string" name="number of top identifications"/>
    </NODE>
    <NODE description="Sets the regular expression used for decoy detection or if 'searchengine' is given as pattern, assumes a decoy search directly performed by the search engine." name="PSMSetAllDecoyPattern">
      <ITEM value="s.*" type="string" name="decoy pattern"/>
    </NODE>
    <NODE description="Calculates the FDR scores for all files." name="PSMCalculateAllFDR"/>
    <NODE description="Calculates the combined FDR score. The FDR scores for the single files should be calculated before." name="PSMCalculateCombinedFDRScore"/>
    <NODE description="Sets whether modifications should be considered while inferring the peptides from the PSMs. Defaults to false" name="PeptideConsiderModifications">
      <ITEM value="no" type="string" name="consider modifications"/>
    </NODE>
    <NODE description="Adds a filter used by the protein inference. A filter is added by its name, an abbreviation for the comparison, the compared value and (optional), whether the comparison should be negated e.g. &quot;AddInferenceFilter=charge_filter,EQ,2,no&quot;" name="ProteinAddInferenceFilter">
      <ITEM value="psm_score_filter_psm_combined_fdr_score" type="string" name="filtername"/>
      <ITEM value="LEQ" type="string" name="comparison"/>
      <ITEM value="0.01" type="string" name="value"/>
      <ITEM value="no" type="string" name="negate"/>
    </NODE>
    <NODE description="Inferes the proteins with the given inference method. Any inference filters should be set before this call with calls of AddInferenceFilter. The scoring method is set with the second argument. The scoring settings can be given by athird argument containing setting=value[;setting=value]* (usual settings are used_score and used_spectra)." name="ProteinInfereProteins">
      <ITEM value="inference_spectrum_extractor" type="string" name="inference"/>
      <ITEM value="scoring_multiplicative" type="string" name="scoring"/>
      <ITEM value="combined_fdr_score" type="string" name="used score"/>
      <ITEM value="best" type="string" name="used spectra"/>
    </NODE>
  </PARAMETERS>
</tool>

In the generated result file, the score column is "NaN".
accessions | score | #peptides | #PSMs | #spectra
P36071 | NaN | 1 | 1 | 1
P25294 | NaN | 3 | 12 | 12
P48415 | NaN | 1 | 1 | 1
P38249 | NaN | 6 | 12 | 12

But there are score values in the sample result file yeast-gold-015-filtered-proteins.csv, and the three columns "isDecoy", "FDR" and "q-value" are missing from my result. I'm not sure at which step of the process I went wrong.

Thank you for your help.

Kai Cheng

DIA

Does PIA work also with DIA data?

Is there a way to integrate the OpenSwath pipeline with PIA?

Class FDR, q-values

We need a way of computing the FDR for classes of PSMs, peptides and proteins. These classes can be:

General modified: FDR for modified peptides.
Specific modification FDR: FDR for phosphorylated peptides.
Missed cleavages (0, 1, 2)
CvParam classification: we can assign to each PSM (using a cvParam) the tissue, cell type or disease where it has been found, and we can compute the FDR for that specific characteristic (apart from the global FDR).

In principle, we should be able to get the list of peptides at 1% FDR and remove all the peptides with 2 missed cleavages at 0.001% (missed cleavage FDR).
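
As a rough illustration of the idea, a class-specific FDR can be estimated by counting targets and decoys separately within each class. The sketch below assumes a simple decoys/targets estimate per class and uses a hypothetical `Psm` type whose `psmClass` label could be, for example, the number of missed cleavages. It is not an existing PIA feature.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: one target-decoy FDR estimate per PSM class. */
class ClassSpecificFdr {

    static final class Psm {
        final String psmClass; // e.g. "0", "1" or "2" missed cleavages
        final boolean decoy;

        Psm(String psmClass, boolean decoy) {
            this.psmClass = psmClass;
            this.decoy = decoy;
        }
    }

    /** Returns class label -> decoys / targets (capped at 1) for the given PSMs. */
    static Map<String, Double> fdrByClass(List<Psm> psms) {
        // counts[0] = targets, counts[1] = decoys
        Map<String, int[]> counts = new TreeMap<>();
        for (Psm psm : psms) {
            int[] c = counts.computeIfAbsent(psm.psmClass, k -> new int[2]);
            c[psm.decoy ? 1 : 0]++;
        }

        Map<String, Double> fdr = new TreeMap<>();
        for (Map.Entry<String, int[]> entry : counts.entrySet()) {
            int targets = entry.getValue()[0];
            int decoys = entry.getValue()[1];
            fdr.put(entry.getKey(), targets == 0 ? 1.0 : Math.min(1.0, (double) decoys / targets));
        }
        return fdr;
    }
}
```

A real implementation would apply this to score-ranked lists per class, so that a separate threshold can be chosen for each class instead of a single ratio over all its PSMs.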

This is really relevant to perform studies like this: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4974352/

The PIAModeller, which contains all the nice functions, should consume a file (or PIACompiler) instead of a PIA compilation file

Currently, the only way to apply statistics on top of a standard file format is via PIACompiler, whose result has to be written to a PIA file to be consumed by the modeller rather than passed as a data structure. This needs to change so that users can run PIAModeller directly on top of a standard file format.
