hupo-psi / usi



Universal Spectrum Identifier for Mass Spectrometry

License: Apache License 2.0

usi's People


tivdnbos oscar-gr ralfg

usi's Issues

At least one USI example in the USI publication is not working

At least the proteomicsdb.org example in the Methods section of https://doi.org/10.1038/s41592-021-01184-6 does not work (in both the PDF and the online version). See the screenshot below.

"USI not found at any of the repositories!" appears when using Universal Spectrum Identifier in ProteomeXchange?

Dear Professor

Sorry for disturbing you. I'd like to ask you a question. 

The Universal Spectrum Identifier is a very professional tool. I recently used it to look up some PSMs, but clicking "look up USI" kept returning "USI not found at any of the repositories!" (Universal Spectrum Identifier // ProteomeXchange). I have checked the relevant information and found no mistake, and the other USIs from the same project that I tried do not work either.

I found that, for USIs that can be looked up successfully, the DATASET HISTORY column on the corresponding project's information page shows an "Updated project metadata" entry from 2014.

For example:
mzspec:PXD000966:CPTAC_CompRef_00_iTRAQ_05_2Feb12_Cougar_11-10-09.mzML:scan:12298:[iTRAQ4plex]-LHFFM[Oxidation]PGFAPLTSR/3

For projects whose USIs cannot be looked up, the DATASET HISTORY column shows no "Updated project metadata" entry from 2014.

For example: mzspec:PXD006512:CNHPP_HCC_LC_profiling_L006_P_F1:scan:64442:VADALTNAVAHVDDMPNALSALSDLHAHK/3

mzspec:PXD006201:20150804SL_Qe2_HEP2_UBISITE_rep1_A_15_HpH_6:scan:15223:TLSDYNIQK[UNIMOD:1290]/2

I wonder whether some PSMs cannot be displayed via USI because their projects have no "Updated project metadata" entry. Will the project metadata be updated for all projects in the future?

Can you help me with this problem? The relevant information is provided below.

Thanks for your attention and time.
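As a rough aid for checking USIs like the ones in this report, the sketch below splits a USI into the fields of the general `mzspec:<collection>:<msRun>:<indexType>:<index>:<interpretation>` layout. It is a minimal, hypothetical helper (`parse_usi` and `Usi` are illustrative names, not part of any USI library), it assumes the run name itself contains no colon, and it only checks syntax; whether a repository can actually serve the spectrum is a separate question.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Usi:
    collection: str         # e.g. a ProteomeXchange accession such as "PXD006201"
    ms_run: str             # run (file) name inside the collection
    index_type: str         # e.g. "scan"
    index: str              # kept as text, since not every index type is numeric
    peptide: Optional[str]  # peptidoform part of the interpretation, if present
    charge: Optional[int]   # precursor charge, if present


def parse_usi(usi: str) -> Usi:
    """Split 'mzspec:<collection>:<msRun>:<indexType>:<index>[:<interpretation>]' into fields.

    Hypothetical helper: assumes the run name contains no ':'.
    """
    # maxsplit=5 keeps colons inside the interpretation (e.g. "[UNIMOD:1290]") intact
    parts = usi.split(":", 5)
    if len(parts) < 5 or parts[0] != "mzspec":
        raise ValueError(f"not a USI: {usi!r}")
    peptide: Optional[str] = None
    charge: Optional[int] = None
    if len(parts) == 6:
        interpretation = parts[5]
        if "/" in interpretation:
            # the charge follows the last '/'
            peptide, z = interpretation.rsplit("/", 1)
            charge = int(z)
        else:
            peptide = interpretation
    return Usi(parts[1], parts[2], parts[3], parts[4], peptide, charge)


u = parse_usi("mzspec:PXD006201:20150804SL_Qe2_HEP2_UBISITE_rep1_A_15_HpH_6:scan:15223:TLSDYNIQK[UNIMOD:1290]/2")
print(u.collection, u.index_type, u.index, u.peptide, u.charge)
# PXD006201 scan 15223 TLSDYNIQK[UNIMOD:1290] 2
```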

Native scan identification and ion mobility spectra

ACTION: Everyone: Consider the following TripleTOF spectrum and see how to support it

mzspec:PXD013210:TTB20160722_ISBHJOMXX001879_r01:scan:19809:SITS[phospho]PTTLYDR/2
https://db.systemsbiology.net/sbeams/cgi/PeptideAtlas/ShowObservedSpectrum?usi=mzspec:PXD013210:TTB20160722_ISBHJOMXX001879_r01:scan:19809:SITS[phospho]PTTLYDR/2
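The PeptideAtlas link above passes the USI in the `usi` query parameter. A small sketch of building such a link with Python's standard library, percent-encoding the USI so characters such as `[`, `]` and `/` are transported safely. The helper name is hypothetical, and we assume (without having verified it) that the viewer decodes an encoded parameter exactly as it does the raw one.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Viewer endpoint taken from the PeptideAtlas link above
SHOW_SPECTRUM = "https://db.systemsbiology.net/sbeams/cgi/PeptideAtlas/ShowObservedSpectrum"


def peptideatlas_link(usi: str) -> str:
    """Build a ShowObservedSpectrum URL with the USI percent-encoded in the 'usi' parameter."""
    return f"{SHOW_SPECTRUM}?{urlencode({'usi': usi})}"


usi = "mzspec:PXD013210:TTB20160722_ISBHJOMXX001879_r01:scan:19809:SITS[phospho]PTTLYDR/2"
link = peptideatlas_link(usi)
# Decoding the query string gives back the original USI unchanged
assert parse_qs(urlparse(link).query)["usi"] == [usi]
print(link)
```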

Scan numbers should absolutely not be used to identify WIFF spectra. There's simply no reliable way to get back from a scan number to the <sample, period, cycle, experiment> tuple that is necessary to actually pinpoint a spectrum in a WIFF file. Limiting the id to a single number makes the "universal" modifier rather inaccurate. :) The same goes for Waters spectra, where function and scan are orthogonal and both are needed to pinpoint a spectrum in the .raw data.
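To make the ambiguity concrete, here is a toy model of one WIFF file. The numbers of samples, cycles and experiments are invented, and mapping a lone "scan number" onto the cycle is only one of several guesses a converter might make; the point is that any single number leaves the other dimensions of the tuple unresolved.

```python
from typing import NamedTuple


class WiffSpectrumId(NamedTuple):
    """The <sample, period, cycle, experiment> tuple that locates one WIFF spectrum."""
    sample: int
    period: int
    cycle: int
    experiment: int


# Toy file: two samples, one period, two cycles, three experiments per cycle
# (e.g. one survey scan and two dependent scans). All counts are invented.
SPECTRA = [
    WiffSpectrumId(sample, 1, cycle, experiment)
    for sample in (1, 2)
    for cycle in (122, 123)
    for experiment in (1, 2, 3)
]


def candidates_for_scan_number(scan: int) -> list[WiffSpectrumId]:
    """Spectra a reader could match if it interprets a lone 'scan number' as the cycle."""
    return [sid for sid in SPECTRA if sid.cycle == scan]


hits = candidates_for_scan_number(123)
print(len(hits))  # 6: two samples x three experiments, all with cycle 123
```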

Index is also unsuitable for maintaining a link back to the native spectrum, especially for multi-dimensional formats (WIFF and Waters .raw), because the enumeration order of the dimensions is not guaranteed, and there is no assurance that the indices used for any format are based on a completely unfiltered enumeration of the data. In other words, someone generating USIs from a DDA mzML that has been filtered to only MS1s will get different indices than someone looking them up in an unfiltered file. It's simply not worth the potential for confusion!
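A toy illustration of the filtering problem, using invented MS levels: the same MS1 spectrum gets index 4 in the unfiltered file but index 1 in an MS1-only copy, and index 1 in the unfiltered file points at an MS2 spectrum. The `index_map` helper is hypothetical and uses 0-based indices.

```python
from typing import Optional

# Toy DDA run: MS level of each spectrum in acquisition order (invented data).
MS_LEVELS = [1, 2, 2, 2, 1, 2, 2, 1, 2]


def index_map(levels: list[int], keep_level: Optional[int] = None) -> dict[int, int]:
    """Map acquisition position -> 0-based spectrum index in a (possibly filtered) file."""
    kept = [pos for pos, level in enumerate(levels) if keep_level is None or level == keep_level]
    return {pos: idx for idx, pos in enumerate(kept)}


unfiltered = index_map(MS_LEVELS)
ms1_only = index_map(MS_LEVELS, keep_level=1)

position = 4                               # the second MS1 spectrum acquired
index_in_usi = ms1_only[position]          # index emitted from the MS1-only file: 1
resolved_level = MS_LEVELS[index_in_usi]   # index 1 in the unfiltered file is an MS2 spectrum
print(unfiltered[position], index_in_usi, resolved_level)  # 4 1 2
```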

We already solved this problem a decade ago with mzML and nativeIDs. Since they can be a bit verbose in a USI which is already quite long, I suggest we use an abbreviated format. Instead of "controllerType=0 controllerNumber=1 scan=123" we can put "MS:1000768:0.1.123" which is the combination of the Thermo nativeID accession and the abbreviated nativeID. Likewise:

WIFF: MS:1000770:1.1.123.2
Waters: MS:1000769:1.0.123
Bruker BAF: MS:1000772:123
Bruker FID: MS:1000773:_x0031_00_x0020_fmol_x0020_BSA_x002f_0_B1_x002f_1_x002f_1SRef_x002f_fid (this is an encoded version of 100 fmol BSA/0_B1/1/1SRef/fid because IDREF is the datatype)
MGF: MS:1000774:123
mzXML: MS:1000776:123
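A sketch of how a reader might expand these abbreviated IDs back into full mzML nativeIDs. The key order for each accession is our reading of the corresponding PSI-MS CV nativeID format definitions and should be checked against the CV; `expand_native_id` is a hypothetical helper, and the dot-separated encoding is the proposal in this comment, not an adopted standard. Single-key formats are never split, so an encoded Bruker FID path passes through intact.

```python
# Key order of each nativeID format, as we read the PSI-MS CV definitions (verify against the CV).
NATIVE_ID_KEYS = {
    "MS:1000768": ("controllerType", "controllerNumber", "scan"),  # Thermo
    "MS:1000769": ("function", "process", "scan"),                 # Waters
    "MS:1000770": ("sample", "period", "cycle", "experiment"),     # WIFF
    "MS:1000772": ("scan",),                                       # Bruker BAF
    "MS:1000773": ("file",),                                       # Bruker FID
    "MS:1000774": ("index",),                                      # MGF (multiple peak list)
    "MS:1000776": ("scan",),                                       # scan number only (e.g. mzXML)
}


def expand_native_id(abbreviated: str) -> str:
    """Turn e.g. 'MS:1000768:0.1.123' into 'controllerType=0 controllerNumber=1 scan=123'."""
    prefix, number, values = abbreviated.split(":", 2)
    accession = f"{prefix}:{number}"
    if accession not in NATIVE_ID_KEYS:
        raise ValueError(f"unknown nativeID format {accession}")
    keys = NATIVE_ID_KEYS[accession]
    # Only multi-key formats are dot-separated; a single value is taken verbatim.
    parts = [values] if len(keys) == 1 else values.split(".")
    if len(parts) != len(keys):
        raise ValueError(f"expected {len(keys)} fields in {abbreviated!r}")
    return " ".join(f"{key}={value}" for key, value in zip(keys, parts))


print(expand_native_id("MS:1000770:1.1.123.2"))  # sample=1 period=1 cycle=123 experiment=2
```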

The WIFF nativeID also solves another problem described here: the sample index. A WIFF file can contain multiple samples, which are NOT necessarily named uniquely. For a WIFF file, the "run name" part of the USI should refer ONLY to the WIFF filename, not the sample name.

However, there is an unresolved discussion about nativeIDs in the soon-to-be-recommended 3-array representation for ion mobility spectra in mzML. That discussion should apply to USIs as well, probably even more urgently, because USIs may be paired with a spectrum interpretation. A single 3-array diaPASEF (or Agilent/Waters full IM frame) spectrum may correspond to multiple peptides. When the peptides are separated in the IM dimension, creating a combined spectrum actually merges evidence that could otherwise be kept separate and combined for each peptide individually (using a unique range of mobility scans).
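A sketch of the alternative to one combined spectrum: keep a mobility value for every peak (as in the 3-array representation) and select one peptide's peaks with a mobility window. The data and the `mobility_window` helper are invented for illustration and do not reflect any mzML reader's API.

```python
from dataclasses import dataclass


@dataclass
class ThreeArraySpectrum:
    """Toy 3-array spectrum: parallel m/z, intensity and ion mobility arrays."""
    mz: list[float]
    intensity: list[float]
    mobility: list[float]  # here: drift time in ms


def mobility_window(spec: ThreeArraySpectrum, low: float, high: float) -> ThreeArraySpectrum:
    """Keep only the peaks whose mobility lies in [low, high]."""
    keep = [i for i, im in enumerate(spec.mobility) if low <= im <= high]
    return ThreeArraySpectrum(
        mz=[spec.mz[i] for i in keep],
        intensity=[spec.intensity[i] for i in keep],
        mobility=[spec.mobility[i] for i in keep],
    )


# Invented full-frame spectrum: two peaks near 5 ms (peptide A), two near 10 ms (peptide B)
frame = ThreeArraySpectrum(
    mz=[401.2, 512.3, 633.4, 744.5],
    intensity=[100.0, 80.0, 60.0, 40.0],
    mobility=[5.0, 5.1, 10.0, 10.2],
)
peptide_a = mobility_window(frame, 4.5, 5.5)
print(peptide_a.mz)  # [401.2, 512.3]
```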

For example, let's say there is a Waters IM frame with 200 mobility scans (they all have the same retention time but cover a range of drift times). One peptide at drift time 5 ms is supported by scans 50-60, and another peptide at drift time 10 ms is supported by scans 120-130. If the combined spectrum were the entire frame of 200 scans (as @edeutsch suggested in email), then that evidence would all be combined in the same spectrum, and USIs to the spectrum would be ambiguous (kind of like a chimeric spectrum). When reading/converting the raw data there is, of course, no interpretation, so a reader/converter can't know that the spectra should be separated by drift time.

I was going to suggest that the raw spectra be given the full range of drift scans explicitly, like frame=123 scanStart=1 scanEnd=200, so that the interpreting software can make a USI with a subset of the start/end range to refer to a specific subset of mobility scans. But I feel that's too complex when accessing the full combined spectrum in mzML. I think it makes more sense to make sure the USIs for ion mobility identifications include the IM window so reader code can do its own filtering (similar to using the peptide sequence to infer the precursor and product m/zs).

The same logic would apply to diaPASEF, but not to ddaPASEF. The latter can easily be separated into combined spectra containing just the subset of the mobility range relevant to a specific precursor (e.g. frame=123 scanStart=456 scanEnd=567 for precursor 678.9). It's worth noting that ddaPASEF spectra are usually further merged (between frames) for searching purposes, and I think representing that is outside the scope of nativeIDs. So those spectra, if searched, could only be tracked back to the mzML or MGF file (a merged=123 spectrum).
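A sketch of the scan-range idea using the example's numbers: a 200-scan frame with one peptide in mobility scans 50-60 and another in 120-130. The `frame=... scanStart=... scanEnd=...` keys come from this discussion rather than a published nativeID format, the scans are assumed to be 1-based (as in `scanStart=1 scanEnd=200`), the intensities are invented, and both helpers are hypothetical.

```python
def parse_native_id(native_id: str) -> dict[str, int]:
    """Parse a space-separated key=value nativeID such as 'frame=123 scanStart=50 scanEnd=60'."""
    return {key: int(value) for key, value in (field.split("=", 1) for field in native_id.split())}


def merge_mobility_scans(frame: list[dict[float, float]], native_id: str) -> dict[float, float]:
    """Sum mobility scans scanStart..scanEnd (1-based, inclusive) of one frame into a single spectrum.

    `frame` is a toy list of mobility scans, each {m/z: intensity}; the 'frame' key of the
    nativeID is parsed but not used, because the caller passes the frame directly. Peaks are
    merged only when their m/z values are identical; real software would bin within a tolerance.
    """
    ids = parse_native_id(native_id)
    merged: dict[float, float] = {}
    for scan in frame[ids["scanStart"] - 1 : ids["scanEnd"]]:
        for mz, intensity in scan.items():
            merged[mz] = merged.get(mz, 0.0) + intensity
    return merged


# Invented 200-scan frame: peptide A in scans 50-60, peptide B in scans 120-130.
frame: list[dict[float, float]] = [{} for _ in range(200)]
for s in range(50, 61):
    frame[s - 1] = {500.25: 10.0}
for s in range(120, 131):
    frame[s - 1] = {700.5: 20.0}

print(merge_mobility_scans(frame, "frame=123 scanStart=50 scanEnd=60"))  # {500.25: 110.0}
print(merge_mobility_scans(frame, "frame=123 scanStart=1 scanEnd=200"))  # {500.25: 110.0, 700.5: 220.0}
```

Merging the whole frame mixes both peptides' peaks, which is the chimeric-spectrum ambiguity described above; the narrower ranges keep them apart.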
