
msquality's People

Contributors

jorainer, jwokaty, tnaake

msquality's Issues

Implementation of metrics

Based on HUPO-PSI (https://github.com/HUPO-PSI/mzQC/blob/master/cv/qc-cv.obo)

Based on Spectra/MSnExp

  • QC:4000053, single value, RT duration, The retention time duration of the MS run in seconds, similar to the highest scan time minus the lowest scan time;
  • QC:4000054, n-tuple, RT over TIC quantile, The interval when the respective quantile of the TIC accumulates divided by retention time duration. The number of quantiles observed is given by the size of the tuple;
  • QC:4000055, n-tuple, MS1 quantiles RT fraction, The interval used for acquisition of the first, second, third, and fourth quarter of all MS1 events divided by RT-Duration;
  • QC:4000056, n-tuple, MS2 quantiles RT fraction, The interval used for acquisition of the first, second, third, and fourth quarter of all MS2 events divided by RT-Duration;
  • QC:4000057, n-tuple, MS1 quantile TIC change ratio to Q1, The log ratio for the second to n-th quantile of TIC changes over first quantile of TIC changes;
  • QC:4000059, single value, Number of MS1 spectra, The number of MS1 events in the run;
  • QC:4000060, single value, Number of MS2 spectra, The number of MS2 events in the run;
  • QC:4000065, single value, Precursor median m/z for IDs, Median m/z value for all identified peptides (unique ions) after FDR (note: the implementation does not take identified peptides into account);
  • QC:4000072, single value, Interquartile RT period for peptide identifications, The interquartile retention time period, in seconds, for all peptide identifications over the complete run;
  • QC:4000073, single value, Peptide identification rate of the interquartile RT period, The identification rate of peptides for the interquartile retention time period, in peptides per second;
  • QC:4000077, single value, Area under TIC, The area under the total ion chromatogram;
  • QC:4000078, n-tuple, Area under TIC RT quantiles, The area under the total ion chromatogram of the retention time quantiles. Number of quantiles are given by the n-tuple;
  • QC:4000125, single value, Extent of identified precursor intensity, Ratio of 95th over 5th percentile of precursor intensity for identified peptides;
  • QC:4000130, single value, Median of TIC values in the RT range in which the middle half of peptides are identified, Median of TIC values in the RT range in which half of peptides are identified (RT values of Q1 to Q3 of identifications);
  • QC:4000132, single value, Median of TIC values in the shortest RT range in which half of the peptides are identified;
  • QC:4000138, n-tuple, MZ acquisition range, Upper and lower limit of m/z values at which spectra are recorded;
  • QC:4000139, n-tuple, RT acquisition range, Upper and lower limit of time at which spectra are recorded;
  • QC:4000144, n-tuple, Precursor intensity range, Minimum and maximum precursor intensity recorded;
  • QC:4000167, n-tuple, Precursor intensity distribution Q1, Q2, Q3, From the distribution of precursor intensities, the quartiles Q1, Q2, Q3;
  • QC:4000168, single value, Precursor intensity distribution mean, From the distribution of precursor intensities, the mean;
  • QC:4000169, single value, Precursor intensity distribution sigma, From the distribution of precursor intensities, the sigma value;
  • QC:4000172, single value, MS1 signal jump (10x) count, The count of MS1 signal jump (spectra sum) by a factor of ten or more (10x) between two subsequent scans;
  • QC:4000173, single value, MS1 signal fall (10x) count, The count of MS1 signal decline (spectra sum) by a factor of ten or more (10x) between two subsequent scans;
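
Three of the definitions above can be sketched in a few lines: RT duration (QC:4000053), area under TIC (QC:4000077), and the MS1 signal jump/fall counts (QC:4000172, QC:4000173). This is a minimal illustration in plain Python, not the package's implementation. The function names are hypothetical, the trapezoidal rule is an assumed integration scheme, and skipping scans with zero intensity in the ratio test is an assumption as well.

```python
def rt_duration(rtimes):
    """RT duration: highest scan time minus lowest scan time (seconds)."""
    return max(rtimes) - min(rtimes)


def area_under_tic(rtimes, tic):
    """Area under the TIC, integrated with the trapezoidal rule (assumption)."""
    points = sorted(zip(rtimes, tic))  # order scans by retention time
    return sum(
        (t1 - t0) * (y0 + y1) / 2
        for (t0, y0), (t1, y1) in zip(points, points[1:])
    )


def signal_change_counts(tic, factor=10.0):
    """Count jumps and falls by `factor` or more between subsequent scans.

    `tic` is assumed to be ordered by retention time. Pairs in which the
    reference intensity is zero are skipped (assumption).
    """
    jumps = falls = 0
    for prev, cur in zip(tic, tic[1:]):
        if prev > 0 and cur >= factor * prev:
            jumps += 1
        elif cur > 0 and prev >= factor * cur:
            falls += 1
    return jumps, falls


# Made-up example values: five MS1 scans, one second apart.
rt = [0.0, 1.0, 2.0, 3.0, 4.0]
tic = [100.0, 1500.0, 1400.0, 120.0, 130.0]

print(rt_duration(rt))            # RT duration in seconds
print(area_under_tic(rt, tic))    # area under the TIC
print(signal_change_counts(tic))  # (number of jumps, number of falls)
```

The n-tuple metrics (for example QC:4000078) would apply the same integration per retention time quantile.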

Based on Spectra only

  • QC:4000174, single value, Charged peptides ratio 1+ over 2+, Ratio of 1+ peptide count over 2+ peptide count in identified spectra;
  • QC:4000175, single value, Charged peptides ratio 3+ over 2+, Ratio of 3+ peptide count over 2+ peptide count in identified spectra;
  • QC:4000176, single value, Charged peptides ratio 4+ over 2+, Ratio of 4+ peptide count over 2+ peptide count in identified spectra;
  • QC:4000177, single value, Mean charge in identified spectra;
  • QC:4000178, single value, Median charge in identified spectra;
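
The charge-based metrics reduce to counting charge states. The sketch below assumes that the input is simply a list of integer precursor charges, one per spectrum considered. The function name `charge_ratio` is hypothetical, and returning NaN when no 2+ spectra are present is an assumed convention, not something the definitions specify.

```python
from collections import Counter
from statistics import mean, median


def charge_ratio(charges, numerator, denominator=2):
    """Ratio of the count of `numerator`+ over the count of `denominator`+."""
    counts = Counter(charges)
    if counts[denominator] == 0:
        return float("nan")  # ratio undefined without 2+ spectra (assumption)
    return counts[numerator] / counts[denominator]


# Made-up precursor charges for eight spectra.
charges = [1, 2, 2, 2, 3, 3, 4, 2]

print(charge_ratio(charges, 1))  # QC:4000174, 1+ over 2+
print(charge_ratio(charges, 3))  # QC:4000175, 3+ over 2+
print(charge_ratio(charges, 4))  # QC:4000176, 4+ over 2+
print(mean(charges))             # QC:4000177, mean charge
print(median(charges))           # QC:4000178, median charge
```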

Based on MSnExp only

  • QC:4000140, single value, Fastest frequency for MS level 1 collection;
  • QC:4000142, single value, Slowest frequency for MS level 1 collection;

Based on Chromatogram (requires peak detection and/or alignment; which parameters?)

  • QC:4000050, single value, XIC-WideFrac, The fraction of precursor ions accounting for the top half of all peak widths;
  • QC:4000051, n-tuple, XIC-FWHM quantiles, The first to n-th quantile of peak widths for the wide XICs;
  • QC:4000052, n-tuple, XIC-Height quantiles ratio to Q1, The log ratio for the second to n-th quantile of wide XIC heights over previous quantile of heights. For the boundary elements min/max are used;
  • QC:4000074, single value, Median MS1 peak FWHM for peptides, Median of all MS1 peak widths at half maximum (FWHM) for all identified peptides, in seconds;
  • QC:4000075, single value, Interquartile distance of MS1 peak FWHM for identifications, Interquartile distance of all MS1 peak widths at half maximum (FWHM) for all identifications, in seconds;
  • QC:4000133, single value, Explained base peak intensity median, Median of the ratio of 'max survey scan intensity' over 'sampled precursor intensity' for all peptides identified;
  • QC:4000135, single value, Number of chromatograms;
  • QC:4000218, n-tuple, Signal-to-noise ratio in MS1 - Q1, Q2, Q3, From the distribution of signal-to-noise ratio in MS1, the quartiles Q1, Q2, Q3 value;
  • QC:4000219, single value, Signal-to-noise ratio in MS1 - mean, From the distribution of signal-to-noise ratio in MS1, the mean;
  • QC:4000220, single value, Signal-to-noise ratio in MS1 - sigma, From the distribution of signal-to-noise ratio in MS1, the sigma value;
  • QC:4000262, single value, Retention time mean shift, Based on reference retention times of detected features the mean shift of all features is calculated in seconds;

Unknown:

  • QC:4000131, single value, Median S/N for MS1 spectra in the shortest RT range in which half of the peptides are identified;
  • QC:4000148, single value, MS1 ion collection time mean, From the distribution of ion injection times (MS:1000927) for MS1, the mean;
  • QC:4000149, single value, MS1 ion collection time sigma, From the distribution of ion injection times (MS:1000927) for MS1, the sigma value;
  • QC:4000158, single value, Peak density distribution MS1 mean, From the distribution of peak densities in MS1, the mean;
  • QC:4000159, single value, Peak density distribution MS1 sigma, From the distribution of peak densities in MS1, the sigma value;
  • QC:4000263, single value, Pump pressure mean, The mean pump pressure in bar for the whole run;

Requires identification information:

  • QC:4000184, single value, Number of different distinct proteins from all PSM, Number of different distinct protein from all PSM after FDR filtering. (No undistinguishability groups.);
  • QC:4000185, n-tuple, Number of identified proteins, Number of identified proteins at given FDR threshold, first number is the number of proteins (considering sequence only), second number is the FDR threshold applied (negative if no threshold applied);
  • QC:4000186, single value, Total number of PSM, Total number of PSM before FDR filtering;
  • QC:4000187, n-tuple, Number of identified peptides, Number of identified peptides at given FDR threshold, first number is the number of peptides (considering sequence only), second number is the FDR threshold applied (negative if no threshold applied);
  • QC:4000191, single value, Precursor errors (Da) mean, From the distribution of Precursor errors (mass deviation of precursor to identified peptide in Da), the mean;
  • QC:4000192, single value, Precursor errors (Da) sigma, From the distribution of Precursor errors (mass deviation of precursor to identified peptide in Da), the sigma value;
  • QC:4000196, single value, Precursor errors (ppm) mean, From the distribution of Precursor errors (ppm), the mean;
  • QC:4000197, single value, Precursor errors (ppm) sigma, From the distribution of Precursor errors (ppm), the sigma value;
  • QC:4000201, single value, Precursor errors (ppm) median, From the distribution of Precursor errors (ppm), the median;
  • QC:4000202, single value, Precursor errors (ppm) IQR, From the distribution of Precursor errors (ppm), the IQR;
  • QC:4000203, n-tuple, Identification score - Q1, Q2, Q3, From the distribution of Identification score, the Q1, Q2, Q3 value;
  • QC:4000204, single value, Identification score - mean, From the distribution of Identification score, the mean value;
  • QC:4000205, single value, Identification score - sigma, From the distribution of Identification score, the sigma value;
  • QC:4000213, n-tuple, Identified peptide lengths - Q1, Q2, Q3, From the distribution of identified peptide lengths the quartiles Q1, Q2, Q3 value;
  • QC:4000214, single value, Identified peptide lengths - mean, From the distribution of identified peptide lengths the mean;
  • QC:4000215, single value, Identified peptide lengths - sigma, From the distribution of identified peptide lengths the sigma value;
  • QC:4000228, n-tuple, Identified precursor intensity distribution Q1, Q2, Q3, From the distribution of identified precursor intensities, the quartiles Q1, Q2, Q3;
  • QC:4000229, single value, Identified precursor intensity distribution - mean, From the distribution of identified precursor intensities, the mean;
  • QC:4000230, single value, Identified precursor intensity distribution - sigma, From the distribution of identified precursor intensities, the sigma value;
  • QC:4000233, n-tuple, Unidentified precursor intensity distribution - Q1, Q2, Q3, From the distribution of unidentified precursor intensities, the quartiles Q1, Q2, Q3;
  • QC:4000234, single value, Unidentified precursor intensity distribution - mean, From the distribution of unidentified precursor intensities, the mean;
  • QC:4000235, single value, Unidentified precursor intensity distribution - sigma, From the distribution of unidentified precursor intensities, the sigma value;
  • QC:4000245, single value, Number of different undistinguishable proteins groups from all PSM, Number of different undistinguishable proteins groups from all PSM after FDR filtering. (Only undistinguishability groups.);
  • QC:4000257, single value, Detected Compounds, Number of detected compounds from a given library of target compounds in a specific run;
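
To make the precursor error metrics (QC:4000191 to QC:4000201) concrete, the sketch below computes the mass deviation in Da and in ppm from paired observed and theoretical precursor m/z values. The pairing would come from identification results, which are assumed to be available here. The function names are hypothetical, and using the sample standard deviation for the "sigma value" is an assumption.

```python
from statistics import mean, median, stdev


def precursor_errors_da(observed, theoretical):
    """Mass deviation of each precursor to its identified peptide, in Da."""
    return [obs - theo for obs, theo in zip(observed, theoretical)]


def precursor_errors_ppm(observed, theoretical):
    """The same deviation expressed in parts per million of the theoretical value."""
    return [(obs - theo) / theo * 1e6 for obs, theo in zip(observed, theoretical)]


# Made-up m/z values for three identified precursors.
observed = [500.001, 600.0006, 700.0]
theoretical = [500.0, 600.0, 700.0]

da = precursor_errors_da(observed, theoretical)
ppm = precursor_errors_ppm(observed, theoretical)

print(mean(da), stdev(da))    # QC:4000191, QC:4000192
print(mean(ppm), stdev(ppm))  # QC:4000196, QC:4000197
print(median(ppm))            # QC:4000201
```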

Review the following metrics/functions

Hi @jorainer

I have compiled a list of the functions that we should double-check before releasing the package. Some of these are probably straightforward (e.g. `rtDuration`), while others are more complicated because of their cryptic descriptions. Where the mzQC definition is unclear, we should probably extend the documentation so that users at least know how we interpreted those metrics.

metrics functions:

  • rtDuration
  • rtOverTICquantile
  • .rt_order_spectra
  • rtOverMSQuarters
  • ticQuantileToQuantileLogRatio
  • numberSpectra
  • medianPrecursorMZ
  • medianPrecursorMZ
  • rtIQRrate
  • areaUnderTIC
  • areaUnderTICRTquantiles
  • extentIdentifiedPrecursorIntensity
  • medianTICRTIQR
  • medianTICofRTRange
  • mzAcquisitionRange
  • rtAcquisitionRange
  • precursorIntensityRange
  • precursorIntensityQuartiles
  • precursorIntensityMean
  • precursorIntensitySD
  • msSignal10XChange
  • ratioCharge1over2
  • ratioCharge3over2
  • ratioCharge4over2
  • meanCharge
  • medianCharge

wrapper functions:

  • calculateMetricsFromSpectra
  • calculateMetricsFromMsExperiment
  • calculateMetrics

Instruction needed

Hi!

In the manuscript you write:

Thus, MsQuality supports a large variety of data input formats (ranging from mzML, mzXML, CDF, MGF, MSP to some raw vendor file formats, such as Bruker TimsTOF and Thermo raw files)

Can you provide some instruction on how to load the .raw files and .d folders into an object? I couldn't find any info about this.

Greets

Witek

List all available QC metrics in one place?

I was wondering whether it would be easier for the user if you listed all the QC metrics that you calculate in the vignette, maybe in a form like:

  • QC metric name (being a link to the definition at PSI:QC?): description. on which objects can this be calculated.

Create test data set

I would suggest moving all the vignette code that creates the RPLC-based test dataset into inst/script/ and saving the resulting object as an RData file in the /data directory. It could then simply be loaded with data() in the vignette. That way the vignette can focus on the QC metrics and how they are calculated.

Note that a documentation file for that data object will then also be needed, but most of the textual description from the current vignette (and the reference) can be reused for it.

Information on interpretability of the metrics

Add an interpretation of the metrics to the vignette. Currently, the interpretation is still lacking for the following metrics:

  • chromatographyDuration
  • ticQuartersRtFraction
  • rtOverMsQuarters
  • ticQuartileToQuartileLogRatio
  • numberSpectra
  • mzAcquisitionRange
  • rtAcquisitionRange
  • numberEmptyScans
  • medianPrecursorMz
  • areaUnderTic
  • areaUnderTicRtQuantiles
  • medianTicRtIqr
  • medianTicOfRtRange
  • meanCharge
  • medianCharge

Should these interpretations be added to the mzQC obo file? @jorainer

Let's keep it rather basic (as this was also previously done for other metrics in the mzQC obo file). I have added some suggestions for each metric that we can discuss separately.

Change attributes for MS levels

For some of the metrics, only certain MS levels are allowed. Only return the MS term as an attribute if it matches the specified MS level:

  • ticQuartileToQuartileLogRatio
  • medianPrecursorMz
