nf-core / metaboigniter

Pre-processing of mass spectrometry-based metabolomics data with quantification and identification based on MS1 and MS2 data.

Home Page: https://nf-co.re/metaboigniter

License: MIT License

HTML 0.97% Python 29.01% Nextflow 66.23% Shell 3.78%
workflow metabolomics identification quantification mass-spectrometry nextflow pipeline nf-core ms1 ms2

metaboigniter's Introduction

nf-core/metaboigniter


Introduction

nf-core/metaboigniter is a bioinformatics pipeline that ingests raw mass spectrometry data in mzML format, typically in the form of peak lists and MS2 spectral data, for comprehensive metabolomics analysis. The key stages are centroiding, feature detection, adduct detection, alignment, and linking, which progressively refine and align the data. The pipeline can also perform requantification to compensate for missing values, and it uses MS2Query and SIRIUS for compound identification based on MS2 data, outputting a comprehensive list of detected and potentially identified metabolites.

nf-core/metaboigniter workflow

  1. Centroiding: Converts the continuous mass spectra into a series of discrete points.
  2. Feature Detection: Identifies unique signals or 'features' in the spectra.
  3. Adduct Detection: Identifies adduct ions, which are formed by the interaction of the sample with the ion source.
  4. Alignment: Ensures that the same features across different samples are matched together.
  5. Linking: Establishes connections between features across different ionization modes or adducts.
  6. Requantification: Fills in missing values in the data set for a more complete analysis.
  7. Identification: Uses MS2Query and SIRIUS to identify compounds based on their MS2 spectral data.
  8. Output Generation: Produces a comprehensive list of detected and potentially identified metabolites.
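
To make the first step concrete, here is a toy sketch of centroiding in Python. It is not the algorithm the pipeline runs; it only illustrates the idea of collapsing each profile peak into a single (m/z, intensity) point. The function name `centroid_profile`, the threshold argument, and the example spectrum are assumptions made for this illustration.

```python
def centroid_profile(mz, intensity, threshold=0.0):
    """Collapse each contiguous run of above-threshold points into one centroid.

    Returns a list of (centroid_mz, summed_intensity) tuples, where centroid_mz
    is the intensity-weighted mean m/z of the run.
    """
    centroids = []
    run_mz, run_int = [], []

    def flush():
        # Turn the current run into a single centroid and reset the buffers.
        total = sum(run_int)
        weighted = sum(m * i for m, i in zip(run_mz, run_int))
        centroids.append((weighted / total, total))
        run_mz.clear()
        run_int.clear()

    for m, i in zip(mz, intensity):
        if i > threshold:
            run_mz.append(m)
            run_int.append(i)
        elif run_int:
            flush()
    if run_int:  # a run that reaches the end of the spectrum
        flush()
    return centroids


# Hypothetical profile spectrum containing two peaks.
mz = [100.00, 100.01, 100.02, 100.03, 100.04, 200.00, 200.01, 200.02]
intensity = [0.0, 10.0, 20.0, 10.0, 0.0, 5.0, 5.0, 0.0]
peaks = centroid_profile(mz, intensity)
print(len(peaks))  # prints 2
```

Real peak pickers also handle noise, overlapping peaks, and peak shape, which this sketch deliberately ignores.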

nf-core/metaboigniter metro map

Usage

Note

If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

First, prepare a samplesheet with your input data that looks as follows:

samplesheet.csv:

sample,level,type,msfile
CONTROL_REP1,MS1,normal,mzML_POS_Quant/X2_Rep1.mzML
CONTROL_REP2,MS1,normal,mzML_POS_Quant/X2_Rep2.mzML
POOL_MS2,MS2,normal,mzML_POS_ID/POOL_MS2.mzML

Each row in this CSV file represents a unique sample, with the details provided in the columns.

  1. sample: This column should contain unique names for each sample. No two samples should share the same name in this column.
  2. level: This column should specify the level of mass spectrometry data contained in each sample file. This can be 'MS1' for files containing only MS1 data, 'MS2' for files containing only MS2 data, and 'MS12' for files containing both MS1 and MS2 data.
  3. type: This column can contain any descriptor of your choice, such as 'normal', 'disease', etc. This is usually used to provide some classification or group identification to your samples.
  4. msfile: This column should contain the path to the mzML file for each sample.
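
The column rules above can be checked before launching a run. The following Python sketch is only an illustration under stated assumptions: the pipeline performs its own input validation, and the function name `check_samplesheet` and the exact checks (unique sample names, level restricted to MS1, MS2, or MS12, msfile ending in .mzML) are this example's reading of the descriptions above.

```python
import csv
import io

ALLOWED_LEVELS = {"MS1", "MS2", "MS12"}
REQUIRED_COLUMNS = ["sample", "level", "type", "msfile"]


def check_samplesheet(text):
    """Return a list of human-readable problems found in samplesheet CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]

    problems = []
    seen = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        sample = row["sample"] or ""
        if sample in seen:
            problems.append(f"line {line_no}: duplicate sample '{sample}'")
        seen.add(sample)
        if row["level"] not in ALLOWED_LEVELS:
            problems.append(f"line {line_no}: unknown level '{row['level']}'")
        if not (row["msfile"] or "").lower().endswith(".mzml"):
            problems.append(f"line {line_no}: msfile is not an .mzML path")
    return problems


example = """sample,level,type,msfile
CONTROL_REP1,MS1,normal,mzML_POS_Quant/X2_Rep1.mzML
CONTROL_REP2,MS1,normal,mzML_POS_Quant/X2_Rep2.mzML
POOL_MS2,MS2,normal,mzML_POS_ID/POOL_MS2.mzML
"""
print(check_samplesheet(example))  # prints []
```

An empty list means the sheet passed these particular checks; it does not guarantee that the files exist or that the pipeline will accept them.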

Now, you can run the pipeline using:

nextflow run nf-core/metaboigniter \
   -profile <docker/singularity/.../institute> \
   --input samplesheet.csv \
   --outdir <OUTDIR>

Warning

Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
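
As a sketch of the -params-file route, the Python snippet below writes the two parameters from the command above into a JSON file. Only input and outdir come from this README; the file name params.json and the results directory are placeholders chosen for the example.

```python
import json
from pathlib import Path


def write_params(path, params):
    """Write pipeline parameters as JSON so they can be passed with -params-file."""
    path = Path(path)
    path.write_text(json.dumps(params, indent=2) + "\n")
    return path


# "results" is a placeholder output directory for this sketch.
params_path = write_params("params.json", {"input": "samplesheet.csv", "outdir": "results"})
print(params_path.read_text())
```

The file can then be supplied in place of the individual flags, for example: nextflow run nf-core/metaboigniter -profile docker -params-file params.json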

For more details and further functionality, please refer to the usage documentation and the parameter documentation.

Pipeline output

To see the results of an example test run with a full size dataset refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation.

Credits

nf-core/metaboigniter was originally written by Payam Emami. The DSL2 version was developed with significant contributions from Axel Walter and Efi Kontou.

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

For further information or help, don't hesitate to get in touch on the Slack #metaboigniter channel (you can join with this invite).

Citations

If you use nf-core/metaboigniter for your analysis, please cite it using the following doi: 10.5281/zenodo.4743790

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

metaboigniter's People

Contributors

axelwalter, egonw, ewels, kevinmenden, maxulysse, nf-core-bot, payamemami


metaboigniter's Issues

Process to create library using MSnbase fails on a few files

Description of the bug

In the identification subpipeline, I am trying to perform the identification using internal standards. I have a few library .mzML files and an associated library description file. The process process_create_library_pos_msnbase fails for some of the files but passes for others.

Steps to reproduce

Steps to reproduce the behaviour:

Sorry the data cannot be provided to reproduce the error :(

  1. Command line: nextflow run metaboigniter/main.nf -c metaboigniter/conf/custom.config -profile singularity

  2. System:

  • Hardware: HPC
  • Executor: slurm
  • OS: CentOS Linux
  • Version: 7
  3. Nextflow Installation:
  • Version: 20.10.0
  4. Container engine:
  • Engine: Singularity
  • Version: 3.7.4-1.el7
  • Image tag: nfcore/metaboigniter:dev

Errors

Before the latest dev version, the process process_create_library_pos_msnbase failed on some of my library files, but the error was not the same for all of these files. It was either this error (A):

Error in strsplit(x = hitTMP[, "parentmzs"], split = ";", fixed = T)[[1]] : 
  subscript out of bounds
Calls: createLibrary
Execution halted

or this error (B):

Error in parentMS2s[[p]] : subscript out of bounds
Calls: createLibrary
Execution halted

In the latest dev version (9c86f6f), with the modifications in the createLibrary.R file, the files that failed with error B (Error in parentMS2s[[p]] : subscript out of bounds) now pass this process, but the files that failed with error A (Error in strsplit(x = hitTMP[, "parentmzs"], split = ";", fixed = T)[[1]] : subscript out of bounds) still fail.

If you have any idea on this issue it would be of great help 💪
Thanks in advance

URGENT: pin nf-validation version

Description of the bug

To prevent breaking this pipeline in the near future, the nf-validation version should be pinned to version 1.1.3 like:

plugins {
    id 'nf-validation@1.1.3'
}
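
A pin of this kind can be checked mechanically. The Python sketch below scans config text for plugin declarations that lack an @version suffix. It is a rough illustration only: the regular expression and the function name `unpinned_plugins` are assumptions of this example, and it does not parse Nextflow configuration properly.

```python
import re

# Matches plugin declarations such as:  id 'nf-validation@1.1.3'
PLUGIN_RE = re.compile(r"""\bid\s+['"]([^'"@\s]+)(?:@([^'"\s]+))?['"]""")


def unpinned_plugins(config_text):
    """Return names of plugins declared without an explicit @version."""
    return [name for name, version in PLUGIN_RE.findall(config_text) if not version]


pinned = "plugins {\n    id 'nf-validation@1.1.3'\n}\n"
floating = "plugins {\n    id 'nf-validation'\n}\n"
print(unpinned_plugins(pinned))    # prints []
print(unpinned_plugins(floating))  # prints ['nf-validation']
```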

Command used and terminal output

No response

Relevant files

No response

System information

No response

Process to create library using MSnbase fails

Description of the bug

In the identification subpipeline, I am trying to perform the identification using internal standards. I have a few library .mzML files and an associated library description file. The process process_create_library_pos_msnbase fails with different errors depending on the values set for the following parameters in the conf/parameters.config file:

raw_file_name_preparelibrary_pos_msnbase
compund_id_preparelibrary_pos_msnbase
compound_name_preparelibrary_pos_msnbase
mz_col_preparelibrary_pos_msnbase

Steps to reproduce

Sorry the data cannot be given to reproduce the error :(

  1. Command line: nextflow run metaboigniter/main.nf -c metaboigniter/conf/custom.config -profile singularity

  2. Log file:
    log.txt (.nextflow.log renamed in log.txt)

  3. System:

  • Hardware: HPC
  • Executor: slurm
  • OS: CentOS Linux
  • Version: 7
  4. Nextflow Installation:
  • Version: 20.10.0
  5. Container engine:
  • Engine: Singularity
  • Version: 3.7.4-1.el7
  • Image tag: nfcore/metaboigniter:1.0.1

Errors

I found that when we set the following four parameters to non-default values:

raw_file_name_preparelibrary_pos_msnbase = 'RAW_FILE'
compund_id_preparelibrary_pos_msnbase = 'IARC_ID'
compound_name_preparelibrary_pos_msnbase = 'NAME'
mz_col_preparelibrary_pos_msnbase = 'MZ'

the process process_create_library_pos_msnbase fails with the error :

Loading required package: stringr
  Error in `[.data.frame`(libraryInfo, , requiredHeader["mzCol"]) : 
    undefined columns selected
  Calls: createLibrary -> IntervalMerge -> [ -> [.data.frame
  Execution halted

 
When we set the mz column parameter to its default 'mz' but the other three to non-default values:

raw_file_name_preparelibrary_pos_msnbase = 'RAW_FILE'
compund_id_preparelibrary_pos_msnbase = 'IARC_ID'
compound_name_preparelibrary_pos_msnbase = 'NAME'
mz_col_preparelibrary_pos_msnbase = 'mz'

the process process_create_library_pos_msnbase also fails but with a different error :

  Loading required package: stringr
  Error in data.frame(startRT = startRT, endRT = endRT, startMZ = startMZ,  : 
    arguments imply differing number of rows: 1, 0
  Calls: createLibrary -> IntervalMerge -> data.frame
  Execution halted

 
When we set all four parameters to their default values:

raw_file_name_preparelibrary_pos_msnbase = 'rawFile'
compound_id_preparelibrary_pos_msnbase = 'HMDB.YMDB.ID'
compound_name_preparelibrary_pos_msnbase = 'PRIMARY_NAME'
mz_col_preparelibrary_pos_msnbase = 'mz'

the process succeeds for a few tasks (for a few identification files), but fails for others, giving the following error:

Loading required package: stringr
Error in strsplit(x = hitTMP[, "parentmzs"], split = ";", fixed = T)[[1]] : 
  subscript out of bounds
Calls: createLibrary
Execution halted

 
In the bin folder, I dug into the R scripts involved in the process process_create_library_pos_msnbase (createLibrary.R and createLibraryFun.R) and found that the failure is related to the dataframe MSlibrary in the script createLibraryFun.R. For the failing tasks, the columns parentmzs, parentrts, parentInts and MS2s of MSlibrary are empty, so the line MSlibrary[MSlibrary[,"MS2s"]!="",] returns an empty dataframe, which in turn produces an empty hitTMP dataframe. For the tasks that succeed, these columns are not empty, so hitTMP is non-empty as well.
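
The failure mode described above (filter to rows with MS2 data, then index the first element of an empty result) can be reproduced in miniature. The Python sketch below is an analogue written for illustration, not the code in createLibraryFun.R; the function name `first_parent_mzs` and the dictionary rows are assumptions that loosely mirror the MSlibrary columns named in the report.

```python
def first_parent_mzs(library_rows):
    """Return the parent m/z values of the first library row that has MS2 data.

    Each row is a dict with "MS2s" and "parentmzs" keys holding
    semicolon-separated strings.
    """
    hits = [row for row in library_rows if row["MS2s"] != ""]
    if not hits:
        # Without this guard, hits[0] raises IndexError, the Python
        # counterpart of R's "subscript out of bounds".
        return []
    return [float(value) for value in hits[0]["parentmzs"].split(";")]


with_ms2 = [{"MS2s": "scan1", "parentmzs": "180.06;181.07"}]
without_ms2 = [{"MS2s": "", "parentmzs": ""}]
print(first_parent_mzs(with_ms2))     # prints [180.06, 181.07]
print(first_parent_mzs(without_ms2))  # prints []
```

A guard like this only turns the crash into an empty result; it does not explain why no MS2 spectra were matched for the failing library files.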

I still can’t understand what could have happened leading to this issue 😢
 
If you have any idea that would be great !
Once again thank you so much in advance for your answer 💪
 

Indexed mzML input

Description of the bug

If the input MSMS files are not indexed mzML, the workflow fails at the parameter generation step. We need to make sure that the files are indexed, or reindex them.
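
One way to detect the condition up front is to look for the indexedmzML wrapper element that indexed mzML files place around the mzML document. The Python sketch below is a heuristic under that assumption; the function name `looks_indexed` and the 4096-byte probe size are choices made for this example, and it does not verify that the index itself is valid.

```python
import tempfile
from pathlib import Path


def looks_indexed(path, probe_bytes=4096):
    """Heuristic: report whether the file starts with an <indexedmzML> wrapper."""
    with open(path, "rb") as handle:
        head = handle.read(probe_bytes)
    return b"<indexedmzML" in head


# Tiny stand-in files; real mzML files contain far more than this.
with tempfile.TemporaryDirectory() as tmp:
    indexed = Path(tmp, "indexed.mzML")
    plain = Path(tmp, "plain.mzML")
    indexed.write_text('<?xml version="1.0"?>\n<indexedmzML><mzML></mzML></indexedmzML>\n')
    plain.write_text('<?xml version="1.0"?>\n<mzML></mzML>\n')
    print(looks_indexed(indexed), looks_indexed(plain))  # prints True False
```

Reindexing itself would still need a converter tool; this check only flags files that are likely to trigger the failure.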

Command used and terminal output

No response

Relevant files

No response

System information

No response

Filename too long error when aligning multiple files

Description of the bug

metaboigniter completes without error when running only a couple of files, but when running a full batch (63 files) it crashes at the alignment step with a "filename too long" error while trying to create the output from the mapalign step. It looks as though it is trying to pass an array of sample names as a filename in the /alignment/ folder.

Command used and terminal output

command: nextflow run nf-core/metaboigniter -profile docker

output: Mar-01 19:13:26.604 [Task monitor] DEBUG nextflow.Session - Session aborted -- Cause: /home/laytox/projects/smoke/smoke_taint/99-output/alignment/[c3r1-r001, c3r1-r002, c1r3-r003, c1r3-r001, c3r1-r003, c1r1-r002, c1r1-r003, c1r1-r001, c1r3-r002, c3r3-r001, c3r3-r002, c4r1-r001, c3r3-r003, c4r1-r002, s1r1-r001, c4r1-r003, c4r3-r002, c4r3-r003, c4r3-r001, s1r1-r002, s1r2-r001, s1r1-r003, s1r2-r002, s1r2-r003, s1r3-r001, s2r1-r001, s1r3-r002, s1r3-r003, s2r3-r001, s2r1-r002, s2r1-r003, s2r2-r001, s2r2-r003, s2r2-r002, s3r1-r003, s2r3-r002, s2r3-r003, s3r1-r001, s3r3-r001, s3r1-r002, s3r2-r001, s3r2-r002, s3r2-r003, s3r3-r002, s4br2-r001, s3r3-r003, s4br1-r001, s4br1-r002, s4br1-r003, s4br3-r003, s4br2-r002, s4br2-r003, s4br3-r001, s4br3-r002, s4fr2-r001, s4fr1-r001, s4fr3-r001, s4fr1-r002, s4fr1-r003, s4fr2-r002, s4fr2-r003, s4fr3-r002, s4fr3-r003]: File name too long
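
The arithmetic behind the error is easy to check. Many Linux filesystems limit a single path component to 255 bytes, and a bracketed list of 63 sample IDs is well past that. The Python sketch below uses made-up IDs of the same shape as those in the log; the 255-byte limit is an assumption about the filesystem in use.

```python
NAME_MAX = 255  # assumed per-component limit; common on Linux filesystems such as ext4

# Hypothetical sample IDs shaped like those in the log (7 groups x 3 x 3 = 63).
sample_ids = [
    f"s{group}r{rep}-r{run:03d}"
    for group in range(1, 8)
    for rep in range(1, 4)
    for run in range(1, 4)
]

# The list rendered as a single path component, as in the error message.
component = "[" + ", ".join(sample_ids) + "]"
size = len(component.encode("utf-8"))
print(len(sample_ids), size, size > NAME_MAX)  # prints 63 693 True
```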

Relevant files

files.zip

System information

Nextflow version: 23.10.1
Metaboigniter version: 2.0.0
Hardware: desktop
Executor: local
container engine: docker
OS: Linux (Fedora 39)

METABOIGNITER: Migrate all docs to JSON parameter schema

Hi!

this is not necessarily an issue with the pipeline, but in order to streamline the documentation group next week for the hackathon, I'm opening issues in all repositories / pipeline repos that might need this update to switch from parameter docs to auto-generated documentation based on the JSON schema.

This will then supersede any further parameter documentation, thus making things a bit easier :-)

If this doesn't apply (anymore), please close the issue. Otherwise, I'm hoping to have some helping hands on this next week in the documentation team on Slack https://nfcore.slack.com/archives/C01QPMKBYNR

Pipeline has no release but no UNDER CONSTRUCTION warning


Library search retention time tolerance is missing


Missing output file(s) error when centroiding data

Description of the bug

After centroiding the first data file, the workflow stops with an error saying it cannot find the centroided data file it just created.

It appears that changing line 48 in modules.config to

ext.prefix = { "${meta.id}.centroided" }

fixes the issue. The workflow appears to create the new centroided file with the original name, filename.mzML, instead of filename.centroided.mzML, which is what the workflow looks for in later steps.

Command used and terminal output

No response

Relevant files

No response

System information

Nextflow version: 23.10.01
Metaboigniter version: 2.0.0
Hardware: Desktop
Executor: Local
Container Engine: Docker
OS: Linux (Fedora 39)

"Path value cannot be null" error during featurelinkerunlabeledkd step

Description of the bug

During a run through the steps peakpickerhires->featurefindermetabo->mapalignerposecluster->maprttransformer, the workflow stops after the mapalignerposecluster step with the error "Error executing process Caused by: Path value cannot be null".

Previously I had to adjust the modules.config file to get the peakpickerhires step to work (added .centroided to the filename.mzML in line 48), and I had to adjust the paths on lines 137 and 175 to remove ${meta.id} to fix the "filename too long" error.

Command used and terminal output

command: nextflow run nf-core/metaboigniter -profile docker

output: 
WARN: Input tuple does not match input set cardinality declared by process `NFCORE_METABOIGNITER:METABOIGNITER:LINKER:OPENMS_FEATURELINKERUNLABELEDKD` -- offending value: [id:Linked_data]
ERROR ~ Error executing process > 'NFCORE_METABOIGNITER:METABOIGNITER:LINKER:OPENMS_FEATURELINKERUNLABELEDKD (1)'

Caused by:
  Path value cannot be null

Relevant files

files.zip

System information

Nextflow version: 23.10.1
Metaboigniter version: 2.0.0
Hardware: Desktop
Executor: local
Container engine: Docker
OS: Linux (Fedora 39)

Software in bioconda

Generally we aim to use software packaged in BioConda for nf-core pipelines. By doing so we get support for conda, docker and singularity (the latter two via https://biocontainers.pro). We also avoid taking on maintenance of software packaging in addition to pipeline maintenance.

Currently, this pipeline is using a suite of custom Docker containers. These are all built using dedicated repos at https://github.com/MetaboIGNITER

If possible, it would be great to switch from using these to using Bioconda. Here's my quick googling for them:

So nearly all seem to be available already on the face of it.

As you're currently using one container per process, the quickest way to use them is just to add them to the main script, e.g.:

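process xcms {
    // Biocontainers image, used with Docker and Singularity
    container "quay.io/biocontainers/bioconductor-xcms:3.12.0--r40h5f743cb_0"
    // Conda package specs take the form channel::name=version
    conda "bioconda::bioconductor-xcms=3.12.0"

    script:
    """
    normal nextflow stuff here
    """
}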

However, if it works, it might be nicer to add an environment.yml file back with the bioconda deps in, if they play well together. That gives a couple of advantages:

  • We can make the get_software_versions process run each command in one process to get the software version numbers reported
  • Simpler administration - nf-core lint checks this file for available updates for example
  • Smaller total file size for Singularity users

If they don't work together then that's fine. Pretty soon we will be moving all pipelines to DSL2 and rewriting pipelines to use a central repository of software wrappers at https://github.com/nf-core/modules - then each process will have to have its own container. If we're not using the main pipeline docker image at all we should delete the Dockerfile though and remove mention of the top-level process.container attribute.

Let me know what you think!

Phil

Do not define "NULL" string as a default value at nextflow_schema.json

Description

Some fields in the nextflow_schema.json file define their default value as a string ("default": "NULL"). This will be a problem in the upcoming version of tower.nf: the "NULL" string will be set in the launchpad form and sent to Nextflow when launching the pipeline. The run will then fail because Nextflow will interpret it as a string and not as an empty parameter.

Solution

In line with the discussion here about enforcing stricter rules for initialising params with no default value, I suggest simply setting these fields to null in the nextflow.config file and removing the default setting from the schema file. This will be compatible with the future tower.nf release.
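
Finding the affected fields can be automated. The Python sketch below walks any JSON document and reports every object whose "default" is the literal string "NULL". It makes no assumption about how the schema groups its parameters; the function name `find_null_string_defaults` and the parameter names in the example fragment are made up for illustration.

```python
import json


def find_null_string_defaults(node, path=""):
    """Recursively collect JSON paths whose "default" is the literal string "NULL"."""
    found = []
    if isinstance(node, dict):
        if node.get("default") == "NULL":
            found.append(path or "/")
        for key, value in node.items():
            found.extend(find_null_string_defaults(value, f"{path}/{key}"))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found.extend(find_null_string_defaults(value, f"{path}/{index}"))
    return found


# Hypothetical schema fragment; the parameter names are invented for the example.
schema = json.loads("""
{
  "properties": {
    "example_param_a": {"type": "string", "default": "NULL"},
    "example_param_b": {"type": "string"}
  }
}
""")
print(find_null_string_defaults(schema))  # prints ['/properties/example_param_a']
```

To run it against a real pipeline, load nextflow_schema.json with json.load and pass the result to the same function.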

negative run error

Description of the bug

I have tried many times and always get the same error message when running negative-mode data.

Command used and terminal output

Command error:
  Adding neutral: ---------- Adduct -----------------
  Charge: 0
  Amount: 1
  MassSingle: -18.0106
  Formula: H-2O-1
  log P: -2.99573
  
  Adding neutral: ---------- Adduct -----------------
  Charge: 0
  Amount: 1
  MassSingle: 46.0055
  Formula: C1H2O2
  log P: -0.693147
  
  MassExplainer table size: 4
  Error: Unexpected internal error (WARNING!!! implicit number of default adduct is negative!!! left:-1 right: -1
  )
  Generating Masses with threshold: -2.99573 ...
  done

Relevant files

No response

System information

No response

Maximum number of CPUs

Description of the bug

The number of CPUs in use does not go above 12, even though I set max_cpus to 32 in my config file.

Relevant files

Screenshot 2024-04-22 at 17 21 49

System information

  • Nextflow version 23.10.1.5891
  • Desktop computer with 64 core CPUs, 252GB RAM
  • Executor : local
  • Container : Docker
  • OS : Ubuntu 22.04.4 LTS
  • Metaboigniter version : 2.0.1
