funcscan's Introduction

nf-core/funcscan

Introduction

nf-core/funcscan is a bioinformatics best-practice analysis pipeline for the screening of nucleotide sequences such as assembled contigs for functional genes. It currently features mining for antimicrobial peptides, antibiotic resistance genes and biosynthetic gene clusters.

The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from nf-core/modules in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!

On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from the full-sized test can be viewed on the nf-core website.

The nf-core/funcscan AWS full test dataset consists of contigs generated by the MGnify service from the ENA. We used contigs generated from assemblies of chicken cecum shotgun metagenomes (study accession: MGYS00005631).

Pipeline summary

  1. Annotation of assembled prokaryotic contigs with Prodigal, Pyrodigal, Prokka, or Bakta
  2. Screening contigs for antimicrobial peptide-like sequences with ampir, Macrel, HMMER, AMPlify
  3. Screening contigs for antibiotic resistance gene-like sequences with ABRicate, AMRFinderPlus, fARGene, RGI, DeepARG
  4. Screening contigs for biosynthetic gene cluster-like sequences with antiSMASH, DeepBGC, GECCO, HMMER
  5. Creating aggregated reports for all samples across the workflows with AMPcombi for AMPs, hAMRonization for ARGs, and comBGC for BGCs
  6. Software version and methods text reporting with MultiQC

[Figure: funcscan metro map of the workflow]

Usage

Note: If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

First, prepare a samplesheet with your input data that looks as follows:

samplesheet.csv:

sample,fasta
CONTROL_REP1,AEG588A1_001.fasta
CONTROL_REP2,AEG588A1_002.fasta
CONTROL_REP3,AEG588A1_003.fasta

Each row represents a (multi-)fasta file of assembled contig sequences.
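As a pre-flight check, a minimal Python sketch can validate such a samplesheet before launching a run. It assumes the exact header `sample,fasta`, non-empty and unique sample names, and the FASTA suffixes listed in `FASTA_SUFFIXES`. These rules and the helper name `check_samplesheet` are assumptions for illustration, not the pipeline's own validation, so consult the pipeline's input schema for the authoritative checks.

```python
import csv
import io

# Assumed set of accepted FASTA suffixes (not taken from the pipeline schema).
FASTA_SUFFIXES = (".fasta", ".fa", ".fna", ".fasta.gz", ".fa.gz", ".fna.gz")


def check_samplesheet(handle):
    """Validate a funcscan-style samplesheet and return its rows as dicts."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != ["sample", "fasta"]:
        raise ValueError(f"expected header 'sample,fasta', got {reader.fieldnames}")
    rows, seen = [], set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        # Short rows yield None for missing fields, so fall back to "".
        sample = (row["sample"] or "").strip()
        fasta = (row["fasta"] or "").strip()
        if not sample:
            raise ValueError(f"line {line_no}: empty sample name")
        if sample in seen:
            raise ValueError(f"line {line_no}: duplicate sample name '{sample}'")
        if not fasta.lower().endswith(FASTA_SUFFIXES):
            raise ValueError(f"line {line_no}: '{fasta}' does not look like a FASTA file")
        seen.add(sample)
        rows.append({"sample": sample, "fasta": fasta})
    return rows


# Example on an in-memory copy of the samplesheet shown above.
example = io.StringIO("sample,fasta\nCONTROL_REP1,AEG588A1_001.fasta\n")
print(check_samplesheet(example))
```

To check a real file, open it with `newline=""` (as the `csv` module recommends) and pass the handle to `check_samplesheet`; the function either returns one dict per row or raises `ValueError` at the first problem it finds.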

Now, you can run the pipeline using:

nextflow run nf-core/funcscan \
   -profile <docker/singularity/podman/shifter/charliecloud/conda/institute> \
   --input samplesheet.csv \
   --outdir <OUTDIR> \
   --run_amp_screening \
   --run_arg_screening \
   --run_bgc_screening

Warning: Please provide pipeline parameters via the CLI or the Nextflow -params-file option. Custom config files, including those provided by the -c Nextflow option, can be used to provide any configuration except for parameters; see docs.
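Nextflow's `-params-file` option accepts a JSON or YAML file. The following minimal Python sketch writes the parameters from the example command above into JSON. The helper names are hypothetical, and the keys simply mirror the CLI flags shown (`--input`, `--outdir`, `--run_amp_screening`, `--run_arg_screening`, `--run_bgc_screening`).

```python
import json
from pathlib import Path


def params_json(samplesheet, outdir, amp=True, arg=True, bgc=True):
    """Render the example command's pipeline parameters as a JSON string."""
    params = {
        "input": samplesheet,
        "outdir": outdir,
        "run_amp_screening": amp,
        "run_arg_screening": arg,
        "run_bgc_screening": bgc,
    }
    return json.dumps(params, indent=2) + "\n"


def write_params_file(path, **kwargs):
    """Write the JSON produced by params_json() to `path`."""
    Path(path).write_text(params_json(**kwargs))
```

After `write_params_file("params.json", samplesheet="samplesheet.csv", outdir="results")`, the run becomes `nextflow run nf-core/funcscan -profile docker -params-file params.json`, which keeps parameters out of any `-c` config file.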

For more details and further functionality, please refer to the usage documentation and the parameter documentation.

Pipeline output

To see the results of an example test run with a full size dataset refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation.

Credits

nf-core/funcscan was originally written by Jasmin Frangenberg, Anan Ibrahim, Louisa Perelo, Moritz E. Beber, James A. Fellows Yates.

We thank the following people for their extensive assistance in the development of this pipeline:

Rosa Herbst, Martin Klapper.

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

For further information or help, don't hesitate to get in touch on the Slack #funcscan channel (you can join with this invite).

Citations

If you use nf-core/funcscan for your analysis, please cite it using the following doi: 10.5281/zenodo.7643099

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

funcscan's People

Contributors

adamrtalbot, darcy220606, drpatelh, jasmezz, jfy133, louperelo, midnighter, nf-core-bot, robsyme, tavareshugo

funcscan's Issues

Add fargene

Description of feature

Add fargene

[transferred from old repository]

Have MultiQC output display pipeline summary tables

Description of feature

Originally we planned to add MultiQC modules; however, few of the tools produce useful summary statistics. See thread.

Our modules (first release):

  • amplify/predict
    • Log written to stdout, but no summary statistics. Same information as the output TSV (could summarise the number of lines in the TSV)
  • deeparg
    • No log file, no info in Stdout, output files just tables
  • fargene
    • currently no sample-id information in any log file,
    • @louperelo may try updating the tool, but there is little activity on the fARGene repo
  • hamronization
  • hmmer/hmmsearch
    • No log files, no stdout info
  • macrel
    • There is a log file, slightly odd as it just spits out a header node and Prodigal output, and then, when there is a hit (I think), it prints the contig name. I guess we could count those?
  • prokka
  • antismash
  • ampir
  • prodigal
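Where a tool offers no usable log, the notes above suggest counting output lines instead. A minimal Python sketch of that idea: count the data rows of a tool's TSV output (assuming the first non-empty line is a header) and render the per-sample counts as a table for MultiQC's custom-content feature. The commented YAML header (`id`, `section_name`, `plot_type`) follows MultiQC's custom-content convention as we understand it; check the MultiQC documentation for the supported keys and for the `_mqc.tsv` file-name suffix that triggers detection. Both function names are hypothetical.

```python
def count_hits(tsv_text):
    """Count data rows in a TSV string, assuming the first non-empty line is a header."""
    lines = [line for line in tsv_text.splitlines() if line.strip()]
    return max(len(lines) - 1, 0)


def mqc_table(counts, tool_name):
    """Render {sample: n_hits} as a MultiQC custom-content TSV (assumed header format)."""
    lines = [
        f"# id: '{tool_name}_hits'",
        f"# section_name: '{tool_name} hits per sample'",
        "# plot_type: 'table'",
        "Sample\tNo_Hits",
    ]
    # One row per sample, sorted so the table order is stable between runs.
    lines += [f"{sample}\t{n}" for sample, n in sorted(counts.items())]
    return "\n".join(lines) + "\n"
```

Writing the string returned by `mqc_table(...)` to a file such as `amplify_hits_mqc.tsv` would then let MultiQC pick it up alongside the other reports.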

Clean output directories per sample

Description of feature

We need to clean up some output directories where multiple files per sample need to be put in individual sample folders.
For example, where a tool currently writes the files of all samples into one flat folder, each sample's files should instead be placed in their own subfolder.

This is currently necessary for:

  • ampir
  • gecco
  • hamronization
  • prodigal
  • rgi

MultiQC module for DeepARG

Description of feature

If possible, we should make a MultiQC module for summarising stats for each tool. We should investigate whether deepARG produces a log file (or one can be redirected from the console) that we can use for MultiQC.

DeepARG Database download regularly fails

Description of the bug

The DeepARG database download regularly fails, which blocks us from assessing whether the other CI tests pass.

We should separate the CI test specifically for DeepARG so we can see if everything else works

Command used and terminal output

No response

Relevant files

No response

System information

No response

Add Macrel

Description of feature

Add Macrel module

[transferred from old repository]

Add Macrel_Peptides to modules

Description of feature

So far we have included Macrel_Contigs in the pipeline. In this mode, the tool uses Prodigal to predict genes. We should consider replacing it with Macrel_Peptides so that gene prediction (with Prokka or Prodigal) is not run twice during the pipeline.

Add hmmsearch

Description of feature

Add hmmsearch module

[transferred from old repository]

deeparg bioconda singularity container is borked

Description of the bug

Need to fix it (this module will plague me forever it seems...)

I'm hoping the warning and the error are related. I'll try adding g++ to the conda recipe under run.

❯ cat .command.log
/usr/local/lib/python2.7/site-packages/theano/tensor/signal/downsample.py:6: UserWarning: downsample module has been moved to the theano.tensor.signal.pool module.
  "downsample module has been moved to the theano.tensor.signal.pool module.")
Traceback (most recent call last):
  File "/usr/local/bin/deeparg", line 7, in <module>
    from deeparg.entry import main
  File "/usr/local/lib/python2.7/site-packages/deeparg/entry.py", line 10, in <module>
    import deeparg.predict.bin.deepARG as clf
  File "/usr/local/lib/python2.7/site-packages/deeparg/predict/bin/deepARG.py", line 12, in <module>
    from lasagne import layers
  File "/usr/local/lib/python2.7/site-packages/lasagne/__init__.py", line 27, in <module>
    import pkg_resources
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 3251, in <module>
    @_call_aside
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 3235, in _call_aside
    f(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 3264, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 574, in _build_master
    ws = cls()
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 567, in __init__
    self.add_entry(entry)
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 623, in add_entry
    for dist in find_distributions(entry, True):
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2065, in find_on_path
    for dist in factory(fullpath):
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2135, in distributions_from_metadata
    root, entry, metadata, precedence=DEVELOP_DIST,
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2592, in from_location
    py_version=py_version, platform=platform, **kw
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2994, in _reload_version
    md_version = self._get_version()
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2772, in _get_version
    version = _version_from_file(lines)
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2556, in _version_from_file
    line = next(iter(version_lines), '')
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2767, in _get_metadata
    for line in self.get_metadata_lines(name):
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1432, in get_metadata_lines
    return yield_lines(self.get_metadata(name))
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1420, in get_metadata
    value = self._get(path)
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1616, in _get
    with open(path, 'rb') as stream:
IOError: [Errno 13] Permission denied: '/usr/local/lib/python2.7/site-packages/Theano-0.8.2-py2.7.egg-info/PKG-INFO'

Command used and terminal output

No response

Relevant files

No response

System information

No response

AMP Summary Table

Description of feature

Should produce two files:

  • summary (Sample_Name,Tool,No_Hits)
  • aggregated (Sample_Name,Tool,Contig,Hit_Name,Probability,....)
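Since each aggregated row is one hit, the summary file can be derived from the aggregated one. A minimal Python sketch, assuming rows are dicts keyed by the column names above; `summarise_hits` and its `all_pairs` option are hypothetical, the latter ensuring that sample/tool combinations with no hits still appear with a `No_Hits` of 0.

```python
from collections import Counter


def summarise_hits(aggregated_rows, all_pairs=()):
    """Collapse aggregated hit rows into (Sample_Name, Tool, No_Hits) rows.

    Each input row must contain at least 'Sample_Name' and 'Tool'; every row
    counts as one hit. `all_pairs` lists (sample, tool) combinations that
    should be reported even when they have zero hits.
    """
    counts = Counter((row["Sample_Name"], row["Tool"]) for row in aggregated_rows)
    for pair in all_pairs:
        counts.setdefault(pair, 0)  # keep zero-hit combinations in the summary
    return [
        {"Sample_Name": sample, "Tool": tool, "No_Hits": n}
        for (sample, tool), n in sorted(counts.items())
    ]
```

Deriving the summary from the aggregated table, rather than from each tool separately, keeps the two files consistent by construction.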

Workflow diagram

Update the funcscan workflow diagram (add new tools, outputs...)

MultiQC module for macrel

Description of feature

If possible, we should make a MultiQC module for summarising stats for each tool. We should investigate whether macrel/contigs produces a log file (or one can be redirected from the console) that we can use for MultiQC.

Make it optional to save databases

Description of feature

In some cases it may not be worth 'publishing' internally downloaded databases, as they take up a lot of space. We should provide an opt-in flag: if it is provided, we also publish (via copy) the database to results/; if not, we leave it in work/, where it will be removed during cleanup.

Add deepARG

Description of feature

Add deepARG module

[transferred from old repository]

Evaluate additional AMP tools for inclusion

Description of feature

New modules for AMP detection included in funcscan

The predicted AMP probabilities of the different tools should be combined in an output like this:

Contig Sequence Ampir Amplify EnsembleAMPPred ACEP AMP-app AI4AMP
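A minimal Python sketch of building that wide table, assuming each tool's predictions have already been parsed into a mapping from `(contig, sequence)` to probability. Parsing each tool's native output is not shown, and `combine_probabilities` is a hypothetical name. A sequence that a tool did not score gets `None` in that tool's column.

```python
def combine_probabilities(per_tool, tools):
    """Merge per-tool AMP probabilities into one wide table.

    per_tool maps a tool name to {(contig, sequence): probability}.
    Returns a header list and one row per (contig, sequence) pair, with one
    probability column per tool in the order given by `tools`.
    """
    keys = sorted({key for scores in per_tool.values() for key in scores})
    header = ["Contig", "Sequence", *tools]
    rows = []
    for contig, sequence in keys:
        probs = [per_tool.get(tool, {}).get((contig, sequence)) for tool in tools]
        rows.append([contig, sequence, *probs])
    return header, rows
```

Keying on both contig and sequence, rather than on the sequence alone, keeps identical peptides found on different contigs as separate rows.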

[transferred from old repository]

Disable antiSMASH Warning when the tool is not run

Description of the bug

The warning "WARN: Warning: No antiSMASH database and/or directory supplied – they will be downloaded by the pipeline." appears even if antiSMASH was disabled in the run. This might confuse users, so the warning should be suppressed in this case.

Command used and terminal output

$ nextflow run . -c conf/test_bgc.config --bgc_skip_gecco true --bgc_skip_antismash true -profile conda --outdir hmm_bgc
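The fix amounts to emitting the warning only when antiSMASH will actually run. The real check would live in the pipeline's Groovy code; this is a Python sketch of the condition only, using the `run_bgc_screening` and `bgc_skip_antismash` parameters that appear in this document, plus hypothetical stand-ins (`antismash_db`, `antismash_dir`) for the database and directory parameters.

```python
def antismash_db_warning(run_bgc_screening, bgc_skip_antismash,
                         antismash_db=None, antismash_dir=None):
    """Return the download warning only when antiSMASH will actually run."""
    antismash_runs = run_bgc_screening and not bgc_skip_antismash
    if antismash_runs and (antismash_db is None or antismash_dir is None):
        return ("WARN: Warning: No antiSMASH database and/or directory supplied"
                " – they will be downloaded by the pipeline.")
    return None  # antiSMASH disabled, or both inputs supplied: stay silent
```

With `--bgc_skip_antismash true`, as in the command above, the function returns `None`, so no warning would be printed.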

Relevant files

No response

System information

No response

Add antiSMASH

Description of feature

Add antismash module

[transferred from old repository]

Clean up RGI output

Description of feature

Make it optional to output temporary files in the RGI results. Default: no temp files.

Add documentation

Description of feature

FuncScan needs documentation:

  • Describe purpose and all the tools
  • Clarify inputs/outputs
  • Describe the logic
  • Create/update schematic
  • ...

Perfect timing during the hackathon in March!

Remove DeepARG

Description of the bug

It's not compatible with containers due to an archaic version of one of its dependencies.

Command used and terminal output

No response

Relevant files

No response

System information

No response

Add ampir

Description of feature

Add ampir module

[transferred from old repository]

Consider creating another module for macrel (peptides)

Description of feature

As the 6 other AMP detection tools require Prokka-annotated files as input, it would be better if the macrel module also accepted Prokka output as input. The current macrel/contigs module only runs on nucleotide sequences.

make db storage in outdir optional for amrfinderplus

Description of feature

Add the option to store the database downloaded by amrfinderplus_update in the output directory. If the database is stored there, it should be put into the folder databases/amrfinder_db (currently it is in amp/amrfinderplus/db).

Add AMPEP module

Description of feature

Add AMPEP module

Waiting on feedback from @RosaLuzia whether it should be included or not

[transferred from old repository]

make fargene scan for all ARG classes or list of ARG classes

Description of feature

In one run, fargene can scan for one out of ten antibiotic classes as pre-defined models:
(class_a, class_b_1_2, class_b_3, class_c, class_d_1, class_d_2, qnr, tet_efflux, tet_rpg, tet_enzyme)
Suggestion: The funcscan pipeline by default should scan for all classes and be able to accept a list of models / specific model defined by the user
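A minimal Python sketch of the suggested behaviour: default to all ten pre-defined models and otherwise accept a user list, validated against the names above. How the list is passed in (here a comma-separated string or a list) and the helper name `resolve_fargene_models` are assumptions; in the pipeline, each resolved model would presumably become a separate fARGene task.

```python
# The ten pre-defined fARGene models listed in this issue.
FARGENE_MODELS = (
    "class_a", "class_b_1_2", "class_b_3", "class_c", "class_d_1",
    "class_d_2", "qnr", "tet_efflux", "tet_rpg", "tet_enzyme",
)


def resolve_fargene_models(user_models=None):
    """Return the fARGene models to run: all by default, else a validated list.

    user_models may be None, a comma-separated string, or an iterable of names.
    """
    if isinstance(user_models, str):
        user_models = user_models.split(",")
    requested = [m.strip() for m in (user_models or []) if m.strip()]
    if not requested:
        return list(FARGENE_MODELS)  # default: scan for every class
    unknown = sorted(set(requested) - set(FARGENE_MODELS))
    if unknown:
        raise ValueError(f"unknown fARGene model(s): {', '.join(unknown)}")
    # Keep the user's order while dropping duplicates.
    return list(dict.fromkeys(requested))
```

Failing on unknown names up front avoids launching a run that would only crash once the first fARGene task starts.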

Fix conflicting folder names of antiSMASH input channels

Description of the bug

Bug

The pipeline fails if the user's directories of the staged antismash input channels databases and antismash_dir have identical folder names (despite different paths).

Solution

  • Print a help message + add note in usage docs that this has to be avoided.
  • Maybe, to be even more user-friendly: get Groovy to read the folder name and include it in the help message.
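A Python sketch of the early check and its help message. The proposed fix would be written in Groovy inside the pipeline, and `check_antismash_dir_names` and its arguments are hypothetical. It compares only the final folder names, because the failure occurs when those match even though the full paths differ.

```python
from pathlib import PurePath


def check_antismash_dir_names(databases, antismash_dir):
    """Raise a helpful error if both antiSMASH input directories share a folder name."""
    db_name = PurePath(databases).name  # PurePath ignores a trailing slash
    dir_name = PurePath(antismash_dir).name
    if db_name == dir_name:
        raise ValueError(
            f"The antiSMASH database directory ('{databases}') and the antiSMASH "
            f"directory ('{antismash_dir}') have the same folder name '{db_name}'. "
            "Please rename one of them before running the pipeline."
        )
```

Including the offending folder name in the message follows the second suggestion above, so users can see immediately which directory to rename.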

Command used and terminal output

No response

Relevant files

No response

System information

No response

Clean up fARGene output

Make it optional to output temporary files in the fARGene results: tmpdir in each arg class subfolder might not be useful to the user. Default: no tmpdir.
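In nf-core pipelines, this kind of filtering is typically done in the Nextflow `publishDir` configuration. The Python sketch below only illustrates the proposed rule: drop anything inside a `tmpdir` folder unless the user opts in. The `save_tmp` flag and the function name are hypothetical.

```python
from pathlib import PurePath


def publishable(paths, save_tmp=False):
    """Filter fARGene output paths, dropping files under any 'tmpdir' folder by default."""
    if save_tmp:
        return list(paths)
    # Match whole path components, so a file merely named like 'tmpdir_notes' is kept.
    return [p for p in paths if "tmpdir" not in PurePath(p).parts]
```

Matching on path components rather than substrings keeps the rule tied to the per-class `tmpdir` folders described above.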

MultiQC module for fARGene

Description of feature

If possible, we should make a MultiQC module for summarising stats for each tool. We should investigate whether fARGene produces a log file (or one can be redirected from the console) that we can use for MultiQC.

Add DeepBGC

Description of feature

This tool predicts BGCs with a different approach from the pipeline's other BGC modules, GECCO and antiSMASH, though it also uses AI (deep learning; see the README on its GitHub). Its output table is compatible with those of the other tools, so the BGC summarising tool ("comBGC" or whatever it will be called) can easily be adapted to parse DeepBGC tables as well.

Add HAMRONIZATION_RGI

Description of feature

Now that we are adding RGI, we need to add the HAMRONIZATION module for it to both nf-core/modules and the pipeline.

deepARG database currently inaccessible

Description of the bug

As the deepARG databases are currently inaccessible, --arg_skip_deeparg was set to true temporarily in the funcscan test.config to avoid the failing of the test runs. As soon as the problem is resolved by the maintainers of deepARG, the tool should be included in the test runs again.

Command used and terminal output

No response

Relevant files

No response

System information

No response

BGC Summary Table

Description of feature

Should produce two files:

  • summary (Sample_Name,Tool,No_Hits)
  • aggregated (Sample_Name,Tool,Contig,Hit_Name,Probability,....)

Consider adding annotated contigs as main input to the pipeline

Description of feature

After talking to Martin Klapper today: he suggests that we add annotated contigs as a main input to the pipeline (besides assembled contigs) and make it an option to switch off the entire annotation step (Prodigal and Prokka), especially if we also mention that the output from the MAG pipeline can be used as input to funcscan.

Add the Prodigal module to the pipeline

Description of feature

Prodigal should be added as the default tool for gene annotation, which is needed by several tools (the AMP tools, DeepARG, etc.). Prokka should be the optional alternative if the user also wants functional annotation.
