nf-core / funcscan
(Meta-)genome screening for functional and natural product gene sequences
Home Page: https://nf-co.re/funcscan
License: MIT License
Add antismash module
[transferred from old repository]
See https://github.com/pha4ge/hAMRonization
Originally suggested by @rpetit3, who is already adding the module
Module: nf-core/modules#1790
Update the parameter docs: https://nf-co.re/funcscan/dev/parameters
https://github.com/nf-core/funcscan/blob/dev/nextflow_schema.json
Add ampir module
[transferred from old repository]
The pipeline fails if the directories supplied to the staged antiSMASH input channels databases and antismash_dir have identical folder names (despite having different paths).
Add deepARG module
[transferred from old repository]
Add AMPEP module
Waiting on feedback from @RosaLuzia on whether it should be included or not
[transferred from old repository]
The module is available in nf-core/modules now and can be added.
The DeepARG database download regularly fails, blocking assessment of whether the other CI tests pass.
We should separate the CI test specifically for DeepARG so we can see if everything else works.
Update the output documentation: https://nf-co.re/funcscan/dev/output
Should produce two files:
summary (Sample_Name,Tool,No_Hits)
aggregated (Sample_Name,Tool,Contig,Hit_Name,Probability,....)
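A minimal sketch of how such a summariser could work, assuming per-tool hits have already been normalised (e.g. via hAMRonization) into records with the columns proposed above; the function and field names here are illustrative, not the pipeline's actual code:

```python
import csv
from collections import Counter

# Illustrative sketch only: field names follow the columns proposed above,
# but nothing here is the pipeline's actual implementation.

AGG_FIELDS = ["Sample_Name", "Tool", "Contig", "Hit_Name", "Probability"]

def summarise(hits):
    """Count hits per (sample, tool) for the summary file."""
    counts = Counter((h["Sample_Name"], h["Tool"]) for h in hits)
    return [
        {"Sample_Name": s, "Tool": t, "No_Hits": n}
        for (s, t), n in sorted(counts.items())
    ]

def write_outputs(hits, summary_path, aggregated_path):
    """Write the two proposed files: a per-tool summary and the full hit table."""
    with open(summary_path, "w", newline="") as f:
        w = csv.DictWriter(f, fieldnames=["Sample_Name", "Tool", "No_Hits"])
        w.writeheader()
        w.writerows(summarise(hits))
    with open(aggregated_path, "w", newline="") as f:
        w = csv.DictWriter(f, fieldnames=AGG_FIELDS)
        w.writeheader()
        w.writerows(hits)
```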
So far we have included Macrel_Contigs in the pipeline. In this mode the tool uses Prodigal to predict genes. We should consider replacing it with Macrel_Peptides, so that gene prediction (with Prokka or Prodigal) is not run twice during the pipeline.
We need to make sure that all (relevant) parameters of a given tool that a user may want to tweak are modifiable by the pipeline user (see deepARG, where this is already added).
Before release we should do a pass to check every tool and insert params where necessary.
to the main nf-core/funcscan logo!
Add AMPlify tool to modules
https://github.com/bcgsc/AMPlify
Li, C., Sutherland, D., Hammond, S.A. et al. AMPlify: attentive deep learning model for discovery of novel antimicrobial peptides effective against WHO priority pathogens. BMC Genomics 23, 77 (2022)
https://doi.org/10.1186/s12864-022-08310-4
After talking to Martin Klapper today, he suggests that we add annotated contigs as a main input to the pipeline (besides assembly contigs) and make it an option to switch off the entire annotation step (Prodigal and Prokka), especially if we also mention that the output from the MAG pipeline can be used as an input to funcscan.
Now that we are adding RGI, we need to add the hAMRonization module for it, both to nf-core/modules and to the pipeline
Update the usage documentation: https://nf-co.re/funcscan/dev/usage
Add hmmsearch module
[transferred from old repository]
Add Macrel module
[transferred from old repository]
New modules for AMP detection included in funcscan
The predicted AMP probabilities of the different tools should be combined in an output like this:

| Contig | Sequence | Ampir | Amplify | EnsembleAMPPred | ACEP | AMP-app | AI4AMP |
| --- | --- | --- | --- | --- | --- | --- | --- |
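A rough sketch of the merging step behind such a wide table, assuming each tool yields (contig, sequence, tool, probability) records; the tool names come from the table above, but the merging logic is an assumption, not the pipeline's implementation:

```python
# Hypothetical sketch: one row per (Contig, Sequence), one probability
# column per AMP tool; tools that made no call are left as None.

TOOLS = ["Ampir", "Amplify", "EnsembleAMPPred", "ACEP", "AMP-app", "AI4AMP"]

def combine(predictions):
    """predictions: iterable of (contig, sequence, tool, probability) tuples."""
    rows = {}
    for contig, seq, tool, prob in predictions:
        row = rows.setdefault((contig, seq), {t: None for t in TOOLS})
        row[tool] = prob
    return [
        {"Contig": c, "Sequence": s, **probs}
        for (c, s), probs in sorted(rows.items())
    ]
```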
[transferred from old repository]
Consider adding an HMMsearch module for BGC detection, similar to the one created for the AMPs
The warning
WARN: Warning: No antiSMASH database and/or directory supplied – they will be downloaded by the pipeline.
appears even if antiSMASH was disabled in the run. This might be confusing for the user, so the warning should be suppressed in that case.
$ nextflow run . -c conf/test_bgc.config --bgc_skip_gecco true --bgc_skip_antismash true -profile conda --outdir hmm_bgc
Prodigal should be added as the default tool for gene annotation, which is needed by several tools (the AMP tools, DeepARG, etc.). Prokka should be the optional route, if the user wants functional annotation as well.
In one run, fargene can scan for one out of ten antibiotic classes as pre-defined models:
(class_a, class_b_1_2, class_b_3, class_c, class_d_1, class_d_2, qnr, tet_efflux, tet_rpg, tet_enzyme)
Suggestion: The funcscan pipeline by default should scan for all classes and be able to accept a list of models / specific model defined by the user
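A small sketch of how such a parameter could be handled, assuming a comma-separated user value that defaults to all models; the parameter handling is a suggestion, not existing pipeline code, though the model names match fARGene's list above:

```python
# Hypothetical sketch: resolve the fARGene models to run from a
# user-supplied comma-separated string, defaulting to all ten classes.

FARGENE_MODELS = [
    "class_a", "class_b_1_2", "class_b_3", "class_c",
    "class_d_1", "class_d_2", "qnr", "tet_efflux", "tet_rpg", "tet_enzyme",
]

def resolve_models(arg=None):
    """Return the models to run: all by default, else a validated user subset."""
    if arg is None:
        return list(FARGENE_MODELS)
    requested = [m.strip() for m in arg.split(",") if m.strip()]
    unknown = [m for m in requested if m not in FARGENE_MODELS]
    if unknown:
        raise ValueError("Unknown fARGene model(s): " + ", ".join(unknown))
    return requested
```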
One more annotation tool! Let's include this module as one more annotation option. https://github.com/oschwengers/bakta
And maybe update the module to the most recent bakta version 1.5.
One for each of AMP/BGC/ARG; each of these is opt-in, then each specific tool within each subworkflow is opt-out
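The opt-in/opt-out layering above can be sketched as follows; the flag names mirror the existing --bgc_skip_* style but are assumptions, not the pipeline's actual parameters:

```python
# Hypothetical sketch: each screening type (amp/bgc/arg) is opt-in via a
# run_<type>_screening flag; each tool inside it is opt-out via a
# <type>_skip_<tool> flag.

def tools_to_run(params, screening, tools):
    """Return the tools of one screening type that will actually run."""
    if not params.get("run_" + screening + "_screening", False):
        return []  # the whole screening type is off unless opted in
    return [t for t in tools if not params.get(screening + "_skip_" + t, False)]
```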
If possible, we should make MultiQC modules for summarising stats for each tool. We should investigate whether deepARG produces a log file (or one can be redirected from the console) that we can use for MultiQC.
Originally we planned to add MultiQC modules; however, few tools produce useful summary statistics. See thread.
Our modules (first release):
As the deepARG databases are currently inaccessible, --arg_skip_deeparg was temporarily set to true in the funcscan test.config to prevent the test runs from failing. As soon as the problem is resolved by the maintainers of deepARG, the tool should be included in the test runs again.
Update the funcscan workflow diagram (add new tools, outputs...)
As the six other AMP detection tools require Prokka-annotated files as input, it would be better if the Macrel module also accepted Prokka output as input. The current macrel/contigs module only runs on nucleotide sequences.
Need to fix it (this module will plague me forever, it seems...)
I'm hoping the warning and the error are related. I'll try adding g++ to the conda recipe under run
❯ cat .command.log
/usr/local/lib/python2.7/site-packages/theano/tensor/signal/downsample.py:6: UserWarning: downsample module has been moved to the theano.tensor.signal.pool module.
"downsample module has been moved to the theano.tensor.signal.pool module.")
Traceback (most recent call last):
File "/usr/local/bin/deeparg", line 7, in <module>
from deeparg.entry import main
File "/usr/local/lib/python2.7/site-packages/deeparg/entry.py", line 10, in <module>
import deeparg.predict.bin.deepARG as clf
File "/usr/local/lib/python2.7/site-packages/deeparg/predict/bin/deepARG.py", line 12, in <module>
from lasagne import layers
File "/usr/local/lib/python2.7/site-packages/lasagne/__init__.py", line 27, in <module>
import pkg_resources
File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 3251, in <module>
@_call_aside
File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 3235, in _call_aside
f(*args, **kwargs)
File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 3264, in _initialize_master_working_set
working_set = WorkingSet._build_master()
File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 574, in _build_master
ws = cls()
File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 567, in __init__
self.add_entry(entry)
File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 623, in add_entry
for dist in find_distributions(entry, True):
File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2065, in find_on_path
for dist in factory(fullpath):
File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2135, in distributions_from_metadata
root, entry, metadata, precedence=DEVELOP_DIST,
File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2592, in from_location
py_version=py_version, platform=platform, **kw
File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2994, in _reload_version
md_version = self._get_version()
File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2772, in _get_version
version = _version_from_file(lines)
File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2556, in _version_from_file
line = next(iter(version_lines), '')
File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2767, in _get_metadata
for line in self.get_metadata_lines(name):
File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1432, in get_metadata_lines
return yield_lines(self.get_metadata(name))
File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1420, in get_metadata
value = self._get(path)
File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1616, in _get
with open(path, 'rb') as stream:
IOError: [Errno 13] Permission denied: '/usr/local/lib/python2.7/site-packages/Theano-0.8.2-py2.7.egg-info/PKG-INFO'
Add the option to store the database downloaded by amrfinderplus_update in the output directory. If the database is stored there, it should be put into the folder databases/amrfinder_db (currently it is in amp/amrfinderplus/db).
Rationale: #57 (comment)
Should remove the channels, the references in the documentation, and the module itself
This tool predicts BGCs with a different approach from the pipeline's other BGC modules, GECCO and antiSMASH, while still using AI (deep learning; see the README on its GitHub). Its output table is compatible with those of the other tools, so the BGC summarising tool ("comBGC" or whatever it's going to be called) can easily be adapted to parse DeepBGC tables as well.
Add fargene
[transferred from old repository]
It's not compatible with containers due to an archaic version of one of the dependencies
If possible, we should make MultiQC modules for summarising stats for each tool. We should investigate whether macrel/contigs produces a log file (or one can be redirected from the console) that we can use for MultiQC.
If possible, we should make MultiQC modules for summarising stats for each tool. We should investigate whether fARGene produces a log file (or one can be redirected from the console) that we can use for MultiQC.
In some cases it may not be worth 'publishing' internally downloaded databases, as they take up a lot of space. We should provide an opt-in flag: if supplied, we also publish (via copy) the database to results/; if not, we leave it in work, where it will be removed with cleanup.
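The intended behaviour can be sketched in a few lines; the flag name (save_db) and paths are assumptions for illustration, not existing parameters:

```python
# Hypothetical sketch: decide where an internally downloaded database ends up.

def database_destination(save_db=False, outdir="results"):
    """Opt-in publishing: copy into the output dir, otherwise leave in work/."""
    if save_db:
        return outdir + "/databases"  # published via copy, survives cleanup
    return None  # stays in the work directory and is removed on cleanup
```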
Add AMPlify to the pipeline now that it is in modules
Should produce two files:
We need to find test data that is as small as possible but as big as necessary for running minimal CI tests.
Suggestion from @louperelo is using the Zymo mock communities, which Loman Lab already have some contigs for: https://lomanlab.github.io/mockcommunity/
[transferred from old repository]