nf-core / funcscan
(Meta-)genome screening for functional and natural product gene sequences
Home Page: https://nf-co.re/funcscan
License: MIT License
Add antismash module
[transferred from old repository]
See https://github.com/pha4ge/hAMRonization
Originally suggested by @rpetit3, who is already adding the module
Module: nf-core/modules#1790
Update the parameter docs: https://nf-co.re/funcscan/dev/parameters
https://github.com/nf-core/funcscan/blob/dev/nextflow_schema.json
Add ampir module
[transferred from old repository]
The pipeline fails if the directories supplied to the staged antiSMASH input channels databases and antismash_dir have identical folder names (despite having different paths).
Add deepARG module
[transferred from old repository]
Add AMPEP module
Waiting on feedback from @RosaLuzia on whether it should be included or not
[transferred from old repository]
The module is available in nf-core/modules now and can be added.
The DeepARG database download regularly fails, blocking assessment of whether the other CI tests pass.
We should separate the CI test specifically for DeepARG so we can see if everything else works.
Update the output documentation: https://nf-co.re/funcscan/dev/output
Should produce two files:
summary (Sample_Name,Tool,No_Hits)
aggregated (Sample_Name,Tool,Contig,Hit_Name,Probability,....)
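A minimal sketch of how such a summariser could work, assuming per-tool hits have already been normalised (e.g. via hAMRonization) into records with the columns proposed above; the function and field names here are illustrative, not the pipeline's actual code:

```python
import csv
from collections import Counter

# Illustrative sketch only: field names follow the columns proposed above,
# but nothing here is the pipeline's actual implementation.

AGG_FIELDS = ["Sample_Name", "Tool", "Contig", "Hit_Name", "Probability"]

def summarise(hits):
    """Count hits per (sample, tool) for the summary file."""
    counts = Counter((h["Sample_Name"], h["Tool"]) for h in hits)
    return [
        {"Sample_Name": s, "Tool": t, "No_Hits": n}
        for (s, t), n in sorted(counts.items())
    ]

def write_outputs(hits, summary_path, aggregated_path):
    """Write the two proposed files: a per-tool summary and the full hit table."""
    with open(summary_path, "w", newline="") as f:
        w = csv.DictWriter(f, fieldnames=["Sample_Name", "Tool", "No_Hits"])
        w.writeheader()
        w.writerows(summarise(hits))
    with open(aggregated_path, "w", newline="") as f:
        w = csv.DictWriter(f, fieldnames=AGG_FIELDS)
        w.writeheader()
        w.writerows(hits)
```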
So far we have included Macrel_Contigs in the pipeline. In this mode the tool uses Prodigal to predict genes. We should consider replacing it with Macrel_Peptides, so that gene prediction (with Prokka or Prodigal) is not run twice during the pipeline.
We need to make sure that all (relevant) parameters of a given tool that a user may want to tweak are modifiable by the pipeline user (see deepARG, where this is already added).
Before release we should do a pass to check every tool and insert params where necessary.
to the main nf-core/funcscan logo!
Add AMPlify tool to modules
https://github.com/bcgsc/AMPlify
Li, C., Sutherland, D., Hammond, S.A. et al. AMPlify: attentive deep learning model for discovery of novel antimicrobial peptides effective against WHO priority pathogens. BMC Genomics 23, 77 (2022)
https://doi.org/10.1186/s12864-022-08310-4
After talking to Martin Klapper today, he suggests that we add annotated contigs as a main input to the pipeline (besides assembly contigs) and make it an option to switch off the entire annotation step (Prodigal and Prokka), especially if we also mention that the output from the MAG pipeline can be used as an input to funcscan.
Now that we are adding RGI, we need to add the hAMRonization module for it, both to nf-core/modules and to the pipeline
Update the usage documentation: https://nf-co.re/funcscan/dev/usage
Add hmmsearch module
[transferred from old repository]
Add Macrel module
[transferred from old repository]
New modules for AMP detection included in funcscan
The predicted AMP probabilities of the different tools should be combined in an output like this:

| Contig | Sequence | Ampir | Amplify | EnsembleAMPPred | ACEP | AMP-app | AI4AMP |
| --- | --- | --- | --- | --- | --- | --- | --- |
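A rough sketch of the merging step behind such a wide table, assuming each tool yields (contig, sequence, tool, probability) records; the tool names come from the table above, but the merging logic is an assumption, not the pipeline's implementation:

```python
# Hypothetical sketch: one row per (Contig, Sequence), one probability
# column per AMP tool; tools that made no call are left as None.

TOOLS = ["Ampir", "Amplify", "EnsembleAMPPred", "ACEP", "AMP-app", "AI4AMP"]

def combine(predictions):
    """predictions: iterable of (contig, sequence, tool, probability) tuples."""
    rows = {}
    for contig, seq, tool, prob in predictions:
        row = rows.setdefault((contig, seq), {t: None for t in TOOLS})
        row[tool] = prob
    return [
        {"Contig": c, "Sequence": s, **probs}
        for (c, s), probs in sorted(rows.items())
    ]
```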
[transferred from old repository]
Consider adding an HMMsearch module for BGC detection, similar to the one created for the AMPs
The warning
WARN: Warning: No antiSMASH database and/or directory supplied – they will be downloaded by the pipeline.
appears even if antiSMASH was disabled in the run. This might be confusing for the user, so the warning should be suppressed in that case.
$ nextflow run . -c conf/test_bgc.config --bgc_skip_gecco true --bgc_skip_antismash true -profile conda --outdir hmm_bgc
Prodigal should be added as the default tool for gene annotation, which is needed by several tools (the AMP tools, DeepARG, etc.). Prokka should be the optional route, if the user wants functional annotation as well.
In one run, fargene can scan for one out of ten antibiotic classes as pre-defined models:
(class_a, class_b_1_2, class_b_3, class_c, class_d_1, class_d_2, qnr, tet_efflux, tet_rpg, tet_enzyme)
Suggestion: The funcscan pipeline by default should scan for all classes and be able to accept a list of models / specific model defined by the user
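A small sketch of how such a parameter could be handled, assuming a comma-separated user value that defaults to all models; the parameter handling is a suggestion, not existing pipeline code, though the model names match fARGene's list above:

```python
# Hypothetical sketch: resolve the fARGene models to run from a
# user-supplied comma-separated string, defaulting to all ten classes.

FARGENE_MODELS = [
    "class_a", "class_b_1_2", "class_b_3", "class_c",
    "class_d_1", "class_d_2", "qnr", "tet_efflux", "tet_rpg", "tet_enzyme",
]

def resolve_models(arg=None):
    """Return the models to run: all by default, else a validated user subset."""
    if arg is None:
        return list(FARGENE_MODELS)
    requested = [m.strip() for m in arg.split(",") if m.strip()]
    unknown = [m for m in requested if m not in FARGENE_MODELS]
    if unknown:
        raise ValueError("Unknown fARGene model(s): " + ", ".join(unknown))
    return requested
```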
One more annotation tool! Let's include this module as one more annotation option. https://github.com/oschwengers/bakta
And maybe update the module to the most recent bakta version 1.5.
One for each of AMP/BGC/ARG; each of these is opt-in, then each specific tool within each subworkflow is opt-out
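The opt-in/opt-out layering above can be sketched as follows; the flag names mirror the existing --bgc_skip_* style but are assumptions, not the pipeline's actual parameters:

```python
# Hypothetical sketch: each screening type (amp/bgc/arg) is opt-in via a
# run_<type>_screening flag; each tool inside it is opt-out via a
# <type>_skip_<tool> flag.

def tools_to_run(params, screening, tools):
    """Return the tools of one screening type that will actually run."""
    if not params.get("run_" + screening + "_screening", False):
        return []  # the whole screening type is off unless opted in
    return [t for t in tools if not params.get(screening + "_skip_" + t, False)]
```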
If possible, we should make MultiQC modules for summarising stats for each tool. We should investigate whether deepARG produces a log file (or one can be redirected from the console) that we can use for MultiQC.
Originally we planned to add MultiQC modules; however, few tools produce useful summary statistics. See thread.
Our modules (first release):
As the deepARG databases are currently inaccessible, --arg_skip_deeparg was temporarily set to true in the funcscan test.config to prevent the test runs from failing. As soon as the problem is resolved by the maintainers of deepARG, the tool should be included in the test runs again.
Update the funcscan workflow diagram (add new tools, outputs...)
As the six other AMP detection tools require Prokka-annotated files as input, it would be better if the Macrel module also accepted Prokka output as input. The current macrel/contigs module only runs on nucleotide sequences.
Need to fix it (this module will plague me forever, it seems...)
I'm hoping the warning and the error are related. I'll try adding g++ to the conda recipe under run
❯ cat .command.log
/usr/local/lib/python2.7/site-packages/theano/tensor/signal/downsample.py:6: UserWarning: downsample module has been moved to the theano.tensor.signal.pool module.
"downsample module has been moved to the theano.tensor.signal.pool module.")
Traceback (most recent call last):
File "/usr/local/bin/deeparg", line 7, in <module>
from deeparg.entry import main
File "/usr/local/lib/python2.7/site-packages/deeparg/entry.py", line 10, in <module>
import deeparg.predict.bin.deepARG as clf
File "/usr/local/lib/python2.7/site-packages/deeparg/predict/bin/deepARG.py", line 12, in <module>
from lasagne import layers
File "/usr/local/lib/python2.7/site-packages/lasagne/__init__.py", line 27, in <module>
import pkg_resources
File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 3251, in <module>
@_call_aside
File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 3235, in _call_aside
f(*args, **kwargs)
File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 3264, in _initialize_master_working_set
working_set = WorkingSet._build_master()
File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 574, in _build_master
ws = cls()
File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 567, in __init__
self.add_entry(entry)
File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 623, in add_entry
for dist in find_distributions(entry, True):
File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2065, in find_on_path
for dist in factory(fullpath):
File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2135, in distributions_from_metadata
root, entry, metadata, precedence=DEVELOP_DIST,
File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2592, in from_location
py_version=py_version, platform=platform, **kw
File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2994, in _reload_version
md_version = self._get_version()
File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2772, in _get_version
version = _version_from_file(lines)
File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2556, in _version_from_file
line = next(iter(version_lines), '')
File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2767, in _get_metadata
for line in self.get_metadata_lines(name):
File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1432, in get_metadata_lines
return yield_lines(self.get_metadata(name))
File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1420, in get_metadata
value = self._get(path)
File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1616, in _get
with open(path, 'rb') as stream:
IOError: [Errno 13] Permission denied: '/usr/local/lib/python2.7/site-packages/Theano-0.8.2-py2.7.egg-info/PKG-INFO'
Add the option to store the database downloaded by amrfinderplus_update in the output directory. If the database is stored there, it should be put into the folder databases/amrfinder_db (currently it is in amp/amrfinderplus/db).
Rationale: #57 (comment)
Should remove the channels, the references in the documentation, and the module itself
This tool predicts BGCs with a different approach from the pipeline's other BGC modules, GECCO and antiSMASH, while still using AI (deep learning; see the README on its GitHub). Its output table is compatible with those of the other tools, so the BGC summarising tool ("comBGC" or whatever it's going to be called) can easily be adapted to parse DeepBGC tables as well.
Add fargene
[transferred from old repository]
It's not compatible with containers due to an archaic version of one of the dependencies
If possible, we should make MultiQC modules for summarising stats for each tool. We should investigate whether macrel/contigs produces a log file (or one can be redirected from the console) that we can use for MultiQC.
If possible, we should make MultiQC modules for summarising stats for each tool. We should investigate whether fARGene produces a log file (or one can be redirected from the console) that we can use for MultiQC.
In some cases it may not be worth 'publishing' internally downloaded databases, as they take up a lot of space. We should provide an opt-in flag: if supplied, we also publish (via copy) the database to results/; if not, we leave it in work, where it will be removed with cleanup.
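The intended behaviour can be sketched in a few lines; the flag name (save_db) and paths are assumptions for illustration, not existing parameters:

```python
# Hypothetical sketch: decide where an internally downloaded database ends up.

def database_destination(save_db=False, outdir="results"):
    """Opt-in publishing: copy into the output dir, otherwise leave in work/."""
    if save_db:
        return outdir + "/databases"  # published via copy, survives cleanup
    return None  # stays in the work directory and is removed on cleanup
```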
Add AMPlify to the pipeline now that it is in modules
Should produce two files:
We need to find test data that is as small as possible but as big as necessary for running minimal CI tests.
Suggestion from @louperelo is using the Zymo mock communities, which Loman Lab already have some contigs for: https://lomanlab.github.io/mockcommunity/
[transferred from old repository]