
funcscan's Issues

Add antiSMASH

Description of feature

Add antismash module

[transferred from old repository]

Add documentation

Description of feature

FuncScan needs documentation:

  • Describe purpose and all the tools
  • Clarify inputs/outputs
  • Describe the logic
  • Create/update schematic
  • ...

Perfect timing during the hackathon in March!

Add ampir

Description of feature

Add ampir module

[transferred from old repository]

Fix conflicting folder names of antiSMASH input channels

Description of the bug

Bug

The pipeline fails if the directories staged for the antiSMASH input channels databases and antismash_dir have identical folder names (even when their paths differ).

Solution

  • Print a help message + add note in usage docs that this has to be avoided.
  • Maybe, to be even more user-friendly: get Groovy to read the folder name and include it in the help message.
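
The check proposed above could look roughly like the following. This is a minimal sketch in Python rather than the pipeline's own Groovy; the function name and the message wording are hypothetical, and only the final folder name of each path is compared.

```python
from pathlib import PurePosixPath


def folder_name_conflict(databases_dir, antismash_dir):
    """Return a help message if both directories share a folder name, else None.

    Hypothetical helper: compares only the last path component, because the
    failure described above occurs when the names match despite different paths.
    """
    db_name = PurePosixPath(databases_dir).name
    dir_name = PurePosixPath(antismash_dir).name
    if db_name == dir_name:
        return (
            f"The antiSMASH database and installation directories are both "
            f"named '{db_name}'. Please rename one of them."
        )
    return None


# Same folder name under different parents: a message is returned.
print(folder_name_conflict("/data/db/antismash", "/opt/tools/antismash"))
# Different folder names: no message.
print(folder_name_conflict("/data/antismash_db", "/opt/antismash_dir"))
```

Including the offending folder name in the message tells the user directly which directory to rename.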

Command used and terminal output

No response

Relevant files

No response

System information

No response

Add deepARG

Description of feature

Add deepARG module

[transferred from old repository]

Add AMPEP module

Description of feature

Add AMPEP module

Waiting on feedback from @RosaLuzia whether it should be included or not

[transferred from old repository]

DeepARG Database download regularly fails

Description of the bug

The DeepARG database download regularly fails, which makes it impossible to tell whether the other CI tests pass.

We should separate the CI test specifically for DeepARG so we can see if everything else works

Command used and terminal output

No response

Relevant files

No response

System information

No response

BGC Summary Table

Description of feature

Should produce two files:

summary (Sample_Name,Tool,No_Hits)
aggregated (Sample_Name,Tool,Contig,Hit_Name,Probability,....)
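
One way the two files could relate is that the summary is derived from the aggregated table by counting rows per sample and tool. The sketch below assumes that relationship; the function name and the example rows are hypothetical, and only the column names come from the description above.

```python
from collections import Counter


def summarise_hits(aggregated_rows):
    """Collapse aggregated hit rows into one summary row per (sample, tool)."""
    counts = Counter((row["Sample_Name"], row["Tool"]) for row in aggregated_rows)
    return [
        {"Sample_Name": sample, "Tool": tool, "No_Hits": n_hits}
        for (sample, tool), n_hits in sorted(counts.items())
    ]


# Hypothetical aggregated rows, for illustration only.
aggregated = [
    {"Sample_Name": "sample_1", "Tool": "antismash", "Contig": "contig_1",
     "Hit_Name": "hit_a", "Probability": 0.91},
    {"Sample_Name": "sample_1", "Tool": "antismash", "Contig": "contig_4",
     "Hit_Name": "hit_b", "Probability": 0.77},
    {"Sample_Name": "sample_1", "Tool": "gecco", "Contig": "contig_1",
     "Hit_Name": "hit_c", "Probability": 0.85},
    {"Sample_Name": "sample_2", "Tool": "antismash", "Contig": "contig_2",
     "Hit_Name": "hit_d", "Probability": 0.66},
]

for row in summarise_hits(aggregated):
    print(row)
```

A tool that reports no hits for a sample does not appear in this summary; listing such tools with a count of zero would require passing in the set of tools that were run.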

Add Macrel_Peptides to modules

Description of feature

So far we included Macrel_Contigs in the pipeline. In this option the tool uses Prodigal to predict genes. We should think about replacing it with Macrel_Peptides, in order not to run a gene prediction (with Prokka or Prodigal) two times during the pipeline.

Clean up RGI output

Description of feature

Make it optional to output temporary files in the RGI results. Default: no temp files.

Consider adding annotated contigs as main input to the pipeline

Description of feature

I talked to Martin Klapper today. He suggests that we add annotated contigs as a main input to the pipeline (besides assembly contigs) and make it an option to switch off the entire annotation step (Prodigal and Prokka), especially if we also cite that the output from the MAG pipeline can be used as an input to funcscan.

Add HAMRONIZATION_RGI

Description of feature

Now that we are adding RGI, we need to add the corresponding HAMRONIZATION module to both nf-core/modules and the pipeline.

Add hmmsearch

Description of feature

Add hmmsearch module

[transferred from old repository]

Add Macrel

Description of feature

Add Macrel module

[transferred from old repository]

Evaluate additional AMP tools for inclusion

Description of feature

New modules for AMP detection included in funcscan

The predicted AMP Probabilities of the different tools should be combined in an output like this

Contig Sequence Ampir Amplify EnsembleAMPPred ACEP AMP-app AI4AMP
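
A combined table with these columns could be assembled as sketched below. The input layout (one dictionary per tool, keyed by contig and sequence) is an assumption made for illustration, as are the function name and the example values. Tools that did not score a sequence are left empty.

```python
# Tool columns as listed in the header above.
TOOLS = ["Ampir", "Amplify", "EnsembleAMPPred", "ACEP", "AMP-app", "AI4AMP"]


def combine_probabilities(per_tool):
    """Merge per-tool probabilities into one row per (contig, sequence).

    per_tool maps a tool name to {(contig, sequence): probability}.
    Missing predictions are filled with None.
    """
    keys = sorted({key for hits in per_tool.values() for key in hits})
    table = []
    for contig, sequence in keys:
        row = {"Contig": contig, "Sequence": sequence}
        for tool in TOOLS:
            row[tool] = per_tool.get(tool, {}).get((contig, sequence))
        table.append(row)
    return table


# Hypothetical predictions, for illustration only.
predictions = {
    "Ampir": {("contig_1", "MKK"): 0.9},
    "Amplify": {("contig_1", "MKK"): 0.8, ("contig_2", "MRR"): 0.6},
}

for row in combine_probabilities(predictions):
    print(row)
```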

[transferred from old repository]

Disable antiSMASH Warning when the tool is not run

Description of the bug

The warning WARN: Warning: No antiSMASH database and/or directory supplied – they will be downloaded by the pipeline. appears even if antiSMASH was disabled in the run. Might be confusing for the user and should be disabled for this case.
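
The intended behaviour can be sketched as a small guard that only emits the warning when antiSMASH is actually going to run. This is a Python illustration, not the pipeline's Groovy code; bgc_skip_antismash is the flag from the command below, while the keys antismash_databases and antismash_dir are hypothetical parameter names.

```python
ANTISMASH_WARNING = (
    "Warning: No antiSMASH database and/or directory supplied – "
    "they will be downloaded by the pipeline."
)


def antismash_download_warning(params):
    """Return the warning text, or None if no warning should be shown."""
    # antiSMASH is disabled, so nothing will be downloaded: stay silent.
    if params.get("bgc_skip_antismash", False):
        return None
    # Both inputs supplied by the user (hypothetical parameter names).
    if params.get("antismash_databases") and params.get("antismash_dir"):
        return None
    return ANTISMASH_WARNING


print(antismash_download_warning({"bgc_skip_antismash": True}))
print(antismash_download_warning({"bgc_skip_antismash": False}))
```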

Command used and terminal output

$ nextflow run . -c conf/test_bgc.config --bgc_skip_gecco true --bgc_skip_antismash true -profile conda --outdir hmm_bgc

Relevant files

No response

System information

No response

Add the Prodigal module to the pipeline

Description of feature

Prodigal should be added to be the default tool for gene annotation, which is needed for several tools (AMP tools and DeepARG, etc.). Prokka should be the optional way, if the user wants functional annotation as well.

make fargene scan for all ARG classes or list of ARG classes

Description of feature

In one run, fargene can scan for one out of ten antibiotic classes as pre-defined models:
(class_a, class_b_1_2, class_b_3, class_c, class_d_1, class_d_2, qnr, tet_efflux, tet_rpg, tet_enzyme)
Suggestion: The funcscan pipeline by default should scan for all classes and be able to accept a list of models / specific model defined by the user
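
The suggested behaviour could be resolved as in the sketch below: no user input means all ten classes, otherwise a comma-separated list is validated against the known models. The function name and the comma-separated input format are assumptions; the class names are the ones listed above.

```python
FARGENE_CLASSES = (
    "class_a", "class_b_1_2", "class_b_3", "class_c", "class_d_1",
    "class_d_2", "qnr", "tet_efflux", "tet_rpg", "tet_enzyme",
)


def resolve_fargene_classes(user_value=None):
    """Return the list of models to scan, one fargene run per entry.

    With no user input, all pre-defined classes are returned. Otherwise the
    comma-separated value is split and checked against the known classes.
    """
    if not user_value:
        return list(FARGENE_CLASSES)
    requested = [item.strip() for item in user_value.split(",") if item.strip()]
    unknown = [item for item in requested if item not in FARGENE_CLASSES]
    if unknown:
        raise ValueError(f"Unknown fargene class(es): {', '.join(unknown)}")
    return requested


print(resolve_fargene_classes())               # all ten classes
print(resolve_fargene_classes("class_a, qnr"))  # user-defined subset
```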

MultiQC module for DeepARG

Description of feature

If possible, we should make a MultiQC module for summarising stats for each tool. We should investigate whether deepARG produces a log file (or whether one can be redirected from the console) that we can parse for MultiQC.

Have MultiQC output display pipeline summary tables

Description of feature

ORIGINALLY WE PLANNED TO ADD MULTIQC MODULES, HOWEVER FEW TOOLS PRODUCE USEFUL SUMMARY STATISTICS. SEE THREAD

Our modules (first release):

  • amplify/predict
    • Log file in stdout, however no summary statistics. Same as output TSV (could summarise number of lines in TSV)
  • deeparg
    • No log file, no info in Stdout, output files just tables
  • fargene
    • Currently no sample-ID information in any log file
    • (@louperelo may try updating the tool, but there is little activity on the fARGene repo)
  • hamronisation
  • hmmer/hmmsearch
    • No log files, no stdout info
  • macrel
    • There is a log file, slightly odd as it just spits out a header node and Prodigal output, and then when there is a hit (I think) it prints the contig name. I guess we could count those?
  • prokka
  • antismash
  • ampir
  • prodigal

deepARG database currently inaccessible

Description of the bug

As the deepARG databases are currently inaccessible, --arg_skip_deeparg was temporarily set to true in the funcscan test.config to keep the test runs from failing. As soon as the maintainers of deepARG resolve the problem, the tool should be included in the test runs again.

Command used and terminal output

No response

Relevant files

No response

System information

No response

Workflow diagram

Update the funcscan workflow diagram (add new tools, outputs...)

Consider creating another module for macrel (peptides)

Description of feature

As the 6 other AMP detection tools require Prokka-annotated files as input, it would be better if the macrel module also accepted Prokka output as input. The current macrel/contig module only runs on nucleotide sequences.

deeparg bioconda singularity container is borked

Description of the bug

Need to fix it (this module will plague me forever it seems...)

I'm hoping the warning and the error are related. I'll try adding g++ to the conda recipe under run

❯ cat .command.log
/usr/local/lib/python2.7/site-packages/theano/tensor/signal/downsample.py:6: UserWarning: downsample module has been moved to the theano.tensor.signal.pool module.
  "downsample module has been moved to the theano.tensor.signal.pool module.")
Traceback (most recent call last):
  File "/usr/local/bin/deeparg", line 7, in <module>
    from deeparg.entry import main
  File "/usr/local/lib/python2.7/site-packages/deeparg/entry.py", line 10, in <module>
    import deeparg.predict.bin.deepARG as clf
  File "/usr/local/lib/python2.7/site-packages/deeparg/predict/bin/deepARG.py", line 12, in <module>
    from lasagne import layers
  File "/usr/local/lib/python2.7/site-packages/lasagne/__init__.py", line 27, in <module>
    import pkg_resources
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 3251, in <module>
    @_call_aside
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 3235, in _call_aside
    f(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 3264, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 574, in _build_master
    ws = cls()
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 567, in __init__
    self.add_entry(entry)
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 623, in add_entry
    for dist in find_distributions(entry, True):
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2065, in find_on_path
    for dist in factory(fullpath):
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2135, in distributions_from_metadata
    root, entry, metadata, precedence=DEVELOP_DIST,
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2592, in from_location
    py_version=py_version, platform=platform, **kw
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2994, in _reload_version
    md_version = self._get_version()
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2772, in _get_version
    version = _version_from_file(lines)
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2556, in _version_from_file
    line = next(iter(version_lines), '')
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2767, in _get_metadata
    for line in self.get_metadata_lines(name):
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1432, in get_metadata_lines
    return yield_lines(self.get_metadata(name))
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1420, in get_metadata
    value = self._get(path)
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1616, in _get
    with open(path, 'rb') as stream:
IOError: [Errno 13] Permission denied: '/usr/local/lib/python2.7/site-packages/Theano-0.8.2-py2.7.egg-info/PKG-INFO'

Command used and terminal output

No response

Relevant files

No response

System information

No response

make db storage in outdir optional for amrfinderplus

Description of feature

Add the option to store the database downloaded by amrfinderplus_update in the output directory. If the database is stored there, it should be put into the folder databases/amrfinder_db (currently it is in amp/amrfinderplus/db).

Add DeepBGC

Description of feature

This tool predicts BGCs with a different approach than the pipeline's other BGC modules, GECCO and antiSMASH, while still using AI (deep learning, see the Readme on its GitHub). Its output table is compatible with the other tools, so the BGC summarizing tool ("comBGC" or whatever it's gonna be) can easily be adapted to parse DeepBGC tables as well.

Add fargene

Description of feature

Add fargene

[transferred from old repository]

Clean up fARGene output

Make it optional to output temporary files in the fARGene results: tmpdir in each arg class subfolder might not be useful to the user. Default: no tmpdir.

Remove DeepARG

Description of the bug

It's not compatible with containers due to an archaic version of one of the dependencies

Command used and terminal output

No response

Relevant files

No response

System information

No response

Clean output directories per sample

Description of feature

We need to clean up some output directories where multiple files per sample need to be put in individual sample folders.
For example, this:
[image]
is supposed to look like this:
[image]

This is currently necessary for:

  • ampir
  • gecco
  • hamronization
  • prodigal
  • rgi

MultiQC module for macrel

Description of feature

If possible, we should make a MultiQC module for summarising stats for each tool. We should investigate whether macrel/contigs produces a log file (or whether one can be redirected from the console) that we can parse for MultiQC.

MultiQC module for fARGene

Description of feature

If possible, we should make a MultiQC module for summarising stats for each tool. We should investigate whether fARGene produces a log file (or whether one can be redirected from the console) that we can parse for MultiQC.

Make it optional to save databases

Description of feature

In some cases it may not be worth 'publishing' internally downloaded databases, as they take up a lot of space. We should provide an opt-in flag: if it is provided, we also publish (via copy) the database to results/; if not, we leave it in work, where it will be removed with cleanup.
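
The opt-in behaviour could be sketched as below, in Python rather than as pipeline configuration. The function name, the save_databases flag name and the databases/ subfolder under the results directory are assumptions for illustration.

```python
import shutil
from pathlib import Path


def publish_database(db_dir, results_dir, save_databases=False):
    """Copy a downloaded database into the results directory, if requested.

    Returns the target path when the database was copied, or None when the
    opt-in flag is not set and the database stays in the work directory.
    """
    if not save_databases:
        return None
    # Hypothetical layout: <results>/databases/<database folder name>
    target = Path(results_dir) / "databases" / Path(db_dir).name
    shutil.copytree(db_dir, target, dirs_exist_ok=True)
    return target
```

Called without save_databases=True, the function leaves the results directory untouched, which matches the proposed default of not publishing databases.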

AMP Summary Table

Description of feature

Should produce two files:

  • summary (Sample_Name,Tool,No_Hits)
  • aggregated (Sample_Name,Tool,Contig,Hit_Name,Probability,....)
