phylo42 / pewo

Phylogenetic Placement Evaluation Workflows : Benchmark placement software and different reference trees

License: MIT License

R 20.95% Python 77.62% Shell 1.43%

benchmarking metabarcoding metagenomics phylo-kmers phylogenetic-placement placement snakemake taxonomic-classification workflow

pewo's People

matthiasblanke stardisblue frederic-mahe pierrebarbera jianshu93 benoitpenel computations thekswenson

pewo's Issues

add BranchDistance computation to PAC procedure

Currently, the PEWO PAC procedure computes Node Distance and expected Node Distance.
Another measure (already used by some authors) would be Branch Distance, i.e. the actual branch length separating the expected and observed placements.
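
A minimal sketch of what such a measure could look like, assuming the tree is stored as a child-to-parent map with non-negative branch lengths and that each placement is approximated by the node below the placed branch. The toy tree and the names PARENT, path_to_root and branch_distance are illustrative assumptions, not part of PEWO.

```python
# Toy rooted tree: child -> (parent, branch length). Hypothetical data.
PARENT = {
    "A": ("X", 0.1),
    "B": ("X", 0.2),
    "X": ("root", 0.3),
    "C": ("root", 0.5),
}


def path_to_root(node):
    """Map each ancestor of node (and node itself) to its cumulative distance from node."""
    dist, out = 0.0, {node: 0.0}
    while node in PARENT:
        parent, length = PARENT[node]
        dist += length
        out[parent] = dist
        node = parent
    return out


def branch_distance(expected, observed):
    """Sum of branch lengths on the path between two nodes of the tree."""
    up_e = path_to_root(expected)
    up_o = path_to_root(observed)
    # With non-negative lengths, the lowest common ancestor minimizes the sum of both legs.
    return min(up_e[n] + up_o[n] for n in up_e if n in up_o)
```

A real implementation would also have to decide how the position along each placed branch (distal and pendant lengths) enters the distance; the sketch ignores that.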

javac encoding issues

Hi!

I just ran into an issue with javac (via conda) choosing US-ASCII as the encoding, causing the build to fail:

[javac] /home/folder/PEWO/scripts/java/PEWO_java/lib/RAPPAS/src/inputs/FASTQPointer.java:84: error: unmappable character (0xA9) for encoding US-ASCII
[javac]                 //elimination des character sp??ciaux
[javac]                                                ^
[javac] 68 errors
[javac] 1 warning

This may be a very system-setup-dependent issue affecting pretty much no one. Regardless, I thought I'd post the issue and a quick fix anyway.

Fix:

export JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF8

and re-run the PEWO installer.

Pierre

Count is reserved for internal use

Under newer versions of snakemake (e.g. mine is 7.24.2), the eval_accuracy workflow fails with:

invalid name for input, output, wildcard, params or log: count is reserved for internal use
  File "/home/nikolai/dev/pewo_workflow/eval_accuracy.smk", line 27, in <module>
  File "/home/nikolai/dev/pewo_workflow/rules/op/operate_prunings.smk", line 72, in <module>

It does not happen with snakemake 5.10.0 (the version PEWO requires in envs/environment.yaml and uses by default). It seems to happen as early as 5.18.1. We just need to rename the parameter count to something else, like pruning_count, when updating PEWO's dependencies.
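
As far as we understand, the error arises because Snakemake exposes params by attribute on a list-like container, so a name such as count would shadow a built-in list method. The sketch below mimics that kind of check in plain Python; NamedList and set_name are simplified stand-ins written for illustration, not Snakemake's actual code.

```python
class NamedList(list):
    """Simplified stand-in for a container that exposes items by attribute name."""

    def set_name(self, name, value):
        # Refuse names that already exist on the class, e.g. list.count.
        if hasattr(type(self), name):
            raise AttributeError(f"{name} is reserved for internal use")
        self.append(value)
        setattr(self, name, value)


params = NamedList()
params.set_name("pruning_count", 10)  # accepted: no clash with list methods

try:
    params.set_name("count", 10)  # rejected: clashes with list.count
except AttributeError as err:
    message = str(err)
```

Under this reading, any name that is not already an attribute of the container class, such as pruning_count, is safe.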

add documentation relative to CI tests

@nromashchenko commented in #4 :

Our Travis CI runs two pipelines travis/tests/1_..., travis/tests/2_... on every push, making sure it's possible to build and run those toy examples in the isolated environment.
If a developer does not change the config files of those pipelines, their new app is not tested by CI.

What needs to be done:

add a section relative to CI in developer documentation
add a concrete example; it could be based directly on the AppSPAM example, e.g. commit 8a4af9d discussed in #4

Trouble with eval_resources_plots.R when running the pipeline with a single software

I just ran into an issue with eval_resources_plots.R for a run that only tests epa. It looks like the script presupposes that results for rappas and the others are always there; is there some easy way of fixing/hacking this to work? See the warnings and errors below.

Cheers,
Pierre

[1] "OP:hmmer-align"
Warning message:
In analyses["epa"] <- c("hmmer-align", "epa-placement") :
  number of items to replace is not a multiple of replacement length
Warning message:
In analyses["epang_h1"] <- c("hmmer-align", "epang-h1-placement") :
  number of items to replace is not a multiple of replacement length
Warning message:
In analyses["epang_h2"] <- c("hmmer-align", "epang-h2-placement") :
  number of items to replace is not a multiple of replacement length
Warning message:
In analyses["epang_h3"] <- c("hmmer-align", "epang-h3-placement") :
  number of items to replace is not a multiple of replacement length
Warning message:
In analyses["epang_h4"] <- c("hmmer-align", "epang-h4-placement") :
  number of items to replace is not a multiple of replacement length
Warning message:
In analyses["pplacer"] <- c("hmmer-align", "pplacer-placement") :
  number of items to replace is not a multiple of replacement length
Warning message:
In analyses["apples"] <- c("hmmer-align", "apples-placement") :
  number of items to replace is not a multiple of replacement length
Warning message:
In analyses["rappas"] <- c("ansrec", "rappas-dbbuild", "rappas-placement") :
  number of items to replace is not a multiple of replacement length
Error in fix.by(by.x, x) : 'by' must specify uniquely valid columns
Calls: merge ... merge.default -> merge -> merge.data.frame -> fix.by
Execution halted

define_resource_inputs copies input files

The define_resource_inputs rule copies input files to:

A/
T/
R/
G/

For example, in the case of eval_resources.smk, R/ and G/ will each just have one copy of the same query file. If the query file is big enough, this can be problematic. Since output files for rules can be symlinks, we should use them instead.
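
A minimal sketch of the symlink approach, assuming the rule body can run Python. The helper name link_input and the file names are hypothetical; only the R/ and G/ directories come from the description above.

```python
import os
import tempfile


def link_input(src, dst):
    """Create dst as a symlink to src instead of copying the file."""
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    if os.path.lexists(dst):
        os.remove(dst)
    # Use an absolute target so the link stays valid from any working directory.
    os.symlink(os.path.abspath(src), dst)


with tempfile.TemporaryDirectory() as workdir:
    query = os.path.join(workdir, "queries.fasta")
    with open(query, "w") as handle:
        handle.write(">q1\nACGT\n")

    # One link per target directory; the query data exists on disk only once.
    for sub in ("R", "G"):
        link_input(query, os.path.join(workdir, sub, "queries.fasta"))

    linked = os.path.join(workdir, "G", "queries.fasta")
    is_link = os.path.islink(linked)
    with open(linked) as handle:
        content = handle.read()
```

Reading through the link returns the original content, so downstream rules should not need to change.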

Add protein support for APPLES

For the latest version of APPLES (v2.0.5), PEWO should use the -p flag for amino acid sequences. Currently it runs APPLES in DNA mode, which makes APPLES silently produce nonsense results.
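
One way to wire this in is sketched below, assuming the workflow already assembles the rest of the APPLES command line and knows the sequence type. The function name apples_command and the "nucl"/"amino" values are hypothetical; only the -p flag comes from the description above.

```python
def apples_command(base_args, states):
    """Return the APPLES argument list, adding -p for amino acid data.

    base_args: arguments already assembled by the workflow (hypothetical).
    states: "nucl" or "amino" (hypothetical config values).
    """
    if states not in ("nucl", "amino"):
        # Fail loudly rather than silently falling back to DNA mode.
        raise ValueError(f"unknown states value: {states!r}")
    args = list(base_args)
    if states == "amino":
        args.append("-p")
    return args


dna_cmd = apples_command(["run_apples.py"], "nucl")
protein_cmd = apples_command(["run_apples.py"], "amino")
```

Rejecting unknown values is a deliberate choice here, since the failure mode described above is a silent one.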

Update wiki: examples 1-4 ND values

Due to a recently fixed bug in the ND computation (and a previous one that caused reported distances to be increased by one), we need to update the tutorials where they show the resulting NDs you get from running the examples.

Support multiple input files for "query_file"

scripts/java/PEWO_java/dist/PEWO.jar raises java.io.InvalidClassException

Hi,

The following job fails:

java -cp scripts/java/PEWO_java/dist/PEWO.jar DistanceGenerator_LITE2 /home/balaban/PEWO/examples/1_fast_test_of_accuracy_procedure/run RAPPAS,EPANG,PPLACER &> /home/balaban/PEWO/examples/1_fast_test_of_accuracy_procedure/run/logs/compute_nd.log

with the following exception:

"~/PEWO/examples/1_fast_test_of_accuracy_procedure/run/logs/compute_nd.log" 34L, 2389C 2,1 Top
ARGS: workDir [list_of_tested_software_directories,comma-separated]
example: /path/to/pewo_workdir EPANG,RAPPAS,PPLACER
scripts/java/PEWO_java/dist/PEWO.jar
workDir: /home/balaban/PEWO/examples/1_fast_test_of_accuracy_procedure/run
Loading /home/balaban/PEWO/examples/1_fast_test_of_accuracy_procedure/run/expected_placements.bin
Loading NxIndex
Loading pruningIndex
Loading expected placements
Loading trees
Jan 14, 2021 8:45:06 AM DistanceGenerator_LITE2 main
SEVERE: null
java.io.InvalidClassException: javax.swing.JComponent; local class incompatible: stream classdesc serialVersionUID = 3742318830738515599, local class serialVersionUID = 4588530037560142483
at java.base/java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:689)
at java.base/java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1903)
at java.base/java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1772)
at java.base/java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1903)
at java.base/java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1772)
at java.base/java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1903)
at java.base/java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1772)
at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1594)
at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:430)
at java.base/java.util.ArrayList.readObject(ArrayList.java:928)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at java.base/java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1160)
at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2216)
at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2087)
at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1594)
at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:430)
at DistanceGenerator_LITE2.main(Unknown Source)

I tried reinstalling PEWO and rebuilding the jar file but it didn't resolve the issue.

Accuracy procedure no longer uses rappas_db_in_ram but db_build, which is slower

We need to check the outputs set by the workflow when rappas is used.
The previous version used rappas_db_in_ram for accuracy, and separate db_build and placement steps for resources (the latter mode being slower).

Set up GitHub Actions

We certainly need at least some level of testing and CI here, since the project is developed by more than zero people. We should start with:

run workflows with all the example data we have
add an example for amino acid sequences

Conflict between local pip and conda pip for package taxtastic

If taxtastic is installed locally via pip, it interferes with the taxtastic installation of the conda environment.
Conda has its own internal pip command but, somehow, it cannot isolate the two.

Example: taxtastic 0.8.5 installed locally via pip + 0.8.11 installed via the PEWO conda environment:

Traceback (most recent call last):
  File "/home/benclaff/.local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 578, in _build_master
    ws.require(__requires__)
  File "/home/benclaff/.local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 895, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/home/benclaff/.local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 786, in resolve
    raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.VersionConflict: (taxtastic 0.8.5 (/home/benclaff/.local/lib/python3.6/site-packages), Requirement.parse('taxtastic==0.8.11'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/benclaff/softwares/miniconda3/envs/PEWO/bin/taxit", line 6, in <module>
    from pkg_resources import load_entry_point
  File "/home/benclaff/.local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3112, in <module>
    @_call_aside
  File "/home/benclaff/.local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3096, in _call_aside
    f(*args, **kwargs)
  File "/home/benclaff/.local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3125, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/home/benclaff/.local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 580, in _build_master
    return cls._build_from_requirements(__requires__)
  File "/home/benclaff/.local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 593, in _build_from_requirements
    dists = ws.resolve(reqs, Environment())
  File "/home/benclaff/.local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 781, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'taxtastic==0.8.11' distribution was not found and is required by the application
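
The traceback shows pkg_resources being loaded from ~/.local even though taxit runs from the conda environment, which suggests the per-user site-packages directory is on the interpreter's path. A possible workaround, sketched here rather than tested with PEWO, is Python's PYTHONNOUSERSITE environment variable, which disables that directory. The helper name isolated_env is hypothetical.

```python
import os
import subprocess
import sys


def isolated_env():
    """Copy the current environment and disable the per-user site-packages."""
    env = dict(os.environ)
    env["PYTHONNOUSERSITE"] = "1"
    return env


# Ask a child interpreter whether it would still use the user site directory.
result = subprocess.run(
    [sys.executable, "-c", "import site; print(site.ENABLE_USER_SITE)"],
    env=isolated_env(),
    capture_output=True,
    text=True,
    check=True,
)
user_site_enabled = result.stdout.strip()
```

Exporting the same variable in the shell before running the workflow would have the same effect on every Python process it starts.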

Get rid of psiblast2fasta

According to the HMMER 3.3.2 guide, it supports aligned FASTA output with --outformat AFA. There is no need for psiblast output and format juggling. The main reason to do this is that psiblast2fasta is quite memory-inefficient, and it makes the alignment steps in all workflows unrealistically RAM-greedy.

INSTALL.sh cannot find conda installation

I have miniconda 4.6.10 installed on my machine. In this version, the "conda" command does not execute a binary file directly; it is a function added to .bashrc during installation. INSTALL.sh checks whether conda is installed using the "command -v" command. This command cannot locate functions defined in .bashrc. As a result, PEWO installation fails with the error message:
PEWO installer: Command 'conda' not found.
PEWO installer: This is a requirement to PEWO installation. See documentation.