cmkobel / comparem2
Microbial genomes-to-report pipeline
Home Page: https://CompareM2.readthedocs.io
License: GNU General Public License v3.0
Idea inspired by tseemann's shred-reads command for snippy.
The idea is that if you cut the genomes into reads, kraken will get much higher resolution. If I cut them into fragments of the size that the kraken database is built from, the species recognition will become more fine-grained.
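A minimal sketch of the shredding step, assuming plain Python strings as sequences. The function names, the fragment length, and the step size are hypothetical illustrations; they are not taken from snippy or from this pipeline.

```python
def shred(sequence, read_length=100, step=50):
    """Yield (start, fragment) pairs of fixed-length, overlapping fragments.

    read_length and step are illustrative defaults, not values used by
    kraken or snippy. A sequence shorter than read_length yields itself once.
    """
    if read_length <= 0 or step <= 0:
        raise ValueError("read_length and step must be positive")
    last_start = max(len(sequence) - read_length, 0)
    for start in range(0, last_start + 1, step):
        yield start, sequence[start:start + read_length]


def shred_to_fasta(name, sequence, read_length=100, step=50):
    """Return the shredded fragments as FASTA text, one record per fragment."""
    lines = []
    for start, fragment in shred(sequence, read_length, step):
        lines.append(f">{name}_{start}")  # record name carries the start offset
        lines.append(fragment)
    return "\n".join(lines)
```

The resulting FASTA text could then be classified fragment by fragment instead of once per assembly.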
Make a real install script that configures that python script.
The python script should use argparse, and calls the subsequent snakemake pipeline.
With default inputs and outputs, the pipeline can still be started the same way as always (as with the alias).
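A rough sketch of what such a wrapper could look like. The option names, the default directories, the `snakefile` path, and the `input_dir`/`output_dir` config keys are all assumptions made for illustration; only `--snakefile`, `--cores`, `--use-conda`, `--until` and `--config` are existing snakemake options.

```python
import argparse
import shlex


def build_parser():
    """Command line with defaults, so a bare call behaves like the old alias."""
    parser = argparse.ArgumentParser(prog="assemblycomparator2")
    parser.add_argument("--input-dir", default=".")         # hypothetical default
    parser.add_argument("--output-dir", default="output")   # hypothetical default
    parser.add_argument("--until", default=None, help="stop after this rule")
    parser.add_argument("--cores", type=int, default=1)
    return parser


def build_snakemake_command(args, snakefile="snakefile"):
    """Translate parsed arguments into a snakemake argument list."""
    command = ["snakemake", "--snakefile", snakefile,
               "--cores", str(args.cores), "--use-conda"]
    if args.until:
        command += ["--until", args.until]
    # The config keys below are hypothetical; the Snakefile would have to read them.
    command += ["--config", f"input_dir={args.input_dir}",
                f"output_dir={args.output_dir}"]
    return command


if __name__ == "__main__":
    cli_args = build_parser().parse_args()
    # Only print the command here; subprocess.run(command, check=True) would run it.
    print(shlex.join(build_snakemake_command(cli_args)))
```

Keeping the command construction in its own function makes it possible to test the wrapper without having snakemake installed.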
When I set the snakemake conda prefix variable to something, prokka (and possibly also mlst) fail because of problems with perl root and perl moo.
In the installation, the variable is set as such.
echo "export SNAKEMAKE_CONDA_PREFIX=${ASSCOM2_BASE}/conda_base" >> ~/.bashrc
A workaround is to delete the variable from ~/.bashrc and/or ~/.zshrc.
Note: this issue only pertains to using conda (--use-conda)
And then prefill the annotation.
https://docs.conda.io/projects/conda/en/latest/commands/run.html
Might be a cleaner solution.
since --until would work just as well.
I wonder why I never thought about that, as if all repeated jobs must end in a single??
Instead of setting up aliases and system variables, and installing snakemake+mamba, maybe there could just be a conda package that would do all that for you.
Setting up the slurm/pbs stuff should still be manual but that is a different topic.
See "Warning: the file: .Rhistory doesn't look like a fasta file. Consider its inclusion." The file named in the warning is not part of the list made by the shell script.
[assemblycomparator ASCII-art banner] KMA
Report issues at
https://github.com/cmkobel/assemblycomparator/issues
Info: The blastp-identity threshold is set to 95 (default).
(can be changed with the --blastp argument)
These are the 10 assemblies considered for project 2020_07_Ecoli_test_iqtree:
B18_236667.fa
B18_241039.fa
B18_309150.fa
B18_312563.fa
B18_343222.fa
B18_390375.fa
B18_412827.fa
B18_558476.fa
B18_576661.fa
B18_630107.fa
Do you wish to proceed? [y/n] y
proceeding...
activating environment...
validating assembly files...
backing up old content...
archiving content...
These are the jobs:
BLASTP: 95
Warning: the file: .Rhistory doesn't look like a fasta file. Consider its inclusion.
cmp_copy_2020_07_Ecoli_test_iqtree_ shouldrun 0.00% [1/0/0/0]
cmp_kraken2_ shouldrun 0.00% [2/0/0/0]
cmp_abricate_ shouldrun 0.00% [2/0/0/0]
cmp_prokka_2020_07_Ecoli_test_iqtree_ shouldrun 0.00% [2/0/0/0]
cmp_summary_tables_2020_07_Ecoli_test_iqtree shouldrun 88.24% [4/0/0/30]
cmp_mlst_2020_07_Ecoli_test_iqtree shouldrun 83.33% [2/0/0/10]
cmp_roary_95_2020_07_Ecoli_test_iqtree shouldrun 86.96% [3/0/0/20]
cmp_fasttree_2020_07_Ecoli_test_iqtree shouldrun 83.33% [4/0/0/20]
cmp_iqtree_2020_07_Ecoli_test_iqtree shouldrun 83.33% [4/0/0/20]
cmp_roary_plots_2020_07_Ecoli_test_iqtree shouldrun 80.00% [5/0/0/20]
cmp_panito_2020_07_Ecoli_test_iqtree shouldrun 80.00% [5/0/0/20]
cmp_mail_2020_07_Ecoli_test_iqtree shouldrun 78.43% [11/0/0/40]
Do you wish to submit this job list? [y/n]
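One way to avoid the mismatch seen in the log above would be to build the assembly list once and reuse that same list for both the displayed overview and the validation step. The sketch below assumes that filtering by file extension is acceptable; the set of suffixes and the function name are hypothetical.

```python
from pathlib import Path

# Assumed set of accepted extensions; adjust to whatever the pipeline supports.
FASTA_SUFFIXES = {".fa", ".fasta", ".fna", ".fas"}


def split_fasta_candidates(filenames):
    """Split filenames into (accepted, rejected) based on their extension.

    Dotfiles such as .Rhistory have no suffix according to pathlib,
    so they end up in the rejected list.
    """
    accepted, rejected = [], []
    for name in filenames:
        if Path(name).suffix.lower() in FASTA_SUFFIXES:
            accepted.append(name)
        else:
            rejected.append(name)
    return accepted, rejected
```

Printing `accepted` to the user and passing the same list on to validation would keep the two views consistent.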
If the pipeline is to work with conda for local setups anyway, I might as well ditch singularity, and focus all the debugging effort on the conda envs instead.
Solution: Pivot, so the samples are columns, and res. gene calls are rows.
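The pivot could be sketched as follows in plain Python, assuming the calls are available as (sample, gene) pairs. The function name and the presence/absence encoding ("1"/"0") are illustrative choices, not the format used in the report.

```python
def pivot_gene_calls(calls):
    """Turn (sample, gene) pairs into a table with genes as rows, samples as columns.

    Returns a list of rows; the first row is the header.
    Cells are "1" when the gene was called in that sample and "0" otherwise.
    """
    calls = list(calls)
    present = set(calls)
    samples = sorted({sample for sample, _ in calls})
    genes = sorted({gene for _, gene in calls})
    table = [["gene"] + samples]
    for gene in genes:
        row = [gene] + ["1" if (sample, gene) in present else "0" for sample in samples]
        table.append(row)
    return table
```

With many samples and comparatively few resistance genes, this layout grows sideways rather than downwards, which may or may not suit the report.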
Rename docker_imgs to docker_files.
Makes more sense.
.. only works if you do:
export PERL5LIB=/home/cmkobel/assemblycomparator2/conda_base/b51707c89e77d1771344e5a65ab516a5_/lib/perl5/site_perl/5.22.0
But that is never gonna be a good solution.
Almost all is done, but some things are missing
Many of the comparison tools. Mashtree, roary, iqtree....
The tools will just fail, which isn't a big deal. But would be nice if this was handled in a more graceful manner.
Running assemblycomparator2 --until <rule>
for any rule which is not metadata, on an uninitialized directory, the report will fail because the metadata file does not exist.
One solution is to have the metadata output as input in every single rule in the pipeline, such that the metadata is forced to be created. Update: this solution might be suboptimal, as it means that all jobs will have to run again if you update the metadata?
Running a few tests.
export ASSCOM2_KRAKEN2_DB='/project/ClinicalMicrobio/faststorage/database/kraken2/k2_pluspf_20210127'
Maybe each of the sequence_lengths_individual rules should touch it? Or what is the best option?
Often the user knows that the genomes are good stuff. And any2fasta can be skipped. Think about ~1000 genome datasets.
Consider if database setup can be made more parametric. Caveat with docker images is that it may be harder to acquire access to the databases. But maybe there is a way of doing that.
Fix (fixate) versions for everything in the conda environments.
It seems that asscom2 cannot handle spaces in the parent path of the working directory.
Nor can it handle spaces in the filenames of the assemblies.
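If the failure comes from paths being interpolated unquoted into shell commands (an assumption; the cause is not stated here), quoting each path with `shlex.quote` would be one possible remedy. The helper below and its `cp` command are purely illustrative.

```python
import shlex


def build_copy_command(source, destination):
    """Build a shell command string in which both paths are safely quoted."""
    return f"cp {shlex.quote(source)} {shlex.quote(destination)}"


# A path with spaces survives a round trip through shell-style parsing.
example = build_copy_command("my genomes/B18 236667.fa", "output dir/B18.fa")
```

Paths without special characters are left untouched by `shlex.quote`, so existing commands would not change.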
--cpo runs the CPO-relevant analyses (plasmids, resistance, etc)
--agg runs something relevant to Aggregatibacter
etc..
Make a pseudotarget named "cheap" or "fast" that just runs the quick stuff like:
The blastp-threshold should be part of the roary directory name.
Thus running more roary analyses with different thresholds, would be comparable.
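A small sketch of how the directory name could be derived from the threshold. The function name, the `roary` prefix, and the naming scheme are hypothetical.

```python
def roary_directory_name(blastp_identity=95, prefix="roary"):
    """Return a directory name that encodes the blastp identity threshold."""
    if not 0 < blastp_identity <= 100:
        raise ValueError("blastp identity must be in the range (0, 100]")
    return f"{prefix}_{blastp_identity}"
```

Runs with thresholds of 90 and 95 would then end up in `roary_90` and `roary_95` side by side.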
Right now the report runs as a rule. But what about installing R and all the packages directly in the assemblycomparator2 conda environment? In that case it would be possible to produce the report as a final script call. I believe there is an option to define a script that runs when the pipeline has finished, but I simply cannot find it.
Alternatively, it could be defined as an extra step in the alias, so that when snakemake finishes (exit 0) a new call can be made. If rule report is not in the output list, one could simply write ... && snakemake <blabla> --until report
and then the report will come out.
I hope this makes sense when I read it in 2 years.
Pressing a mouse button or arrow keys emulates a "no"-answer.
It would be better if the user could confirm the entered value with an enter stroke.
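A possible shape for a more forgiving prompt, written in Python for illustration (the current prompt lives in a shell script). Unrecognized input, such as the escape sequence sent by an arrow key, leads to the question being asked again instead of counting as "no". All names are hypothetical.

```python
def interpret_answer(raw, default=None):
    """Map raw input to True, False, or None (None means: ask again)."""
    answer = raw.strip().lower()
    if answer in ("y", "yes"):
        return True
    if answer in ("n", "no"):
        return False
    if answer == "" and default is not None:
        return default  # a bare enter stroke accepts the default, if one is set
    return None


def ask(question, default=None, read=input):
    """Repeat the question until the answer is recognized."""
    while True:
        result = interpret_answer(read(f"{question} [y/n] "), default)
        if result is not None:
            return result
```

Because `input` reads a whole line, the answer is only evaluated once the user presses enter.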
Gives the user a chance to fix the problem ahead of time.
Use tabseq for development.
Use tabseq to calculate frequencies over the contigs. Visualize with some beautiful colors.
.. based on the minhash distances.
Update: or on gene absence/presence
After 24 hours, or when everything is done:
write a report, with the results that have been created so far.
If a .gwf directory already exists in the run folder, assemblycomparator can skip everything before running gwf.
Results are not so relevant I think.