microbial-pangenomes-lab / panfeed

A k-mer counter that streams gene-cluster specific k-mers, while keeping k-mer positional information. Useful for microbial GWAS analyses with higher interpretability.

License: Apache License 2.0

Python 93.98% Shell 6.02%

panfeed's Issues

possibility to parallelize panfeed-get-kmers.py

Hi. I read your paper and used panfeed on 100 genomes to find genes associated with my metadata.

I installed panfeed v.1.5.1 using conda and it worked well.
I tested panfeed with 100 genomes and filtered at an lrt p-value of 0.05.

The second-to-last command in your GitHub **Quick start guide** seems to take a long time to run.

Quick start guide regarding my question

panfeed-get-kmers -a pyseer.tsv -p kmers_to_hashes.tsv.gz -k kmers.tsv.gz | gzip > annotated_kmers.tsv.gz

command line I used

panfeed-get-kmers -a pyseer_test -p panfeed2/kmers_to_hashes.tsv -k panfeed2/kmers.tsv -t 0.05 | gzip > annotated_kmers.tsv.gz
12:25:26 - panfeed - 14343 patterns pass the p-value threshold 0.05
12:25:34 - panfeed - Found 2718 gene clusters
12:25:34 - panfeed - Found 318325 k-mers

I filtered with -t 0.1 at the first panfeed step (panfeed-get-clusters -a pyseer.tsv -p panfeed1/kmers_to_hashes.tsv.gz -t 1E-7 > gene_clusters.txt),
and with -t 0.05 at the second panfeed step.

Do I have to filter more strictly when generating gene_clusters.txt?
Also, is it possible to parallelize panfeed-get-kmers to get the result faster?
Thank you very much for your work.
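
One way to judge how strict the threshold needs to be is to count how many patterns pass at several cut-offs before running the slow step. The sketch below is not part of panfeed. It assumes the association table is tab-separated with a header row that contains an `lrt-pvalue` column (the name pyseer uses by default), and that a pattern passes when its p-value is at or below the threshold. The function name `count_passing` and the example rows are hypothetical.

```python
import csv
import io


def count_passing(handle, thresholds, column="lrt-pvalue"):
    """Count rows whose p-value is at or below each threshold.

    `handle` is any iterable of text lines holding a tab-separated table
    with a header row. The column name is an assumption (pyseer default).
    """
    counts = {t: 0 for t in thresholds}
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        try:
            pvalue = float(row[column])
        except (KeyError, TypeError, ValueError):
            # Skip rows with a missing or non-numeric p-value.
            continue
        for t in thresholds:
            if pvalue <= t:
                counts[t] += 1
    return counts


# Made-up rows for illustration only; these are not real results.
example = io.StringIO(
    "variant\taf\tlrt-pvalue\n"
    "pattern_a\t0.10\t1e-9\n"
    "pattern_b\t0.25\t0.03\n"
    "pattern_c\t0.40\t0.2\n"
)
counts = count_passing(example, [0.05, 1e-7])
for threshold, n in counts.items():
    print(f"{threshold:g}\t{n}")
```

For a real table, the same function can be given an open file instead of the in-memory example, for instance `open("pyseer.tsv")`, or `gzip.open(path, "rt")` for a compressed file.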
