Hi. I read your paper and ran panfeed on 100 genomes to find genes associated with my metadata.
I installed panfeed v.1.5.1 using conda and it worked well.
I tested panfeed with 100 genomes and filtered at a p-lrt threshold of 0.05.
The second-to-last command in your GitHub **Quick start guide** seems to take a long time to run.
The Quick start guide command my question refers to:
panfeed-get-kmers -a pyseer.tsv -p kmers_to_hashes.tsv.gz -k kmers.tsv.gz | gzip > annotated_kmers.tsv.gz
The command line I used:
panfeed-get-kmers -a pyseer_test -p panfeed2/kmers_to_hashes.tsv -k panfeed2/kmers.tsv -t 0.05 | gzip > annotated_kmers.tsv.gz
12:25:26 - panfeed - 14343 patterns pass the p-value threshold 0.05
12:25:34 - panfeed - Found 2718 gene clusters
12:25:34 - panfeed - Found 318325 k-mers
I filtered with -t 0.1 at the first panfeed step (panfeed-get-clusters -a pyseer.tsv -p panfeed1/kmers_to_hashes.tsv.gz -t 1E-7 > gene_clusters.txt),
and with -t 0.05 at the second panfeed step.
Do I need to filter more strictly when generating gene_clusters.txt?
Also, is it possible to parallelize panfeed-get-kmers to get the result faster?
Thank you very much for your work.