hsinnan75 / strainpro

7.0 7.0 3.0 89.48 MB

License: MIT License

C 48.11% C++ 48.48% Makefile 0.92% Shell 1.07% Perl 1.43%

strainpro's Issues

option to specify path to tax dump files

As far as I can tell, the path to the tax dump files downloaded by download_taxonomy.sh is hardcoded into StrainPro-build. It would be helpful if the user could specify the location of the tax dump files, so that custom taxonomies can be used (e.g., GTDB tax dump files instead of NCBI) or so that NCBI tax dump files already downloaded elsewhere (e.g., in a "databases" directory) can be reused.
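One way the requested behaviour could look is sketched below. It is only an illustration of the idea: the STRAINPRO_TAXDUMP_DIR variable, the resolve_taxdump_dir helper and the "taxonomy" default directory are hypothetical names, not options that StrainPro-build currently offers. The check for names.dmp and nodes.dmp assumes an NCBI-style tax dump layout.

```shell
# Hypothetical sketch: STRAINPRO_TAXDUMP_DIR and the "taxonomy" default are
# illustrative assumptions, not existing StrainPro-build options.
# Resolution order: explicit argument, then environment variable, then default.
resolve_taxdump_dir() {
    dir="${1:-${STRAINPRO_TAXDUMP_DIR:-taxonomy}}"
    # names.dmp and nodes.dmp are the core files of an NCBI-style tax dump.
    for f in names.dmp nodes.dmp; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing $dir/$f" >&2
            return 1
        fi
    done
    printf '%s\n' "$dir"
}

# Example (commented out so that sourcing this file has no side effects):
# taxdump=$(resolve_taxdump_dir "$HOME/databases/taxdump") || exit 1
```

With a lookup like this, a GTDB tax dump or an existing NCBI download could be selected per run without touching the default location.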

Cannot access file: bin/StrainPro-rep

For StrainPro v0.9.2, it appears that StrainPro-build requires StrainPro-rep to be located at bin/StrainPro-rep relative to the user's current working directory. The user is therefore required to always call StrainPro-build from the base StrainPro directory.

Why not just provide instructions for adding /path/to/clone/of/strainpro/StrainPro/bin/ to the user's PATH and remove the hard-coded bin/StrainPro-rep path from bin/StrainPro-build? That is what the PATH environment variable is for, and it would make bin/StrainPro-build much more flexible.
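The PATH setup suggested here could be sketched as follows. The prepend_path helper is a hypothetical name, and the directory is the placeholder from the text above. Note that, as described for v0.9.2, this alone would not resolve the error, because StrainPro-build still looks for bin/StrainPro-rep relative to the working directory; until that changes, running from the base StrainPro directory remains the workaround.

```shell
# Prepend a directory to PATH unless it is already listed.
# prepend_path is an illustrative helper, not part of StrainPro.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# Placeholder location taken from the issue text; adjust to the real clone.
prepend_path "/path/to/clone/of/strainpro/StrainPro/bin"
```

Placing the call in a shell startup file such as ~/.bashrc would make the executables available in every new session.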

strainpro-build only one CPU used

I cloned the most recent version of StrainPro yesterday and am running StrainPro-build on the example "ecoli.fa" file with freshly downloaded taxdump files. The build job has been running for ~15 hours with no status updates on what it is doing, and it seems to be using only 1 thread instead of the default 16.

Why does StrainPro-build take so long even with only a few genomes?

Status updates would help users understand what StrainPro-build is doing during long run times.

Regarding thread usage: I run jobs on a cluster, so it is inefficient to request 16 threads for a job that uses only 1 thread most of the time. If possible, it would be helpful to split StrainPro-build into two separate commands: the first for the single-threaded work and the second for the multi-threaded work.
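To confirm an observation like this on a Linux cluster node, the thread count of the running process can be read from /proc. The following is a minimal sketch; thread_count is a hypothetical helper name, and it assumes a Linux system where /proc/PID/status is available.

```shell
# Print the number of threads of a running process (Linux only; reads /proc).
# thread_count is an illustrative helper, not part of StrainPro.
thread_count() {
    awk '/^Threads:/ { print $2 }' "/proc/$1/status"
}

# Example (commented out; requires a running StrainPro-build process):
# thread_count "$(pgrep -n StrainPro-build)"
```

Sampling this value periodically during a build would show how long the job stays single-threaded before the multi-threaded phase starts.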

What is reference-fna

Hi all,
what is reference-fna in this case?

$ bin/StrainPro-build -r reference-fna -o ref_idx [ref_idx is the output folder for BWT indexes]

Thanks for your help :)

StrainPro-build: buffer overflow detected

I'm using StrainPro-build v0.9.0 on Ubuntu 18.04.3. Here's a reproducible example:

git clone https://github.com/hsinnan75/StrainPro.git
cd StrainPro/
make
./download_taxonomy.sh
./download_genomic_library.sh archaea
./bin/StrainPro-build -r database/archaea/library.fna -o database/archaea/ref_idx

The output from StrainPro-build is:

Load taxonomy information.
Get all sequences...
Cluster 542 sequences...
*** buffer overflow detected ***: ./bin/StrainPro-build terminated
Aborted (core dumped)

The server that I'm using has plenty of resources, so the issue isn't a lack of memory or something like that.

nothing is mapping to RefSeq-bacteria

I just mapped 0.5 million 150 bp HiSeq reads from a human gut metagenome (previously profiled with kraken2; the profile looked very normal for the sample type) to RefSeq-bacteria, and I'm getting the following output:

#TaxID              	#Read_count         	#Depth              	#Relative_abundance 	#Confidence_score
@TaxRank:subspecies
@TaxRank:species
@TaxRank:genus
@TaxRank:family
@TaxRank:order
@TaxRank:calss
@TaxRank:phylum
@TaxRank:kingdom

Steps to reproduce

git clone https://github.com/hsinnan75/StrainPro.git
cd StrainPro
./download_taxonomy.sh
./download_genomic_library.sh library bacteria  
# WARNING: the following cmd takes many hours to complete (even with the default 16 threads)!
./bin/StrainPro-build -r database/bacteria/ -o database/bacteria/ref_idx
./bin/StrainPro-map -i database/bacteria/ref_idx -f /path/to/metagenome/read1.fq.gz -o output

If I map the reads with kraken2 against RefSeq, I get a taxonomic distribution that looks normal. I tried mapping another sample with 10 million reads instead of 0.5 million, and then I did get some output:

#TaxID                  #Read_count             #Depth                  #Relative_abundance     #Confidence_score
@TaxRank:subspecies
99822                   2395                    41                      7.43                    0.410877
411470                  5785                    28                      5.07                    0.282264
435591                  218                     25                      4.53                    0.255269
479831                  1269                    21                      3.80                    0.211359
499175                  52                      25                      4.53                    0.252427
536231                  965                     28                      5.07                    0.287117
553973                  50                      46                      8.33                    0.462963
679935                  2386                    30                      5.43                    0.297136
...
@TaxRank:phylum
976                     1162258                 35                      89.80                   1.000000
1224                    5506                    92                      0.43                    1.000000
1239                    122790                  38                      9.49                    1.000000
201174                  3682                    30                      0.28                    1.000000
@TaxRank:kingdom
2                       1294236                 36                      100.00                  1.000000

For the 0.5 million read sample, I'm wondering why I didn't get at least some hits at the kingdom and phylum levels. Even for the 10 million read file, I'm only getting 4 phyla. Is StrainPro just not very sensitive?
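When comparing reports like the two above, it can help to count how many taxa were reported under each rank. The sketch below assumes the report layout shown in this issue ('#'-prefixed column header, '@TaxRank:<rank>' section lines, one data row per taxon); count_rows_per_rank is a hypothetical helper name.

```shell
# Count data rows under each @TaxRank: section of a StrainPro-map report.
# Assumes the layout shown in the issue; count_rows_per_rank is illustrative.
count_rows_per_rank() {
    awk '
        /^@TaxRank:/ {                 # start of a new rank section
            rank = substr($0, 10)      # text after the 9-character prefix
            order[++n] = rank
            count[rank] = 0
            next
        }
        # Skip the column header, blank lines and elided ("...") rows.
        /^#/ || /^[[:space:]]*$/ || /^\.\.\./ { next }
        rank != "" { count[rank]++ }
        END {
            for (i = 1; i <= n; i++)
                printf "%s\t%d\n", order[i], count[order[i]]
        }
    ' "$1"
}
```

Called on a report file, it prints one line per rank with the number of taxa found, so an empty report shows a zero for every rank.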

FYI: @TaxRank:calss is misspelled (it should be @TaxRank:class).
