strainpro's Issues

option to specify path to tax dump files

As far as I can tell, the path to the tax dump files downloaded by download_taxonomy.sh is hardcoded into StrainPro-build. It would be helpful if the user could specify the location of the tax dump files, so that custom taxonomies can be used (e.g., GTDB tax dump files instead of NCBI) or so that NCBI tax dump files already downloaded elsewhere (e.g., in a "databases" directory) can be reused.

Cannot access file: bin/StrainPro-rep

For StrainPro v0.9.2, StrainPro-build appears to require StrainPro-rep to be located at bin/StrainPro-rep relative to the user's current working directory. The user must therefore always call StrainPro-build from the base StrainPro directory.

Why not just provide instructions for adding /path/to/clone/of/strainpro/StrainPro/bin/ to the user's PATH and remove the hard-coded bin/StrainPro-rep path from bin/StrainPro-build? That is what the PATH environment variable is for, and it would make bin/StrainPro-build a lot more flexible.
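The PATH setup suggested above could look like the following minimal sketch. The clone location is the placeholder path from the post, not a real directory, and the lookup only helps once StrainPro-build stops hard-coding bin/StrainPro-rep.

```shell
# Placeholder from the post above; replace with the real clone location.
STRAINPRO_BIN="/path/to/clone/of/strainpro/StrainPro/bin"

# Prepend the bin directory to PATH unless it is already listed.
case ":$PATH:" in
  *":$STRAINPRO_BIN:"*) ;;
  *) PATH="$STRAINPRO_BIN:$PATH"; export PATH ;;
esac

# Report where the shell finds the executables, or warn if it cannot.
command -v StrainPro-build StrainPro-rep \
  || echo "StrainPro executables not found on PATH" >&2
```

Putting the first two statements in a shell startup file such as ~/.bashrc would make the setting persistent.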

strainpro-build only one CPU used

I just cloned the most recent version of StrainPro yesterday. I'm running Strainpro-build on the example "ecoli.fa" file with freshly downloaded taxdump files. The build job has been running for ~15 hours now, with no status updates on what it is doing, and it only seems to be using 1 thread instead of the default 16.

Why does Strainpro-build take so long even with only a few genomes?

It would help to have status updates to understand what strainpro-build is doing during the long run times.

With regard to thread usage: I run jobs on a cluster, so it's inefficient to request 16 threads for a job if only 1 thread is used for the majority of the time. If possible, it would be helpful to break up StrainPro-build into 2 separate commands: the first for the single-threaded work, and the second for the multi-threaded work.
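For anyone wanting to check how many threads a running build actually has, here is a rough sketch that reads the kernel's per-process status file. It assumes a Linux system with /proc mounted, and the helper name thread_count is made up for this example.

```shell
# Print the number of threads of a process by reading /proc (Linux only).
thread_count() {
  awk '/^Threads:/ { print $2 }' "/proc/$1/status"
}

# Demonstration on the current shell. For a real run, pass the PID of the
# build instead, e.g. one found with: pgrep -f StrainPro-build
echo "threads in this shell: $(thread_count $$)"
```

A thread count well above 1 combined with roughly 100% CPU in top would suggest the threads exist but are mostly idle, whereas a count of 1 would indicate a genuinely single-threaded phase.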

What is reference-fna

Hi all,
what is reference-fna in this case?

$ bin/StrainPro-build -r reference-fna -o ref_idx [ref_idx is the output folder for BWT indexes]

Thanks for your help :)

StrainPro-build: buffer overflow detected

I'm using StrainPro-build v0.9.0 on Ubuntu 18.04.3. Here's a reproducible example:

git clone https://github.com/hsinnan75/StrainPro.git
cd StrainPro/
make
./download_taxonomy.sh
./download_genomic_library.sh archaea
./bin/StrainPro-build -r database/archaea/library.fna -o database/archaea/ref_idx

The output from StrainPro-build is:

Load taxonomy information.
Get all sequences...
Cluster 542 sequences...
*** buffer overflow detected ***: ./bin/StrainPro-build terminated
Aborted (core dumped)

The server that I'm using has plenty of resources, so the issue isn't a lack of memory or something like that.

nothing is mapping to RefSeq-bacteria

I just mapped 0.5 million 150 bp HiSeq reads from a human gut metagenome (previously profiled with kraken2; the profile looked very normal for the sample type) to RefSeq-bacteria, and I'm getting the following output:

#TaxID              	#Read_count         	#Depth              	#Relative_abundance 	#Confidence_score
@TaxRank:subspecies
@TaxRank:species
@TaxRank:genus
@TaxRank:family
@TaxRank:order
@TaxRank:calss
@TaxRank:phylum
@TaxRank:kingdom

Steps to reproduce

git clone https://github.com/hsinnan75/StrainPro.git
cd StrainPro
./download_taxonomy.sh
./download_genomic_library.sh library bacteria  
# WARNING: the following cmd takes many hours to complete (even with the default 16 threads)!
./bin/StrainPro-build -r database/bacteria/ -o database/bacteria/ref_idx
./bin/StrainPro-map -i database/bacteria/ref_idx -f /path/to/metagenome/read1.fq.gz -o output

If I map the reads with kraken2 against RefSeq, I get a taxonomic distribution that looks normal. I tried mapping another sample with 10 million reads instead of 0.5 million, and then I did get some output:

#TaxID                  #Read_count             #Depth                  #Relative_abundance     #Confidence_score
@TaxRank:subspecies
99822                   2395                    41                      7.43                    0.410877
411470                  5785                    28                      5.07                    0.282264
435591                  218                     25                      4.53                    0.255269
479831                  1269                    21                      3.80                    0.211359
499175                  52                      25                      4.53                    0.252427
536231                  965                     28                      5.07                    0.287117
553973                  50                      46                      8.33                    0.462963
679935                  2386                    30                      5.43                    0.297136
...
@TaxRank:phylum
976                     1162258                 35                      89.80                   1.000000
1224                    5506                    92                      0.43                    1.000000
1239                    122790                  38                      9.49                    1.000000
201174                  3682                    30                      0.28                    1.000000
@TaxRank:kingdom
2                       1294236                 36                      100.00                  1.000000

For the 0.5 million read sample, I'm wondering why I didn't get at least some hits at the kingdom and phylum levels. Even for the 10 million read file, I'm only getting 4 phyla. Is StrainPro just not very sensitive?
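To compare runs like the two above, the report can be summarized per rank. This is a sketch only: it assumes the layout shown in the output above (a # header line, @TaxRank: section markers, whitespace-separated columns with relative abundance in column 4), and the function name summarize_ranks is made up for this example.

```shell
# Count the taxa listed under each @TaxRank section and sum their
# relative abundances (column 4). Prints: rank, taxa count, abundance sum.
summarize_ranks() {
  awk '
    /^@TaxRank:/ { rank = $1; sub(/^@TaxRank:/, "", rank); order[++n] = rank; next }
    /^#/ || NF < 4 { next }   # skip the header, blank lines, and "..." lines
    { count[rank]++; abund[rank] += $4 }
    END {
      for (i = 1; i <= n; i++)
        printf "%s\t%d\t%.2f\n", order[i], count[order[i]], abund[order[i]]
    }
  ' "$1"
}

# Demonstration on an excerpt of the 10 million read report above.
report=$(mktemp)
cat > "$report" <<'EOF'
#TaxID  #Read_count  #Depth  #Relative_abundance  #Confidence_score
@TaxRank:phylum
976     1162258  35  89.80   1.000000
1224    5506     92  0.43    1.000000
1239    122790   38  9.49    1.000000
201174  3682     30  0.28    1.000000
@TaxRank:kingdom
2       1294236  36  100.00  1.000000
EOF
summarize_ranks "$report"
rm -f "$report"
```

Ranks with no entries, as in the 0.5 million read report, are printed with a count of 0, which makes an empty result easy to spot.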

FYI: @TaxRank:calss is misspelled; it should presumably read @TaxRank:class.
