Comments (10)
This post from Mick Watson might help?
http://www.opiniomics.org/building-a-kraken-database-with-new-ftp-structure-and-no-gi-numbers/
from kraken.
Since I'm only a Kraken user and not a developer, I can't diagnose your exact issues here. But I'm guessing that the standard installation is not going to recognize the extra "fungi" download that you specified. You'll need to double check your changes to the download_genomic_library.sh script, but after you're sure that they are ok, I would try manually specifying the libraries you want to download, as follows:
./kraken-build --download-library bacteria --db kraken_DB
./kraken-build --download-library plasmids --db kraken_DB
./kraken-build --download-library fungi --db kraken_DB
./kraken-build --download-library viruses --db kraken_DB
./kraken-build --download-library human --db kraken_DB
No guarantees that those will all work, but it's worth a try. Once those are all there, do:
kraken-build --download-taxonomy --db kraken_DB
kraken-build --build --db kraken_DB
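If it helps, the manual downloads above can be wrapped in a small loop that reports which libraries failed instead of quietly moving on. This is only a sketch: the function name `download_libraries` and the `KRAKEN_BUILD` variable are made up for illustration, and it assumes `kraken-build` exits non-zero when a download fails.

```shell
# Sketch only: try each library in turn and list the ones that failed.
# KRAKEN_BUILD and download_libraries are illustrative names, not part of Kraken.
KRAKEN_BUILD="${KRAKEN_BUILD:-kraken-build}"

download_libraries() {
    db="$1"
    shift
    failed=""
    for lib in "$@"; do
        # Assumes a non-zero exit status means the download did not complete.
        if ! "$KRAKEN_BUILD" --download-library "$lib" --db "$db"; then
            failed="$failed $lib"
        fi
    done
    if [ -n "$failed" ]; then
        echo "failed:$failed"
        return 1
    fi
    echo "all libraries downloaded"
}
```

You would call it as `download_libraries kraken_DB bacteria plasmids fungi viruses human` and only go on to `--download-taxonomy` and `--build` once nothing is reported as failed.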
from kraken.
I just tried to install Kraken and this still seems to be a problem. Is Kraken dead software that is no longer maintained?
from kraken.
NCBI has rearranged the file structure of all the RefSeq genomes on the FTP server. The modified paths to the necessary *.tar.gz files (which you can manually update in download_genomic_library.sh) are:
(modify line 43): $FTP_SERVER/genomes/archive/old_refseq/Bacteria/all.fna.tar.gz
(modify line 59): $FTP_SERVER/genomes/archive/old_refseq/Plasmids/plasmids.all.fna.tar.gz
These two paths will allow you to download the standard bacteria and plasmid databases, but I'm not sure about the proper updated path for downloading the viruses database.
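Rather than editing by line number, the same change could be made with sed. This is a rough sketch, and it assumes the unmodified script refers to `$FTP_SERVER/genomes/Bacteria/` and `$FTP_SERVER/genomes/Plasmids/` on those two lines, which is a guess based on the new paths above. The function name `patch_refseq_paths` is made up for illustration.

```shell
# Sketch only: print a copy of the script with the two library paths
# pointed at the archive location. The "old" paths matched here are an
# assumption about what the unmodified script contains.
patch_refseq_paths() {
    sed \
        -e 's#\$FTP_SERVER/genomes/Bacteria/#$FTP_SERVER/genomes/archive/old_refseq/Bacteria/#' \
        -e 's#\$FTP_SERVER/genomes/Plasmids/#$FTP_SERVER/genomes/archive/old_refseq/Plasmids/#' \
        "$1"
}
```

Running `patch_refseq_paths download_genomic_library.sh > patched.sh` and diffing the two files shows exactly what changed before you replace the original. Lines that already use the archive path are left alone, so it is safe to run twice.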
from kraken.
Thanks!!
from kraken.
Hi, after modifying those lines, the bacterial and viral databases were downloaded, but the plasmid, fungi (which I added separately), and human ones were not; instead, it goes straight to the build step. Any suggestions?
from kraken.
I'd love to help, but it's difficult without more specific information. If you enter the plasmid ftp path in your browser, does it take you to a list of assemblies? I'd double check that you have the proper path for the plasmids ftp directory.
I would also get familiar with the new NCBI FTP directory structure for these archived versions of the assemblies and see if you can locate the proper paths to the archived fungi and human assemblies.
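The browser check can also be done from the command line with wget's `--spider` option, which tests whether a URL exists without downloading it. A minimal sketch follows; the function name `check_paths` and the `WGET` variable are made up for illustration.

```shell
# Sketch only: report which URLs wget can reach, without downloading them.
# WGET and check_paths are illustrative names.
WGET="${WGET:-wget}"

check_paths() {
    rc=0
    for url in "$@"; do
        # --spider: only check that the resource exists; -q: no progress output.
        if "$WGET" --spider -q "$url"; then
            echo "ok      $url"
        else
            echo "missing $url"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, `check_paths "$FTP_SERVER/genomes/archive/old_refseq/Plasmids/plasmids.all.fna.tar.gz"`, with `FTP_SERVER` set to the same value as in download_genomic_library.sh.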
from kraken.
I modified the script as follows and ran it. No errors were reported, but I did not get the plasmid, fungi, or human databases:
case "$1" in
  "bacteria")
    mkdir -p $LIBRARY_DIR/Bacteria
    cd $LIBRARY_DIR/Bacteria
    if [ ! -e "lib.complete" ]
    then
      rm -f all.fna.tar.gz
      wget $FTP_SERVER/genomes/archive/old_refseq/Bacteria/all.fna.tar.gz
      echo -n "Unpacking..."
      tar zxf all.fna.tar.gz
      rm all.fna.tar.gz
      echo " complete."
      touch "lib.complete"
    else
      echo "Skipping download of bacterial genomes, already downloaded here."
    fi
    ;;
  "Fungi")
    mkdir -p $LIBRARY_DIR/Fungi
    cd $LIBRARY_DIR/Fungi
    if [ ! -e "lib.complete" ]
    then
      rm -f all.fna.tar.gz
      wget $FTP_SERVER/genomes/archive/old_refseq/Fungi/all.fna.tar.gz
      echo -n "Unpacking..."
      tar zxf all.fna.tar.gz
      rm all.fna.tar.gz
      echo " complete."
      touch "lib.complete"
    else
      echo "Skipping download of fungal genomes, already downloaded here."
    fi
    ;;
  "plasmids")
    mkdir -p $LIBRARY_DIR/Plasmids
    cd $LIBRARY_DIR/Plasmids
    if [ ! -e "lib.complete" ]
    then
      rm -f plasmids.all.fna.tar.gz
      wget $FTP_SERVER/genomes/archive/old_refseq/Plasmids/plasmids.all.fna.tar.gz
      echo -n "Unpacking..."
      tar zxf plasmids.all.fna.tar.gz
      rm plasmids.all.fna.tar.gz
      echo " complete."
      touch "lib.complete"
    else
      echo "Skipping download of plasmids, already downloaded here."
    fi
    ;;
  "viruses")
    mkdir -p $LIBRARY_DIR/Viruses
    cd $LIBRARY_DIR/Viruses
    if [ ! -e "lib.complete" ]
    then
      rm -f all.fna.tar.gz
      rm -f all.ffn.tar.gz
      wget $FTP_SERVER/genomes/Viruses/all.fna.tar.gz
      wget $FTP_SERVER/genomes/Viruses/all.ffn.tar.gz
      echo -n "Unpacking..."
      tar zxf all.fna.tar.gz
      tar zxf all.ffn.tar.gz
      rm all.fna.tar.gz
      rm all.ffn.tar.gz
      echo " complete."
      touch "lib.complete"
    else
      echo "Skipping download of viral genomes, already downloaded here."
    fi
    ;;
  "human")
    mkdir -p $LIBRARY_DIR/Human
    cd $LIBRARY_DIR/Human
    if [ ! -e "lib.complete" ]
    then
      # get list of CHR_* directories
      wget --spider --no-remove-listing $FTP_SERVER/genomes/H_sapiens/
      directories=$(perl -nle '/^d/ and /(CHR_\w+)\s*$/ and print $1' .listing)
      rm .listing
      # For each CHR_* directory, get GRCh* fasta gzip file name, d/l, unzip, and add
      for directory in $directories
      do
        wget --spider --no-remove-listing $FTP_SERVER/genomes/H_sapiens/$directory/
        file=$(perl -nle '/^-/ and /\b(hs_ref_GRCh\w+\.fa\.gz)\s*$/ and print $1' .listing)
        [ -z "$file" ] && exit 1
        rm .listing
        wget $FTP_SERVER/genomes/H_sapiens/$directory/$file
        gunzip "$file"
bhaley@NextSeq-Server:/mnt/data/bhaley/Results/kraken_dir$ sudo ./kraken-build --standard --db /mnt/data/bhaley/Results/kraken_DB
Found jellyfish v1.1.11
Skipping download of bacterial genomes, already downloaded here.
Skipping download of viral genomes, already downloaded here.
Kraken build set to minimize disk writes.
Creating k-mer set (step 1 of 6)...
Found jellyfish v1.1.11
Hash size not specified, using '11634429519'
K-mer set created. [1h3m14.378s]
Skipping step 2, no database reduction requested.
Sorting k-mer set (step 3 of 6)...
K-mer set sorted. [4h41m27.198s]
Creating GI number to seqID map (step 4 of 6)...
GI number to seqID map created. [2m41.191s]
Creating seqID to taxID map (step 5 of 6)...
214486 sequences mapped to taxa. [40.859s]
Setting LCAs in database (step 6 of 6)...
Finished processing 214798 sequences
Database LCAs set. [2h46m41.733s]
Database construction complete. [Total: 8h34m45.604s]
bhaley@NextSeq-Server:/mnt/data/bhaley/Results/kraken_dir$
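One detail in the pasted script that may be worth checking: the new branch is labelled "Fungi" while the other branches, and the `--download-library fungi` command suggested earlier in this thread, use lower case. Shell `case` patterns are case-sensitive, so a lower-case name would not reach a capitalised branch. The toy function below (the name `library_branch` is made up for illustration and is not part of Kraken) only demonstrates that behaviour; what the real script does with an unmatched name is not shown in the paste.

```shell
# Sketch only: a cut-down case statement showing that "fungi" does not
# match a branch labelled "Fungi".
library_branch() {
    case "$1" in
        "bacteria") echo "bacteria branch" ;;
        "Fungi")    echo "fungi branch" ;;
        *)          echo "no branch matched" ;;
    esac
}
```

Calling `library_branch fungi` falls through to the catch-all, while `library_branch Fungi` hits the fungi branch. Note also that the log above only mentions bacterial and viral genomes, which suggests the `--standard` run did not request the other libraries at all.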
from kraken.
It doesn't support fungi; it worked for plasmids but not for human. That will do for the time being. Thanks a lot!!
from kraken.
Sorry for the late response. We are working on updating the download scripts so that they allow downloading of mouse and other RefSeq genomes. In the meantime, I would download the genomes using wget or rsync and add them using the kraken-build --add-to-library option, which is described in the Kraken manual.
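As a rough sketch of that workaround, assuming the genomes have already been downloaded and unpacked as FASTA files into one directory: the function name `add_fasta_dir`, the `KRAKEN_BUILD` variable, and the `.fa`/`.fna` extensions are illustrative choices, not something the Kraken scripts define.

```shell
# Sketch only: add every FASTA file in a directory to a Kraken database library.
# KRAKEN_BUILD and add_fasta_dir are illustrative names.
KRAKEN_BUILD="${KRAKEN_BUILD:-kraken-build}"

add_fasta_dir() {
    dir="$1"
    db="$2"
    for fasta in "$dir"/*.fa "$dir"/*.fna; do
        # Skip a pattern that matched no files.
        [ -e "$fasta" ] || continue
        "$KRAKEN_BUILD" --add-to-library "$fasta" --db "$db" || return 1
    done
}
```

For example, `add_fasta_dir ./fungi_genomes kraken_DB` followed by `kraken-build --build --db kraken_DB`. The manual describes what the sequence headers need to contain for Kraken to assign a taxon, so it is worth checking that before adding files.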
from kraken.