
Comments (10)

flashton2003 commented on July 29, 2024

This post from Mick Watson might help?

http://www.opiniomics.org/building-a-kraken-database-with-new-ftp-structure-and-no-gi-numbers/

from kraken.

jbeaulaurier commented on July 29, 2024

I'm only a Kraken user, not a developer, so I can't diagnose your exact issues here. My guess is that the standard installation will not recognize the extra "fungi" download that you specified. Double-check your changes to the download_genomic_library.sh script first. Once you're sure they are OK, I would try manually specifying the libraries you want to download, as follows:

./kraken-build --download-library bacteria --db kraken_DB
./kraken-build --download-library plasmids --db kraken_DB
./kraken-build --download-library fungi --db kraken_DB
./kraken-build --download-library viruses --db kraken_DB
./kraken-build --download-library human --db kraken_DB

No guarantees that those will all work, but it's worth a try. Once those are all there, do:
kraken-build --download-taxonomy --db kraken_DB
kraken-build --build --db kraken_DB
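
The sequence above could be wrapped in a small helper so the library list is easy to change. This is only a sketch: `print_build_plan` is a hypothetical function name (not part of Kraken), it prints the commands rather than running them, and it assumes `kraken-build` is on your PATH, whereas the download commands above call `./kraken-build` from the install directory.

```bash
# Hypothetical helper (not part of Kraken): print the kraken-build commands
# for a database name followed by any number of library names.
print_build_plan() {
  db="$1"
  shift
  # One download command per requested library, in the order given.
  for lib in "$@"; do
    printf 'kraken-build --download-library %s --db %s\n' "$lib" "$db"
  done
  # Taxonomy download and the build step come after the libraries.
  printf 'kraken-build --download-taxonomy --db %s\n' "$db"
  printf 'kraken-build --build --db %s\n' "$db"
}

print_build_plan kraken_DB bacteria plasmids viruses
```

Reviewing the printed plan first shows exactly which libraries will be requested; piping it to `sh` would then run it.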


xapple commented on July 29, 2024

I just tried to install Kraken and this still seems to be a problem. Is Kraken dead software that is no longer maintained?


jbeaulaurier commented on July 29, 2024

NCBI has rearranged the file structure of all the RefSeq genomes on its FTP server. The modified paths to the necessary *.tar.gz files (which you can manually update in download_genomic_library.sh) are:

(modify line 43): $FTP_SERVER/genomes/archive/old_refseq/Bacteria/all.fna.tar.gz
(modify line 59): $FTP_SERVER/genomes/archive/old_refseq/Plasmids/plasmids.all.fna.tar.gz

These two paths will allow you to download the standard bacteria and plasmid databases, but I'm not sure about the proper updated path for downloading the viruses database.
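
If you would rather not edit by line number, the same change could be made with a path substitution. This is a sketch under an assumption: that your copy of the script currently uses `$FTP_SERVER/genomes/Bacteria/` and `$FTP_SERVER/genomes/Plasmids/`, which I have not verified against every Kraken version. `patch_refseq_paths` is a made-up name.

```bash
# Hypothetical filter: read a script on stdin, write a patched copy to stdout.
# ASSUMPTION: the old paths look like .../genomes/Bacteria/ and
# .../genomes/Plasmids/; check your own copy of the script first.
patch_refseq_paths() {
  sed -e 's#/genomes/Bacteria/#/genomes/archive/old_refseq/Bacteria/#' \
      -e 's#/genomes/Plasmids/#/genomes/archive/old_refseq/Plasmids/#'
}

# Example usage (writes a new file and leaves the original untouched):
#   patch_refseq_paths < download_genomic_library.sh > download_genomic_library.patched.sh
```

Because the patterns only match the old form, running the filter a second time on already patched text leaves it unchanged.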


salaheenz commented on July 29, 2024

Thanks!!


salaheenz commented on July 29, 2024

Hi, after modifying those lines, the bacterial and viral databases were downloaded, but not the plasmid, fungi (which I added separately), or human ones. Instead, it went straight to the build step. Any suggestions?


jbeaulaurier commented on July 29, 2024

I'd love to help, but it's difficult without more specific information. If you enter the plasmid ftp path in your browser, does it take you to a list of assemblies? I'd double check that you have the proper path for the plasmids ftp directory.

I would suggest familiarizing yourself with the new NCBI FTP directory structure for these archived versions of the assemblies, then seeing whether you can locate the proper paths to the archived fungi and human assemblies.
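
The browser check can also be done from the command line. A minimal sketch, assuming `wget` is installed and the machine has network access: `ftp_url` and `remote_exists` are hypothetical helper names, and `$FTP_SERVER` stands for whatever value your copy of the script defines.

```bash
# Hypothetical helper: join a server root and a relative path into one URL,
# dropping one trailing slash from the root and one leading slash from the path.
ftp_url() {
  printf '%s/%s\n' "${1%/}" "${2#/}"
}

# Hypothetical helper: succeed (exit 0) if the remote file or directory can be
# reached. --spider checks for existence without downloading anything.
remote_exists() {
  wget --spider --quiet "$1"
}

# Example usage (needs network access, so it is left commented out):
#   url="$(ftp_url "$FTP_SERVER" genomes/archive/old_refseq/Plasmids/plasmids.all.fna.tar.gz)"
#   remote_exists "$url" && echo "found: $url" || echo "missing: $url"
```

Running the check for each library path before starting a download makes it clearer which paths still need updating.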


salaheenz commented on July 29, 2024

I modified the script this way and ran it. There were no errors, but I did not get the plasmid, fungi, or human databases:

case "$1" in
  "bacteria")
    mkdir -p $LIBRARY_DIR/Bacteria
    cd $LIBRARY_DIR/Bacteria
    if [ ! -e "lib.complete" ]
    then
      rm -f all.fna.tar.gz
      wget $FTP_SERVER/genomes/archive/old_refseq/Bacteria/all.fna.tar.gz
      echo -n "Unpacking..."
      tar zxf all.fna.tar.gz
      rm all.fna.tar.gz
      echo " complete."
      touch "lib.complete"
    else
      echo "Skipping download of bacterial genomes, already downloaded here."
    fi
    ;;

  "Fungi")
    mkdir -p $LIBRARY_DIR/Fungi
    cd $LIBRARY_DIR/Fungi
    if [ ! -e "lib.complete" ]
    then
      rm -f all.fna.tar.gz
      wget $FTP_SERVER/genomes/archive/old_refseq/Fungi/all.fna.tar.gz
      echo -n "Unpacking..."
      tar zxf all.fna.tar.gz
      rm all.fna.tar.gz
      echo " complete."
      touch "lib.complete"
    else
      echo "Skipping download of fungal genomes, already downloaded here."
    fi
    ;;

  "plasmids")
    mkdir -p $LIBRARY_DIR/Plasmids
    cd $LIBRARY_DIR/Plasmids
    if [ ! -e "lib.complete" ]
    then
      rm -f plasmids.all.fna.tar.gz
      wget $FTP_SERVER/genomes/archive/old_refseq/Plasmids/plasmids.all.fna.tar.gz
      echo -n "Unpacking..."
      tar zxf plasmids.all.fna.tar.gz
      rm plasmids.all.fna.tar.gz
      echo " complete."
      touch "lib.complete"
    else
      echo "Skipping download of plasmids, already downloaded here."
    fi
    ;;

  "viruses")
    mkdir -p $LIBRARY_DIR/Viruses
    cd $LIBRARY_DIR/Viruses
    if [ ! -e "lib.complete" ]
    then
      rm -f all.fna.tar.gz
      rm -f all.ffn.tar.gz
      wget $FTP_SERVER/genomes/Viruses/all.fna.tar.gz
      wget $FTP_SERVER/genomes/Viruses/all.ffn.tar.gz
      echo -n "Unpacking..."
      tar zxf all.fna.tar.gz
      tar zxf all.ffn.tar.gz
      rm all.fna.tar.gz
      rm all.ffn.tar.gz
      echo " complete."
      touch "lib.complete"
    else
      echo "Skipping download of viral genomes, already downloaded here."
    fi
    ;;

  "human")
    mkdir -p $LIBRARY_DIR/Human
    cd $LIBRARY_DIR/Human
    if [ ! -e "lib.complete" ]
    then
      # get list of CHR_* directories
      wget --spider --no-remove-listing $FTP_SERVER/genomes/H_sapiens/
      directories=$(perl -nle '/^d/ and /(CHR_\w+)\s*$/ and print $1' .listing)
      rm .listing

      # For each CHR_* directory, get GRCh* fasta gzip file name, d/l, unzip, and add
      for directory in $directories
      do
        wget --spider --no-remove-listing $FTP_SERVER/genomes/H_sapiens/$directory/
        file=$(perl -nle '/^-/ and /\b(hs_ref_GRCh\w+\.fa\.gz)\s*$/ and print $1' .listing)
        [ -z "$file" ] && exit 1
        rm .listing
        wget $FTP_SERVER/genomes/H_sapiens/$directory/$file
        gunzip "$file"

bhaley@NextSeq-Server:/mnt/data/bhaley/Results/kraken_dir$ sudo ./kraken-build --standard --db /mnt/data/bhaley/Results/kraken_DB
Found jellyfish v1.1.11
Skipping download of bacterial genomes, already downloaded here.
Skipping download of viral genomes, already downloaded here.
Kraken build set to minimize disk writes.
Creating k-mer set (step 1 of 6)...
Found jellyfish v1.1.11
Hash size not specified, using '11634429519'
K-mer set created. [1h3m14.378s]
Skipping step 2, no database reduction requested.
Sorting k-mer set (step 3 of 6)...
K-mer set sorted. [4h41m27.198s]
Creating GI number to seqID map (step 4 of 6)...
GI number to seqID map created. [2m41.191s]
Creating seqID to taxID map (step 5 of 6)...
214486 sequences mapped to taxa. [40.859s]
Setting LCAs in database (step 6 of 6)...
Finished processing 214798 sequences
Database LCAs set. [2h46m41.733s]
Database construction complete. [Total: 8h34m45.604s]
bhaley@NextSeq-Server:/mnt/data/bhaley/Results/kraken_dir$


salaheenz commented on July 29, 2024

It doesn't support fungi. It worked for plasmids but not for human. This will do for the time being. Thanks a lot!
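
One detail that may be worth ruling out in the modified script: shell `case` patterns are case-sensitive, and the added branch is labeled "Fungi" while the other labels are lowercase. Whether this explains the missing fungi download depends on what argument the script actually receives, which the thread doesn't show. A minimal illustration (`match_library` is a made-up name, not part of Kraken):

```bash
# Mimics the branch labels from the modified script, with the bodies
# replaced by simple messages.
match_library() {
  case "$1" in
    "bacteria") echo "bacteria branch" ;;
    "Fungi")    echo "fungi branch" ;;
    *)          echo "no branch matched" ;;
  esac
}

match_library fungi   # prints "no branch matched"
match_library Fungi   # prints "fungi branch"
```

If the argument arrives in lowercase, renaming the label to "fungi" would make it consistent with the other branches.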


jenniferlu717 commented on July 29, 2024

Sorry for the late response. We are working on updating the download scripts so that they allow downloading of mouse and other RefSeq genomes. In the meantime, I would download the genomes using wget or rsync and add them using the kraken-build --add-to-library option, which is described in the Kraken manual.
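
For a directory of already downloaded FASTA files, the add step might look like the sketch below. It only prints the commands so they can be reviewed first. `print_add_to_library_plan` is a hypothetical name, the `.fa` and `.fna` extensions are an assumption about how the files are named, and `kraken-build` is assumed to be on your PATH.

```bash
# Hypothetical helper (not part of Kraken): print one
# "kraken-build --add-to-library" command per FASTA file in a directory.
# ASSUMPTION: FASTA files end in .fa or .fna.
print_add_to_library_plan() {
  dir="$1"
  db="$2"
  for f in "$dir"/*.fa "$dir"/*.fna; do
    [ -e "$f" ] || continue   # skip a pattern that matched no files
    printf 'kraken-build --add-to-library %s --db %s\n' "$f" "$db"
  done
}

# Example usage (the directory name is a placeholder):
#   print_add_to_library_plan ./my_genomes kraken_DB
```

After the files are added, the database still needs the usual `kraken-build --build --db kraken_DB` step.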

