jenniferlu717 / krakentools
KrakenTools provides individual scripts to analyze Kraken/Kraken2/Bracken/KrakenUniq output files
License: GNU General Public License v3.0
Hi,
If I were to provide a genus ID, would using --exclude --include-children --include-parent manage to remove:
We need the counts from, for example, the parent Kingdom to be readjusted by the same number of reads being removed (but not for the entire kingdom to be removed, obviously).
Also, does the script adjust every Kraken output file, or does it only adjust the classified reads? In other words, do I need to regenerate a Kraken report afterwards using make_kreport.py?
By the way, thanks a lot for this tool, it's definitely an indispensable complement to Kraken, it has solved so many roadblocks we faced!
Hi,
This is my first time analysing metagenomics data from a shotgun experiment, and before posting this issue I tried some strategies to work around the problem described below.
I would like to construct a phyloseq object from Kraken2 + Bracken output that contains a phylogenetic tree, so that I can calculate UniFrac distances in downstream analysis.
Could you recommend some software for creating a phylogenetic tree from Kraken2 output?
Thanks in advance for your help/hints,
Magí.
The installation instructions state that the scripts can be used directly; however, after a clean install, there is an error:
Traceback (most recent call last):
File "/usr/local/bin/extract_kraken_reads.py", line 55, in <module>
from Bio import SeqIO
ModuleNotFoundError: No module named 'Bio'
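The traceback says that the `Bio` module, which is the import name of the Biopython package, is not installed in the Python environment that runs the script. A minimal pre-flight check could look like the sketch below; `missing_modules` is a hypothetical helper written for this illustration and is not part of KrakenTools.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["Bio"])
    if missing:
        # 'Bio' is provided by the 'biopython' package on PyPI.
        print("Missing modules:", ", ".join(missing), "(try: pip install biopython)")
    else:
        print("Biopython is importable.")
```

Running this with the same interpreter that runs extract_kraken_reads.py shows whether the import problem is in that environment.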
hi,
I am trying to download the kraken database
using this command
kraken-build --standard --threads 24 --db standard
but I got an error, can you please help me
Error Message
rsync_from_ncbi.pl: unexpected FTP path (new server?) for https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/762/265/GCF_000762265.1_ASM76226v1
Hi,
I was trying to extract reads with the script, but it throws this error every time. I tried to track down the bug but couldn't. Please have a look. Kraken2 was run on the reads.
cmd line: python3 extract_kraken_reads.py -k output.txt -s1 R1.fastq.gz -s2 R2.fastq.gz -t 1 -o R1_c_1.fastq -o2 R2_c_2.fastq -r report.txt --include-children
STEP: Parsing report file report.txt
Traceback (most recent call last):
File "extract_kraken_reads.py", line 395, in
main()
File "extract_kraken_reads.py", line 206, in main
while level_num != (prev_node.level_num + 1):
UnboundLocalError: local variable 'prev_node' referenced before assignment
Hi!
I'm trying to use the output of kreport2mpa for humann3; however, it needs the mpa file to be in metaphlan3 output style. Is there any way to implement this in kreport2mpa?
Edit: I've had a closer look at the output formats and it seems to only be a matter of getting the additional columns (NCBI_taxid and additional species) and adding them in. From what I can see, it would be difficult to just manually lift them using a pattern matching process when intermediate levels are omitted. I think it will need to be done at the line stripping level, although I'm unsure how your code deals with it exactly. I will try to finagle the code and will let you know if it works.
I imagine it will be much quicker on your end though, so please do help if you think this is possible at all.
Thanks!
Hi,
Thank you for your scripts; they have been extremely helpful. However, I ran kreport2mpa.py with --display-header on 63 samples, and combine_mpa.py, which according to the manual can use the header automatically, reported errors:
Number of files to parse: 63
dir="."
for report in $(find $dir -maxdepth 1 -name "*_report.txt"|sort -V);do
report=${report//_report.txt/};
#ktImportTaxonomy -m 3 -t 5 ${report}_report.txt -o ${report}_krona.html
kreport2mpa.py --display-header --no-intermediate-ranks -r ${report}_report.txt -o ${report}_mpa.txt
done
combine_mpa.py -i *_report_mpa.txt -o combined_mpa.tmp
Traceback (most recent call last):
File "/opt/KrakenTools/combine_mpa.py", line 142, in <module>
main()
File "/opt/KrakenTools/combine_mpa.py", line 83, in main
[classification, val] = line.strip().split('\t')
ValueError: too many values to unpack
This is my bash script for running it. (I tried sorting my files with sort -V in bash and replacing the header manually with the sample names in input order, without --display-header, but the output order seems to be different anyway.)
dir="."
for report in $(find $dir -maxdepth 1 -name "*_report.txt"|sort -V);do
report=${report//_report.txt/};
#ktImportTaxonomy -m 3 -t 5 ${report}_report.txt -o ${report}_krona.html
kreport2mpa.py --no-intermediate-ranks -r ${report}_report.txt -o ${report}_mpa.txt
done
file_list=$(echo *_report.txt|tr " " "\n" | sort -V |tr "\n" " " )
echo *_report.txt | tr " " "\n" | cut -d "_" -f1 | sort -V > sample.header
echo "#Classification" > tmp.tmp
cat tmp.tmp sample.header | tr "\n" "\t" > table.header
combine_mpa.py -i $file_list -o combined_mpa.tmp
sed 1d combined_mpa.txt > table.tmp
cat table.header table.tmp > combined_mpa.txt
rm *.header
rm *.tmp
Traceback (most recent call last):
File "/home/davidmartins/miniconda3/envs/statistics/bin/kreport2mpa.py", line 173, in
main()
File "/home/davidmartins/miniconda3/envs/statistics/bin/kreport2mpa.py", line 128, in main
report_vals = process_kraken_report(line)
File "/home/davidmartins/miniconda3/envs/statistics/bin/kreport2mpa.py", line 71, in process_kraken_report
int(split_str[1])
IndexError: list index out of range
I'm trying to convert a kreport file to mpa format, but I keep getting this error message. Does anyone know how to solve this?
Hi
Thanks for a great tool. Great to be able to process the output files of kraken without knowing all of the TAXIDs and how they are linked together.
Have you considered making extract_kraken_reads support multi-threading to run faster on computers with multiple CPUs? It seems that it uses only one CPU.
I naively split my fastq files and ran them through with parallel, but loading the database once for each parallel process quickly maxed out the 1 TB of RAM in the computer I was using.
Best regards
Rasmus
The default mode of kreport2krona.py is not "--intermediate-ranks" as indicated; by default it only outputs standard ranks.
When outputting standard ranks only, it seems that the non-standard ranks are simply erased, which results in an incorrect Krona report.
For example when I have something like this in the kraken2 report
1.96 74140 0 S 4932 Saccharomyces cerevisiae
1.96 74140 74140 S1 559292 Saccharomyces cerevisiae S288C
Then kreport2krona.py will compute cerevisiae = 0 and it will therefore not show in the graph, whereas I expected the non standard rank S1 to be included in its parent node S.
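The behaviour expected here, where reads assigned to a non-standard rank such as S1 are counted under the nearest standard-rank ancestor, can be sketched as follows. This is an illustration only, not the code of kreport2krona.py. It assumes a tab-separated six-column Kraken report in which the name column is indented by two spaces per taxonomy level; `fold_subranks` and `STANDARD_RANKS` are hypothetical names.

```python
STANDARD_RANKS = {"U", "R", "D", "K", "P", "C", "O", "F", "G", "S"}


def fold_subranks(report_lines):
    """Map taxid -> direct reads for standard ranks.

    Direct reads of non-standard ranks (S1, G1, ...) are added to the
    nearest standard-rank ancestor instead of being dropped.
    """
    totals = {}
    stack = []  # (depth, owner_taxid) for the path from the root to the current row
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # not a six-column report line
        direct_reads = int(fields[2])
        rank, taxid, raw_name = fields[3].strip(), fields[4].strip(), fields[5]
        # Assumption: two leading spaces per taxonomy level in the name column.
        depth = (len(raw_name) - len(raw_name.lstrip(" "))) // 2
        while stack and stack[-1][0] >= depth:
            stack.pop()  # leave subtrees that this row is not part of
        if rank in STANDARD_RANKS:
            owner = taxid
            totals.setdefault(owner, 0)
        else:
            owner = stack[-1][1] if stack else None
        if owner is not None:
            totals[owner] += direct_reads
        stack.append((depth, owner))
    return totals


example = [
    "  1.96\t74140\t0\tS\t4932\t" + " " * 26 + "Saccharomyces cerevisiae",
    "  1.96\t74140\t74140\tS1\t559292\t" + " " * 28 + "Saccharomyces cerevisiae S288C",
]
print(fold_subranks(example))  # {'4932': 74140}
```

With this folding, the 74140 reads of the S288C strain are reported for Saccharomyces cerevisiae rather than disappearing from the chart.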
Hi all,
Is there any way to convert Kraken2 output into a format that can be loaded into the Megan program (version 6)? I'm new to Kraken2 and Megan, so I apologize in advance if this is a nonsensical question.
Thanks,
Command used:
python3 ~/bin/KrakenTools/extract_kraken_reads.py --kraken sd2897.nanopore.kraken2.output --report sd2897.nanopore.kraken2.report -s sd2897.nanopore.cut.fastq.gz -o sd2897.nanopore.cut.cleaned.fastq --fastq-output --taxid 186826 --include-children --exclude
Output:
extract_kraken_reads.py: error: the following arguments are required: -k
Consider either updating docs or script behavior
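For the "script behavior" option, argparse lets one argument answer to both a short and a long spelling. The sketch below shows that pattern with hypothetical option definitions; it is not the actual argument parser of extract_kraken_reads.py.

```python
import argparse


def build_parser():
    """Build a small parser whose options accept both short and long spellings."""
    parser = argparse.ArgumentParser(description="Hypothetical option layout")
    # Registering both spellings on one argument makes '-k' and '--kraken' interchangeable.
    parser.add_argument("-k", "--kraken", dest="kraken_file", required=True,
                        help="Kraken output file")
    parser.add_argument("-t", "--taxid", dest="taxid", nargs="+", required=True,
                        help="one or more taxonomy IDs")
    return parser


args = build_parser().parse_args(["--kraken", "sample.kraken", "--taxid", "186826"])
print(args.kraken_file, args.taxid)  # sample.kraken ['186826']
```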
Hi
Thanks for a great tool. To get relative abundances, I used '--percentages' in kreport2mpa.py. The output only has two decimal places, and I would like up to 10 decimal places. What should I do next?
many thanks
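One way to get more decimal places is to export read counts (running without '--percentages') and compute the percentages separately. The sketch below assumes the counts have already been loaded into a dictionary and that the total number of reads in the sample is known; `format_percentages` is a hypothetical helper, not a KrakenTools function.

```python
def format_percentages(counts, total_reads, digits=10):
    """Return {classification: percentage string} with `digits` decimal places."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {
        name: f"{100.0 * reads / total_reads:.{digits}f}"
        for name, reads in counts.items()
    }


# Counts taken from the in-silico report quoted elsewhere on this page
# (47712 = 10203 unclassified + 37509 under root).
print(format_percentages({"k__Bacteria": 10103}, total_reads=47712))
```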
Hello,
I am getting the below error. I did try the script on two datasets with the same result. Could you please help me troubleshoot this? I also copied my submission script below the error messages.
Traceback (most recent call last):
File "/home/mthoemmes/kreport2mpa.py", line 188, in
main()
File "/home/mthoemmes/kreport2mpa.py", line 143, in main
report_vals = process_kraken_report(line)
File "/home/mthoemmes/kreport2mpa.py", line 80, in process_kraken_report
taxid = int(l_vals[-3])
NameError: name 'l_vals' is not defined. Did you mean: 'locals'?
for i in *_report_species.txt
do
filename=$(basename "$i")
fname="${filename%_report_species.txt}"
python /home/username/kreport2mpa.py -r
done
Kindly,
Megan
Hi, I am trying to use the extract reads tool to select some reads, but I am getting this error: "ValueError: invalid literal for int() with base 10: 'taxonomy_lvl'". Do you have any suggestions on how to fix it? Thanks in advance.
Hello,
Thanks for your powerful tools. I ran into a problem when I ran the script:
/data/Timmy/Bracken/kreport2mpa.py -r ./OUT/TEST.S.bracken -o ./OUT/TEST1.report
Traceback (most recent call last):
File "/data/Timmy/Bracken/kreport2mpa.py", line 173, in
main()
File "/data/Timmy/Bracken/kreport2mpa.py", line 128, in main
report_vals = process_kraken_report(line)
File "/data/Timmy/Bracken/kreport2mpa.py", line 74, in process_kraken_report
percents = float(split_str[0])
ValueError: could not convert string to float: Neorhizobium sp. NCHU2750
How can I fix it? Thanks for your help.
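A species name in the first column (and, in another report on this page, the literal `taxonomy_lvl`) suggests the input may be Bracken's abundance table rather than a Kraken-style report. A quick format check on the first line, as a sketch: `guess_format` is a hypothetical helper, and the column names are those of Bracken's tab-separated abundance output.

```python
BRACKEN_COLUMNS = [
    "name", "taxonomy_id", "taxonomy_lvl", "kraken_assigned_reads",
    "added_reads", "new_est_reads", "fraction_total_reads",
]


def guess_format(first_line):
    """Classify a file by its first line: 'bracken-table', 'kraken-report' or 'unknown'."""
    fields = first_line.rstrip("\n").split("\t")
    if fields[:3] == BRACKEN_COLUMNS[:3]:
        return "bracken-table"
    if len(fields) >= 6:
        try:
            # A Kraken-style report starts with: percentage, clade reads, direct reads.
            float(fields[0])
            int(fields[1])
            int(fields[2])
            return "kraken-report"
        except ValueError:
            pass
    return "unknown"


print(guess_format("name\ttaxonomy_id\ttaxonomy_lvl\tkraken_assigned_reads"))  # bracken-table
```

If the check reports a Bracken table, the scripts that expect a Kraken report need a Kraken-style report file instead.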
Hi
Will there be a tool to compare two or more Kraken results, so that the user can extract what is unique, common, etc.?
Cheers
Amali
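Until such a tool exists, the comparison can be approximated with set operations on the taxonomy IDs of two reports. The sketch below assumes tab-separated six-column Kraken reports; `taxa_in_report` and `compare_reports` are hypothetical helpers, not KrakenTools functions.

```python
def taxa_in_report(report_lines, min_reads=1):
    """Return {taxid: name} for taxa whose clade read count is at least `min_reads`."""
    taxa = {}
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip anything that is not a six-column report line
        if int(fields[1]) >= min_reads:
            taxa[fields[4].strip()] = fields[5].strip()
    return taxa


def compare_reports(lines_a, lines_b, min_reads=1):
    """Return the taxids shared by both reports and those unique to each."""
    a = taxa_in_report(lines_a, min_reads)
    b = taxa_in_report(lines_b, min_reads)
    return {
        "common": sorted(a.keys() & b.keys()),
        "only_a": sorted(a.keys() - b.keys()),
        "only_b": sorted(b.keys() - a.keys()),
    }
```

Raising `min_reads` makes the comparison less sensitive to taxa supported by only a handful of reads.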
Hello,
PROGRAM START TIME: 11-26-2019 02:26:33
STEP 0: PARSING REPORT FILE food.virome/kraken2.output/nt.output/BVD2.report
Traceback (most recent call last):
File "KrakenTools-master/extract_kraken_reads.py", line 388, in
main()
File "KrakenTools-master/extract_kraken_reads.py", line 192, in main
report_vals = process_kraken_report(line)
File "KrakenTools-master/extract_kraken_reads.py", line 110, in process_kraken_report
int(split_str[1])
NameError: name 'split_str' is not defined
=========================================================================
Please, give me some hint to resolve this problem.
Thank you.
Min-Soo Kim
Hello,
I am trying to create "decontaminated" fastq files using Kraken2 and KrakenTools.
After I created the DB and classified my reads using kraken2, I am trying to use KrakenTools. My problem is that I cannot finish my job: KrakenTools is using a huge amount of memory (125+ GB) even for small/medium datasets to be decontaminated (~50 million reads each for paired end). Any idea how to avoid this issue?
Thanks in advance.
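Where the memory goes is not clear from this report. One approach that keeps memory bounded is to hold only the set of read IDs and stream the sequence file record by record. The sketch below assumes four-line FASTQ records (no wrapped sequences); `filter_fastq` is a hypothetical helper and not how extract_kraken_reads.py is implemented.

```python
import io


def filter_fastq(handle_in, handle_out, read_ids, exclude=False):
    """Stream four-line FASTQ records and write the selected ones.

    A record is written when its ID is in `read_ids`, or when it is NOT
    in `read_ids` if `exclude` is True. Returns the number of records written.
    """
    written = 0
    while True:
        header = handle_in.readline()
        if not header:
            break  # end of file
        seq = handle_in.readline()
        plus = handle_in.readline()
        qual = handle_in.readline()
        parts = header[1:].split()
        read_id = parts[0] if parts else ""
        if (read_id in read_ids) != exclude:
            handle_out.write(header + seq + plus + qual)
            written += 1
    return written


source = io.StringIO("@r1\nACGT\n+\nIIII\n@r2 extra\nGGGG\n+\nIIII\n")
kept = io.StringIO()
print(filter_fastq(source, kept, {"r2"}))  # 1
```

Only one record plus the ID set is in memory at any time, so the footprint scales with the number of selected reads rather than with the file size.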
Dear Jennifer,
I would like to use Kraken2, Bracken, and KrakenTools (e.g. kreport2mpa.py) to profile my metagenomes based on the UHGG database (https://www.nature.com/articles/s41587-020-0603-3). This database used the U, R, or R1-R7, rather than U, R, D, K, P, C, O, F, G or S, so I guess KrakenTools are not compatible with this format. Therefore, may I ask whether you have a plan to update the KrakenTools? Or could you tell me how I could revise the code to make it compatible?
Many thanks! Really appreciate your help!
With best regards,
Nathan
Hi,
When I use combine_kreports.py, I find that the total reads and level reads are always the same for each sample.
Is that right?
Thanks,
Jie
Dear Jen
I want to extract the viral reads with --include-children, or exclude the viral reads with --include-children, and obtain only those reads for further assembly. How can I get fastq files rather than fasta files?
Thanks
Traceback (most recent call last):
File "/home/zlinbz/miniconda3/envs/bracken/bin/KrakenTools-master/kreport2mpa.py", line 188, in
main()
File "/home/zlinbz/miniconda3/envs/bracken/bin/KrakenTools-master/kreport2mpa.py", line 143, in main
report_vals = process_kraken_report(line)
File "/home/zlinbz/miniconda3/envs/bracken/bin/KrakenTools-master/kreport2mpa.py", line 80, in process_kraken_report
taxid = int(l_vals[-3])
NameError: name 'l_vals' is not defined. Did you mean: 'locals'?
Has anyone encountered this problem and solved it?
Hi
I'm trying to extract all bacterial reads from a paired-end kraken analysis but I am getting an error when the script tries to parse the kraken.report.txt. I'm running under most recent version of Biopython and have just updated to your latest script - the error I'm getting is:
PROGRAM START TIME: 02-06-2020 17:27:02
STEP 0: PARSING REPORT FILE //data/strepgen/JAM_EMBER_kraken/kraken_bracken_reports/EMB2.report.txt
Traceback (most recent call last):
File "extract_kraken_reads.py", line 395, in
main()
File "extract_kraken_reads.py", line 206, in main
while level_num != (prev_node.level_num + 1):
AttributeError: 'int' object has no attribute 'level_num'
Sorry my python is pretty poor so I can't fathom if it is a script problem or a problem in my report file - any thoughts?
Thanks
Rich
Hello everybody! I wanted to report a problem with kreport2krona.py:
I have this report for a very simple in-silico metagenomic sample:
21.38 10203 10203 U 0 unclassified
78.62 37509 0 R 1 root
58.09 27715 0 R1 131567 cellular organisms
36.91 17612 0 D 2759 Eukaryota
36.91 17612 0 D1 33154 Opisthokonta
20.90 9974 0 K 4751 Fungi
20.90 9974 0 K1 451864 Dikarya
20.90 9974 0 P 4890 Ascomycota
20.90 9974 0 P1 716545 saccharomyceta
20.90 9974 0 P2 147537 Saccharomycotina
20.90 9974 0 C 4891 Saccharomycetes
20.90 9974 0 O 4892 Saccharomycetales
20.90 9974 0 F 4893 Saccharomycetaceae
20.90 9974 0 G 4930 Saccharomyces
20.90 9974 0 S 4932 Saccharomyces cerevisiae
20.90 9974 9974 S1 559292 Saccharomyces cerevisiae S288C
16.01 7638 0 K 33208 Metazoa
16.01 7638 0 K1 6072 Eumetazoa
16.01 7638 0 K2 33213 Bilateria
16.01 7638 0 K3 33511 Deuterostomia
16.01 7638 0 P 7711 Chordata
16.01 7638 0 P1 89593 Craniata
16.01 7638 0 P2 7742 Vertebrata
16.01 7638 0 P3 7776 Gnathostomata
16.01 7638 0 P4 117570 Teleostomi
16.01 7638 0 P5 117571 Euteleostomi
16.01 7638 0 P6 8287 Sarcopterygii
16.01 7638 0 P7 1338369 Dipnotetrapodomorpha
16.01 7638 0 P8 32523 Tetrapoda
16.01 7638 0 P9 32524 Amniota
16.01 7638 0 C 40674 Mammalia
16.01 7638 0 C1 32525 Theria
16.01 7638 0 C2 9347 Eutheria
16.01 7638 0 C3 1437010 Boreoeutheria
16.01 7638 0 C4 314146 Euarchontoglires
16.01 7638 0 O 9443 Primates
16.01 7638 0 O1 376913 Haplorrhini
16.01 7638 0 O2 314293 Simiiformes
16.01 7638 0 O3 9526 Catarrhini
16.01 7638 0 O4 314295 Hominoidea
16.01 7638 0 F 9604 Hominidae
16.01 7638 0 F1 207598 Homininae
16.01 7638 0 G 9605 Homo
16.01 7638 7638 S 9606 Homo sapiens
21.17 10103 0 D 2 Bacteria
21.17 10103 0 P 1224 Proteobacteria
21.17 10103 0 C 1236 Gammaproteobacteria
21.17 10103 0 O 72274 Pseudomonadales
21.17 10103 0 F 135621 Pseudomonadaceae
21.17 10103 0 G 286 Pseudomonas
21.17 10103 0 G1 136841 Pseudomonas aeruginosa group
21.17 10103 0 S 287 Pseudomonas aeruginosa
21.17 10103 10103 S1 208964 Pseudomonas aeruginosa PAO1
20.53 9794 0 D 10239 Viruses
20.53 9794 0 D1 35237 dsDNA viruses, no RNA stage
20.53 9794 0 O 548681 Herpesvirales
20.53 9794 0 F 10292 Herpesviridae
20.53 9794 0 F1 10293 Alphaherpesvirinae
20.53 9794 0 G 10294 Simplexvirus
20.53 9794 9794 S 10298 Human alphaherpesvirus 1
When I use kreport2krona, the krona file created is the following:
10203 Unclassified
0 k__Eukaryota
0 k__Eukaryota p__Ascomycota
0 k__Eukaryota p__Ascomycota c__Saccharomycetes
0 k__Eukaryota p__Ascomycota c__Saccharomycetes o__Saccharomycetales
0 k__Eukaryota p__Ascomycota c__Saccharomycetes o__Saccharomycetales f__Saccharomycetaceae
0 k__Eukaryota p__Ascomycota c__Saccharomycetes o__Saccharomycetales f__Saccharomycetaceae g__Saccharomyces
9974 k__Eukaryota p__Ascomycota c__Saccharomycetes o__Saccharomycetales f__Saccharomycetaceae g__Saccharomyces s__Saccharomyces_cerevisiae
0 k__Eukaryota p__Chordata
0 k__Eukaryota p__Chordata c__Mammalia
0 k__Eukaryota p__Chordata c__Mammalia o__Primates
0 k__Eukaryota p__Chordata c__Mammalia o__Primates f__Hominidae
0 k__Eukaryota p__Chordata c__Mammalia o__Primates f__Hominidae g__Homo
7638 k__Eukaryota p__Chordata c__Mammalia o__Primates f__Hominidae g__Homo s__Homo_sapiens
0 k__Bacteria
0 k__Bacteria p__Proteobacteria
0 k__Bacteria p__Proteobacteria c__Gammaproteobacteria
0 k__Bacteria p__Proteobacteria c__Gammaproteobacteria o__Pseudomonadales
0 k__Bacteria p__Proteobacteria c__Gammaproteobacteria o__Pseudomonadales f__Pseudomonadaceae
0 k__Bacteria p__Proteobacteria c__Gammaproteobacteria o__Pseudomonadales f__Pseudomonadaceae g__Pseudomonas
10103 k__Bacteria p__Proteobacteria c__Gammaproteobacteria o__Pseudomonadales f__Pseudomonadaceae g__Pseudomonas s__Pseudomonas_aeruginosa
0 k__Viruses
0 k__Viruses o__Herpesvirales
0 k__Viruses o__Herpesvirales f__Herpesviridae
0 k__Viruses o__Herpesvirales f__Herpesviridae g__Simplexvirus
(here you have both of the files)
Simple_sample.zip
If you compare both files, you'll notice that the last line in the report, corresponding to Human alphaherpesvirus 1, is not present in the krona file (therefore, not in the html created by krona). I've tried to fix this on my own, but no success so far. Maybe you can help me out here? Thank you for your attention!
When I run make_kreport.py, it generates the following error (KeyError: 'unclassified (taxid 0)'). Does anyone know what causes it?
PROGRAM START TIME: 09-05-2020 13:40:52
STEP 1/4: Reading taxonomy kraken2_standard_09042020/mydb_taxonomy.txt...
722 nodes saved
STEP 2/4: Reading kraken file 2105F.kraken2.output.txt...
36.872 million reads processed
STEP 3/4: Creating final tree...
Traceback (most recent call last):
File "/opt/anaconda3/envs/kraken2/bin/make_kreport.py", line 198, in
main()
File "/opt/anaconda3/envs/kraken2/bin/make_kreport.py", line 145, in main
p_node = taxid2node[curr_tid].parent
KeyError: 'unclassified (taxid 0)'
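A key such as 'unclassified (taxid 0)' suggests the Kraken output was written with names in the taxonomy column (as Kraken 2 does with --use-names), so the lookup is attempted with the whole string instead of a bare taxid. The sketch below extracts the numeric ID from either form and lists IDs that are absent from a set of known taxids. Both helpers are hypothetical and not part of make_kreport.py.

```python
import re

_TAXID_IN_NAME = re.compile(r"\(taxid (\d+)\)\s*$")


def extract_taxid(field):
    """Return the taxid from '562' or 'Escherichia coli (taxid 562)', else None."""
    field = field.strip()
    match = _TAXID_IN_NAME.search(field)
    if match:
        return match.group(1)
    return field if field.isdigit() else None


def missing_taxids(kraken_lines, known_taxids):
    """List taxids in column 3 of Kraken output that are not in `known_taxids`."""
    known = {str(t) for t in known_taxids}
    missing = set()
    for line in kraken_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        taxid = extract_taxid(fields[2])
        # Assumption: taxid 0 (unclassified) is handled separately and not reported here.
        if taxid is not None and taxid != "0" and taxid not in known:
            missing.add(taxid)
    return sorted(missing, key=int)


print(extract_taxid("unclassified (taxid 0)"))  # 0
```

A non-empty result from `missing_taxids` would point to a mismatch between the taxonomy file and the database used for classification.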
Hi
Your script will be very useful for me if you could please help me with the following problem:
I have 206 samples generated through kraken2 (kreport) with --mpa style. When I try to combine them with your script, it gives me the following error:
Here is my script:
combine_mpa.py -i file1 file2 file3 filen -o output.txt
and here is the error:
Number of files to parse: 6
Traceback (most recent call last):
File "../KrakenTools/combine_mpa.py", line 142, in <module>
main()
File "../KrakenTools/combine_mpa.py", line 83, in main
[classification, val] = line.strip().split('\t')
ValueError: need more than 1 value to unpack
Interestingly, if I combine 2 or 3 files with the same script, it works perfectly fine.
Please let me know what changes I should make to make it work? Many thanks in advance.
Thank you very much for your powerful tools!
I ran into the following error while running the make_kreport.py script.
python make_kreport.py -i P1_S7_L001_R_kraken2.txt -t nt_ktaxonomy -o P1
PROGRAM START TIME: 08-31-2021 16:43:17
STEP 1/4: Reading taxonomy nt_ktaxonomy...
2083898 nodes saved
STEP 2/4: Reading kraken file P1_S7_L001_R_kraken2.txt...
2.084 million reads processed
STEP 3/4: Creating final tree...
Traceback (most recent call last):
File "/home/microbiology/KrakenTools/make_kreport.py", line 199, in
main()
File "/home/microbiology/KrakenTools/make_kreport.py", line 145, in main
p_node = taxid2node[curr_tid].parent
KeyError: '3045'
I ran Kraken2 with the following basic script against the full nt database.
kraken2 --db $kraken2_db P1_S7_L001_R1_kneaddata.fastq --report P1_S7_L001_R_kraken2.txt --report-zero-counts
Thank you very much for your time and help.
Sincerely,
David Bradshaw
Hi,
combine_mpa.py was very useful when I combined mpa reports without a header.
But when I chose to add the header to the mpa report (--display-header), an error was reported (see below). How can I solve it?
Traceback (most recent call last):
File "combine_mpa.py", line 142, in
main()
File "combine_mpa.py", line 83, in main
[classification, val] = line.strip().split('\t')
ValueError: not enough values to unpack (expected 2, got 1)
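The unpacking error is raised when a line does not split into exactly two tab-separated fields, which a header, a blank line, or a line with extra columns can all cause. A tolerant merge can be sketched as below. It assumes header lines start with '#' and is not the code of combine_mpa.py; `read_mpa` and `combine_mpa` are hypothetical helpers.

```python
def read_mpa(lines):
    """Parse one mpa-style file into {classification: value}.

    Header lines (assumed to start with '#'), blank lines and lines that do
    not have exactly two tab-separated fields are skipped.
    """
    values = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            continue
        values[fields[0]] = fields[1]
    return values


def combine_mpa(samples):
    """Merge {sample_name: lines} into table rows, starting with a header row."""
    parsed = {name: read_mpa(lines) for name, lines in samples.items()}
    names = list(parsed)
    classifications = sorted(set().union(*[set(p) for p in parsed.values()]))
    rows = [["#Classification"] + names]
    for classification in classifications:
        # Taxa missing from a sample are filled with "0".
        rows.append([classification] + [parsed[n].get(classification, "0") for n in names])
    return rows
```

Because the sample names come from the dictionary keys, the column order is explicit and does not depend on how the shell expands a file glob.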
To obtain a Krona visualization of Bracken reports, I ran kreport2krona.py on the bracken_kreport file (percentage of reads, total number of reads, etc.). However, I noticed that the output file is missing the majority of the reads assigned in the bracken_kreport. For example, in the bracken_kreport a vast number of reads are assigned to the genus Ligilactobacillus, but in the created file the number is 0. Could anyone help me solve this?
100.00 7095261 0 R 1 root
100.00 7095139 0 R1 131567 cellular organisms
99.69 7073316 0 D 2 Bacteria
88.16 6255049 0 D1 1783272 Terrabacteria group
86.95 6169667 0 P 1239 Firmicutes
86.60 6144312 0 C 91061 Bacilli
86.43 6132776 0 O 186826 Lactobacillales
86.04 6104444 0 F 33958 Lactobacillaceae
83.89 5952280 0 G 2767887 Ligilactobacillus
1.53 108246 0 G 2742598 Limosilactobacillus
0 k__Bacteria p__Firmicutes c__Bacilli o__Lactobacillales
0 k__Bacteria p__Firmicutes c__Bacilli o__Lactobacillales f__Lactobacillaceae
0 k__Bacteria p__Firmicutes c__Bacilli o__Lactobacillales f__Lactobacillaceae g__Ligilactobacillus
0 k__Bacteria p__Firmicutes c__Bacilli o__Lactobacillales f__Lactobacillaceae g__Limosilactobacillus
0 k__Bacteria p__Firmicutes c__Bacilli o__Lactobacillales f__Lactobacillaceae g__Lactobacillus
0 k__Bacteria p__Firmicutes c__Bacilli o__Lactobacillales f__Lactobacillaceae g__Leuconostoc
Hi,
I use extract_kraken_reads.py to separate reads depending on the taxonomy ID. My input files are paired-end fastq files and I was hoping that the extracted files would also be fastq files, but it looks like they are fasta files. Is there an option to get fastq files as output?
Thanks!
Hi Jennifer,
I ran into an error in make_kreport.py and I wonder why it happens.
Here is my Log:
`
-bash-4.2$ python ~/yanren/app/KrakenTools-master/make_kreport.py -i MT1.krak2 -t ~/MY_KRAKEN2_DATABASE/TAXONOMY_MAKE.txt -o report
PROGRAM START TIME: 12-19-2020 12:49:03
STEP 1/4: Reading taxonomy /lustre/quanzx/MY_KRAKEN2_DATABASE/TAXONOMY_MAKE.txt...
30392 nodes saved
STEP 2/4: Reading kraken file MT1.krak2...
204.704 million reads processed
STEP 3/4: Creating final tree...
Traceback (most recent call last):
File "/lustre/quanzx/yanren/app/KrakenTools-master/make_kreport.py", line 198, in
main()
File "/lustre/quanzx/yanren/app/KrakenTools-master/make_kreport.py", line 145, in main
p_node = taxid2node[curr_tid].parent
KeyError: 'Yuavirus (taxid 1299429)'
`
pip3 install kraken-tools would be desirable
https://medium.com/@joel.barmettler/how-to-upload-your-python-package-to-pypi-65edc5fe9c56
Hello,
I have installed krakentools using anaconda3. When I launch extract_kraken_reads.py I am getting the error:
ERROR: --report not specified.(krakentools_env)
How can I fix this?
Best regards,
Giacomo
I've just downloaded the code and run extract_kraken_reads.py this way:
./KrakenTools-master/extract_kraken_reads.py -k output.kraken -s R1.fastq -s2 R2.fastq -o R1_extracted.fastq -o2 R2_extracted.fastq -t 2 --exclude --report report.txt --fastq-output
but I got this error:
PROGRAM START TIME: 11-19-2021 08:54:26
1 taxonomy IDs to parse
>> STEP 1: PARSING KRAKEN FILE FOR READIDS AZTI526.kraken
0.00 million reads processed
0 read IDs saved
>> STEP 2: READING SEQUENCE FILES AND WRITING READS
0 read IDs found (0.00 mill reads processed)
0 read IDs found (0.00 mill reads processed)
0 reads printed to file
Generated file: AZTI526_R1_extracted_kraken.fastq
Generated file: AZTI526_R2_extracted_kraken.fastq
Why were no reads processed?
Hi,
Good day.
I would like to check whether I can run 'combine_kreports.py' on my Bracken reports. I have around 650 Bracken reports. It shows the errors below when I run it using the following command.
combine_kreports.py -r *.bracken -o bracken_phylum_all.report
Could you please advise?
Thanks.
Regards,
Soo Ching
STEP 1: READING REPORTS
1/4 samples processedTraceback (most recent call last):
File "/nethome/lees51/.conda/envs/shotgun_ana3_py3.7.0/bin/combine_kreports.py", line 311, in
main()
File "/nethome/lees51/.conda/envs/shotgun_ana3_py3.7.0/bin/combine_kreports.py", line 203, in main
report_vals = process_kraken_report(line)
File "/nethome/lees51/.conda/envs/shotgun_ana3_py3.7.0/bin/combine_kreports.py", line 120, in process_kraken_report
level_reads = int(split_str[2])
ValueError: invalid literal for int() with base 10: 'P'
Dear all
It does not appear to be possible to submit multiple species-level taxids to the extract_kraken_reads command while using both the --exclude and --include-children options to remove the indicated taxids as well as their strain-level descendants. Is there a workaround if I want to remove reads of multiple species and all their strains from the Kraken output?
Thanks
Marcus
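A possible workaround is to resolve the full list of taxids (each species plus its descendants) from the report first, and then filter on that explicit list. The sketch below assumes a tab-separated six-column Kraken report with the name column indented by two spaces per level; `taxids_with_children` is a hypothetical helper, not part of KrakenTools.

```python
def taxids_with_children(report_lines, targets):
    """Return the target taxids plus every descendant listed below them in the report."""
    targets = {str(t) for t in targets}
    selected = set()
    active_depth = None  # depth of the target whose subtree is currently being read
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue
        taxid, raw_name = fields[4].strip(), fields[5]
        # Assumption: two leading spaces per taxonomy level in the name column.
        depth = (len(raw_name) - len(raw_name.lstrip(" "))) // 2
        if active_depth is not None and depth <= active_depth:
            active_depth = None  # this row is outside the target's subtree
        if active_depth is None and taxid in targets:
            active_depth = depth
        if active_depth is not None:
            selected.add(taxid)
    return selected
```

The returned set can then be used to select rows of the Kraken output by their taxid column, for exclusion or inclusion.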
Thanks for this really useful tool.
I am running extract_kraken_reads.py in slurm sbatch jobs and the output printed to screen is leading to >10Gb job log files, with files containing: 0 reads processed^M 0.01 million reads processed^M 0.02 million reads processed^M ...and so on for my ~180x10^6 reads.
Is it possible to include an option that reduces the verbosity of the output? Possibly one that still outputs some useful information on major stages within the script?
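One way such an option could behave is to report progress less often and to use carriage returns only when writing to an interactive terminal. This is a sketch of the idea with a hypothetical `report_progress` helper, not an existing option of the script.

```python
import io
import sys


def report_progress(count, stream=sys.stderr, every=1_000_000):
    """Write a progress line every `every` reads; return True if a line was written."""
    if count % every != 0:
        return False
    message = f"{count / 1e6:.2f} million reads processed"
    if stream.isatty():
        stream.write("\r" + message)   # overwrite the same line on a terminal
    else:
        stream.write(message + "\n")   # one short line per update in log files
    stream.flush()
    return True


log = io.StringIO()  # stands in for a batch job log (not a terminal)
printed = [n for n in range(10_000, 3_000_001, 10_000) if report_progress(n, stream=log)]
print(printed)  # [1000000, 2000000, 3000000]
```

With updates every million reads, a run of about 180 million reads would write roughly 180 log lines.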
Hi,
Thanks for developing these scripts to post-process Kraken results. Can I use them to process a KrakenUniq report?
I tried your script to combine multiple reports. It reports the error below. Can you help figure out what is wrong with my run?
perl ../KrakenTools/combine_kreports.py -r SLX_14_S5.fastp.clean.report SLX_15_S40.fastp.clean.report -o ../test.out.report
STEP 1: READING REPORTS
1/2 samples processedTraceback (most recent call last):
File "../KrakenTools/combine_kreports.py", line 309, in
main()
File "../KrakenTools/combine_kreports.py", line 201, in main
report_vals = process_kraken_report(line)
File "../KrakenTools/combine_kreports.py", line 113, in process_kraken_report
int(split_str[1])
IndexError: list index out of range
Hi, thanks for the tool. I tried testing it today and it is giving me this error. I used the kraken file from the kraken2 tool and the paired-end files that I had used as input. In step 1 it is able to find the read IDs assigned to the taxa, but in step 2 the read IDs are not found in the fastq files.
Going down the same road as #11 - is there a plan to have this released as a versioned package on GitHub/Bioconda/PyPI?
I'm currently writing a MultiQC module and it would be extremely helpful to be able to a.) import this as a set of scripts to create appropriate tables and then b.) get these for visualization into MultiQC.
That would be much easier if you could (at a minimum) create a versioned script release here on GitHub and/or provide a pip3-installable package (#11) so we can get this to Bioconda.
I would be happy to help of course if you need any help on this :-)
Hi @jenniferlu717 and everyone,
I'm looking for a way to combine kraken2 output/reports from different samples and convert it to metaphlan format, or (the other way around) to convert kraken2 output/reports from different samples to metaphlan format and combine them together. It's like using the script kraken-mpa-report of kraken1 on different kraken output files and combining them into metaphlan format.
Googling brought me to your script kreport2mpa.py, which converts a kraken report to metaphlan format. And then, I saw that you also have combine_kreports.py, which combines kraken reports from different samples. I wonder if I can combine those two scripts to achieve what I'd like to do, maybe by using combine_kreports.py and then kreport2mpa.py? Do you have any idea?
Thanks in advance for your input.
Cheers
Hi,
I am trying to use the "extract_kraken_reads.py" script to extract the reads related to a particular species. The kraken report shows that there are 465 reads related to the species I am interested in.
I ran the following command:
python extract_kraken_reads.py -k 107C_out.kraken -s 107C_seq.file -o extract_107C.fastq -t 813 &
but in the end, no read IDs were found. Here is my output:
PROGRAM START TIME: 08-07-2020 01:38:50
1 taxonomy IDs to parse
STEP 1: PARSING KRAKEN FILE FOR READIDS 107C_out.kraken
4.27 million reads processed
465 read IDs saved
STEP 2: READING SEQUENCE FILES AND WRITING READS
0 read IDs found (4.27 mill reads processed)
0 reads printed to file
Generated file: extract_107C.fastq
PROGRAM END TIME: 08-07-2020 01:40:44
[1]+ Done python extract_kraken_reads.py -k 107C_out.kraken -s 107C_seq.file -o extract_107C.fastq -t 813
Any comments would be appreciated.
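When STEP 1 saves read IDs but STEP 2 finds none, one thing worth checking is whether the IDs are spelled the same way in the Kraken output and in the sequence file (for example, a trailing /1 or extra text after the ID). The sketch below is a diagnostic only, not the matching logic of extract_kraken_reads.py; both helpers are hypothetical.

```python
def normalize_read_id(raw):
    """Drop a leading '@' or '>', any description after whitespace, and a trailing /1 or /2."""
    parts = raw.strip().lstrip("@>").split()
    read_id = parts[0] if parts else ""
    if read_id.endswith(("/1", "/2")):
        read_id = read_id[:-2]
    return read_id


def unmatched_ids(kraken_ids, sequence_headers):
    """Return the Kraken read IDs that have no counterpart among the sequence headers."""
    seen = {normalize_read_id(h) for h in sequence_headers}
    wanted = {normalize_read_id(k) for k in kraken_ids}
    return sorted(i for i in wanted if i not in seen)


print(normalize_read_id("@read7/1 length=150"))  # read7
```

If every ID is unmatched even after normalization, the sequence file is probably not the one that was classified, or is not in the format the script expects.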
I am trying to extract the reads from a single taxid at that level (no parents or children). The hierarchy looks like:
0.01 1898 0 P 2732408 Pisuviricota
0.01 1898 0 C 2732506 Pisoniviricetes
0.01 1898 0 O 76804 Nidovirales
0.01 1898 0 O1 2499399 Cornidovirineae
0.01 1898 0 F 11118 Coronaviridae
0.01 1898 0 F1 2501931 Orthocoronavirinae
0.01 1898 0 G 694002 Betacoronavirus
0.01 1898 12 G1 2509481 Embecovirus
0.01 1886 212 S 694003 Betacoronavirus 1
0.01 1539 1539 S1 31631 Human coronavirus OC43
0.00 135 135 S1 11128 Bovine coronavirus <--- I WANT THIS ONE
I would expect that there are 135 reads associated with taxid 11128. That's how many are in the read assignment report too:
17:29 cn0896 classify$ grep -P '\s+11128\s+' VRNA_015.txt | wc -l
135
However, when I try to extract those reads I get:
17:29 cn0896 classify$ /data/Segrelab/bwbin/KrakenTools/extract_kraken_reads.py -k VRNA_015.txt --fastq-output -s1 ../VRNA_015.qc.nohuman.fastq.gz -o VRNA_015.only_11128.fastq -t 11128
PROGRAM START TIME: 06-28-2021 21:31:38
1 taxonomy IDs to parse
>> STEP 1: PARSING KRAKEN FILE FOR READIDS VRNA_015.txt
16.09 million reads processed
102 read IDs saved
>> STEP 2: READING SEQUENCE FILES AND WRITING READS
102 read IDs found (7.04 mill reads processed)
102 reads printed to file
Generated file: VRNA_015.only_11128.fastq
PROGRAM END TIME: 06-28-2021 21:35:42
17:35 cn0896 classify$ grep -c '^' VRNA_015.only_11128.fastq | div4.pl
102
Any idea why I'm getting 102 instead of 135? I get the same error when extracting reads from inner nodes as well but I thought this was a simpler example. I don't see an obvious version flag but I cloned the repo in June of this year.
Dear Dev. & co,
Thanks so much for Bracken and the seemingly endless support provided for Kraken_, - not least this repo.
There are so many tools here that I really, really wish I'd known about when I was working K2+B into my workflows, not least the diversity tools that are somewhat hidden even within the KrakenTools repo (i.e. not mentioned at all in KT's README.md).
The Kraken/Bracken suite of tools, and the community in general, would benefit hugely from having a (e.g.) "see also" header at the top of each README.md that notified users e.g. "KrakenTools exists, with the following functionalities". It might even cut down on the cross-posting of issues...
all the best,
H
Got the following error message:
ERROR: sequence file must be FASTA or FASTQ
It seems to me that the mode argument in gzip.open() should be rt, instead of r.
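The difference can be shown in a few lines: `gzip.open()` treats mode "r" as binary, so the first character comes back as bytes and never compares equal to the string "@" or ">". The sketch below writes a tiny gzipped FASTQ to a temporary directory to demonstrate this; `detect_format` is a hypothetical helper, not the check used in the script.

```python
import gzip
import os
import tempfile


def detect_format(path):
    """Return 'FASTQ', 'FASTA' or None from the first character of a plain or gzipped file."""
    opener = gzip.open if path.endswith(".gz") else open
    # "rt" yields str; with gzip.open, plain "r" means binary and yields bytes.
    with opener(path, "rt") as handle:
        first = handle.read(1)
    return {"@": "FASTQ", ">": "FASTA"}.get(first)


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "reads.fastq.gz")
    with gzip.open(demo_path, "wt") as handle:
        handle.write("@r1\nACGT\n+\nIIII\n")
    with gzip.open(demo_path, "r") as handle:
        binary_first = handle.read(1)  # b"@": bytes, not equal to the str "@"
    detected = detect_format(demo_path)

print(binary_first == "@", detected)  # False FASTQ
```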
Hi,
I tried with several different taxa to extract reads from a kraken file - first, the number of extracted reads does not match (e.g. 11 reads assigned to species in kraken file, but only 3 are extracted), and second, if I blast these sequences, they are in almost all cases nowhere near what kraken assigned them to (with a complete nt-kraken-database), like I sometimes get bacteria or plants, when it should be a mammal (for very few taxa though the blast result matches the kraken assignment).
I used
krakentools/extract_kraken_reads.py -k sample1.kraken -s sample1.fastq -t 37349 -o test.fasta
the process looks like this
PROGRAM START TIME: 08-18-2020 09:42:54
1 taxonomy IDs to parse
STEP 1: PARSING KRAKEN FILE FOR READIDS t1_trim.kraken
0.61 million reads processed
9 read IDs saved
STEP 2: READING SEQUENCE FILES AND WRITING READS
4 read IDs found (0.61 mill reads processed)
0 reads printed to file
Generated file: extr_trest.fasta
and the output file contains only 3 sequences of the 9 directly assigned reads, which when blasted are nothing close to the assigned taxon...
I am very confused about these results and any advice would be greatly appreciated.
Hi,
I filtered out the human taxon using filter_bracken.out.py, as follows:
python KrakenTools/filter_bracken.out.py -i Test.S.bracken -o Test.S.bracken.filter --exclude 9606
But what can I do with this filtered file? I want to transform this Bracken-style file into Kraken report format and use kreport2mpa.py to get the absolute and relative abundance at each level. How can I achieve this? How can I convert the format?
Does anybody know how to do that?
Thanks very much!
Best wishes,
Myshu