jenniferlu717 / krakentools


KrakenTools provides individual scripts to analyze Kraken/Kraken2/Bracken/KrakenUniq output files

License: GNU General Public License v3.0

Python 100.00%

krakentools's Issues

behaviour of extract_kraken_reads

Hi,

If I were to provide a genus ID, would using --exclude --include-children --include-parent manage to remove:

the read from the phyla
all its children,
and (where I'm less sure) : all the reads from the parent taxid

We need the counts from, for example, the parent Kingdom to be readjusted by the same number of reads being removed (but not for the entire kingdom to be removed, obviously).

Also, does the script adjust every kraken output file, or does it only adjust the classified reads ? In other words, do I need to regenerate a kraken report afterwards using make_kreport.py?

By the way, thanks a lot for this tool, it's definitely an indispensable complement to Kraken, it has solved so many roadblocks we faced!
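For readers wondering what "children" means in terms of the report file, here is a minimal sketch of how the descendants of a taxid could be collected from a Kraken-style report. It is not the code of extract_kraken_reads.py. It assumes tab-separated report lines with the taxid in the second-to-last column and the name in the last column, indented two spaces per level; `descendant_taxids` is a hypothetical helper name.

```python
def descendant_taxids(report_lines, root_taxid):
    """Return root_taxid plus every taxid nested beneath it in a Kraken-style report.

    Assumption: the name column is indented two spaces per taxonomy level.
    """
    selected = set()
    root_depth = None  # indentation depth of the root while we are inside its subtree
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        taxid = fields[-2].strip()
        name = fields[-1]
        depth = (len(name) - len(name.lstrip(" "))) // 2
        if root_depth is not None:
            if depth > root_depth:
                selected.add(taxid)  # still inside the root's subtree
                continue
            root_depth = None  # indentation dropped back: subtree has ended
        if taxid == str(root_taxid):
            root_depth = depth
            selected.add(taxid)
    return selected
```

Calling this once per taxid and taking the union of the results gives a single set that can be matched against the taxid column of the Kraken output file.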

Construct a phylogenetic tree from Kraken2 in order to use in phyloseq object

Hi,
This is my first time analysing metagenomics data from a shotgun experiment, but before posting this issue I tried several strategies to solve the problem described below.

I would like to construct a phyloseq object from Kraken2 + Bracken output that also contains a phylogenetic tree, so that I can calculate UniFrac distances in downstream analysis.

Could you recommend some software for creating a phylogenetic tree from Kraken2 output?

Thanks in advance for your help/hints,

Magí.

extract kraken read _empty fastq file

Hi,
I am trying to extract reads from a Kraken file, but the generated FASTQ file seems to be empty. I am sure the taxID is in the Kraken file, so I am not sure where the mistake is.

Thanks for your help (:

Add biopython dependency to installation instructions

The installation instructions state that the scripts can be used straight away; however, after a clean install, there is an error:

Traceback (most recent call last):
  File "/usr/local/bin/extract_kraken_reads.py", line 55, in <module>
    from Bio import SeqIO
ModuleNotFoundError: No module named 'Bio'
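
A quick way to check for this before running the scripts is sketched below. `missing_modules` is a hypothetical helper, not part of KrakenTools; the only thing taken from the traceback above is that the script imports `Bio`, which is the import name of the Biopython package.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["Bio"])
    if missing:
        print("Missing modules:", ", ".join(missing))
        # Biopython provides the 'Bio' module and is usually installed with pip.
        print("Try: pip install biopython")
```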

Unable to download the kraken database?

Hi,
I am trying to download the Kraken database using this command:
kraken-build --standard --threads 24 --db standard
but I got an error. Can you please help me?
Error message:
rsync_from_ncbi.pl: unexpected FTP path (new server?) for https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/762/265/GCF_000762265.1_ASM76226v1

local variable 'prev_node' referenced before assignment

Hi,
I was trying to extract reads with the script, but it throws this error every time. I tried to track down the bug but couldn't. Please have a look. Kraken2 was run on the reads.

cmd line: python3 extract_kraken_reads.py -k output.txt -s1 R1.fastq.gz -s2 R2.fastq.gz -t 1 -o R1_c_1.fastq -o2 R2_c_2.fastq -r report.txt --include-children
STEP: Parsing report file report.txt
Traceback (most recent call last):
  File "extract_kraken_reads.py", line 395, in <module>
    main()
  File "extract_kraken_reads.py", line 206, in main
    while level_num != (prev_node.level_num + 1):
UnboundLocalError: local variable 'prev_node' referenced before assignment

Metaphlan3 integration for kreport2mpa

Hi!
I'm trying to use the output of kreport2mpa for humann3; however, it needs the mpa file to be in metaphlan3 output style. Is there any way to implement this in kreport2mpa?

Edit: I've had a closer look at the output formats, and it seems to be only a matter of getting the additional columns (NCBI_taxid and additional species) and adding them in. From what I can see, it would be difficult to lift them manually with pattern matching when intermediate levels are omitted. I think it will need to be done at the line-stripping level, although I'm unsure how exactly your code deals with it. I will try to finagle the code and will let you know if it works.

I imagine it will be much quicker on your end, though, so please do help if you think this is possible at all.

Thanks!

problems of sample names for combine_mpa.py

Hi,
Thank you for your script; it has been extremely helpful. However, I ran kreport2mpa.py with --display-header on 63 samples, and combine_mpa.py, which according to the manual can use the header automatically, reported errors:
Number of files to parse: 63

dir="."
for report in $(find $dir -maxdepth 1 -name "*_report.txt"|sort -V);do
report=${report//_report.txt/};
#ktImportTaxonomy -m 3 -t 5 ${report}_report.txt -o ${report}_krona.html
kreport2mpa.py --display-header --no-intermediate-ranks -r ${report}_report.txt -o ${report}_mpa.txt
done
combine_mpa.py -i *_report_mpa.txt -o combined_mpa.tmp

Traceback (most recent call last):
  File "/opt/KrakenTools/combine_mpa.py", line 142, in <module>
    main()
  File "/opt/KrakenTools/combine_mpa.py", line 83, in main
    [classification, val] = line.strip().split('\t')
ValueError: too many values to unpack

This is my bash script for the workaround. Without --display-header, I sort the files with sort -V and replace the header manually with the sample names in input order, but the output column order seems to be different anyway.

dir="."
for report in $(find $dir -maxdepth 1 -name "*_report.txt"|sort -V);do
    report=${report//_report.txt/};
    #ktImportTaxonomy -m 3 -t 5 ${report}_report.txt -o ${report}_krona.html
    kreport2mpa.py --no-intermediate-ranks -r ${report}_report.txt -o ${report}_mpa.txt
done

file_list=$(echo *_report.txt|tr " " "\n" | sort -V |tr "\n" " " )
echo *_report.txt | tr " " "\n" | cut -d "_" -f1 | sort -V > sample.header
echo "#Classification" > tmp.tmp
cat tmp.tmp sample.header | tr "\n" "\t" > table.header
combine_mpa.py -i $file_list -o combined_mpa.tmp
sed 1d combined_mpa.txt > table.tmp
cat table.header table.tmp > combined_mpa.txt
rm *.header
rm *.tmp
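
The header handling and the column order can also be controlled in a few lines of Python. The sketch below is not combine_mpa.py; it assumes two-column, tab-separated MPA-style files, skips any line starting with `#`, and names each column after the part of the file name before the first underscore (mirroring the `cut -d "_" -f1` step above). `read_mpa` and `combine_mpa` are hypothetical names.

```python
import os


def read_mpa(path):
    """Parse one two-column MPA-style file into {classification: value}."""
    counts = {}
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line or line.startswith("#"):
                continue  # skip blank lines and header lines
            fields = line.split("\t")
            if len(fields) < 2:
                continue  # skip lines without a value column
            counts[fields[0]] = fields[1]
    return counts


def combine_mpa(paths):
    """Merge files into (header, rows); columns follow the order of `paths`."""
    samples = [os.path.basename(p).split("_")[0] for p in paths]
    tables = [read_mpa(p) for p in paths]
    all_keys = sorted(set().union(*tables))
    header = ["#Classification"] + samples
    rows = [[key] + [t.get(key, "0") for t in tables] for key in all_keys]
    return header, rows
```

Because the columns follow the order of `paths`, sorting the list of file names once (for example with a natural sort) fixes both the header and the data order together.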

Error running kreport2mpa.py

Traceback (most recent call last):
  File "/home/davidmartins/miniconda3/envs/statistics/bin/kreport2mpa.py", line 173, in <module>
    main()
  File "/home/davidmartins/miniconda3/envs/statistics/bin/kreport2mpa.py", line 128, in main
    report_vals = process_kraken_report(line)
  File "/home/davidmartins/miniconda3/envs/statistics/bin/kreport2mpa.py", line 71, in process_kraken_report
    int(split_str[1])
IndexError: list index out of range

I'm trying to convert a kreport file to mpa format, but I keep getting this error message. Does anyone know how to solve this?

extract_kraken_reads to support multithreading?

Thanks for a great tool. Great to be able to process the output files of kraken without knowing all of the TAXIDs and how they are linked together.

Have you considered making extract_kraken_reads support multi-threading to run faster on computers with multiple CPUs? It seems that it uses only one CPU.

I naively split my FASTQ files and ran them through with parallel, but loading the database once for each parallel process quickly maxed out the 1 TB of RAM in the computer I was using.

Best regards
Rasmus

Matching many tax ids

kreport2krona.py not working properly in default mode

The default mode of kreport2krona.py is not "--intermediate-ranks" as documented; by default it outputs only standard ranks.
When outputting standard ranks only, the non-standard ranks appear to be simply erased, which results in an incorrect Krona report.
For example, when I have something like this in the Kraken2 report:

1.96	74140	0	S	4932	                          Saccharomyces cerevisiae
 1.96	74140	74140	S1	559292	                            Saccharomyces cerevisiae S288C

Then kreport2krona.py computes Saccharomyces cerevisiae = 0, so it does not show in the graph, whereas I expected the non-standard rank S1 to be included in its parent node S.
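
The expected behaviour can be sketched as follows: reads assigned directly to a sub-rank (S1, G1, ...) are added to the closest standard-rank ancestor instead of being dropped. This is an illustration of the idea only, not the kreport2krona.py implementation; `roll_up_reads` and the tuple layout of its input are assumptions made for the example.

```python
STANDARD_RANKS = {"U", "R", "D", "K", "P", "C", "O", "F", "G", "S"}


def roll_up_reads(report_rows):
    """Fold direct reads of sub-ranks into the nearest standard-rank ancestor.

    report_rows: iterable of (direct_reads, rank_code, depth, name) in report order,
    where depth is the nesting level of the row.
    Returns a list of [reads, rank_code, name] for standard ranks only.
    """
    kept = []   # output rows, standard ranks only
    stack = []  # current lineage as (depth, index into kept or None)
    for direct_reads, rank, depth, name in report_rows:
        while stack and stack[-1][0] >= depth:
            stack.pop()  # leave subtrees that have ended
        if rank in STANDARD_RANKS:
            kept.append([direct_reads, rank, name])
            stack.append((depth, len(kept) - 1))
        else:
            # Walk up the lineage to the closest ancestor that was kept.
            target = next((idx for _, idx in reversed(stack) if idx is not None), None)
            if target is not None:
                kept[target][0] += direct_reads
            stack.append((depth, None))
    return kept
```

With the two rows from the example above, the 74140 reads of the S1 row end up on the S row for Saccharomyces cerevisiae.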

Convert Kraken output to Megan

Hi all,

Is there any way to convert Kraken2 output into a format that can be loaded into the Megan program (version 6)? I'm new to Kraken2 and Megan, so I apologize in advance if this is a nonsensical question.

Thanks,

Long parameter name --kraken was not understood by script

Command used:

python3 ~/bin/KrakenTools/extract_kraken_reads.py --kraken sd2897.nanopore.kraken2.output --report sd2897.nanopore.kraken2.report -s sd2897.nanopore.cut.fastq.gz -o sd2897.nanopore.cut.cleaned.fastq --fastq-output --taxid 186826 --include-children --exclude

Output:
extract_kraken_reads.py: error: the following arguments are required: -k

Consider either updating docs or script behavior
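
For reference, accepting both spellings is a one-line change with argparse. The parser below is an illustrative sketch, not the script's actual argument definitions; the `dest` names are made up for the example.

```python
import argparse


def build_parser():
    """Illustrative parser that accepts both short and long option names."""
    parser = argparse.ArgumentParser(description="Sketch only, not extract_kraken_reads.py")
    # Listing both strings makes -k and --kraken interchangeable.
    parser.add_argument("-k", "--kraken", dest="kraken_file", required=True,
                        help="Kraken output file")
    parser.add_argument("-t", "--taxid", dest="taxid", nargs="+", required=True,
                        help="one or more taxonomy IDs")
    return parser
```

With this definition, `--kraken sample.output` and `-k sample.output` populate the same attribute.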

kreport2mpa.py --percentages

Hi,
Thanks for a great tool. To get relative abundances, I used '--percentages' in kreport2mpa.py. The output only has two decimal places, and I want up to 10 decimal places. What should I do next?

Many thanks
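
One way to get more decimal places is to recompute the percentages from the read counts rather than reading the two-decimal percentage column of the report. The sketch below is independent of kreport2mpa.py. It assumes a tab-separated Kraken-style report whose second column is the clade read count, and it takes the total as the sum of the unclassified (U) and root (R) rows; `precise_percentages` is a hypothetical name.

```python
def precise_percentages(report_lines, decimals=10):
    """Recompute clade percentages from read counts with the requested precision."""
    rows = []
    total = 0
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        clade_reads = int(fields[1])
        rank, name = fields[-3], fields[-1].strip()
        if rank in ("U", "R"):
            total += clade_reads  # unclassified + root = all reads
        rows.append((name, clade_reads))
    if total == 0:
        return []
    return [(name, f"{100 * reads / total:.{decimals}f}") for name, reads in rows]
```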

error running kreport2mpa.py script

Hello,

I am getting the below error. I did try the script on two datasets with the same result. Could you please help me troubleshoot this? I also copied my submission script below the error messages.

error output

Traceback (most recent call last):
  File "/home/mthoemmes/kreport2mpa.py", line 188, in <module>
    main()
  File "/home/mthoemmes/kreport2mpa.py", line 143, in main
    report_vals = process_kraken_report(line)
  File "/home/mthoemmes/kreport2mpa.py", line 80, in process_kraken_report
    taxid = int(l_vals[-3])
NameError: name 'l_vals' is not defined. Did you mean: 'locals'?

submission script

for i in *_report_species.txt
do
    filename=$(basename "$i")
    fname="${filename%_report_species.txt}"
    python /home/username/kreport2mpa.py -r "$i" -o "${fname}_mpa.txt" --display-header
done

Kindly,

Megan

extract reads error

Hi, I am trying to use the extract reads tool to select some reads, but I am getting this error: "ValueError: invalid literal for int() with base 10: 'taxonomy_lvl'". Do you have any suggestions on how to fix it? Thanks in advance.

error in kreport2mpa.py

Hello,
Thanks for your powerful tools. I ran into a problem when running the script:
/data/Timmy/Bracken/kreport2mpa.py -r ./OUT/TEST.S.bracken -o ./OUT/TEST1.report

Traceback (most recent call last):
  File "/data/Timmy/Bracken/kreport2mpa.py", line 173, in <module>
    main()
  File "/data/Timmy/Bracken/kreport2mpa.py", line 128, in main
    report_vals = process_kraken_report(line)
  File "/data/Timmy/Bracken/kreport2mpa.py", line 74, in process_kraken_report
    percents = float(split_str[0])
ValueError: could not convert string to float: Neorhizobium sp. NCHU2750

How can I fix it? Thanks for your help.
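
The error is consistent with the input file having a species name, rather than a percentage, in its first column, as a Bracken abundance table does, while the script expects a Kraken-style report. That diagnosis is an assumption based only on the message above. A rough check such as the following can tell the two layouts apart before running the conversion; `looks_like_kraken_report` is a hypothetical helper.

```python
def looks_like_kraken_report(path, max_lines=5):
    """Heuristic: True if the first non-empty lines parse as Kraken-report rows.

    A report row is assumed to start with a percentage followed by two integer
    read counts, separated by tabs.
    """
    checked = 0
    with open(path) as handle:
        for line in handle:
            if not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            try:
                float(fields[0])
                int(fields[1])
                int(fields[2])
            except (ValueError, IndexError):
                return False  # e.g. a header row or a name in column 1
            checked += 1
            if checked >= max_lines:
                break
    return checked > 0
```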

Compare two kraken outputs

Will there be a tool to compare two or more Kraken results so that the user can extract what's unique, what's common, etc.?

Cheers
Amali

The error when running extract_kraken_reads.py

Hello,

I got an error from running the command "extract_kraken_reads.py" as below:

PROGRAM START TIME: 11-26-2019 02:26:33

STEP 0: PARSING REPORT FILE food.virome/kraken2.output/nt.output/BVD2.report
Traceback (most recent call last):
  File "KrakenTools-master/extract_kraken_reads.py", line 388, in <module>
    main()
  File "KrakenTools-master/extract_kraken_reads.py", line 192, in main
    report_vals = process_kraken_report(line)
  File "KrakenTools-master/extract_kraken_reads.py", line 110, in process_kraken_report
    int(split_str[1])
NameError: name 'split_str' is not defined
=========================================================================
Please give me a hint on how to resolve this problem.
Thank you.

Min-Soo Kim

KrakenTools using huge amounts of RAM

Hello,
I am trying to create "decontaminated" fastaq files using Kraken2 and KrakenTools.
After creating the DB and classifying my reads with kraken2, I am trying to use KrakenTools. My problem is that I cannot finish my job: KrakenTools uses a huge amount of memory (more than 125 GB) even for small/medium datasets (~50 million reads each for paired end). Any idea how to avoid this issue?
Thanks in advance.

Could KrakenTools be used for outputs based on the UHGG database?

Dear Jennifer,

I would like to use Kraken2, Bracken, and KrakenTools (e.g. kreport2mpa.py) to profile my metagenomes against the UHGG database (https://www.nature.com/articles/s41587-020-0603-3). This database uses the rank codes U, R, and R1-R7 rather than U, R, D, K, P, C, O, F, G, or S, so I guess KrakenTools is not compatible with this format. May I therefore ask whether you plan to update KrakenTools? Or could you tell me how I could revise the code to make it compatible?

Many thanks! Really appreciate your help!

With best regards,
Nathan

combine_kreports.py always prints the same {sample}_all and {sample}_lvl

Hi,
When I use combine_kreports.py, I find that the total reads and level reads are always the same for each sample.
Is that right?

Thanks,
Jie

extracting reads with quality values

Dear Jen,
I want to extract the viral reads with --include-children, or exclude the viral reads with --include-children, and keep only those reads for further assembly. How can I get FASTQ files rather than FASTA files?

Thanks

error running kreport2mpa.py script, name 'l_vals' is not defined.

Traceback (most recent call last):
  File "/home/zlinbz/miniconda3/envs/bracken/bin/KrakenTools-master/kreport2mpa.py", line 188, in <module>
    main()
  File "/home/zlinbz/miniconda3/envs/bracken/bin/KrakenTools-master/kreport2mpa.py", line 143, in main
    report_vals = process_kraken_report(line)
  File "/home/zlinbz/miniconda3/envs/bracken/bin/KrakenTools-master/kreport2mpa.py", line 80, in process_kraken_report
    taxid = int(l_vals[-3])
NameError: name 'l_vals' is not defined. Did you mean: 'locals'?

Has anyone encountered this problem and solved it?

error when using extract_kraken_reads.py

I'm trying to extract all bacterial reads from a paired-end kraken analysis but I am getting an error when the script tries to parse the kraken.report.txt. I'm running under most recent version of Biopython and have just updated to your latest script - the error I'm getting is:

PROGRAM START TIME: 02-06-2020 17:27:02

STEP 0: PARSING REPORT FILE //data/strepgen/JAM_EMBER_kraken/kraken_bracken_reports/EMB2.report.txt
Traceback (most recent call last):
  File "extract_kraken_reads.py", line 395, in <module>
    main()
  File "extract_kraken_reads.py", line 206, in main
    while level_num != (prev_node.level_num + 1):
AttributeError: 'int' object has no attribute 'level_num'

Sorry, my Python is pretty poor, so I can't fathom whether it is a script problem or a problem in my report file. Any thoughts?

Thanks

Rich

kreport2krona does not check last line in the report

Hello everybody! I wanted to report a problem with kreport2krona.py:

I have this report for a very simple in-silico metagenomic sample:

21.38 10203 10203 U 0 unclassified
78.62 37509 0 R 1 root
58.09 27715 0 R1 131567 cellular organisms
36.91 17612 0 D 2759 Eukaryota
36.91 17612 0 D1 33154 Opisthokonta
20.90 9974 0 K 4751 Fungi
20.90 9974 0 K1 451864 Dikarya
20.90 9974 0 P 4890 Ascomycota
20.90 9974 0 P1 716545 saccharomyceta
20.90 9974 0 P2 147537 Saccharomycotina
20.90 9974 0 C 4891 Saccharomycetes
20.90 9974 0 O 4892 Saccharomycetales
20.90 9974 0 F 4893 Saccharomycetaceae
20.90 9974 0 G 4930 Saccharomyces
20.90 9974 0 S 4932 Saccharomyces cerevisiae
20.90 9974 9974 S1 559292 Saccharomyces cerevisiae S288C
16.01 7638 0 K 33208 Metazoa
16.01 7638 0 K1 6072 Eumetazoa
16.01 7638 0 K2 33213 Bilateria
16.01 7638 0 K3 33511 Deuterostomia
16.01 7638 0 P 7711 Chordata
16.01 7638 0 P1 89593 Craniata
16.01 7638 0 P2 7742 Vertebrata
16.01 7638 0 P3 7776 Gnathostomata
16.01 7638 0 P4 117570 Teleostomi
16.01 7638 0 P5 117571 Euteleostomi
16.01 7638 0 P6 8287 Sarcopterygii
16.01 7638 0 P7 1338369 Dipnotetrapodomorpha
16.01 7638 0 P8 32523 Tetrapoda
16.01 7638 0 P9 32524 Amniota
16.01 7638 0 C 40674 Mammalia
16.01 7638 0 C1 32525 Theria
16.01 7638 0 C2 9347 Eutheria
16.01 7638 0 C3 1437010 Boreoeutheria
16.01 7638 0 C4 314146 Euarchontoglires
16.01 7638 0 O 9443 Primates
16.01 7638 0 O1 376913 Haplorrhini
16.01 7638 0 O2 314293 Simiiformes
16.01 7638 0 O3 9526 Catarrhini
16.01 7638 0 O4 314295 Hominoidea
16.01 7638 0 F 9604 Hominidae
16.01 7638 0 F1 207598 Homininae
16.01 7638 0 G 9605 Homo
16.01 7638 7638 S 9606 Homo sapiens
21.17 10103 0 D 2 Bacteria
21.17 10103 0 P 1224 Proteobacteria
21.17 10103 0 C 1236 Gammaproteobacteria
21.17 10103 0 O 72274 Pseudomonadales
21.17 10103 0 F 135621 Pseudomonadaceae
21.17 10103 0 G 286 Pseudomonas
21.17 10103 0 G1 136841 Pseudomonas aeruginosa group
21.17 10103 0 S 287 Pseudomonas aeruginosa
21.17 10103 10103 S1 208964 Pseudomonas aeruginosa PAO1
20.53 9794 0 D 10239 Viruses
20.53 9794 0 D1 35237 dsDNA viruses, no RNA stage
20.53 9794 0 O 548681 Herpesvirales
20.53 9794 0 F 10292 Herpesviridae
20.53 9794 0 F1 10293 Alphaherpesvirinae
20.53 9794 0 G 10294 Simplexvirus
20.53 9794 9794 S 10298 Human alphaherpesvirus 1

When I use kreport2krona, the krona file created is the following:

10203 Unclassified
0 k__Eukaryota
0 k__Eukaryota p__Ascomycota
0 k__Eukaryota p__Ascomycota c__Saccharomycetes
0 k__Eukaryota p__Ascomycota c__Saccharomycetes o__Saccharomycetales
0 k__Eukaryota p__Ascomycota c__Saccharomycetes o__Saccharomycetales f__Saccharomycetaceae
0 k__Eukaryota p__Ascomycota c__Saccharomycetes o__Saccharomycetales f__Saccharomycetaceae g__Saccharomyces
9974 k__Eukaryota p__Ascomycota c__Saccharomycetes o__Saccharomycetales f__Saccharomycetaceae g__Saccharomyces s__Saccharomyces_cerevisiae
0 k__Eukaryota p__Chordata
0 k__Eukaryota p__Chordata c__Mammalia
0 k__Eukaryota p__Chordata c__Mammalia o__Primates
0 k__Eukaryota p__Chordata c__Mammalia o__Primates f__Hominidae
0 k__Eukaryota p__Chordata c__Mammalia o__Primates f__Hominidae g__Homo
7638 k__Eukaryota p__Chordata c__Mammalia o__Primates f__Hominidae g__Homo s__Homo_sapiens
0 k__Bacteria
0 k__Bacteria p__Proteobacteria
0 k__Bacteria p__Proteobacteria c__Gammaproteobacteria
0 k__Bacteria p__Proteobacteria c__Gammaproteobacteria o__Pseudomonadales
0 k__Bacteria p__Proteobacteria c__Gammaproteobacteria o__Pseudomonadales f__Pseudomonadaceae
0 k__Bacteria p__Proteobacteria c__Gammaproteobacteria o__Pseudomonadales f__Pseudomonadaceae g__Pseudomonas
10103 k__Bacteria p__Proteobacteria c__Gammaproteobacteria o__Pseudomonadales f__Pseudomonadaceae g__Pseudomonas s__Pseudomonas_aeruginosa
0 k__Viruses
0 k__Viruses o__Herpesvirales
0 k__Viruses o__Herpesvirales f__Herpesviridae
0 k__Viruses o__Herpesvirales f__Herpesviridae g__Simplexvirus

(here you have both of the files)
Simple_sample.zip

If you compare both files, you'll notice that the last line in the report, corresponding to Human alphaherpesvirus 1, is not present in the krona file (therefore, not in the html created by krona). I've tried to fix this on my own, but no success so far. Maybe you can help me out here? Thank you for your attention!

KeyError: 'unclassified (taxid 0)'

When I run make_kreport.py, it generates the following error (KeyError: 'unclassified (taxid 0)'). Does anyone know what causes it?

PROGRAM START TIME: 09-05-2020 13:40:52

STEP 1/4: Reading taxonomy kraken2_standard_09042020/mydb_taxonomy.txt...
722 nodes saved
STEP 2/4: Reading kraken file 2105F.kraken2.output.txt...
36.872 million reads processed
STEP 3/4: Creating final tree...
Traceback (most recent call last):
  File "/opt/anaconda3/envs/kraken2/bin/make_kreport.py", line 198, in <module>
    main()
  File "/opt/anaconda3/envs/kraken2/bin/make_kreport.py", line 145, in main
    p_node = taxid2node[curr_tid].parent
KeyError: 'unclassified (taxid 0)'
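
The key in the error has the form `name (taxid N)`, which resembles what Kraken 2 writes in the taxid column when run with `--use-names`. If that is what happened here (an assumption; the command is not shown), the numeric ID can be recovered before the lookup. The helper below is a hypothetical sketch, not part of make_kreport.py.

```python
import re

# Matches a trailing "(taxid 123)" as in "unclassified (taxid 0)".
_TAXID_IN_NAME = re.compile(r"\(taxid (\d+)\)\s*$")


def normalize_taxid(field):
    """Return the bare taxid from either '123' or 'Some name (taxid 123)'."""
    field = field.strip()
    match = _TAXID_IN_NAME.search(field)
    return match.group(1) if match else field
```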

error with "combine_mpa.py" script

Your script would be very useful to me if you could help with the following problem:

I have 206 samples generated with kraken2 (kreport) in --mpa style. When I try to combine them with your script, I get the following error.
Here is my command:
combine_mpa.py -i file1 file2 file3 filen -o output.txt
and here is the error:
Number of files to parse: 6
Traceback (most recent call last):
  File "../KrakenTools/combine_mpa.py", line 142, in <module>
    main()
  File "../KrakenTools/combine_mpa.py", line 83, in main
    [classification, val] = line.strip().split('\t')
ValueError: need more than 1 value to unpack

Interestingly, if I combine 2 or 3 files with the same script, it works perfectly fine.

Please let me know what changes I should make to get it to work. Many thanks in advance.

KeyError: '3045' from make_kreport.py

Thank you very much for your powerful tools!

I ran into the following error while running the make_kreport.py script.

python make_kreport.py -i P1_S7_L001_R_kraken2.txt -t nt_ktaxonomy -o P1

PROGRAM START TIME: 08-31-2021 16:43:17

STEP 1/4: Reading taxonomy nt_ktaxonomy...
2083898 nodes saved
STEP 2/4: Reading kraken file P1_S7_L001_R_kraken2.txt...
2.084 million reads processed
STEP 3/4: Creating final tree...
Traceback (most recent call last):
  File "/home/microbiology/KrakenTools/make_kreport.py", line 199, in <module>
    main()
  File "/home/microbiology/KrakenTools/make_kreport.py", line 145, in main
    p_node = taxid2node[curr_tid].parent
KeyError: '3045'

I ran Kraken2 with the following basic script against the full nt database.

kraken2 --db $kraken2_db P1_S7_L001_R1_kneaddata.fastq --report P1_S7_L001_R_kraken2.txt --report-zero-counts

Thank you very much for your time and help.

Sincerely,

David Bradshaw

error with "combine_mpa.py" script when combining mpa report with header

Hi,
combine_mpa.py was very useful when I combined mpa reports without headers.

But when I chose to add the header to the mpa reports (--display-header), an error was reported (see below). How should I solve it?

Traceback (most recent call last):
  File "combine_mpa.py", line 142, in <module>
    main()
  File "combine_mpa.py", line 83, in main
    [classification, val] = line.strip().split('\t')
ValueError: not enough values to unpack (expected 2, got 1)

kreport2krona.py does not count the reads of a Bracken kreport

To obtain a Krona visualization of Bracken reports, I ran kreport2krona.py on the bracken_kreport file (percentage of reads, total number of reads, etc.). However, I noticed that the output file does not contain the majority of the reads assigned in the bracken_kreport. For example, in the bracken_kreport a vast number of reads are assigned to the genus Ligilactobacillus, but in the created file the number is 0. Could anyone help me solve this?

100.00 7095261 0 R 1 root
100.00 7095139 0 R1 131567 cellular organisms
99.69 7073316 0 D 2 Bacteria
88.16 6255049 0 D1 1783272 Terrabacteria group
86.95 6169667 0 P 1239 Firmicutes
86.60 6144312 0 C 91061 Bacilli
86.43 6132776 0 O 186826 Lactobacillales
86.04 6104444 0 F 33958 Lactobacillaceae
83.89 5952280 0 G 2767887 Ligilactobacillus
1.53 108246 0 G 2742598 Limosilactobacillus

0 k__Bacteria p__Firmicutes c__Bacilli o__Lactobacillales
0 k__Bacteria p__Firmicutes c__Bacilli o__Lactobacillales f__Lactobacillaceae
0 k__Bacteria p__Firmicutes c__Bacilli o__Lactobacillales f__Lactobacillaceae g__Ligilactobacillus
0 k__Bacteria p__Firmicutes c__Bacilli o__Lactobacillales f__Lactobacillaceae g__Limosilactobacillus
0 k__Bacteria p__Firmicutes c__Bacilli o__Lactobacillales f__Lactobacillaceae g__Lactobacillus
0 k__Bacteria p__Firmicutes c__Bacilli o__Lactobacillales f__Lactobacillaceae g__Leuconostoc

Format of output files from extract_kraken_reads.py

Hi,
I use extract_kraken_reads.py to separate reads by taxonomy ID. My input files are paired-end FASTQ files, and I was hoping that the extracted files would also be FASTQ files, but it looks like they are FASTA files. Is there an option to get FASTQ files as output?
Thanks!

Error occurs in <make_kreport.py>

Hi Jennifer,
I encountered an error in make_kreport.py, and I wonder why it happens.
Here is my log:

`
-bash-4.2$ python ~/yanren/app/KrakenTools-master/make_kreport.py -i MT1.krak2 -t ~/MY_KRAKEN2_DATABASE/TAXONOMY_MAKE.txt -o report
PROGRAM START TIME: 12-19-2020 12:49:03

STEP 1/4: Reading taxonomy /lustre/quanzx/MY_KRAKEN2_DATABASE/TAXONOMY_MAKE.txt...
30392 nodes saved
STEP 2/4: Reading kraken file MT1.krak2...
204.704 million reads processed
STEP 3/4: Creating final tree...
Traceback (most recent call last):
  File "/lustre/quanzx/yanren/app/KrakenTools-master/make_kreport.py", line 198, in <module>
    main()
  File "/lustre/quanzx/yanren/app/KrakenTools-master/make_kreport.py", line 145, in main
    p_node = taxid2node[curr_tid].parent
KeyError: 'Yuavirus (taxid 1299429)'
`

Make pip3 installable?

pip3 install kraken-tools

desirable
https://medium.com/@joel.barmettler/how-to-upload-your-python-package-to-pypi-65edc5fe9c56

extract_kraken_reads.py ERROR

Hello,

I have installed krakentools using anaconda3. When I launch extract_kraken_reads.py I am getting the error:
ERROR: --report not specified.(krakentools_env)
How can I fix this?

Best regards,

Giacomo

extract_kraken_reads.py No reads processed

I've just downloaded the code and run extract_kraken_reads.py this way:

./KrakenTools-master/extract_kraken_reads.py -k output.kraken -s R1.fastq -s2 R2.fastq -o R1_extracted.fastq -o2 R2_extracted.fastq -t 2 --exclude --report report.txt --fastq-output

but I got this error:

PROGRAM START TIME: 11-19-2021 08:54:26
        1 taxonomy IDs to parse
>> STEP 1: PARSING KRAKEN FILE FOR READIDS AZTI526.kraken
        0.00 million reads processed
        0 read IDs saved
>> STEP 2: READING SEQUENCE FILES AND WRITING READS
        0 read IDs found (0.00 mill reads processed)
        0 read IDs found (0.00 mill reads processed)
        0 reads printed to file
        Generated file: AZTI526_R1_extracted_kraken.fastq
        Generated file: AZTI526_R2_extracted_kraken.fastq

Why were no reads processed?

Combine Bracken files

Hi,

Good day.

I would like to check whether I can run 'combine_kreports.py' on my Bracken reports. I have around 650 Bracken reports. I get the errors shown below when I run it with the following command.

combine_kreports.py -r *.bracken -o bracken_phylum_all.report

Could you please advise?

Thanks.

Regards,
Soo Ching

STEP 1: READING REPORTS
1/4 samples processed
Traceback (most recent call last):
  File "/nethome/lees51/.conda/envs/shotgun_ana3_py3.7.0/bin/combine_kreports.py", line 311, in <module>
    main()
  File "/nethome/lees51/.conda/envs/shotgun_ana3_py3.7.0/bin/combine_kreports.py", line 203, in main
    report_vals = process_kraken_report(line)
  File "/nethome/lees51/.conda/envs/shotgun_ana3_py3.7.0/bin/combine_kreports.py", line 120, in process_kraken_report
    level_reads = int(split_str[2])
ValueError: invalid literal for int() with base 10: 'P'

extract_kraken_reads with --exclude and --include-children on multiple taxids

Dear all

It does not appear to be possible to pass multiple species-level taxids to extract_kraken_reads while using both the --exclude and --include-children options to remove the indicated taxids as well as their strain-level descendants. Is there a workaround if I want to remove the reads of multiple species and all of their strains from the Kraken output?

Thanks

Marcus

--quiet option for extract_kraken_reads.py

Thanks for this really useful tool.

I am running extract_kraken_reads.py in slurm sbatch jobs and the output printed to screen is leading to >10Gb job log files, with files containing: 0 reads processed^M 0.01 million reads processed^M 0.02 million reads processed^M ...and so on for my ~180x10^6 reads.

Is it possible to include an option that reduces the verbosity of the output? Possibly one that still outputs some useful information on major stages within the script?
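
A common pattern for this is to rewrite the progress line in place only when the output is an interactive terminal, and to print sparse, newline-terminated updates (or nothing) otherwise. The sketch below shows the idea; `report_progress` and its parameters are hypothetical and not options of extract_kraken_reads.py.

```python
import sys


def report_progress(count, stream=sys.stderr, every=1_000_000, quiet=False):
    """Write a progress message every `every` reads; return True if one was written.

    On a terminal the line is overwritten with a carriage return; when the
    stream is a file (e.g. a Slurm job log), one plain line is written instead.
    """
    if quiet or count % every != 0:
        return False
    message = f"\t{count / 1e6:.2f} million reads processed"
    if stream.isatty():
        stream.write("\r" + message)
    else:
        stream.write(message + "\n")
    stream.flush()
    return True
```

With `every` set to one million, a run over roughly 180 million reads would write about 180 lines to a log file.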

IndexError: list index out of range

Hi,

Thanks for developing these scripts to post-process Kraken results. Can I use them to process a KrakenUniq report?

I tried your script to combine multiple reports, and it reports the above error. Can you help me figure out what's wrong with my run?

perl ../KrakenTools/combine_kreports.py -r SLX_14_S5.fastp.clean.report SLX_15_S40.fastp.clean.report -o ../test.out.report

STEP 1: READING REPORTS
1/2 samples processed
Traceback (most recent call last):
  File "../KrakenTools/combine_kreports.py", line 309, in <module>
    main()
  File "../KrakenTools/combine_kreports.py", line 201, in main
    report_vals = process_kraken_report(line)
  File "../KrakenTools/combine_kreports.py", line 113, in process_kraken_report
    int(split_str[1])
IndexError: list index out of range

extract_kraken_reads.py throwing a error

Hi, thanks for the tool. I tried testing it today and it is giving me this error. I used the Kraken file from the kraken2 tool and the paired-end files that I had used as input. In step 1 it is able to find the read IDs assigned to the taxa, but in step 2 the read IDs are not found in the FASTQ files.

Bioconda package / release

Going down the same road as #11 - is there a plan to have this released as a versionized package on GitHub/Bioconda/PyPi?

I'm currently writing a MultiQC module and it would be extremely helpful to be able to a.) import this as a set of scripts to create appropriate tables and then b.) get these for visualization into MultiQC.

MultiQC/MultiQC#121

That would be much easier if you could (at a minimum) create an ideally versionized script release here on Github and/or provide a pip3 installable #11 so we can get this to Bioconda.

I would be happy to help of course if you need any help on this :-)

How to combine kraken2 output/reports from different samples and convert it to metaphlan format?

Hi @jenniferlu717 and everyone,

I'm looking for a way to combine kraken2 output/reports from different samples and convert it to metaphlan format, or (the other way around) to convert kraken2 output/reports from different samples to metaphlan format and combine them together. It's like using the script kraken-mpa-report of kraken1 on different kraken output files and combining them into metaphlan format.

Googling brought me to your script kreport2mpa.py, which converts a kraken report to metaphlan format. And then, I saw that you also have combine_kreports.py, which combines kraken reports from different samples. I wonder if I can combine those two scripts to achieve what I'd like to do, maybe by using combine_kreports.py and then kreport2mpa.py? Do you have any idea?

Thanks in advance for your input.

Cheers

No read IDs found

Hi,
I am trying to use the "extract_kraken_reads.py" script to extract the reads related to a particular species. The Kraken report shows that there are 465 reads related to the species I am interested in.
I ran the following command:
python extract_kraken_reads.py -k 107C_out.kraken -s 107C_seq.file -o extract_107C.fastq -t 813 &

but in the end no read IDs were found. Here is my output:

PROGRAM START TIME: 08-07-2020 01:38:50
1 taxonomy IDs to parse

STEP 1: PARSING KRAKEN FILE FOR READIDS 107C_out.kraken
4.27 million reads processed
465 read IDs saved
STEP 2: READING SEQUENCE FILES AND WRITING READS
0 read IDs found (4.27 mill reads processed)
0 reads printed to file
Generated file: extract_107C.fastq
PROGRAM END TIME: 08-07-2020 01:40:44

[1]+ Done python extract_kraken_reads.py -k 107C_out.kraken -s 107C_seq.file -o extract_107C.fastq -t 813

Any comments would be appreciated.
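
When step 1 saves read IDs but step 2 finds none, one check that may help is whether the read IDs in the Kraken output actually match the headers in the sequence file. The sketch below is a diagnostic written for this thread, not part of KrakenTools. It assumes the standard tab-separated Kraken output (read ID in column 2, taxonomy ID in column 3) and four-line FASTQ records; the function names are hypothetical.

```python
def kraken_ids(kraken_lines, taxid):
    """Collect read IDs whose assigned taxid (column 3) equals `taxid`."""
    ids = set()
    for line in kraken_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 3 and cols[2] == str(taxid):
            ids.add(cols[1])
    return ids


def fastq_ids(fastq_lines):
    """Collect read IDs from FASTQ headers, assuming 4 lines per record."""
    ids = set()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0 and line.startswith("@"):
            tokens = line[1:].split()
            if tokens:
                # Keep only the first whitespace-delimited token of the header.
                ids.add(tokens[0])
    return ids
```

Comparing `kraken_ids(...) & fastq_ids(...)` against the two sets shows whether the IDs differ systematically, for example by a `/1` suffix or because the sequence file is not the one that was classified.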

Read count doesn't match expected from extract_kraken_reads.py

I am trying to extract the reads from a single taxid at that level (no parents or children). The hierarchy looks like:

  0.01  1898    0       P       2732408         Pisuviricota
  0.01  1898    0       C       2732506           Pisoniviricetes
  0.01  1898    0       O       76804               Nidovirales
  0.01  1898    0       O1      2499399               Cornidovirineae
  0.01  1898    0       F       11118                   Coronaviridae
  0.01  1898    0       F1      2501931                   Orthocoronavirinae
  0.01  1898    0       G       694002                      Betacoronavirus
  0.01  1898    12      G1      2509481                       Embecovirus
  0.01  1886    212     S       694003                          Betacoronavirus 1
  0.01  1539    1539    S1      31631                             Human coronavirus OC43
  0.00  135     135     S1      11128                             Bovine coronavirus <--- I WANT THIS ONE

I would expect that there are 135 reads associated with taxid 11128. That's how many are in the read assignment report too:

17:29 cn0896 classify$ grep -P '\s+11128\s+' VRNA_015.txt | wc -l
135

However, when I try to extract those reads I get:

17:29 cn0896 classify$ /data/Segrelab/bwbin/KrakenTools/extract_kraken_reads.py -k VRNA_015.txt --fastq-output -s1 ../VRNA_015.qc.nohuman.fastq.gz -o VRNA_015.only_11128.fastq -t 11128
PROGRAM START TIME: 06-28-2021 21:31:38
		1 taxonomy IDs to parse
>> STEP 1: PARSING KRAKEN FILE FOR READIDS VRNA_015.txt
		16.09 million reads processed
		102 read IDs saved
>> STEP 2: READING SEQUENCE FILES AND WRITING READS
		102 read IDs found (7.04 mill reads processed)
		102 reads printed to file
		Generated file: VRNA_015.only_11128.fastq
PROGRAM END TIME: 06-28-2021 21:35:42
17:35 cn0896 classify$ grep -c '^' VRNA_015.only_11128.fastq | div4.pl 
102

Any idea why I'm getting 102 instead of 135? I get the same error when extracting reads from inner nodes as well but I thought this was a simpler example. I don't see an obvious version flag but I cloned the repo in June of this year.
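
One thing that might be worth ruling out is whether the 135 matching lines correspond to 135 distinct read IDs, since a set of saved IDs would collapse repeats. Whether that explains the gap here is an assumption, not a confirmed diagnosis. The sketch below (hypothetical function name, standard tab-separated Kraken output assumed) counts both the matching lines and the unique read IDs for one taxid, comparing column 3 exactly rather than with a regex over the whole line.

```python
def count_taxid(kraken_lines, taxid):
    """Return (matching lines, unique read IDs) for an exact taxid match.

    Only column 3 (the assigned taxid) is compared, so hits inside the
    k-mer mapping column or the length column are not counted.
    """
    total = 0
    unique = set()
    for line in kraken_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 3 and cols[2] == str(taxid):
            total += 1
            unique.add(cols[1])
    return total, len(unique)
```

If the two numbers differ, the classification file contains repeated read IDs for that taxid, and the lower number is the most that an ID-based extraction could return.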

if a script is repo'd in a forest

Dear Dev. & co,

Thanks so much for Bracken and the seemingly endless support provided for Kraken_, not least this repo.

There are so many tools here that I really, really wish I'd known about when I was working K2+B into my workflows, not least the diversity tools that are somewhat hidden even within the KrakenTools repo (i.e. not mentioned at all in KT's README.md).

The Kraken/Bracken suite of tools, and the community in general, would benefit hugely from a "see also" header at the top of each README.md telling users, for example, "KrakenTools exists, with the following functionalities". It might even cut down on the cross-posting of issues...

all the best,
H

error when open gzipped files

Got the following error message:
ERROR: sequence file must be FASTA or FASTQ

It seems to me that the mode argument in gzip.open() should be rt instead of r.
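
The difference between the two modes can be shown with a small standalone example. In Python 3, `gzip.open(path, "r")` opens the file in binary mode and returns bytes, while `"rt"` returns text. The assumption here, not verified against the script, is that the format check compares the first character with a string such as "@", which can never equal a bytes object.

```python
import gzip
import os
import tempfile


def first_char(path, mode):
    """Read the first character (or byte) of a gzipped file."""
    with gzip.open(path, mode) as handle:
        return handle.read(1)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "reads.fastq.gz")
    # Write a one-record FASTQ file so there is something to read back.
    with gzip.open(path, "wt") as out:
        out.write("@read1\nACGT\n+\nIIII\n")
    as_bytes = first_char(path, "r")   # binary mode: a bytes object
    as_text = first_char(path, "rt")   # text mode: a str

# A bytes object never compares equal to a str, so a check like
# `first == "@"` fails in binary mode even for a valid FASTQ file.
print(as_bytes == "@", as_text == "@")
```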

extract_kraken_reads wrong number of reads, blast result does not match

Hi,
I tried extracting reads for several different taxa from a Kraken file and ran into two problems. First, the number of extracted reads does not match (e.g. 11 reads are assigned to a species in the Kraken file, but only 3 are extracted). Second, when I BLAST these sequences, in almost all cases they are nowhere near what Kraken assigned them to (with a complete nt Kraken database): I sometimes get bacteria or plants when it should be a mammal. For a very few taxa, though, the BLAST result does match the Kraken assignment.

I used
krakentools/extract_kraken_reads.py -k sample1.kraken -s sample1.fastq -t 37349 -o test.fasta

the process looks like this

PROGRAM START TIME: 08-18-2020 09:42:54
1 taxonomy IDs to parse

STEP 1: PARSING KRAKEN FILE FOR READIDS t1_trim.kraken
0.61 million reads processed
9 read IDs saved
STEP 2: READING SEQUENCE FILES AND WRITING READS
4 read IDs found (0.61 mill reads processed)
0 reads printed to file
Generated file: extr_trest.fasta

and the output file contains only 3 sequences of the 9 directly assigned reads, which when blasted are nothing close to the assigned taxon...

I am very confused about these results and any advice would be greatly appreciated.

Filtered Bracken-style output to Kraken reports format

Hi,

I filtered out the human taxon using filter_bracken.out.py, as follows:

python KrakenTools/filter_bracken.out.py -i Test.S.bracken -o Test.S.bracken.filter --exclude 9606

But what can I do with this filtered file? I want to transform this Bracken-style file into a Kraken report format file, and then use kreport2mpa.py to get the absolute and relative abundance at each level. How can I achieve this? How can I convert the format?

Does anybody know how to do that?

Thanks very much!

Best wishes,
Myshu

error output

submission script

I got an error from running the command "extract_kraken_reads.py" as below:
