10xgenomics / bamtofastq

Convert 10x BAM files to the original FASTQs compatible with 10x pipelines

License: MIT License

Rust 98.13% Starlark 1.87%

bamtofastq's Issues

Directly turn bam file to UMI count?

I need to filter out some reads from a Visium BAM file. Is it possible to turn that BAM file directly into UMI counts without converting it back to FASTQ?

I know there are some tools available, but it would be best to use the same tool that Space Ranger uses.

Output directory doesn't exist error

Hi,
Thanks for developing the tool.
I'm trying to convert 5 BAM files produced by cellranger (GSE115469) and got the following error:
"bamtofastq v1.3.5
bamtofastqerror: error creating output directory: "/scratch/c.c1942783/Liver.scRNA/fastq/fastqs". Does it already exist?
Please contact [email protected] for assistance. Please re-run with --traceback and include stack trace with an error report
see below for more details:
error creating output directory: "/scratch/c.c1942783/Liver.scRNA/fastq/fastqs". Does it already exist?”
Those BAM files are the original ones.
My command (shown here for one of those files, https://sra-pub-src-1.s3.amazonaws.com/SRR7276478/P5TLH.bam.1) is:
bamtofastq P5TLH.bam.1 /scratch/c.c1942783/Liver.scRNA/fastq/fastqs
The above directory has been created and does exist.
Any help would be greatly appreciated.
Kind regards,
Dien
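
A possible explanation, offered as a sketch rather than a confirmed diagnosis: the wording of the error ("Does it already exist?") suggests that bamtofastq wants to create the output directory itself and refuses a path that is already present. The Rust standard library's `std::fs::create_dir` behaves this way. The helper name `create_fresh_dir` below is hypothetical and is not taken from the bamtofastq source.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: create `path`, failing if it already exists.
fn create_fresh_dir(path: &Path) -> io::Result<()> {
    // `create_dir` returns ErrorKind::AlreadyExists for an existing path.
    fs::create_dir(path)
}
```

If that reading is right, pointing the tool at a path that does not exist yet (or removing the empty `fastqs` directory first) would avoid the error.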

Option --relaxed not working properly for bamtofastq-1.4.1

Hello,

I have a .bam file of paired reads that was pre-filtered to specific loci using samtools, i.e.:

samtools view -bS possorted_genome_bam.bam -f 2 -F 4 "14:21621800-22552200" "7:142299100-142813500" > possorted_genome_filt.bam

which was indexed afterwards. Now I want to convert it back to FASTQs using bamtofastq-1.4.1. Obviously not every read will have its mate, so I wanted to use the --relaxed option on the resulting .bam. However, when I do, I get this error:

bamtofastq v1.4.1
Didn't find both records for a paired end read. Skipping. Read name of unpaired record: A00703:356:H3KFNDSX7:1:1513:27163:20635 3:N:0:0
bamtofastq failed unexpectedly. Please contact [email protected] with the following information: 'swap_remove index (is 0) should be < len (is 0)' library/alloc/src/vec/mod.rs:1300:
   0: bamtofastq::set_panic_handler::{{closure}}
             at /root/src/main.rs:975:25
   1: std::panicking::rust_panic_with_hook
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:610:17
   2: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:502:13
   3: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:139:18
   4: rust_begin_unwind
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:498:5
   5: core::panicking::panic_fmt
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/panicking.rs:107:14
   6: alloc::vec::Vec<T,A>::swap_remove::assert_failed
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/alloc/src/vec/mod.rs:1300:13
   7: alloc::vec::Vec<T,A>::swap_remove
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/alloc/src/vec/mod.rs:1305:13
      bamtofastq::proc_double_ended
             at /root/src/main.rs:1241:18
      bamtofastq::inner
             at /root/src/main.rs:1115:13
   8: bamtofastq::go
             at /root/src/main.rs:1026:13
      bamtofastq::main
             at /root/src/main.rs:958:15
   9: core::ops::function::FnOnce::call_once
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/ops/function.rs:227:5
      std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:123:18
  10: main
  11: __libc_start_main
             at /build/glibc-SzIz7B/glibc-2.31/csu/../csu/libc-start.c:308:16
  12: <unknown>

Strangely, without the --relaxed option bamtofastq does output at least some reads in the .fastq files, whereas with --relaxed they are all empty. I am not very familiar with Rust, so I am not sure whether this is a bug or whether I am missing some dependencies. Could you please help me find out what is happening and how to solve the issue? Sadly, subsetting loci with bamtofastq directly is not an option because bamtofastq doesn't seem to support multiple loci in one command.
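
For readers unfamiliar with Rust, the panic message above ('swap_remove index (is 0) should be < len (is 0)') means that `Vec::swap_remove(0)` was called on an empty vector inside `proc_double_ended`. The following is a minimal sketch of the kind of guard that avoids such a panic. The function name `take_first` is hypothetical and this is not the project's actual fix.

```rust
/// Hypothetical helper: remove and return the first element,
/// returning `None` instead of panicking when the vector is empty.
fn take_first<T>(records: &mut Vec<T>) -> Option<T> {
    if records.is_empty() {
        None
    } else {
        // swap_remove(0) moves the last element into slot 0, so order is not preserved.
        Some(records.swap_remove(0))
    }
}
```

The panic points at the tool's handling of a leftover unpaired record rather than at a missing dependency on the user's side.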

Different

thanks

Running bamtofastq on a subset of CellRanger bam

I would like to run bamtofastq on a subset of the CellRanger bam.

When I run bamtofastq on the actual CellRanger possorted_genome_bam.bam, it works very smoothly, but when I input the bam subset, I get the error message: Error opening BAM file.

I would appreciate any help!

Below are the commands I ran to obtain the bam file subset:

Get readIDs of interest from a fastq (created downstream of the CellRanger output: is a subset of the initial bam with some additional tags, however no modification in the readID)

awk 'NR % 4 == 1' possorted_genome_bam_modified_R1.fastq | cut -c 2- > readID_labelled.txt

subset CellRanger bam by readIDs of interest

samtools view possorted_genome_bam.bam | fgrep -w -f readID_labelled.txt > possorted_genome_bam_labelled.bam

bam does not contain a header, so I add it manually

samtools view -H possorted_genome_bam.bam > bam_subset/header.txt

cat header.txt possorted_genome_bam_labelled.bam > possorted_genome_bam_labelled_h.bam

manual inspection confirms same layout of CellRanger bam and subsetted bam
command from bamtofastq:

/local/users/bin/bamtofastq-1.3.2 possorted_genome_bam_labelled_h.bam path/to/file/fastq
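
A likely cause, stated as an assumption: `samtools view` without `-b` writes SAM text, so concatenating a text header with that output yields a SAM text file that merely carries a .bam extension. A real BAM file is BGZF-compressed and starts with the gzip magic bytes 0x1f 0x8b. The sketch below classifies a file by its first bytes. The names `Kind` and `sniff` are hypothetical and are not part of bamtofastq.

```rust
/// Hypothetical classification of a file by its leading bytes.
#[derive(Debug, PartialEq)]
enum Kind {
    Bgzf,
    SamText,
    Unknown,
}

fn sniff(head: &[u8]) -> Kind {
    if head.starts_with(&[0x1f, 0x8b]) {
        // BGZF (and therefore BAM) begins with the gzip magic number.
        Kind::Bgzf
    } else if head.first() == Some(&b'@') {
        // SAM header lines begin with '@' (for example "@HD", "@SQ", "@CO").
        Kind::SamText
    } else {
        Kind::Unknown
    }
}
```

If the subset turns out to be SAM text, keeping the header with `samtools view -h` and piping the filtered stream through `samtools view -b` should produce a genuine BAM file.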

The bam data processed by bamtofastq is different from the original FASTQ file.

Hi,
Thanks for developing this powerful tool. When I processed the BAM file from the CellRanger pipeline with bamtofastq, the results didn't match the original FASTQ sequencing data. The command was "./bamtofastq_linux --nthreads 48 /data/jr/A15762/outs/possorted_genome_bam.bam /home/pc/A15762/".

My original FASTQ files are: [screenshot not preserved]

The FASTQ files produced by bamtofastq are: [screenshot not preserved]

Curiously, two folders containing FASTQ data appear.

First folder: [screenshot not preserved]

Second folder: [screenshot not preserved]

Which FASTQ data should I use for reanalysis with CellRanger?
I sincerely hope to get your help.

Make the tool work with "cellranger vdj" bam outputs

It would be great if the tool could produce a FastQ file from the VDJ pipeline BAM files (e.g. all_contig.bam), including the usage of the --bx-list option as well.

Thanks in advance.

Compiling

When I compile by running 'cargo build --release', I receive the following error. I am using rust version 1.41.1.

[easybuild@compute-0-0 bamtofastq]$ cargo build --release
Compiling libc v0.2.43
error[E0463]: can't find crate for std

error: aborting due to previous error

For more information about this error, try rustc --explain E0463.
error: could not compile libc.

To learn more, run the command again with --verbose

thread 'main' panicked at 'not a string' running bamtofastq

I'm trying to run bamtofastq on a BAM file generated by cellranger count (cellranger version 1.3) but I got the following error:

thread 'main' panicked at 'not a string', /home/sena/.cargo/registry/src/github.com-1ecc6299db9ec823/rust-htslib-0.12.1/src/bam/record.rs:367:18

I'm attaching the first 4 lines of the bam file I used, which are enough to reproduce the error.

The command I used was straightforward:

RUST_BACKTRACE=1 ./bamtofastq debug.bam.txt test

This is the RUST_BACKTRACE=1 output. It seems like this is an error using string() in the htslib. I'm not sure how to fix this so any help would be greatly appreciated. Thank you!

stack backtrace:
0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
at /checkout/src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
1: std::sys_common::backtrace::print
at /checkout/src/libstd/sys_common/backtrace.rs:68
at /checkout/src/libstd/sys_common/backtrace.rs:57
2: std::panicking::default_hook::{{closure}}
at /checkout/src/libstd/panicking.rs:381
3: std::panicking::default_hook
at /checkout/src/libstd/panicking.rs:397
4: std::panicking::rust_panic_with_hook
at /checkout/src/libstd/panicking.rs:577
5: std::panicking::begin_panic
at /checkout/src/libstd/panicking.rs:538
6: rust_htslib::bam::record::Aux::string
at /home/sena/.cargo/registry/src/github.com-1ecc6299db9ec823/rust-htslib-0.12.1/src/bam/record.rs:367
7: bamtofastq::FormatBamRecords::bam_rec_to_fq
at src/main.rs:515
8: bamtofastq::FormatBamRecords::format_read
at src/main.rs:595
9: bamtofastq::proc_single_ended
at src/main.rs:1065
10: bamtofastq::inner
at src/main.rs:950
11: bamtofastq::go
at src/main.rs:865
12: bamtofastq::run
at src/main.rs:846
13: bamtofastq::main
at src/main.rs:815
14: std::rt::lang_start::{{closure}}
at /checkout/src/libstd/rt.rs:74
15: std::panicking::try::do_call
at /checkout/src/libstd/rt.rs:59
at /checkout/src/libstd/panicking.rs:480
16: __rust_maybe_catch_panic
at /checkout/src/libpanic_unwind/lib.rs:101
17: std::rt::lang_start_internal
at /checkout/src/libstd/panicking.rs:459
at /checkout/src/libstd/panic.rs:365
at /checkout/src/libstd/rt.rs:58
18: std::rt::lang_start
at /checkout/src/libstd/rt.rs:74
19: main
20: __libc_start_main
21: _start

debug.bam.txt
debug.sam.txt

Type = TenX

Hi,
What software do you use for downloading SRA in bam format and which software uses the parameter "Type = TenX"?

Missing closing double quotes gives TOML parse error

bamtofastq/Cargo.toml

Line 3 in d321c4c

version = "1.3.0

Unable to convert 10x bam to fastq using bamtofastq and getting folder named as "MissingLibrary"

Hi, I have a quick question. I have a few aligned BAM files from single-cell RNA-seq data and want to regenerate FASTQs from them. I am using 10x's bamtofastq utility, and I do get FASTQ files, but they are written under the specified path in a folder named “MissingLibrary_1_flowcellName”. I am not sure what this means. Why is the generated folder named MissingLibrary?
Does anybody have any idea about this?

The command I used is:
bamtofastq possorted_genome_bam.bam ./sample
Any help will be highly appreciated.

Thanks
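
One way to read the folder name, as a sketch under assumptions: a read-group ID quoted in another issue on this page has the shape `A3_inflamed:MissingLibrary:1:H3TWHDMXX:1`. Assuming the five fields are sample, library, GEM group, flowcell and lane (an assumption, not confirmed here), a folder named `MissingLibrary_1_<flowcell>` would be the library, GEM group and flowcell joined together, with `MissingLibrary` standing in for a library name that was never recorded. The names `ReadGroup`, `parse_read_group` and `output_dir_name` are hypothetical.

```rust
/// Fields of a read-group ID, assuming the layout
/// `sample:library:gem_group:flowcell:lane` (an assumption).
#[derive(Debug, PartialEq)]
struct ReadGroup {
    sample: String,
    library: String,
    gem_group: String,
    flowcell: String,
    lane: String,
}

fn parse_read_group(id: &str) -> Option<ReadGroup> {
    let parts: Vec<&str> = id.split(':').collect();
    if parts.len() != 5 {
        return None;
    }
    Some(ReadGroup {
        sample: parts[0].to_string(),
        library: parts[1].to_string(),
        gem_group: parts[2].to_string(),
        flowcell: parts[3].to_string(),
        lane: parts[4].to_string(),
    })
}

/// Folder name as it would look if built from library, GEM group and flowcell.
fn output_dir_name(rg: &ReadGroup) -> String {
    format!("{}_{}_{}", rg.library, rg.gem_group, rg.flowcell)
}
```

Under this reading the folder name reflects metadata in the BAM header and does not indicate that the conversion failed.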

Is there a way to skip the error and continue converting?

Hi,

I am running it on a BAM file, but the error message says the file may be truncated. Is there a way to skip these errors?
Slight data loss is acceptable.

`[W::bam_hdr_read] EOF marker is absent. The input is probably truncated
[E::bgzf_read_block] Invalid BGZF header at offset 295634318
[E::bgzf_read] Read block operation failed with error 6 after 0 of 4 bytes
bamtofastq error: IO Error reading BAM file. Your BAM file may be corrupt

If this error is unexpected, contact [email protected] for assistance. Please re-run with --traceback and include stack trace with an error report
see below for more details:

IO Error reading BAM file. Your BAM file may be corrupt
0: anyhow::backtrace::capture::Backtrace::capture
1: bamtofastq::inner
2: bamtofastq::main
3: std::sys_common::backtrace::__rust_begin_short_backtrace
4: _main
`
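
The first warning in the log refers to the BGZF end-of-file marker, a fixed 28-byte empty block that ends a complete BAM file according to the SAM/BAM specification. The sketch below checks a byte buffer for that marker. The function name `ends_with_bgzf_eof` is hypothetical, and the check only detects truncation. It does not repair the file, and it says nothing about the invalid block header the log reports at offset 295634318.

```rust
/// The 28-byte empty BGZF block that terminates a complete BAM file.
const BGZF_EOF: [u8; 28] = [
    0x1f, 0x8b, 0x08, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0x06, 0x00, 0x42, 0x43,
    0x02, 0x00, 0x1b, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
];

/// Hypothetical helper: true if `data` ends with the BGZF EOF marker.
fn ends_with_bgzf_eof(data: &[u8]) -> bool {
    data.ends_with(&BGZF_EOF)
}
```

In practice one would read only the last 28 bytes of the file rather than loading the whole BAM into memory.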

Strange read-splitting behaviour

Dear bamtofastq developer team,

I recently came across a very interesting behaviour. I am trying to reprocess a public dataset that consists of 22 10x GEX runs (I've checked and I'm pretty positive that none of those are ATAC etc). Here is the link to the dataset:

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE138669

SRA has failed to recognise the "technical" R1 read, so they have made the submitter's 10x BAM files available. However, upon running the latest (v1.4.1) bamtofastq (each job has completed successfully etc), I have discovered that samples were split into two big groups. Group 1 (GSM4115877-GSM4115889) has generated normal index, technical R1 (26 bp), and biological R2 (98 bp). However, Group 2 (GSM4115868-GSM4115876) has generated 4 reads: index, R1 which is a biological read (98 bp), R2 containing cell barcode (16 bp), and R3 containing UMI (10 bp).

GSM4115868 I1 R1 R2 R3
GSM4115869 I1 R1 R2 R3
GSM4115870 I1 R1 R2 R3
GSM4115871 I1 R1 R2 R3
GSM4115872 I1 R1 R2 R3
GSM4115873 I1 R1 R2 R3
GSM4115874 I1 R1 R2 R3
GSM4115875 I1 R1 R2 R3
GSM4115876 I1 R1 R2 R3
GSM4115877 I1 R1 R2
GSM4115878 I1 R1 R2
GSM4115879 I1 R1 R2
GSM4115880 I1 R1 R2
GSM4115881 I1 R1 R2
GSM4115882 I1 R1 R2
GSM4115883 I1 R1 R2
GSM4115884 I1 R1 R2
GSM4115885 I1 R1 R2
GSM4115886 I1 R1 R2
GSM4115887 I1 R1 R2
GSM4115888 I1 R1 R2
GSM4115889 I1 R1 R2

All BAM tags/headers appear to be the same, even made by the same version of Cell Ranger (v3 I think).

SRR10254548.bam AS BC CB CR CY HI li NH nM QT RE RG UB UR UY
SRR10254549.bam AS BC CB CR CY HI li NH nM QT RE RG UB UR UY xf
..............
SRR10254569.bam AS BC CB CR CY HI li NH nM QT RE RG UB UR UY xf

Do you know what is causing it, and how can I fix it?

For your convenience, here are some (NCBI) links to an "offending" and a "normal-behaving" BAM file:

bad BAM: https://sra-pub-src-2.s3.amazonaws.com/SRR10254550/SC4possorted_genome_bam.bam.1
good BAM: https://sra-pub-src-2.s3.amazonaws.com/SRR10254567/SC185possorted_genome_bam.bam.1

Thank you in advance!

-- Alex

cargo build failed due to rust-htslib v0.21.0

When installing the package from the master branch, an error occurs:

...
Compiling human-panic v1.0.1
Compiling shardio v0.3.0 (https://github.com/10XGenomics/rust-shardio.git#205727d5)
Compiling rust-htslib v0.21.0
error: failed to run custom build command for rust-htslib v0.21.0
process didn't exit successfully: /~/tools/bamtofastq/target/release/build/rust-htslib-3559a93039723522/build-script-build (exit code: 101)
....

cellranger-arc 1.0.0

Could bamtofastq be used to process BAM files from the cellranger-arc pipeline, version 1.0.0?

Sort paired end .bam before run bamtofastq?

Hi,

Thanks for developing this wonderful tool! I am using it to convert the possorted_genome_bam.bam generated by Cell Ranger to FASTQ. I am not sure whether the data is paired-end; should I sort the BAM file before running bamtofastq? I used the unsorted BAM file to generate the FASTQs and then used those FASTQs for scvelo. scvelo reports about 300000 duplicated cells out of 400000 cells in total, and I am not sure whether this is caused by the barcodes in the FASTQ files. Could you kindly give me some guidance?

Thanks a lot
Boyu

fastq directory structure?

Hi there,

I ran bamtofastq on the bam file at:
https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.0/pbmc_1k_protein_v3

And it outputted two directories of fastq files (see image):

What are they?
Thanks!
-dave

How to set sample name of fastq file output

Hi,

I used cellranger atac to run bamtofastq. The fastq output folder has "MissingLibrary". How do I set the sample name to a meaningful name instead of MissingLibrary? Also, the filename of fastq files starts with something like bamtofastq_S1_L001_I1_001.fastq.gz, how do I change the bamtofastq prefix to the actual sample name?

The command line I used is:

/software/cellranger-atac-2.1.0/lib/bin/bamtofastq file/path/to/bam/ output/directory

Thanks

How to install bamtofastq?

Hi,
Thanks for developing this tool. I downloaded bamtofastq_linux to my Linux system, but I don't know how to make it work.
The commands I ran were: "export PATH=/home/pc/bamtofastq_linux:$PATH", "chmod 700 bamtofastq_linux", "sudo apt install cargo", "cargo build ---release".
In addition, when I run cargo build --release, the error is "error: could not find Cargo.toml in /home/pc or any parent directory".
I sincerely hope to get your help.

bamtofastqerror: Not a valid read pair: false, false

Hi all,
I downloaded the original bam file from SRA database and ran bamtofastq with "--traceback" twice, each time getting this error:

bamtofastqerror: Not a valid read pair: false, false
Please contact [email protected] for assistance. Please re-run with --traceback and include stack trace with an error report
see below for more details:
==========================
Not a valid read pair: false, false
   0: failure::backtrace::internal::InternalBacktrace::new
             at execroot/home/registry/src/github.com-1ecc6299db9ec823/failure-0.1.8/execroot/home/registry/src/github.com-1ecc6299db9ec823/failure-0.1.8/src/backtrace/internal.rs:46
      failure::backtrace::Backtrace::new
             at execroot/home/registry/src/github.com-1ecc6299db9ec823/failure-0.1.8/execroot/home/registry/src/github.com-1ecc6299db9ec823/failure-0.1.8/src/backtrace/mod.rs:121
   1: <failure::error::error_impl::ErrorImpl as core::convert::From<F>>::from
             at execroot/exec/external/bamtofastq/execroot/home/registry/src/github.com-1ecc6299db9ec823/failure-0.1.8/src/error/error_impl.rs:19
      <failure::error::Error as core::convert::From<F>>::from
             at execroot/exec/external/bamtofastq/execroot/home/registry/src/github.com-1ecc6299db9ec823/failure-0.1.8/src/error/mod.rs:36
      failure::error_message::err_msg
             at execroot/exec/external/bamtofastq/execroot/home/registry/src/github.com-1ecc6299db9ec823/failure-0.1.8/src/error_message.rs:12
      bamtofastq::FormatBamRecords::bam_rec_to_ser
             at execroot/exec/external/bamtofastq/src/main.rs:494
   2: bamtofastq::proc_double_ended
             at execroot/exec/external/bamtofastq/src/main.rs:1185
      bamtofastq::inner
             at execroot/exec/external/bamtofastq/src/main.rs:1110
   3: bamtofastq::go
             at execroot/exec/external/bamtofastq/src/main.rs:1020
      bamtofastq::main
             at execroot/exec/external/bamtofastq/src/main.rs:977
   4: core::ops::function::FnOnce::call_once
             at /mnt/build/toolchains/tools/rust/1.53.0/lib/rustlib/src/rust/library/core/src/ops/function.rs:227
      std::sys_common::backtrace::__rust_begin_short_backtrace
             at /mnt/build/toolchains/tools/rust/1.53.0/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:125
   5: main
   6: __libc_start_main
   7: <unknown>

What should I do next?

'Invalid BAM record: read: "1" is missing tag: "CR"'

I get the following error when trying to run bamtofastq
thread 'main' panicked at 'Invalid BAM record: read: "1" is missing tag: "CR"', src/main.rs:509:25

Here are the comment tags in the bam header:

@CO	10x_bam_to_fastq:I1(BC:QT)
@CO	10x_bam_to_fastq:R1(CR:CY,UR:UY,TR:TQ)
@CO	10x_bam_to_fastq:R2(SEQ:QUAL)

and the first read in the bamfile

52080637	256	chr10	13047	3	25S125M	*	0	0	GTGGTATCAACGCAGAGTACATGGGGGCTCCAACCCTCGGGATGCCTCATGCTCACCCTTTGGCACCCACCTGACAGCTCAGCATGTCTGCTCTCTGCCATCCTCAATGCCTGCTCTAGACAAGCCCAAGTCCGCCAGGAGTGGCAGAGG	FFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFF:FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFF:FF:FFFFFFFFFFFFF:FFFFFFFFFF:FFFFF:FFFF	RG:Z:A3_inflamed:MissingLibrary:1:H3TWHDMXX:1	NH:i:2	NM:i:2

Wondering if something is formatted incorrectly. Any help much appreciated, thanks!
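
The header comments above declare which BAM tags each FASTQ read is rebuilt from: R1 is assembled from the CR/CY, UR/UY and TR/TQ tag pairs, yet the record shown carries only RG, NH and NM. The sketch below parses one such comment and lists the declared tags that a record lacks. The function names `parse_read_spec` and `missing_tags` are hypothetical, and the parsing rules are inferred from the three comment lines quoted here, not taken from the bamtofastq source.

```rust
/// Parse a comment such as `10x_bam_to_fastq:R1(CR:CY,UR:UY,TR:TQ)` into
/// the read name and its (sequence tag, quality tag) pairs.
fn parse_read_spec(comment: &str) -> Option<(String, Vec<(String, String)>)> {
    let rest = comment.strip_prefix("10x_bam_to_fastq:")?;
    let (name, body) = rest.split_once('(')?;
    let body = body.strip_suffix(')')?;
    let mut parts = Vec::new();
    for pair in body.split(',') {
        let (seq, qual) = pair.split_once(':')?;
        parts.push((seq.to_string(), qual.to_string()));
    }
    Some((name.to_string(), parts))
}

/// List the declared tags that are absent from a record's tag set.
fn missing_tags(spec: &[(String, String)], present: &[&str]) -> Vec<String> {
    let mut missing = Vec::new();
    for (seq, qual) in spec {
        for tag in [seq, qual] {
            let t = tag.as_str();
            // SEQ and QUAL are core record fields, not optional tags.
            let is_field = t == "SEQ" || t == "QUAL";
            if !is_field && !present.contains(&t) {
                missing.push(tag.clone());
            }
        }
    }
    missing
}
```

Applied to the record above, every tag that R1 depends on is missing, which is consistent with the panic naming CR first. This suggests that the tags were stripped from the BAM at some point, although the page does not confirm how.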

bamtofastq generated 3 reads with triplicate of each reads

Dear Sir/Madam,

Hope you are well.

I downloaded the original BAM file and converted it with bamtofastq. SRR7092170 has a BAM file (link: https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&page_size=10&acc=SRR7092170&display=data-access), which I downloaded and converted. My command was: bamtofastq_linux /lustre/miifs01/project/m2_jgu-canshank3/Individual_Processing/Xiang_try_bam/YX_05.bam /lustre/miifs01/project/m2_jgu-canshank3/Individual_Processing/Xiang_try_bam/output
bamtofastq v1.4.1
Writing finished. Observed 127722890 read pairs. Wrote 127722890 read pair.

After the bamtofastq process, I obtained a full list of fastq files looking like this:

bamtofastq_S1_L001_I1_001.fastq.gz
bamtofastq_S1_L001_I1_002.fastq.gz
bamtofastq_S1_L001_I1_003.fastq.gz
bamtofastq_S1_L001_R1_001.fastq.gz
bamtofastq_S1_L001_R1_002.fastq.gz
bamtofastq_S1_L001_R1_003.fastq.gz
bamtofastq_S1_L001_R2_001.fastq.gz
bamtofastq_S1_L001_R2_002.fastq.gz
bamtofastq_S1_L001_R2_003.fastq.gz
bamtofastq_S1_L001_R3_001.fastq.gz
bamtofastq_S1_L001_R3_002.fastq.gz
bamtofastq_S1_L001_R3_003.fastq.gz

Please see the first few lines of these files:

bamtofastq_S1_L001_I1_001.fastq.gz:

@D00536:344:HFYLCBCXY:1:1114:4988:61338 4:N:0:0
AACGACAC
+
CCDDBIII
@D00536:344:HFYLCBCXY:1:1210:1863:95823 4:N:0:0
CGTCCTCT
+
DDDDAGHH
@D00536:344:HFYLCBCXY:1:2113:4071:88090 4:N:0:0
TTGATGGG

bamtofastq_S1_L001_R1_001.fastq.gz:

@D00536:344:HFYLCBCXY:1:1114:4988:61338 1:N:0:0
AATCTCGTTTAAACTACATGCAGGAACAGCAAAGGAAATCCGGCAAATTTGCGCAGTCATTCTCAACACCGGCCATGCAGCAAAATCATCAGTGGAAA
+
DDDDDHIEGHIIIIIHHHIHHHIIIIIIIIIIGHFHHIIGIIIIIGGIIIIIDHHHIIHHHHHHHIHIHFEDHHIFH?1FECG@GHGGGHHHHHIHHH
@D00536:344:HFYLCBCXY:1:1210:1863:95823 1:N:0:0
AAAGAAAAATGGTGAATGATACCCGGTGCTGGCAATCTCGTTTAAACTACATGCAGGAACAGCAAAGGAAATCCGGCAAATTTGCGCAGTCATTCTCA
+
DDCDCHIIHEGCCC1GCEEHHHFHHHIHHII1CCEHGGIIH1EGCHHHHHHHIGHIIHHHEGHIIFIGGHIIIIHIHIIIEHHHHHDHCCFGHHH?C1
@D00536:344:HFYLCBCXY:1:2113:4071:88090 1:N:0:0
AGTTAACGAAAAGAAAAATGGTGAATGATACCCGGTGCTGGCAATCTCGTTTAAACTACATGCAGGAACAGCAAAGGAAATCCGGCAAATTTGCGCAG

bamtofastq_S1_L001_R2_001.fastq.gz
ATAACATGACCAAC
+
ADA@DIIFI?<FHH
@D00536:344:HFYLCBCXY:1:1210:1863:95823 2:N:0:0
CACGCTACAGATGA
+
DDDDAICCHIIHIE
@D00536:344:HFYLCBCXY:1:2113:4071:88090 2:N:0:0
CACGCTACAGATGA

bamtofastq_S1_L001_R3_001.fastq.gz

@D00536:344:HFYLCBCXY:1:1114:4988:61338 3:N:0:0
GTAGGCAACA
+
DDDDCIIIIH
@D00536:344:HFYLCBCXY:1:1210:1863:95823 3:N:0:0
CGCAAAATAA
+
DDDDDIIIII
@D00536:344:HFYLCBCXY:1:2113:4071:88090 3:N:0:0
CGCAAAATAA

I am confused about why there is an R3. Did read 1 get split into two reads? (The lengths of R2 and R3 add up to 24 base pairs.) And why are there R1 to R3, with every read and index file split into three parts: 001, 002 and 003?

I am hoping that I have described my problem sufficiently for a response to solve my issue.

Thank you in advance, and thank you for developing this tool, and looking forward to your response.

Best Wishes,

David

bam.fetch error with expecting u64 but getting u32

Hello 10X team!

When compiling the latest version of bamtofastq, I get the following error:

error[E0308]: mismatched types
   --> src/main.rs:846:28
    |
846 |             bam.fetch(tid, loc.start, loc.end)?;
    |                            ^^^^^^^^^
    |                            |
    |                            expected `u64`, found `u32`
    |                            help: you can convert an `u32` to `u64`: `loc.start.into()`

error[E0308]: mismatched types
   --> src/main.rs:846:39
    |
846 |             bam.fetch(tid, loc.start, loc.end)?;
    |                                       ^^^^^^^
    |                                       |
    |                                       expected `u64`, found `u32`
    |                                       help: you can convert an `u32` to `u64`: `loc.end.into()`

error: aborting due to 2 previous errors

For more information about this error, try `rustc --explain E0308`.
error: could not compile `bamtofastq`.

Replacing the line with the suggested fix resolves the issue, and the build then succeeds.
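
The compiler's suggestion amounts to a lossless widening of the coordinates from `u32` to `u64`. The sketch below shows that conversion in isolation. `Locus` and `fetch` are stand-ins written for this example, with field and parameter types assumed from the diagnostics above. They are not the rust-htslib API.

```rust
/// Stand-in for the region type named in the error message.
struct Locus {
    start: u32,
    end: u32,
}

/// Stand-in for a fetch call that expects 64-bit coordinates.
fn fetch(_tid: u32, start: u64, end: u64) -> (u64, u64) {
    (start, end)
}

fn fetch_locus(tid: u32, loc: &Locus) -> (u64, u64) {
    // u32 -> u64 cannot lose information, so `u64::from` or `.into()`
    // is preferable to an `as` cast.
    fetch(tid, u64::from(loc.start), loc.end.into())
}
```

Both spellings compile to the same thing. `u64::from` is slightly more explicit about the target type.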

full compilation output:

   Compiling cc v1.0.50
   Compiling libc v0.2.69
   Compiling proc-macro2 v1.0.10
   Compiling unicode-xid v0.2.0
   Compiling syn v1.0.17
   Compiling cfg-if v0.1.10
   Compiling serde v1.0.106
   Compiling autocfg v1.0.0
   Compiling version_check v0.9.1
   Compiling memchr v2.3.3
   Compiling lazy_static v1.4.0
   Compiling pkg-config v0.3.17
   Compiling bitflags v1.2.1
   Compiling regex-syntax v0.6.17
   Compiling rustc-demangle v0.1.16
   Compiling ryu v1.0.3
   Compiling glob v0.3.0
   Compiling getrandom v0.1.14
   Compiling itoa v0.4.5
   Compiling semver-parser v0.7.0
   Compiling byteorder v1.3.4
   Compiling remove_dir_all v0.5.2
   Compiling pulldown-cmark v0.2.0
   Compiling bindgen v0.53.2
   Compiling quick-error v1.2.3
   Compiling same-file v1.0.6
   Compiling bytecount v0.4.0
   Compiling shlex v0.1.1
   Compiling ppv-lite86 v0.2.6
   Compiling rustc-hash v1.1.0
   Compiling peeking_take_while v0.1.2
   Compiling lazycell v1.2.1
   Compiling glob v0.2.11
   Compiling semver v0.1.20
   Compiling failure_derive v0.1.7
   Compiling maybe-uninit v2.0.0
   Compiling smallvec v1.3.0
   Compiling matches v0.1.8
   Compiling doc-comment v0.3.3
   Compiling either v1.5.3
   Compiling crc32fast v1.2.0
   Compiling log v0.4.8
   Compiling percent-encoding v2.1.0
   Compiling void v1.0.2
   Compiling min-max-heap v1.3.0
   Compiling strsim v0.9.3
   Compiling linear-map v1.2.0
   Compiling termcolor v1.1.0
   Compiling custom_derive v0.1.7
   Compiling ieee754 v0.2.6
   Compiling thread_local v1.0.1
   Compiling fs-utils v1.1.4
   Compiling walkdir v2.3.1
   Compiling unicode-bidi v0.3.4
   Compiling itertools v0.8.2
   Compiling itertools v0.9.0
   Compiling unreachable v0.1.1
   Compiling rustc_version v0.1.7
   Compiling unicode-normalization v0.1.12
   Compiling aho-corasick v0.7.10
   Compiling csv-core v0.1.10
   Compiling quote v1.0.3
   Compiling crossbeam-utils v0.7.2
   Compiling num-traits v0.2.11
   Compiling regex-automata v0.1.9
   Compiling newtype_derive v0.1.6
   Compiling rand v0.4.6
   Compiling idna v0.2.0
   Compiling error-chain v0.12.2
   Compiling nom v5.1.1
   Compiling rand_core v0.5.1
   Compiling clang-sys v0.29.3
   Compiling rand_chacha v0.2.2
   Compiling crossbeam-channel v0.4.2
   Compiling url v2.1.1
   Compiling rand v0.7.3
   Compiling tempdir v0.3.7
   Compiling regex v1.3.7
   Compiling num-traits v0.1.43
   Compiling ordered-float v0.3.0
   Compiling uuid v0.8.1
   Compiling tempfile v3.1.0
   Compiling os_type v2.2.0
   Compiling backtrace-sys v0.1.35
   Compiling libz-sys v1.0.25
   Compiling libloading v0.5.2
   Compiling openssl-sys v0.9.55
   Compiling bzip2-sys v0.1.8+1.0.8
   Compiling curl-sys v0.4.30+curl-7.69.1
   Compiling lz4-sys v1.8.3
   Compiling lzma-sys v0.1.15
   Compiling cexpr v0.4.0
   Compiling flate2 v1.0.14
   Compiling backtrace v0.3.46
   Compiling synstructure v0.12.3
   Compiling serde_derive v1.0.106
   Compiling snafu-derive v0.6.6
   Compiling derive-new v0.5.8
   Compiling hts-sys v1.10.2
   Compiling failure v0.1.7
   Compiling snafu v0.6.0
   Compiling serde_json v1.0.51
   Compiling semver v0.9.0
   Compiling bio-types v0.6.0
   Compiling toml v0.5.6
   Compiling bincode v1.2.1
   Compiling bstr v0.2.12
   Compiling docopt v1.1.0
   Compiling serde_bytes v0.11.3
   Compiling csv v1.1.3
   Compiling human-panic v1.0.3
   Compiling cargo_metadata v0.6.4
   Compiling skeptic v0.13.4
   Compiling lz4 v1.23.1
   Compiling rust-htslib v0.30.0
   Compiling shardio v0.7.0 (https://github.com/10XGenomics/rust-shardio.git#88aacb25)
   Compiling bamtofastq v1.3.0 (/home/benjohnson/NGS_tools/utils/bamtofastq)

cargo 1.42.0 (86334295e 2020-01-31)
Ubuntu 18.04.4 LTS (GNU/Linux 4.15.0-96-generic x86_64)

bamtofastq depends on bxindex, which is part of Long Ranger

I ran bamtofastq --bx-list=mylist.txt my.bam and got this error message:

bamtofastq/src/bx_index.rs

Line 29 in f914d80

 format!("Couldn't find BX index: '{:?}'. You must sort your BAM file with 'samtools sort -t BX' and index with 'bxindex'", bxi_fn) 

I was scratching my head wondering "What is bxindex?" and searching Google without finding any results.

Then I found:
https://github.com/10XGenomics/longranger/blob/master/bin/bxindex

Now I understand that there is another project called Long Ranger that includes a Python library called tenkit that includes functions for creating a .bam.bxi file.

Would you consider including links to such dependencies for bamtofastq in the README file?

Also, I wonder if you might consider changing the error message about sorting and indexing? It could be more helpful if it included a link to the Long Ranger project, so folks can eventually figure out how to index their BAM files the way bamtofastq wants them.

error[E0107]: wrong number of type arguments: expected 1, found 2

I'm trying to install bamtofastq with different versions of Rust (v1.28, v1.32, v1.33), but I always get the following error:

error[E0107]: wrong number of type arguments: expected 1, found 2
--> src/main.rs:172:21
|
172 | impl SortKey<SerFq, Vec> for SerFqSort {
| ^^^^^^^ unexpected type argument

error: aborting due to previous error

For more information about this error, try rustc --explain E0107.
error: Could not compile bamtofastq.

Reproducing cellranger count from bamtofastq

Thank you for the tool.

Before including it in our analysis pipeline, I wanted to give it a quick check, and set up this:

— we have a small sample processed with cellranger 3.1
— run bamtofastq on this sample possorted.bam
— run cellranger on bamtofastq output
— compare original and restored matrices.

Unfortunately, they are not identical. In the filtered_feature_bc_matrix I see a small difference in barcodes.tsv.gz for example. I wonder, what is causing this and how to debug it?

Does bamtofastq work properly for bam files from STARsolo?

Hi!

Just a quick question: Does bamtofastq work properly for bam files from STARsolo? And if not, how can I get the fastqs needed for Cell Ranger when I only have STARsolo .bam files?

Thank you!!

`head_suffix` missing index information

Besides sort order, I would expect that FASTQs generated from the bam should be identical to the original FASTQs. However, the head_suffix is hardcoded as :N:0:0 rather than the format :N:0:{index}. E.g.:

@A00739:209:HNTY3DMXX:2:1101:1090:1000 1:N:0:AGTGGAAC
...
@A00739:209:HNTY3DMXX:2:1101:4291:1000 1:N:0:AGTGGAAC
...
@A00739:209:HNTY3DMXX:2:1101:5195:1000 1:N:0:AGTGGAAC

Is it possible to include this information?
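
To make the request concrete, here is a minimal sketch of a header formatter that appends the sample index when one is available and falls back to the `0` placeholder otherwise. The function name `fastq_header` and its parameters are hypothetical. Where the index would come from (for example a barcode tag in the BAM record) is an assumption and is not confirmed by the bamtofastq source.

```rust
/// Hypothetical helper: build a FASTQ header line of the form
/// `@<read name> <read number>:N:0:<index>`.
fn fastq_header(read_name: &str, read_number: u8, sample_index: Option<&str>) -> String {
    // "0" mirrors the hardcoded suffix described in this issue.
    let index = sample_index.unwrap_or("0");
    format!("@{} {}:N:0:{}", read_name, read_number, index)
}
```

Calling it with `Some("AGTGGAAC")` reproduces the first header shown above, and `None` gives the current `:N:0:0` form.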

Failed to build the release

Hi,
My OS is Ubuntu 16.04 and I failed to build the release. The error output is below. I'm not familiar with Rust, so please help me.

--- stderr
./htslib/htslib/hts.h:31:10: fatal error: 'stddef.h' file not found
./htslib/htslib/hts.h:31:10: fatal error: 'stddef.h' file not found, err: true
thread 'main' panicked at 'Unable to generate bindings.: ()', src/libcore/result.rs:997:5
note: Run with RUST_BACKTRACE=1 environment variable to display a backtrace.

Post-error behaviour: bamtofastqerror: Didn't find both records for a paired end read. Is your BAM file complete?

Hi, I am using bamtofastq to convert cellranger-arc ATAC-seq BAM files (barcode-subsetted with 10x subset-bam) back to their original FASTQ format (_I1_001, _R1_001, _R2_001, _R3_001), since I need to re-align R1_001 and R3_001 as Read 1 and Read 2 using bowtie2 (which I have done successfully before). However, during the conversion, bamtofastq reported this error:

bamtofastqerror: Didn't find both records for a paired end read. Is your BAM file complete?
However, when I check the output directory, there are several sets of FASTQ files. So my question is: what does bamtofastq do after reporting that error? Does it (1) terminate immediately or (2) ignore the unpaired read and proceed to the next one? In case (2) the FASTQs would still be usable, but case (1) would surely produce truncated FASTQ files. Which is it? Thanks!

Release 1.4.2/1.5.0

Hello,

I'd like to request a new release because v1.4.1 cannot be built on aarch64 architecture due to:

11:24:43 BIOCONDA INFO (OUT) error[E0308]: mismatched types
11:24:43 BIOCONDA INFO (OUT)    --> /opt/conda/conda-bld/10x_bamtofastq_1681816419021/_build_env/.cargo/registry/src/github.com-1ecc6299db9ec823/rust-htslib-0.38.2/src/bam/record.rs:152:16
11:24:43 BIOCONDA INFO (OUT)     |
11:24:43 BIOCONDA INFO (OUT) 152 |             s: sam_copy.as_ptr() as *mut i8,
11:24:43 BIOCONDA INFO (OUT)     |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected `u8`, found `i8`
11:24:43 BIOCONDA INFO (OUT)     |
11:24:43 BIOCONDA INFO (OUT)     = note: expected raw pointer `*mut u8`
11:24:43 BIOCONDA INFO (OUT)                found raw pointer `*mut i8`
11:24:43 BIOCONDA INFO (OUT) 
11:24:43 BIOCONDA INFO (OUT) error[E0308]: mismatched types
11:24:43 BIOCONDA INFO (OUT)     --> /opt/conda/conda-bld/10x_bamtofastq_1681816419021/_build_env/.cargo/registry/src/github.com-1ecc6299db9ec823/rust-htslib-0.38.2/src/bam/record.rs:583:17
11:24:43 BIOCONDA INFO (OUT)      |
11:24:43 BIOCONDA INFO (OUT) 581  |             htslib::bam_aux_get(
11:24:43 BIOCONDA INFO (OUT)      |             ------------------- arguments to this function are incorrect
11:24:43 BIOCONDA INFO (OUT) 582  |                 &self.inner as *const htslib::bam1_t,
11:24:43 BIOCONDA INFO (OUT) 583  |                 c_str.as_ptr() as *mut i8,
11:24:43 BIOCONDA INFO (OUT)      |                 ^^^^^^^^^^^^^^^^^^^^^^^^^ expected `u8`, found `i8`
11:24:43 BIOCONDA INFO (OUT)      |
11:24:43 BIOCONDA INFO (OUT)      = note: expected raw pointer `*const u8`
11:24:43 BIOCONDA INFO (OUT)                 found raw pointer `*mut i8`

The problem has been fixed in rust-htslib 0.39.3 with rust-bio/rust-htslib@a21aff2

bamtofastq already depends on 0.42 -

bamtofastq/Cargo.toml

Line 21 in 8720dcd

rust-htslib = { version = "0.42", default-features = false }
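
Some background on the errors above, as a sketch: C's `char` is signed on x86_64 Linux and unsigned on aarch64 Linux, so Rust's `std::os::raw::c_char` is `i8` on the former and `u8` on the latter. Code that casts to a hard-coded `*mut i8` therefore compiles on one architecture and fails on the other. The usual portable pattern is to name `c_char` instead. The helper `tag_ptr` below is hypothetical, and whether the referenced rust-htslib commit does exactly this is not confirmed here.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical helper: expose a C string as a pointer whose pointee type
/// follows the platform's `char` signedness.
fn tag_ptr(tag: &CStr) -> *const c_char {
    // `CStr::as_ptr` already returns `*const c_char`, so no cast is needed.
    tag.as_ptr()
}
```

Because the type alias resolves per target, the same source builds on both x86_64 and aarch64.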
