markschl / seq_io
FASTA and FASTQ parsing in Rust
License: MIT License
Hi there,
When building seq_io (0.3.1) with rust 1.71, I get the following warning:
warning: the following packages contain code that will be rejected by a future version of Rust: buf_redux v0.8.4
For context, it appears that later versions of rust will have a breaking change with respect to semi-colons and macros.
This issue also appeared in needletail, and a solution was proposed in onecodex/needletail#64. Basically, there is an updated version of buf_redux: https://crates.io/crates/buffer-redux. This should be a drop-in replacement and fix this issue.
There isn't necessarily a rush for this, more of an FYI for now. But if it is possible to get an updated seq_io so this doesn't appear during builds, that would be very nice.
Thanks!
How should I solve it? If you can help me, I will be very grateful
Here are the details:
Compiling seq_io v0.4.0-alpha.0 (https://github.com/markschl/seq_io#3d461a36)
error: unexpected token
--> .cargo/git/checkouts/seq_io-5c12caa4159d1858/3d461a3/src/core/record.rs:21:19
|
21 | concat!("use seq_io::", $module_path, "::{Reader, RecordSet};"),
| ^
|
::: /home/xl/.cargo/git/checkouts/seq_io-5c12caa4159d1858/3d461a3/src/fastx/record.rs:454:1
|
454 | impl_recordset!(RefRecord, QualRecordPosition, LineStore, "fastx", "fastq");
| --------------------------------------------------------------------------- in this macro invocation
|
= note: this error originates in the macro impl_recordset (in Nightly builds, run with -Z macro-backtrace for more info)
error: unexpected token
--> /home/xl/.cargo/git/checkouts/seq_io-5c12caa4159d1858/3d461a3/src/core/record.rs:21:19
|
21 | concat!("use seq_io::", $module_path, "::{Reader, RecordSet};"),
| ^
|
::: /home/xl/.cargo/git/checkouts/seq_io-5c12caa4159d1858/3d461a3/src/fasta/record.rs:269:1
|
269 | impl_recordset!(RefRecord, SeqRecordPosition, LineStore, "fasta", "fasta");
| -------------------------------------------------------------------------- in this macro invocation
|
= note: this error originates in the macro impl_recordset (in Nightly builds, run with -Z macro-backtrace for more info)
error: unexpected token
--> /home/xl/.cargo/git/checkouts/seq_io-5c12caa4159d1858/3d461a3/src/core/record.rs:21:19
|
21 | concat!("use seq_io::", $module_path, "::{Reader, RecordSet};"),
| ^
|
::: /home/xl/.cargo/git/checkouts/seq_io-5c12caa4159d1858/3d461a3/src/fastq/record.rs:361:1
|
361 | impl_recordset!(RefRecord, QualRecordPosition, RangeStore, "fastq", "fastq");
| ---------------------------------------------------------------------------- in this macro invocation
|
= note: this error originates in the macro impl_recordset (in Nightly builds, run with -Z macro-backtrace for more info)
Hello! Is there a timeline for cutting the 0.4.0 release? Specifically, I'm finding the read_record_set_exact functionality to be extremely useful.
While working on making Rust detect more misuses of uninitialized memory (rust-lang/rust#71274), this crate was flagged as a possible regression via its dependency on crossbeam. That dependency is outdated; it would be great if you could update to the latest crossbeam 0.7 which fixed many critical soundness issues. Thanks. :)
Hi Mark and thanks for the great package,
I chose seq_io for the speed and was trying to read a gzipped FASTQ file when I found the pull request and the notes. Originally, I had:
let fq = File::open(fastq_path).expect("Could not open Fastq");
let fq = flate2::read::GzDecoder::new(fq).into_inner();
seq_io::fastq::Reader::new(fq)
to read a fastq.gz. This compiled and I thought great! I tested the application, and it keeps giving me:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidStart { found: 203, pos: ErrorPosition { line: 1, id: None } }', src/main.rs:60:2
I unzipped the same test file and the same code runs fine. The only thing I added for the gzipped case is a simple extension check: if matches!(ext.unwrap(), "gz") {}
I'm not sure what's going on; I even made sure I added lto=true. It seems to be reading the file, but incorrectly, so I tried running buf_redux like in the pull request. My head hurts, so I'll come back to it later, but this is a really exciting prospect! This is the fastest reader, so reading from gz would be phenomenal. Any ideas on what the problem could be? Cheers
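One thing stands out in the snippet as posted: into_inner() on a flate2 GzDecoder hands back the wrapped File and discards the decoding layer, so seq_io would be reading the compressed bytes. Dropping the .into_inner() call and passing the decoder itself to seq_io::fastq::Reader::new is probably the first thing to try. The sketch below shows the effect using only the standard library. XorDecoder is a hypothetical toy type standing in for GzDecoder, and first_byte and encoded_fastq are illustrative helpers, not part of seq_io or flate2.

```rust
use std::io::{self, Cursor, Read};

/// Toy "decoder" standing in for flate2's GzDecoder: it XORs every byte
/// with a fixed key as it is read. Hypothetical type, for illustration only.
struct XorDecoder<R> {
    inner: R,
    key: u8,
}

impl<R: Read> XorDecoder<R> {
    fn new(inner: R, key: u8) -> Self {
        XorDecoder { inner, key }
    }

    /// Gives back the wrapped reader and throws away the decoding layer.
    fn into_inner(self) -> R {
        self.inner
    }
}

impl<R: Read> Read for XorDecoder<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        // Decode only the bytes that were actually read.
        for b in &mut buf[..n] {
            *b ^= self.key;
        }
        Ok(n)
    }
}

/// Reads everything from `reader` and returns the first byte, if any.
fn first_byte<R: Read>(mut reader: R) -> io::Result<Option<u8>> {
    let mut data = Vec::new();
    reader.read_to_end(&mut data)?;
    Ok(data.first().copied())
}

/// "Encodes" one FASTQ record so that it must pass through the decoder.
fn encoded_fastq(key: u8) -> Vec<u8> {
    b"@read1\nACGT\n+\nIIII\n".iter().map(|b| b ^ key).collect()
}

fn main() -> io::Result<()> {
    let key = 0x55;
    // Intended use: hand the decoder itself to the consumer.
    let decoded = first_byte(XorDecoder::new(Cursor::new(encoded_fastq(key)), key))?;
    // Pitfall: into_inner() returns the raw, still-encoded stream.
    let raw = first_byte(XorDecoder::new(Cursor::new(encoded_fastq(key)), key).into_inner())?;
    println!("decoded starts with {:?}, raw starts with {:?}", decoded, raw);
    Ok(())
}
```

Only the first call sees the '@' that a FASTQ parser expects at the start of a record, which would be consistent with an InvalidStart error on line 1.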
Hi,
Many thanks for developing and sharing seq_io!
I recently ran into a curious case when trying to parse gzipped FASTQ files sent by one of my collaborators. When using seq_io directly on these files, I get the following error message for all of them:
Err value: UnexpectedEnd { pos: ErrorPosition { line: 713, id: None } }
However, if I gunzip these fastq files and re-gzip them, they get parsed as expected.
Would you know what could cause this error? Could it be an incompatibility Windows/Unix (I'm using a Unix system but I don't know how these files have been generated)?
My apologies if this is a trivial question/issue.
Dear seq_io team,
I have many 10G fasta/fastq files (about 10,000) and I want to read them in parallel (the order of the records within a file does not matter) so that I can speed up reading all of those files. What approach would you recommend?
Thanks,
Jianshu
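With that many files, one option is to parallelise across files rather than within a file: a fixed number of worker threads each take the next path from a shared queue and parse it independently. The sketch below uses only the standard library and assumes uncompressed FASTA input. count_fasta_records is a stand-in for the real per-file work (for example a seq_io reader loop), and count_all and the thread count are illustrative choices, not a recommendation from the seq_io author.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::{Path, PathBuf};
use std::sync::Mutex;
use std::thread;

/// Counts FASTA records in one file by counting header lines.
/// Stand-in for whatever per-file parsing is actually needed.
fn count_fasta_records(path: &Path) -> io::Result<u64> {
    let reader = BufReader::new(File::open(path)?);
    let mut n = 0;
    for line in reader.split(b'\n') {
        if line?.first() == Some(&b'>') {
            n += 1;
        }
    }
    Ok(n)
}

/// Processes `paths` on `n_threads` workers. Results come back in
/// completion order, not input order.
fn count_all(paths: Vec<PathBuf>, n_threads: usize) -> io::Result<Vec<(PathBuf, u64)>> {
    let queue = Mutex::new(paths);
    let results = Mutex::new(Vec::new());
    let first_error = Mutex::new(None);

    thread::scope(|scope| {
        for _ in 0..n_threads.max(1) {
            scope.spawn(|| loop {
                // Take the next file; the lock is released right after pop().
                let next = queue.lock().unwrap().pop();
                let Some(path) = next else { break };
                match count_fasta_records(&path) {
                    Ok(n) => results.lock().unwrap().push((path, n)),
                    Err(e) => {
                        // Remember only the first failure.
                        first_error.lock().unwrap().get_or_insert(e);
                    }
                }
            });
        }
    });

    match first_error.into_inner().unwrap() {
        Some(e) => Err(e),
        None => Ok(results.into_inner().unwrap()),
    }
}

fn main() -> io::Result<()> {
    // Paths are taken from the command line in this sketch.
    let paths: Vec<PathBuf> = std::env::args().skip(1).map(PathBuf::from).collect();
    for (path, n) in count_all(paths, 8)? {
        println!("{}\t{}", path.display(), n);
    }
    Ok(())
}
```

Whether this helps depends on the storage: if the disk or network share is the bottleneck, adding threads beyond a handful may not reduce the total time.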
Hello,
I'm using seq_io to do some sliding window analysis of a 4Mb genome, but the size of the sequence I get remains limited. How should I configure my Rust program so that a sequence of about 4Mb can be read?
Maybe I'm confused about the API but why do seq_io::fast*::write_* take ownership of the writer? Doesn't this mean you can only ever write one record? This seems problematic.
The simplest reproduction example is:
let mut reader = Reader::from_path("in.fastq")?;
let mut writer = BufWriter::new(File::open(Path::new("out.fastq"))?);
for r in reader.records() {
let r = r?;
seq_io::fastq::write_to(writer, &r.head, &r.seq, &r.qual);
}
which doesn't compile since writer was moved.
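A function that takes a generic W: io::Write by value can still be called repeatedly, because the standard library implements Write for &mut W. Passing &mut writer moves only the borrow, and the writer stays usable in the next iteration. The sketch below shows this with a hypothetical write_fastq that has the same shape as the call in the report. It is not seq_io's implementation, and write_all_records is an illustrative helper.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in with the same shape as the call in the report:
/// the writer is taken by value and bounded only by `io::Write`.
fn write_fastq<W: Write>(mut writer: W, head: &[u8], seq: &[u8], qual: &[u8]) -> io::Result<()> {
    writer.write_all(b"@")?;
    writer.write_all(head)?;
    writer.write_all(b"\n")?;
    writer.write_all(seq)?;
    writer.write_all(b"\n+\n")?;
    writer.write_all(qual)?;
    writer.write_all(b"\n")
}

/// Writes several (head, seq, qual) records into one in-memory buffer.
fn write_all_records(records: &[(&str, &str, &str)]) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    for &(head, seq, qual) in records {
        // `&mut out` implements `Write` too, so only the borrow is moved
        // into the call and `out` can be reused on the next iteration.
        write_fastq(&mut out, head.as_bytes(), seq.as_bytes(), qual.as_bytes())?;
    }
    Ok(out)
}

fn main() -> io::Result<()> {
    let out = write_all_records(&[("r1", "ACGT", "IIII"), ("r2", "GG", "##")])?;
    io::stdout().write_all(&out)
}
```

Applied to the reproduction above, that would mean passing &mut writer in the loop. Separately, File::open opens the file read-only, so File::create is the call to use for an output file.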
Hi developer,
When I used seq_io::fasta::Reader to load a reference genome (such as GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fna), the size of each chromosome sequence was not correct (larger than the true sequence length). This is because the sequence of each chromosome in the reference FASTA file is split across multiple lines, and I think seq_io::fasta::Reader includes all the LFs when calculating the sequence length.
Best,
Neng
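If the slice returned for a multi-line record is the raw sequence block with its line breaks, the true length can be recovered by skipping the line terminators. As far as I recall, seq_io also offers per-line access to the sequence (seq_lines and full_seq on the FASTA record), but that is worth confirming in the crate documentation. The sketch below is independent of seq_io and works on any raw byte slice. Both function names are illustrative.

```rust
/// Number of residues in a raw multi-line sequence block,
/// ignoring '\n' and '\r' line terminators.
fn seq_len_without_newlines(raw_seq: &[u8]) -> usize {
    raw_seq.iter().filter(|&&b| b != b'\n' && b != b'\r').count()
}

/// Copies the residues into one contiguous sequence without line breaks.
fn contiguous_seq(raw_seq: &[u8]) -> Vec<u8> {
    raw_seq
        .iter()
        .copied()
        .filter(|&b| b != b'\n' && b != b'\r')
        .collect()
}

fn main() {
    // Three lines, one of them with a Windows-style line ending.
    let raw = b"ACGT\nACGT\r\nAC";
    println!("raw length:      {}", raw.len());
    println!("sequence length: {}", seq_len_without_newlines(raw));
    println!("sequence:        {}", String::from_utf8_lossy(&contiguous_seq(raw)));
}
```

Counting is enough when only the length is needed. The copying variant costs an allocation per record, which matters for chromosome-sized sequences.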
Hi - I'm wondering if you'd be open to a PR that added a few functions and impls that I'd find really useful when writing tests using seq_io.
In short I'd like to add a few functions to the Record traits (and/or impls) to return String or str versions of the fields, and then implement either Debug or Display to show the String versions.
It's not a ton of work, but I want to make sure you'd be open to a PR before creating one. I can see how it might confuse anyone using the library for the first time, but it would make writing tests much more pleasant. Right now my tests are littered with calls to String::from_utf8(...) and similar, and I currently have custom assert_eq() functions for the types so that when they don't match, the String forms are printed instead of Vecs of u8s.
Happy to do the work if you'd review and ultimately accept a PR.
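Until something like this exists in the crate, a small wrapper in the test code can give the same effect. The sketch below assumes the record fields are available as byte slices. TextRecord is a hypothetical helper type, not part of seq_io, and it implements Debug by hand so that a failing assert_eq! prints text instead of byte vectors.

```rust
use std::fmt;

/// Hypothetical test helper: borrows the raw byte fields of a record
/// and prints them as (lossily decoded) text.
#[derive(PartialEq)]
struct TextRecord<'a> {
    head: &'a [u8],
    seq: &'a [u8],
}

impl fmt::Debug for TextRecord<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("TextRecord")
            // from_utf8_lossy never fails; invalid bytes become U+FFFD.
            .field("head", &String::from_utf8_lossy(self.head))
            .field("seq", &String::from_utf8_lossy(self.seq))
            .finish()
    }
}

fn main() {
    let expected = TextRecord { head: b"read1", seq: b"ACGT" };
    let actual = TextRecord { head: b"read1", seq: b"ACGT" };
    // On a mismatch, assert_eq! would print both sides via the Debug impl above.
    assert_eq!(actual, expected);
    println!("{:?}", actual);
}
```

Comparison still happens on the raw bytes through the derived PartialEq; only the printed form changes.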