majored / rs-async-zip
An asynchronous ZIP archive reading/writing crate.
License: MIT License
When I use async_zip::tokio::read::seek::ZipFileReader::reader_with_entry() to get a reader, I don't seem to be able to use tokio's AsyncRead or AsyncReadExt on it, despite using the tokio-specific module. This is fine when I only want to read (since I could use the function aliases read_to_<end/string>_checked()), but when I'm using something like copy(), having AsyncRead implemented is required.
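As a possible workaround (an illustration, assuming the returned entry reader implements futures::io::AsyncRead, which the crate's non-tokio readers are built on), tokio_util's compat layer can adapt such a reader so tokio's AsyncReadExt methods work on it. futures::io::Cursor stands in for the entry reader here:

use tokio::io::AsyncReadExt;
use tokio_util::compat::FuturesAsyncReadCompatExt;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    // Any futures::io::AsyncRead works the same way as this stand-in.
    let futures_reader = futures::io::Cursor::new(b"hello".to_vec());
    let mut tokio_reader = futures_reader.compat(); // now implements tokio::io::AsyncRead
    let mut buf = Vec::new();
    tokio_reader.read_to_end(&mut buf).await?;
    assert_eq!(buf, b"hello");
    Ok(())
}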
outer.zip
I'm getting "failed to open reader: UnexpectedHeaderError(1969404127, 33639248)" when trying to open that file. (It was a larger zip but I created this test case)
I looked in to why it was happening and it looks like it's looking through the file starting at end-0xffff to end for the end of cd magic number. The problem is that since the inner zip isn't compressed by the outer zip it will actually find the inner zip's end of cd instead of the outer one.
It also runs in to problems if an arbitrary file has that byte sequence but that's probably a bit more rare.
bad_bytes.zip
I think the fix is to just make the AsyncDelimiterReader search starting from the end instead of the front of that range.
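For illustration, a self-contained sketch of that idea (not the crate's internals): scan the candidate range backwards so that the outermost EOCD signature wins.

// End of central directory signature, as it appears on disk: "PK\x05\x06".
const EOCD_MAGIC: [u8; 4] = [0x50, 0x4b, 0x05, 0x06];

// Scan backwards so a nested (stored) zip's EOCD earlier in the buffer
// doesn't shadow the outer archive's EOCD near the end.
fn find_eocd_from_end(tail: &[u8]) -> Option<usize> {
    (0..=tail.len().checked_sub(EOCD_MAGIC.len())?)
        .rev()
        .find(|&i| tail[i..i + 4] == EOCD_MAGIC)
}

fn main() {
    let mut buf = vec![0u8; 64];
    buf[8..12].copy_from_slice(&EOCD_MAGIC); // inner zip's EOCD
    buf[40..44].copy_from_slice(&EOCD_MAGIC); // outer zip's EOCD
    assert_eq!(find_eocd_from_end(&buf), Some(40));
}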
I'm using this crate to loop over entries in ZIP archives and extract them. It works as expected on version 0.0.7. However, after upgrading to version 0.0.8, the same function would sometimes fail with the error Encountered an unexpected header (actual: 0x0, expected: 0x6054b50). I noticed that those "bad" archives have one thing in common: they contain no dir entries but still have a path structure, i.e., "folder/file.ext" exists, but "folder" does not exist. I believe this is what caused the regression.
I'm trying to open this ZIP archive, but async_zip::read::fs::ZipFileReader::new() returns UnexpectedHeaderError:
Error: Encountered an unexpected header (actual: 0x0, expected: 0x6054b50).
All other tools I tried work well with this archive. I'm using async_zip version 0.0.8 from crates.io, rust 1.62.
It would be nice if one could pass Paths instead of Strings to the fs-based reader.
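For illustration, the usual fix is to accept impl AsRef<Path>, which lets callers pass &str, String, &Path, or PathBuf alike. A minimal, hypothetical sketch (open_archive is not the crate's API):

use std::path::Path;

// Accepting `impl AsRef<Path>` keeps old `String`/`&str` call sites working
// while also allowing `Path` and `PathBuf` arguments.
fn open_archive(path: impl AsRef<Path>) {
    let path: &Path = path.as_ref();
    println!("would open {}", path.display());
}

fn main() {
    open_archive("archive.zip"); // &str
    open_archive(String::from("archive.zip")); // String
    open_archive(Path::new("archive.zip")); // &Path
}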
Unable to read an archive using the following code:
let mut file = tokio::fs::File::open(zip_file).await.unwrap();
let mut zip = ZipFileReader::new(&mut file).await.unwrap();
let entry = zip.file().entries().get(0).unwrap().clone();
let mut string = String::new();
let mut reader = zip.entry(0).await.unwrap();
let txt = reader.read_to_string_checked(&mut string, entry.entry()).await.unwrap();
println!("{}", txt);
The error is:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: UpstreamReadError(Custom { kind: InvalidData, error: "stream did not contain valid UTF-8" })'
This happens on a simple zip file with two files (a.txt and b.txt) having content
01234567890123456789012345678901234567890123456789
Extracting the zip file without crc check shows the content of a.txt as
45678901234567890123456789
PK��
�����XC0V�4�^3���3�
Looks like a regression in v0.0.10. Works correctly in v0.0.9
As per: https://users.rust-lang.org/t/stream-data-in-compress-and-stream-out/72521/6
Currently, the stream writer implementation calls shutdown() on the compression writer when attempting to close the entry. This is done because the upstream compression crate does not implement any way to finish the encoding without calling shutdown(). However, its implementation of shutdown also polls the inner writer's shutdown, thus chaining the shutdown call up to the generic writer.
I don't see any way to avoid this shutdown call, but feel free to comment below if you do. So for the time being, an inner type wrapping the generic writer should be implemented which ignores calls to poll_shutdown().
This would likely want to be implemented within: https://github.com/Majored/rs-async-zip/blob/main/src/write/compressed_writer.rs
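A minimal sketch of such a wrapper (illustrative name, not the crate's code): it forwards writes and flushes, but maps poll_shutdown() to a flush so the inner writer stays usable afterwards.

use std::io;
use std::pin::Pin;
use std::task::{Context, Poll};
use tokio::io::AsyncWrite;

// Forwards writes and flushes to the inner writer, but never propagates
// shutdown, so the generic writer remains usable after an entry is closed.
struct IgnoreShutdown<W>(W);

impl<W: AsyncWrite + Unpin> AsyncWrite for IgnoreShutdown<W> {
    fn poll_write(mut self: Pin<&mut Self>, cx: &mut Context<'_>, buf: &[u8]) -> Poll<io::Result<usize>> {
        Pin::new(&mut self.0).poll_write(cx, buf)
    }
    fn poll_flush(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<io::Result<()>> {
        Pin::new(&mut self.0).poll_flush(cx)
    }
    fn poll_shutdown(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<io::Result<()>> {
        // Flush buffered data, but swallow the shutdown itself.
        Pin::new(&mut self.0).poll_flush(cx)
    }
}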
Both of them take a mutable reference to self and an immutable reference to ZipEntry:
pub async fn read_to_end_checked(&mut self, buf: &mut Vec<u8>, entry: &ZipEntry) -> Result<usize>;
pub async fn read_to_string_checked(&mut self, buf: &mut String, entry: &ZipEntry) -> Result<usize>;
The mutable reference to self can only be obtained if we also hold a mutable reference to ZipFileReader. But to hold an immutable reference to ZipEntry, we need to hold an immutable reference to ZipFileReader. Thus, this API is unusable.
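For illustration, the underlying problem is the standard shared-vs-exclusive borrow conflict. This self-contained sketch (not the crate's types) shows the conflict and the clone-first workaround used elsewhere in this thread:

struct Archive {
    entries: Vec<String>,
}

impl Archive {
    fn entry(&self, i: usize) -> &String { &self.entries[i] } // shared borrow
    fn reader(&mut self, i: usize) -> &mut String { &mut self.entries[i] } // exclusive borrow
}

fn main() {
    let mut a = Archive { entries: vec![String::from("data")] };
    // let e = a.entry(0);      // shared borrow of `a`...
    // let r = a.reader(0);     // ...would conflict with this exclusive borrow
    let e = a.entry(0).clone(); // cloning ends the shared borrow early
    let r = a.reader(0); // so the exclusive borrow is now allowed
    r.push_str(&e);
    assert_eq!(a.entries[0], "datadata");
}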
When a data descriptor is used (primarily for streaming compression), async_zip currently does correctly handle the CRC, but it does not do anything particular with the size information in local file headers, resulting in zero compressed & uncompressed sizes. This should be either documented, or preferably handled by the public interface so that ZipEntry::[un]compressed_size returns Option<u32> or similar.
StoredZipEntry::data_offset computes the position of the compressed data for a particular entry by taking the position of the local file header and adding the header length and the trailing data length:
Lines 159 to 165 in 07721c3
Unfortunately, the calculation of the trailing data length here is incorrect. Specifically, it's using the length of self.entry.extra_field(), which is based on the extra_field in the central directory record. extra_field can have different lengths in the local file header and in the central directory record.
This leads to data_offset returning the wrong position and causes errors decompressing files. As far as I know, the only correct way to find data_offset is by reading the local file header to find the length of the extra_field following it. (And if you're doing that, you can also read the length of the filename field from the local file header as well.)
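For illustration, a self-contained sketch of that approach (not the crate's code): the local file header is 30 fixed bytes, with the filename and extra-field lengths at offsets 26 and 28, so the data offset follows directly.

// ZIP local file header: 30 fixed bytes, then the filename and extra field.
// The lengths stored here are the authoritative ones; the central directory's
// extra_field may have a different length.
fn data_offset(lfh: &[u8], lfh_offset: u64) -> Option<u64> {
    if lfh.len() < 30 || lfh[..4] != [0x50, 0x4b, 0x03, 0x04] {
        return None; // too short or not a local file header ("PK\x03\x04")
    }
    let name_len = u16::from_le_bytes([lfh[26], lfh[27]]) as u64;
    let extra_len = u16::from_le_bytes([lfh[28], lfh[29]]) as u64;
    Some(lfh_offset + 30 + name_len + extra_len)
}

fn main() {
    let mut header = vec![0u8; 30];
    header[..4].copy_from_slice(&[0x50, 0x4b, 0x03, 0x04]);
    header[26..28].copy_from_slice(&5u16.to_le_bytes()); // filename length
    header[28..30].copy_from_slice(&8u16.to_le_bytes()); // extra field length
    assert_eq!(data_offset(&header, 100), Some(100 + 30 + 5 + 8));
}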
It would be useful to have a changelog explaining the breaking changes between versions.
The method ZipEntry::dir, which checks whether the entry represents a directory, was removed in e4a0aa5. It's easy enough to reimplement as entry.filename().ends_with('/'), but I'm curious why it was removed in the first place. If the removal was an oversight, it would be nice if it could be added back, since it's convenient to have.
The current code first searches for the EOCDR of the zip file and then searches for a zip64 EOCD locator:
Lines 40 to 53 in 7808bcd
As far as I can tell, this repeated search is unnecessary because the zip64 EOCD locator is of a fixed size and immediately precedes the EOCDR:
4.3.6 Overall .ZIP file format:
[local file header 1]
[encryption header 1]
[file data 1]
[data descriptor 1]
.
.
.
[local file header n]
[encryption header n]
[file data n]
[data descriptor n]
[archive decryption header]
[archive extra data record]
[central directory header 1]
.
.
.
[central directory header n]
[zip64 end of central directory record]
[zip64 end of central directory locator]
[end of central directory record]
I think that a better approach may be to:
1. Search for the EOCDR as normal (using the existing search code)
2. Check the bytes immediately preceding the EOCDR. If the signature matches the zip64 EOCD locator signature, then that's our locator. Otherwise the zip file is not a zip64.
The current implementation has a few issues:
- If the zip64 EOCD locator signature is in the comment at the end of the EOCDR, that can cause incorrect parsing
- If the EOCDR has the largest possible comment, the zip64 EOCD locator will not be found even if it exists (because of the way the searches are implemented)
- Both searches (for the EOCDR and the zip64 EOCD locator) are implemented on top of locate_record_by_signature, which searches up to the last ~64kb of a zip file. This can cause a repeated, unnecessary search of the last 64kb, which can be a problem if reads to the underlying reader are expensive.
cc @skairunner
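For illustration, a self-contained sketch of the fixed-offset check (not the crate's code), using the locator's on-disk signature "PK\x06\x07" and its fixed 20-byte size:

const ZIP64_LOCATOR_SIG: [u8; 4] = [0x50, 0x4b, 0x06, 0x07];
const ZIP64_LOCATOR_LEN: usize = 20;

// The zip64 EOCD locator is exactly 20 bytes and immediately precedes the
// EOCDR, so one bounds-checked signature comparison replaces a second search.
fn zip64_locator_offset(file: &[u8], eocdr_offset: usize) -> Option<usize> {
    let start = eocdr_offset.checked_sub(ZIP64_LOCATOR_LEN)?;
    if file[start..start + 4] == ZIP64_LOCATOR_SIG {
        Some(start) // locator present: this is a zip64 archive
    } else {
        None // no locator: not a zip64 archive
    }
}

fn main() {
    let mut file = vec![0u8; 64];
    let eocdr_offset = 42;
    file[22..26].copy_from_slice(&ZIP64_LOCATOR_SIG); // 42 - 20 = 22
    assert_eq!(zip64_locator_offset(&file, eocdr_offset), Some(22));
}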
Hello,
I migrated to 0.0.13. Since this release, an EntryStreamWriter constructed from an async_zip::tokio::write::ZipFileWriter doesn't implement tokio::io::AsyncWrite anymore... It is annoying when trying to use it in the tokio ecosystem...
Best regards and thanks for this useful crate.
Hi!
This minimal example:
// [dependencies]
// async_zip = "0.0.6"
// tokio = { version = "1", features = ["full"] }
use async_zip::write::{EntryOptions, ZipFileWriter};
use async_zip::Compression;
use tokio::io::AsyncWriteExt;
#[tokio::main]
async fn main() {
let mut output_file = tokio::fs::File::create("ö.zip").await.unwrap();
let mut output_writer = ZipFileWriter::new(&mut output_file);
let filename = "öäöääö.txt".to_string();
let entry_options = EntryOptions::new(filename, Compression::Stored);
let mut entry_writer = output_writer
.write_entry_stream(entry_options)
.await
.unwrap();
let data = "hello öäöääö".to_string();
tokio::io::copy(&mut data.as_bytes(), &mut entry_writer)
.await
.unwrap();
entry_writer.close().await.unwrap();
output_writer.close().await.unwrap();
output_file.shutdown().await.unwrap();
}
This will yield ö.zip, but öäöääö.txt's filename will be broken. The contents are correct.
Using the regular zip crate:
// [dependencies]
// zip = "0.5.13"
use std::io::Write;
use zip::write::FileOptions;
fn main() {
let path = std::path::Path::new("ö.zip");
let file = std::fs::File::create(&path).unwrap();
let mut zip = zip::ZipWriter::new(file);
let options = FileOptions::default()
.compression_method(zip::CompressionMethod::Stored)
.unix_permissions(0o755);
zip.start_file("öäöääö.txt", options).unwrap();
zip.write_all("hello öäöääö".to_string().as_bytes())
.unwrap();
zip.finish().unwrap();
}
Does not produce any problems
Edit: I'm not sure this is an issue. I should probably be sanitizing file names before I pass them to EntryWriter!!
Sorry for creating the issue prematurely. please delete me :)
why?
In some zip files with large central directories, the process of initially loading the zip file can cause lots of small reads (relevant code). This can be quite slow if reads to the underlying reader are expensive.
Since we already know the size of the central directory before we start parsing it (and we know we're going to read the whole thing), we could just read it all at once.
This should be as simple as reading eocdr.size_cent_dir bytes into a temporary buffer and passing that to crate::read::cd instead of passing &mut reader, as below.
Lines 40 to 44 in 07721c3
An alternative is for users to wrap their reader in a BufReader before passing it to ZipFileReader, but this isn't ideal, as CompressedReader in this library already uses a BufReader internally. Also, users likely don't have knowledge of the central directory sizes of their files beforehand, so they can't tune buffer sizes.
I can make a pull request with the solution I described above if that sounds good to you.
Thanks!
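A minimal sketch of that buffered approach (an illustration, not the crate's code), assuming the EOCDR has already yielded the central directory's offset and size:

use tokio::io::{AsyncRead, AsyncReadExt, AsyncSeek, AsyncSeekExt, SeekFrom};

// Read the whole central directory with a single read_exact, using the size
// already known from the EOCDR, then parse entries from the in-memory buffer
// instead of issuing many small reads against the underlying reader.
async fn read_central_directory<R>(reader: &mut R, offset: u64, size: u64) -> std::io::Result<Vec<u8>>
where
    R: AsyncRead + AsyncSeek + Unpin,
{
    reader.seek(SeekFrom::Start(offset)).await?;
    let mut buf = vec![0u8; size as usize];
    reader.read_exact(&mut buf).await?;
    Ok(buf)
}

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let mut cursor = std::io::Cursor::new(vec![0u8; 128]); // stands in for a file
    let cd = read_central_directory(&mut cursor, 64, 32).await?;
    assert_eq!(cd.len(), 32);
    Ok(())
}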
It would be nice if there were a trait that abstracted over the different implementations of readers, such that I could implement generically over them.
I'm finding it a bit hard to use this library when I need to parse more than a single zip given a single &'static str value. This library seems to work great, but only in the case where you have a hard-coded filename. As soon as you try to read input from a user interface, you run into lifetime errors, not being able to create a &'static str from owned String types.
This would go away if filename were an owned type. Though I could be missing something. If it is possible today, can you share some examples where collecting a list of paths, or even a single path, from user input is possible?
Hey,
I tried to read a simple UTF8 txt file from a zip archive.
use async_zip::read::fs::ZipFileReader;
use tokio::io::AsyncReadExt;
#[tokio::main]
async fn main() {
let zip = ZipFileReader::new("resources.zip".to_string()).await.unwrap();
let (index, entry) = zip.entry("content.txt").unwrap();
let mut entry_reader = zip.entry_reader(index).await.unwrap();
let file_length = entry.uncompressed_size().unwrap();
let mut buffer = String::with_capacity(file_length as usize);
let bytes_read = entry_reader.read_to_string(&mut buffer).await.unwrap();
println!("{}:{}", bytes_read, buffer);
}
I get an error that the file is not utf8-encoded:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Custom { kind: InvalidData, error: "stream did not contain valid UTF-8" }', src\main.rs:13:69
So I just did a hack to see what bytes actually are inside the file:
use async_zip::read::fs::ZipFileReader;
use tokio::io::AsyncReadExt;
#[tokio::main]
async fn main() {
let zip = ZipFileReader::new("resources.zip".to_string()).await.unwrap();
let (index, entry) = zip.entry("content.txt").unwrap();
let mut entry_reader = zip.entry_reader(index).await.unwrap();
let file_length = entry.uncompressed_size().unwrap();
let mut buffer = Vec::with_capacity(file_length as usize);
let bytes_read = entry_reader.read_to_end(&mut buffer).await.unwrap();
println!("{}:{}", bytes_read, unsafe{ std::str::from_utf8_unchecked(&buffer) });
}
The console output is the following:
30:
vETF]�:� � � $
Is there something I'm doing wrong? It seems really weird.
In case you are interested, I can send you the zip file.
Hi,
Thanks for the library! I want to construct a ZipDateTime but do not want to use chrono, as I already use time directly. I am porting from synchronous code that uses zip, and previously I could use this function, but I can't see an equivalent in ZipDateTime.
Am I missing something?
Thanks 🙏
First of all, thanks for the great crate!!
I'm currently trying to use this crate to write a zip compressed stream to a tokio::DuplexStream. This currently fails because DuplexStream implements poll_shutdown by not allowing any more writes to the stream. This is problematic, as calling close on the ZipFileWriter calls shutdown() to finalize the compression encoder for each file added to the archive, which means that writes after the first call to close all fail.
I don't think DuplexStream is wrong here to not allow writes after being shut down; the Tokio documentation for shutdown() on the AsyncWrite trait mentions that after shutdown succeeds, "the I/O object itself is likely no longer usable." (https://docs.rs/tokio-io/0.1.13/tokio_io/trait.AsyncWrite.html#return-value)
From what I can tell, shutdown is called here so that the async_compression encoder can finalize the encoding. I guess this is mostly OK for that use case (though I see someone has raised an issue about adding another way of finalizing - Nullus157/async-compression#141), but in this case it seems like it will stop this library working at all.
I guess it's fortunate that the tokio::File and tokio::Cursor<Vec<u8>> implementations of shutdown just call flush (https://github.com/tokio-rs/tokio/blob/702d6dccc948b6a460caa52da4e4c03b252c5c3b/tokio/src/fs/file.rs#L702 and https://github.com/tokio-rs/tokio/blob/702d6dccc948b6a460caa52da4e4c03b252c5c3b/tokio/src/io/async_write.rs#L376), so it's not so much of an issue for those two writers (at least for now; maybe the implementations will change in the future to actually shut down?).
For the time being I've worked around this by wrapping DuplexStream to do a flush instead of a shutdown, but it's not ideal:
use std::{io, pin::Pin};
use tokio::io::{AsyncWrite, DuplexStream};

struct DuplexStreamWrapper(DuplexStream);
impl AsyncWrite for DuplexStreamWrapper {
fn poll_write(
mut self: Pin<&mut Self>,
cx: &mut std::task::Context<'_>,
buf: &[u8],
) -> std::task::Poll<Result<usize, io::Error>> {
Pin::new(&mut self.0).poll_write(cx, buf)
}
fn poll_flush(mut self: Pin<&mut Self>, cx: &mut std::task::Context<'_>) -> std::task::Poll<Result<(), io::Error>> {
Pin::new(&mut self.0).poll_flush(cx)
}
fn poll_shutdown(mut self: Pin<&mut Self>, cx: &mut std::task::Context<'_>) -> std::task::Poll<Result<(), io::Error>> {
Pin::new(&mut self.0).poll_flush(cx)
}
}
Would be happy to help out with a fix but thought I'd raise this before trying anything in case there's a better way of going about what I'm trying to do. I'm not sure if this problem should/could be fixed/worked-around in this crate, or if a change needs to be implemented in async_compression to allow finalizing the encoder without shutting down the writer first.
https://github.com/SylvKT/sylv-api/blob/1f5ffd63bf0c95c5c2c16b4b54b4c92ea55ef981/src/task/retrieve_jar.rs#L109
This fails to read the zip file here with this error:
ZipError(UnableToLocateEOCDR)
I'm not sure why this is happening or what I can do about it, but it opens fine in Ark.
Linux 6.3.6-arch1-1
So, I realize this isn't in the realm of responsibility for a zip archive library, but I am trying to open a p4k file (from star citizen) using your lib and getting an unexpected header. My research indicates (even from cryengine's own developer docs) that a crypak/p4k file should just be a zip with some zstd or deflate compression and some files encrypted while others are not even compressed. From what I have been able to find, I should be able to just list the files with standard zip capability, and yet I am getting this unexpected header before I even try to extract.
Finished dev [unoptimized + debuginfo] target(s) in 4.70s
Running `target\debug\scprospector.exe`
[src\lib.rs:6] &file = tokio::fs::File {
std: File {
handle: 0x00000000000000dc,
path: "\\\\?\\C:\\Users\\David\\repos\\scprospector\\Data.p4k",
},
}
file open; trying to open as zip
Error: Encountered an unexpected header (actual: 0x99df8764, expected: 0x2014b50).
error: process didn't exit successfully: `target\debug\scprospector.exe` (exit code: 1)
Looking at some other open source code that successfully opens the file in dotnet and another in python, those authors have some kind of encryption key they seem to open it with. Maybe this is happening because async_zip doesn't support encryption yet? I'm trying to find out what kind of encryption it is. Maybe ZipCrypt, but I'm not sure.
Obviously this isn't something I expect you to solve, but I wondered if you have any insight into why this might happen. If not, that is ok too. Thanks!
Looks like the dependency async_io_utilities was yanked from crates.io, resulting in this crate no longer compiling as a dependency in other crates.
error: no matching package named `async_io_utilities` found
location searched: registry `crates-io`
required by package `async_zip v0.0.9`
... which satisfies dependency `async_zip = "^0.0.9"` of package ...
Hi, I've been using the crate for copying directories over a zip stream, and I noticed that the read::stream::ZipFileReader fails in case the compressed stream contains a file that is already compressed with the deflate method.
use async_zip::read::stream::ZipFileReader;
use tokio::fs::File;
#[tokio::main]
async fn main() {
// Fails when file contains already compressed file
// e.g. epub, odt, zip
let mut file = File::open("with-zipfile.zip").await.unwrap();
let mut zip = ZipFileReader::new(&mut file);
while !zip.finished() {
if let Some(reader) = zip.entry_reader().await.unwrap() {
let entry = reader.entry();
println!("{:?}", entry.name());
let mut out = vec![];
reader.copy_to_end_crc(&mut out, 1024).await.unwrap();
}
}
}
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: UpstreamReadError(Custom { kind: Other, error: DecompressError(General { msg: None }) })', src/main.rs:16:47
stack backtrace:
0: rust_begin_unwind
at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/panicking.rs:517:5
1: core::panicking::panic_fmt
at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/panicking.rs:100:14
2: core::result::unwrap_failed
at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/result.rs:1616:5
3: core::result::Result<T,E>::unwrap
at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/result.rs:1298:23
4: test_zip::main::{{closure}}
at ./src/main.rs:16:13
5: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/future/mod.rs:80:19
6: tokio::park::thread::CachedParkThread::block_on::{{closure}}
at /.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.0/src/park/thread.rs:267:54
7: tokio::coop::with_budget::{{closure}}
at /.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.0/src/coop.rs:102:9
8: std::thread::local::LocalKey<T>::try_with
at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/thread/local.rs:399:16
9: std::thread::local::LocalKey<T>::with
at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/thread/local.rs:375:9
10: tokio::coop::with_budget
at /.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.0/src/coop.rs:95:5
11: tokio::coop::budget
at /.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.0/src/coop.rs:72:5
12: tokio::park::thread::CachedParkThread::block_on
at /.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.0/src/park/thread.rs:267:31
13: tokio::runtime::enter::Enter::block_on
at /.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.0/src/runtime/enter.rs:152:13
14: tokio::runtime::scheduler::multi_thread::MultiThread::block_on
at /.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.0/src/runtime/scheduler/multi_thread/mod.rs:79:9
15: tokio::runtime::Runtime::block_on
at /.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.21.0/src/runtime/mod.rs:492:44
16: test_zip::main
at ./src/main.rs:10:5
17: core::ops::function::FnOnce::call_once
at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/ops/function.rs:227:5
And the file I'm trying to send:
$ zipinfo with-zipfile.zip
Archive: with-zipfile.zip
Zip file size: 89746 bytes, number of entries: 2
drwxr-xr-x 2.0 unx 0 bx stor 22-Sep-17 15:50 with-zipfile/
-rw-r--r-- 2.0 unx 89326 bX defN 22-Sep-17 15:50 with-zipfile/olives.zip
2 files, 89326 bytes uncompressed, 89356 bytes compressed: 0.0%
I did some experiments, and it seems to happen with many file formats that use deflate (specifically in the "normal" mode), for example epub or odt.
Why would it matter that the stream contains a file that is itself compressed? The DecompressError suggests that the error is somewhere in the async-compression flate2 call, but I couldn't narrow down where and why exactly. Some help would be appreciated!
Thanks so much for creating this awesome lib.
I have been trying to archive some executables, and I am able to do that, but I want to preserve the executable permission. I couldn't find any obvious API to give file perms to the entries written. I am pretty new to rust, so excuse me if I missed anything. If there's an API to do this, please can you point me towards it?
P.S. I also have no idea regarding the zip file format.
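For background (and explicitly not a claim about async_zip's API, which is what's being asked): the ZIP format conventionally stores Unix permissions in the upper 16 bits of an entry's external file attributes when the version-made-by field indicates Unix. A self-contained illustration of that encoding:

// Unix mode bits live in the high 16 bits of "external file attributes";
// 0o100755 is a regular file with rwxr-xr-x permissions.
fn external_attributes(unix_mode: u32) -> u32 {
    unix_mode << 16
}

fn unix_mode(external_attributes: u32) -> u32 {
    external_attributes >> 16
}

fn main() {
    let attrs = external_attributes(0o100755);
    assert_eq!(unix_mode(attrs), 0o100755);
    println!("external attributes: {attrs:#010x}"); // 0x81ed0000
}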
Although the standard has supported UTF8 since 2006, much legacy software still creates archives with utf8-incompatible metadata. In particular, Windows Explorer's built-in ZIP archiver encodes filenames with the encoding defined by the system locale setting. (For example, CP437 for English, CP936/GBK for Simplified Chinese, CP950/Big5 for Traditional Chinese. This is still the case on Windows 11.)
A ZIP library should be expected to read and extract archives with utf8-incompatible metadata.
ZipFileReader as of version 0.0.11 returns UpstreamReadError: stream did not contain valid UTF-8 if the archive has utf8-incompatible metadata.
Lines 149 to 153 in d64187b
Proposed behavior:
- ZipFileReader should be able to read archives with utf8-incompatible metadata.
- If an entry's UTF8 flag is set, ZipFileReader should try to parse its metadata as a Rust string.
- If the UTF8 flag is not set, ZipFileReader should either return the raw bytes or try to parse the metadata with a default encoding. In the latter case, it might be useful to also return the raw bytes, allowing the caller to try a different encoding.
zip-rs
zip-rs performs a lossy conversion if the entry has a UTF8 flag. Else, it tries to parse the metadata with CP437.
https://github.com/zip-rs/zip/blob/0dcc40bee0179d9e841622f6c1a2217173b69951/src/read.rs#L683-L690
Python
The zipfile module in Python's standard library has the same behavior as zip-rs, but allows the caller to provide an optional encoding to replace the default CP437.
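A sketch of the behavior proposed above (illustrative only); the legacy decoder is left pluggable, since real CP437 decoding needs a lookup table or a helper crate:

// `decode_legacy` stands in for a CP437 (or caller-supplied) decoder.
fn decode_filename(raw: &[u8], utf8_flag: bool, decode_legacy: impl Fn(&[u8]) -> String) -> String {
    if utf8_flag {
        // Lossy conversion, as zip-rs does when the UTF8 flag is set.
        String::from_utf8_lossy(raw).into_owned()
    } else {
        decode_legacy(raw)
    }
}

fn main() {
    // Bytes below 0x80 are identical in CP437 and ASCII, so a trivial
    // stand-in decoder suffices for this demo.
    let ascii = |raw: &[u8]| raw.iter().map(|&b| b as char).collect::<String>();
    assert_eq!(decode_filename(b"hello.txt", true, ascii), "hello.txt");
    assert_eq!(decode_filename(b"hello.txt", false, ascii), "hello.txt");
}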
The attached archives contain an empty file with the same name. One is encoded with GBK, the other with UTF8.
gbk.zip
utf8.zip
ZipEntryReader is used in the public API of async_zip::read::stream::ZipFileReader and it was public, but it is now private in 0.0.12.
Hi Harry!
This crate is absolutely wonderful, it saved me over a year ago when I was too dumb to figure out async_compression, and your package (back at 0.0.7) wrapped it and did all the dirty work for me.
Now I just bumped the package to 0.0.15, to take advantage of some of the compression algorithms being behind feature flags and improve compile times, but I noticed a couple of things:
- I now need to wrap my writers in this Compat thing, which I did not need before.
I was hoping you wouldn't mind explaining this Compat situation and why it is now needed. In the process I hope to gain a better understanding of the rust async ecosystem! Thank you!!
For reference, here is my function that uses your library; the comments IN CAPS detail the changes I had to make to bump to 0.0.15:
#[get("/download")]
pub async fn directories_download(
db_pool: web::Data<DbPool>,
blob_storage: web::Data<BlobStorage>,
query: web::Query<DownloadDirectoryRequest>,
id: Identity,
) -> StreamingResponse<ReaderStream<impl AsyncRead>> {
tracing::Span::current().record("query", query.as_value());
let user_id =
require_user_login(id).map_err(|e| std::io::Error::new(std::io::ErrorKind::Other, e))?;
// Prepare a stream that will receive the compressed bytes
let (mut compressed_tx, compressed_rx) = tokio::io::duplex(1024);
// I NEEDED TO `.compat()` THIS
let compressed_tx = compressed_tx.compat();
// Get a list of hashes and paths that we need to compress
let files_to_zip = {
let mut files = directories_get_children_hashes_and_paths_for_directory_download(
db!(db_pool),
user_id,
query.directory_entry_id,
)
.await?;
if query.deduplicate == Some(true) {
deduplicate(&mut files)
};
files
};
tokio::spawn(async move {
// Prepare our ZipFileWriter
let mut zip_archive = ZipFileWriter::new(compressed_tx);
for (hash, path) in files_to_zip {
let (mut uncompressed_tx, mut uncompressed_rx) = tokio::io::duplex(1024);
let mut entry_writer = zip_archive
.write_entry_stream(ZipEntryBuilder::new(path.into(), Compression::Deflate))
.await
.expect("Couldn't create an EntryStreamWriter")
// NEEDED `.compat_write()` HERE
.compat_write();
// Begin streaming into the channel
let blob_storage = blob_storage.clone();
tokio::spawn(async move {
blob_storage
.retrieve_file_streaming(&hash, &mut uncompressed_tx)
.await
.expect("blob storage could not retrieve file");
});
// Copy from channel into the entry_writer
tokio::io::copy(&mut uncompressed_rx, &mut entry_writer)
.await
.expect("couldn't copy unompressed bytes into the EntryStreamWriter");
// // finalize this file's compression
entry_writer
// NEEDED TO GET THE `EntryStreamWriter` BACK OUT
// TO BE ABLE TO `.close()` IT
.into_inner()
.close()
.await
.expect("couldn't shutdown the EntryStreamWriter");
}
// When all uncompressed_streams have completed we can close off
// the ZipFileWriter
zip_archive
.close()
.await
.expect("couldn't close the zip file");
});
Ok(StreamingBody(ReaderStream::new(compressed_rx)))
}
Do you have a planned timeline for the next release?
Also, have you considered publishing it as 0.1.0 (instead of 0.0.10) since this is a big change from previous releases?
I'm currently migrating from zip to async_zip because my program uses tokio IO everywhere, except for in zip files. I'm using the GitHub main branch, not crates.io because of #64. Here is the function I wrote for extracting a given reader to an output directory:
/// Extract the `input` zip file to `output_dir`
pub async fn extract_zip(
input: impl AsyncRead + AsyncSeek + Unpin,
output_dir: &Path,
) -> Result<()> {
let mut zip = ZipFileReader::new(input).await?;
for i in dbg!(0..zip.file().entries().len()) {
dbg!(i);
let entry = zip.file().entries()[i].entry();
let path = output_dir.join(entry.filename());
if entry.dir() {
create_dir_all(&path).await?;
} else {
if let Some(up_dir) = path.parent() {
if !up_dir.exists() {
create_dir_all(up_dir).await?;
}
}
copy(&mut zip.entry(i).await?, &mut File::create(&path).await?).await?;
}
}
Ok(())
}
It's almost identical to your example extractor, minus the sanitization, because I trust the download sources (for now at least).
I was getting deflate decompression errors at seemingly random places. So I tried debugging it by printing out the indices and the total length (as shown in the code) and I came to a weird conclusion. It seems decompression fails at around 15%-20% of the total length. I have no idea what's going on, and thanks in advance for any help.
In 0.0.9, ZipEntryReader provided a method to manually check the CRC, so that we could use the AsyncRead impl to read from it and then check the CRC. With 0.0.10, this is no longer possible.
It may be a good idea to add a few more notes to your list of caveats about decompressing from a non-seekable stream.
From Wikipedia:
Because ZIP files may be appended to, only files specified in the central directory at the end of the file are valid. Scanning a ZIP file for local file headers is invalid (except in the case of corrupted archives), as the central directory may declare that some files have been deleted and other files have been updated.
Maybe consider adding a note about deleted and updated files to the list here:
rs-async-zip/src/read/stream.rs
Lines 17 to 28 in 6bca65b
Hello, I'm trying to use your lib for the first time and just using the basic examples in your docs for reading. I'm getting:
thread 'main' has overflowed its stack
error: process didn't exit successfully: `target\debug\scprospector.exe` (exit code: 0xc00000fd, STATUS_STACK_OVERFLOW)
The zip file was created by 7zip for windows and is attached.
test.zip
How can I get this working?
use scprospector::print_p4k_contents;
#[tokio::main]
async fn main() -> Result<(), anyhow::Error> {
print_p4k_contents().await?;
Ok(())
}
use async_zip::read::seek::ZipFileReader;
use tokio::fs::File;
pub async fn print_p4k_contents() -> Result<(), anyhow::Error> {
let mut file = File::open("scprospector.zip").await?;
dbg!(&file);
println!("file open; trying to open as zip");
let mut zip = ZipFileReader::new(&mut file).await?;
println!("zip open; trying to read first entry");
let reader = zip.entry_reader(0).await?;
println!("first entry read; getting crc");
let txt = reader.read_to_string_crc().await?;
println!("{}", txt);
Ok(())
}
It crashes at the line let mut zip = ZipFileReader::new(&mut file).await?;.
An alternative API to write_entry_stream: helpful if the writer ever needs to be stored, due to the inherent issues with self-referential structs.
sketched out:
impl ZipFileWriter {
    async fn to_entry_stream<E: Into<ZipEntry>>(self, entry: E) -> Result<OwnedEntryStreamWriter, ZipFileWriter> {
        ...
        Ok(OwnedEntryStreamWriter { outer: self, ... })
    }
}

impl OwnedEntryStreamWriter {
    async fn to_zip_writer(self) -> Result<ZipFileWriter, OwnedEntryStreamWriter> {
        self.close().await?;
        Ok(self.outer)
    }
}
The ZipEntryReader::copy_to_end_crc method panics when working with a specific zip file. I don't know whether the file is standards-compliant. If the cause of the bug is that it isn't, it should be expressed as a returned error instead of a panic.
Cargo.toml:
[package]
name = "async-zip-unwrap-min"
version = "0.1.0"
edition = "2021"
[dependencies]
async_zip = "=0.0.3"
[dependencies.tokio]
version = "=1.14.0"
features = ["macros", "rt-multi-thread"]
main.rs:
use {
async_zip::read::stream::ZipFileReader,
tokio::{
fs::File,
io,
},
};
#[tokio::main]
async fn main() {
let mut zip_file = File::open("BizHawk-2.7-win-x64.zip").await.expect("panic not happening here");
let mut zip_file = ZipFileReader::new(&mut zip_file);
while let Some(entry) = zip_file.entry_reader().await.expect("panic not happening here") {
entry.copy_to_end_crc(&mut io::sink(), 64 * 1024).await.expect("panic not happening here");
}
}
The file BizHawk-2.7-win-x64.zip is taken from https://github.com/TASEmulators/BizHawk/releases/tag/2.7.
Output:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Custom { kind: Other, error: DecompressError(General { msg: None }) }', C:\Users\fenhl\.cargo\registry\src\github.com-1ecc6299db9ec823\async_zip-0.0.3\src\read\mod.rs:176:56
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at 'Not all bytes of this reader were consumed before being dropped.', C:\Users\fenhl\.cargo\registry\src\github.com-1ecc6299db9ec823\async_zip-0.0.3\src\read\mod.rs:208:13
stack backtrace:
(stack backtrace for the 2nd panic omitted since that's not the bug being discussed here)
I've been impacted quite a lot by the removal of the entries() iterator, and by ZipFileReader, which now asks for an owned Vec.
But the first one is a bit more important to me: in my case I work with nested zip files, which I need to search by matching the name. With the entries() api I could just use the find() iterator and get the index.
Something like this was possible before:
let (file_index, zip_entry) = zip.entries().iter().find_map(|entry| { .... })?;
With the second one I guess I can get away with a .to_vec() and call it a day (maybe I'm leaving a bit of performance on the table).
Anyway, could you bring it back?
Thanks in advance for your work and for your help, this is a really nice lib overall.
I'm not super familiar with compression in general and zip specifically. Would it be possible to have an AsyncSeek implementation through which one could skip the first N (decompressed) output bytes and start reading from that point?
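For context, deflate output can't be randomly accessed without decompressing, but skipping the first N decompressed bytes can be emulated by draining them into a sink. A small self-contained sketch using tokio (an illustration, not the crate's API):

use tokio::io::{self, AsyncRead, AsyncReadExt};

// "Seeking" forward in compressed output means decompressing and discarding:
// copy the first `n` bytes of the reader into a sink.
async fn skip_decompressed<R: AsyncRead + Unpin>(reader: &mut R, n: u64) -> io::Result<u64> {
    io::copy(&mut reader.take(n), &mut io::sink()).await
}

#[tokio::main]
async fn main() -> io::Result<()> {
    let mut reader = std::io::Cursor::new(vec![7u8; 100]); // stands in for an entry reader
    assert_eq!(skip_decompressed(&mut reader, 40).await?, 40);
    let mut rest = Vec::new();
    reader.read_to_end(&mut rest).await?;
    assert_eq!(rest.len(), 60);
    Ok(())
}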
cargo-binstall just experienced one failure due to this not being supported in streaming mode.
The error message says it will get supported soon, so I wonder if there is a timeline for this, i.e. if it's going to be supported in the next release and how long it will take.
If it will take quite some time for it to be supported, we might switch to file-based API.
This is just an issue asking for the timeline, not asking for a new feature, @Majored please don't feel pressured.
Windows may create zip files with the deflate64 compression method, but this library doesn't support deflate64, so decompressing such zip files fails. I want to decompress such a zip file, so I want rs-async-zip to support deflate64.
Depends on Nullus157/async-compression#237
I have a use case where I'm unzipping one file and putting the files into a subdirectory of another zip.
It would be great if I could take the ZipEntry from one, convert it to a ZipEntryBuilder, and call .filename. Currently, the workaround I have to use involves reading all the information from the ZipEntry and constructing a new one from scratch, since the constructor is the only way to set the filename.
rs-async-zip/src/read/io/combined_record.rs
Lines 41 to 43 in 7808bcd
I believe this should be combined.directory_size = zip64eocdr.directory_size
cc @skairunner
From async_zip::read::stream#considerations:
"The inability to read internally stored ZIP archives when using the Stored compression method."
What does "internally stored ZIP archives" mean?
Support other runtimes by:
- Switching tokio::io::{AsyncRead, AsyncWrite, AsyncSeek} and their extensions to those found inside the futures crate
- Switching the tokio feature to the futures-io feature of async_compression
- Adding a tokio module that mirrors the main API, but uses tokio_util::compat::Compat to operate on Tokio's types instead
- The fs module would be exclusive to this tokio module
Notes:
- Compat does not provide an implementation of AsyncSeek. A small newtype wrapper implementing AsyncSeek could be included in this crate for internal use. Smol's async-compat provides an implementation, but it also tampers with tokio's runtime, so it might not be the best idea.
- async_io_utilities would presumably need the same treatment.
I'll be modifying the crate for use with my project that uses async-std, so if this is something you're interested in, I'd be happy to polish the API and submit a PR.
Just a suggestion - when merging a PR, the "Squash and Merge" option tends to lead to a cleaner git history - especially if the individual commits within a pull request aren't particularly descriptive. This takes all the commits in a PR, squashes them into one named after the PR title, and puts it on top of main. You can also configure this to include the PR description.
If you do want to keep all individual commits from a PR, the "Rebase and Merge" option will stack them all on top of main instead of potentially interleaving them with other commits. This isn't the case for merge commits. For example, some commits from #66 are displayed before 28a932f even though you merged #66 after 28a932f. "Rebase and Merge" avoids this.
Both "Squash and Merge" and "Rebase and Merge" make the git history more readable by avoiding having several merge commits on the main branch. I'd recommend using either of those instead of merge commits.
Why is it that only the tokio::read::seek module has a ZipFileReader that returns tokio::read::ZipEntryReaders? That means that one has to wrap the entry reader(s) in tokio_util::compat::Compat to use them in a tokio context.