Comments (11)
The formula matches the one in the reference implementation, so this must be a bug somewhere else. It would help a lot to have a dataset that reproduces the problem.
#define ZSTD_COMPRESSBOUND(srcSize) ((srcSize) + ((srcSize)>>8) + (((srcSize) < (128<<10)) ? (((128<<10) - (srcSize)) >> 11) /* margin, from 64 to 0 */ : 0))
from aircompressor.
I've been able to reproduce this locally. I'm investigating.
Thanks @miniway for the report!
Can you perhaps try to reproduce this synthetically, e.g., using TPC-H data as the input?
That would also help verify the problem still exists in the latest release.
Unfortunately, I could not easily reproduce the issue, but my gut feeling is that it doesn't happen in 0.15 and happens rarely in 0.18.
Hi all. The Apache ORC community is also blocked by this issue; it causes unit test failures consistently.
Apache ORC has been using 0.16 without this issue, but the new 0.17 and 0.18 releases seem to have it.
The relevant change in airlift is this commit.
Caused by: java.lang.IllegalStateException: Overflow detected
at io.airlift.compress.zstd.Util.checkState(Util.java:59)
at io.airlift.compress.zstd.BitOutputStream.close(BitOutputStream.java:85)
at io.airlift.compress.zstd.HuffmanCompressor.compressSingleStream(HuffmanCompressor.java:130)
at io.airlift.compress.zstd.HuffmanCompressor.compress4streams(HuffmanCompressor.java:75)
at io.airlift.compress.zstd.ZstdFrameCompressor.encodeLiterals(ZstdFrameCompressor.java:333)
at io.airlift.compress.zstd.ZstdFrameCompressor.compressBlock(ZstdFrameCompressor.java:224)
at io.airlift.compress.zstd.ZstdFrameCompressor.compressFrame(ZstdFrameCompressor.java:172)
at io.airlift.compress.zstd.ZstdFrameCompressor.compress(ZstdFrameCompressor.java:145)
at io.airlift.compress.zstd.ZstdCompressor.compress(ZstdCompressor.java:45)
at org.apache.orc.impl.AircompressorCodec.compress(AircompressorCodec.java:68)
at org.apache.orc.impl.OutStream.spill(OutStream.java:290)
at org.apache.orc.impl.OutStream.write(OutStream.java:263)
at org.apache.orc.impl.SerializationUtils.writeLongBE(SerializationUtils.java:840)
at org.apache.orc.impl.SerializationUtils.unrolledBitPackBytes(SerializationUtils.java:674)
at org.apache.orc.impl.SerializationUtils.unrolledBitPack32(SerializationUtils.java:644)
at org.apache.orc.impl.SerializationUtils.writeInts(SerializationUtils.java:497)
at org.apache.orc.impl.RunLengthIntegerWriterV2.writeDirectValues(RunLengthIntegerWriterV2.java:370)
at org.apache.orc.impl.RunLengthIntegerWriterV2.writeValues(RunLengthIntegerWriterV2.java:174)
at org.apache.orc.impl.RunLengthIntegerWriterV2.write(RunLengthIntegerWriterV2.java:804)
at org.apache.orc.impl.writer.IntegerTreeWriter.writeBatch(IntegerTreeWriter.java:87)
at org.apache.orc.impl.writer.StructTreeWriter.writeRootBatch(StructTreeWriter.java:56)
at org.apache.orc.impl.WriterImpl.addRowBatch(WriterImpl.java:704)
... 60 more
Thanks for narrowing down the issue to that commit. I’ll take a look soon.
Also, do you have a data set that reproduces it consistently? I’d like to investigate whether there’s a bug elsewhere that is being surfaced by that change.
Update: Apache ORC has finally upgraded to 0.19. Thanks!
The same problem occurred with aircompressor 0.20.
Thank you for reporting, @melin. If you don't mind, could you please file an Apache ORC JIRA issue at https://issues.apache.org/jira/projects/ORC with the above information?
Thank you, @martint !