Comments (10)
Same requirement here: Snappy support for big endian.
from aircompressor.
Correct, this code does not support big endian. Based on our experimentation, adding big endian support would either make the code very complex, or slow due to new abstractions. More importantly, to the best of our knowledge there are no public CI resources we could use to test on big endian machines. Without the ability to test the software, I would not be comfortable even attempting this. Finally, there are very few big endian CPUs anymore. To the best of my knowledge, the only big endian CPU in modern usage is PowerPC, and it isn't very popular, so it would be difficult to justify the work to add support for it (please correct me if I'm wrong).
from aircompressor.
The code is already 100% complete (done by myself) for xerial/snappy-java, and is zero overhead for existing users as it simply swaps in the classes dynamically (to support the byte swapping) on the constructor class via snappy-loader. If you took the same approach, there is zero overhead for existing users.
For other users, the performance is infinitely better than current, as currently there is no Java support.
As you do not use the class loader interface used by lib-snappy, an IF THEN ELSE statement around each rawCompress/rawDecompress call would be required to select a different method.
This would be the only difference to the lib-snappy Pure Java implementation.
i.e. the class loading removes the need for the IF THEN ELSE on each low-level API call to decide whether to use the byte-swap method or not.
You may want to change your main SnappyCompress/SnappyDecompress interfaces to do the same as snappy-loader (this would be your only change).
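For illustration, here is a minimal sketch of the "select the implementation once" idea described above. It is not the snappy-java loader or any aircompressor code: the `RawReader` interface, both implementations and `select()` are hypothetical names, and a native-order `ByteBuffer` read stands in for the raw memory access the real code performs. The byte order is tested once when the reader is created, so the per-call path contains no IF THEN ELSE.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: choose a byte-swapping or non-swapping reader once.
final class EndianDispatchSketch {
    /** Hypothetical low-level reader; not a snappy-java or aircompressor type. */
    interface RawReader {
        int readInt(byte[] data, int offset);
    }

    // Stand-in for a raw native-order memory read (assumption for this sketch).
    static int nativeRead(byte[] data, int offset) {
        return ByteBuffer.wrap(data).order(ByteOrder.nativeOrder()).getInt(offset);
    }

    /** Used when the platform is little endian: the native value is already correct. */
    static final class NoSwapReader implements RawReader {
        public int readInt(byte[] data, int offset) {
            return nativeRead(data, offset);
        }
    }

    /** Used when the platform is big endian: swap bytes to get the little endian value. */
    static final class SwapReader implements RawReader {
        public int readInt(byte[] data, int offset) {
            return Integer.reverseBytes(nativeRead(data, offset));
        }
    }

    // The byte-order test runs here once, not on every low-level call.
    static RawReader select() {
        return ByteOrder.nativeOrder() == ByteOrder.LITTLE_ENDIAN
                ? new NoSwapReader()
                : new SwapReader();
    }

    public static void main(String[] args) {
        RawReader reader = select();
        byte[] data = {0x78, 0x56, 0x34, 0x12}; // little endian encoding of 0x12345678
        System.out.println(Integer.toHexString(reader.readInt(data, 0)));
    }
}
```

Whichever platform this runs on, the caller only ever sees the little endian interpretation of the bytes.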
Big endian is used by s390x, which is used extensively by most large financial institutions such as banks, and is very widespread. This is one of the platforms that snappy-java tests in its DevOps CI pipeline via Linux on Z (for which free servers are available), albeit currently only with the JNI implementation.
Projects like yours are a bridge to cloud migrations.
I am involved in such a multi-million dollar migration from s390x to GCP Avro with snappy compression.
I am only offering this as a courtesy, as we don't need or use your project at all.
Do what you want.
from aircompressor.
I am working on OpenShift on the s390x architecture to enable the metering operator. I share kiwi1969's opinion.
from aircompressor.
@kiwi1969 Hi, have you done the same work in aircompressor? If so, could you please open a PR like the one for snappy-java? Thanks.
from aircompressor.
I have submitted pull request #117 with code equivalent to what I did for the snappy-java project.
The only difference is that, because aircompressor has no snappy-loader, I created alternative mainline classes for BE users to select, i.e. SnappyCompressBE(), SnappyDecompressBE(), etc.
This way the little endian users are completely isolated.
It does, however, mean that end users currently need 'if' statements around their API calls to test for big endian.
from aircompressor.
@dain I am also interested in big-endian support. Again for the IBM Z (s390x) platform. The Spark project pulls in this code via the Orc project which leads to breakage. A few thoughts:
"More importantly, to the best of our knowledge there are no public CI resources we could use to test on big endian machines."
Since you are using Travis CI I think you can just add the s390x architecture to your YAML file to get big-endian coverage. There are some instructions here: https://docs.travis-ci.com/user/multi-cpu-architectures/.
"Based on our experimentation, adding big endian support would either make the code very complex, or slow due to new abstractions."
Looking at the code it might not be too bad. There are only a few places where a little-endian integer is really required and these aren't necessarily in performance sensitive code. {get,put}LittleEndian{Int,Long} methods used judiciously might be ok, especially if the JIT optimizer is able to inline them and remove any dead branches.
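A rough sketch of what such helpers could look like follows. The class and method names are assumptions made for illustration, not existing aircompressor code, and a native-order `ByteBuffer` stands in for the raw memory access. The byte-order test is a `static final` constant, which is the kind of branch a JIT compiler can typically fold away, although that is not measured here.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helpers: always read/write little endian values, whatever the platform.
final class LittleEndianAccessSketch {
    // Evaluated once at class initialization; constant afterwards.
    private static final boolean NATIVE_IS_BIG_ENDIAN =
            ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN;

    private LittleEndianAccessSketch() {}

    // Stand-in for raw native-order access (assumption for this sketch).
    private static ByteBuffer nativeView(byte[] data) {
        return ByteBuffer.wrap(data).order(ByteOrder.nativeOrder());
    }

    static int getLittleEndianInt(byte[] data, int offset) {
        int value = nativeView(data).getInt(offset);
        // Swap only on big endian platforms.
        return NATIVE_IS_BIG_ENDIAN ? Integer.reverseBytes(value) : value;
    }

    static long getLittleEndianLong(byte[] data, int offset) {
        long value = nativeView(data).getLong(offset);
        return NATIVE_IS_BIG_ENDIAN ? Long.reverseBytes(value) : value;
    }

    static void putLittleEndianInt(byte[] data, int offset, int value) {
        nativeView(data).putInt(offset, NATIVE_IS_BIG_ENDIAN ? Integer.reverseBytes(value) : value);
    }

    static void putLittleEndianLong(byte[] data, int offset, long value) {
        nativeView(data).putLong(offset, NATIVE_IS_BIG_ENDIAN ? Long.reverseBytes(value) : value);
    }

    public static void main(String[] args) {
        byte[] buffer = new byte[8];
        putLittleEndianInt(buffer, 0, 0x12345678);
        // The least significant byte is stored first on every platform.
        System.out.println(Integer.toHexString(buffer[0]));
        System.out.println(Integer.toHexString(getLittleEndianInt(buffer, 0)));
    }
}
```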
@kiwi1969 I'd be inclined to avoid adding a new API to be honest. The API would need to be maintained and as you say would require changes at the consumer side. This would likely be a problem especially for established projects.
from aircompressor.
I initially had all the code inline, but separated it only because people said they didn't want to impact performance of LITTLE_ENDIAN by requiring additional IF statements to select the correct ENDIAN.
For the 'Snappy-Java' project, it worked out that we can just swap in the JNI/Little_Endian/BIG_ENDIAN Interface dynamically, so separate classes seemed the way to go (performance wise).
It is unlikely the compression code itself is going to change much in the future, so having 2 versions is a bit of a hassle, but not really that bad in terms of maintainability.
Possible alternate solutions could be:
a) Add the IF statement within new HOLEINT() etc methods to conditionally do the reversebytes() method, which would closer match the original C++ code (this would lead to a small performance drop.)
b) Add a dynamic interface loader such as from snappy-java project, to simulate a JNI call via Java Interface to all the methods. This would mean the alternate methods are mapped dynamically for no cost in performance. I actually like this method, as then we closer match the snappy-java project.
I am happy with whatever people deem best for the project though
from aircompressor.
FYI - I updated my changes for pull #117 to simply do littleendian() tests in specific places of SnappyRawCompressor and SnappyRawDecompressor. There seemed to be no noticeable drop in performance, so I think this is the best approach. Just waiting for that pull to be reviewed/accepted.
from aircompressor.
This project is focused on providing fast/efficient implementations of the compression algorithms, and that is currently not possible if we try to support different endiannesses. Additionally, we are not aware of any public CI infrastructure that supports big endian, so we could not test this change even if we wanted to support big endian. As we drop support for older JVMs we will evaluate new technology like VarHandles and MemorySegments, and those technologies can transparently handle different endiannesses.
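As a small illustration of the VarHandle point, the sketch below reads and writes little endian values in a byte array through `MethodHandles.byteArrayViewVarHandle` (Java 9 and later) with an explicit `ByteOrder.LITTLE_ENDIAN`. The class and method names are hypothetical, and this is not how aircompressor is implemented; it only shows that the byte order is declared once on the handle, with no platform check in the calling code.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical sketch: byte-array views with a fixed little endian byte order.
final class VarHandleEndianSketch {
    private static final VarHandle INT_LE =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);
    private static final VarHandle LONG_LE =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    private VarHandleEndianSketch() {}

    static int readInt(byte[] data, int offset) {
        // The offset is a byte index into the array.
        return (int) INT_LE.get(data, offset);
    }

    static void writeInt(byte[] data, int offset, int value) {
        INT_LE.set(data, offset, value);
    }

    static long readLong(byte[] data, int offset) {
        return (long) LONG_LE.get(data, offset);
    }

    public static void main(String[] args) {
        byte[] buffer = new byte[8];
        writeInt(buffer, 0, 0x12345678);
        // Stored as 78 56 34 12 regardless of the platform's native order.
        System.out.println(Integer.toHexString(buffer[0]));
        System.out.println(Integer.toHexString(readInt(buffer, 0)));
    }
}
```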
from aircompressor.