airlift / aircompressor
A port of Snappy, LZO, LZ4, and Zstandard to Java
License: Apache License 2.0
Hi,
Please remove the exception below. It is OK to reset state that does not exist :)
Otherwise your SnappyCodec will not work with Hadoop's SequenceFile implementation.
java.lang.UnsupportedOperationException: resetState not supported for Snappy
at io.airlift.compress.snappy.HadoopSnappyInputStream.resetState(HadoopSnappyInputStream.java:81)
at org.apache.hadoop.io.SequenceFile$Reader.seekToCurrentValue(SequenceFile.java:2134)
at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2217)
at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.nextKeyValue(SequenceFileRecordReader.java:78)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
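The requested change can be sketched as follows (a hypothetical stand-in class, not the library's actual code): SequenceFile.Reader calls resetState() while seeking, so a codec input stream that keeps no cross-block state can implement it as a no-op instead of throwing.

```java
import java.io.ByteArrayInputStream;

// Hypothetical sketch: a stream whose resetState() is a no-op rather
// than throwing, matching what SequenceFile.Reader expects.
class SketchSnappyInputStream extends ByteArrayInputStream {
    SketchSnappyInputStream(byte[] data) {
        super(data);
    }

    public void resetState() {
        // nothing to reset: each block is decompressed independently
    }
}

public class ResetStateDemo {
    public static void main(String[] args) {
        SketchSnappyInputStream in = new SketchSnappyInputStream(new byte[] {1, 2, 3});
        in.resetState(); // the path that previously threw; now it is safe
        System.out.println(in.read()); // prints 1
    }
}
```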
To reproduce:
1. Start the Presto product tests environment (i.e. hdp2.6-hive): presto-product-tests/conf/docker/singlenode/compose.sh up
2. Install lzop: yum install -y lzop
3. Create an input file containing abc\n and compress it with lzop -o output.lzo inputfile
4. Create a table with format = 'TEXTFILE' in Presto and add the file to it
5. Add .m2/repository/org/anarres/lzo/lzo-core/1.0.5/lzo-core-1.0.5.jar and .m2/repository/org/anarres/lzo/lzo-hadoop/1.0.5/lzo-hadoop-1.0.5.jar to the classpath and enable the codec in site.xml

Caused by: java.io.IOException: Unsupported LZO flags 50331649
at io.airlift.compress.lzo.HadoopLzopInputStream.<init>(HadoopLzopInputStream.java:93)
at io.airlift.compress.lzo.LzopCodec.createInputStream(LzopCodec.java:91)
at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:122)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at io.prestosql.plugin.hive.HiveUtil.createRecordReader(HiveUtil.java:220)
... 20 more
I'm getting this error when I try to build:
Failed to execute goal pl.project13.maven:git-commit-id-plugin:2.1.13:revision (default) on project aircompressor: .git directory could not be found! Please specify a valid [dotGitDirectory] in your pom.xml
Hi all, yesterday I ran a compressor comparison through the benchmark tool provided in our test code. The below table shows a part of the result, a more detailed log can be found at the end of this comment.
comparison ratio | compress | decompress |
---|---|---|
airlift-snappy/xerial-snappy | 2.294 | 0.706 |
airlift-lz4/jpountz-lz4 | 0.86 | 1.07 |
airlift-lzo/hadoop-lzo | 1.92 | 2.6 |
airlift-zstd/luben-zstd | 0.998 | 1.08 |
Any comments are welcome.
The full table is here: compress_log.xlsx
The full runlog of the benchmark is here: compressor.log
What do you think about setting up OSS-Fuzz for this project?
Given that aircompressor uses sun.misc.Unsafe
quite a lot, it is probably important that all of this usage is safe since otherwise the JVM could crash, or worse. OSS-Fuzz might be able to help find issues with that.
I assume aircompressor fulfills the requirements to be included into OSS-Fuzz, but that can also be clarified with the maintainers beforehand.
For Zstd it might be necessary to add a hook to disable checksum verification, otherwise fuzzing might not be that effective there, see jazzer documentation for some information. I don't have any experience with that yet.
Fuzzing with OSS-Fuzz / jazzer might not support detecting out-of-bounds Unsafe
reads and writes yet though, see CodeIntelligenceTesting/jazzer#891.
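For reference, a jazzer fuzz target typically looks like the sketch below (the class name and the stand-in decompress call are hypothetical; a real harness would call one of aircompressor's decompressors and let unexpected Errors or JVM crashes surface as findings):

```java
public class DecompressorFuzzer {
    // Jazzer's byte[] entry point; the fuzzer invokes this repeatedly
    // with mutated inputs.
    public static void fuzzerTestOneInput(byte[] data) {
        try {
            decompressStandIn(data);
        } catch (IllegalArgumentException ignored) {
            // rejecting malformed input is expected; crashes and
            // out-of-bounds Unsafe access are the findings we want
        }
    }

    // Stand-in for a real decompressor call such as
    // new ZstdDecompressor().decompress(...).
    private static void decompressStandIn(byte[] data) {
        if (data.length < 4) {
            throw new IllegalArgumentException("truncated input");
        }
    }

    public static void main(String[] args) {
        fuzzerTestOneInput(new byte[0]); // exercises the reject path
        System.out.println("ok");
    }
}
```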
Warning on Java 9 (probably known, as the class is called UnsafeUtil):
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by io.airlift.compress.snappy.UnsafeUtil (.../repository/io/airlift/aircompressor/0.9/aircompressor-0.9.jar) to field java.nio.Buffer.address
WARNING: Please consider reporting this to the maintainers of io.airlift.compress.snappy.UnsafeUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
ByteBuf source = ByteBufAllocator.DEFAULT.heapBuffer(1024);
byte[] bytes = new byte[1024];
source.writeBytes(bytes);
ZstdCompressor compressor = new ZstdCompressor();
int maxLength = compressor.maxCompressedLength(source.readableBytes());
ByteBuf target = ByteBufAllocator.DEFAULT.heapBuffer(10240);
compressor.compress(source.nioBuffer(), target.nioBuffer(0, maxLength));
Output:
/Library/Java/JavaVirtualMachines/jdk1.8.0_281.jdk/Contents/Home/bin/java -agentlib:jdwp=transport=dt_socket,address=127.0.0.1:51570,suspend=y,server=n -javaagent:/Users/lipenghui/Library/Caches/JetBrains/IntelliJIdea2021.1/captureAgent/debugger-agent.jar -Dfile.encoding=UTF-8 -classpath [long JDK 1.8 and Maven dependency list, including aircompressor-0.19.jar and the Netty 4.1.66.Final jars] io.streamnative.test.AirCompressorTest
Connected to the target VM, address: '127.0.0.1:51570', transport: 'socket'
Exception in thread "main" java.lang.NoSuchMethodError: java.nio.ByteBuffer.position(I)Ljava/nio/ByteBuffer;
at io.airlift.compress.zstd.ZstdCompressor.compress(ZstdCompressor.java:101)
at io.streamnative.test.AirCompressorTest.main(AirCompressorTest.java:19)
JDK: 1.8.0_281
Aircompressor: 0.19
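A common workaround for this class of error (an assumption about the root cause, but a well-known one): on JDK 9+, ByteBuffer.position(int) has a covariant ByteBuffer return type, so code compiled against JDK 9+ without --release 8 embeds a method reference that does not exist on a JDK 8 runtime. Casting to Buffer before the call pins the reference to the JDK 8-compatible signature:

```java
import java.nio.Buffer;
import java.nio.ByteBuffer;

public class BufferPositionCompat {
    // Casting to Buffer makes the compiler emit a call to
    // Buffer.position(int), which exists on both JDK 8 and JDK 9+,
    // avoiding NoSuchMethodError on older runtimes.
    static ByteBuffer positionCompat(ByteBuffer buffer, int newPosition) {
        ((Buffer) buffer).position(newPosition);
        return buffer;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putInt(42);
        positionCompat(buf, 0);
        System.out.println(buf.getInt()); // prints 42
    }
}
```

Another option is to compile the consuming project with javac --release 8, so such references are checked against the JDK 8 class library.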
As far as I can tell, this project has practically zero documentation, and I have no idea how I would go about using it, or even whether it is usable at all or a work in progress, a proof of concept, or some company's internal code. Am I missing something or looking in the wrong place? If there is documentation, a link should be placed on the front page/README.md.
Hey guys,
So I've been successfully using your library with EMR (emr-5.3.1) & Hive (2.1.1) with LZOP_X1 (no constraints) and now moving to Presto (0.157.1) I get the following stack trace:
com.facebook.presto.spi.PrestoException: java.lang.reflect.InvocationTargetException
at com.facebook.presto.hive.HiveSplitSource.propagatePrestoException(HiveSplitSource.java:137)
at com.facebook.presto.hive.HiveSplitSource.isFinished(HiveSplitSource.java:115)
at com.facebook.presto.split.ConnectorAwareSplitSource.isFinished(ConnectorAwareSplitSource.java:63)
at com.facebook.presto.split.BufferingSplitSource.fetchSplits(BufferingSplitSource.java:59)
at com.facebook.presto.split.BufferingSplitSource.lambda$fetchSplits$1(BufferingSplitSource.java:65)
at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:952)
at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)
at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:580)
at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:77)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at com.google.common.base.Throwables.propagate(Throwables.java:160)
at com.facebook.presto.hive.HiveUtil.isSplittable(HiveUtil.java:276)
at com.facebook.presto.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:246)
at com.facebook.presto.hive.BackgroundHiveSplitLoader.access$300(BackgroundHiveSplitLoader.java:78)
at com.facebook.presto.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:179)
at com.facebook.presto.hive.util.ResumableTasks.safeProcessTask(ResumableTasks.java:45)
at com.facebook.presto.hive.util.ResumableTasks.lambda$submit$1(ResumableTasks.java:33)
... 4 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.facebook.presto.hive.HiveUtil.isSplittable(HiveUtil.java:273)
... 9 more
Caused by: java.lang.NullPointerException
at com.hadoop.mapred.DeprecatedLzoTextInputFormat.isSplitable(DeprecatedLzoTextInputFormat.java:101)
... 14 more
Now, the query I'm getting this exception with works well in Hive. It's basically:
select * from table limit 10;
I've added an .lzo.index file next to my lzop file in S3, but to no avail.
As far as I can tell, DeprecatedLzoTextInputFormat.class has a member called indexes which, if not populated properly, triggers the NPE here: https://github.com/twitter/hadoop-lzo/blob/master/src/main/java/com/hadoop/mapred/DeprecatedLzoTextInputFormat.java#L101
as no null check is made on LzoIndex index.
Now, I presumed that with your library I could bypass that check, but it seems it's not working.
I'm using aircompressor-0.9.jar
. I've copied it to /usr/lib/presto/plugin/hive-hadoop2
and removed any older version that was in there.
I am confident that your code is actually called (from the stack trace, and many many tests I've done with and without aircompressor jar).
So, my question: did you guys ever manage to resolve this?
Relevant EMR cluster configuration:
{
"classification": "core-site",
"properties": {
"io.compression.codec.lzo.class": "io.airlift.compress.lzo.LzopCodec",
"io.compression.codecs": "org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,io.airlift.compress.lzo.LzoCodec,io.airlift.compress.lzo.LzopCodec"
},
"configurations": []
}
Thank you very much!
Is it made to work on Android devices?
I'm getting
java.lang.NoSuchFieldError: No static field ARRAY_BYTE_BASE_OFFSET of type I in class Lsun/misc/Unsafe; or its superclasses (declaration of 'sun.misc.Unsafe' appears in /apex/com.android.art/javalib/core-oj.jar)
We are trying to add LzoCodec to Apache Hadoop based on the implementation of aircompressor. apache/hadoop#2159
When we try to integrate it into Hadoop, we get a couple of test failures due to java.lang.UnsupportedOperationException: LZO block compressor is not supported. We found this is because LzoCodec in aircompressor has a static class HadoopLzoCompressor that returns a dummy implementation when getCompressor is called. Why don't we return LzoCompressor instead?
Hi,
We are very happy users of your library, many thanks for this awesome piece of code.
We are however having initialization issues on machines using Java 16 and above, which seems to be caused by a change in java.nio.Buffer. A stack trace is below. Do you have an idea how to circumvent this?
Many thanks for your help,
Marc
Exception in thread "ImportThread" java.lang.ExceptionInInitializerError
at io.airlift.compress.zstd.ZstdFrameCompressor.writeMagic(ZstdFrameCompressor.java:57)
at io.airlift.compress.zstd.ZstdFrameCompressor.compress(ZstdFrameCompressor.java:143)
at io.airlift.compress.zstd.ZstdCompressor.compress(ZstdCompressor.java:45)
...
Suppressed: java.lang.NoClassDefFoundError: Could not initialize class io.airlift.compress.zstd.UnsafeUtil
at io.airlift.compress.zstd.ZstdFrameCompressor.writeMagic(ZstdFrameCompressor.java:57)
at io.airlift.compress.zstd.ZstdFrameCompressor.compress(ZstdFrameCompressor.java:143)
at io.airlift.compress.zstd.ZstdCompressor.compress(ZstdCompressor.java:45)
...
Caused by: io.airlift.compress.IncompatibleJvmException: Zstandard requires access to java.nio.Buffer raw address field
at io.airlift.compress.zstd.UnsafeUtil.<clinit>(UnsafeUtil.java:53)
...
Hi,
Thanks for the tool!
I'm wondering if there are any pitfalls or other considerations in trying to implement the same thing, but using standard ByteBuffers instead of the UNSAFE.* family of functions? It seems like a straightforward change, but maybe I'm missing something.
Pros would be:
- the ability to compile with --release for modern versions (sun.misc is unavailable).

The code of course supports ByteBuffers, but only with zero arrayOffset; internally it just uses them as a byte[].
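For illustration, the standard-API read the question has in mind might look like this sketch (not the library's actual code); a ByteBuffer-based path handles heap and direct buffers uniformly and honors arrayOffset, unlike a raw byte[] fast path:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class StandardBufferRead {
    // Little-endian int read using only standard APIs; works with
    // heap and direct buffers and respects the buffer's arrayOffset.
    static int getIntLE(ByteBuffer buffer, int index) {
        return buffer.order(ByteOrder.LITTLE_ENDIAN).getInt(index);
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.wrap(new byte[] {4, 3, 2, 1});
        System.out.println(Integer.toHexString(getIntLE(buffer, 0)));
    }
}
```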
I was recently trying to use this library to decode a compressed HDF5 file. The compression side uses the native library version 1.4.5; when I tried to decode, it reported "Input is corrupted: offset=2305". Then I switched to zstd-jni, and it worked. I wonder if later versions or some strategies (or levels) are not supported?
It seems that there is no tag for 0.19 version. Could you make a tag for it?
Similar to the zstd-jni and Apache Commons libraries, which offer OutputStream and InputStream classes for zstd compression. This is similar to the enhancement request brought up in 2020 in issue 112. Would this project welcome this contribution now, or have the authors already implemented this feature but not released it? If it's the former, I am happy to contribute this functionality to the project.
The current API requires that the output of compress be put into a scratch buffer and copied over afterwards. It would be great if either we could provide a second buffer to use if needed or an allocator to create a new buffer. So something like:
void compress(ByteBuffer input, ByteBuffer output, ByteBuffer overflow);
where the compressed bytes get put into output until it is full and then put into overflow. Instead of requiring that output is large enough, it would require that output + overflow is large enough.
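The proposed spill semantics can be sketched as follows (illustrative names; here the compressed array stands in for bytes a compressor produced into a scratch buffer, which is exactly the copy the proposal would eliminate):

```java
import java.nio.ByteBuffer;

public class OverflowSpill {
    // Fill `output` to capacity first, then continue into `overflow`;
    // the contract would require output.remaining() + overflow.remaining()
    // to be at least the compressed size.
    static void spill(byte[] compressed, ByteBuffer output, ByteBuffer overflow) {
        int first = Math.min(compressed.length, output.remaining());
        output.put(compressed, 0, first);
        overflow.put(compressed, first, compressed.length - first);
    }

    public static void main(String[] args) {
        byte[] compressed = new byte[10];
        ByteBuffer output = ByteBuffer.allocate(6);
        ByteBuffer overflow = ByteBuffer.allocate(16);
        spill(compressed, output, overflow);
        // 6 bytes land in output, the remaining 4 in overflow
        System.out.println(output.position() + " " + overflow.position());
    }
}
```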
AFAICT, there is no Hadoop codec for Zstd in this library, despite what the README says, is this correct?
Hi Team,
I would like to generate the class files for any of the compression techniques implemented here in pure Java.
Can you help me with the steps to do that?
As I understand it, sun.misc.Unsafe is being phased out in newer JDKs. I wonder if there are any plans to remove the use of this class in aircompressor, allowing it to be used with the latest JDKs?
Currently the ZstdFrameDecompressor has a lot of the functionality required to build a streaming decoder (I'm thinking of integrating it into a Netty channel). However, it could use a little refactoring to give visibility into the block header data and, critically, https://github.com/airlift/aircompressor/blob/master/src/main/java/io/airlift/compress/zstd/ZstdFrameDecompressor.java is not a public class!
If this library is meant to be actively maintained, I would be happy to contribute, but won't if it's unlikely to get merged or be supported.
Comment from @nezihyigitbasi
Stepping through the hadoop-lzo implementation (which implements LZO1X) I noticed something different from aircompressor's lzo (I don't really know whether aircompressor implements the same algorithm). hadoop-lzo first reads two integers from the input stream (4 bytes for the original block size + 4 bytes for the compressed chunk length); the rest of the stream is then interpreted as compressed data, and it succeeds. I did the same and consumed 8 bytes before passing the data to aircompressor's lzo decompressor, and most of the test cases then passed (there were still failures).
Anyway here is the minimal code that shows how airlift lzo decompressor fails while hadoop's lzo decomp. succeeds with the same input.
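The header handling described above can be sketched like this (an assumption based on the comment, not verified against Hadoop's source): two big-endian ints precede the compressed data.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class LzoBlockHeader {
    // Reads the two 4-byte big-endian ints the comment describes:
    // original block size, then compressed chunk length. The remaining
    // bytes would be handed to the block decompressor.
    static int[] readHeader(byte[] stream) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(stream));
        return new int[] {in.readInt(), in.readInt()};
    }

    public static void main(String[] args) throws IOException {
        // synthetic header: block size 256, compressed length 100
        byte[] stream = {0, 0, 1, 0, 0, 0, 0, 100, /* compressed data... */ 42};
        int[] header = readHeader(stream);
        System.out.println(header[0] + " " + header[1]);
    }
}
```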
I got "Overflow detected" when creating an ORC file through presto-orc from large input data, so I could not easily reproduce it. That might be because I'm using a slightly old version of Presto, 317.
java.lang.IllegalStateException: Overflow detected
at io.airlift.compress.zstd.Util.checkState(Util.java:59)
at io.airlift.compress.zstd.BitOutputStream.close(BitOutputStream.java:85)
at io.airlift.compress.zstd.HuffmanCompressor.compressSingleStream(HuffmanCompressor.java:130)
at io.airlift.compress.zstd.HuffmanCompressor.compress4streams(HuffmanCompressor.java:75)
at io.airlift.compress.zstd.ZstdFrameCompressor.encodeLiterals(ZstdFrameCompressor.java:333)
at io.airlift.compress.zstd.ZstdFrameCompressor.compressBlock(ZstdFrameCompressor.java:224)
at io.airlift.compress.zstd.ZstdFrameCompressor.compressFrame(ZstdFrameCompressor.java:172)
at io.airlift.compress.zstd.ZstdFrameCompressor.compress(ZstdFrameCompressor.java:145)
at io.airlift.compress.zstd.ZstdCompressor.compress(ZstdCompressor.java:45)
at io.prestosql.orc.OrcOutputBuffer.writeChunkToOutputStream(OrcOutputBuffer.java:445)
at io.prestosql.orc.OrcOutputBuffer.flushBufferToOutputStream(OrcOutputBuffer.java:425)
at io.prestosql.orc.OrcOutputBuffer.close(OrcOutputBuffer.java:146)
at io.prestosql.orc.stream.LongOutputStreamV2.close(LongOutputStreamV2.java:739)
at io.prestosql.orc.writer.SliceDirectColumnWriter.close(SliceDirectColumnWriter.java:139)
at io.prestosql.orc.writer.SliceDictionaryColumnWriter.close(SliceDictionaryColumnWriter.java:324)
at com.google.common.collect.ImmutableList.forEach(ImmutableList.java:407)
at io.prestosql.orc.OrcWriter.bufferStripeData(OrcWriter.java:369)
at io.prestosql.orc.OrcWriter.flushStripe(OrcWriter.java:331)
at io.prestosql.orc.OrcWriter.close(OrcWriter.java:444)
But I have a question on these two lines, as I'm not familiar with zstd. If currentAddress < outputLimit is expected at line 85, currentAddress = outputLimit looks confusing at line 73. Just raising an exception there might be less confusing.
The current codebase does UNSAFE.getXXX and UNSAFE.putXXX for short, int and long datatypes.
The subsequent code only works on LITTLE_ENDIAN machines.
You need to conditionally call reverseBytes() in these methods for this to work on BIG_ENDIAN machines,
i.e. translate to LITTLE_ENDIAN on getInt etc., and translate from LITTLE_ENDIAN on putInt etc.
I posted the solution to an identical issue in the snappy-java project, which you should ideally duplicate.
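The conditional swap described above might look like this sketch (a ByteBuffer in native order stands in for UNSAFE.getInt, which also reads in the host's native byte order):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianAwareRead {
    static final boolean BIG_ENDIAN =
            ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN;

    // Native-order read (as UNSAFE.getInt would do), then a swap on
    // big-endian hosts so the result always matches the little-endian
    // on-disk format.
    static int getIntLittleEndian(byte[] data, int offset) {
        int raw = ByteBuffer.wrap(data, offset, 4)
                .order(ByteOrder.nativeOrder())
                .getInt();
        return BIG_ENDIAN ? Integer.reverseBytes(raw) : raw;
    }

    public static void main(String[] args) {
        byte[] data = {4, 3, 2, 1}; // little-endian encoding of 0x01020304
        // prints 1020304 regardless of the host's endianness
        System.out.println(Integer.toHexString(getIntLittleEndian(data, 0)));
    }
}
```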
Hello,
could you please add a SECURITY.md
file to your repository, and ideally also enable private vulnerability reporting?
Hi,
I am trying to port ZstdDecompressor to C# and I'm having a hard time understanding how ZstdDecompressor.getDecompressedSize should work.
If I modify testDecompressWithOutputPaddingAndChecksum() to the following
public void testDecompressWithOutputPaddingAndChecksum()
throws IOException
{
int padding = 1021;
byte[] compressed = Resources.toByteArray(getClass().getClassLoader().getResource("data/zstd/with-checksum.zst"));
byte[] uncompressed = Resources.toByteArray(getClass().getClassLoader().getResource("data/zstd/with-checksum"));
byte[] output = new byte[uncompressed.length + padding * 2]; // pre + post padding
int decompressedSize = getDecompressor().decompress(compressed, 0, compressed.length, output, padding, output.length);
long decompressedSize2 = ZstdDecompressor.getDecompressedSize(compressed, 0, compressed.length);
assertEquals(decompressedSize2, 11359, "Should be equal");
assertByteArraysEqual(uncompressed, 0, uncompressed.length, output, padding, decompressedSize);
}
then decompressedSize2 has the value -1 and the test fails. Am I missing something here?
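One likely explanation (hedged; based on the zstd frame format rather than this library's internals): a frame only records its decompressed size when the optional Frame_Content_Size field is present, and getDecompressedSize presumably returns -1 when it is absent. The relevant bits of the Frame_Header_Descriptor byte:

```java
public class ZstdFrameHeaderBits {
    // Per RFC 8878: bits 7-6 of the Frame_Header_Descriptor are the
    // Frame_Content_Size_flag; bit 5 is the Single_Segment_flag. The
    // content size is recorded unless both are zero.
    static boolean recordsContentSize(int frameHeaderDescriptor) {
        int contentSizeFlag = (frameHeaderDescriptor >> 6) & 0b11;
        boolean singleSegment = ((frameHeaderDescriptor >> 5) & 1) != 0;
        return contentSizeFlag != 0 || singleSegment;
    }

    public static void main(String[] args) {
        System.out.println(recordsContentSize(0b0000_0000)); // prints false
        System.out.println(recordsContentSize(0b0010_0000)); // prints true
    }
}
```

If the file was produced by a streaming writer that did not know the final size up front, that flag combination is common, and the size simply is not in the header.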
I see a 0.3 release in Maven central, but there isn't a tag of it on github.
Can you make the release tag?
Hi. Is there support for Zstd streaming compression/decompression in this library (like in native zstd's streaming_compression/streaming_decompression examples)?
If not, can you mark this as an enhancement request please?
Hi there,
I was wondering if there are any plans to support the additional functionality that the original zstd library has in /contrib (https://github.com/facebook/zstd/tree/dev/contrib/seekable_format).
It allows for random read access to files without the need to decompress the entire file. I am not familiar with C nor with the zstd codebase, but if someone would be willing to give me a bit of guidance I am happy to try and implement it myself.
Thanks
The Hadoop version of Lz4Codec can not decode silesia/x-ray, compressed by the Airlift Lz4Codec. This appears to be a problem with the native Lz4 code in Hadoop, but it could be a problem with the Airlift Lz4 block compressor, or the Airlift implementation of the Hadoop block stream encoding.
To reproduce, remove the check in TestLz4Codec.testCompress in #34.
my code:
private final static Compressor compressor = new SnappyCompressor();
private final static Decompressor decompressor = new SnappyDecompressor();

public byte[] compress(String s) {
    ByteBuffer byteBuffer = ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    int length = compressor.maxCompressedLength(s.getBytes(StandardCharsets.UTF_8).length);
    ByteBuffer out = ByteBuffer.allocate(length);
    compressor.compress(byteBuffer, out);
    return out.array();
}

public String decompress(byte[] s) {
    ByteBuffer in = ByteBuffer.wrap(s);
    int length = SnappyDecompressor.getUncompressedLength(s, 0);
    ByteBuffer out = ByteBuffer.allocate(length);
    decompressor.decompress(in, out);
    return new String(out.array());
}

public static void main(String[] args) {
    String str = "123";
    Logger log = LoggerFactory.getLogger("compress");
    log.info("char length={},byte array length={}", str.length(), str.getBytes(StandardCharsets.UTF_8).length);
    MessageCompresser compresser = new LzwMessageCompresser();
    byte[] compressedBytes = compresser.compress(str);
    String value = new String(compressedBytes);
    log.info("compressed:char length={},byte array length={}", value.length(), value.getBytes(StandardCharsets.UTF_8).length);
    log.info("value={}", value);
    String strs = compresser.decompress(compressedBytes);
    log.info("restore:{}", strs);
}
Why is MalformedInputException thrown when decompressing the compressed bytes?
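A likely culprit (my assumption from reading the snippet, not a confirmed diagnosis): compress(String) returns out.array(), which is the entire backing array sized for maxCompressedLength, so the decompressor sees trailing zero bytes after the end of the Snappy stream. Trimming the result to the bytes actually written avoids that; a minimal sketch of the difference:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class BufferTrim {
    // Simulates the compress() pattern above: allocate for the worst case,
    // write fewer bytes, then extract the result two different ways.
    static int[] lengths() {
        ByteBuffer out = ByteBuffer.allocate(100);   // maxCompressedLength-sized
        out.put(new byte[] {1, 2, 3});               // pretend compressed output
        byte[] untrimmed = out.array();              // full backing array
        byte[] trimmed = Arrays.copyOf(out.array(), out.position()); // only written bytes
        return new int[] {untrimmed.length, trimmed.length};
    }

    public static void main(String[] args) {
        int[] lens = lengths();
        System.out.println(lens[0]); // 100 - includes trailing garbage
        System.out.println(lens[1]); // 3   - exactly what was written
    }
}
```

Applied to the snippet, returning Arrays.copyOf(out.array(), out.position()) from compress() (and slicing similarly in decompress()) should hand the decompressor a clean stream.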
This issue is copied from trinodb/trino#17792 since I believe this repo is where zstd de/compression is handled.
I have a Hive table built on top of zst compressed data. On a Trino 419 cluster I get the following error when trying to read this table from Trino. That version of Trino uses aircompressor 0.23.
Query 20230607_172621_00003_5hpz7 failed: Error opening Hive split s3://path/to/file.csv.access.log.zst (offset=0, length=1544108): Window size too large (not yet supported): offset=3084
We are currently running Trino 405 and this query executes without an issue. We also have been running previous versions of Trino/Presto and this executed without an issue in the past. That version of Trino uses 0.21
Did something change between aircompressor 0.21 and 0.24 that might have caused this? And is there anything I can do to get past this error? Thanks in advance for your help!
Full stack trace
io.trino.spi.TrinoException: Error opening Hive split s3://path/to/file.csv.access.log.zst (offset=0, length=1544108): Window size too large (not yet supported): offset=3084
at io.trino.plugin.hive.line.LinePageSourceFactory.createPageSource(LinePageSourceFactory.java:179)
at io.trino.plugin.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:218)
at io.trino.plugin.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:156)
at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:48)
at io.trino.split.PageSourceManager.createPageSource(PageSourceManager.java:61)
at io.trino.operator.TableScanOperator.getOutput(TableScanOperator.java:298)
at io.trino.operator.Driver.processInternal(Driver.java:402)
at io.trino.operator.Driver.lambda$process$8(Driver.java:305)
at io.trino.operator.Driver.tryWithLock(Driver.java:701)
at io.trino.operator.Driver.process(Driver.java:297)
at io.trino.operator.Driver.processForDuration(Driver.java:268)
at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:888)
at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:187)
at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:556)
at io.trino.$gen.Trino_18f7842____20230607_162211_2.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: io.airlift.compress.MalformedInputException: Window size too large (not yet supported): offset=3084
at io.airlift.compress.zstd.Util.verify(Util.java:45)
at io.airlift.compress.zstd.ZstdFrameDecompressor.decodeCompressedBlock(ZstdFrameDecompressor.java:303)
at io.airlift.compress.zstd.ZstdIncrementalFrameDecompressor.partialDecompress(ZstdIncrementalFrameDecompressor.java:236)
at io.airlift.compress.zstd.ZstdInputStream.read(ZstdInputStream.java:89)
at io.airlift.compress.zstd.ZstdHadoopInputStream.read(ZstdHadoopInputStream.java:53)
at com.google.common.io.CountingInputStream.read(CountingInputStream.java:64)
at java.base/java.io.InputStream.readNBytes(InputStream.java:506)
at io.trino.hive.formats.line.text.TextLineReader.fillBuffer(TextLineReader.java:248)
at io.trino.hive.formats.line.text.TextLineReader.<init>(TextLineReader.java:67)
at io.trino.hive.formats.line.text.TextLineReaderFactory.createLineReader(TextLineReaderFactory.java:77)
at io.trino.plugin.hive.line.LinePageSourceFactory.createPageSource(LinePageSourceFactory.java:171)
... 17 more
When I write bytes through ZstdOutputStream, it pushes none of the compressed bytes (or any bytes at all). Analogous code with Snappy works fine. Compressing bytes via new ZstdCompressor() and then calling its compress() method also works fine.
Code example:
byte[] someTextBytes = new byte[]{90,115,116,100,79,117,116,112,117,116,83,116,114,101,97,109,32,
105,115,32,110,111,116,32,119,111,114,107,105,110,103,44,32,98,117,116,32,83,110,97,112,
112,121,70,114,97,109,101,100,79,117,116,112,117,116,83,116,114,101,97,109,32,105,115,
32,119,111,114,107,105,110,103,46,90,115,116,100,79,117,116,112,117,116,83,116,114,101,97,109,32,
105,115,32,110,111,116,32,119,111,114,107,105,110,103,44,32,98,117,116,32,83,110,97,112,
112,121,70,114,97,109,101,100,79,117,116,112,117,116,83,116,114,101,97,109,32,105,115,
32,119,111,114,107,105,110,103,46};
System.out.println("\n\nUnCompressed bytes to SnappyCompressOut -> System.out:");
System.out.println("----------------------------------------------------------");
SnappyFramedOutputStream snappyOS = new SnappyFramedOutputStream(System.out);
snappyOS.write(someTextBytes);
snappyOS.flush();
System.out.println("\n----------------------------------------------------------");
System.out.println("UnCompressed bytes to ZstdCompressOut -> System.out:");
System.out.println("----------------------------------------------------------");
ZstdOutputStream zstdOS = new ZstdOutputStream(System.out);
zstdOS.write(someTextBytes);
zstdOS.flush();
System.out.println("\n----------------------------------------------------------");
Result:
UnCompressed bytes to SnappyCompressOut -> System.out:
----------------------------------------------------------
)�.�I��I
----------------------------------------------------------
UnCompressed bytes to ZstdCompressOut -> System.out:
----------------------------------------------------------
----------------------------------------------------------
I tried a file output stream, and also tried wrapping various output streams in BufferedOutputStream, just to see where the issue lies. It looks like the issue lies in the private ZstdOutputStream.compressIfNecessary(), where it decides not to call writeChunk().
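For what it's worth, java.util.zip shows the same behavior pattern: a block-buffering compressed stream may emit little or nothing on flush() and only write the (final) block when the stream is finished on close(). The sketch below uses GZIPOutputStream purely as a stand-in to illustrate the principle; whether ZstdOutputStream should additionally flush partial blocks on flush() is a separate question.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class FlushVsClose {
    // Returns {bytes in sink after flush(), bytes in sink after close()}.
    static int[] sizes() throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        GZIPOutputStream gzip = new GZIPOutputStream(sink);
        gzip.write("hello hello hello".getBytes());
        gzip.flush();                 // deflater still buffering: little or nothing emitted
        int afterFlush = sink.size();
        gzip.close();                 // finishes the stream: data block + trailer emitted
        int afterClose = sink.size();
        return new int[] {afterFlush, afterClose};
    }

    public static void main(String[] args) throws IOException {
        int[] s = sizes();
        System.out.println(s[0] + " -> " + s[1]); // close() writes what flush() did not
    }
}
```

In the example above, closing zstdOS (rather than only flushing) should make the compressed bytes appear.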
ZstdCompressor.maxCompressedLength isn't an accurate translation of the ZSTD_COMPRESSBOUND macro, due to some missing and misplaced parentheses. I believe the following diff would be more accurate. Not sending a PR because I haven't looked into the tests.
diff --git a/src/main/java/io/airlift/compress/zstd/ZstdCompressor.java b/src/main/java/io/airlift/compress/zstd/ZstdCompressor.java
index 23ace52..b714d63 100644
--- a/src/main/java/io/airlift/compress/zstd/ZstdCompressor.java
+++ b/src/main/java/io/airlift/compress/zstd/ZstdCompressor.java
@@ -30,7 +30,7 @@ public class ZstdCompressor
         int result = uncompressedSize + (uncompressedSize >>> 8);

         if (uncompressedSize < MAX_BLOCK_SIZE) {
-            result += MAX_BLOCK_SIZE - (uncompressedSize >>> 11);
+            result += ((MAX_BLOCK_SIZE - uncompressedSize) >>> 11);
         }

         return result;
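A quick way to see the difference (my own sketch, with MAX_BLOCK_SIZE assumed to be 128 KiB as in zstd): side by side, the formula as written over-allocates by roughly MAX_BLOCK_SIZE for small inputs, while the corrected one produces the small, size-dependent margin that ZSTD_COMPRESSBOUND specifies.

```java
public class CompressBoundCheck {
    static final int MAX_BLOCK_SIZE = 128 * 1024; // 128 KiB, as in zstd

    // Formula as currently written (the reported bug)
    static int current(int uncompressedSize) {
        int result = uncompressedSize + (uncompressedSize >>> 8);
        if (uncompressedSize < MAX_BLOCK_SIZE) {
            result += MAX_BLOCK_SIZE - (uncompressedSize >>> 11);
        }
        return result;
    }

    // Formula after the proposed diff, matching ZSTD_COMPRESSBOUND
    static int proposed(int uncompressedSize) {
        int result = uncompressedSize + (uncompressedSize >>> 8);
        if (uncompressedSize < MAX_BLOCK_SIZE) {
            result += (MAX_BLOCK_SIZE - uncompressedSize) >>> 11;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(current(1024));  // 132100 - inflated by ~MAX_BLOCK_SIZE
        System.out.println(proposed(1024)); // 1091   - small, size-dependent margin
    }
}
```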
This happened with aircompressor-0.9.
Is aircompressor not supported on 64-bit SPARC platforms?
Here is the native stack:
----------------- lwp# 2 / thread# 2 --------------------
ffffffff7e7dccbc _lwp_kill (6, 0, ffffffff7e949968, ffffffffffffffff, ffffffff7e93e000, 0) + 8
ffffffff7e74c250 abort (1, 1d8, ffffffff7db9b1cc, 1f1ebc, 0, 0) + 118
ffffffff7db84a48 _1cCosFabort6Fb_v (1, 1, 4b318, ffffffff7e0dda20, 55902c, 4b000) + 58
ffffffff7ddfb2d4 _1cHVMErrorOreport_and_die6M_v (1, ffffffff7e1eeb35, 100110800, ffffffff7db98100, ffffffff7e2d5a80, ffffffff7e2751c0) + 10ac
ffffffff7db9574c JVM_handle_solaris_signal (a, ffffffff7affa2a0, ffffffff7dd7cff0, ffffffff7aff9b30, ffffffffffaaf928, ffffffff7aff9fc0) + c0c
ffffffff7db8d364 signalHandler (a, ffffffff7affa2a0, ffffffff7aff9fc0, ffffffff7e94ec38, 1001042d0, 2) + 1c
ffffffff7e7d8d6c __sighndlr (a, ffffffff7affa2a0, ffffffff7aff9fc0, ffffffff7db8d348, 0, 9) + c
ffffffff7e7cc8d4 call_user_handler (ffffffff7e600a00, ffffffff7e600a00, ffffffff7aff9fc0, c, 0, 0) + 3e0
ffffffff7e7ccae0 sigacthandler (0, ffffffff7affa2a0, ffffffff7aff9fc0, ffffffff7e600a00, 0, ffffffff7e93e000) + 68
--- called from signal handler with signal 10 (SIGBUS) ---
ffffffff7dd7cff0 Unsafe_GetInt (ffffffff7affc5e8, ffffffff7e26f240, ffffffff7affa510, 11, 100110800, 191800) + 174
ffffffff6bb450a0 * *sun/misc/Unsafe.getInt(Ljava/lang/Object;J)I [compiled]
ffffffff6b007b18 * io/airlift/compress/snappy/SnappyRawCompressor.compress(Ljava/lang/Object;JJLjava/lang/Object;JJ[S)I+462
ffffffff6b007b18 * io/airlift/compress/snappy/SnappyCompressor.compress([BII[BII)I+57
ffffffff6b007b18 * io/airlift/compress/snappy/SnappyFramedOutputStream.writeCompressed([BII)V+59
ffffffff6b008068 * io/airlift/compress/snappy/SnappyFramedOutputStream.flushBuffer()V+34
ffffffff6b008068 * io/airlift/compress/snappy/SnappyFramedOutputStream.flush()V+32
The current ZstdCompressor hardcodes the compressionLevel as CompressionParameters.DEFAULT_COMPRESSION_LEVEL. It needs to be configurable.
Currently only the default level is implemented, and before compression level can be set by end users the other strategies must be implemented.
Is there any documentation on using the LzopCodec in Apache Spark?
That would be very helpful.
Currently the Zstd compressionLevel is hard-coded. Can you add a method to set the compressionLevel? Thanks.
I tried running:
java -cp aircompressor-0.12-SNAPSHOT-tests.jar io.airlift.compress.benchmark.CompressionBenchmark
Got the following error:
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/openjdk/jmh/runner/options/Options
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:522)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:504)
Caused by: java.lang.ClassNotFoundException: org.openjdk.jmh.runner.options.Options
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
Why is my data, compressed in Java, different from the data compressed on the mobile side by an LZO library implemented in ANSI C and called via JNI?
OpenSearch is evaluating the pure Java implementation of Zstd using aircompressor. I stumbled on a thoughtful comment here which certainly aligns with the reasons we avoid JNI code as "top level" modules or plugins in OpenSearch core (we have similar reasons in Lucene). We realize that comment is now five years old, so @reta ran benchmarks that seem to show significant performance differences between the pure Java implementation (HotSpot has gotten better, of course) and JNI. Do these numbers look valid to folks on this project? What potential pitfalls in running those benchmarks do we need to be aware of? Are there certain configuration conditions that should be followed to squeeze out better performance?
Thanks in advance for any assistance that can be provided.
Hello, first of all, thanks a lot for your library. I have already tested snappy, zstd, and lz4, and indeed it's much faster. I read that LZ4 compression with double buffering is much more efficient; do you know if it's supported? I haven't tried to implement it yet; it's just a genuine question before implementing it.
I'm trying out this library for snappy compression and decompression. Currently I'm using the Xerial Snappy library. This may be a very dumb question, but I can't seem to figure this out:
How do we know the required size of the output buffer?
public byte[] decompress(byte[] compressed) {
    var decompressor = new SnappyDecompressor();
    var outputBuffer = ByteBuffer.allocate(?????);
    decompressor.decompress(ByteBuffer.wrap(compressed), outputBuffer);
    return outputBuffer.array();
}
ByteBuffers by design cannot resize dynamically. And I don't know anything about the input array, except that it has been previously compressed by Snappy. So... how should I size the output buffer?
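For the raw Snappy block format, the answer is that the stream itself carries the uncompressed length: it begins with a little-endian base-128 varint, which is what SnappyDecompressor.getUncompressedLength(byte[], int) reads, so you can size the buffer before decompressing. A minimal sketch of that varint decoding (my own illustration of the format, not the library's code):

```java
public class SnappyLength {
    // Reads the uncompressed length stored at the start of a raw Snappy
    // block: a little-endian base-128 varint (7 data bits per byte,
    // high bit set on all but the last byte).
    static int readUncompressedLength(byte[] compressed, int offset) {
        int result = 0;
        int shift = 0;
        while (true) {
            int b = compressed[offset++] & 0xFF;
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    public static void main(String[] args) {
        // 0x80 0x02 encodes 256: (0 << 0) | (2 << 7)
        System.out.println(readUncompressedLength(new byte[] {(byte) 0x80, 0x02}, 0)); // 256
        System.out.println(readUncompressedLength(new byte[] {0x0B}, 0)); // 11
    }
}
```

In practice you would call SnappyDecompressor.getUncompressedLength(compressed, 0) and allocate exactly that many bytes, as an earlier snippet in this thread does.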