Comments (18)
This is very nice! 🥇 I will run this and fix the issues. Thanks!
from unishox2.
Fair enough. Running on my laptop atm, I'm at 9.6 million execs and zero crashes so far, so it looks like you did a very good job :)
Forgot to mention it when I stopped, but I ran the test till about 116 million runs IIRC, no further issues. It did slow down a lot from the starting 1k/second unfortunately, as afl started generating large test cases (like 80kB).
Hi, I agree the decompress code should check for NULL, but is the input valid? Was it obtained from unishox2_compress?
The input is invalid. In my case, it may come from untrusted parties.
ok, I will fix the code to check NULL. Even if it might fix this and not crash, I am not sure if there will be other cases where it might crash for invalid input. I will check those too, but validating the input using a checksum mechanism may be the safest option.
The input is unknown so a malicious attacker can generate any checksum they like along with a malicious input.
:-o
ok, understood!
Also crashes with input 310a (two bytes with those hex values) in a different place.
Do you want to make this code safe against arbitrary inputs, or do you think it'd take too much work ?
If the former, you can build unishox2 with -fsanitize=address, and build this test program:
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>  /* malloc; <malloc.h> is non-standard and absent on macOS */
#include "common/unishox2.h"

/* libFuzzer-style entry point: feed arbitrary bytes to the decompressor. */
int LLVMFuzzerTestOneInput(const uint8_t *buf, size_t len)
{
    static char out[16 * 65536];
    unishox2_decompress((const char*)buf, len, out, sizeof(out), USX_HCODES_DFLT, USX_HCODE_LENS_DFLT, USX_FREQ_SEQ_TXT, USX_TEMPLATES);
    return 0;
}

/* Standalone driver for afl-fuzz: read the whole file named on the command
   line and pass its contents to the entry point above. */
int main(int argc, const char **argv)
{
    FILE *f;
    long pos;
    size_t len;
    char *buf;

    if (argc < 2)
    {
        printf("usage: %s <filename>\n", argv[0]);
        return 1;
    }
    f = fopen(argv[1], "rb");  /* binary mode: test cases are arbitrary bytes */
    if (!f) return 1;
    if (fseek(f, 0, SEEK_END) < 0) return 1;
    pos = ftell(f);
    if (pos < 0) return 1;
    len = pos;
    if (fseek(f, 0, SEEK_SET) < 0) return 1;
    buf = malloc(len);
    if (!buf) return 1;
    if (fread(buf, 1, len, f) != len) return 1;
    return LLVMFuzzerTestOneInput((const uint8_t*)buf, len);
}
Then run:
mkdir -p fuzz-data && echo ' ' > fuzz-data/SEED && afl-fuzz -m none -i fuzz-data -o fuzz-out ./a.out @@
This will leave all crashing inputs under the fuzz-out directory (the one given with -o, in its crashes subdirectory) for you to look at.
You need to install ASAN (for -fsanitize=address) and American fuzzy lop (for afl-fuzz).
Hi, I have fixed the issues that showed up in AFL, but it is still running and does not look like it will finish anytime soon. I have also gone through the code and think there aren't any more issues. I have made a release, 1.0.3, with the fixes.
That was fast, thanks!
AFL will run for a loooong time. It may find more crashes as it finds data that exercises different code paths. It's an exploration with a very short view range.
I notice it only executes 6 times a second. That is very slow; the test program executes more than 100 times a second for me, on a not too recent laptop. A higher execution rate makes finding new crashes much faster.
Hm, I am running it on a 2017 MacBook Air. I tried compiling it with -O3 and running it from a RAM disk, but there was not much improvement. Looks like I will have to find a better machine to run it on. If there are more crashes I will update here again. Thanks!
Possibly runtime linking ? Loading/linking at every process start is quite heavy.
Running ldd on the test binary should show as few dynamically loaded objects as possible.
My test program has 7 libs.
Running that program here, I get about 1k runs a second.
Unishox does not depend on other libraries. I ran the equivalent of ldd and got this:
% otool -L a.out
a.out:
@rpath/libclang_rt.asan_osx_dynamic.dylib (compatibility version 0.0.0, current version 0.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.0.0)
and I could not find a way of statically linking them. It also seems it would not make much difference even if I did. So it looks like the only choice is to find a faster machine :)
@moneromooo-monero That's great news! Thanks for letting me know!
This exercise was a good one also because I am working on other such projects, especially Unishox3, which is expected to surpass GZip for compression ratio and hopefully reach closer to its speed (I am no Jean-loup Gailly :)), but it would have some demands on RAM/Flash proportionate to desired compression ratio.
It would be great if you had some benchmarks on Unishox's performance in time and memory used, beyond the obvious storage savings. Just comparing with LZ4 and Snappy will do, I guess, because for the rest we can reference this:
https://www.scylladb.com/2019/10/07/compression-in-scylla-part-two/
At least have a guideline on where to use the rest, e.g. > 1 kB or something.
But please do make Unishox 3 available first, because it seems exciting to have it production ready.