Comments (9)
I'm not sure if this answers your question, but yes, email addresses can be compressed using this library.
from unishox2.
No, I mean make one just for compressing strings that contain only alphanumerics, "-", "." and "@". That would definitely be more efficient, right? How do I restrict which characters get compressed?
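For a rough baseline on what a restricted character set could buy, here is a minimal sketch of fixed-width packing, which is not how Unishox2 encodes text. It assumes a lowercase-only alphabet of 39 characters (a-z, 0-9, "-", "." and "@"); the function name and the sample address are illustrative only.

```python
import math
import string

# Assumed alphabet: lowercase letters, digits and the three symbols above.
EMAIL_ALPHABET = string.ascii_lowercase + string.digits + "-.@"

def fixed_width_bytes(text, alphabet=EMAIL_ALPHABET):
    """Bytes needed if every character gets the same number of bits."""
    if any(ch not in alphabet for ch in text):
        raise ValueError("text contains a character outside the alphabet")
    bits_per_char = math.ceil(math.log2(len(alphabet)))  # 6 bits for 39 symbols
    return math.ceil(len(text) * bits_per_char / 8)

if __name__ == "__main__":
    sample = "user.name@example.com"  # made-up sample address
    print(len(sample), "bytes raw,", fixed_width_bytes(sample), "bytes packed")
```

Any real compressor should be compared against this kind of baseline on actual addresses rather than on a single sample.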
from unishox2.
You can use preset 3, as in ./test_unishox2 "[email protected]" 3
but it makes no difference, since @ and . occur only once.
from unishox2.
I think the compression could be better with only 3 symbols rather than all of them, right?
Is there a way to set the symbols used for compression? That would be even better.
E.g. ./test_unishox2 "[email protected]" 100 "@,_,.,-"
from unishox2.
If you look at the model at https://github.com/siara-cc/Unishox2/blob/master/promo/model.png?raw=true
most of the symbols you mentioned are already at favorable positions, except perhaps @ and -.
Even if we did as you suggest, we would gain 1 byte at most, since these symbols usually appear only once.
You could try it out by changing the symbol table in the code, moving those symbols to the beginning of Set 2, and using 2 as the preset.
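The "1 byte at most" estimate can be sanity-checked by counting how often the favoured symbols occur in an address. This is a minimal sketch and not part of Unishox2: the figure of 3 bits saved per occurrence is a hypothetical assumption (real code lengths depend on the model and preset), and the function names and sample address are made up for illustration.

```python
from collections import Counter

FAVOURED = "@_.-"  # symbols proposed in this thread

def symbol_counts(text, symbols=FAVOURED):
    """Count how many times each favoured symbol occurs in text."""
    counts = Counter(ch for ch in text if ch in symbols)
    return {s: counts.get(s, 0) for s in symbols}

def max_bytes_saved(text, bits_saved_per_symbol=3, symbols=FAVOURED):
    """Whole bytes saved if each occurrence became cheaper by a fixed amount.

    bits_saved_per_symbol is a hypothetical figure, not a Unishox2 constant.
    """
    occurrences = sum(symbol_counts(text, symbols).values())
    return (occurrences * bits_saved_per_symbol) // 8

if __name__ == "__main__":
    sample = "user.name@example.com"  # made-up sample address
    print(symbol_counts(sample), max_bytes_saved(sample))
```

With only a handful of occurrences per address, the total saving stays small under any plausible per-symbol figure.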
from unishox2.
A 1-byte improvement is very, very good for email addresses.
I think the next version should have an easier way for the user to define the symbols, as I mentioned.
I'm worried that changing hard-coded values will introduce breaking changes.
Will you add user-defined symbols?
from unishox2.
I just realised that a user-defined symbol table can and will compress even better!
You should do that as Unishox3 or something.
from unishox2.
Sure, I will provide a way for developers or users to customise the symbol table in the next version. Thanks!
from unishox2.
I created a separate branch, favor-email, and changed the symbol table to favour "@,_,.,-". I also created a preset for email with the frequency list "@gmail.com", "@yahoo.com", "@hotmail.com", ".com", ".net" and ".org", and the savings turned out to be more than just 1 byte. If you would like to try it:
git clone -b favor-email https://github.com/siara-cc/Unishox2 Unishox2_email
cd Unishox2_email
make
./test_unishox2 "[email protected]" 17
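To illustrate why a frequency list helps, here is a minimal sketch that looks for the longest list entry at the end of an address, so that the whole entry could be replaced by one short code. It is not the code in the favor-email branch: matching only at the end of the string is a simplifying assumption, and the function name and sample addresses are made up.

```python
# Frequency list quoted in the comment above.
FREQUENT = ["@gmail.com", "@yahoo.com", "@hotmail.com", ".com", ".net", ".org"]

def split_frequent_suffix(address, entries=FREQUENT):
    """Return (rest, index) for the longest entry that ends the address.

    index is the entry's position in the list, or None if nothing matches.
    """
    best = None
    for i, entry in enumerate(entries):
        longer = best is None or len(entry) > len(entries[best])
        if address.endswith(entry) and longer:
            best = i
    if best is None:
        return address, None
    return address[: len(address) - len(entries[best])], best

if __name__ == "__main__":
    for sample in ("alice@gmail.com", "bob@example.org", "carol@example.io"):
        print(sample, "->", split_frequent_suffix(sample))
```

An address ending in a long entry such as "@gmail.com" drops ten characters in one step, which fits the report that the savings exceeded 1 byte.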
from unishox2.