Comments (13)
Meow hash is heavily based on AES operations, which have a 128-bit block size. Producing a 256-bit output would require significant changes to the algorithm; it would be almost a completely different hash (unless you make some trivial change, like hashing the input twice with different "seeds" and concatenating the two 128-bit outputs).
from meow_hash.
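The "trivial change" mentioned above can be sketched as follows. This is only an illustration of the concatenation idea, not Meow hash itself: `hash128` is a hypothetical stand-in built on keyed BLAKE2b from Python's `hashlib`, and the two seed values are arbitrary assumptions.

```python
import hashlib


def hash128(data: bytes, seed: bytes) -> bytes:
    """Hypothetical seeded 128-bit hash (keyed BLAKE2b stand-in, NOT Meow hash)."""
    return hashlib.blake2b(data, digest_size=16, key=seed).digest()


def hash256_concat(data: bytes) -> bytes:
    """Hash the input twice with different seeds and concatenate the results."""
    # The seed values are arbitrary placeholders chosen for this sketch.
    return hash128(data, b"seed-A") + hash128(data, b"seed-B")


digest = hash256_concat(b"example input")
print(len(digest) * 8, "bits:", digest.hex())
```

Note that this approach reads the input twice, so it should be expected to cost roughly two passes of the underlying 128-bit hash.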
@mmozeiko the thought would be to have a 256-bit hash with speed similar to Meow128, and I don't know if a simple "seed" concatenation would be good enough for randomness. Maybe have an intermediate step to mix between the two 128-bit blocks?
from meow_hash.
For the most part, 256-bit non-cryptographic hashes don't really gain you anything. If you are afraid of collisions in 128-bit hashes, by far the most likely cause of a collision is that someone forged it, so to get a meaningfully higher security level you need to move to cryptographic hashing. That is why most hash functions don't provide more than 128 bits of output.
from meow_hash.
@NoHatCoder mind if I ask, what are the chances of a hash collision, for both the 128-bit and 256-bit cases, if the server has 2^23 files (a small site)? What about 2^28 files (a large media company)? For comparison, the bit failure rate of an SSD is about 1e-16 to 1e-17 (2^-53 to 2^-57).
from meow_hash.
2^-83, 2^-211, 2^-73 and 2^-201, respectively (2^23 files at 128 and 256 bits, then 2^28 files at 128 and 256 bits). So you are still far more likely to serve the wrong files because cosmic rays flipped a bit somewhere.
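The four figures above are consistent with the usual birthday approximation, p ≈ n(n-1)/2 / 2^b for n files and a b-bit hash. A minimal sketch of that arithmetic, assuming hash values are uniformly distributed and nobody is forging collisions (the function name is made up for illustration):

```python
import math


def log2_collision_probability(log2_files: int, hash_bits: int) -> float:
    """Return log2 of the approximate chance of at least one collision."""
    n = 2 ** log2_files
    pairs = n * (n - 1) // 2  # number of distinct file pairs
    # Each pair collides with probability 2^-hash_bits (uniformity assumed).
    return math.log2(pairs) - hash_bits


for log2_files in (23, 28):
    for bits in (128, 256):
        exponent = log2_collision_probability(log2_files, bits)
        print(f"2^{log2_files} files, {bits}-bit hash: about 2^{exponent:.1f}")
```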
Also, for your benchmarking, please note that most of the hash functions you find around the internet are broken in some way. For instance, you could have a 128-bit hash that seems fine, but if you swap two blocks of the input there is a 2^-32 chance that it causes a collision, so that flaw will dominate the chance of collisions appearing in the wild. Test suites generally won't find problems like that, since they are slightly too unlikely to show up, and the suite might not test a suitable pattern.
from meow_hash.
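To illustrate the kind of structural flaw described above, here is a deliberately exaggerated toy, not any real hash from this thread. `toy_hash` is a hypothetical construction that mixes each 16-byte block well (using BLAKE2b as a stand-in mixer) but combines the blocks with XOR, so swapping two blocks always collides, while a test that only feeds it random inputs is very unlikely to notice anything.

```python
import hashlib
import os

BLOCK = 16  # block size in bytes, chosen for this sketch


def toy_hash(data: bytes) -> bytes:
    """Toy 128-bit hash: per-block mixing, order-insensitive XOR combine."""
    acc = 0
    for i in range(0, len(data), BLOCK):
        block = data[i:i + BLOCK]
        mixed = hashlib.blake2b(block, digest_size=16).digest()
        acc ^= int.from_bytes(mixed, "little")  # XOR ignores block order
    return acc.to_bytes(16, "little")


a = bytes(range(16))
b = bytes(range(16, 32))
# Structured inputs: swapping the two blocks gives the same hash.
print("swap collides:", toy_hash(a + b) == toy_hash(b + a))

# Random inputs: a naive test counts distinct outputs and sees nothing wrong.
outputs = {toy_hash(os.urandom(64)) for _ in range(10_000)}
print("distinct outputs from random inputs:", len(outputs))
```

A real flaw of the 2^-32 kind mentioned in the comment is far subtler than this, which is exactly why it tends to slip past both random testing and fixed test patterns.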
@NoHatCoder so you took a look at the other issues page? Glad to see people care.
So... out of the ones listed there that have 128-bit and 256-bit hashes, which ones are "the best"?
(assuming that I will be using both Intel and AMD CPUs which have different ISAs (AES vs SHA2))
from meow_hash.
I have no faith in XXH (all variants), see Cyan4973/xxHash#180
CityHash at least has a seed that does nothing, see http://emboss.github.io/blog/2012/12/14/breaking-murmur-hash-flooding-dos-reloaded/. Beyond that it is a complete mess, not so much designed as just thrown together, combining arbitrary operations with no rhyme or reason.
HighwayHash I have looked at before, probably ok.
The others I don't know.
About ISAs, all modern x86 processors have the AES-NI extension, so there is no problem supporting both Intel and AMD processors. The ISA problem for Meow is that there are basically two ways to implement AES while keeping it simple: transforming the decode to be like the encode, or transforming the encode to be like the decode. Neither is inherently better than the other. x86 and ARM made opposite choices, and because of the way Meow works, it takes extra instructions to make one of the implementations behave like the other.
from meow_hash.
@NoHatCoder out of all the hashes listed, has anyone ever tried to benchmark them all by speed (and by non-cryptographic/"common" collision resistance, if that is doable)?
from meow_hash.
Lots of people have made comparison lists of assorted hash functions, including vain attempts at benchmarking quality.
The problem is that it takes a lot of work to dig through the internals of a hash function in order to properly gauge its quality. And all the thanks you ever get is an internet argument with the creator, and possibly some randos, over whether or not what you found "counts".
from meow_hash.
Any good or bad examples of such a situation where people bicker over which code base to use?
Also are there any organizations that can be "neutral" (at least in the speed front)?
from meow_hash.
The speed part is the easy part. While you can surely mess it up, lots of people are able to do it reasonably well. It seems that most people with the capabilities to be authoritative in the field choose to spend their time on crypto instead, which makes non-cryptographic hashing a bit of a wild west.
For bickering, see for instance: vinkla/hashids#48
The library claims to obscure sequential ids so that they can't be guessed. The code is a mixed-up substitution cipher, which is generally a worthless construct, and it falls apart completely when the input is highly guessable. The author doesn't seem to think that there is anything wrong with publishing a library that is basically pointless without the security claims that it is unable to make.
from meow_hash.
I would sincerely hope, in the near future, that a speed test be conducted for all 128 and 256 bit hashes for the sake of comparing performance, even before evaluating randomness/security.
Regarding randomness, what about establishing a standard test suite like https://github.com/aappleby/smhasher/ before initiating a public benchmark?
from meow_hash.
You only get so far with a test suite. Many hashes have been designed using smhasher, meaning that the author started out by making a broken hash, ran it through smhasher, found some brokenness that way, fixed it, and repeated until smhasher didn't report any problems. This basically means that the code is still full of problems, but only those too subtle for smhasher to find remain. Proper vetting must be done analytically, and that is almost impossible to automate.
from meow_hash.