
mmh3's Introduction

mmh3


mmh3 is a Python extension for MurmurHash (MurmurHash3), a set of fast and robust non-cryptographic hash functions invented by Austin Appleby.

Combined with probabilistic techniques like a Bloom filter, MinHash, and feature hashing, mmh3 allows you to develop high-performance systems in fields such as data mining, machine learning, and natural language processing.
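As a rough illustration of the Bloom filter use case, the sketch below derives several bit positions for an item by hashing it with different seeds. The class name BloomFilter, its parameters, and the injectable hash32 argument are hypothetical and not part of mmh3; by default the sketch assumes mmh3 is installed and calls mmh3.hash(item, seed, signed=False).

```python
class BloomFilter:
    """Minimal Bloom filter sketch; not tuned for production use."""

    def __init__(self, num_bits, num_hashes, hash32=None):
        if hash32 is None:
            # Imported lazily so that a stand-in hash function can be injected.
            import mmh3

            def hash32(data, seed):
                return mmh3.hash(data, seed, signed=False)

        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self._hash32 = hash32
        self._bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # One bit position per seed: 0, 1, ..., num_hashes - 1.
        for seed in range(self.num_hashes):
            yield self._hash32(item, seed) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self._bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True may be a false positive; False is always correct.
        return all(
            self._bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

With mmh3 installed, BloomFilter(num_bits=8192, num_hashes=4) would use mmh3.hash directly; the sizes here are arbitrary and should be chosen from the expected item count and the acceptable false-positive rate.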

Another common use of mmh3 is to calculate favicon hashes used by Shodan, the world's first IoT search engine.
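A minimal sketch of the favicon use case follows. It assumes the widely used convention of hashing the Base64 encoding of the icon bytes (with the line breaks that base64.encodebytes inserts) using the 32-bit signed hash. The function name favicon_hash and the hash32 parameter are hypothetical; by default the sketch assumes mmh3 is installed.

```python
import base64


def favicon_hash(favicon_bytes, hash32=None):
    """Return a favicon hash for raw icon bytes (assumed convention)."""
    if hash32 is None:
        # Imported lazily so that a stand-in hash function can be injected.
        import mmh3

        hash32 = mmh3.hash
    # encodebytes() adds a newline after every 76 output characters and at
    # the end; the assumed convention hashes the text including newlines.
    encoded = base64.encodebytes(favicon_bytes)
    return hash32(encoded)
```

The icon bytes would typically come from an HTTP response body; fetching them is left out of the sketch.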

How to use

Install

pip install mmh3 # for macOS, use "pip3 install mmh3" and python3

Simple functions

Quickstart:

>>> import mmh3
>>> mmh3.hash("foo") # returns a 32-bit signed int
-156908512
>>> mmh3.hash("foo", 42) # uses 42 as a seed
-1322301282
>>> mmh3.hash("foo", signed=False) # returns a 32-bit unsigned int
4138058784
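To make the quickstart values more concrete, here is a pure-Python sketch of the 32-bit MurmurHash3 (x86_32) algorithm. It is for illustration only and is not how mmh3 is implemented (mmh3 is a compiled extension); the names murmur3_32 and to_signed32 are hypothetical, and the sketch accepts bytes only.

```python
def murmur3_32(data: bytes, seed: int = 0) -> int:
    """Return the unsigned 32-bit MurmurHash3 (x86_32) value of data."""
    mask = 0xFFFFFFFF
    c1, c2 = 0xCC9E2D51, 0x1B873593

    def rotl(x, r):
        return ((x << r) | (x >> (32 - r))) & mask

    def mix_k(k):
        k = (k * c1) & mask
        k = rotl(k, 15)
        return (k * c2) & mask

    h = seed & mask
    full = len(data) - len(data) % 4
    # Body: consume the input in 4-byte little-endian blocks.
    for i in range(0, full, 4):
        h ^= mix_k(int.from_bytes(data[i:i + 4], "little"))
        h = rotl(h, 13)
        h = (h * 5 + 0xE6546B64) & mask
    # Tail: the remaining 1 to 3 bytes, if any.
    tail = data[full:]
    if tail:
        h ^= mix_k(int.from_bytes(tail, "little"))
    # Finalization: mix in the length, then avalanche the bits.
    h ^= len(data) & mask
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed one."""
    return value - (1 << 32) if value >= (1 << 31) else value
```

Under these assumptions, murmur3_32(b"foo") corresponds to mmh3.hash("foo", signed=False), and to_signed32 maps it to the default signed form.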

Other functions:

>>> mmh3.hash64("foo") # two 64 bit signed ints (by using the 128-bit algorithm as its backend)
(-2129773440516405919, 9128664383759220103)
>>> mmh3.hash64("foo", signed=False) #  two 64 bit unsigned ints
(16316970633193145697, 9128664383759220103)
>>> mmh3.hash128("foo", 42) # 128 bit unsigned int
215966891540331383248189432718888555506
>>> mmh3.hash128("foo", 42, signed=True) # 128 bit signed int
-124315475380607080215185174712879655950
>>> mmh3.hash_bytes("foo") # 128 bit value as bytes
b'aE\xf5\x01W\x86q\xe2\x87}\xba+\xe4\x87\xaf~'
>>> import numpy as np
>>> a = np.zeros(2 ** 32, dtype=np.int8)
>>> mmh3.hash_bytes(a)
b'V\x8f}\xad\x8eNM\xa84\x07FU\x9c\xc4\xcc\x8e'

Beware that hash64 returns two values, because it uses the 128-bit version of MurmurHash3 as its backend.

hash_from_buffer hashes bytes-like objects without copying memory. It is suitable for hashing a large buffer such as a numpy.ndarray.

>>> mmh3.hash_from_buffer(np.random.rand(100))
-2137204694
>>> mmh3.hash_from_buffer(np.random.rand(100), signed=False)
3812874078

hash64, hash128, and hash_bytes take a third argument for architecture optimization (keyword arg: x64arch). Use True for x64 and False for x86 (default: True):

>>> mmh3.hash64("foo", 42, True) 
(-840311307571801102, -6739155424061121879)

hashlib-style hashers

mmh3 implements hashers whose interfaces are similar to hashlib in the standard library: mmh3_32() for 32 bit hashing, mmh3_x64_128() for 128 bit hashing optimized for x64 architectures, and mmh3_x86_128() for 128 bit hashing optimized for x86 architectures.

In addition to the standard digest() method, each hasher has sintdigest(), which returns a signed integer, and uintdigest(), which returns an unsigned integer. 128 bit hashers also have stupledigest() and utupledigest() which return two 64 bit integers.

Note that as of version 4.1.0, the implementation is still experimental and its performance can be unsatisfactory (especially mmh3_x86_128()). Also, hexdigest() is not supported. Use digest().hex() instead.

>>> import mmh3
>>> hasher = mmh3.mmh3_x64_128(seed=42)
>>> hasher.update(b"foo")
>>> hasher.update(b"bar")
>>> hasher.update("foo") # str inputs are not allowed for hashers
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Strings must be encoded before hashing
>>> hasher.digest()
b'\x82_n\xdd \xac\xb6j\xef\x99\xb1e\xc4\n\xc9\xfd'
>>> hasher.sintdigest() # 128 bit signed int
-2943813934500665152301506963178627198
>>> hasher.uintdigest() # 128 bit unsigned int
337338552986437798311073100468589584258
>>> hasher.stupledigest() # two 64 bit signed ints
(7689522670935629698, -159584473158936081)
>>> hasher.utupledigest() # two 64 bit unsigned ints
(7689522670935629698, 18287159600550615535)
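The digest methods above appear to be different views of the same 16 bytes, read in little-endian order. The following sketch reproduces those views using only the standard library, with the digest bytes copied from the example; the function name interpret_digest and the dictionary keys are hypothetical.

```python
import struct


def interpret_digest(digest: bytes) -> dict:
    """Decode a 16-byte digest as integers, assuming little-endian order."""
    if len(digest) != 16:
        raise ValueError("expected a 128-bit (16-byte) digest")
    return {
        "hex": digest.hex(),  # stand-in for the unsupported hexdigest()
        "uint": int.from_bytes(digest, "little", signed=False),
        "sint": int.from_bytes(digest, "little", signed=True),
        "utuple": struct.unpack("<QQ", digest),  # (low 64 bits, high 64 bits)
        "stuple": struct.unpack("<qq", digest),
    }


# Digest bytes taken from the hasher example above.
views = interpret_digest(b"\x82_n\xdd \xac\xb6j\xef\x99\xb1e\xc4\n\xc9\xfd")
```

The first element of each tuple is the low 64 bits of the 128-bit value, which matters when comparing results with libraries that return the halves in the other order.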

Changelog

4.1.0 (2024-01-09)

  • Add support for Python 3.12.
  • Change the project structure to fix issues when using Bazel (#50).
  • Fix incorrect type hints (#51).
  • Fix invalid results on s390x when the arg x64arch of hash64 or hash_bytes is set to False (#52).

4.0.1 (2023-07-14)

  • Fix incorrect type hints.
  • Refactor the project structure (#48).

4.0.0 (2023-05-22)

  • Add experimental support for hashlib-compliant hasher classes (#39). Note that they are not yet fully tuned for performance.
  • Add support for type hints (#44).
  • Add wheels for more platforms (musllinux, s390x, win_arm64, and macosx_universal2).
  • Drop support for Python 3.7, as it will reach the end of life on 2023-06-27.
  • Switch license from CC0 to MIT (#43).
  • Add a code of conduct (the ACM Code of Ethics and Professional Conduct).
  • Backward incompatible changes:
    • A hash function now returns the same value under big-endian platforms as that under little-endian ones (#47).
    • Remove the __version__ constant from the module (#42). Use importlib.metadata instead.
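Since the __version__ constant is gone, the installed version can be looked up through importlib.metadata. A minimal sketch, where the helper name installed_version is hypothetical and returning None for a missing package is a design choice of this example:

```python
from importlib import metadata


def installed_version(package="mmh3"):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

Calling installed_version() with no arguments queries the distribution named mmh3.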

See CHANGELOG.md for the complete changelog.

License

MIT, unless otherwise noted within a file.

Known Issues

Getting different results from other MurmurHash3-based libraries

By default, mmh3 returns signed values for the 32-bit and 64-bit versions and unsigned values for hash128, for historical reasons. Use the keyword argument signed to obtain the desired result.
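When a value has already been stored in one representation, it can be converted to the other without rehashing, because signed and unsigned results are two readings of the same bits. A small sketch, with hypothetical helper names to_unsigned and to_signed:

```python
def to_unsigned(value: int, bits: int) -> int:
    """Reinterpret a signed integer as an unsigned one of the given width."""
    return value & ((1 << bits) - 1)


def to_signed(value: int, bits: int) -> int:
    """Reinterpret an unsigned integer as a two's-complement signed one."""
    value &= (1 << bits) - 1
    # If the top bit is set, the signed reading is negative.
    return value - (1 << bits) if value >> (bits - 1) else value
```

Use bits=32 for hash, bits=64 for each element of hash64, and bits=128 for hash128.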

From version 4.0.0, mmh3 returns the same value under big-endian platforms as that under little-endian ones, while the original C++ library is endian-sensitive. If you need to obtain the original-compliant results under big-endian environments, please use version 3.*.

For compatibility with Google Guava (Java), see https://stackoverflow.com/questions/29932956/murmur3-hash-different-result-between-python-and-java-implementation.

For compatibility with murmur3 (Go), see #46.

Unexpected results when given non 32-bit seeds

Version 2.4 changed the type of seeds from signed 32-bit int to unsigned 32-bit int. The resulting values with signed seeds still remain the same as before, as long as they are 32-bit.

>>> mmh3.hash("aaaa", -1756908916) # signed representation for 0x9747b28c
1519878282
>>> mmh3.hash("aaaa", 2538058380) # unsigned representation for 0x9747b28c
1519878282

Make sure that seeds fit in 32 bits. Values outside that range may produce unexpected results.

>>> mmh3.hash("foo", 2 ** 33)
-156908512
>>> mmh3.hash("foo", 2 ** 34)
-156908512
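In the example above, seeds of 2 ** 33 and 2 ** 34 give the same value as the default seed, so an out-of-range seed can go unnoticed. One way to guard against this is to validate seeds before hashing. The sketch below is an assumption about how a caller might do that; normalize_seed is a hypothetical helper, not part of mmh3.

```python
def normalize_seed(seed: int) -> int:
    """Map a signed or unsigned 32-bit seed to its unsigned form.

    Raises ValueError for seeds that do not fit in 32 bits.
    """
    if not -(1 << 31) <= seed < (1 << 32):
        raise ValueError(f"seed {seed} does not fit in 32 bits")
    return seed & 0xFFFFFFFF
```

The result can then be passed as the seed argument, for example mmh3.hash("aaaa", normalize_seed(-1756908916)).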

Contributing Guidelines

See CONTRIBUTING.md.

Authors

MurmurHash3 was originally developed by Austin Appleby and released into the public domain (https://github.com/aappleby/smhasher).

Ported and modified for Python by Hajime Senuma.

See also

Tutorials (High-Performance Computing)

The following textbooks and tutorials are great sources to learn how to use mmh3 (and other hash algorithms in general) for high-performance computing.

Tutorials (Internet of Things)

Shodan, the world's first IoT search engine, uses MurmurHash3 hash values for favicons (icons associated with web pages). ZoomEye follows Shodan's convention. Calculating these values with mmh3 is useful for OSINT and cybersecurity activities.

Similar libraries

mmh3's People

Contributors

arieleizenberg, doozr, dshein-alt, hajimes, honnibal, n-dusan, pik, wbolster


mmh3's Issues

hash64/hash128/hash_bytes(..., x64arch = False) fail on s390x

Test cases for the functions hash64/hash128/hash_bytes fail on s390x when the arg x64arch is False (https://github.com/hajimes/mmh3/actions/runs/7396371609), likely because the architecture is big-endian.

  >       assert mmh3.hash64("foo", signed=False, x64arch=False) == (
              6968798590592097061,
              6968798590746895717,
          )
  E       assert (630394342576...8590746895717) == (696879859059...8590746895717)
  E         At index 0 diff: 6303943425762141541 != 6968798590592097061
  E         Use -v to get more diff

The result should be the same as the value in little-endian environments (feature from 4.0.0).

How to call this from Golang?

My colleague uses this function to generate an int64. Could you please add a demo showing how to call this function from Go to get the same number? Thank you in advance.

mmh3 not building anymore

mmh3 is not compiling when installing through pip via packet or git.

This has been raised once in the past (#7) and fixed but the issue appeared again.

Specs of my system:
gcc: 7.3.0
os: ubuntu 16 / ubuntu 18
pip: 9.0.1
git: 2.17.1
python: 2.7.15rc1

Error is exactly the same:

building 'mmh3' extension
  creating build
  creating build/temp.linux-x86_64-2.7
  x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -I/usr/include/python2.7 -c mmh3module.cpp -o build/temp.linux-x86_64-2.7/mmh3module.o
  cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
  x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -I/usr/include/python2.7 -c MurmurHash3.cpp -o build/temp.linux-x86_64-2.7/MurmurHash3.o
  cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
  MurmurHash3.cpp:34:31: error: expected constructor, destructor, or type conversion before ‘(’ token
   #define FORCE_INLINE attribute((always_inline))
                                 ^
  MurmurHash3.cpp:60:1: note: in expansion of macro ‘FORCE_INLINE’
   FORCE_INLINE uint32_t getblock ( const uint32_t * p, int i )
   ^
  MurmurHash3.cpp:34:31: error: expected constructor, destructor, or type conversion before ‘(’ token
   #define FORCE_INLINE attribute((always_inline))
                                 ^
  MurmurHash3.cpp:65:1: note: in expansion of macro ‘FORCE_INLINE’
   FORCE_INLINE uint64_t getblock ( const uint64_t * p, int i )
   ^
  MurmurHash3.cpp:34:31: error: expected constructor, destructor, or type conversion before ‘(’ token
   #define FORCE_INLINE attribute((always_inline))
                                 ^
  MurmurHash3.cpp:73:1: note: in expansion of macro ‘FORCE_INLINE’
   FORCE_INLINE uint32_t fmix ( uint32_t h )
   ^
  MurmurHash3.cpp:34:31: error: expected constructor, destructor, or type conversion before ‘(’ token
   #define FORCE_INLINE attribute((always_inline))
                                 ^
  MurmurHash3.cpp:86:1: note: in expansion of macro ‘FORCE_INLINE’
   FORCE_INLINE uint64_t fmix ( uint64_t k )
   ^
  MurmurHash3.cpp: In function ‘void MurmurHash3_x86_32(const void*, int, uint32_t, void*)’:
  MurmurHash3.cpp:117:36: error: ‘getblock’ was not declared in this scope
       uint32_t k1 = getblock(blocks,i);
                                      ^
  MurmurHash3.cpp:148:15: error: ‘fmix’ was not declared in this scope
     h1 = fmix(h1);
                 ^
  MurmurHash3.cpp: In function ‘void MurmurHash3_x86_128(const void*, int, uint32_t, void*)’:
  MurmurHash3.cpp:178:40: error: ‘getblock’ was not declared in this scope
       uint32_t k1 = getblock(blocks,i*4+0);
                                          ^
  MurmurHash3.cpp:244:15: error: ‘fmix’ was not declared in this scope
     h1 = fmix(h1);
                 ^
  MurmurHash3.cpp: In function ‘void MurmurHash3_x64_128(const void*, int, uint32_t, void*)’:
  MurmurHash3.cpp:279:40: error: ‘getblock’ was not declared in this scope
       uint64_t k1 = getblock(blocks,i*2+0);
                                          ^
  MurmurHash3.cpp:329:15: error: ‘fmix’ was not declared in this scope
     h1 = fmix(h1);
                 ^
  error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
  
  ----------------------------------------
  Failed building wheel for mmh3

Not able to install it how to fix it

C:\Users\Saurabh bhandari\AppData\Local\Programs\Python\Python38>pip install mmh3
Collecting mmh3
Using cached mmh3-2.5.1.tar.gz (9.8 kB)
Using legacy 'setup.py install' for mmh3, since package 'wheel' is not installed.
Installing collected packages: mmh3
Running setup.py install for mmh3 ... error
ERROR: Command errored out with exit status 1:
command: 'c:\users\saurabh bhandari\appdata\local\programs\python\python38\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\Saurabh bhandari\AppData\Local\Temp\pip-install-ukiszg0q\mmh3\setup.py'"'"'; file='"'"'C:\Users\Saurabh bhandari\AppData\Local\Temp\pip-install-ukiszg0q\mmh3\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record 'C:\Users\Saurabh bhandari\AppData\Local\Temp\pip-record-iye5vtng\install-record.txt' --single-version-externally-managed --compile --install-headers 'c:\users\saurabh bhandari\appdata\local\programs\python\python38\Include\mmh3'
cwd: C:\Users\Saurabh bhandari\AppData\Local\Temp\pip-install-ukiszg0q\mmh3
Complete output (11 lines):
running install
running build
running build_ext
building 'mmh3' extension
creating build
creating build\temp.win-amd64-3.8
creating build\temp.win-amd64-3.8\Release
C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.26.28801\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD "-Ic:\users\saurabh bhandari\appdata\local\programs\python\python38\include" "-Ic:\users\saurabh bhandari\appdata\local\programs\python\python38\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.26.28801\include" /EHsc /Tpmmh3module.cpp /Fobuild\temp.win-amd64-3.8\Release\mmh3module.obj
mmh3module.cpp
mmh3module.cpp(12): fatal error C1083: Cannot open include file: 'stdio.h': No such file or directory
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.26.28801\bin\HostX86\x64\cl.exe' failed with exit status 2
----------------------------------------
ERROR: Command errored out with exit status 1: 'c:\users\saurabh bhandari\appdata\local\programs\python\python38\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\Saurabh bhandari\AppData\Local\Temp\pip-install-ukiszg0q\mmh3\setup.py'"'"'; file='"'"'C:\Users\Saurabh bhandari\AppData\Local\Temp\pip-install-ukiszg0q\mmh3\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record 'C:\Users\Saurabh bhandari\AppData\Local\Temp\pip-record-iye5vtng\install-record.txt' --single-version-externally-managed --compile --install-headers 'c:\users\saurabh bhandari\appdata\local\programs\python\python38\Include\mmh3' Check the logs for full command output.

Cannot install mmh3 via pip command windows

I recently ran into the same issue as the one reported here, which was closed.

Please see:

C:\Users\Luca>python -m pip install murmurhash3
Collecting murmurhash3
  Using cached https://files.pythonhosted.org/packages/b5/f4/1f9c4851667a2541bd151b8d9efef707495816274fada365fa6a31085a32/murmurhash3-2.3.5.tar.gz
Building wheels for collected packages: murmurhash3
  Running setup.py bdist_wheel for murmurhash3 ... error
  Complete output from command C:\Users\Luca\Anaconda3\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\Luca\\AppData\\Local\\Temp\\pip-install-0ftrk0aa\\murmurhash3\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d C:\Users\Luca\AppData\Local\Temp\pip-wheel-6_gzb5c8 --python-tag cp37:
  running bdist_wheel
  running build
  running build_ext
  building 'mmh3' extension
  creating build
  creating build\temp.win-amd64-3.7
  creating build\temp.win-amd64-3.7\Release
  C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Users\Luca\Anaconda3\include -IC:\Users\Luca\Anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\cppwinrt" /EHsc /Tpmmh3module.cpp /Fobuild\temp.win-amd64-3.7\Release\mmh3module.obj
  mmh3module.cpp
  c:\users\luca\appdata\local\temp\pip-install-0ftrk0aa\murmurhash3\murmur_hash_3.hpp(5): error C2371: 'uint32_t': redefinition; different basic types
  C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include\stdint.h(23): note: see declaration of 'uint32_t'
  mmh3module.cpp(9): error C2371: 'int32_t': redefinition; different basic types
  C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include\stdint.h(19): note: see declaration of 'int32_t'
  mmh3module.cpp(12): error C2371: 'uint32_t': redefinition; different basic types
  C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include\stdint.h(23): note: see declaration of 'uint32_t'
  error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Tools\\MSVC\\14.16.27023\\bin\\HostX86\\x64\\cl.exe' failed with exit status 2

  ----------------------------------------
  Failed building wheel for murmurhash3
  Running setup.py clean for murmurhash3
Failed to build murmurhash3
Installing collected packages: murmurhash3
  Running setup.py install for murmurhash3 ... error
    Complete output from command C:\Users\Luca\Anaconda3\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\Luca\\AppData\\Local\\Temp\\pip-install-0ftrk0aa\\murmurhash3\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\Luca\AppData\Local\Temp\pip-record-j4aoi9ln\install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_ext
    building 'mmh3' extension
    creating build
    creating build\temp.win-amd64-3.7
    creating build\temp.win-amd64-3.7\Release
    C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Users\Luca\Anaconda3\include -IC:\Users\Luca\Anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\cppwinrt" /EHsc /Tpmmh3module.cpp /Fobuild\temp.win-amd64-3.7\Release\mmh3module.obj
    mmh3module.cpp
    c:\users\luca\appdata\local\temp\pip-install-0ftrk0aa\murmurhash3\murmur_hash_3.hpp(5): error C2371: 'uint32_t': redefinition; different basic types
    C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include\stdint.h(23): note: see declaration of 'uint32_t'
    mmh3module.cpp(9): error C2371: 'int32_t': redefinition; different basic types
    C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include\stdint.h(19): note: see declaration of 'int32_t'
    mmh3module.cpp(12): error C2371: 'uint32_t': redefinition; different basic types
    C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include\stdint.h(23): note: see declaration of 'uint32_t'
    error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Tools\\MSVC\\14.16.27023\\bin\\HostX86\\x64\\cl.exe' failed with exit status 2

I need mmh3 for h2o4gpu. I opened a question on Stack Exchange relating to this issue.

Thanks

Make the module hashlib-compliant

mmh3 is currently not hashlib-compliant. This makes it challenging to use it as a replacement for md5 or other cryptographic hashes. A wrapper can be built to make this module hashlib-compliant. One should be able to use the module as hashlib.md5.

update() -- update the current digest with an additional string
digest() -- return the current digest value
hexdigest() -- return the current digest as a string of hexadecimal digits
intdigest() -- return the current digest as an integer
copy() -- return a copy of the current mmh3 object
reset() -- reset state

get murmurhash3 of binary file in python

getting the murmur3 hash of a text file is trivial,
and i can get the murmur2 hash of binary files,
see https://github.com/milahu/murmurhash-cli-python

how to get the murmur3 hash of a binary file?

there is https://pypi.org/project/mmh3-binary/ but its an "empty fork"

expected API

#!/usr/bin/env python3

import mmh3

fd = open('/bin/sh', 'rb')
hash = mmh3.hash_from_buffer(fd)

fd is an io.BufferedReader

ideally, avoid passing a bytes array ... this should support "a million gigabyte" files in theory,
so the bytes should be "streamed" or "piped" into the mmh3 function

currently, mmh3 says

mmh3.hash_from_buffer(fd)
TypeError: a bytes-like object is required, not '_io.BufferedReader'
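One possible way to hash a file without loading it into memory is to feed it in chunks to a hashlib-style hasher, such as the ones available from version 4.0.0 (for example mmh3.mmh3_x64_128()). The sketch below works with any object that has update() and digest(); the function name hash_stream and the chunk size are hypothetical, and the demo uses hashlib.md5 as a stand-in so that it runs without mmh3.

```python
import hashlib
import io


def hash_stream(fileobj, hasher, chunk_size=1 << 20):
    """Feed a binary file object to a hashlib-style hasher chunk by chunk."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:  # an empty read signals end of file
            break
        hasher.update(chunk)
    return hasher


# Demo with a stand-in hasher; with mmh3 >= 4.0.0 installed, one would pass
# mmh3.mmh3_x64_128() and a file opened with open(path, "rb") instead.
stream = io.BytesIO(b"foo" * 1000)
digest = hash_stream(stream, hashlib.md5(), chunk_size=64).digest()
```

Only one chunk is held in memory at a time, so the file size is limited by the hasher rather than by available RAM.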

(Bazel) AttributeError: module 'mmh3' has no attribute 'hash'

Consider: mmh3.hash("Hello World").

Expected behavior: returns 427197390

Actual behavior: raises exception AttributeError: module 'mmh3' has no attribute 'hash'

Regression: This works in version 4.0.0. The error is triggered in version 4.0.1.

Environment: Curiously, this seems to happen when running the test through Bazel, not when installing into a virtual environment. Not sure if the bug is on the mmh3 side or the Bazel side, but something changed between 4.0.0 and 4.0.1. Can you help me figure out what?

To reproduce, get the gist from https://gist.github.com/vonschultz/18b4e58a697d56c8cc421528e0a4ef13 and run

bazelisk test --test_output=streamed //...

Get bazelisk from https://github.com/bazelbuild/bazelisk/releases if you don't already have it.

I'm running Ubuntu 20.04.

return different hash between c++ and python

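cpp: (compiled with https://github.com/hajimes/mmh3/blob/master/MurmurHash3.h)
#include <cstdint>
#include <iostream>
#include "MurmurHash3.h"
uint32_t MurmurHash32(const void* key, size_t len) {
  uint32_t hash;
  MurmurHash3_x86_32(key, (int)len, 0, &hash);
  // Print the key bytes as text, followed by the computed hash.
  std::cout << "site= ";
  std::cout.write(static_cast<const char*>(key), len);
  std::cout << ", code= " << hash << std::endl;
  return hash;
}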

// output: (import mmh3)
site= www.taobao.com, code= 4076543410

python:
import mmh3
domain = "www.taobao.com"
res = mmh3.hash(domain, signed=False)
print('site= %s, code= %s' % (domain, res))

// output:
site= www.taobao.com, code= 3707551990

Why does the same site "www.taobao.com" give 4076543410 in C++ but 3707551990 in Python?

hash128 returns unsigned int

I was using mmh3 as part of a project, and was getting invalid values when I tried to rescale the hashes into the range [0, 1]. It turns out that mmh3.hash128 was returning unsigned integers, not signed integers as the documentation suggests.

I was using Python 3.6.2 and mmh3 2.4.

Python 3.10 support?

Hi, would it be possible to add wheels or build support for Python 3.10? I ran into this problem when trying to build from source:

#19 51.73     Running setup.py install for mmh3: started
#19 52.09     Running setup.py install for mmh3: finished with status 'error'
#19 52.09     ERROR: Command errored out with exit status 1:
#19 52.09      command: /usr/local/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-29v4jtx8/mmh3_e21cf43dc5144a5ca51d99e23e0f7752/setup.py'"'"'; __file__='"'"'/tmp/pip-install-29v4jtx8/mmh3_e21cf43dc5144a5ca51d99e23e0f7752/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-vjnuvpe7/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.10/mmh3
#19 52.09          cwd: /tmp/pip-install-29v4jtx8/mmh3_e21cf43dc5144a5ca51d99e23e0f7752/
#19 52.09     Complete output (12 lines):
#19 52.09     running install
#19 52.09     /usr/local/lib/python3.10/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
#19 52.09       warnings.warn(
#19 52.09     running build
#19 52.09     running build_ext
#19 52.09     building 'mmh3' extension
#19 52.09     creating build
#19 52.09     creating build/temp.linux-x86_64-3.10
#19 52.09     gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/usr/local/include/python3.10 -c MurmurHash3.cpp -o build/temp.linux-x86_64-3.10/MurmurHash3.o
#19 52.09     gcc: fatal error: cannot execute ‘cc1plus’: execvp: No such file or directory
#19 52.09     compilation terminated.
#19 52.09     error: command '/usr/bin/gcc' failed with exit code 1
#19 52.09     ----------------------------------------
#19 52.09 ERROR: Command errored out with exit status 1: /usr/local/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-29v4jtx8/mmh3_e21cf43dc5144a5ca51d99e23e0f7752/setup.py'"'"'; __file__='"'"'/tmp/pip-install-29v4jtx8/mmh3_e21cf43dc5144a5ca51d99e23e0f7752/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-vjnuvpe7/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.10/mmh3 Check the logs for full command output.

mmh3 not building anymore?

mmh3 is not compiling when installing through pip via packet or git.
I want to use it for the Bloom filter implemented here:
http://www.maxburstein.com/blog/creating-a-simple-bloom-filter/

Specs of my system:
gcc: 5.1.0
os: archlinux
pip: 7.0.3
git: 2.4.2
python: 3.4.3 / 2.7.9

Output from pip

> sudo pip install mmh3
Collecting mmh3
  Using cached mmh3-2.3.tar.gz
Installing collected packages: mmh3
  Running setup.py install for mmh3
    Complete output from command /usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip-build-ia20ohxq/mmh3/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-dmuabtur-record/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_ext
    building 'mmh3' extension
    creating build
    creating build/temp.linux-x86_64-3.4
    gcc -pthread -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong --param=ssp-buffer-size=4 -fPIC -I/usr/include/python3.4m -c mmh3module.cpp -o build/temp.linux-x86_64-3.4/mmh3module.o
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    gcc -pthread -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong --param=ssp-buffer-size=4 -fPIC -I/usr/include/python3.4m -c MurmurHash3.cpp -o build/temp.linux-x86_64-3.4/MurmurHash3.o
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    MurmurHash3.cpp:34:31: error: expected constructor, destructor, or type conversion before ‘(’ token
     #define FORCE_INLINE attribute((always_inline))
                                   ^
    MurmurHash3.cpp:60:1: note: in expansion of macro ‘FORCE_INLINE’
     FORCE_INLINE uint32_t getblock ( const uint32_t * p, int i )
     ^
    MurmurHash3.cpp:34:31: error: expected constructor, destructor, or type conversion before ‘(’ token
     #define FORCE_INLINE attribute((always_inline))
                                   ^
    MurmurHash3.cpp:65:1: note: in expansion of macro ‘FORCE_INLINE’
     FORCE_INLINE uint64_t getblock ( const uint64_t * p, int i )
     ^
    MurmurHash3.cpp:34:31: error: expected constructor, destructor, or type conversion before ‘(’ token
     #define FORCE_INLINE attribute((always_inline))
                                   ^
    MurmurHash3.cpp:73:1: note: in expansion of macro ‘FORCE_INLINE’
     FORCE_INLINE uint32_t fmix ( uint32_t h )
     ^
    MurmurHash3.cpp:34:31: error: expected constructor, destructor, or type conversion before ‘(’ token
     #define FORCE_INLINE attribute((always_inline))
                                   ^
    MurmurHash3.cpp:86:1: note: in expansion of macro ‘FORCE_INLINE’
     FORCE_INLINE uint64_t fmix ( uint64_t k )
     ^
    MurmurHash3.cpp: In function ‘void MurmurHash3_x86_32(const void*, int, uint32_t, void*)’:
    MurmurHash3.cpp:117:36: error: ‘getblock’ was not declared in this scope
         uint32_t k1 = getblock(blocks,i);
                                        ^
    MurmurHash3.cpp:148:15: error: ‘fmix’ was not declared in this scope
       h1 = fmix(h1);
                   ^
    MurmurHash3.cpp: In function ‘void MurmurHash3_x86_128(const void*, int, uint32_t, void*)’:
    MurmurHash3.cpp:178:40: error: ‘getblock’ was not declared in this scope
         uint32_t k1 = getblock(blocks,i*4+0);
                                            ^
    MurmurHash3.cpp:244:15: error: ‘fmix’ was not declared in this scope
       h1 = fmix(h1);
                   ^
    MurmurHash3.cpp: In function ‘void MurmurHash3_x64_128(const void*, int, uint32_t, void*)’:
    MurmurHash3.cpp:279:40: error: ‘getblock’ was not declared in this scope
         uint64_t k1 = getblock(blocks,i*2+0);
                                            ^
    MurmurHash3.cpp:329:15: error: ‘fmix’ was not declared in this scope
       h1 = fmix(h1);
                   ^
    error: command 'gcc' failed with exit status 1

    ----------------------------------------
Command "/usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip-build-ia20ohxq/mmh3/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-dmuabtur-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-ia20ohxq/mmh3

Golang Compatibility

Hello @hajimes thank you so much for providing this murmur3 implementation in Python and for all the work you do in open source; we really appreciate being able to use your library!

I was recently investigating some compatibility issues between the output produced by mmh3 and a Go library we were using internally. I asked the question on Stack Overflow and got a response back:

https://stackoverflow.com/questions/75921577/murmur3-hash-compatibility-between-go-and-python

It looks like the order of the two uint64s returned by the 128-bit algorithm is reversed between the two libraries; but it's simple enough to modify the returned results in either Go or Python to produce compatible hashes.

I was wondering: would you like me to open a PR to update the README with the compatibility information? Are there any other docs I should update in the PR?

Additionally, if there is any way to reverse the order of the uint64s returned by murmur3 (e.g. with an argument to hash128 or hash_bytes), I'd be happy to open a PR for that as well. Let me know how you'd like to proceed!
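
For illustration, the reordering described above can be sketched in pure Python. The helper names below are hypothetical and are not part of mmh3. The sketch only swaps the two 64-bit halves of a 128-bit value; it assumes the byte order inside each half already matches, and which library emits which order should be checked against the linked Stack Overflow answer.

```python
MASK64 = (1 << 64) - 1


def swap_u64_halves(value: int) -> int:
    """Swap the low and high 64-bit halves of an unsigned 128-bit integer."""
    if not 0 <= value < (1 << 128):
        raise ValueError("expected an unsigned 128-bit integer")
    low = value & MASK64
    high = value >> 64
    return (low << 64) | high


def swap_digest_halves(digest: bytes) -> bytes:
    """Swap the first and last 8 bytes of a 16-byte digest."""
    if len(digest) != 16:
        raise ValueError("expected a 16-byte digest")
    return digest[8:] + digest[:8]


# Hypothetical usage: pass in the result of an unsigned 128-bit hash call
# or a 16-byte digest produced elsewhere.
example = (1 << 64) | 2
print(hex(swap_u64_halves(example)))
```

Applying either helper twice returns the original value, so the same function works in both directions.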

Accept Iterable for Performance

Hi, thanks so much for this very useful library! I'm using it to randomize keys for billions of objects to create condensed files that contain groups of thousands to millions of objects at a time.

https://github.com/google/neuroglancer/blob/056a3548abffc3c76c93c7a906f1603ce02b5fa3/src/neuroglancer/datasource/precomputed/sharded.md

It's not critical, but there is a bottleneck step in the front of my Python processing pipeline where the hash is applied to all object labels at once to figure out how to assign them for further processing. The hash function is dominating this calculation.

Function: murmur at line 25
Time per Hit in microseconds
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    25                                           @profile
    26                                           def murmur(x):
    27     69734      74332.0      1.1      1.6    y = uint64(x).tobytes()
    28     69734    4381932.0     62.8     97.2    y = mmh3.hash64(y, x64arch=False)
    29     69733      52731.0      0.8      1.2    return uint64(y[0])

Total time: 5.44635 s
File: REDACTED
Function: compute_shard_location at line 145
Time per Hit in microseconds
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   145                                             @profile
   146                                             def compute_shard_location(self, key):
   147     69734      99805.0      1.4      1.8      chunkid = uint64(key) >> uint64(self.preshift_bits)
   148     69734    4703010.0     67.4     86.4      chunkid = self.hashfn(chunkid)
   149     69733      60072.0      0.9      1.1      minishard_number = uint64(chunkid & self.minishard_mask)
   150     69733      97034.0      1.4      1.8      shard_number = uint64((chunkid & self.shard_mask) >> uint64(self.minishard_bits))
   151     69733     312626.0      4.5      5.7      shard_number = format(shard_number, 'x').zfill(int(np.ceil(self.shard_bits / 4.0)))
   152     69733     102740.0      1.5      1.9      remainder = chunkid >> uint64(self.minishard_bits + self.shard_bits)
   153                                           
   154     69733      71060.0      1.0      1.3      return ShardLocation(shard_number, minishard_number, remainder)

It could be possible to thread this processing, but Python has the GIL. Multiprocessing could work, though the pickle/unpickle step will also take some time. I was thinking that a neat way to increase throughput would be to process multiple hashes at once in C, that is, to accept both a scalar and an iterator as input to the function. This would allow the compiler to autovectorize and also avoid Python/C overheads. I'm currently getting ~66.5k hashes/sec on Apple Silicon M1 ARM64.

I'm thinking of an interface similar to this. The second should be some buffer that is easy to read into numpy.

(lower, upper) = mmh3.hash64(some_bytes, x64arch=False)
[l1,h1,l2,h2] = mmh3.hash64(iterable_containing_bytes, x64arch=False)

Thank you for your consideration and for all the effort you've put into this library!
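
To make the proposed output layout concrete, here is a pure-Python sketch of the flat [l1, h1, l2, h2] result. The function name hash64_flat is hypothetical, and the hash function is passed in as a parameter so the sketch does not depend on any particular mmh3 signature. It only emulates the shape of the result; looping in Python gives none of the speed-up that a C-level batch implementation would.

```python
from typing import Callable, Iterable, List, Tuple


def hash64_flat(
    keys: Iterable[bytes],
    hash64: Callable[[bytes], Tuple[int, int]],
) -> List[int]:
    """Hash every key and return the (lower, upper) pairs as one flat list."""
    flat: List[int] = []
    for key in keys:
        lower, upper = hash64(key)
        flat.extend((lower, upper))
    return flat


# Stand-in hash function used only to show the layout (not a real hash).
def fake_hash64(data: bytes) -> Tuple[int, int]:
    return len(data), len(data) + 1


print(hash64_flat([b"a", b"bc"], fake_hash64))
```

A flat list like this can be handed to numpy and reshaped to two columns, one row per key.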

Remove __version__ in mmh3module.cpp

I plan to remove the line

PyModule_AddStringConstant(module, "__version__", "3.1.0");

in mmh3module.cpp in the next non-trivial update.

The __version__ constant was introduced in 2.1 (2013-02-25), since it was in vogue back then and the only (iirc) way to get the version number of a module from within a python script.

However, Python 3.8 officially introduced importlib.metadata, which is no longer provisional as of Python 3.10. Python 3.7 users still need to pip install importlib-metadata, but 3.7 will reach EOL soon. Therefore, there is no need to keep the __version__ constant anymore. Plus, keeping the same info in multiple files is a bad (very bad indeed) engineering practice.

See also
https://stackoverflow.com/a/72168209

I will bump up the version to 4.0.0 then, because the removal breaks backward compatibility, however minor it is.
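
A minimal sketch of the replacement lookup, assuming Python 3.8 or later. The helper name installed_version is hypothetical; importlib.metadata.version and PackageNotFoundError are the standard-library names.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the installed mmh3 version, or None when mmh3 is not installed.
print(installed_version("mmh3"))
```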

Documentation of why hash64 returns two values

As a user it is very confusing why mmh3.hash64 returns two 64-bit values, whereas hash and hash128 do not. Are we supposed to just pick one of them? Is there a recommended way to combine them?
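
The README explains that hash64 uses the 128-bit algorithm as its backend, so the two values are the two halves of one 128-bit result. As a sketch of one way to combine them, the hypothetical helpers below pack two 64-bit values into a single 128-bit integer and split it again. Putting the first value in the low bits is an assumption made for this illustration, not a documented guarantee; using just one of the two values is also possible, and which option is appropriate depends on the application.

```python
from typing import Tuple

MASK64 = (1 << 64) - 1


def combine_halves(first: int, second: int) -> int:
    """Pack two 64-bit values (signed or unsigned) into one 128-bit integer.

    Assumption for this sketch: `first` goes into the low 64 bits.
    """
    return ((second & MASK64) << 64) | (first & MASK64)


def split_halves(value: int) -> Tuple[int, int]:
    """Inverse of combine_halves, returning unsigned 64-bit halves."""
    return value & MASK64, value >> 64


# Values copied from the README example for hash64("foo", signed=False).
pair = (16316970633193145697, 9128664383759220103)
print(split_halves(combine_halves(*pair)) == pair)
```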

Decouple the original code from this repository

Currently I'm working on refactoring the library to decouple files whose large part traces back to the original C++ code (specifically, murmurhash3.c and murmurhash3.h) from this repository.

The update is to address the pre-review process of the Journal of Open Source Software (JOSS), whose managing EiC (Daniel S. Katz) thoughtfully pointed out that it was not clear which parts of this library were my (and other contributors') own contributions.
openjournals/joss-reviews#5487

I proposed to use a git submodule to refer to Appleby's repository, and then write a script that converts the original C++ files to more portable C code at compile time.

It turns out, however, that readability may be degraded to some extent in my current ad hoc implementation, which may also make the library harder to extend. Solving these issues will be left for future updates. On the other hand, this update will clarify the extent of the authorship of the code and solve the license issue #45.

mmh3 not 64-bit ready

mmh3 cannot hash data larger than 2**31 bytes:

>>> import mmh3
>>> import numpy as np
>>> a = np.zeros(2**30, dtype=np.int8)
>>> mmh3.hash_bytes(a)
b"O\xc5\xf1\xf2\x80';s\x1b\xddc\xa1E\x8d\xe3r"
>>> a = np.zeros(2**32, dtype=np.int8)
>>> mmh3.hash_bytes(a)
Traceback (most recent call last):
  File "<ipython-input-9-918a38167947>", line 1, in <module>
    mmh3.hash_bytes(a)
OverflowError: size does not fit in an int

The solution is either to use the s* code instead of s# in PyArg_ParseTuple(), or to define the PY_SSIZE_T_CLEAN macro and change size fields from int to Py_ssize_t. See https://docs.python.org/2.7/c-api/arg.html . I can also make a PR if you want.

Also, there's no test suite?
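
The OverflowError above comes down to the buffer length not fitting in a C int. A small sketch of that check in Python, using struct to read the platform's int width (the helper name fits_in_c_int is hypothetical):

```python
import struct

# Width of a C int on this platform, in bits (32 on mainstream platforms).
INT_BITS = struct.calcsize("i") * 8
INT_MAX = 2 ** (INT_BITS - 1) - 1
INT_MIN = -INT_MAX - 1


def fits_in_c_int(n: int) -> bool:
    """Return True if n can be stored in a signed C int without overflow."""
    return INT_MIN <= n <= INT_MAX


for size in (2**30, 2**31, 2**32):
    print(size, fits_in_c_int(size))
```

With a 32-bit int, a length of 2**30 fits while 2**31 and 2**32 do not, which matches the behavior reported above.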

Port MurmurHash3 from C++ to C

In the course of implementing hashlib-compliant interfaces (#39), I plan to port the main code of MurmurHash3 from C++ (as originally written by Austin Appleby) to C for portability.

This actually can be done with few hassles, thanks to PEP 7 updates for Python >= 3.6; versions before 3.6 had to conform to C89 and did not officially support <stdint.h> or <inttypes.h>.

In addition, I will relicense this code from the public domain to MIT. The intent is purely to resolve issues related to the public domain and similar licenses (#43). The text of the original public domain notice will be left for attribution and recognition.

Hash function doesn't recognize signed as a keyword argument

Hi there,

I freshly downloaded mmh3 through Anaconda, and the mmh3.hash function raises the following error when I run "element= mmh3.hash(element,signed =False)".

TypeError: 'signed' is an invalid keyword argument for this function

Is this an issue with the Anaconda version or something else?
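
If the installed build does not accept the signed keyword, the conversion can be done by hand. The sketch below assumes the signed and unsigned results are two views of the same 32 bits (two's complement), which is consistent with the README values for mmh3.hash("foo"). The helper names are hypothetical.

```python
def to_unsigned32(value: int) -> int:
    """Reinterpret a signed 32-bit integer as unsigned."""
    return value & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit integer as signed."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


# -156908512 is the README's signed result for mmh3.hash("foo").
print(to_unsigned32(-156908512))
```

In practice this would wrap the call, for example to_unsigned32(mmh3.hash(element)).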

Typing: hash64 missing x64arch argument

The code for mmh3_hash64 in mmh3module.c has a x64arch argument, but the typing file __init__.pyi does not declare this as an argument to hash64. The result is a mypy error if code uses the x64arch keyword argument.

Does not compile in OSX Mavericks

$ pip install mmh3
Downloading/unpacking mmh3
  Running setup.py egg_info for package mmh3

Installing collected packages: mmh3
  Running setup.py install for mmh3
    building 'mmh3' extension
    cc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -arch i386 -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c mmh3module.cpp -o build/temp.macosx-10.9-intel-2.7/mmh3module.o
    clang: error: unknown argument: '-mno-fused-madd' [-Wunused-command-line-argument-hard-error-in-future]
    clang: note: this will be a hard error (cannot be downgraded to a warning) in the future
    error: command 'cc' failed with exit status 1
    Complete output from command /Users/andre/work/penv/discosite/bin/python -c "import setuptools;__file__='/Users/andre/work/penv/discosite/build/mmh3/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --single-version-externally-managed --record /var/folders/xq/yt9mr8t52cj9dqjsn8v0x5z40000gn/T/pip-kAdgDV-record/install-record.txt --install-headers /Users/andre/work/penv/discosite/include/site/python2.7:
    running install

running build

running build_ext

building 'mmh3' extension

cc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -arch i386 -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c mmh3module.cpp -o build/temp.macosx-10.9-intel-2.7/mmh3module.o

clang: error: unknown argument: '-mno-fused-madd' [-Wunused-command-line-argument-hard-error-in-future]

clang: note: this will be a hard error (cannot be downgraded to a warning) in the future

error: command 'cc' failed with exit status 1

----------------------------------------
Command /Users/andre/work/penv/discosite/bin/python -c "import setuptools;__file__='/Users/andre/work/penv/discosite/build/mmh3/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --single-version-externally-managed --record /var/folders/xq/yt9mr8t52cj9dqjsn8v0x5z40000gn/T/pip-kAdgDV-record/install-record.txt --install-headers /Users/andre/work/penv/discosite/include/site/python2.7 failed with error code 1 in /Users/andre/work/penv/discosite/build/mmh3
Storing complete log in /Users/andre/.pip/pip.log

mmh3

I also ran into a problem building mmh3 and have been struggling to find a solution. The errors reported here are not the same as mine. Can you help me?
——————————————————————————————————————————
1 warning generated.
creating build/lib.macosx-10.6-intel-3.6
/usr/bin/clang++ -bundle -undefined dynamic_lookup -arch i386 -arch x86_64 -g -L/usr/local/opt/openssl/lib -I/usr/local/opt/openssl/include build/temp.macosx-10.6-intel-3.6/mmh3module.o build/temp.macosx-10.6-intel-3.6/MurmurHash3.o -o build/lib.macosx-10.6-intel-3.6/mmh3.cpython-36m-darwin.so
clang: warning: libstdc++ is deprecated; move to libc++ with a minimum deployment target of OS X 10.9 [-Wdeprecated]
clang: warning: libstdc++ is deprecated; move to libc++ with a minimum deployment target of OS X 10.9 [-Wdeprecated]
ld: library not found for -lstdc++
clang: error: linker command failed with exit code 1 (use -v to see invocation)
error: command '/usr/bin/clang++' failed with exit status 1

mmh3 fails to install (via pip) on macOS Mojave

Collecting mmh3
  Using cached mmh3-2.5.1.tar.gz (9.8 kB)
Building wheels for collected packages: mmh3
  Building wheel for mmh3 (setup.py) ... error
  ERROR: Command errored out with exit status 1:
   command: /Users/aeaeaeae/venv/dslearn/bin/python3.7 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/20/9yq706c906g0l3dc2cxpm2xw0000gn/T/pip-install-efejad8_/mmh3/setup.py'"'"'; __file__='"'"'/private/var/folders/20/9yq706c906g0l3dc2cxpm2xw0000gn/T/pip-install-efejad8_/mmh3/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /private/var/folders/20/9yq706c906g0l3dc2cxpm2xw0000gn/T/pip-wheel-ks30qw3g
       cwd: /private/var/folders/20/9yq706c906g0l3dc2cxpm2xw0000gn/T/pip-install-efejad8_/mmh3/
  Complete output (12 lines):
  running bdist_wheel
  running build
  running build_ext
  building 'mmh3' extension
  creating build
  creating build/temp.macosx-10.9-x86_64-3.7
  gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -arch x86_64 -g -I/Library/Frameworks/Python.framework/Versions/3.7/include/python3.7m -c mmh3module.cpp -o build/temp.macosx-10.9-x86_64-3.7/mmh3module.o
  mmh3module.cpp:12:19: fatal error: stdio.h: No such file or directory
   #include <stdio.h>
                     ^
  compilation terminated.
  error: command 'gcc' failed with exit status 1
  ----------------------------------------
  ERROR: Failed building wheel for mmh3
  Running setup.py clean for mmh3
Failed to build mmh3
Installing collected packages: mmh3
    Running setup.py install for mmh3 ... error
    ERROR: Command errored out with exit status 1:
     command: /Users/aeaeaeae/venv/dslearn/bin/python3.7 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/20/9yq706c906g0l3dc2cxpm2xw0000gn/T/pip-install-efejad8_/mmh3/setup.py'"'"'; __file__='"'"'/private/var/folders/20/9yq706c906g0l3dc2cxpm2xw0000gn/T/pip-install-efejad8_/mmh3/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/20/9yq706c906g0l3dc2cxpm2xw0000gn/T/pip-record-t_fjan2e/install-record.txt --single-version-externally-managed --compile --install-headers /Users/aeaeaeae/venv/dslearn/bin/../include/site/python3.7/mmh3
         cwd: /private/var/folders/20/9yq706c906g0l3dc2cxpm2xw0000gn/T/pip-install-efejad8_/mmh3/
    Complete output (12 lines):
    running install
    running build
    running build_ext
    building 'mmh3' extension
    creating build
    creating build/temp.macosx-10.9-x86_64-3.7
    gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -arch x86_64 -g -I/Library/Frameworks/Python.framework/Versions/3.7/include/python3.7m -c mmh3module.cpp -o build/temp.macosx-10.9-x86_64-3.7/mmh3module.o
    mmh3module.cpp:12:19: fatal error: stdio.h: No such file or directory
     #include <stdio.h>
                       ^
    compilation terminated.
    error: command 'gcc' failed with exit status 1
    ----------------------------------------
ERROR: Command errored out with exit status 1: /Users/aeaeaeae/venv/dslearn/bin/python3.7 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/20/9yq706c906g0l3dc2cxpm2xw0000gn/T/pip-install-efejad8_/mmh3/setup.py'"'"'; __file__='"'"'/private/var/folders/20/9yq706c906g0l3dc2cxpm2xw0000gn/T/pip-install-efejad8_/mmh3/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/20/9yq706c906g0l3dc2cxpm2xw0000gn/T/pip-record-t_fjan2e/install-record.txt --single-version-externally-managed --compile --install-headers /Users/aeaeaeae/venv/dslearn/bin/../include/site/python3.7/mmh3 Check the logs for full command output.
WARNING: You are using pip version 20.1.1; however, version 20.2.2 is available.
You should consider upgrading via the '/Users/aeaeaeae/venv/dslearn/bin/python3.7 -m pip install --upgrade pip' command.

hash_bytes broken in 2.5

It looks like this change broke the hash_bytes function in 2.5:

3bf1e5a - Add a keyword argument signed (2 days ago) <Hajime Senuma> 
diff --git a/mmh3module.cpp b/mmh3module.cpp
index ef49083..ba771ee 100644
--- a/mmh3module.cpp
+++ b/mmh3module.cpp
@@ -143,7 +155,7 @@ mmh3_hash_bytes(PyObject *self, PyObject *args, PyObject *keywds)
     static char *kwlist[] = {(char *)"key", (char *)"seed",
       (char *)"x64arch", NULL};
 
-    if (!PyArg_ParseTupleAndKeywords(args, keywds, "s#|IB", kwlist,
+    if (!PyArg_ParseTupleAndKeywords(args, keywds, "s#|IBB", kwlist,
         &target_str, &target_str_len, &seed, &x64arch)) {
         return NULL;

There is no additional is_signed kwarg. I'm not sure if there should be. The result is this error every time the hash_bytes function is used:

RuntimeError: more argument specifiers than keyword list entries (remaining format:'B')

Switch license from CC0 to MIT

I plan to switch the license of this project from CC0 to MIT in the very near future.

The adoption of CC0 was an homage to Austin Appleby, the inventor of the MurmurHash3 algorithm, who published the code under the public domain.

However, CC0 is not recognized as an OSI-approved license, as it was withdrawn from the review process in 2012. Besides, in 2022, the Fedora community said they planned to demote the status of CC0 from "good" to "allowed-content only".

Considering these issues, I have decided to adopt the MIT License, a simple license that is also one of the most popular OSI-approved permissive licenses.

Unable to build mmh3 on macOS Mojave

OS X Version: 10.14.2

$ pip install mmh3
Collecting mmh3
  Using cached https://files.pythonhosted.org/packages/fa/7e/3ddcab0a9fcea034212c02eb411433db9330e34d626360b97333368b4052/mmh3-2.5.1.tar.gz
Building wheels for collected packages: mmh3
  Running setup.py bdist_wheel for mmh3 ... error
  Complete output from command /Users/pranjal/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/jd/r4rs3fq51qv90t08ljqpkkhc0000gp/T/pip-install-qi111dnd/mmh3/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /private/var/folders/jd/r4rs3fq51qv90t08ljqpkkhc0000gp/T/pip-wheel-7q_m8vvr --python-tag cp36:
  running bdist_wheel
  running build
  running build_ext
  building 'mmh3' extension
  creating build
  creating build/temp.macosx-10.7-x86_64-3.6
  gcc -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -Qunused-arguments -Qunused-arguments -I/Users/pranjal/anaconda3/include/python3.6m -c mmh3module.cpp -o build/temp.macosx-10.7-x86_64-3.6/mmh3module.o
  warning: include path for stdlibc++ headers not found; pass '-std=libc++' on the command line to use the libc++ standard library instead [-Wstdlibcxx-not-found]
  1 warning generated.
  gcc -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -Qunused-arguments -Qunused-arguments -I/Users/pranjal/anaconda3/include/python3.6m -c MurmurHash3.cpp -o build/temp.macosx-10.7-x86_64-3.6/MurmurHash3.o
  warning: include path for stdlibc++ headers not found; pass '-std=libc++' on the command line to use the libc++ standard library instead [-Wstdlibcxx-not-found]
  1 warning generated.
  creating build/lib.macosx-10.7-x86_64-3.6
  g++ -bundle -undefined dynamic_lookup -L/Users/pranjal/anaconda3/lib -arch x86_64 -L/Users/pranjal/anaconda3/lib -arch x86_64 -Qunused-arguments -Qunused-arguments -arch x86_64 build/temp.macosx-10.7-x86_64-3.6/mmh3module.o build/temp.macosx-10.7-x86_64-3.6/MurmurHash3.o -o build/lib.macosx-10.7-x86_64-3.6/mmh3.cpython-36m-darwin.so
  clang: warning: libstdc++ is deprecated; move to libc++ with a minimum deployment target of OS X 10.9 [-Wdeprecated]
  ld: library not found for -lstdc++
  clang: error: linker command failed with exit code 1 (use -v to see invocation)
  error: command 'g++' failed with exit status 1

  ----------------------------------------
  Failed building wheel for mmh3
  Running setup.py clean for mmh3
Failed to build mmh3
Installing collected packages: mmh3
  Running setup.py install for mmh3 ... error
    Complete output from command /Users/pranjal/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/jd/r4rs3fq51qv90t08ljqpkkhc0000gp/T/pip-install-qi111dnd/mmh3/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/var/folders/jd/r4rs3fq51qv90t08ljqpkkhc0000gp/T/pip-record-38bwkbu3/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_ext
    building 'mmh3' extension
    creating build
    creating build/temp.macosx-10.7-x86_64-3.6
    gcc -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -Qunused-arguments -Qunused-arguments -I/Users/pranjal/anaconda3/include/python3.6m -c mmh3module.cpp -o build/temp.macosx-10.7-x86_64-3.6/mmh3module.o
    warning: include path for stdlibc++ headers not found; pass '-std=libc++' on the command line to use the libc++ standard library instead [-Wstdlibcxx-not-found]
    1 warning generated.
    gcc -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -Qunused-arguments -Qunused-arguments -I/Users/pranjal/anaconda3/include/python3.6m -c MurmurHash3.cpp -o build/temp.macosx-10.7-x86_64-3.6/MurmurHash3.o
    warning: include path for stdlibc++ headers not found; pass '-std=libc++' on the command line to use the libc++ standard library instead [-Wstdlibcxx-not-found]
    1 warning generated.
    creating build/lib.macosx-10.7-x86_64-3.6
    g++ -bundle -undefined dynamic_lookup -L/Users/pranjal/anaconda3/lib -arch x86_64 -L/Users/pranjal/anaconda3/lib -arch x86_64 -Qunused-arguments -Qunused-arguments -arch x86_64 build/temp.macosx-10.7-x86_64-3.6/mmh3module.o build/temp.macosx-10.7-x86_64-3.6/MurmurHash3.o -o build/lib.macosx-10.7-x86_64-3.6/mmh3.cpython-36m-darwin.so
    clang: warning: libstdc++ is deprecated; move to libc++ with a minimum deployment target of OS X 10.9 [-Wdeprecated]
    ld: library not found for -lstdc++
    clang: error: linker command failed with exit code 1 (use -v to see invocation)
    error: command 'g++' failed with exit status 1

    ----------------------------------------
Command "/Users/pranjal/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/jd/r4rs3fq51qv90t08ljqpkkhc0000gp/T/pip-install-qi111dnd/mmh3/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/var/folders/jd/r4rs3fq51qv90t08ljqpkkhc0000gp/T/pip-record-38bwkbu3/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/folders/jd/r4rs3fq51qv90t08ljqpkkhc0000gp/T/pip-install-qi111dnd/mmh3/

Can not compile in windows

C:\Users\10324\Downloads\mmh3-master\mmh3-master>python setup.py build
running build
running build_ext
building 'mmh3' extension
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Users\10324\AppData\Local\Programs\Python\Python36\include -IC:\Users\10324\AppData\Local\Programs\Python\Python36\include "-IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\8.1\include\shared" "-IC:\Program Files (x86)\Windows Kits\8.1\include\um" "-IC:\Program Files (x86)\Windows Kits\8.1\include\winrt" /EHsc /Tpmmh3module.cpp /Fobuild\temp.win-amd64-3.6\Release\mmh3module.obj
mmh3module.cpp
c:\users\10324\downloads\mmh3-master\mmh3-master\MurmurHash3.h(16): error C2371: 'uint32_t': redefinition; different basic types
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE\stdint.h(23): note: see declaration of 'uint32_t'
mmh3module.cpp(14): error C2371: 'int32_t': redefinition; different basic types
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE\stdint.h(19): note: see declaration of 'int32_t'
mmh3module.cpp(17): error C2371: 'uint32_t': redefinition; different basic types
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE\stdint.h(23): note: see declaration of 'uint32_t'
error: command 'C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe' failed with exit status 2

Run at Win10 64 bit

Support for hashing data > 16GB

FORCE_INLINE uint64_t getblock ( const uint64_t * p, int i ) uses a 32-bit integer i as its index, which fails when hashing huge blocks. It should be changed to FORCE_INLINE uint64_t getblock ( const uint64_t * p, Py_ssize_t i ).

Here is the pull request:
#34
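
One reading of where the 16 GB figure comes from, as a worked calculation: a signed 32-bit index can address at most 2**31 elements, and each uint64_t element is 8 bytes. The helper name below is hypothetical, and the 32-bit width of int is an assumption about the target platform.

```python
INT32_MAX = 2**31 - 1  # assumed maximum of a C int


def max_indexable_bytes(element_size: int, index_max: int = INT32_MAX) -> int:
    """Bytes reachable through indices 0..index_max for a given element size."""
    return (index_max + 1) * element_size


GIB = 1024**3
print(max_indexable_bytes(8) // GIB, "GiB reachable via uint64_t blocks")
```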

Make hash functions endian-neutral

The original C++ code of MurmurHash3 by Austin Appleby is endian-sensitive. The advantage of this style is, first and foremost, performance.

However, inconsistency between platforms may cause problems in various fields, e.g., NLP (cf. explosion/murmurhash#26).

In addition, several IoT search engines (including Shodan) use the little-endian variant of the mmh3 value as the fingerprint of a favicon.

To guarantee portability and consistency across platforms, mmh3 will use little-endian variant values for all architectures from version 4.0.0, even though this will make the hash functions slower on big-endian architectures.
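
As an illustration of what endian-sensitivity means for block reads, the sketch below interprets the same four bytes as a 32-bit unsigned integer under each byte order, using the standard struct module. It is not the mmh3 implementation; the function names are hypothetical.

```python
import struct
import sys


def read_u32_le(data: bytes, offset: int = 0) -> int:
    """Read a 32-bit unsigned integer as little-endian, on any platform."""
    return struct.unpack_from("<I", data, offset)[0]


def read_u32_be(data: bytes, offset: int = 0) -> int:
    """Read a 32-bit unsigned integer as big-endian, on any platform."""
    return struct.unpack_from(">I", data, offset)[0]


def read_u32_native(data: bytes, offset: int = 0) -> int:
    """Read using the host byte order, as endian-sensitive code would."""
    return struct.unpack_from("=I", data, offset)[0]


block = b"\x01\x00\x00\x00"
print(sys.byteorder, read_u32_le(block), read_u32_be(block), read_u32_native(block))
```

A native read returns different numbers on little-endian and big-endian hosts for the same input bytes, so a hash built on native reads differs across platforms. Always decoding as little-endian removes that difference.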

Chained hashing not working as expected

Hello Hajime,

I guess I am doing something wrong, so this is probably not a real issue.
I am trying to hash big files and am reading them in chunks for obvious reasons.
As a test i ran the following:

>>> mmh3.hash128('foobar', 0, signed = True)
155033341411922636178181560508455868997
>>> mmh3.hash128('bar',mmh3.hash128('foo', 0,signed = True), signed = True)
144772797738558108830387305245635675932

I expected the hash to be the same in both cases.
Am I misinterpreting the seed value, or is there another way of chaining hashes in murmur in general?

Thanks & Regards,

Martin
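
For context, a seed only sets the initial state; a finished hash value has already been through finalization, so feeding it back in as a seed is not the same as continuing over more input. Hashing in chunks needs a hasher object that keeps its internal state between calls. The sketch below shows that pattern with the standard library's hashlib (SHA-256 is a stand-in here, not MurmurHash3); whether a given mmh3 version offers an equivalent update()-style interface is not assumed.

```python
import hashlib
import io
from typing import BinaryIO


def digest_in_chunks(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream incrementally, reading chunk_size bytes at a time."""
    hasher = hashlib.sha256()  # stand-in for any hasher with update()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        hasher.update(chunk)  # internal state carries over between chunks
    return hasher.hexdigest()


# Hashing "foobar" in 3-byte chunks matches hashing it in one call.
chunked = digest_in_chunks(io.BytesIO(b"foobar"), chunk_size=3)
print(chunked == hashlib.sha256(b"foobar").hexdigest())
```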

SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats

Hi, I got this error under python 3.10

Python 3.10.10 (main, Mar 21 2023, 18:45:11) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import mmh3
>>> a=mmh3.hash("abc", 1234)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats
>>>

Any solution or code hacking?

Status of project?

Just a friendly check, if there is still anyone here maintaining this project or if anyone knows more about the project status.

  • Last commit to project was almost 2 years ago.
  • There are issues that have gone without any response for a year, and some open one-line PRs that should be easy to merge have been open for more than 6 months. One example is #35, whose absence now makes Python version upgrades more difficult if you are not building native modules yourself.
  • Hajime, the project owner, has not had any Github activity at all for over one year.
  • I sent an email to Hajime asking about the future plans, one week ago, and have not received any response. (Please others, let that email and this issue be the only channel of such reminders to avoid nagging the owner.)

While the code itself could be considered "done", the wish for prebuilt wheels will continue.
What is the recommended way forward for users of mmh3? Build your own wheels? Alternative modules? Alternative distributions?

Does anyone know more?

Unable to build on mmh

It looks like a previous issue resurfaced. Downloading through pip, I'm unable to build a wheel.

