matsui528 / rii
Fast and memory-efficient ANN with a subset-search functionality
License: MIT License
When uploading the files generated by python -m build to PyPI, I received the following error. How can I fix it?
Summary:
Checking dist/rii-0.2.11-cp312-cp312-linux_x86_64.whl: PASSED
Checking dist/rii-0.2.11.tar.gz: PASSED
Uploading distributions to https://upload.pypi.org/legacy/
Uploading rii-0.2.11-cp312-cp312-linux_x86_64.whl
100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.9/1.9 MB • 00:00 • 3.8 MB/s
WARNING Error during upload. Retry with the --verbose option for more details.
ERROR HTTPError: 400 Bad Request from https://upload.pypi.org/legacy/
Binary wheel 'rii-0.2.11-cp312-cp312-linux_x86_64.whl' has an
unsupported platform tag 'linux_x86_64'.
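For context, and as general packaging background rather than something stated in this thread: PyPI rejects Linux wheels whose platform tag is a plain linux_* tag and accepts manylinux or musllinux tags instead. Such wheels are usually produced by running auditwheel repair on the built wheel or by building with cibuildwheel; uploading only the .tar.gz sdist is another option. Below is a minimal sketch for checking a dist/ filename before upload. The helper names platform_tags and has_bare_linux_tag are made up for this illustration.

```python
def platform_tags(wheel_filename: str) -> list[str]:
    """Return the platform tag(s) encoded in a wheel filename."""
    if not wheel_filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {wheel_filename}")
    stem = wheel_filename[: -len(".whl")]
    # Layout: name-version[-build]-python-abi-platform, so the platform is last.
    parts = stem.split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {wheel_filename}")
    # A compressed tag set joins several platform tags with dots.
    return parts[-1].split(".")


def has_bare_linux_tag(wheel_filename: str) -> bool:
    """True if any platform tag is a plain linux_* tag (not manylinux/musllinux)."""
    return any(tag.startswith("linux_") for tag in platform_tags(wheel_filename))


print(has_bare_linux_tag("rii-0.2.11-cp312-cp312-linux_x86_64.whl"))
```

A wheel for which has_bare_linux_tag returns True would need retagging (or be left out of the upload) before running twine.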
pyproject.toml was added in #54, following the official pybind guideline.
Building dist with python setup.py sdist is now deprecated, so I replaced it with python -m build (#60).
python -m build generates a file named something like rii-0.2.11-cp312-cp312-linux_x86_64.whl, but PyPI doesn't accept this name. Should it be a simpler filename such as rii-0.2.10.tar.gz? Ref: previous successful file
Pip installs fail as of pip version 23.1 due to missing dependencies at build time. A pyproject.toml should be added to specify setuptools and pybind.
PR incoming.
Some examples of application.
For example meta-data search by pandas and feature search by Rii (Fig. 5 in the paper):
import pickle as pkl
import numpy as np
import pandas as pd
import rii
# Read data
df = pd.read_csv('metadata.csv')
with open('rii_densenet.pkl', 'rb') as f:
    engine = pkl.load(f)
# Metadata search
S = df[(df['data'] < 500) & (df['country'] == 'Egypt')]['ID']
S = np.sort(np.array(S))  # Target identifiers
# ANN for subset
q = ...  # Placeholder: read the query feature here
result = engine.query(q=q, target_ids=S, topk=3)
Currently, we cannot construct the Rii class by adding data sequentially. Should support something like:
e = rii.Rii(codec)
for x in X:
    e.add_reconfigure(vecs=x.reshape(1, -1))
/opt/homebrew/Cellar/llvm/17.0.3/lib/clang/17/include/immintrin.h:14:2: error: "This header is only meant to be used on x86 and x64 architecture"
14 | #error "This header is only meant to be used on x86 and x64 architecture"
Using clang version 17, MacOS 13.5.2, M2 Max CPU, Python 3.11.
Installing rii on a Mac with an M2 CPU fails; I assume this happens with other ARM CPUs as well. Working on a PR.
The README says Rii only works on Linux. I've used it on a Mac with no issues, and I'm now trying to use it on Windows for a client but am getting a build error. Is there a way to build it on Windows (instructions, please), or are there any plans for Windows or multi-platform support in the near future?
Under the Data addition and reconfiguration section in Readme.md:
# Add new vectors
X2 = np.random.random((1000, D)).astype(np.float32)
e.add(vecs=X2) # Now N is 11000
e.query(q=q) # Ok. (0.12 msec / query)
# However, if you add quite a lot of vectors, the search might become slower
# because the data structure has been optimized for the initial item size (N=10000)
X3 = np.random.random((1000000, D)).astype(np.float32)
e.add(vecs=X3) # A lot. Now N is 1011000
e.query(q=q) # Slower (0.96 msec/query)
The e.query call triggers an AssertionError when it runs before reconfigure or add_configure has been called:
275 """
276 assert 0 < self.N # Make sure there are codes to be searched
--> 277 assert 0 < self.nlist # Make sure posting lists are available
278 assert method in ["auto", "linear", "ivf"]
279
AssertionError:
Should this be updated to use add_configure instead of add before the query call?
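To make the failure mode concrete, here is a toy model of the two assertions visible in the traceback. It is not rii's implementation: ToyIndex and its methods only track the counters N and nlist, on the assumption that add alone leaves nlist at 0 until some configuration step runs.

```python
class ToyIndex:
    """Toy stand-in that only tracks N and nlist; NOT rii's implementation."""

    def __init__(self):
        self.N = 0      # number of stored items
        self.nlist = 0  # number of posting lists (0 means "not configured yet")

    def add(self, n_items):
        # Store items only; posting lists are left untouched (assumption).
        self.N += n_items

    def reconfigure(self, nlist):
        assert 0 < self.N  # nothing to configure without data
        self.nlist = nlist

    def add_configure(self, n_items, nlist):
        self.add(n_items)
        self.reconfigure(nlist)

    def query(self):
        assert 0 < self.N      # same checks as in the traceback
        assert 0 < self.nlist
        return "ok"


e = ToyIndex()
e.add(1000)
try:
    e.query()
    query_before_configure = "ok"
except AssertionError:
    query_before_configure = "AssertionError"

e.reconfigure(nlist=10)
query_after_configure = e.query()
print(query_before_configure, query_after_configure)
```

In this toy, running the snippet from a fresh instance reproduces the assertion, while calling add_configure (or reconfigure) first does not.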
Some benchmarking scripts for SIFT1M/1B and Deep1B
It would be great if someone could make a logo for this repo :)
pip install rii -- fails on osx with clang error.
(.venv) 🦖loose-fit$ pip install rii
Collecting rii
Using cached https://files.pythonhosted.org/packages/0a/d5/9e7a32e612b7414c272c0277add9494065e022353d1500963ec2ff0cc7f5/rii-0.2.6.tar.gz
Requirement already satisfied: pybind11>=2.3 in ./.venv/lib/python3.7/site-packages (from rii) (2.4.3)
Requirement already satisfied: nanopq in ./.venv/lib/python3.7/site-packages (from rii) (0.1.8)
Requirement already satisfied: numpy in ./.venv/lib/python3.7/site-packages (from nanopq->rii) (1.18.0)
Requirement already satisfied: scipy in ./.venv/lib/python3.7/site-packages (from nanopq->rii) (1.4.1)
Building wheels for collected packages: rii
Building wheel for rii (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: <path>/.venv/bin/python3.7 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/vt/ybml8dls2v54ywpws5zpfqgm0000gn/T/pip-install-d76_zncf/rii/setup.py'"'"'; __file__='"'"'/private/var/folders/vt/ybml8dls2v54ywpws5zpfqgm0000gn/T/pip-install-d76_zncf/rii/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /private/var/folders/vt/ybml8dls2v54ywpws5zpfqgm0000gn/T/pip-wheel-hwefsd10 --python-tag cp37
cwd: /private/var/folders/vt/ybml8dls2v54ywpws5zpfqgm0000gn/T/pip-install-d76_zncf/rii/
Complete output (47 lines):
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-10.14-x86_64-3.7
creating build/lib.macosx-10.14-x86_64-3.7/tests
copying tests/__init__.py -> build/lib.macosx-10.14-x86_64-3.7/tests
copying tests/test_rii.py -> build/lib.macosx-10.14-x86_64-3.7/tests
copying tests/context.py -> build/lib.macosx-10.14-x86_64-3.7/tests
creating build/lib.macosx-10.14-x86_64-3.7/rii
copying rii/rii.py -> build/lib.macosx-10.14-x86_64-3.7/rii
copying rii/__init__.py -> build/lib.macosx-10.14-x86_64-3.7/rii
running build_ext
creating var
creating var/folders
creating var/folders/vt
creating var/folders/vt/ybml8dls2v54ywpws5zpfqgm0000gn
creating var/folders/vt/ybml8dls2v54ywpws5zpfqgm0000gn/T
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk/usr/include -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk/System/Library/Frameworks/Tk.framework/Versions/8.5/Headers -I/usr/local/opt/llvm/include -I/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/include/python3.7m -c /var/folders/vt/ybml8dls2v54ywpws5zpfqgm0000gn/T/tmp8p7ifavc.cpp -o var/folders/vt/ybml8dls2v54ywpws5zpfqgm0000gn/T/tmp8p7ifavc.o -std=c++17
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk/usr/include -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk/System/Library/Frameworks/Tk.framework/Versions/8.5/Headers -I/usr/local/opt/llvm/include -I/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/include/python3.7m -c /var/folders/vt/ybml8dls2v54ywpws5zpfqgm0000gn/T/tmpgibfqmnr.cpp -o var/folders/vt/ybml8dls2v54ywpws5zpfqgm0000gn/T/tmpgibfqmnr.o -fvisibility=hidden
building 'main' extension
creating build/temp.macosx-10.14-x86_64-3.7
creating build/temp.macosx-10.14-x86_64-3.7/src
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk/usr/include -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk/System/Library/Frameworks/Tk.framework/Versions/8.5/Headers -I/usr/local/opt/llvm/include -UNDEBUG -I<path>/.venv/bin/../include/site/python3.7 -I<path>/.venv/bin/../include/site/python3.7 -I/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/include/python3.7m -c src/main.cpp -o build/temp.macosx-10.14-x86_64-3.7/src/main.o -stdlib=libc++ -mmacosx-version-min=10.7 -DVERSION_INFO="0.2.6" -std=c++17 -fvisibility=hidden -march=native -mtune=native -Ofast
In file included from src/main.cpp:1:
In file included from <path>/.venv/bin/../include/site/python3.7/pybind11/pybind11.h:44:
In file included from <path>/.venv/bin/../include/site/python3.7/pybind11/attr.h:13:
<path>/.venv/bin/../include/site/python3.7/pybind11/cast.h:579:34: error: aligned allocation function of type 'void *(std::size_t, std::align_val_t)' is only available on macOS 10.14 or newer
vptr = ::operator new(type->type_size,
^
<path>/.venv/bin/../include/site/python3.7/pybind11/cast.h:579:34: note: if you supply your own aligned allocation functions, use -faligned-allocation to silence this diagnostic
In file included from src/main.cpp:1:
<path>/.venv/bin/../include/site/python3.7/pybind11/pybind11.h:1008:11: error: 'operator delete' is unavailable: introduced in macOS 10.12
::operator delete(p, s, std::align_val_t(a));
^
/usr/local/opt/llvm/bin/../include/c++/v1/new:208:74: note: 'operator delete' has been explicitly marked unavailable here
_LIBCPP_OVERRIDABLE_FUNC_VIS _LIBCPP_AVAILABILITY_SIZED_NEW_DELETE void operator delete(void* __p, std::size_t __sz, std::align_val_t) _NOEXCEPT;
^
In file included from src/main.cpp:1:
<path>/.venv/bin/../include/site/python3.7/pybind11/pybind11.h:1010:11: error: 'operator delete' is unavailable: introduced in macOS 10.12
::operator delete(p, s);
^
/usr/local/opt/llvm/bin/../include/c++/v1/new:191:74: note: 'operator delete' has been explicitly marked unavailable here
_LIBCPP_OVERRIDABLE_FUNC_VIS _LIBCPP_AVAILABILITY_SIZED_NEW_DELETE void operator delete(void* __p, std::size_t __sz) _NOEXCEPT;
^
3 errors generated.
error: command 'clang' failed with exit status 1
----------------------------------------
ERROR: Failed building wheel for rii
Running setup.py clean for rii
Failed to build rii
Installing collected packages: rii
Running setup.py install for rii ... error
ERROR: Command errored out with exit status 1:
command: <path>/.venv/bin/python3.7 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/vt/ybml8dls2v54ywpws5zpfqgm0000gn/T/pip-install-d76_zncf/rii/setup.py'"'"'; __file__='"'"'/private/var/folders/vt/ybml8dls2v54ywpws5zpfqgm0000gn/T/pip-install-d76_zncf/rii/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/vt/ybml8dls2v54ywpws5zpfqgm0000gn/T/pip-record-oy9z3y_v/install-record.txt --single-version-externally-managed --compile --install-headers <path>/.venv/bin/../include/site/python3.7/rii
cwd: /private/var/folders/vt/ybml8dls2v54ywpws5zpfqgm0000gn/T/pip-install-d76_zncf/rii/
Complete output (42 lines):
running install
running build
running build_py
creating build
creating build/lib.macosx-10.14-x86_64-3.7
creating build/lib.macosx-10.14-x86_64-3.7/tests
copying tests/__init__.py -> build/lib.macosx-10.14-x86_64-3.7/tests
copying tests/test_rii.py -> build/lib.macosx-10.14-x86_64-3.7/tests
copying tests/context.py -> build/lib.macosx-10.14-x86_64-3.7/tests
creating build/lib.macosx-10.14-x86_64-3.7/rii
copying rii/rii.py -> build/lib.macosx-10.14-x86_64-3.7/rii
copying rii/__init__.py -> build/lib.macosx-10.14-x86_64-3.7/rii
running build_ext
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk/usr/include -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk/System/Library/Frameworks/Tk.framework/Versions/8.5/Headers -I/usr/local/opt/llvm/include -I/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/include/python3.7m -c /var/folders/vt/ybml8dls2v54ywpws5zpfqgm0000gn/T/tmpabpfuxpv.cpp -o var/folders/vt/ybml8dls2v54ywpws5zpfqgm0000gn/T/tmpabpfuxpv.o -std=c++17
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk/usr/include -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk/System/Library/Frameworks/Tk.framework/Versions/8.5/Headers -I/usr/local/opt/llvm/include -I/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/include/python3.7m -c /var/folders/vt/ybml8dls2v54ywpws5zpfqgm0000gn/T/tmpedhdv9js.cpp -o var/folders/vt/ybml8dls2v54ywpws5zpfqgm0000gn/T/tmpedhdv9js.o -fvisibility=hidden
building 'main' extension
creating build/temp.macosx-10.14-x86_64-3.7
creating build/temp.macosx-10.14-x86_64-3.7/src
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk/usr/include -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk/System/Library/Frameworks/Tk.framework/Versions/8.5/Headers -I/usr/local/opt/llvm/include -UNDEBUG -I<path>/.venv/bin/../include/site/python3.7 -I<path>/.venv/bin/../include/site/python3.7 -I/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/include/python3.7m -c src/main.cpp -o build/temp.macosx-10.14-x86_64-3.7/src/main.o -stdlib=libc++ -mmacosx-version-min=10.7 -DVERSION_INFO="0.2.6" -std=c++17 -fvisibility=hidden -march=native -mtune=native -Ofast
In file included from src/main.cpp:1:
In file included from <path>/.venv/bin/../include/site/python3.7/pybind11/pybind11.h:44:
In file included from <path>/.venv/bin/../include/site/python3.7/pybind11/attr.h:13:
<path>/.venv/bin/../include/site/python3.7/pybind11/cast.h:579:34: error: aligned allocation function of type 'void *(std::size_t, std::align_val_t)' is only available on macOS 10.14 or newer
vptr = ::operator new(type->type_size,
^
<path>/.venv/bin/../include/site/python3.7/pybind11/cast.h:579:34: note: if you supply your own aligned allocation functions, use -faligned-allocation to silence this diagnostic
In file included from src/main.cpp:1:
<path>/.venv/bin/../include/site/python3.7/pybind11/pybind11.h:1008:11: error: 'operator delete' is unavailable: introduced in macOS 10.12
::operator delete(p, s, std::align_val_t(a));
^
/usr/local/opt/llvm/bin/../include/c++/v1/new:208:74: note: 'operator delete' has been explicitly marked unavailable here
_LIBCPP_OVERRIDABLE_FUNC_VIS _LIBCPP_AVAILABILITY_SIZED_NEW_DELETE void operator delete(void* __p, std::size_t __sz, std::align_val_t) _NOEXCEPT;
^
In file included from src/main.cpp:1:
<path>/.venv/bin/../include/site/python3.7/pybind11/pybind11.h:1010:11: error: 'operator delete' is unavailable: introduced in macOS 10.12
::operator delete(p, s);
^
/usr/local/opt/llvm/bin/../include/c++/v1/new:191:74: note: 'operator delete' has been explicitly marked unavailable here
_LIBCPP_OVERRIDABLE_FUNC_VIS _LIBCPP_AVAILABILITY_SIZED_NEW_DELETE void operator delete(void* __p, std::size_t __sz) _NOEXCEPT;
^
3 errors generated.
error: command 'clang' failed with exit status 1
----------------------------------------
ERROR: Command errored out with exit status 1: <path>/.venv/bin/python3.7 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/vt/ybml8dls2v54ywpws5zpfqgm0000gn/T/pip-install-d76_zncf/rii/setup.py'"'"'; __file__='"'"'/private/var/folders/vt/ybml8dls2v54ywpws5zpfqgm0000gn/T/pip-install-d76_zncf/rii/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/vt/ybml8dls2v54ywpws5zpfqgm0000gn/T/pip-record-oy9z3y_v/install-record.txt --single-version-externally-managed --compile --install-headers <path>/.venv/bin/../include/site/python3.7/rii Check the logs for full command output.
Merge two instances (with the same fine_quantizer), e.g.:
import copy
e0 = rii.Rii(codec)
e1 = copy.deepcopy(e0)
e0.add_configure(X1)
e1.add_configure(X2)
e0.merge(e1)  # e0 now contains both X1 and X2
With this, one can add data to each instance in parallel on different computers, and then merge them into the final instance.
Question: How should posting lists and coarse centers be handled? One answer is to use the first instance's (e0.coarse_centers) and discard those of e1 completely (in this case, running e0.reconfigure again after the merge would be recommended).
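A rough sketch of the bookkeeping such a merge would involve, in plain Python. It assumes that PQ codes are stored as one list per item, that both instances share the same fine quantizer (so their codes are directly comparable), and that identifiers are assigned sequentially. The function merge_codes is hypothetical and not part of rii.

```python
def merge_codes(codes0, codes1):
    """Append codes1 to codes0 and report how the IDs of codes1 change.

    Returns (merged_codes, id_map), where id_map maps an item's old ID in the
    second instance to its new ID in the merged instance.
    """
    offset = len(codes0)  # items of the second instance are appended at the end
    merged = list(codes0) + list(codes1)
    id_map = {old_id: old_id + offset for old_id in range(len(codes1))}
    return merged, id_map


merged, id_map = merge_codes([[1, 2], [3, 4]], [[5, 6]])
print(merged, id_map)
```

Posting lists are deliberately left out: under the answer proposed above, they would be rebuilt from the merged codes by a reconfigure step.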
Currently, an index for each item is automatically assigned sequentially. With a wrapper that has an index mapper, an arbitrary index could be assigned to each item, e.g.:
e = rii.RiiIDMap(codec)
vecs = np.random.random((3, D)).astype(np.float32)
ids = [1254, 23, 445]
e.add_configure(vecs=vecs, ids=ids)
This would support a remove function as well:
e.remove(id=23)
Ref: https://github.com/facebookresearch/faiss/wiki/Pre--and-post-processing#the-indexidmap
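The mapping part of such a wrapper can be sketched without touching the index itself. The class below is hypothetical (it is not rii.RiiIDMap): it only translates between user-supplied IDs and the sequential internal IDs, and marks removed items so they can be excluded from a search.

```python
class IDMap:
    """Maps arbitrary external IDs to sequential internal IDs (illustration only)."""

    def __init__(self):
        self._ext_to_int = {}
        self._int_to_ext = []  # position = internal ID; None marks a removed item

    def add(self, ext_ids):
        """Register external IDs and return the internal IDs assigned to them."""
        internal = []
        for ext in ext_ids:
            if ext in self._ext_to_int:
                raise KeyError(f"duplicate id: {ext}")
            idx = len(self._int_to_ext)
            self._ext_to_int[ext] = idx
            self._int_to_ext.append(ext)
            internal.append(idx)
        return internal

    def remove(self, ext_id):
        """Forget an external ID; its internal slot stays but is marked removed."""
        idx = self._ext_to_int.pop(ext_id)
        self._int_to_ext[idx] = None

    def live_internal_ids(self):
        """Internal IDs of all items that have not been removed, in ascending order."""
        return [i for i, ext in enumerate(self._int_to_ext) if ext is not None]

    def to_external(self, internal_ids):
        """Translate search results (internal IDs) back to external IDs."""
        return [self._int_to_ext[i] for i in internal_ids]


m = IDMap()
assigned = m.add([1254, 23, 445])
m.remove(23)
print(assigned, m.live_internal_ids(), m.to_external([2, 0]))
```

One possible design, offered only as an assumption: a wrapper could pass live_internal_ids() as target_ids to the subset search, so removed items are never returned without deleting anything from the underlying index.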
When installing the rii package for the first time via pip install rii, the package can be installed as usual, but we get the following error messages.
Collecting rii
Downloading https://files.pythonhosted.org/packages/37/79/379308df392bc07d1c51b288884c2e85ef193c67f7a6908181392778a4e0/rii-0.2.3.tar.gz
Collecting pybind11>=2.2 (from rii)
Downloading https://files.pythonhosted.org/packages/f2/7c/e71995e59e108799800cb0fce6c4b4927914d7eada0723dd20bae3b51786/pybind11-2.2.4-py2.py3-none-any.whl (145kB)
Collecting nanopq (from rii)
Downloading https://files.pythonhosted.org/packages/25/2b/afadc20d14bffe91560543826fee41c4d87ab540b2191a9d5dfdeb783304/nanopq-0.1.6-py3-none-any.whl
Requirement already satisfied: numpy in ./anaconda/lib/python3.6/site-packages (from nanopq->rii) (1.15.2)
Requirement already satisfied: scipy in ./anaconda/lib/python3.6/site-packages (from nanopq->rii) (1.1.0)
Building wheels for collected packages: rii
Running setup.py bdist_wheel for rii: started
Running setup.py bdist_wheel for rii: finished with status 'error'
Complete output from command /home/ubuntu/anaconda/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-q9a9hf6g/rii/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/pip-wheel-tnix8m2r --python-tag cp36:
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.6
creating build/lib.linux-x86_64-3.6/tests
copying tests/test_rii.py -> build/lib.linux-x86_64-3.6/tests
copying tests/context.py -> build/lib.linux-x86_64-3.6/tests
copying tests/__init__.py -> build/lib.linux-x86_64-3.6/tests
creating build/lib.linux-x86_64-3.6/rii
copying rii/rii.py -> build/lib.linux-x86_64-3.6/rii
copying rii/__init__.py -> build/lib.linux-x86_64-3.6/rii
running build_ext
creating tmp
gcc -pthread -B /home/ubuntu/anaconda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/ubuntu/anaconda/include/python3.6m -c /tmp/tmpgabfpgat.cpp -o tmp/tmpgabfpgat.o -std=c++14
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
gcc -pthread -B /home/ubuntu/anaconda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/ubuntu/anaconda/include/python3.6m -c /tmp/tmp1ecn6a1p.cpp -o tmp/tmp1ecn6a1p.o -fvisibility=hidden
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
building 'main' extension
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-q9a9hf6g/rii/setup.py", line 128, in <module>
zip_safe=False
File "/home/ubuntu/anaconda/lib/python3.6/site-packages/setuptools/__init__.py", line 140, in setup
return distutils.core.setup(**attrs)
File "/home/ubuntu/anaconda/lib/python3.6/distutils/core.py", line 148, in setup
dist.run_commands()
File "/home/ubuntu/anaconda/lib/python3.6/distutils/dist.py", line 955, in run_commands
self.run_command(cmd)
File "/home/ubuntu/anaconda/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/home/ubuntu/anaconda/lib/python3.6/site-packages/wheel/bdist_wheel.py", line 188, in run
self.run_command('build')
File "/home/ubuntu/anaconda/lib/python3.6/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/home/ubuntu/anaconda/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/home/ubuntu/anaconda/lib/python3.6/distutils/command/build.py", line 135, in run
self.run_command(cmd_name)
File "/home/ubuntu/anaconda/lib/python3.6/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/home/ubuntu/anaconda/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/home/ubuntu/anaconda/lib/python3.6/site-packages/setuptools/command/build_ext.py", line 78, in run
_build_ext.run(self)
File "/home/ubuntu/anaconda/lib/python3.6/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
_build_ext.build_ext.run(self)
File "/home/ubuntu/anaconda/lib/python3.6/distutils/command/build_ext.py", line 339, in run
self.build_extensions()
File "/tmp/pip-install-q9a9hf6g/rii/setup.py", line 112, in build_extensions
build_ext.build_extensions(self)
File "/home/ubuntu/anaconda/lib/python3.6/site-packages/Cython/Distutils/old_build_ext.py", line 194, in build_extensions
self.build_extension(ext)
File "/home/ubuntu/anaconda/lib/python3.6/site-packages/setuptools/command/build_ext.py", line 199, in build_extension
_build_ext.build_extension(self, ext)
File "/home/ubuntu/anaconda/lib/python3.6/distutils/command/build_ext.py", line 533, in build_extension
depends=ext.depends)
File "/home/ubuntu/anaconda/lib/python3.6/distutils/ccompiler.py", line 566, in compile
depends, extra_postargs)
File "/home/ubuntu/anaconda/lib/python3.6/distutils/ccompiler.py", line 341, in _setup_compile
pp_opts = gen_preprocess_options(macros, incdirs)
File "/home/ubuntu/anaconda/lib/python3.6/distutils/ccompiler.py", line 1075, in gen_preprocess_options
pp_opts.append("-I%s" % dir)
File "/tmp/pip-install-q9a9hf6g/rii/setup.py", line 30, in __str__
import pybind11
ModuleNotFoundError: No module named 'pybind11'
----------------------------------------
Failed building wheel for rii
Running setup.py clean for rii
Failed to build rii
Installing collected packages: pybind11, nanopq, rii
Running setup.py install for rii: started
Running setup.py install for rii: finished with status 'done'
Successfully installed nanopq-0.1.6 pybind11-2.2.4 rii-0.2.3
This doesn't happen again when reinstalling rii: pip uninstall rii nanopq pybind11 && pip install rii
We can reproduce it again by deleting the pip cache: pip uninstall rii nanopq pybind11 && rm -rf ~/.cache/pip/ && pip install rii
Any comments or PRs are highly welcome!
Need to include some headers for SIMD codes? PRs are welcome.
log: https://github.com/matsui528/rii/runs/3014401725?check_suite_focus=true
related: #39
Read-the-docs's autodoc (the function to automatically read the docstring inside codes) doesn't work. Any help?
We should see the docstring here, but cannot see anything.
https://rii.readthedocs.io/en/latest/source/api.html
The corresponding rst file is as follows. The original file is here.
API Reference
==============

.. automodule:: rii

Reconfigurable Inverted Index (Rii)
---------------------------------------

.. autoclass:: rii.Rii
    :members:
    :undoc-members:
Strangely, if we render the document manually, it works.
cd docs
pip install -r requirements.txt
make html
python -m http.server 8000
# Go to `docs/_build/html`
It might be better to explicitly define a codec first:
# Instantiate PQ codec (encoder/decoder)
codec = nanopq.PQ(M=32).fit(vecs=Xt)
# Instantiate with M=32 sub-spaces
e = rii.Rii(fine_quantizer=codec)
In addition, add_configure would be a more appropriate name than add_reconfigure, since it is not a "re"configuration.
# Add vectors
e.add_configure(vecs=X)
Suppose e.fit(X) fails, for example because the number of vectors is less than 256:
~/usr/src/anaconda/anaconda3/envs/sandbox/lib/python3.6/site-packages/nanopq/pq.py in fit(self, vecs, iter, seed)
61 assert vecs.ndim == 2
62 N, D = vecs.shape
---> 63 assert self.Ks < N, "the number of training vector should be more than Ks"
64 assert D % self.M == 0, "input dimension must be dividable by M"
65 self.Ds = int(D / self.M)
AssertionError: the number of training vector should be more than Ks
Here, even though e.fit(X) fails, an empty codec is assigned to self.fine_quantizer. This prevents us from calling e.fit(X) again:
~/usr/src/anaconda/anaconda3/envs/sandbox/lib/python3.6/site-packages/rii/rii.py in fit(self, vecs, iter, seed)
130
131 """
--> 132 assert self.fine_quantizer is None, "`fit` should be called only once"
133 assert vecs.dtype == np.float32
134
AssertionError: `fit` should be called only once
This isn't proper behaviour. After a user finds that e.fit(X) fails, she will call e.fit(X) again with some new X, and it should work.
This can be solved by assigning an empty codec to self.fine_quantizer in the constructor of the Rii class, and changing the error check from assert self.fine_quantizer is None to assert self.codewords
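The proposal can be illustrated with a toy model. ToyCodec and RetryableIndex below are stand-ins invented for this sketch, not nanopq or rii code, and the check on fine_quantizer.codewords being None is one reading of the truncated assertion above.

```python
class ToyCodec:
    """Stand-in for a PQ codec; codewords stays None until training succeeds."""

    def __init__(self, Ks=256):
        self.Ks = Ks
        self.codewords = None

    def fit(self, vecs):
        assert self.Ks < len(vecs), "the number of training vector should be more than Ks"
        # Placeholder for real training: just keep the first Ks vectors.
        self.codewords = [list(v) for v in vecs[: self.Ks]]
        return self


class RetryableIndex:
    def __init__(self):
        # The empty codec is created in the constructor, as proposed.
        self.fine_quantizer = ToyCodec()

    def fit(self, vecs):
        # Check the trained state rather than the mere existence of the codec.
        assert self.fine_quantizer.codewords is None, "`fit` should be called only once"
        self.fine_quantizer.fit(vecs)
        return self


e = RetryableIndex()
try:
    e.fit([[0.0]] * 10)  # too few vectors: fails inside the codec
    first_attempt = "ok"
except AssertionError:
    first_attempt = "failed"

e.fit([[float(i)] for i in range(300)])  # retry with enough vectors
retry_trained = e.fine_quantizer.codewords is not None
print(first_attempt, retry_trained)
```

Because a failed fit leaves codewords at None, the retry passes the check, while a further fit after successful training is still rejected.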
Hi @matsui528 ,
Thank you for releasing such an awesome library.
Now I want to implement a subset search system based on dot product.
Do you have any plans to develop such algorithms?
I found that annoy implements similarity search algorithms using dot product, but it does not focus on subset search.
https://github.com/spotify/annoy/blob/84f1c68e43b41daf1ce22c1eb4f3a8d3e0086403/src/annoylib.h#L505
When I try to run rii.Rii(fine_quantizer=codec), I get the following AttributeError.
AttributeError: module 'main' has no attribute 'RiiCpp'
Any thoughts?
In rii.py line 35 there is an assert that checks whether Ks is less than 256 (which should be 2**32), but in nanopq's pq.py line 37 a similar assert checks whether Ks is less than 2**32 (which is correct IMHO). So when passing a nanopq codec to rii, it throws an AssertionError if Ks is larger than 256. I am using a very large dataset with 6.7 million rows, so my Ks is set to 2**12 = 4096; nanopq works as expected, but rii throws the AssertionError.
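The two limits differ in what they imply for storage. As far as I know, nanopq picks the integer width of each sub-code from Ks (8, 16, or 32 bits). An index that stores exactly one byte per sub-code can only accept Ks <= 256, which is one plausible reason for the tighter check in rii, though this thread does not confirm it. The helper min_code_bits is hypothetical and written only to illustrate the arithmetic.

```python
def min_code_bits(Ks: int) -> int:
    """Smallest of 8/16/32 bits that can index Ks codewords per sub-space."""
    assert 0 < Ks <= 2 ** 32
    if Ks <= 2 ** 8:
        return 8
    if Ks <= 2 ** 16:
        return 16
    return 32


def fits_in_one_byte(Ks: int) -> bool:
    """True if every sub-code can be stored in a single byte."""
    return min_code_bits(Ks) == 8


print(min_code_bits(256), min_code_bits(4096), fits_in_one_byte(4096))
```

With Ks = 2**12 = 4096, each sub-code needs 16 bits, so it would not fit a one-byte-per-sub-code layout.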
With specific target data and a specific query, rii fails with a segmentation fault.
Here is code that reproduces the segmentation fault.
import nanopq
import numpy
import rii
data = numpy.array(
    [[1, 1, 1, 1, 1, 1, 1, 1] for _ in range(200)]
    + [[2, 2, 2, 2, 2, 2, 2, 2] for _ in range(200)]
    + [[2, 2, 2, 2, 2, 2, 2, 3] for _ in range(200)],
    dtype=numpy.float32)
codec = nanopq.PQ(M=8, verbose=False).fit(data)
R = rii.Rii(fine_quantizer=codec)
R.add_configure(data)
for _ in range(100):
    R.add(numpy.array([[1, 1, 1, 1, 2, 2, 2, 2]], dtype=numpy.float32))
R.reconfigure()
R.query(numpy.array([1, 1, 1, 1, 2, 2, 2, 2], dtype=numpy.float32), topk=10, target_ids=numpy.arange(400))
I have a Rii object with a 3.3-billion-scale dataset that was batch loaded using add(update_posting_lists=False), and at the end I ran reconfigure(), but it crashes the Python kernel in the reconfigure step. I also tried adding and then configuring immediately, but it was taking forever and I couldn't tell whether it was working or had hung. Looking at the code, I saw some comments about large memory consumption. Is there an alternative way to do this without crashing?
Hey, does this use the L2 norm or cosine similarity to compute the distance?
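This thread does not answer the question, but the two measures are closely related: for unit-length vectors, the squared L2 distance equals 2 - 2 * cosine similarity, so the nearest neighbor under one is also the nearest under the other. A small check of that identity in plain Python; the helper names are made up for this illustration and nothing here calls rii.

```python
import math


def l2_normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


a = [3.0, 4.0]
b = [1.0, 0.0]
lhs = squared_l2(l2_normalize(a), l2_normalize(b))  # distance after normalization
rhs = 2.0 - 2.0 * cosine(a, b)                      # same quantity via cosine
print(lhs, rhs)
```

If the index ranks by Euclidean distance (an assumption here), normalizing both database and query vectors beforehand would give a cosine-style ranking, up to the approximation error of the quantizer.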
SSE, AVX, and AVX512
It seems that it's hard to test the code on Travis CI with Mac OS X. It is because the Rii project is based on cpp and python with pybind11. Any suggestions or PRs are welcome