rapidsai / kvikio
KvikIO - High Performance File IO
Home Page: https://docs.rapids.ai/api/kvikio/stable/
License: Apache License 2.0
Please update the instructions or the build script to make it work on any CUDA toolkit-enabled system.
❯ pyenv activate cucim-3.8
❯ cd python
❯ python -m pip install .
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Processing /home/gbae/repo/kvikio/python
DEPRECATION: A future pip version will change local packages to be built in-place without first copying to a temporary directory. We recommend you use --use-feature=in-tree-build to test your packages with this new behavior before it becomes the default.
pip 21.3 will remove support for this functionality. You can find discussion regarding this at https://github.com/pypa/pip/issues/7555.
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing wheel metadata ... done
Collecting cython
Downloading Cython-0.29.28-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (1.9 MB)
|████████████████████████████████| 1.9 MB 1.1 MB/s
Building wheels for collected packages: kvikio
Building wheel for kvikio (PEP 517) ... error
ERROR: Command errored out with exit status 1:
command: /home/gbae/.pyenv/versions/cucim-3.8/bin/python /home/gbae/.pyenv/versions/cucim-3.8/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py build_wheel /tmp/tmp7ibfglvk
cwd: /tmp/pip-req-build-d06p8fxv
Complete output (27 lines):
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.8
creating build/lib.linux-x86_64-3.8/kvikio
copying kvikio/cufile.py -> build/lib.linux-x86_64-3.8/kvikio
copying kvikio/zarr.py -> build/lib.linux-x86_64-3.8/kvikio
copying kvikio/thread_pool.py -> build/lib.linux-x86_64-3.8/kvikio
copying kvikio/_version.py -> build/lib.linux-x86_64-3.8/kvikio
copying kvikio/__init__.py -> build/lib.linux-x86_64-3.8/kvikio
creating build/lib.linux-x86_64-3.8/kvikio/_lib
copying kvikio/_lib/__init__.py -> build/lib.linux-x86_64-3.8/kvikio/_lib
copying kvikio/_lib/arr.pyi -> build/lib.linux-x86_64-3.8/kvikio/_lib
UPDATING build/lib.linux-x86_64-3.8/kvikio/_version.py
set build/lib.linux-x86_64-3.8/kvikio/_version.py to '0+unknown'
running build_ext
building 'kvikio._lib.libkvikio' extension
creating build/temp.linux-x86_64-3.8
creating build/temp.linux-x86_64-3.8/kvikio
creating build/temp.linux-x86_64-3.8/kvikio/_lib
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O3 -Wall -I/home/linuxbrew/.linuxbrew/opt/zlib -I/home/linuxbrew/.linuxbrew/opt/zlib -fPIC -O3 -I/home/gbae/.pyenv/versions/3.8.12/include -I/usr/local/cuda/include -I/home/gbae/.pyenv/versions/cucim-3.8/include -I/home/gbae/.pyenv/versions/3.8.12/include/python3.8 -c kvikio/_lib/libkvikio.cpp -o build/temp.linux-x86_64-3.8/kvikio/_lib/libkvikio.o -std=c++17
kvikio/_lib/libkvikio.cpp:752:10: fatal error: kvikio/utils.hpp: No such file or directory
752 | #include <kvikio/utils.hpp>
| ^~~~~~~~~~~~~~~~~~
compilation terminated.
error: command '/usr/bin/gcc' failed with exit code 1
----------------------------------------
ERROR: Failed building wheel for kvikio
Failed to build kvikio
ERROR: Could not build wheels for kvikio which use PEP 517 and cannot be installed directly
WARNING: You are using pip version 21.1.1; however, version 22.0.3 is available.
You should consider upgrading via the '/home/gbae/.pyenv/versions/cucim-3.8/bin/python -m pip install --upgrade pip' command.
❯ cd python
❯ python -m pip install .
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Processing /home/gbae/repo/kvikio/python
DEPRECATION: A future pip version will change local packages to be built in-place without first copying to a temporary directory. We recommend you use --use-feature=in-tree-build to test your packages with this new behavior before it becomes the default.
pip 21.3 will remove support for this functionality. You can find discussion regarding this at https://github.com/pypa/pip/issues/7555.
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing wheel metadata ... done
Collecting cython
Downloading Cython-0.29.28-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (1.9 MB)
|████████████████████████████████| 1.9 MB 679 kB/s
Building wheels for collected packages: kvikio
Building wheel for kvikio (PEP 517) ... error
ERROR: Command errored out with exit status 1:
command: /home/gbae/.pyenv/versions/cucim-3.8/bin/python /home/gbae/.pyenv/versions/cucim-3.8/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py build_wheel /tmp/tmpio38_f81
cwd: /tmp/pip-req-build-nnr151e7
Complete output (62 lines):
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.8
creating build/lib.linux-x86_64-3.8/kvikio
copying kvikio/cufile.py -> build/lib.linux-x86_64-3.8/kvikio
copying kvikio/zarr.py -> build/lib.linux-x86_64-3.8/kvikio
copying kvikio/thread_pool.py -> build/lib.linux-x86_64-3.8/kvikio
copying kvikio/_version.py -> build/lib.linux-x86_64-3.8/kvikio
copying kvikio/__init__.py -> build/lib.linux-x86_64-3.8/kvikio
creating build/lib.linux-x86_64-3.8/kvikio/_lib
copying kvikio/_lib/__init__.py -> build/lib.linux-x86_64-3.8/kvikio/_lib
copying kvikio/_lib/arr.pyi -> build/lib.linux-x86_64-3.8/kvikio/_lib
UPDATING build/lib.linux-x86_64-3.8/kvikio/_version.py
set build/lib.linux-x86_64-3.8/kvikio/_version.py to '0+unknown'
running build_ext
building 'kvikio._lib.libkvikio' extension
creating build/temp.linux-x86_64-3.8
creating build/temp.linux-x86_64-3.8/kvikio
creating build/temp.linux-x86_64-3.8/kvikio/_lib
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O3 -Wall -I/home/linuxbrew/.linuxbrew/opt/zlib -I/home/linuxbrew/.linuxbrew/opt/zlib -fPIC -O3 -I/home/gbae/.pyenv/versions/3.8.12/include -I/home/gbae/repo/kvikio/cpp/include -I/usr/local/cuda/include -I/home/gbae/.pyenv/versions/cucim-3.8/include -I/home/gbae/.pyenv/versions/3.8.12/include/python3.8 -c kvikio/_lib/libkvikio.cpp -o build/temp.linux-x86_64-3.8/kvikio/_lib/libkvikio.o -std=c++17
kvikio/_lib/libkvikio.cpp:15862:20: warning: ‘__pyx_mdef___pyx_memoryviewslice_3__setstate_cython__’ defined but not used [-Wunused-variable]
15862 | static PyMethodDef __pyx_mdef___pyx_memoryviewslice_3__setstate_cython__ = {"__setstate_cython__", (PyCFunction)__pyx_pw___pyx_memoryviewslice_3__setstate_cython__, METH_O, 0};
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
kvikio/_lib/libkvikio.cpp:15804:20: warning: ‘__pyx_mdef___pyx_memoryviewslice_1__reduce_cython__’ defined but not used [-Wunused-variable]
15804 | static PyMethodDef __pyx_mdef___pyx_memoryviewslice_1__reduce_cython__ = {"__reduce_cython__", (PyCFunction)__pyx_pw___pyx_memoryviewslice_1__reduce_cython__, METH_NOARGS, 0};
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
kvikio/_lib/libkvikio.cpp:12959:20: warning: ‘__pyx_mdef___pyx_memoryview_3__setstate_cython__’ defined but not used [-Wunused-variable]
12959 | static PyMethodDef __pyx_mdef___pyx_memoryview_3__setstate_cython__ = {"__setstate_cython__", (PyCFunction)__pyx_pw___pyx_memoryview_3__setstate_cython__, METH_O, 0};
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
kvikio/_lib/libkvikio.cpp:12901:20: warning: ‘__pyx_mdef___pyx_memoryview_1__reduce_cython__’ defined but not used [-Wunused-variable]
12901 | static PyMethodDef __pyx_mdef___pyx_memoryview_1__reduce_cython__ = {"__reduce_cython__", (PyCFunction)__pyx_pw___pyx_memoryview_1__reduce_cython__, METH_NOARGS, 0};
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
kvikio/_lib/libkvikio.cpp:12807:20: warning: ‘__pyx_mdef_15View_dot_MemoryView_10memoryview_23copy_fortran’ defined but not used [-Wunused-variable]
12807 | static PyMethodDef __pyx_mdef_15View_dot_MemoryView_10memoryview_23copy_fortran = {"copy_fortran", (PyCFunction)__pyx_memoryview_copy_fortran, METH_NOARGS, 0};
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
kvikio/_lib/libkvikio.cpp:12712:20: warning: ‘__pyx_mdef_15View_dot_MemoryView_10memoryview_21copy’ defined but not used [-Wunused-variable]
12712 | static PyMethodDef __pyx_mdef_15View_dot_MemoryView_10memoryview_21copy = {"copy", (PyCFunction)__pyx_memoryview_copy, METH_NOARGS, 0};
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
kvikio/_lib/libkvikio.cpp:12635:20: warning: ‘__pyx_mdef_15View_dot_MemoryView_10memoryview_19is_f_contig’ defined but not used [-Wunused-variable]
12635 | static PyMethodDef __pyx_mdef_15View_dot_MemoryView_10memoryview_19is_f_contig = {"is_f_contig", (PyCFunction)__pyx_memoryview_is_f_contig, METH_NOARGS, 0};
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
kvikio/_lib/libkvikio.cpp:12558:20: warning: ‘__pyx_mdef_15View_dot_MemoryView_10memoryview_17is_c_contig’ defined but not used [-Wunused-variable]
12558 | static PyMethodDef __pyx_mdef_15View_dot_MemoryView_10memoryview_17is_c_contig = {"is_c_contig", (PyCFunction)__pyx_memoryview_is_c_contig, METH_NOARGS, 0};
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
kvikio/_lib/libkvikio.cpp:8685:20: warning: ‘__pyx_mdef___pyx_MemviewEnum_3__setstate_cython__’ defined but not used [-Wunused-variable]
8685 | static PyMethodDef __pyx_mdef___pyx_MemviewEnum_3__setstate_cython__ = {"__setstate_cython__", (PyCFunction)__pyx_pw___pyx_MemviewEnum_3__setstate_cython__, METH_O, 0};
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
kvikio/_lib/libkvikio.cpp:8449:20: warning: ‘__pyx_mdef___pyx_MemviewEnum_1__reduce_cython__’ defined but not used [-Wunused-variable]
8449 | static PyMethodDef __pyx_mdef___pyx_MemviewEnum_1__reduce_cython__ = {"__reduce_cython__", (PyCFunction)__pyx_pw___pyx_MemviewEnum_1__reduce_cython__, METH_NOARGS, 0};
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
kvikio/_lib/libkvikio.cpp:8071:20: warning: ‘__pyx_mdef___pyx_array_3__setstate_cython__’ defined but not used [-Wunused-variable]
8071 | static PyMethodDef __pyx_mdef___pyx_array_3__setstate_cython__ = {"__setstate_cython__", (PyCFunction)__pyx_pw___pyx_array_3__setstate_cython__, METH_O, 0};
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
kvikio/_lib/libkvikio.cpp:8013:20: warning: ‘__pyx_mdef___pyx_array_1__reduce_cython__’ defined but not used [-Wunused-variable]
8013 | static PyMethodDef __pyx_mdef___pyx_array_1__reduce_cython__ = {"__reduce_cython__", (PyCFunction)__pyx_pw___pyx_array_1__reduce_cython__, METH_NOARGS, 0};
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
g++ -pthread -shared -L/home/linuxbrew/.linuxbrew/opt/readline/lib -L/home/gbae/.pyenv/versions/3.8.12/lib -L/home/linuxbrew/.linuxbrew/lib -L/home/linuxbrew/.linuxbrew/opt/readline/lib -L/home/gbae/.pyenv/versions/3.8.12/lib -L/home/linuxbrew/.linuxbrew/lib build/temp.linux-x86_64-3.8/kvikio/_lib/libkvikio.o -L/home/gbae/.pyenv/versions/cucim-3.8/lib/python3.8/site-packages -L/usr/local/cuda/lib64 -lcuda -lnvidia-ml -lcufile -o build/lib.linux-x86_64-3.8/kvikio/_lib/libkvikio.cpython-38-x86_64-linux-gnu.so
/home/linuxbrew/.linuxbrew/bin/ld: cannot find -lnvidia-ml
collect2: error: ld returned 1 exit status
error: command '/usr/bin/g++' failed with exit code 1
----------------------------------------
ERROR: Failed building wheel for kvikio
I worked around the issue with the following change.
--- a/python/setup.py
+++ b/python/setup.py
@@ -45,8 +45,8 @@ this_setup_scrip_dir = os.path.dirname(os.path.realpath(__file__))
# Throughout the script, we will populate `include_dirs`,
# `library_dirs` and `depends`.
-include_dirs = [os.path.dirname(sysconfig.get_path("include"))]
-library_dirs = [get_python_lib()]
+include_dirs = [os.path.dirname(sysconfig.get_path("include"))] + ["/home/gbae/repo/kvikio/cpp/include"]
+library_dirs = [get_python_lib(), "/usr/local/cuda/lib64/stubs"]
extra_objects = []
depends = [] # Files to trigger rebuild when modified
Currently there are a few metadata keys that get handled specially in the GDSStore
(see below).
Lines 35 to 39 in 0251229
Though one that isn't handled currently is the consolidated metadata key (.zmetadata), which can show up for some Zarr stores.
Would be good to filter this one out too
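One way to sketch such a filter (the helper name and the exact set of special keys are assumptions for illustration, not GDSStore's actual implementation):

```python
# Hypothetical sketch: treat Zarr's consolidated-metadata key ".zmetadata"
# as a special key, alongside the other metadata keys GDSStore already
# handles specially.
SPECIAL_KEYS = {".zarray", ".zgroup", ".zattrs", ".zmetadata"}

def is_metadata_key(key: str) -> bool:
    """True for keys that should bypass the GDS fast path."""
    # Only the final path component matters, e.g. "group/.zarray"
    return key.rsplit("/", 1)[-1] in SPECIAL_KEYS
```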
Currently nvCOMP is fetched by CMake
kvikio/python/cmake/CMakeLists.txt
Line 15 in d9eee8b
Now that there is an nvcomp conda package, it would be good to switch over to using that instead
The size of chunks in the pinned memory pool should be limited by the slice size, since each chunk is always used by a single thread.
For users getting started, it would be nice to have either an example in a tutorial or a notebook showing how to plug KvikIO into Zarr.
Hi Team,
Is there any out-of-the-box solution for object reads from S3 over any known SDK -- boto or Azure -- for cuFile?
I could not find any ready solution for this hence decided to open a query here....
I am currently working on an infra project to test the performance of storage devices -- and a part of that work involves object reads from S3.
I have tried downloading the S3 object and then go for Cufile reads, however, it seems somewhat like taking a detour to reach the destination -- and I am mostly concerned with the performance implications over the extra code!
Therefore I wanted to know if there are any solutions provided in kvikIO for such a use case? If not, maybe you could suggest an alternative to approaching the problem without going through the download & read path?
N.B.: I have also played around with DALI, so open to any suggestions, if it involves DALI as well....
Many thanks in advance.
The nvCOMP portion of the code here currently requires NumPy & CuPy whereas KvikIO does not. It may be worthwhile to relax this constraint in nvCOMP to simplify usage requirements.
Hello!
I tried to install Kvikio using the following code:
conda install -c rapidsai -c conda-forge kvikio
but Anaconda said it couldn't find the package. I am using a Windows 10 system; can anyone help me resolve this problem? Thank you!
Documenting an offline discussion with @jakirkham in preparation of CUDA 12 bring-up on conda-forge.
Currently, there are some places where libXXXXX.so is dlopen'd without any SONAME, for example:
kvikio/cpp/include/kvikio/shim/cufile.hpp
Line 53 in db0a3e7
The libXXXXX.so symlink is supposed to exist only in the libXXXXX-dev package by the standard practice of Linux distros, not in libXXXXX, which only contains libXXXXX.so.{major}. The dlopen'd names need to be canonicalized.
Note: Text borrowed from Leo's issue ( rapidsai/cudf#12708 ) and tweaked
Hello, the steps for reproducing:
gh repo clone rapidsai/kvikio
cd kvikio/cpp
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Debug
make
./basic_io
KvikIO defaults:
Compatibility mode: disabled
DriverProperties:
Version: 0.0
Allow compatibility mode: true
Pool mode - enabled: false, threshold: 4 kb
Max pinned memory: 4294967292 kb
terminate called after throwing an instance of 'std::system_error'
debugging:
cgdb ./basic_io
...
bt
...
#7 0x00007ffff78a5fcd in __cxxabiv1::__cxa_throw (obj=<optimized out>, tinfo=0x555555597860 <typeinfo for std::system_error@GLIBCXX_3.4.11>, dest=0x7ffff78d5fd0 <std::system_error::~system_error()>)
at /usr/src/debug/gcc/libstdc++-v3/libsupc++/eh_throw.cc:98
#8 0x0000555555564dd4 in kvikio::detail::open_fd (file_path="/tmp/test-file", flags="w", o_direct=true, mode=436) at /home/alsam/work/github/kvikio/cpp/examples/../include/kvikio/file_handle.hpp:85
#9 0x0000555555564f48 in kvikio::FileHandle::FileHandle (this=0x7fffffffe6c0, file_path="/tmp/test-file", flags="w", mode=436, compat_mode=false)
failed to open the file /tmp/test-file
for writing with o_direct=true, mode=436
$ touch /tmp/test-file
$ echo $?
0
Thanks
Create fixtures that allow a range of variation in the nvcomp test values. These values are hard-coded very specifically, and any time a change in C++ nvcomp slightly modifies its compression parameters, the tests here will fail. If we provide fixtures with set values and a utility function for comparison that lets them vary slightly, we'll gain the benefit of tracking their output and temporary data sizes without having to constantly retool for failing CI.
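A minimal sketch of such a tolerance-based comparison helper (the name and default tolerance are illustrative, not the real test values):

```python
def assert_size_close(actual: int, expected: int, rel_tol: float = 0.1) -> None:
    """Pass if `actual` is within `rel_tol` (fractional) of `expected`.

    Lets compressed/temporary data sizes drift slightly across nvcomp
    releases without the test failing outright.
    """
    if abs(actual - expected) > rel_tol * expected:
        raise AssertionError(f"{actual} not within {rel_tol:.0%} of {expected}")
```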
Hi,
I tried the gdsio tool and it works fine as expected: compat mode is slower than GPUDirect reads. But when checking this library using the given Python example, it doesn't behave the same (compat mode is faster than the GPUDirect read). Could you please advise?
Below is a changed version of the example given in README.md:
#!/usr/bin/python
import cupy
import kvikio
import time
import kvikio.defaults
def main(nelem):
    print("Tensor size:", nelem)
    path = "/mnt/tmp/test-file"
    start_time = time.time()
    a = cupy.arange(nelem)
    f = kvikio.CuFile(path, "w")
    # Write whole array to file
    f.write(a)
    f.close()
    print("--- %s seconds write time---" % (time.time() - start_time))
    # Read whole array from file
    start_time = time.time()
    b = cupy.empty_like(a)
    print("--- %s seconds buffer creation---" % (time.time() - start_time))
    f = kvikio.CuFile(path, "r")
    f.read(b)
    print("--- %s seconds buffer creation and load time---" % (time.time() - start_time))
    assert all(a == b)
    # Use context manager
    start_time = time.time()
    c = cupy.empty_like(a)
    with kvikio.CuFile(path, "r") as f:
        f.read(c)
    print("--- %s seconds buffer creation and load time with context manager---" % (time.time() - start_time))
    assert all(a == c)
    # Non-blocking read (integer division so the slice indices are ints)
    start_time = time.time()
    d = cupy.empty_like(a)
    with kvikio.CuFile(path, "r") as f:
        future1 = f.pread(d[:nelem // 2])
        future2 = f.pread(d[nelem // 2:], file_offset=d[:nelem // 2].nbytes)
        future1.get()  # Wait for first read
        future2.get()  # Wait for second read
    print("--- %s seconds buffer creation and load time with non block read---" % (time.time() - start_time))
    start_time = time.time()
    assert all(a == d)
    print("--- %s seconds assertion time---" % (time.time() - start_time))

arr_sizes = [100, 1000000]
kvikio.defaults.compat_mode_reset(False)
assert not kvikio.defaults.compat_mode()
for elem in arr_sizes:
    main(elem)
kvikio.defaults.compat_mode_reset(True)
assert kvikio.defaults.compat_mode()
print("COMPAT MODE..")
for elem in arr_sizes:
    main(elem)
output:
Tensor size: 100
--- 0.36174535751342773 seconds write time---
--- 0.00011444091796875 seconds buffer creation---
--- 0.003509044647216797 seconds buffer creation and load time---
--- 0.000995635986328125 seconds buffer creation and load time with context manager---
--- 0.0020360946655273438 seconds buffer creation and load time with non block read---
--- 0.0019338130950927734 seconds assertion time---
Tensor size: 1000000
--- 0.3805253505706787 seconds write time---
--- 0.0003936290740966797 seconds buffer creation---
--- 0.02455925941467285 seconds buffer creation and load time---
--- 0.045484304428100586 seconds buffer creation and load time with context manager---
--- 0.07375788688659668 seconds buffer creation and load time with non block read---
--- 18.388749361038208 seconds assertion time---
COMPAT MODE..
Tensor size: 100
--- 0.04293179512023926 seconds write time---
--- 9.965896606445312e-05 seconds buffer creation---
--- 0.00044655799865722656 seconds buffer creation and load time---
--- 0.0001728534698486328 seconds buffer creation and load time with context manager---
--- 0.00022649765014648438 seconds buffer creation and load time with non block read---
--- 0.0018930435180664062 seconds assertion time---
Tensor size: 1000000
--- 0.05194258689880371 seconds write time---
--- 1.8596649169921875e-05 seconds buffer creation---
--- 0.002271890640258789 seconds buffer creation and load time---
--- 0.002416372299194336 seconds buffer creation and load time with context manager---
--- 0.0020689964294433594 seconds buffer creation and load time with non block read---
--- 18.475173473358154 seconds assertion time---
I tried the conda create line in the readme but it doesn't work
conda create -n kvikio_env -c rapidsai-nightly -c conda-forge python=3.8 cudatoolkit=11.5 kvikio
Mamba fails with
Encountered problems while solving:
- package kvikio-22.04.00a220318-cuda_11_py38_g4a300f5_27 requires python_abi 3.8.* *_cp38, but none of the providers can be installed
Conda fails with
UnsatisfiableError: The following specifications were found to be incompatible with each other:
Output in format: Requested package -> Available versions
Package libgcc-ng conflicts for:
kvikio -> libgcc-ng[version='>=12']
kvikio -> cudatoolkit[version='>=11,<12.0a0'] -> libgcc-ng[version='>=10.3.0|>=9.4.0|>=9.3.0|>=7.3.0|>=7.5.0|>=5.4.0']
Package python conflicts for:
kvikio -> cupy -> python[version='2.7.*|3.5.*|3.6.*|3.8.*|>=2.7,<2.8.0a0|>=3.10,<3.11.0a0|>=3.7,<3.8.0a0|>=3.8,<3.9.0a0|>=3.9,<3.10.0a0|>=3.6,<3.7.0a0|>=3.5,<3.6.0a0|>
=3.5|3.4.*|>=3.6,<4|3.9.*']
python=3.8
Package libstdcxx-ng conflicts for:
kvikio -> libstdcxx-ng[version='>=12']
kvikio -> cudatoolkit[version='>=11,<12.0a0'] -> libstdcxx-ng[version='>=10.3.0|>=9.4.0|>=9.3.0|>=7.3.0|>=7.5.0|>=5.4.0']
Package cudatoolkit conflicts for:
cudatoolkit=11.5
kvikio -> cupy -> cudatoolkit[version='10.0|10.0.*|10.1|10.1.*|10.2|10.2.*|11.0|11.0.*|11.4|11.4.*|>=11.2,<12|11.1|11.1.*|9.2|9.2.*|>=11.2,<12.0a0|>=10.0.130,<10.1.0a0
|>=9.2,<9.3.0a0|>=9.0,<9.1.0a0|>=8.0,<9.0a0|8.0.*|9.0.*|>=11.5,<12']
kvikio -> cudatoolkit[version='>=11,<12.0a0']
Package _openmp_mutex conflicts for:
kvikio -> libgcc-ng[version='>=12'] -> _openmp_mutex[version='>=4.5']
python=3.8 -> libgcc-ng[version='>=10.3.0'] -> _openmp_mutex[version='>=4.5']
cudatoolkit=11.5 -> libgcc-ng[version='>=10.3.0'] -> _openmp_mutex[version='>=4.5']
The following specifications were found to be incompatible with your system:
- feature:/linux-64::__glibc==2.17=0
- feature:|@/linux-64::__glibc==2.17=0
- kvikio -> __glibc[version='>=2.17,<3.0.a0']
- kvikio -> cupy -> __glibc[version='>=2.17']
- python=3.8 -> libgcc-ng[version='>=9.3.0'] -> __glibc[version='>=2.17']
Your installed version is: 2.17
Note that strict channel priority may have removed packages required for satisfiability.
We should create Conda packages, both nightly and release builds.
@quasiben and @jakirkham I hope this is something you can help with? :)
Hi all,
it would be helpful if IOFuture had a .done() member like concurrent.futures, i.e. a routine that checks whether the IO is complete. This would help when iterating through lists/queues of outstanding requests and working on those which are done first.
for handle in iofuture_list:
    if not handle.done():
        continue
    # do whatever
Best Regards
Thorsten
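The requested semantics already exist in the standard library's concurrent.futures.Future.done(); a minimal runnable sketch of the polling pattern (ThreadPoolExecutor stands in for KvikIO's thread pool here, which is an assumption):

```python
from concurrent.futures import ThreadPoolExecutor
import time

# Poll a list of futures and process whichever completes first -- the
# pattern a hypothetical IOFuture.done() would enable.
with ThreadPoolExecutor() as pool:
    futures = [pool.submit(lambda i=i: i * 2) for i in range(4)]
    results = []
    while futures:
        for f in list(futures):
            if f.done():            # non-blocking completion check
                results.append(f.result())
                futures.remove(f)
        time.sleep(0.001)           # avoid a hot spin loop
```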
Hi,
I was wondering if we could use the kvikIO Cufile for RDMA IO ops?
I am working on a project that involves Client space Read/Write from our own distributed filesystem (RDMA capable).
Now that the system read/write has been established, I am looking to add some further implementation that involves direct IO through the GPU.
I am using DGX2 which runs my client application.
So, is there any way I could use the kvikIO implementation for the above use case?
If yes, could you guide me with an example code implementation, which could help in this regard?
We use this command to install kvikio as follows:
conda create -n kvikio_env -c rapidsai-nightly -c conda-forge python=3.8 cudatoolkit=11.5 kvikio
But when I run the test code, an error occurs:
Some environmental configuration information is as follows.
RuntimeError: libcuda.so: cannot open shared object file: No such file or directory
But we can find the file in:
/usr/local/cuda-11.7/targets/x86_64-linux/lib/stubs/
My Cuda version is 11.7 and my GPU is P100.
Linux kernel is 5.4.0-70, gcc and g++ both version 11.
And the environment is:
export PATH=$PATH:/root/anaconda3/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.7/lib64
export PATH=$PATH:/usr/local/cuda-11.7/bin
export CUDA_HOME=$CUDA_HOME:/usr/local/cuda-11.7
export CUFILE_PATH="/usr/local/cuda-11.7/targets/x86_64-linux/lib/"
export DALI_EXTRA_PATH="/mnt/optane/wjtest/DALI_extra/"
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/root/anaconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
eval "$__conda_setup"
else
if [ -f "/root/anaconda3/etc/profile.d/conda.sh" ]; then
. "/root/anaconda3/etc/profile.d/conda.sh"
else
export PATH="/root/anaconda3/bin:$PATH"
fi
fi
unset __conda_setup
# <<< conda initialize <<<
source activate
As synchronization might be quite expensive, one common strategy a user might want to employ is adding a callback to their IOFutures. This could be done using an API like concurrent.futures.Future.add_done_callback, i.e. call the callback with the IOFuture as an argument. This way the result or exception could be obtained and operated on if needed without blocking.
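A sketch of the proposed semantics, using the stdlib Future as a stand-in for IOFuture (KvikIO does not provide add_done_callback today; the names below involving the I/O layer are assumptions):

```python
from concurrent.futures import Future

results = []

def on_complete(fut):
    # Runs once the (simulated) I/O finishes; may inspect the result or
    # the exception without ever having blocked on the future.
    results.append(fut.result())

fut = Future()
fut.add_done_callback(on_complete)
fut.set_result(42)  # in real use, the I/O layer would complete the future
```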
Hello. I decided to ask for help here, because I'm lost. GDS seems very unfriendly to get working. Do you have any instructions on how to get it working? As developers, how do you get it running? Are you using ancient Ubuntu 20.04? Thanks
Currently one needs to perform each read/write and wait on them individually. Like so
kvikio/python/examples/hello_world.py
Lines 31 to 34 in 7a43759
However one use case is to do a bunch of writes or reads asynchronously. For example ( zarr-developers/zarr-python#1040 ). Doing this would accumulate a list of IOFutures from each operation.
In this case it would be helpful to have an API that is able to wait on that whole list of operations to complete. Perhaps like concurrent.futures.wait.
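A model of the proposed bulk wait, using the stdlib API the issue points to (IOFuture does not currently plug into concurrent.futures.wait; the ThreadPoolExecutor work items are placeholders):

```python
from concurrent.futures import ThreadPoolExecutor, wait, ALL_COMPLETED

with ThreadPoolExecutor() as pool:
    # Submit a batch of operations, accumulating the futures in a list
    futures = [pool.submit(pow, 2, n) for n in range(5)]
    # One call blocks until the whole batch is complete
    done, not_done = wait(futures, return_when=ALL_COMPLETED)
    total = sum(f.result() for f in done)  # 1 + 2 + 4 + 8 + 16
```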
Is there an example of using Zarr IO from C++? Is arbitrary metadata supported? Is there an example of that? Does kvikio even offer Zarr IO? After #82 is integrated, would it be possible to read/write Zarr directly from host memory?
cufile.so might crash when used within a VM in the cloud. KvikIO should detect this and fall back to its own implementation.
Currently if a user tries to write a mixture of host & device frames, this doesn't work as the following error will show up for any host frames
kvikio/python/kvikio/_lib/libkvikio.pyx
Line 86 in 7a43759
This matters as using KvikIO in Distributed & Dask-CUDA will depend on handling some host frames along the way while reading and writing.
Maybe there could be some kind of fallback to handle host memory in these cases?
When trying to force the benchmark to run with GDS enabled I get the following error:
KVIKIO_COMPAT_MODE=0 python benchmarks/single-node-io.py
Traceback (most recent call last):
File "/home/quasiben/Github/kvikio/python/benchmarks/single-node-io.py", line 401, in <module>
main(args)
File "/home/quasiben/Github/kvikio/python/benchmarks/single-node-io.py", line 307, in main
read, write = API[api](args)
File "/home/quasiben/Github/kvikio/python/benchmarks/single-node-io.py", line 28, in run_cufile
kvikio.memory_register(data)
File "/home/quasiben/miniconda3/envs/kvikio_dev/lib/python3.9/site-packages/kvikio/__init__.py", line 13, in memory_register
return libkvikio.memory_register(buf)
File "libkvikio.pyx", line 44, in kvikio._lib.libkvikio.memory_register
RuntimeError: libcufile.so.1: cannot open shared object file: No such file or directory
I just installed CUDA 11.8 and confirmed I have libcufile.so:
/usr/local/cuda/lib64/libcufile.so -> libcufile.so.0
Additionally, during building, kvikIO finds cuFile:
cmake -DCMAKE_INSTALL_PREFIX=${CONDA_PREFIX} ${CMAKE_EXTRA_ARGS} ..
-- The CXX compiler identification is GNU 9.5.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /home/quasiben/miniconda3/envs/kvikio_dev/bin/x86_64-conda-linux-gnu-c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found CUDAToolkit: /usr/local/cuda/include (found version "11.8.89")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found cuFile: /usr/local/cuda/lib64/libcufile.so
-- Configuring done
Should we also search for libcufile.so in addition to libcufile.so.1?
void* lib = load_library("libcufile.so.1");
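A sketch of trying several candidate sonames in order, analogous to what the C++ load_library call could do (the candidate list is an assumption; ctypes is used here only to illustrate the dlopen fallback):

```python
import ctypes

def load_first(names):
    """Return the first library that dlopen succeeds on, else None."""
    for name in names:
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue
    return None

# Try the versioned soname first, then the unversioned dev symlink
cufile = load_first(["libcufile.so.1", "libcufile.so", "libcufile.so.0"])
# `cufile` is None when no variant is present (e.g. CUDA not installed)
```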
Pushing for CUDA 12 Conda package builds of libraries for RAPIDS 23.04. This issue is for tracking that work.
Requirements (please recheck):
Note: nvcomp is only needed for compression (not IO). Also, pynvml is only needed in the benchmarks (not for library usage).
Our get_nvcomp.cmake should be updated to use rapids_cpm_nvcomp from rapids-cmake. That will keep kvikio from getting out of sync with the rest of RAPIDS and better ensure that it stays up to date. I believe that there have been a lot of relevant bugfixes/improvements (for cudf) to nvcomp in the last couple of releases.
Originally posted by @vyasr in #96 (comment)
I'm requesting the ability to connect a file descriptor, like a socket handle, to KvikIO so the data from that descriptor can be polled from the GPU and loaded directly, without having to reference back to a CPU data handle. Currently, the CuFile submodule of KvikIO does not support file descriptors.
A similar API to this in Python might be the socket module.
Use case:
Data streaming from a socket connection to a waiting process with a fully GPU workload.
Getting inconsistent errors at the end of process lifetime when "KVIKIO" policy is used in cuIO.
No issues in the actual tests.
There are multiple possible outcomes: crash, segfault, success. Sample output:
30: double free or corruption (!prev)
1/3 Test #30: JSON_TEST ........................Subprocess aborted***Exception: 2.68 sec
Repro persists when all cuFile calls outside of kvikIO are removed.
No repro with other versions of CUDA toolkit.
No repro when using libcudf's cuFile path instead of kvikIO.
We can use Kerchunk to GDS-accelerate HDF5 reads like we do in the Legate backend: #222
However, Kerchunk doesn't support writes, so for that we need another approach.
Would it make sense to start using rapids-cython here as in cuDF?
Any other best practices we should adopt here to modernize our scikit-build/CMake files?
Maybe some of these items are worth addressing ( #58 (comment) )
Would be useful to have wheels for KvikIO as well. This could be helpful for users coming from pip (like some Dask-CUDA users). Raising this to track adding them.
Note: There may be some things that need to be fixed as part of this like issue ( #233 )
Meta Issue to track support of new cuFile features.
It would be good to build the C++ example in CI with -DCMAKE_BUILD_TYPE=Debug
in order to detect issues downstream such as rapidsai/cudf#10703.
If we could also run the example (basic_io), it would be great, but that isn't essential.
Meta Issue to track work related to Zarr-KvikIO
Zarr PRs:
KvikIO PRs:
I am running inside a Docker container, specifically one created using rapids-compose. I am able to successfully build both the Python and C++ kvikio libraries, but both of them fail at runtime with a cuFile internal error. For example, using the exact C++ building instructions here appears to succeed, but then running the basic_io executable gives
(rapids) rapids@compose:~/kvikio/cpp/build$ ./examples/basic_io
KvikIO config:
Compatibility mode: disabled
DriverProperties:
Version: 0.0
Allow compatibility mode: true
Pool mode - enabled: false, threshold: 4 kb
Max pinned memory: 33554432 kb
terminate called after throwing an instance of 'kvikio::CUfileException'
what(): cuFile error at: /home/vyasr/local/rapids/kvikio/cpp/examples/../include/kvikio/file_handle.hpp:117: internal error
Aborted (core dumped)
I get the same "cuFile ... internal error" if I try to run pytest on anything in the python/tests directory.
In some cases a user needs to write a list of binary data (like using writelines). It would be useful to have an API like this when working with multiple frames.
Similarly it might be handy to have some kind of API for reading data into multiple buffers. There is not an exactly analogous API in Python, though socket.recvmsg_into is close (so it might be a starting point).
This can be particularly helpful if the GIL is released in the background as there can be one API call that releases the GIL as opposed to a couple in rapid succession.
It seems a bit counterintuitive that there is reset_num_threads and reset_task_size, but compat_mode_reset.
For consistency, I would have expected the last one to be reset_compat_mode. Do you think this is worth changing via a deprecation cycle?
The C++ API consistently has all three functions in a form ending in _reset, but if we did that on the Python side we would have to change two of the existing functions.
Hi,
I'm trying to compress a file using the nvcomp LZ4Compressor but ended up getting a file larger than the original.
Kindly advise what I'm doing wrong.
import os
import cupy as cp
import numpy as np
import kvikio.nvcomp as nvc
import humanize
DTYPE = cp.int32
dtype = cp.dtype(DTYPE)
filename = '/home/arul/Downloads/kjv10.txt'
print('File :', filename, ' Size: ', humanize.naturalsize(os.path.getsize(filename)))
testfile = open(filename).read()
data = np.frombuffer(bytes(testfile, 'utf-8'), dtype=np.int8)
data_gpu = cp.array(data, dtype=DTYPE)
compressor = nvc.LZ4Compressor(data_gpu.dtype)
compressed = compressor.compress(data_gpu)
gpu_comp_file = open('/home/arul/Downloads/kjv10-gpu-compressed.txt.lz4', 'wb')
gpu_comp_file.write(compressed.tobytes())
gpu_comp_file.close()
print('Compressed Size: ', humanize.naturalsize(compressed.size))
del compressor
del compressed
Result:
File : /home/arul/Downloads/kjv10.txt Size: 4.4 MB
Compressed Size: 17.4 MB
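The blow-up likely comes from the dtype conversion rather than the compressor: casting the int8 view of the text to int32 stores every byte in four bytes, quadrupling the input before LZ4 ever sees it. A minimal CPU-only demonstration with numpy:

```python
import numpy as np

# Mimic the snippet above on the CPU.
raw = b"In the beginning God created the heaven and the earth." * 100
data = np.frombuffer(raw, dtype=np.int8)
upcast = data.astype(np.int32)  # what cp.array(data, dtype=cp.int32) does
assert upcast.nbytes == 4 * data.nbytes
# So a ~4.4 MB file becomes ~17.6 MB of int32 input before compression,
# consistent with the 17.4 MB "compressed" size reported above.
# Keeping the buffer as int8/uint8 avoids the inflation.
```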
Before proposing the new encode_batch and decode_batch API (#248) to the numcodecs Codec, I think we should introduce a new context argument similar to Zarr's Context.
This way, we can use the meta_array option to specify the output memory type of encode_batch and decode_batch.
cc @Alexey-Kamenev
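A hedged sketch of where such a context argument could slot in. Context and BatchCodecSketch are illustrative names following Zarr's Context proposal, not an existing numcodecs interface, and the "encoding" here is a plain copy so the allocation pattern stays visible:

```python
import numpy as np

class Context(dict):
    """Minimal stand-in for Zarr's Context: a mapping carrying meta_array."""

class BatchCodecSketch:
    """Hypothetical codec showing the proposed context argument."""

    def encode_batch(self, bufs, context=None):
        # meta_array is an empty template array; allocating outputs "like" it
        # lets a caller request GPU (e.g. cupy) output without new keywords.
        meta = (context or Context()).get("meta_array", np.empty(0, dtype=np.uint8))
        outs = []
        for buf in bufs:
            data = np.frombuffer(bytes(buf), dtype=np.uint8)
            out = np.empty_like(meta, shape=data.shape, dtype=np.uint8)
            out[...] = data  # a real codec would compress here
            outs.append(out)
        return outs
```

Passing context=Context(meta_array=cupy.empty(0)) would then steer output allocation to the GPU without changing the Codec signature per backend.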
This feature adds support for the nvCOMP batch/low-level API, which allows processing multiple chunks in parallel.
The proposed implementation provides an easy way to use the API via the well-known numcodecs Codec API. Using numcodecs also enables seamless integration with libraries such as zarr that use numcodecs internally.
Additionally, using the nvCOMP batch API enables interoperability between existing codecs and the nvCOMP batch codec. For example, data can be compressed on the CPU using the default LZ4 codec and then decompressed on the GPU using the proposed nvCOMP batch codec.
To support batch mode, the Codec interface was extended with two functions, encode_batch and decode_batch, which implement batch mode.
Note that the current version of zarr does not support chunk-parallel functionality, but there is a proposal for this feature.
Currently, the following compression/decompression algorithms are supported:
nvCOMP also supports other algorithms, which could be added to KvikIO relatively easily.
Example of usage:
import numcodecs
import numpy as np
# Get the codec from the numcodecs registry.
codec = numcodecs.registry.get_codec(dict(id="nvcomp_batch", algorithm="lz4"))
# Create 2 chunks. The chunks do not have to be the same size.
shape = (4, 8)
chunk1, chunk2 = np.random.randn(2, *shape).astype(np.float32)
# Compress data.
data_comp = codec.encode_batch([chunk1, chunk2])
# Decompress.
data_decomp = codec.decode_batch(data_comp)
# Verify.
np.testing.assert_equal(data_decomp[0].view(np.float32).reshape(shape), chunk1)
np.testing.assert_equal(data_decomp[1].view(np.float32).reshape(shape), chunk2)
Usage with zarr (no parallel chunking yet - see the note above):
import numcodecs
import numpy as np
import zarr
# Get the codec from the numcodecs registry.
codec = numcodecs.registry.get_codec(dict(id="nvcomp_batch", algorithm="lz4"))
shape = (16, 16)
chunks = (8, 8)
# Create data and compress.
data = np.random.randn(*shape).astype(np.float32)
z1 = zarr.array(data, chunks=chunks, compressor=codec)
# Store in compressed format.
zarr_store = zarr.MemoryStore()
zarr.save_array(zarr_store, z1, compressor=codec)
# Read back/decompress.
z2 = zarr.open_array(zarr_store)
np.testing.assert_equal(z1[:], z2[:])
If desired, the API can also be used directly, without going through the numcodecs API.
Running the test cases with numpy 1.22.2:
/opt/kvikio/python# pytest tests/
=========================================================================================================== test session starts ===========================================================================================================
platform linux -- Python 3.8.10, pytest-7.2.2, pluggy-1.0.0
rootdir: /opt/kvikio/python
plugins: typeguard-2.13.3, shard-0.1.2, xdist-3.2.1, hypothesis-5.35.1, rerunfailures-11.1.2, xdoctest-1.0.2
collected 381 items / 1 error / 1 skipped
Running 381 items in this shard
================================================================================================================= ERRORS ==================================================================================================================
__________________________________________________________________________________________________ ERROR collecting tests/test_numpy.py ___________________________________________________________________________________________________
ImportError while importing test module '/opt/kvikio/python/tests/test_numpy.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.8/importlib/__init__.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/test_numpy.py:7: in <module>
from kvikio.numpy import LikeWrapper, tofile
/usr/local/lib/python3.8/dist-packages/kvikio/numpy.py:10: in <module>
from numpy._typing._array_like import ArrayLike
E ModuleNotFoundError: No module named 'numpy._typing'
cf. numpy._typing was introduced in numpy 1.23.0 (numpy/numpy@7739583).
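A version-tolerant import sketch: preferring the public numpy.typing location (available since numpy 1.20) makes both numpy 1.22 and 1.23+ work, instead of reaching into the private numpy._typing module:

```python
try:
    from numpy.typing import ArrayLike  # public API, numpy >= 1.20
except ImportError:
    ArrayLike = object  # typing-only fallback for very old numpy
```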
Currently we have API docs for CuFile, but not for the compression components. It would be good to include something about these.
cc @thomcom
I am trying to statically link cuFile (libcufile_static.a), but it fails when calling the CUDA API. Am I doing something stupid?
mylib.cpp
#include <iostream>
#include <cassert>
#include <nvml.h>
#include <cuda.h>
#include <cuda_runtime_api.h>
using namespace std;
void test() {
  cout << "test()" << endl;
  nvmlReturn_t e1 = nvmlInit();
  assert(e1 == NVML_SUCCESS);
  cudaError e2 = cudaSetDevice(0);
  assert(e2 == cudaSuccess);
  CUdevice cu_dev{};
  CUresult e3 = cuCtxGetDevice(&cu_dev);
  cout << "cuCtxGetDevice(): " << e3 << endl;
  assert(e3 == CUDA_SUCCESS);
}
myapp.cpp
void test();
int main()
{
  test();
}
build.sh
set -e
CUFILE=/usr/local/cuda/lib64/libcufile_static.a
#CUFILE=
g++ -c -O1 -g -std=gnu++17 -fPIC -I /usr/local/cuda/include mylib.cpp
g++ -shared -std=gnu++17 -O1 -g mylib.o $CUFILE \
-L/usr/local/cuda/lib64 -lcudart -lnvidia-ml -lcuda -o mylib.so
g++ myapp.cpp -std=gnu++17 -O1 -g -Wl,-rpath,/usr/local/cuda/lib64 -Wl,-rpath,. mylib.so -o myapp
./myapp
Placing the three files in a folder and running sh build.sh produces:
$ sh build.sh
test()
cuCtxGetDevice(): 999
myapp: mylib.cpp:21: void test(): Assertion `e3 == CUDA_SUCCESS' failed.
Aborted (core dumped)
Now, if I do dynamic linking instead, by setting CUFILE= and adding -lcufile in the build script, it works:
$ sh build.sh
test()
cuCtxGetDevice(): 0
I'm seeing either a double free or an invalid pointer error every time I complete a benchmark run. Here are the logs from two runs:
/mnt/nvme0/aborkar/kvikio/python/benchmarks$ python3 single-node-io.py -d /mnt/nvme0/aborkar/ -t 24 --nruns 3 2>&1 | tee kvikio_local_nvme.log
Roundtrip benchmark
----------------------------------
GPU | NVIDIA A100-SXM4-80GB (dev #0)
GPU Memory Total | 80.00 GiB
BAR1 Memory Total | 128.00 GiB
GDS driver | v2.13
GDS config.json | /usr/local/cuda-11.8/gds/cufile.json
----------------------------------
nbytes | 10485760 bytes (10.00 MiB)
4K aligned | True
pre-reg-buf | True
diretory | /mnt/nvme0/aborkar
nthreads | 24
nruns | 3
==================================
cufile read | 1.64 GiB/s ± 5.49 % (1.66 GiB/s, 1.72 GiB/s, 1.54 GiB/s)
cufile write | 3.31 GiB/s ± 12.42 % (2.88 GiB/s, 3.34 GiB/s, 3.70 GiB/s)
posix read | 2.32 GiB/s ± 46.37 % (1.12 GiB/s, 2.63 GiB/s, 3.20 GiB/s)
posix write | 0.95 GiB/s ± 13.13 % (824.43 MiB/s, 1.03 GiB/s, 1.01 GiB/s)
double free or corruption (!prev)
/mnt/nvme0/aborkar/kvikio/python/benchmarks$ python3 single-node-io.py -d /mnt/nvme0/aborkar/ -t 8 --nruns 3 2>&1 | tee kvikio_local_nvme.log
Roundtrip benchmark
----------------------------------
GPU | NVIDIA A100-SXM4-80GB (dev #0)
GPU Memory Total | 80.00 GiB
BAR1 Memory Total | 128.00 GiB
GDS driver | v2.13
GDS config.json | /usr/local/cuda-11.8/gds/cufile.json
----------------------------------
nbytes | 10485760 bytes (10.00 MiB)
4K aligned | True
pre-reg-buf | True
diretory | /mnt/nvme0/aborkar
nthreads | 8
nruns | 3
==================================
cufile read | 1.50 GiB/s ± 9.57 % (1.36 GiB/s, 1.48 GiB/s, 1.65 GiB/s)
cufile write | 3.52 GiB/s ± 12.89 % (3.00 GiB/s, 3.84 GiB/s, 3.72 GiB/s)
posix read | 2.84 GiB/s ± 50.86 % (1.17 GiB/s, 3.66 GiB/s, 3.69 GiB/s)
posix write | 0.96 GiB/s ± 15.34 % (814.17 MiB/s, 1.01 GiB/s, 1.08 GiB/s)
free(): invalid pointer
Hi! We've been talking about adding nvcomp to kvikio. I'm looking to add Python bindings for the Snappy, Cascaded, and LZ4 algorithms from nvcomp. In order to do so, we'll need to add the Python bindings nvcomp.pyx and nvcomp.pxd to kvikio/python/kvikio/_lib and a wrapper for them, nvcomp.py. Once this is done I'll write tests.
A CMake flag, -DUSE_NVCOMP=True, will be added, disabled by default.
We're planning on using the nvcomp headers that are installed by cudf, which can be installed via conda, right?
I'm also looking into adding kvikio as another library option for https://github.com/trxcllnt/rapids-compose, which will make maintenance and development quite easy.
GDS doesn't work on WSL, even in compatibility mode.
test.py
import cupy
import kvikio
a = cupy.arange(100)
f = kvikio.CuFile("test-file", "w")
# Write whole array to file
f.write(a)
f.close()
b = cupy.empty_like(a)
f = kvikio.CuFile("test-file", "r")
# Read whole array from file
f.read(b)
assert all(a == b)
❯ python test.py
Assertion failure, file index :cufio-udev line :143
[1] 13856 abort python test.py
FYI, cuCIM handles the issue by checking the platform and skipping the cuFileDriverOpen() call.
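A minimal sketch of that kind of platform guard. The function names are illustrative, and the "microsoft in kernel release" heuristic is a common WSL check, not necessarily cuCIM's exact implementation:

```python
import platform

def is_wsl() -> bool:
    """Heuristic: WSL kernels report 'microsoft' in their release string."""
    return "microsoft" in platform.uname().release.lower()

def maybe_open_driver(open_fn):
    """Call the driver-open function only on non-WSL platforms."""
    if is_wsl():
        return None  # skip GDS entirely, as cuCIM does
    return open_fn()
```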
Not critical since KvikIO doesn't have wheels, but scikit-build has bugs with include_package_data, so specifying package_data explicitly (like in the non-Legate setup.py) is safer. That said, I don't see wheels happening for legate-kvikio anytime soon, so it's mostly just to be safe.
Originally posted by @vyasr in #232 (comment)
Currently this makes use of pynvml in a few places:
kvikio/python/benchmarks/single-node-io.py
Lines 248 to 250 in c152f63
However, we would like to move to nvidia-ml-py in the future. Raising this issue to track that work.
The overhead of KvikIO becomes significant for small reads and writes.
In cuDF we had to skip KvikIO when reading and writing small buffers; see rapidsai/cudf#12780 and rapidsai/cudf#12841.
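A hedged sketch of that cuDF-style workaround: route small transfers to a plain POSIX path and only pay KvikIO's setup cost above a threshold. The threshold value and names are illustrative, not taken from cuDF or KvikIO:

```python
SMALL_BUFFER_THRESHOLD = 128 * 1024  # bytes; tune per system (assumption)

def choose_io_path(nbytes: int) -> str:
    """Pick 'posix' when per-call overhead would dominate the transfer."""
    return "posix" if nbytes < SMALL_BUFFER_THRESHOLD else "kvikio"
```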