Comments (6)
I got it working in Colab:
https://github.com/ZoliN/colab/blob/main/hdrnetOrigSlice.ipynb
It uses the original bilateral slice function from 2017 (only GPU).
I also built it with the latest slice function (GPU+CPU) with some hacking; however, the GPU kernel crashes, so it is only usable on a CPU device at the moment:
https://github.com/ZoliN/colab/blob/main/hdrnetNewSlice.ipynb
These all use TF 1.x. The slice op can be built with TF2 too; however, I think layers.py would have to be rewritten.
from hdrnet.
I suffered the same issue.
Most likely there is a bug in TensorFlow that crashes the GPU kernel due to wrong shapes.
I tried to solve the issue but still run into the same problem as described above.
Could you advise to solve this issue?
Hi,
Some of the problems discussed in this issue were already raised in issues #4 and #9. I decided to open a new issue because I want to be able to run this project in any of its versions (not necessarily the latest). I have included some of the trials I did.
I've tried two versions of this project and failed with both of them. Following some hints that I saw in other issues, I made some progress but didn't succeed. I would like to share my experiments and ask for suggestions.
What is missing from this project is information on which third-party dependencies and versions should be used for the compilation and how to arrange them.
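One way to see up front which third-party headers a source file expects is to scan its quoted `#include` lines instead of discovering them one failed `make` at a time. The following is only a sketch: the function name `third_party_includes` is made up for illustration, and the sample text contains just the two include lines that appear in the compiler errors quoted in this issue plus one invented system include.

```python
import re

# Matches lines such as: #include "third_party/array/array.h"
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"(third_party/[^"]+)"', re.MULTILINE)


def third_party_includes(source_text):
    """Return the quoted third_party/ include paths found in C++ source text, in order."""
    return INCLUDE_RE.findall(source_text)


# Illustrative input only; the <cstdint> line is invented to show it gets skipped.
sample = '''
#include "third_party/array/array.h"
#include "third_party/tensorflow/core/util/gpu_kernel_helper.h"
#include <cstdint>
'''

for path in third_party_includes(sample):
    print(path)
```

Running this over the real `ops/*.cc` files would only list the direct includes; headers pulled in transitively (as happens later with `cuda_fp16.h`) would still need a second pass over the copied headers.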
The latest commit
The first step was to try the latest commit #7f71f44 (2022-05-08)
The latest commit compilation
As reported in other issues, simply executing
make
results in:
$ make
nvcc -std c++11 -c ops/bilateral_slice.cu.cc -o build/bilateral_slice.cu.o -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -I`python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())'` -expt-relaxed-constexpr -Wno-deprecated-gpu-targets -ftz=true
2022-05-27 13:11:53.745577: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
ops/bilateral_slice.cu.cc:23:10: fatal error: third_party/array/array.h: No such file or directory
 #include "third_party/array/array.h"
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
Makefile:31: recipe for target 'build/bilateral_slice.cu.o' failed
make: *** [build/bilateral_slice.cu.o] Error 1
Adding the array third party
Following the answer in issue #4, I cloned the array third-party library from https://github.com/dsharlet/array/ (commit ID #344d75d of 2022-04-11). I placed this project under
hdrnet/ops/third_party/array
This made some progress when running make:
$ make
nvcc -std c++11 -c ops/bilateral_slice.cu.cc -o build/bilateral_slice.cu.o -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -I`python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())'` -expt-relaxed-constexpr -Wno-deprecated-gpu-targets -ftz=true
2022-05-27 13:13:12.143046: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
ops/bilateral_slice.cu.cc:24:10: fatal error: third_party/tensorflow/core/util/gpu_kernel_helper.h: No such file or directory
 #include "third_party/tensorflow/core/util/gpu_kernel_helper.h"
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
Makefile:31: recipe for target 'build/bilateral_slice.cu.o' failed
make: *** [build/bilateral_slice.cu.o] Error 1
The tensorflow third party
Changing the include switches
As the error seems to relate to tensorflow, I have tested the command that should provide the location of the tensorflow include files:
python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())'
This results in:
2022-05-27 13:14:54.228153: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
/miniconda/envs/HDRNET/lib/python3.6/site-packages/tensorflow/include
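A side note on the log line in that output: backtick substitution in the shell keeps only what the command writes to stdout, and TensorFlow's C++ log lines normally go to stderr, so the `dso_loader` message should not end up inside the `-I` flag. The sketch below imitates this with a stand-in child process rather than TensorFlow itself; the printed path is a placeholder, not a real location.

```python
import subprocess
import sys

# Stand-in for the real command: it writes a log line to stderr and a path to stdout,
# imitating `python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())'`.
# The path is a placeholder, not a real TensorFlow location.
child_code = (
    "import sys\n"
    "print('I dso_loader: opened libcudart', file=sys.stderr)\n"
    "print('/placeholder/tensorflow/include')\n"
)

result = subprocess.run(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,
)

# Shell backticks keep only stdout, which is what ends up after -I.
include_dir = result.stdout.strip()
print("-I" + include_dir)
```

If that holds for the real command, hard-coding the path is equivalent to the backtick form, which would be consistent with getting the same error either way.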
Replacing the python -c... command with the TensorFlow include path printed above results in the same error.
Copy the tensorflow core to the thirdparty folder
The next step was to copy the folder containing tensorflow/core/util/gpu_kernel_helper.h (from the TensorFlow project, commit ID #0976345ba57) to the third-party folder (I copied the full folder structure).
Running make now results in the following error:
nvcc -std c++11 -c ops/bilateral_slice.cu.cc -o build/bilateral_slice.cu.o -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -I`python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())'` -expt-relaxed-constexpr -Wno-deprecated-gpu-targets -ftz=true
2022-05-28 06:23:47.224210: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
In file included from ops/bilateral_slice.cu.cc:24:0:
ops/third_party/tensorflow/core/util/gpu_kernel_helper.h:24:10: fatal error: third_party/gpus/cuda/include/cuda_fp16.h: No such file or directory
 #include "third_party/gpus/cuda/include/cuda_fp16.h"
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
Makefile:31: recipe for target 'build/bilateral_slice.cu.o' failed
make: *** [build/bilateral_slice.cu.o] Error 1
The cuda third party
I copied the cuda_fp16 header (/usr/local/cuda/include/cuda_fp16.h) to the third_party/gpus/cuda location. This by itself didn't work, so I added the ops folder to the include path by manually executing:
nvcc -std c++11 -c ops/bilateral_slice.cu.cc -o build/bilateral_slice.cu.o -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -I`python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())'` -expt-relaxed-constexpr -Wno-deprecated-gpu-targets -ftz=true -Iops
This results in another error message.
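A possible explanation for why `array.h` was found without `-Iops` while `cuda_fp16.h` was not: with GCC-style compilers, a quoted include is looked up relative to the directory of the including file first and only then in the `-I` directories. The sketch below models just that rule on a throwaway directory tree. The helper names and the exact location of the copied `cuda_fp16.h` are my assumptions, and whether nvcc follows the same order in every case is also an assumption.

```python
import os
import tempfile


def resolve_include(include, including_file, include_dirs):
    """Find a quoted include: the including file's directory first, then each -I dir in order."""
    search_dirs = [os.path.dirname(including_file)] + list(include_dirs)
    for directory in search_dirs:
        candidate = os.path.join(directory, include)
        if os.path.isfile(candidate):
            return candidate
    return None


def touch(path):
    """Create an empty file, making parent directories as needed."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w"):
        pass


# Throwaway tree mirroring the assumed layout (all files are empty placeholders).
root = tempfile.mkdtemp()
ops = os.path.join(root, "ops")
source = os.path.join(ops, "bilateral_slice.cu.cc")
helper = os.path.join(ops, "third_party", "tensorflow", "core", "util", "gpu_kernel_helper.h")
fp16 = os.path.join(ops, "third_party", "gpus", "cuda", "include", "cuda_fp16.h")
for path in (source, helper, fp16):
    touch(path)

helper_include = "third_party/tensorflow/core/util/gpu_kernel_helper.h"
fp16_include = "third_party/gpus/cuda/include/cuda_fp16.h"

# Found without any -I flag, because the including file lives in ops/.
found_helper = resolve_include(helper_include, source, [])
# Not found: the including file now lives deep inside ops/third_party/.
missing_fp16 = resolve_include(fp16_include, helper, [])
# Found once ops/ itself is on the include path, as with -Iops.
found_fp16 = resolve_include(fp16_include, helper, [ops])

print(found_helper is not None, missing_fp16 is None, found_fp16 is not None)
```

Under these assumptions, any header that itself includes `third_party/...` paths needs the directory that contains `third_party` on the include path, which is what `-Iops` provides.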
I tried to fix this by switching the compiler to c++14:
nvcc -std c++14 -c ops/bilateral_slice.cu.cc -o build/bilateral_slice.cu.o -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -I`python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())'` -expt-relaxed-constexpr -Wno-deprecated-gpu-targets -ftz=true -Iops
This results in the following error message:
2022-05-28 16:35:50.746921: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
/miniconda/envs/HDRNET/lib/python3.6/site-packages/tensorflow/include/tensorflow/core/platform/file_system.h(556): warning: overloaded virtual function "tensorflow::FileSystem::FilesExist" is only partially overridden in class "tensorflow::WrappedFileSystem"
/miniconda/envs/HDRNET/lib/python3.6/site-packages/tensorflow/include/tensorflow/core/platform/file_system.h(556): warning: overloaded virtual function "tensorflow::FileSystem::CreateDir" is only partially overridden in class "tensorflow::WrappedFileSystem"
/miniconda/envs/HDRNET/lib/python3.6/site-packages/tensorflow/include/tensorflow/core/platform/env.h(482): warning: overloaded virtual function "tensorflow::Env::RegisterFileSystem" is only partially overridden in class "tensorflow::EnvWrapper"
ops/third_party/array/array.h(2065): warning: "nda::array_ref<T, Shape>::operator nda::const_array_ref<const float, nda::shape_of_rank<5UL>>() const [with T=const float, Shape=nda::shape_of_rank<5UL>]" will not be called for implicit or explicit conversions
          detected during instantiation of class "nda::array_ref<T, Shape> [with T=const float, Shape=nda::shape_of_rank<5UL>]"
ops/bilateral_slice.cu.cc(37): here
ops/third_party/array/array.h(2065): warning: "nda::array_ref<T, Shape>::operator nda::const_array_ref<const float, nda::shape_of_rank<3UL>>() const [with T=const float, Shape=nda::shape_of_rank<3UL>]" will not be called for implicit or explicit conversions
          detected during instantiation of class "nda::array_ref<T, Shape> [with T=const float, Shape=nda::shape_of_rank<3UL>]"
ops/bilateral_slice.cu.cc(37): here
ops/bilateral_slice.cu.cc(74): error: namespace "std" has no member "clamp"
ops/bilateral_slice.cu.cc(77): error: namespace "std" has no member "clamp"
ops/bilateral_slice.cu.cc(80): error: namespace "std" has no member "clamp"
ops/third_party/array/array.h(2065): warning: "nda::array_ref<T, Shape>::operator nda::const_array_ref<const float, nda::shape_of_rank<4UL>>() const [with T=const float, Shape=nda::shape_of_rank<4UL>]" will not be called for implicit or explicit conversions
          detected during instantiation of class "nda::array_ref<T, Shape> [with T=const float, Shape=nda::shape_of_rank<4UL>]"
ops/bilateral_slice.cu.cc(96): here
ops/bilateral_slice.cu.cc(203): error: namespace "std" has no member "clamp"
ops/bilateral_slice.cu.cc(206): error: namespace "std" has no member "clamp"
ops/bilateral_slice.cu.cc(209): error: namespace "std" has no member "clamp"
6 errors detected in the compilation of "ops/bilateral_slice.cu.cc".
Searching further, it seems that std::clamp was introduced in c++17:
nvcc -std c++17 -c ops/bilateral_slice.cu.cc -o build/bilateral_slice.cu.o -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -I`python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())'` -expt-relaxed-constexpr -Wno-deprecated-gpu-targets -ftz=true -Iops
This results in yet another error message.
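For reference, the std::clamp calls that failed under c++14 bound a value to a closed range, and the same operation can be written with plain min and max (in C++ that would be something like `std::min(std::max(v, lo), hi)`). Here is a sketch of the semantics in Python; the function name `clamp` is just illustrative and this is not code from the repository.

```python
def clamp(value, lo, hi):
    """Bound value to the closed range [lo, hi], as std::clamp does for lo <= hi."""
    if lo > hi:
        # std::clamp leaves this case undefined; this sketch rejects it instead.
        raise ValueError("lo must not exceed hi")
    return max(lo, min(value, hi))


print(clamp(-3, 0, 7))   # below the range: returns lo
print(clamp(4, 0, 7))    # inside the range: returned unchanged
print(clamp(12, 0, 7))   # above the range: returns hi
```

Whether replacing the calls this way is enough to build the latest commit with an older standard is untested here; it only removes the dependency on std::clamp itself.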
The initial commit
Following the suggestion in #9, I tried the initial commit (#5ac95ef of 2017-08-21).
First compilation of the initial commit
1. It appears that this commit requires tensorflow_gpu==1.1.0
2. and python 2.7, which I updated in the environment.
3. When I try to compile according to the readme:
cd hdrnet
make
I executed
~/GIT/hdrnet/hdrnet$ make
and received:
nvcc -std c++11 -c ops/bilateral_slice.cu.cc -o build/bilateral_slice.cu.o -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -I`python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())'` -expt-relaxed-constexpr -Wno-deprecated-gpu-targets -ftz=true
In file included from /miniconda/envs/p27/lib/python2.7/site-packages/tensorflow/include/unsupported/Eigen/CXX11/Tensor:14:0,
                 from /miniconda/envs/p27/lib/python2.7/site-packages/tensorflow/include/third_party/eigen3/unsupported/Eigen/CXX11/Tensor:4,
                 from ops/bilateral_slice.cu.cc:19:
/miniconda/envs/p27/lib/python2.7/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/Core:42:14: fatal error: math_functions.hpp: No such file or directory
 #include <math_functions.hpp>
          ^~~~~~~~~~~~~~~~~~~~
compilation terminated.
Makefile:31: recipe for target 'build/bilateral_slice.cu.o' failed
make: *** [build/bilateral_slice.cu.o] Error 1
The initial commit - adding third party
Understanding that the third-party folder was missing, I cloned the Eigen project (https://gitlab.com/libeigen/eigen.git) into the folder:
hdrnet/third_party/eigen3
As a commit ID for the Eigen project I tried commit #5c68ba41a (2017-02-21).
Executing make results in:
nvcc -std c++11 -c ops/bilateral_slice.cu.cc -o build/bilateral_slice.cu.o -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -I`python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())'` -expt-relaxed-constexpr -Wno-deprecated-gpu-targets -ftz=true
In file included from /miniconda/envs/p27/lib/python2.7/site-packages/tensorflow/include/unsupported/Eigen/CXX11/Tensor:14:0,
                 from /miniconda/envs/p27/lib/python2.7/site-packages/tensorflow/include/third_party/eigen3/unsupported/Eigen/CXX11/Tensor:4,
                 from ops/bilateral_slice.cu.cc:19:
/miniconda/envs/p27/lib/python2.7/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/Core:42:14: fatal error: math_functions.hpp: No such file or directory
 #include <math_functions.hpp>
          ^~~~~~~~~~~~~~~~~~~~
compilation terminated.
Makefile:31: recipe for target 'build/bilateral_slice.cu.o' failed
make: *** [build/bilateral_slice.cu.o] Error 1
The initial commit - Debugging the compilation error
I tried to execute the compilation command manually:
nvcc -std c++11 -c ops/bilateral_slice.cu.cc -o build/bilateral_slice.cu.o -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -I`python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())'` -expt-relaxed-constexpr -Wno-deprecated-gpu-targets -ftz=true
As the command
python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())'
returned:
/miniconda/envs/p27/lib/python2.7/site-packages/tensorflow/include
I executed:
nvcc -std c++11 -c ops/bilateral_slice.cu.cc -o build/bilateral_slice.cu.o -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -I/miniconda/envs/p27/lib/python2.7/site-packages/tensorflow/include -expt-relaxed-constexpr -Wno-deprecated-gpu-targets -ftz=true
As this returned the same error as before, I added the location of <math_functions.hpp> to the include path of the compilation (-I/usr/local/cuda/include/crt):
nvcc -std c++11 -c ops/bilateral_slice.cu.cc -o build/bilateral_slice.cu.o -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -I/miniconda/envs/p27/lib/python2.7/site-packages/tensorflow/include -expt-relaxed-constexpr -Wno-deprecated-gpu-targets -ftz=true -I/usr/local/cuda/include/crt
I then received an error about CUDA not being supported.
At this point I'm stuck. My CUDA version is:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
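Since the toolkit version matters when matching an old TensorFlow wheel to the local CUDA install, it can help to extract the release number programmatically before attempting a build. This is a minimal sketch that parses the text shown above; the function name `cuda_release` is hypothetical, and my guess that a 2017-era tensorflow_gpu 1.1.0 wheel targets a much older toolkit than 11.2 is an assumption, not something verified in this issue.

```python
import re


def cuda_release(nvcc_output):
    """Extract (major, minor) from `nvcc --version` text, or None if no release line is found."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# The `nvcc --version` output quoted in this issue.
sample = (
    "nvcc: NVIDIA (R) Cuda compiler driver\n"
    "Copyright (c) 2005-2021 NVIDIA Corporation\n"
    "Built on Sun_Feb_14_21:12:58_PST_2021\n"
    "Cuda compilation tools, release 11.2, V11.2.152\n"
    "Build cuda_11.2.r11.2/compiler.29618528_0\n"
)

print(cuda_release(sample))
```

In a real script the text would come from running `nvcc --version`; here it is hard-coded so the sketch runs without a CUDA install.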
I've tried your Colab and it works great on the base version of Colab. However, when I switched to Colab Pro, it stopped working. It builds fine and I get the same build output as on the base version, but py.test throws assertion errors and the loss blows up when I train. I know this sounds like a question I should be asking Google Colab, but I'm wondering if you have any idea why this is happening?
`_________________ BilateralSliceApplyTest.test_input_gradient __________________
self = <hdrnet.test.ops_test.BilateralSliceApplyTest testMethod=test_input_gradient>
    def test_input_gradient(self):
        for dev in ['/gpu:0']:
            batch_size = 1
            h = 8
            w = 5
            gh = 6
            gw = 3
            d = 7
            i_chans = 3
            o_chans = 3
            grid_shape = [batch_size, gh, gw, d, (1+i_chans)*o_chans]
            guide_shape = [batch_size, h, w]
            input_shape = [batch_size, h, w, i_chans]
            output_shape = [batch_size, h, w, o_chans]
            grid_data = np.random.rand(*grid_shape).astype(np.float32)
            guide_data = np.random.rand(*guide_shape).astype(np.float32)
            input_data = np.random.rand(*input_shape).astype(np.float32)
            with tf.device(dev):
                grid_tensor = tf.convert_to_tensor(grid_data,
                                                   name='data',
                                                   dtype=tf.float32)
                guide_tensor = tf.convert_to_tensor(guide_data,
                                                    name='guide',
                                                    dtype=tf.float32)
                input_tensor = tf.convert_to_tensor(input_data,
                                                    name='input',
                                                    dtype=tf.float32)
                output_tensor = ops.bilateral_slice_apply(grid_tensor, guide_tensor, input_tensor, has_offset=True)
                with self.test_session():
                    err = tf.test.compute_gradient_error(
                        input_tensor,
                        input_shape,
                        output_tensor,
                        output_shape)
                self.assertLess(err, 3e-4)
E   AssertionError: 0.9942179322242737 not less than 0.0003
test/ops_test.py:506: AssertionError
----------------------------- Captured stderr call -----------------------------
2022-09-17 14:15:11.833032: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2022-09-17 14:15:11.833117: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-09-17 14:15:11.833127: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2022-09-17 14:15:11.833134: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2022-09-17 14:15:11.833216: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4884 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)
=========================== short test summary info ============================
FAILED test/ops_test.py::BilateralSliceTest::test_grid_optimize - AssertionEr...
FAILED test/ops_test.py::BilateralSliceTest::test_guide_gradient - AssertionE...
FAILED test/ops_test.py::BilateralSliceTest::test_guide_optimize - AssertionE...
FAILED test/ops_test.py::BilateralSliceTest::test_interpolate - AssertionErro...
FAILED test/ops_test.py::BilateralSliceTest::test_optimize_both - AssertionEr...
FAILED test/ops_test.py::BilateralSliceApplyTest::test_grid_gradient - Assert...
FAILED test/ops_test.py::BilateralSliceApplyTest::test_guide_gradient - Asser...
FAILED test/ops_test.py::BilateralSliceApplyTest::test_input_gradient - Asser...
=================== 8 failed, 4 passed, 2 skipped in 27.87s ====================
[ ]
%cd /content/hdrnet
!wget https://data.csail.mit.edu/graphics/hdrnet/pretrained_models.zip
!unzip pretrained_models.zip
`
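For context on what the failing assertion measures: `tf.test.compute_gradient_error` compares the analytic Jacobian of an op with a numerically estimated one and returns the maximum error, so a value near 1 instead of below 3e-4 suggests the GPU gradient (or the forward pass it is checked against) is producing wrong numbers rather than slightly imprecise ones. The sketch below shows the same kind of check in plain Python on a toy function. It is not the TensorFlow implementation, and all names in it are made up for illustration.

```python
def max_gradient_error(f, grad_f, x, eps=1e-4):
    """Largest absolute gap between grad_f(x) and a central-difference estimate of it."""
    analytic = grad_f(x)
    worst = 0.0
    for i in range(len(x)):
        plus = list(x)
        plus[i] += eps
        minus = list(x)
        minus[i] -= eps
        numeric = (f(plus) - f(minus)) / (2.0 * eps)
        worst = max(worst, abs(numeric - analytic[i]))
    return worst


def f(x):
    """Toy function: sum of squares."""
    return sum(v * v for v in x)


def good_grad(x):
    """Correct gradient of f."""
    return [2.0 * v for v in x]


def bad_grad(x):
    """Deliberately wrong gradient, standing in for a broken kernel."""
    return [0.0 for _ in x]


point = [0.3, -0.5, 0.8]
print(max_gradient_error(f, good_grad, point) < 3e-4)
print(max_gradient_error(f, bad_grad, point) < 3e-4)
```

A correct gradient passes the 3e-4 threshold easily on this toy case, while the wrong one misses it by orders of magnitude, which is the same pattern as the 0.994 reported above.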