Comments (6)

ZoliN commented on May 8, 2024

I got it working in Colab:
https://github.com/ZoliN/colab/blob/main/hdrnetOrigSlice.ipynb
It uses the original bilateral slice function from 2017 (only GPU).

I also built it with the latest slice function (GPU+CPU) with some hacking; however, the GPU kernel crashes, so at the moment it is usable only on the CPU device:
https://github.com/ZoliN/colab/blob/main/hdrnetNewSlice.ipynb

These all use TF1.x. The slice OP can be built with TF2 too, however layers.py would have to be rewritten I think.

from hdrnet.

TrungKhoaLe commented on May 8, 2024

I ran into the same issue.


eduardinjo commented on May 8, 2024

Most likely there is a bug in TensorFlow that crashes the GPU kernel because of wrong shapes.


Egkang-Luis commented on May 8, 2024

I tried to solve this issue but still hit the same problem as described above.

Could you advise how to solve it?

Hi,

Some of the problems discussed in this issue were already raised in issues #4 and #9. I decided to open a new issue because I want to be able to run this project in any of its versions (not necessarily the latest). I have included some of the trials I did.

I've tried two versions of this project and failed in both of them. Following some hints that I saw in other issues, I made some progress but didn't succeed. I would like to share my experiments and ask for suggestions.

What is missing from this project is information on which third-party dependencies and versions should be used for compilation, and how to arrange them.

The latest commit

The first step was to try the latest commit #7f71f44 (2022-05-08)

The latest commit compilation

As reported in other issues, simply executing make results in:

$ make
nvcc -std c++11 -c  ops/bilateral_slice.cu.cc -o build/bilateral_slice.cu.o -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -I`python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())'` -expt-relaxed-constexpr -Wno-deprecated-gpu-targets -ftz=true
2022-05-27 13:11:53.745577: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
ops/bilateral_slice.cu.cc:23:10: fatal error: third_party/array/array.h: No such file or directory
 #include "third_party/array/array.h"
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
Makefile:31: recipe for target 'build/bilateral_slice.cu.o' failed
make: *** [build/bilateral_slice.cu.o] Error 1

Adding the array third party

Following the answer in issue #4, I cloned the array third-party library from https://github.com/dsharlet/array/ (commit ID #344d75d of 2022-04-11) and placed it under hdrnet/ops/third_party/array. This gave some progress when running make:

$ make
nvcc -std c++11 -c  ops/bilateral_slice.cu.cc -o build/bilateral_slice.cu.o -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -I`python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())'` -expt-relaxed-constexpr -Wno-deprecated-gpu-targets -ftz=true
2022-05-27 13:13:12.143046: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
ops/bilateral_slice.cu.cc:24:10: fatal error: third_party/tensorflow/core/util/gpu_kernel_helper.h: No such file or directory
 #include "third_party/tensorflow/core/util/gpu_kernel_helper.h"
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
Makefile:31: recipe for target 'build/bilateral_slice.cu.o' failed
make: *** [build/bilateral_slice.cu.o] Error 1

The tensorflow third party

Changing the include switches

As the error seems to relate to TensorFlow, I tested the command that should provide the location of the TensorFlow include files: python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())'. This results in:

2022-05-27 13:14:54.228153: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
/miniconda/envs/HDRNET/lib/python3.6/site-packages/tensorflow/include

Replacing the python -c... command with the literal TensorFlow include path results in the same error.

Copy the tensorflow core to the thirdparty folder

The next step was to copy the folder containing tensorflow/core/util/gpu_kernel_helper.h (from the TensorFlow project, commit ID #0976345ba57) into the third-party folder, keeping the full folder structure.

Running make now results in the following error:

nvcc -std c++11 -c  ops/bilateral_slice.cu.cc -o build/bilateral_slice.cu.o -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -I`python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())'` -expt-relaxed-constexpr -Wno-deprecated-gpu-targets -ftz=true
2022-05-28 06:23:47.224210: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
In file included from ops/bilateral_slice.cu.cc:24:0:
ops/third_party/tensorflow/core/util/gpu_kernel_helper.h:24:10: fatal error: third_party/gpus/cuda/include/cuda_fp16.h: No such file or directory
 #include "third_party/gpus/cuda/include/cuda_fp16.h"
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
Makefile:31: recipe for target 'build/bilateral_slice.cu.o' failed
make: *** [build/bilateral_slice.cu.o] Error 1
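
The three failing includes so far are all written relative to a third_party directory. A quoted #include is normally searched for next to the including file first and then in the -I directories, which would explain why gpu_kernel_helper.h was found under ops/ while its own include of cuda_fp16.h was not. The sketch below (Python, standard library only) lists which of those headers are still absent under a candidate include root. The function name missing_headers and the choice of "ops" as the root are assumptions for illustration, not part of the hdrnet build.

```python
from pathlib import Path

# Headers named in the error messages above, written exactly as they
# appear in the failing #include lines.
EXPECTED_HEADERS = [
    "third_party/array/array.h",
    "third_party/tensorflow/core/util/gpu_kernel_helper.h",
    "third_party/gpus/cuda/include/cuda_fp16.h",
]


def missing_headers(include_root, headers=EXPECTED_HEADERS):
    """Return the headers that do not exist as files under include_root."""
    root = Path(include_root)
    return [h for h in headers if not (root / h).is_file()]


if __name__ == "__main__":
    # "ops" is an assumption: the directory the third_party tree was placed in.
    for header in missing_headers("ops"):
        print("missing:", header)
```

Running it from the hdrnet/hdrnet directory before make gives a quick view of which third-party pieces still need to be put in place.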

The cuda third party

I copied cuda_fp16.h (/usr/local/cuda/include/cuda_fp16.h) to the third_party/gpus/cuda location. This by itself didn't work, so I added the ops folder to the include path by manually executing:

nvcc -std c++11 -c  ops/bilateral_slice.cu.cc -o build/bilateral_slice.cu.o -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -I`python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())'` -expt-relaxed-constexpr -Wno-deprecated-gpu-targets -ftz=true -Iops

This results in another error message (not reproduced here).

I tried to fix this by raising the language standard to C++14:

nvcc -std c++14 -c  ops/bilateral_slice.cu.cc -o build/bilateral_slice.cu.o -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -I`python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())'` -expt-relaxed-constexpr -Wno-deprecated-gpu-targets -ftz=true -Iops

This results in the following error message:

2022-05-28 16:35:50.746921: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
/miniconda/envs/HDRNET/lib/python3.6/site-packages/tensorflow/include/tensorflow/core/platform/file_system.h(556): warning: overloaded virtual function "tensorflow::FileSystem::FilesExist" is only partially overridden in class "tensorflow::WrappedFileSystem"

/miniconda/envs/HDRNET/lib/python3.6/site-packages/tensorflow/include/tensorflow/core/platform/file_system.h(556): warning: overloaded virtual function "tensorflow::FileSystem::CreateDir" is only partially overridden in class "tensorflow::WrappedFileSystem"

/miniconda/envs/HDRNET/lib/python3.6/site-packages/tensorflow/include/tensorflow/core/platform/env.h(482): warning: overloaded virtual function "tensorflow::Env::RegisterFileSystem" is only partially overridden in class "tensorflow::EnvWrapper"

ops/third_party/array/array.h(2065): warning: "nda::array_ref<T, Shape>::operator nda::const_array_ref<const float, nda::shape_of_rank<5UL>>() const [with T=const float, Shape=nda::shape_of_rank<5UL>]" will not be called for implicit or explicit conversions
          detected during instantiation of class "nda::array_ref<T, Shape> [with T=const float, Shape=nda::shape_of_rank<5UL>]" 
ops/bilateral_slice.cu.cc(37): here

ops/third_party/array/array.h(2065): warning: "nda::array_ref<T, Shape>::operator nda::const_array_ref<const float, nda::shape_of_rank<3UL>>() const [with T=const float, Shape=nda::shape_of_rank<3UL>]" will not be called for implicit or explicit conversions
          detected during instantiation of class "nda::array_ref<T, Shape> [with T=const float, Shape=nda::shape_of_rank<3UL>]" 
ops/bilateral_slice.cu.cc(37): here

ops/bilateral_slice.cu.cc(74): error: namespace "std" has no member "clamp"

ops/bilateral_slice.cu.cc(77): error: namespace "std" has no member "clamp"

ops/bilateral_slice.cu.cc(80): error: namespace "std" has no member "clamp"

ops/third_party/array/array.h(2065): warning: "nda::array_ref<T, Shape>::operator nda::const_array_ref<const float, nda::shape_of_rank<4UL>>() const [with T=const float, Shape=nda::shape_of_rank<4UL>]" will not be called for implicit or explicit conversions
          detected during instantiation of class "nda::array_ref<T, Shape> [with T=const float, Shape=nda::shape_of_rank<4UL>]" 
ops/bilateral_slice.cu.cc(96): here

ops/bilateral_slice.cu.cc(203): error: namespace "std" has no member "clamp"

ops/bilateral_slice.cu.cc(206): error: namespace "std" has no member "clamp"

ops/bilateral_slice.cu.cc(209): error: namespace "std" has no member "clamp"

6 errors detected in the compilation of "ops/bilateral_slice.cu.cc".
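
All six errors concern std::clamp. For readers unfamiliar with it: when low <= high, clamp(value, low, high) gives the same result as min(max(value, low), high). The Python sketch below only illustrates those semantics; the function name clamp and the sample values are assumptions for illustration and are not taken from bilateral_slice.cu.cc.

```python
def clamp(value, low, high):
    """Limit value to the closed interval [low, high] (assumes low <= high)."""
    return min(max(value, low), high)


print(clamp(-0.5, 0.0, 1.0))  # 0.0  (below the range, so the lower bound)
print(clamp(0.25, 0.0, 1.0))  # 0.25 (inside the range, unchanged)
print(clamp(7, 0, 5))         # 5    (above the range, so the upper bound)
```

A build that has to stay on an older language standard would need a helper with this behaviour in place of the standard-library call.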

After searching further, it seems that std::clamp is only available from C++17 onward, so I tried:

nvcc -std c++17 -c  ops/bilateral_slice.cu.cc -o build/bilateral_slice.cu.o -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -I`python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())'` -expt-relaxed-constexpr -Wno-deprecated-gpu-targets -ftz=true -Iops

This results in another error message (not reproduced here).
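
The nvcc invocations tried in this section differ only in the -std value and in the extra -I directories. A small helper can make the variants easier to reproduce and compare. This is a minimal sketch: the flags are copied from the commands above, while the function name nvcc_command and its parameters are hypothetical.

```python
def nvcc_command(tf_include, std="c++11", extra_includes=()):
    """Build the nvcc argument list used in the attempts above.

    tf_include     -- path printed by tf.sysconfig.get_include()
    std            -- value passed to -std (c++11, c++14 or c++17 above)
    extra_includes -- additional directories appended as -I<dir>
    """
    cmd = [
        "nvcc", "-std", std,
        "-c", "ops/bilateral_slice.cu.cc",
        "-o", "build/bilateral_slice.cu.o",
        "-DGOOGLE_CUDA=1", "-x", "cu",
        "-Xcompiler", "-fPIC",
        "-I" + tf_include,
        "-expt-relaxed-constexpr",
        "-Wno-deprecated-gpu-targets",
        "-ftz=true",
    ]
    cmd += ["-I" + d for d in extra_includes]
    return cmd


if __name__ == "__main__":
    # Include path as reported earlier in this post.
    tf_include = ("/miniconda/envs/HDRNET/lib/python3.6/"
                  "site-packages/tensorflow/include")
    for std in ("c++11", "c++14", "c++17"):
        print(" ".join(nvcc_command(tf_include, std, ["ops"])))
```

The returned list can be passed to subprocess.run unchanged, which avoids shell quoting problems with the include paths.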

The initial commit

Following the suggestion in #9, I tried the initial commit (#5ac95ef of 2017-08-21).

First compilation of the initial commit

  1. It appears that this commit requires tensorflow_gpu==1.1.0 and Python 2.7, so I updated the environment accordingly (pip list output not reproduced here).

  2. When I try to compile according to the readme:

    cd hdrnet
    make

I executed make in ~/GIT/hdrnet/hdrnet and received:

nvcc -std c++11 -c  ops/bilateral_slice.cu.cc -o build/bilateral_slice.cu.o -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -I`python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())'` -expt-relaxed-constexpr -Wno-deprecated-gpu-targets -ftz=true
In file included from /miniconda/envs/p27/lib/python2.7/site-packages/tensorflow/include/unsupported/Eigen/CXX11/Tensor:14:0,
                 from /miniconda/envs/p27/lib/python2.7/site-packages/tensorflow/include/third_party/eigen3/unsupported/Eigen/CXX11/Tensor:4,
                 from ops/bilateral_slice.cu.cc:19:
/miniconda/envs/p27/lib/python2.7/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/Core:42:14: fatal error: math_functions.hpp: No such file or directory
     #include <math_functions.hpp>
              ^~~~~~~~~~~~~~~~~~~~
compilation terminated.
Makefile:31: recipe for target 'build/bilateral_slice.cu.o' failed
make: *** [build/bilateral_slice.cu.o] Error 1

The initial commit - adding third party

Assuming that the third_party folder was missing, I cloned the Eigen project (https://gitlab.com/libeigen/eigen.git) into the folder hdrnet/third_party/eigen3.

For the Eigen project I tried commit #5c68ba41a (2017-02-21).

Executing make results with:

nvcc -std c++11 -c  ops/bilateral_slice.cu.cc -o build/bilateral_slice.cu.o -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -I`python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())'` -expt-relaxed-constexpr -Wno-deprecated-gpu-targets -ftz=true
In file included from /miniconda/envs/p27/lib/python2.7/site-packages/tensorflow/include/unsupported/Eigen/CXX11/Tensor:14:0,
                 from /miniconda/envs/p27/lib/python2.7/site-packages/tensorflow/include/third_party/eigen3/unsupported/Eigen/CXX11/Tensor:4,
                 from ops/bilateral_slice.cu.cc:19:
/miniconda/envs/p27/lib/python2.7/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/Core:42:14: fatal error: math_functions.hpp: No such file or directory
     #include <math_functions.hpp>
              ^~~~~~~~~~~~~~~~~~~~
compilation terminated.
Makefile:31: recipe for target 'build/bilateral_slice.cu.o' failed
make: *** [build/bilateral_slice.cu.o] Error 1
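
The error is unchanged because the failing include sits inside the Eigen copy bundled with the TensorFlow package, and the file it asks for, math_functions.hpp, is a CUDA header rather than an Eigen one. One way to see whether the installed toolkit ships that header, and where, is to search its include tree. A minimal sketch follows; the function name find_header is hypothetical, and /usr/local/cuda/include is the toolkit location used elsewhere in this post.

```python
import os


def find_header(root, name):
    """Return the sorted paths of all files called `name` under `root`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if name in filenames:
            hits.append(os.path.join(dirpath, name))
    return sorted(hits)


if __name__ == "__main__":
    # Adjust the root for CUDA installations in other locations.
    for path in find_header("/usr/local/cuda/include", "math_functions.hpp"):
        print(path)
```

Any directory it reports is a candidate for an extra -I switch on the nvcc command line.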

The initial commit - Debugging the compilation error

I tried to execute the compilation command manually:

nvcc -std c++11 -c  ops/bilateral_slice.cu.cc -o build/bilateral_slice.cu.o -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -I`python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())'` -expt-relaxed-constexpr -Wno-deprecated-gpu-targets -ftz=true

Since the command python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())' returned /miniconda/envs/p27/lib/python2.7/site-packages/tensorflow/include, I executed:

nvcc -std c++11 -c  ops/bilateral_slice.cu.cc -o build/bilateral_slice.cu.o -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -I/miniconda/envs/p27/lib/python2.7/site-packages/tensorflow/include -expt-relaxed-constexpr -Wno-deprecated-gpu-targets -ftz=true

As this returned the same error as before, I added the location of <math_functions.hpp> to the include path (-I/usr/local/cuda/include/crt):

nvcc -std c++11 -c  ops/bilateral_slice.cu.cc -o build/bilateral_slice.cu.o -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -I/miniconda/envs/p27/lib/python2.7/site-packages/tensorflow/include -expt-relaxed-constexpr -Wno-deprecated-gpu-targets -ftz=true -I/usr/local/cuda/include/crt

I received an error saying that CUDA is not supported (not reproduced here).

At this point I'm stuck.

My cuda version is:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
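
Because prebuilt TensorFlow packages target particular CUDA releases, it can help to record the toolkit release in a form that is easy to compare against what a given TensorFlow version expects. The sketch below extracts the release number from nvcc --version text like the output above. The function name cuda_release is hypothetical, and the regular expression assumes the "release X.Y" wording shown here.

```python
import re


def cuda_release(nvcc_output):
    """Return (major, minor) parsed from `nvcc --version` text, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# The output quoted above, used as sample input.
SAMPLE = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
"""

print(cuda_release(SAMPLE))  # (11, 2)
```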


fangli0906 commented on May 8, 2024

I got it working in Colab: https://github.com/ZoliN/colab/blob/main/hdrnetOrigSlice.ipynb It uses the original bilateral slice function from 2017 (only GPU).

I also built it with the latest slice function (GPU+CPU) with some hacking; however, the GPU kernel crashes, so at the moment it is usable only on the CPU device: https://github.com/ZoliN/colab/blob/main/hdrnetNewSlice.ipynb

These all use TF1.x. The slice OP can be built with TF2 too, however layers.py would have to be rewritten I think.

I've tried your Colab and it works great on the base version of Colab. However, when I switched to Colab Pro, it stopped working. It builds fine and I get the same build output as on the base version of Colab, but py.test throws assertion errors and the loss blows up when I train. I know this sounds like a question for Google Colab, but I'm wondering whether you have any idea why this is happening.

`_________________ BilateralSliceApplyTest.test_input_gradient __________________

self = <hdrnet.test.ops_test.BilateralSliceApplyTest testMethod=test_input_gradient>

def test_input_gradient(self):
  for dev in ['/gpu:0']:
    batch_size = 1
    h = 8
    w = 5
    gh = 6
    gw = 3
    d = 7
    i_chans = 3
    o_chans = 3
    grid_shape = [batch_size, gh, gw, d, (1+i_chans)*o_chans]
    guide_shape = [batch_size, h, w]
    input_shape = [batch_size, h, w, i_chans]
    output_shape = [batch_size, h, w, o_chans]

    grid_data = np.random.rand(*grid_shape).astype(np.float32)
    guide_data = np.random.rand(*guide_shape).astype(np.float32)
    input_data = np.random.rand(*input_shape).astype(np.float32)

    with tf.device(dev):
      grid_tensor = tf.convert_to_tensor(grid_data,
                                         name='data',
                                         dtype=tf.float32)
      guide_tensor = tf.convert_to_tensor(guide_data,
                                          name='guide',
                                          dtype=tf.float32)
      input_tensor = tf.convert_to_tensor(input_data,
                                          name='input',
                                          dtype=tf.float32)

      output_tensor = ops.bilateral_slice_apply(grid_tensor, guide_tensor, input_tensor, has_offset=True)

    with self.test_session():
      err = tf.test.compute_gradient_error(
          input_tensor,
          input_shape,
          output_tensor,
          output_shape)
    self.assertLess(err, 3e-4)

E AssertionError: 0.9942179322242737 not less than 0.0003

test/ops_test.py:506: AssertionError
----------------------------- Captured stderr call -----------------------------
2022-09-17 14:15:11.833032: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2022-09-17 14:15:11.833117: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-09-17 14:15:11.833127: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2022-09-17 14:15:11.833134: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2022-09-17 14:15:11.833216: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4884 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)
=========================== short test summary info ============================
FAILED test/ops_test.py::BilateralSliceTest::test_grid_optimize - AssertionEr...
FAILED test/ops_test.py::BilateralSliceTest::test_guide_gradient - AssertionE...
FAILED test/ops_test.py::BilateralSliceTest::test_guide_optimize - AssertionE...
FAILED test/ops_test.py::BilateralSliceTest::test_interpolate - AssertionErro...
FAILED test/ops_test.py::BilateralSliceTest::test_optimize_both - AssertionEr...
FAILED test/ops_test.py::BilateralSliceApplyTest::test_grid_gradient - Assert...
FAILED test/ops_test.py::BilateralSliceApplyTest::test_guide_gradient - Asser...
FAILED test/ops_test.py::BilateralSliceApplyTest::test_input_gradient - Asser...
=================== 8 failed, 4 passed, 2 skipped in 27.87s ====================
`
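
For context on the failing assertion: as far as I understand it, tf.test.compute_gradient_error reports the largest difference between the analytically computed and the numerically estimated gradient. An error close to 1 would therefore point to substantively wrong values from the GPU kernel rather than small numerical noise. The sketch below shows the same kind of check in plain Python on a toy function. The function names and the toy function are assumptions for illustration; this is not the hdrnet test.

```python
def numeric_gradient(f, x, eps=1e-6):
    """Central-difference estimate of df/dx_i for a scalar function f."""
    grad = []
    for i in range(len(x)):
        hi = list(x)
        hi[i] += eps
        lo = list(x)
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad


def max_gradient_error(analytic, numeric):
    """Largest absolute difference between two gradient vectors."""
    return max(abs(a - n) for a, n in zip(analytic, numeric))


def sum_of_squares(x):
    """Toy function f(x) = sum(x_i ** 2), whose gradient is 2 * x."""
    return sum(v * v for v in x)


x = [0.5, -1.0, 2.0]
numeric = numeric_gradient(sum_of_squares, x)
correct = [2 * v for v in x]  # the true analytic gradient
broken = [0.0 for _ in x]     # stand-in for a kernel returning wrong values

print(max_gradient_error(correct, numeric) < 3e-4)  # True
print(max_gradient_error(broken, numeric) < 3e-4)   # False
```

The 3e-4 threshold is the one used by the test quoted above.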
