rocmsoftwareplatform / pytorch
This project forked from pytorch/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home Page: http://pytorch.org
License: Other
The goal is to create an all-in-one docker image and documentation for Pytorch/Caffe2.
The requirement:
Target audience: noob user who does not know anything.
Coverage: Include build and install instructions for C2 as well as its dependencies.
Assumption: that the user will have ROCm 1.8 installed on their system on Vega10.
The goal is to
The latest ROCm PyTorch docker image (rocm/pytorch:rocm2.1) appears not to include Boost, resulting in a build failure when following the recommended build process at https://github.com/ROCmSoftwarePlatform/pytorch/wiki/Building-PyTorch-for-ROCm.
After manually installing Boost, PyTorch builds successfully.
Trying to execute these examples from the PyTorch tutorial (https://pytorch.org/tutorials/beginner/pytorch_with_examples.html#pytorch-tensors, https://pytorch.org/tutorials/beginner/pytorch_with_examples.html#pytorch-defining-new-autograd-functions) hangs after the first iteration. Others, like this one, work: https://pytorch.org/tutorials/beginner/pytorch_with_examples.html#pytorch-tensors-and-autograd
# -*- coding: utf-8 -*-
import torch
dtype = torch.float
device = torch.device("cuda:0")
# device = torch.device("cuda:0") # Uncomment this to run on GPU
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random input and output data
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)
# Randomly initialize weights
w1 = torch.randn(D_in, H, device=device, dtype=dtype)
w2 = torch.randn(H, D_out, device=device, dtype=dtype)
learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)
    # Compute and print loss
    loss = (y_pred - y).pow(2).sum().item()
    print(t, loss)
    # Backprop to compute gradients of w1 and w2 with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)
    # Update weights using gradient descent
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
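To tell a genuine hang from a slow iteration, one option is to run the script in a child process with a time limit. The sketch below is a generic helper and not part of PyTorch; the name run_with_timeout and the time limit are assumptions chosen for illustration, and it needs Python 3.5 or later for subprocess.run.

```python
import subprocess
import sys


def run_with_timeout(cmd, seconds):
    """Run cmd; return its exit code, or None if it exceeded the time limit."""
    try:
        completed = subprocess.run(cmd, timeout=seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising the timeout.
        return None
    return completed.returncode
```

For example, run_with_timeout([sys.executable, "two_layer_net.py"], 120) would return None if the script (saved under that placeholder file name) is still running after two minutes.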
- PyTorch built on top of docker image rocm/pytorch:rocm1.9.2
- PyTorch version: 1.0.0a0+ee1f7b8
- OS: Ubuntu 18.04.1 LTS
- GPU VegaFE, amdgpu, 1.9-307, 4.15.0-42-generic, x86_64: installed
- CMake version: version 3.6.3
- Python version: 2.7
The goal is to eliminate the overhead in the current implementation. Expect to see on par performance between different frameworks.
The goal is to create a tracker on the upstream project.
Whenever any CUDA (.cu) code changes upstream (a potential break to the pyHipify process), we will be notified.
Landing Zone
The goal is to enable rocprim ops on the project. Also, create the new rule in pyhipify.
The goal is to develop a strategy for properly provisioning AMD CI nodes for PyTorch/Caffe2.
Recently one of the CI workers was silently updated to Linux kernel 4.15, which crippled the ROCm stack.
One of the unit tests is failing because of this and is gating upstream PR merges. As we are adding more AMD nodes to the CI pool, we need to develop a protocol to provision those nodes.
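One piece of such a protocol could be a pre-flight check that refuses to run jobs on a node whose kernel is not on an approved list. The following is only a sketch: the function names are hypothetical, and the list of approved kernel prefixes is a placeholder that would have to come from whatever the ROCm release in use actually supports.

```python
import platform


def kernel_matches(release, allowed_prefixes):
    """Return True if the kernel release string starts with an allowed prefix."""
    return any(release.startswith(prefix) for prefix in allowed_prefixes)


def check_node(allowed_prefixes):
    """Return the running kernel release and whether it is on the allow-list."""
    release = platform.release()
    return release, kernel_matches(release, allowed_prefixes)
```

A CI job could call check_node(["4.13."]) (the prefix here is illustrative only) at start-up and mark the node offline when the second element of the result is False, so that a silent kernel update surfaces as a provisioning failure instead of a failing unit test.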
Use option 3 from here https://github.com/ROCmSoftwarePlatform/pytorch/wiki/Building-PyTorch-for-ROCm
In step 5, there's a choice (not sure why) to use one of two repos. Use the first one.
git clone https://github.com/pytorch/pytorch.git
or
git clone https://github.com/ROCmSoftwarePlatform/pytorch.git
Once built, run the unit-tests with the command below.
PYTORCH_TEST_WITH_ROCM=1 python test/run_test.py --verbose
======================================================================
FAIL: test_print (test_torch.TestTorch)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/data/pytorch/test/test_torch.py", line 8906, in test_print
self.assertExpectedInline(str(x), '''tensor([1.0000e+28, 1.0000e-28])''')
File "/data/pytorch/test/expecttest.py", line 195, in assertExpectedInline
self.assertMultiLineEqual(expect, actual, msg=help_text)
AssertionError: 'tensor([1.0000e+28, 1.0000e-28])' != 'tensor([1.00000e+28, 1.00000e-28])'
- tensor([1.0000e+28, 1.0000e-28])
+ tensor([1.00000e+28, 1.00000e-28])
? + +
: To accept the new output, re-run test with envvar EXPECTTEST_ACCEPT=1 (we recommend staging/committing your changes before doing this)
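The "-", "+" and "?" lines in the failure above are a character-level diff of the expected and actual strings. As a small illustration, using only Python's standard difflib module and the two strings copied from the report, the same kind of output can be reproduced outside the test suite:

```python
import difflib

expected = "tensor([1.0000e+28, 1.0000e-28])"
actual = "tensor([1.00000e+28, 1.00000e-28])"

# ndiff marks the old line with "- ", the new line with "+ ",
# and adds a "? " hint line pointing at the inserted characters.
diff_lines = list(difflib.ndiff([expected], [actual]))
for line in diff_lines:
    print(line.rstrip("\n"))
```

The hint line shows that the actual output carries one extra digit in each mantissa, which suggests the two builds differ in print precision and not in the tensor values themselves.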
Create documentation for the open source community on how to start working with the ROCm PyTorch Docker Image. Provide a clear summary on what's supported at the moment and what's on the agenda.
The goal is to evaluate the high-level approach to enable Pytorch/caffe2 distributed training
I'm trying to build pytorch for ROCm, and it fails with this log:
[ 67%] Building CXX object modules/module_test/CMakeFiles/caffe2_module_test_dynamic.dir/module_test_dynamic.cc.o
[ 67%] Building CXX object caffe2/CMakeFiles/caffe2_pybind11_state.dir/python/pybind_state.cc.o
[ 67%] Building CXX object caffe2/CMakeFiles/caffe2_pybind11_state.dir/python/pybind_state_dlpack.cc.o
[ 67%] Building HIPCC object caffe2/CMakeFiles/caffe2_hip.dir/__/aten/src/THC/caffe2_hip_generated_THCStorage.cu.o
/home/john/git/pytorch-rocmfork/aten/src/THC/THCStorage.cu:4:10: fatal error: 'thrust/device_ptr.h' file not found
#include <thrust/device_ptr.h>
^~~~~~~~~~~~~~~~~~~~~
1 error generated.
/home/john/git/pytorch-rocmfork/aten/src/THC/THCStorage.cu:4:10: fatal error: 'thrust/device_ptr.h' file not found
#include <thrust/device_ptr.h>
^~~~~~~~~~~~~~~~~~~~~
1 error generated.
CMake Error at /opt/rocm/hip/cmake/FindHIP/run_make2cmake.cmake:18 (file):
file failed to open for reading (No such file or directory):/home/john/git/pytorch-rocmfork/build/caffe2/CMakeFiles/caffe2_hip.dir/__/aten/src/THC/caffe2_hip_generated_THCStorage.cu.o.depend.pre
CMake Error at caffe2_hip_generated_THCStorage.cu.o.cmake:134 (message):
Error generating
/home/john/git/pytorch-rocmfork/build/caffe2/CMakeFiles/caffe2_hip.dir/__/aten/src/THC/./caffe2_hip_generated_THCStorage.cu.o
make[2]: *** [caffe2/CMakeFiles/caffe2_hip.dir/build.make:22908: caffe2/CMakeFiles/caffe2_hip.dir/__/aten/src/THC/caffe2_hip_generated_THCStorage.cu.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:3395: caffe2/CMakeFiles/caffe2_hip.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
Steps to reproduce the behavior:
python3 tools/amd_build/build_pytorch_amd.py
USE_ROCM=1 python3 setup.py build
I expect pytorch to compile.
Ubuntu 18.10.
-- Using python found in /usr/bin/python3
HIP VERSION: 1.5.18353
***** Library versions from dpkg *****
hsakmt-roct VERSION: 1.0.9-8-g238782c
hsakmt-roct-dev VERSION: 1.0.9-8-g238782c
hsa-ext-rocr-dev VERSION: 1.1.9-9-ge4ab040
hsa-rocr-dev VERSION: 1.1.9-9-ge4ab040
hcc VERSION: 1.2.18354
hip_base VERSION: 1.5.18353
hip_hcc VERSION: 1.5.18353
***** Library versions from cmake find_package *****
rocrand VERSION: 1.8.1
hiprand VERSION: 1.8.1
rocblas VERSION: 0.14.2.4
miopen VERSION: 1.5.0-e1f0433
miopengemm VERSION: 1.1.5-9547fb9
rocfft VERSION: 0.8.6.0
hipsparse VERSION: 0.1.3.2
rocsparse VERSION: 0.1.3.2
ROCm is enabled.
NOTE: I do NOT have the Ubuntu package libthrust-dev installed. If I do, the build fails with other errors saying CUDA isn't installed.
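When a header such as thrust/device_ptr.h is reported missing, it can help to see which include directories, if any, actually provide it. The helper below is a generic sketch; find_header is a hypothetical name, and the directories passed to it (for instance /opt/rocm/include or /usr/include) are assumptions that depend on the local installation.

```python
import os


def find_header(relative_path, include_dirs):
    """Return the include directories that contain relative_path."""
    return [
        directory
        for directory in include_dirs
        if os.path.isfile(os.path.join(directory, relative_path))
    ]
```

An empty result for every directory on the compiler's include path would be consistent with the "file not found" error above, while a hit only under a CUDA-oriented system package would be consistent with the CUDA-related errors mentioned in the note.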
Have at least some rocm demos working on my HP EliteBook 745 G5, Ryzen 5 PRO 2500U. Thought I'd see if I can use pytorch with rocm yet for my ML projects.
I'm trying to build PyTorch to run on ROCm (Ubuntu 18.04) and am having issues. I tried the following.
I followed https://github.com/ROCmSoftwarePlatform/pytorch/wiki/Building-PyTorch-for-ROCm but it seems to have failed at pyyaml (https://gist.github.com/briansp2020/114bd75ff0182197cf7efc7af265e89c)
I got over the error by installing wheel. However, the build still failed later (https://gist.github.com/briansp2020/2719353d626968082410011dc36608cf)
I tried building it in the TensorFlow docker image and got https://gist.github.com/briansp2020/2a109c0f1d40b45299cb73a76a255767
It seems the wiki is old and I needed to get the latest rocSPARSE (https://github.com/ROCmSoftwarePlatform/rocSPARSE/releases) to get past the CMake phase. Unfortunately, the build still failed (https://gist.github.com/briansp2020/52047cf73d8d59ddd72f730d779b952c)...
Do you have up-to-date instructions on how to build PyTorch with ROCm? My goal is to run fast.ai on Vega FE with ROCm.
Thanks!
python ../../caffe2/python/convnet_benchmarks.py --batch_size 1 --model OverFeat --net_type simple --layer_wise_benchmark True 2>&1 | tee caffe2_overfeat_bs1.txt
I0801 15:38:58.072324 30959 net_simple.cc:101] Starting benchmark.
I0801 15:38:58.072343 30959 net_simple.cc:102] Running warmup runs.
Memory access fault by GPU node-1 on address 0x524400000. Reason: Page not present or supervisor privilege.
OverFeat: running forward-backward.
*** Aborted at 1533163139 (unix time) try "date -d @1533163139" if you are using GNU date ***
PC: @ 0x7f307508d428 gsignal
*** SIGABRT (@0x3e8000078ef) received by PID 30959 (TID 0x7f3023c7c700) from PID 30959; stack trace: ***
@ 0x7f3075433390 (unknown)
@ 0x7f307508d428 gsignal
@ 0x7f307508f02a abort
@ 0x7f30423e5155 (unknown)
@ 0x7f30423ebafd (unknown)
@ 0x7f30423b6817 (unknown)
@ 0x7f30754296ba start_thread
@ 0x7f307515f41d clone
@ 0x0 (unknown)
Hi. Thanks for this work guys.
I was curious as to whether you had been able to benchmark the framework on AMD GPUs. I've successfully built PyTorch with ROCm support following your instructions, and the benchmarks I got don't seem right. I'm testing with a Radeon 580, which should have roughly half the performance of a 1080 Ti, and I'm seeing more like a 9-10x drop in performance on convolution. The TensorFlow benchmarks already show that the gap shouldn't be that wide.
Is this expected for the moment?
In order to improve build times, I've extended sccache to support HCC: https://github.com/Jorghi12/sccache. Once a full enumeration of the HCC command-line arguments is obtained, I'll send a more robust solution upstream into Mozilla's sccache branch.
Hi, I'm trying to get my AMD system set up to run some torch software. I'd prefer not to have to mess with Docker; is there a reason to use it?
Is there a way to build this without Docker?
Python3
root@0e76836e0bcf:/data/pytorch/examples/rl_a3c_pytorch# python3
Python 3.5.2 (default, Nov 12 2018, 13:43:14)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch._C._cuda_getDevice()
THCudaCheck FAIL file=/pytorch/torch/csrc/cuda/Module.cpp line=53 error=35 : CUDA driver version is insufficient for CUDA runtime version
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: cuda runtime error (35) : CUDA driver version is insufficient for CUDA runtime version at /pytorch/torch/csrc/cuda/Module.cpp:53
>>> quit()
Python2
root@0e76836e0bcf:/data/pytorch/examples/rl_a3c_pytorch# python
Python 2.7.12 (default, Nov 12 2018, 14:36:49)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch._C._cuda_getDevice()
0L
>>> quit()
Python3 was built as follows:
First installed the following:
apt-get install python3-dev
apt-get install -y python3-pip
alias python=python3
then built with no changes
.jenkins/pytorch/build.sh
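The Python 3 traceback refers to /pytorch/torch/csrc/cuda/Module.cpp, not to a path under /data/pytorch, so one possibility (an assumption, not confirmed in the report) is that python3 is importing a different torch installation than the one just built. A quick way to check, sketched here with the standard importlib module and a hypothetical helper name, is to ask the interpreter where it would load the module from:

```python
import importlib.util


def module_origin(name):
    """Return the file a module would be loaded from, without importing it."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin
```

Calling module_origin("torch") under python3 shows which installation wins on that interpreter's path. Under Python 2, where importlib.util.find_spec is not available, printing torch.__file__ after the import gives the same information.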
Currently, we do not support multi-GPU on ROCm, nor do we assume it works right now. This issue is tracking insights and progress as we are trying to enable this.
The goal is to review the cuDNN implementation of ops and implement a MIOpen version where applicable.
The PyTorch-based Faster R-CNN model uses a few special CUDA kernels such as NMS, ROI_Pooling, ROI_Align and ROI_Crop.
The integration steps under CUDA are available here
For ROCm integration, I'm guessing the first step is hipification.
/opt/rocm/hip/bin/hipify-perl nms_cuda_kernel.cu > nms_hip_kernel.cpp
What's next? PyTorch 1.0 related packaging?
Kindly provide instructions for the rest of the steps for PyTorch 1.0
The instructions that would perhaps replace the code snippets below.
from torch.utils.cpp_extension import CUDAExtension
I've tried a few things but the above import seems hard-wired to CUDA.
if torch.cuda.is_available() and CUDA_HOME is not None:
    extension = CUDAExtension
    sources += source_cuda
    define_macros += [("WITH_CUDA", None)]
    extra_compile_args["nvcc"] = [
        "-DCUDA_HAS_FP16=1",
        "-D__CUDA_NO_HALF_OPERATORS__",
        "-D__CUDA_NO_HALF_CONVERSIONS__",
        "-D__CUDA_NO_HALF2_OPERATORS__",
    ]
ext_modules = [
    extension(
        "model._C",
        sources,
        include_dirs=include_dirs,
        define_macros=define_macros,
        extra_compile_args=extra_compile_args,
    )
]
PyTorch ROCm for a while had the ATen ROCm tests disabled https://github.com/ROCmSoftwarePlatform/pytorch/blob/master/caffe2/CMakeLists.txt#L341
Basically, re-enable them and handle any HCC related issues that come up in the process of doing so.
Currently, the PyTorch profiler is disabled when building with ROCm. In the future, we'd like to start using the profiler here https://github.com/GPUOpen-Tools/RCP.
Compiling PyTorch in the rocm/pytorch:rocm2.1 docker, I'm getting a ton of "warning: loop not unrolled" messages printing out. I don't see them in any of your CI output or other snippets posted here, so I wondered if this might be the reason for my problems. I have three tests failing, two with errors similar to another open issue, and neural network training isn't working for me.
In the PyTorch beginning tutorial, there are no errors, but the network is clearly not being trained:
[1, 2000] loss: 2.304
[1, 4000] loss: 2.303
[1, 6000] loss: 2.303
[1, 8000] loss: 2.303
[1, 10000] loss: 2.303
[1, 12000] loss: 2.304
[2, 2000] loss: 2.303
[2, 4000] loss: 2.303
[2, 6000] loss: 2.303
[2, 8000] loss: 2.304
[2, 10000] loss: 2.304
[2, 12000] loss: 2.303
Finished Training
Just to be clear, the loss function should converge towards 1.0, and does when run via CPU.
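A loss that stays near 2.303 is what a classifier produces when it assigns equal probability to each of 10 classes, since -ln(1/10) is about 2.3026. Assuming the tutorial in question trains a 10-class classifier with cross-entropy loss (the report does not name it, so this is an assumption), the logged values are consistent with the network never moving away from a uniform guess:

```python
import math


def uniform_cross_entropy(num_classes):
    """Cross-entropy (in nats) of a uniform prediction over num_classes classes."""
    return -math.log(1.0 / num_classes)


print(round(uniform_cross_entropy(10), 3))  # prints 2.303
```

That would point towards gradients or weight updates being wrong or zero on the GPU path, not towards a slow but working optimizer.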
My PyTorch is at least partly working - I've been using it to run https://github.com/xinntao/ESRGAN, and the results are clearly superior to running via CPU. I have no idea if I'm doing something wrong with the compile or there's a bug somewhere, but it seems to be training rather than executing that is broken.
rocm/pytorch:rocm2.1 docker after apt full-update. Host: Ubuntu 18.10, Ryzen 5 1600x, 16GB RAM. I've tried both lowering MAX_JOBS and creating a large swap file to avoid memory issues, but none of that affects the errors.
Here's everything from your environment script that got a value:
PyTorch version: 1.1.0a0+c751cf8
Is debug build: No
OS: Ubuntu 16.04.5 LTS
CMake version: version 3.6.3
Python version: 2.7
Is CUDA available: Yes
Versions of relevant libraries:
[pip] numpy==1.15.4
[pip] torch==1.1.0a0+c751cf8
[pip] torchvision==0.2.1
R9 Fury, target gfx803. I wonder if using an older, non-default target may be part of my problem. I understand older GPUs naturally receive less focus, though I hope you'll be able to look at it if there is a gfx803 issue.
Example warning:
In file included from /data/development/rocm-pytorch/aten/src/THH/THHTensorSort.cuh:8:
/data/development/rocm-pytorch/aten/src/THH/THHSortUtils.cuh:141:1:
warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
Test Output:
======================================================================
FAIL: test_broadcast_batched_matmul (test_cuda.TestCuda)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/data/development/rocm-pytorch/test/common_utils.py", line 296, in wrapper
method(*args, **kwargs)
File "/data/development/rocm-pytorch/test/test_cuda.py", line 2218, in test_broadcast_batched_matmul
_TestTorchMixin._test_broadcast_batched_matmul(self, lambda t: t.cuda())
File "/data/development/rocm-pytorch/test/test_torch.py", line 3760, in _test_broadcast_batched_matmul
verify_batched_matmul(*indices)
File "/data/development/rocm-pytorch/test/test_torch.py", line 3752, in verify_batched_matmul
self.assertEqual(truth, maybe_squeeze_result(l, r, out))
File "/data/development/rocm-pytorch/test/common_utils.py", line 427, in assertEqual
assertTensorsEqual(x, y)
File "/data/development/rocm-pytorch/test/common_utils.py", line 408, in assertTensorsEqual
self.assertTrue(torch.equal(nan_mask, torch.isnan(b)), message)
AssertionError: False is not true :
======================================================================
FAIL: test_broadcast_fused_matmul (test_cuda.TestCuda)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/data/development/rocm-pytorch/test/common_utils.py", line 296, in wrapper
method(*args, **kwargs)
File "/data/development/rocm-pytorch/test/test_cuda.py", line 2215, in test_broadcast_fused_matmul
_TestTorchMixin._test_broadcast_fused_matmul(self, lambda t: t.cuda())
File "/data/development/rocm-pytorch/test/test_torch.py", line 3689, in _test_broadcast_fused_matmul
self.assertEqual(r0, r1)
File "/data/development/rocm-pytorch/test/common_utils.py", line 427, in assertEqual
assertTensorsEqual(x, y)
File "/data/development/rocm-pytorch/test/common_utils.py", line 419, in assertTensorsEqual
self.assertLessEqual(max_err, prec, message)
AssertionError: tensor(9., device='cuda:0', dtype=torch.float32) not less than or equal to 1e-05 :
======================================================================
FAIL: test_randperm_cuda (test_cuda.TestCuda)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/data/development/rocm-pytorch/test/common_utils.py", line 296, in wrapper
method(*args, **kwargs)
File "/data/development/rocm-pytorch/test/test_cuda.py", line 2513, in test_randperm_cuda
self.assertEqual(res1, res2, 0)
File "/data/development/rocm-pytorch/test/common_utils.py", line 427, in assertEqual
assertTensorsEqual(x, y)
File "/data/development/rocm-pytorch/test/common_utils.py", line 419, in assertTensorsEqual
self.assertLessEqual(max_err, prec, message)
AssertionError: tensor(9223372036854775492, device='cuda:0') not less than or equal to 0 :
----------------------------------------------------------------------
Ran 150 tests in 7.430s
FAILED (failures=3, skipped=92)
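When comparing a long run like this against CI output or another open issue, it can be handy to reduce the log to just the names of the failing tests. The parser below is a minimal sketch that relies only on the "FAIL: name (module.Class)" header format visible above; the function name and regular expression are illustrative and not part of the PyTorch test tooling.

```python
import re

# Matches unittest result headers such as
# "FAIL: test_randperm_cuda (test_cuda.TestCuda)".
FAIL_LINE = re.compile(r"^(FAIL|ERROR): (\S+) \(([\w.]+)\)$")


def failing_tests(log_text):
    """Return (status, test_name, test_class) tuples found in unittest output."""
    results = []
    for line in log_text.splitlines():
        match = FAIL_LINE.match(line.strip())
        if match:
            results.append(match.groups())
    return results
```

Applied to the output above, this would list test_broadcast_batched_matmul, test_broadcast_fused_matmul and test_randperm_cuda, all in test_cuda.TestCuda.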
The goal is to integrate MIOpen RNN APIs into caffe2.
Identify and fix outstanding issues with fp16.
After enabling Detectron files hipification and building in PR #295, there are warnings while building the project from the following files:
sigmoid_focal_loss_op_hip.cc
ps_roi_pool_op_hip.cc
smooth_l1_loss_op_hip.cc
The warnings are because HIP does not overload the max and abs functions.
Please add appropriate checks, #if defined (__HIP_PLATFORM_HCC__), and use more specific HIP functions like fmaxf and fabsf in the corresponding CUDA files.
Currently, when using PyTorch with ROCm, you'll notice the following error:
import torch
torch.Tensor(1).cuda()
RuntimeError: torch.cuda.sparse.FloatTensor is not enabled.
However, the error disappears by executing torch.cuda._lazy_init() very early.
import torch
torch.cuda._lazy_init()
torch.Tensor(1).cuda()
tensor([ 0], device='cuda:0')
PyTorch ROCM package as a pip package like tensorflow-rocm would be great.
While it is already a pain for newer users to get things up and running, pytorch installation for rocm platform is just a lot for newer users. Since there is a tensorflow-rocm package for new users to easily download and install, I think PyTorch should have it too for the users who prefer pytorch over tensorflow.
A conda package would be great too
ATen appears to be missing files for caffe2_hip, which prevents the installation of torch
In file included from /home/user/dev/pytorch/aten/src/THC/THCTensorIndex.cu:12:
/home/user/dev/pytorch/aten/src/THC/THCAtomics.cuh:145:35: error: static declaration of 'atomicAdd' follows non-static declaration
static inline device void atomicAdd(double address, double val) { }
^
/opt/rocm/hip/include/hip/hcc_detail/hip_atomic.h:73:8: note: previous definition is here
double atomicAdd(double address, double val)
^
1 error generated.
[100%] Linking HIP shared library ../lib/libcaffe2_hip.so
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THC/caffe2_hip_generated_THCTensorIndex.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THC/caffe2_hip_generated_THCTensorScatterGather.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_FeatureLPPooling.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_IndexLinear.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_SpatialAdaptiveAveragePooling.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_SpatialAdaptiveMaxPooling.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_SpatialClassNLLCriterion.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_SpatialFractionalMaxPooling.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_SpatialGridSamplerBilinear.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_SpatialReflectionPadding.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_SpatialReplicationPadding.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_SpatialSubSampling.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_SpatialUpSamplingBilinear.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_SpatialUpSamplingNearest.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_TemporalMaxPooling.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_TemporalReflectionPadding.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_TemporalReplicationPadding.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_TemporalUpSamplingLinear.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_TemporalUpSamplingNearest.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_VolumetricAdaptiveAveragePooling.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_VolumetricAdaptiveMaxPooling.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_VolumetricAveragePooling.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_VolumetricDilatedMaxPooling.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_VolumetricFractionalMaxPooling.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_VolumetricGridSamplerBilinear.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_VolumetricReplicationPadding.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_VolumetricUpSamplingNearest.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_VolumetricUpSamplingTrilinear.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/ATen/native/cuda/caffe2_hip_generated_Activation.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/ATen/native/cuda/caffe2_hip_generated_Distributions.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/ATen/native/cuda/caffe2_hip_generated_EmbeddingBag.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/ATen/native/cuda/caffe2_hip_generated_Gesv.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/ATen/native/cuda/caffe2_hip_generated_SummaryOps.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/ATen/native/cuda/caffe2_hip_generated_TensorCompare.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/ATen/native/sparse/cuda/caffe2_hip_generated_SparseCUDATensor.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/ATen/native/sparse/cuda/caffe2_hip_generated_SparseCUDATensorMath.cu.o'
[100%] Built target caffe2_hip
Install the project...
CMake Error at caffe2/cmake_install.cmake:69 (file):
file INSTALL cannot find
"/home/user/dev/pytorch/build/lib/libcaffe2_hip.so".
Call Stack (most recent call first):
cmake_install.cmake:86 (include)
Ubuntu 16.04
Kernel 4.13.0-45-generic
build inside docker does not work.
Steps to reproduce the behavior:
Build from sources section of ./rocm-docs/caffe2-build.md
error:
caffe2/CMakeFiles/caffe2.dir/build.make:4182: recipe for target 'caffe2/CMakeFiles/caffe2.dir/contrib/aten/aten_op.cc.o' failed
make[2]: *** [caffe2/CMakeFiles/caffe2.dir/contrib/aten/aten_op.cc.o] Error 254
We are observing consistent performance drops for Resnet50 and Resnet101 with PyTorch on both Vega20 and MI25. MIOpen commit details below.
MIOpen Commit Details
commit 74782da0cf9b1dff8ea6dcfe14e450a3531359d1
Author: Daniel Lowell [email protected]
Date: Mon Dec 17 16:53:33 2018 -0600
Removed redundant else condition.
Steps to reproduce the behavior:
GPUs observed: MI25, Vega20
ROCm Version: 1.9.307, 1.9.211
... the following works
>>> import caffe2.python._import_c_extension as C
>>> C.has_hip_support
True
>>> from caffe2.python import core, workspace, brew
>>> workspace.NumGpuDevices()
4
MIOpen RNN ParamAccess Op segfaults.