rocmsoftwareplatform / pytorch

This project forked from pytorch/pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

License: Other

Shell 0.38% Python 55.87% CMake 0.71% Makefile 0.01% C++ 36.22% C 1.51% Cuda 3.04% Objective-C 0.03% Objective-C++ 1.26% CSS 0.01% HTML 0.01% Batchfile 0.02% Dockerfile 0.05% PowerShell 0.01% Java 0.12% Assembly 0.30% Ruby 0.01% Starlark 0.29% GLSL 0.18% GDB 0.01%

pytorch rocm

pytorch's Issues

[Caffe2] Docker image and Documentation

The goal is to create an all-in-one docker image and documentation for Pytorch/Caffe2.

The requirement:

Target audience: new users with no prior experience.
Coverage: Include build and install instructions for C2 as well as its dependencies.
Assumption: the user has ROCm 1.8 installed on a Vega10 system.

[Caffe2] Update pre-hipified files (ops)

Update pre-hipified files (ops)

[Caffe2] Enable core test on MIOPEN/HIP upstream

Enable core tests on MIOPEN/HIP upstream

[Caffe2] HIP operated tests update

The goal is to

update Caffe2 core tests
update Caffe2 operator tests
fix failing tests
add the passing tests to pytorch/Caffe2 CI

Building PyTorch for ROCm instructions

The latest ROCm PyTorch docker image (rocm/pytorch:rocm2.1) appears not to include Boost, resulting in a build failure when following the recommended build process at https://github.com/ROCmSoftwarePlatform/pytorch/wiki/Building-PyTorch-for-ROCm.

After manually installing Boost, PyTorch builds successfully.

[Pytorch] Tensor tutorial examples hang

🐛 Bug

Executing these examples from the PyTorch tutorial (https://pytorch.org/tutorials/beginner/pytorch_with_examples.html#pytorch-tensors, https://pytorch.org/tutorials/beginner/pytorch_with_examples.html#pytorch-defining-new-autograd-functions) hangs after the first iteration. Other examples, such as https://pytorch.org/tutorials/beginner/pytorch_with_examples.html#pytorch-tensors-and-autograd, work.

# -*- coding: utf-8 -*-

import torch


dtype = torch.float
device = torch.device("cuda:0")  # run on the GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Randomly initialize weights
w1 = torch.randn(D_in, H, device=device, dtype=dtype)
w2 = torch.randn(H, D_out, device=device, dtype=dtype)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum().item()
    print(t, loss)

    # Backprop to compute gradients of w1 and w2 with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Update weights using gradient descent
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

Environment

- PyTorch built on top of docker image rocm/pytorch:rocm1.9.2
- PyTorch version: 1.0.0a0+ee1f7b8
- OS: Ubuntu 18.04.1 LTS
- GPU VegaFE, amdgpu, 1.9-307, 4.15.0-42-generic, x86_64: installed
- CMake version: version 3.6.3
- Python version: 2.7

[Caffe2] Investigate the performance gap between Caffe2 and Tensorflow port

The goal is to eliminate the overhead in the current implementation. We expect on-par performance across the frameworks.

[Caffe2] Revisit skip tests

[Caffe2] upstream tracker

The goal is to create a tracker on the upstream project.
Whenever any .cu file changes (a potential break to the pyHipify process), we will be notified.

Landing Zone

must-have: daily track
nice-to-have: real-time tracking
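
The tracking rule described above could be sketched as a filter over the files touched by an upstream change. This is only an illustration: the function name `cuda_files_changed` and the suffix list are assumptions, and the list of changed paths is presumed to come from something like `git diff --name-only` between two upstream commits.

```python
from pathlib import PurePosixPath

# Assumed set of suffixes whose changes may break the hipify step.
CUDA_SUFFIXES = {".cu", ".cuh"}


def cuda_files_changed(changed_paths):
    """Return the changed paths that look like CUDA sources or headers."""
    return [p for p in changed_paths if PurePosixPath(p).suffix in CUDA_SUFFIXES]


if __name__ == "__main__":
    # Hypothetical output of `git diff --name-only <old> <new>`.
    sample = [
        "aten/src/THC/THCStorage.cu",
        "aten/src/THC/THCAtomics.cuh",
        "torch/nn/functional.py",
    ]
    for path in cuda_files_changed(sample):
        print("needs hipify review:", path)
```

Running such a filter once a day would cover the must-have item; hooking it to a commit webhook would be one way to approach the real-time variant.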

[Caffe2] rocprim ops

The goal is to enable rocprim ops on the project. Also, create the new rule in pyhipify.

Better Provisioning for AMD CI node

The goal is to develop a strategy for properly provisioning AMD CI nodes for Pytorch/Caffe2.

Recently one of the CI workers was silently updated to Linux kernel 4.15, which crippled the ROCm stack.
One of the unit tests is failing because of this and is gating upstream PR merges. As we are adding more AMD nodes to the CI pool, we need to develop a protocol for provisioning those nodes.
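
One small piece of such a protocol could be a pre-flight check that refuses to run jobs on an unexpected kernel. The sketch below is a hedged illustration: the function name `kernel_is_approved` and the allow-list are hypothetical, since the issue only states that kernel 4.15 broke the stack.

```python
import platform


def kernel_is_approved(release, approved_prefixes):
    """True if the kernel release string starts with an approved version prefix."""
    return any(release.startswith(prefix) for prefix in approved_prefixes)


if __name__ == "__main__":
    # Hypothetical allow-list; the source only says that 4.15 broke the stack.
    approved = ("4.13.",)
    release = platform.release()
    status = "ok" if kernel_is_approved(release, approved) else "NOT approved"
    print("kernel {}: {}".format(release, status))
```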

Minor format error terminates the unit-test run.

🐛 Bug

To Reproduce

Use option 3 from here https://github.com/ROCmSoftwarePlatform/pytorch/wiki/Building-PyTorch-for-ROCm
In step 5, there's a choice (not sure why) to use one of two repos. Use the first one.
git clone https://github.com/pytorch/pytorch.git or
git clone https://github.com/ROCmSoftwarePlatform/pytorch.git
Once built, run the unit-tests with the command below.

PYTORCH_TEST_WITH_ROCM=1 python test/run_test.py --verbose

======================================================================
FAIL: test_print (test_torch.TestTorch)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/pytorch/test/test_torch.py", line 8906, in test_print
    self.assertExpectedInline(str(x), '''tensor([1.0000e+28, 1.0000e-28])''')
  File "/data/pytorch/test/expecttest.py", line 195, in assertExpectedInline
    self.assertMultiLineEqual(expect, actual, msg=help_text)
AssertionError: 'tensor([1.0000e+28, 1.0000e-28])' != 'tensor([1.00000e+28, 1.00000e-28])'
- tensor([1.0000e+28, 1.0000e-28])
+ tensor([1.00000e+28, 1.00000e-28])
?               +        +
 : To accept the new output, re-run test with envvar EXPECTTEST_ACCEPT=1 (we recommend staging/committing your changes before doing this)
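
The assertion fails only because the printed mantissa has five fractional digits where the test expects four. The standalone sketch below reproduces the two strings with plain Python formatting. It does not use PyTorch's printing code, and the precisions 4 and 5 are simply read off the log above.

```python
import difflib

values = [1e28, 1e-28]

# Four digits after the decimal point, as the test expects.
expected = "tensor([" + ", ".join("{:.4e}".format(v) for v in values) + "])"
# Five digits after the decimal point, as the failing run printed.
actual = "tensor([" + ", ".join("{:.5e}".format(v) for v in values) + "])"

print(expected)
print(actual)
# ndiff marks the inserted characters, similar to the '?' line in the log.
print("\n".join(difflib.ndiff([expected], [actual])))
```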

[Caffe2] Enable ROCM support for pytorch examples/usages

https://github.com/pytorch/examples

[Caffe2] LSTM benchmark

run LSTM benchmark on ROCM

[PyTorch] Docker Image and Documentation

Create documentation for the open source community on how to start working with the ROCm PyTorch Docker Image. Provide a clear summary on what's supported at the moment and what's on the agenda.

Distributed Training

The goal is to evaluate the high-level approach to enable Pytorch/caffe2 distributed training

Build error: THCStorage.cu:4:10: fatal error: 'thrust/device_ptr.h' file not found

🐛 Bug

I'm trying to build pytorch for ROCm, and it fails with this log:

[ 67%] Building CXX object modules/module_test/CMakeFiles/caffe2_module_test_dynamic.dir/module_test_dynamic.cc.o
[ 67%] Building CXX object caffe2/CMakeFiles/caffe2_pybind11_state.dir/python/pybind_state.cc.o
[ 67%] Building CXX object caffe2/CMakeFiles/caffe2_pybind11_state.dir/python/pybind_state_dlpack.cc.o
[ 67%] Building HIPCC object caffe2/CMakeFiles/caffe2_hip.dir/__/aten/src/THC/caffe2_hip_generated_THCStorage.cu.o
/home/john/git/pytorch-rocmfork/aten/src/THC/THCStorage.cu:4:10: fatal error: 'thrust/device_ptr.h' file not found
#include <thrust/device_ptr.h>
^~~~~~~~~~~~~~~~~~~~~
1 error generated.
/home/john/git/pytorch-rocmfork/aten/src/THC/THCStorage.cu:4:10: fatal error: 'thrust/device_ptr.h' file not found
#include <thrust/device_ptr.h>
^~~~~~~~~~~~~~~~~~~~~
1 error generated.
CMake Error at /opt/rocm/hip/cmake/FindHIP/run_make2cmake.cmake:18 (file):
file failed to open for reading (No such file or directory):
/home/john/git/pytorch-rocmfork/build/caffe2/CMakeFiles/caffe2_hip.dir/__/aten/src/THC/caffe2_hip_generated_THCStorage.cu.o.depend.pre
CMake Error at caffe2_hip_generated_THCStorage.cu.o.cmake:134 (message):
Error generating
/home/john/git/pytorch-rocmfork/build/caffe2/CMakeFiles/caffe2_hip.dir/__/aten/src/THC/./caffe2_hip_generated_THCStorage.cu.o

make[2]: *** [caffe2/CMakeFiles/caffe2_hip.dir/build.make:22908: caffe2/CMakeFiles/caffe2_hip.dir/__/aten/src/THC/caffe2_hip_generated_THCStorage.cu.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:3395: caffe2/CMakeFiles/caffe2_hip.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....

To Reproduce

Steps to reproduce the behavior:

Install ROCm / libraries from apt repo
install rocmSPARSE and hipSPARSE from source
clone rocm pytorch repo
python3 tools/amd_build/build_pytorch_amd.py
USE_ROCM=1 python3 setup.py build

Expected behavior

I expect pytorch to compile.

Environment

Ubuntu 18.10.
-- Using python found in /usr/bin/python3
HIP VERSION: 1.5.18353

***** Library versions from dpkg *****

hsakmt-roct VERSION: 1.0.9-8-g238782c
hsakmt-roct-dev VERSION: 1.0.9-8-g238782c
hsa-ext-rocr-dev VERSION: 1.1.9-9-ge4ab040
hsa-rocr-dev VERSION: 1.1.9-9-ge4ab040
hcc VERSION: 1.2.18354
hip_base VERSION: 1.5.18353
hip_hcc VERSION: 1.5.18353

***** Library versions from cmake find_package *****

rocrand VERSION: 1.8.1
hiprand VERSION: 1.8.1
rocblas VERSION: 0.14.2.4
miopen VERSION: 1.5.0-e1f0433
miopengemm VERSION: 1.1.5-9547fb9
rocfft VERSION: 0.8.6.0
hipsparse VERSION: 0.1.3.2
rocsparse VERSION: 0.1.3.2
ROCm is enabled.

NOTE: I do NOT have the Ubuntu package libthrust-dev installed. If I install it, the build fails with other errors saying CUDA isn't installed.

Additional context

Have at least some rocm demos working on my HP EliteBook 745 G5, Ryzen 5 PRO 2500U. Thought I'd see if I can use pytorch with rocm yet for my ML projects.
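
The error above means the compiler could not find thrust/device_ptr.h on its include path. A quick way to see whether any copy of the header is installed is to search a few candidate include directories. This is a minimal sketch: the helper name `find_header` is hypothetical, and the directories listed are assumptions that should be adjusted to the local install.

```python
from pathlib import Path


def find_header(relative_header, include_roots):
    """Return every include root that contains the given header."""
    return [root for root in include_roots if (Path(root) / relative_header).is_file()]


if __name__ == "__main__":
    # Candidate include directories are assumptions; adjust to the local install.
    roots = ["/opt/rocm/include", "/usr/include", "/usr/local/include"]
    hits = find_header("thrust/device_ptr.h", roots)
    print(hits if hits else "thrust/device_ptr.h not found in any candidate root")
```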

Building PyTorch with ROCm

❓ Questions and Help

Please note that this issue tracker is not a help form and this issue will be closed.

I'm trying to build PyTorch to run on ROCm (Ubuntu 18.04) and am having issues. I tried the following.

I followed https://github.com/ROCmSoftwarePlatform/pytorch/wiki/Building-PyTorch-for-ROCm but it seems to have failed at pyyaml (https://gist.github.com/briansp2020/114bd75ff0182197cf7efc7af265e89c)
I got over the error by installing wheel. However, the build still failed later (https://gist.github.com/briansp2020/2719353d626968082410011dc36608cf)
I tried building it in the TensorFlow docker image and got https://gist.github.com/briansp2020/2a109c0f1d40b45299cb73a76a255767

It seems the wiki is out of date, and I needed to get the latest rocSPARSE (https://github.com/ROCmSoftwarePlatform/rocSPARSE/releases) to get past the CMake phase. Unfortunately, the build still failed (https://gist.github.com/briansp2020/52047cf73d8d59ddd72f730d779b952c)...

Do you have up to date instruction on how to build PyTorch with ROCm? My goal is to run fast.ai on Vega FE with ROCm.

Thanks!

[Caffe2] GPU memory access fault while running OverFeat benchmark for batch size 1 and 4

python ../../caffe2/python/convnet_benchmarks.py --batch_size 1 --model OverFeat --net_type simple --layer_wise_benchmark True 2>&1 | tee caffe2_overfeat_bs1.txt
I0801 15:38:58.072324 30959 net_simple.cc:101] Starting benchmark.
I0801 15:38:58.072343 30959 net_simple.cc:102] Running warmup runs.
Memory access fault by GPU node-1 on address 0x524400000. Reason: Page not present or supervisor privilege.
OverFeat: running forward-backward.
*** Aborted at 1533163139 (unix time) try "date -d @1533163139" if you are using GNU date ***
PC: @ 0x7f307508d428 gsignal
*** SIGABRT (@0x3e8000078ef) received by PID 30959 (TID 0x7f3023c7c700) from PID 30959; stack trace: ***
@ 0x7f3075433390 (unknown)
@ 0x7f307508d428 gsignal
@ 0x7f307508f02a abort
@ 0x7f30423e5155 (unknown)
@ 0x7f30423ebafd (unknown)
@ 0x7f30423b6817 (unknown)
@ 0x7f30754296ba start_thread
@ 0x7f307515f41d clone
@ 0x0 (unknown)

[Pytorch] AMD GPUs benchmarks

Hi. Thanks for this work guys.

I was curious whether you have been able to benchmark the framework on AMD GPUs. I've successfully built PyTorch with ROCm support following your instructions, but the benchmarks I got don't seem right. I'm testing with a Radeon 580, which should deliver roughly half the performance of a 1080 Ti, and I'm seeing more like a 9-10x drop in convolution performance. The TensorFlow benchmarks already show that the gap shouldn't be that wide.

Is this expected for the moment?
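
When comparing numbers like these, it helps to rule out measurement artifacts first. GPU work is queued asynchronously, so a timer that stops before the device has finished can mislead, and the first few iterations often include one-time setup cost. The harness below is a hedged sketch with hypothetical names (`time_op`, `sync`); it runs on the CPU as written and is not the benchmark used in the report.

```python
import time


def time_op(fn, sync=None, warmup=3, iters=10):
    """Average wall-clock seconds per call of fn, with optional device sync."""
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()  # drain queued work before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync is not None:
        sync()  # wait for the timed work to actually finish
    return (time.perf_counter() - start) / iters


if __name__ == "__main__":
    # CPU-only stand-in workload so the sketch runs anywhere.
    seconds = time_op(lambda: sum(i * i for i in range(10000)))
    print("avg seconds per call: {:.6f}".format(seconds))
```

With PyTorch, `fn` would be a closure running the convolution under test and `sync` would be `torch.cuda.synchronize`.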

[Caffe2] Investigate cross entropy ops

enable model tests

[PyTorch] Update sccache to support HCC

In order to improve build times, I've extended sccache to support HCC: https://github.com/Jorghi12/sccache. Once a full enumeration of the HCC command line arguments is obtained, I'll send a more robust solution upstream to Mozilla's sccache branch.

[Caffe2] Enable fp16 tests upstream

Enable fp16 tests upstream

Building PyTorch w/o Docker ?

Hi, I'm trying to get my AMD system set up to run some Torch software. I'd prefer not to have to mess with Docker; is there a reason to use it?

Is there a way to build this w/o docker?

"torch._C._cuda_getDevice()" fails in Python3 but succeeds in Python2

Python3

root@0e76836e0bcf:/data/pytorch/examples/rl_a3c_pytorch# python3
Python 3.5.2 (default, Nov 12 2018, 13:43:14)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch._C._cuda_getDevice()
THCudaCheck FAIL file=/pytorch/torch/csrc/cuda/Module.cpp line=53 error=35 : CUDA driver version is insufficient for CUDA runtime version
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: cuda runtime error (35) : CUDA driver version is insufficient for CUDA runtime version at /pytorch/torch/csrc/cuda/Module.cpp:53
>>> quit()

Python2

root@0e76836e0bcf:/data/pytorch/examples/rl_a3c_pytorch# python
Python 2.7.12 (default, Nov 12 2018, 14:36:49)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch._C._cuda_getDevice()
0L
>>> quit()

Python3 was built as follows:

First installed the following:

apt-get install python3-dev
apt-get install -y python3-pip
alias python=python3

then built with no changes

.jenkins/pytorch/build.sh
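
One thing worth ruling out is that python3 and python import different torch installations: the failing traceback points at /pytorch/torch/csrc/cuda/Module.cpp, while the examples live under /data/pytorch. Note also that a shell alias is normally not expanded inside scripts such as .jenkins/pytorch/build.sh, so the build may still have used Python 2. Both are hypotheses, not something the report confirms. The sketch below, with a hypothetical helper `module_origin`, prints where a module would be loaded from without importing it; run it under each interpreter and compare.

```python
import importlib.util
import sys


def module_origin(name):
    """Return the file a module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("torch would load from:", module_origin("torch"))
```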

Multi-GPU support

Currently, we do not support multi-GPU on ROCm, nor do we assume it works right now. This issue is tracking insights and progress as we are trying to enable this.

[Caffe2] Update on MIOPEN pooling ops

global pooling fix
perf enhancement

[Caffe2] Cudnn operator update

The goal is to review cudnn implementation of ops and implement miopen version if applicable

affine_channel
sigmoid/tanh
transpose
dropout
depthwise_3x3_conv

Need recipe for integrating custom CUDA kernels

📚 Documentation

The PyTorch-based Faster R-CNN model uses a few special CUDA kernels such as NMS, ROI_Pooling, ROI_Align and ROI_Crop.

The integration steps under CUDA are available here

For ROCm integration, I'm guessing the first step is hipification.

/opt/rocm/hip/bin/hipify-perl nms_cuda_kernel.cu > nms_hip_kernel.cpp

What's next? PyTorch 1.0 related packaging?
Kindly provide instructions for the rest of the steps for PyTorch 1.0

Ideally, the instructions would show what replaces the code snippets below.

from torch.utils.cpp_extension import CUDAExtension

I've tried a few things, but the above import seems hard-wired to CUDA.

if torch.cuda.is_available() and CUDA_HOME is not None:
    extension = CUDAExtension
    sources += source_cuda
    define_macros += [("WITH_CUDA", None)]
    extra_compile_args["nvcc"] = [
        "-DCUDA_HAS_FP16=1",
        "-D__CUDA_NO_HALF_OPERATORS__",
        "-D__CUDA_NO_HALF_CONVERSIONS__",
        "-D__CUDA_NO_HALF2_OPERATORS__",
    ]

ext_modules = [
    extension(
        "model._C",
        sources,
        include_dirs=include_dirs,
        define_macros=define_macros,
        extra_compile_args=extra_compile_args,
    )
]
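
The hipification step mentioned above can be scripted for several kernels at once. The following is a dry-run sketch only: it builds the commands without executing them, the helper names and the cuda-to-hip file naming rule are assumptions modeled on the single command quoted in this issue, and it does not address the packaging question.

```python
from pathlib import PurePosixPath

# Path taken from the command quoted in the issue.
HIPIFY_PERL = "/opt/rocm/hip/bin/hipify-perl"


def hipified_name(cuda_source):
    """Map e.g. nms_cuda_kernel.cu to nms_hip_kernel.cpp (assumed naming rule)."""
    path = PurePosixPath(cuda_source)
    return str(path.with_name(path.stem.replace("cuda", "hip") + ".cpp"))


def hipify_plan(cuda_sources):
    """Return (command, output_file) pairs; hipify-perl writes to stdout."""
    return [([HIPIFY_PERL, src], hipified_name(src)) for src in cuda_sources]


if __name__ == "__main__":
    # Hypothetical file names based on the kernels listed in the issue.
    for command, output in hipify_plan(["nms_cuda_kernel.cu", "roi_align_kernel.cu"]):
        print(" ".join(command), ">", output)
```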

[PyTorch] Re-enable ATen ROCm tests.

PyTorch ROCm for a while had the ATen ROCm tests disabled https://github.com/ROCmSoftwarePlatform/pytorch/blob/master/caffe2/CMakeLists.txt#L341

Basically, re-enable them and handle any HCC related issues that come up in the process of doing so.

[PyTorch] Integrate PyTorch with the Radeon Compute Profiler.

Currently, the PyTorch profiler is disabled when building with ROCm. In the future, we'd like to start using the profiler here https://github.com/GPUOpen-Tools/RCP.

Cannot train on gfx803

🐛 Bug

Compiling PyTorch in the rocm/pytorch:rocm2.1 docker, I'm getting a large number of "warning: loop not unrolled" messages. I don't see them in any of your CI output or other snippets posted here, so I wondered if this might be the reason for my problems. I have three tests failing, two with errors similar to another open issue, and neural network training isn't working for me.

In the PyTorch beginning tutorial, there are no errors, but the network is clearly not being trained:

[1,  2000] loss: 2.304
[1,  4000] loss: 2.303
[1,  6000] loss: 2.303
[1,  8000] loss: 2.303
[1, 10000] loss: 2.303
[1, 12000] loss: 2.304
[2,  2000] loss: 2.303
[2,  4000] loss: 2.303
[2,  6000] loss: 2.303
[2,  8000] loss: 2.304
[2, 10000] loss: 2.304
[2, 12000] loss: 2.303
Finished Training

Just to be clear, the loss function should converge towards 1.0, and does when run via CPU.
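
The plateau at 2.303 is itself informative. If the tutorial in question is the 10-class image classifier trained with cross-entropy loss (an assumption; the report does not name it), then 2.303 is what a model that assigns equal probability to every class would score, which suggests the weights are not receiving useful updates.

```python
import math

num_classes = 10  # assumption: the tutorial classifies 10 categories

# Cross-entropy of a uniform prediction: -log(1 / num_classes).
uniform_loss = -math.log(1.0 / num_classes)
print("{:.3f}".format(uniform_loss))
```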

My PyTorch is at least partly working - I've been using it to run https://github.com/xinntao/ESRGAN, and the results are clearly superior to running via CPU. I have no idea if I'm doing something wrong with the compile or there's a bug somewhere, but it seems to be training rather than executing that is broken.

Environment

rocm/pytorch:rocm2.1 docker after apt full-update. Host: Ubuntu 18.10, Ryzen 5 1600x, 16GB RAM. I've tried both lowering MAX_JOBS and creating a large swap file to avoid memory issues, but none of that affects the errors.

Here's everything from your environment script that got a value:

PyTorch version: 1.1.0a0+c751cf8
Is debug build: No

OS: Ubuntu 16.04.5 LTS
CMake version: version 3.6.3

Python version: 2.7
Is CUDA available: Yes

Versions of relevant libraries:
[pip] numpy==1.15.4
[pip] torch==1.1.0a0+c751cf8
[pip] torchvision==0.2.1

GPU

R9 Fury, target gfx803. I wonder if using an older, non-default target may be part of my problem. I understand older GPUs naturally receive less focus, though I hope you'll be able to look at it if there is a gfx803 issue.

Output

Example warning:

In file included from /data/development/rocm-pytorch/aten/src/THH/THHTensorSort.cuh:8:
/data/development/rocm-pytorch/aten/src/THH/THHSortUtils.cuh:141:1: 
warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]

Test Output:

======================================================================
FAIL: test_broadcast_batched_matmul (test_cuda.TestCuda)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/development/rocm-pytorch/test/common_utils.py", line 296, in wrapper
    method(*args, **kwargs)
  File "/data/development/rocm-pytorch/test/test_cuda.py", line 2218, in test_broadcast_batched_matmul
    _TestTorchMixin._test_broadcast_batched_matmul(self, lambda t: t.cuda())
  File "/data/development/rocm-pytorch/test/test_torch.py", line 3760, in _test_broadcast_batched_matmul
    verify_batched_matmul(*indices)
  File "/data/development/rocm-pytorch/test/test_torch.py", line 3752, in verify_batched_matmul
    self.assertEqual(truth, maybe_squeeze_result(l, r, out))
  File "/data/development/rocm-pytorch/test/common_utils.py", line 427, in assertEqual
    assertTensorsEqual(x, y)
  File "/data/development/rocm-pytorch/test/common_utils.py", line 408, in assertTensorsEqual
    self.assertTrue(torch.equal(nan_mask, torch.isnan(b)), message)
AssertionError: False is not true : 

======================================================================
FAIL: test_broadcast_fused_matmul (test_cuda.TestCuda)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/development/rocm-pytorch/test/common_utils.py", line 296, in wrapper
    method(*args, **kwargs)
  File "/data/development/rocm-pytorch/test/test_cuda.py", line 2215, in test_broadcast_fused_matmul
    _TestTorchMixin._test_broadcast_fused_matmul(self, lambda t: t.cuda())
  File "/data/development/rocm-pytorch/test/test_torch.py", line 3689, in _test_broadcast_fused_matmul
    self.assertEqual(r0, r1)
  File "/data/development/rocm-pytorch/test/common_utils.py", line 427, in assertEqual
    assertTensorsEqual(x, y)
  File "/data/development/rocm-pytorch/test/common_utils.py", line 419, in assertTensorsEqual
    self.assertLessEqual(max_err, prec, message)
AssertionError: tensor(9., device='cuda:0', dtype=torch.float32) not less than or equal to 1e-05 : 

======================================================================
FAIL: test_randperm_cuda (test_cuda.TestCuda)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/development/rocm-pytorch/test/common_utils.py", line 296, in wrapper
    method(*args, **kwargs)
  File "/data/development/rocm-pytorch/test/test_cuda.py", line 2513, in test_randperm_cuda
    self.assertEqual(res1, res2, 0)
  File "/data/development/rocm-pytorch/test/common_utils.py", line 427, in assertEqual
    assertTensorsEqual(x, y)
  File "/data/development/rocm-pytorch/test/common_utils.py", line 419, in assertTensorsEqual
    self.assertLessEqual(max_err, prec, message)
AssertionError: tensor(9223372036854775492, device='cuda:0') not less than or equal to 0 : 

----------------------------------------------------------------------
Ran 150 tests in 7.430s

FAILED (failures=3, skipped=92)

Evaluate clang5 v.s. clang 3.8 for pytorch

[Caffe2] MIOpen RNN Integration

The goal is to integrate MIOpen RNN APIs into caffe2.

[PyTorch] Achieve working fp16 support

Identify and fix outstanding issues with fp16.

[Detectron] Incorrect use of max and abs

After enabling Detectron files hipification and building in PR #295, there are warnings while building the project from the following files:
sigmoid_focal_loss_op_hip.cc
ps_roi_pool_op_hip.cc
smooth_l1_loss_op_hip.cc
The warnings are because HIP does not overload max and abs functions.

Please add appropriate checks #if defined (__HIP_PLATFORM_HCC__) and use more specific HIP functions like fmaxf and fabsf in the corresponding CUDA files.

[Caffe2] Update pre-hipified files (core)

Update pre-hipified files (core)

[Caffe2] Centos support

[PyTorch] Investigate early runtime error.

Currently, when using PyTorch with ROCm, you'll notice the following error:

import torch
torch.Tensor(1).cuda()

RuntimeError: torch.cuda.sparse.FloatTensor is not enabled.

However, the error disappears if torch.cuda._lazy_init() is called first.

import torch
torch.cuda._lazy_init()
torch.Tensor(1).cuda()
tensor([ 0], device='cuda:0')

[Caffe2] Python API change upstream

update Python API upstream

Why don't we have a PyTorch-ROCM pip package like tensorflow?

🚀 Feature

PyTorch ROCM package as a pip package like tensorflow-rocm would be great.

Motivation

Getting things up and running is already painful for newer users, and installing PyTorch for the ROCm platform is a lot to ask of them. Since there is a tensorflow-rocm package that new users can easily download and install, I think PyTorch should have one too for users who prefer PyTorch over TensorFlow.

Pitch

PyTorch ROCM package as a pip package like tensorflow-rocm would be great.

Alternatives

A conda package would be great too

Aten missing files for caffe2_hip

Issue description

Aten appears to be missing files for caffe2_hip, which prevents the installation of torch.

In file included from /home/user/dev/pytorch/aten/src/THC/THCTensorIndex.cu:12:
/home/user/dev/pytorch/aten/src/THC/THCAtomics.cuh:145:35: error: static declaration of 'atomicAdd' follows non-static declaration
static inline device void atomicAdd(double address, double val) { }
^
/opt/rocm/hip/include/hip/hcc_detail/hip_atomic.h:73:8: note: previous definition is here
double atomicAdd(double address, double val)
^
1 error generated.
[100%] Linking HIP shared library ../lib/libcaffe2_hip.so
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THC/caffe2_hip_generated_THCTensorIndex.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THC/caffe2_hip_generated_THCTensorScatterGather.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_FeatureLPPooling.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_IndexLinear.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_SpatialAdaptiveAveragePooling.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_SpatialAdaptiveMaxPooling.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_SpatialClassNLLCriterion.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_SpatialFractionalMaxPooling.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_SpatialGridSamplerBilinear.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_SpatialReflectionPadding.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_SpatialReplicationPadding.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_SpatialSubSampling.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_SpatialUpSamplingBilinear.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_SpatialUpSamplingNearest.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_TemporalMaxPooling.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_TemporalReflectionPadding.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_TemporalReplicationPadding.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_TemporalUpSamplingLinear.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_TemporalUpSamplingNearest.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_VolumetricAdaptiveAveragePooling.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_VolumetricAdaptiveMaxPooling.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_VolumetricAveragePooling.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_VolumetricDilatedMaxPooling.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_VolumetricFractionalMaxPooling.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_VolumetricGridSamplerBilinear.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_VolumetricReplicationPadding.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_VolumetricUpSamplingNearest.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/THCUNN/caffe2_hip_generated_VolumetricUpSamplingTrilinear.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/ATen/native/cuda/caffe2_hip_generated_Activation.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/ATen/native/cuda/caffe2_hip_generated_Distributions.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/ATen/native/cuda/caffe2_hip_generated_EmbeddingBag.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/ATen/native/cuda/caffe2_hip_generated_Gesv.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/ATen/native/cuda/caffe2_hip_generated_SummaryOps.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/ATen/native/cuda/caffe2_hip_generated_TensorCompare.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/ATen/native/sparse/cuda/caffe2_hip_generated_SparseCUDATensor.cu.o'
clang-7.0: error: no such file or directory: 'CMakeFiles/caffe2_hip.dir//aten/src/ATen/native/sparse/cuda/caffe2_hip_generated_SparseCUDATensorMath.cu.o'
[100%] Built target caffe2_hip
Install the project...

CMake Error at caffe2/cmake_install.cmake:69 (file):
file INSTALL cannot find
"/home/user/dev/pytorch/build/lib/libcaffe2_hip.so".
Call Stack (most recent call first):
cmake_install.cmake:86 (include)

System Info

Ubuntu 16.04
Kernel 4.13.0-45-generic

build from source inside docker does not work.

🐛 Bug

build inside docker does not work.

To Reproduce

Steps to reproduce the behavior:
Build from sources section of ./rocm-docs/caffe2-build.md

Dump Gist

error:

caffe2/CMakeFiles/caffe2.dir/build.make:4182: recipe for target 'caffe2/CMakeFiles/caffe2.dir/contrib/aten/aten_op.cc.o' failed
make[2]: *** [caffe2/CMakeFiles/caffe2.dir/contrib/aten/aten_op.cc.o] Error 254

[Caffe2] Update MIOPEN ops (for release 1.5)

Update MIOPEN ops (for release 1.5)

[Caffe2] Enable MIOPEN tests upstream (for CI)

Enable MIOPEN tests upstream (for CI)

PyTorch Performance Drop for Resnet50 and Resnet101

🐛 Bug

We are observing consistent performance drops for Resnet50 and Resnet101 with PyTorch on both Vega20 and MI25. MIOpen commit details below.

MIOpen Commit Details
commit 74782da0cf9b1dff8ea6dcfe14e450a3531359d1
Author: Daniel Lowell
Date: Mon Dec 17 16:53:33 2018 -0600

Removed redundant else condition.

To Reproduce

Steps to reproduce the behavior:

Load up docker image lcskrishna/rocm-pytorch pfl-1.9.2
Build and install MIOpen in the docker as per the commit details provided above
Run Resnet50 with Batch-size 64
Re-build MIOpen with a commit 1 week old (say Dec 12). Try running the same benchmark again

GPUs observed: MI25, Vega20
ROCm Version: 1.9.307, 1.9.211

[caffe2] For those sorting out GPU support and number of GPUs...

📚 Documentation

... the following works

>>> import caffe2.python._import_c_extension as C
>>> C.has_hip_support
True
>>> from caffe2.python import core, workspace, brew
>>> workspace.NumGpuDevices()
4

[Caffe2][MIOpen] RNN ParamAccessOp segfault

MIOpen RNN ParamAccess Op segfaults.

[PyTorch] Complete rocRAND Integration.

Currently, rocRAND is only partially integrated with PyTorch and remains nonfunctional. This issue tracks the work needed for the rocRAND integration to pass 100% of the tests. This will require coordination with rocRAND lead developers @jszuppe & @ex-rzr.
