
rx580-rocm-tensorflow-ubuntu20.4-guide's People

Contributors

boriswinner, grench6, nicerwang


rx580-rocm-tensorflow-ubuntu20.4-guide's Issues

apt install rocm-dkms attempts a build but fails

There is an error while compiling rocm-dkms (but why is it being compiled? I thought it was prebuilt) and here is the log.

DKMS make.log for amdgpu-3.5-32 for kernel 5.4.0-91-generic (x86_64)
lun 27 dic 2021, 19:40:04, CET
make: Entering directory '/usr/src/linux-headers-5.4.0-91-generic'
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdkcl/main.o
AR /var/lib/dkms/amdgpu/3.5-32/build/built-in.a
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdkcl/symbols.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdkcl/kcl_mn.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdkcl/kcl_memory.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdkcl/kcl_ioctl.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdkcl/kcl_device_cgroup.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdkcl/kcl_drm_cache.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/scheduler/sched_main.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdkcl/kcl_drm.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/scheduler/sched_fence.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/ttm/ttm_memory.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu/amdgpu_drv.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/ttm/ttm_tt.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdkcl/kcl_fence_array.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdkcl/kcl_fence.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdkcl/kcl_io.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdkcl/kcl_kthread.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu/amdgpu_device.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/scheduler/sched_entity.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdkcl/kcl_mm.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdkcl/kcl_pci.o
/var/lib/dkms/amdgpu/3.5-32/build/amd/amdkcl/kcl_fence.c:29:1: warning: ‘dma_fence_test_signaled_any’ defined but not used [-Wunused-function]
29 | dma_fence_test_signaled_any(struct dma_fence **fences, uint32_t count,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/ttm/ttm_bo.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdkcl/kcl_perf_event.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu/amdgpu_kms.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdkcl/kcl_reservation.o
/var/lib/dkms/amdgpu/3.5-32/build/amd/amdkcl/kcl_pci.c: In function ‘amdkcl_pci_init’:
/var/lib/dkms/amdgpu/3.5-32/build/amd/amdkcl/kcl_pci.c:102:84: warning: passing argument 2 of ‘amdkcl_fp_setup’ discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
102 | _kcl_pcie_link_speed = (const unsigned char *) amdkcl_fp_setup("pcie_link_speed", _kcl_pcie_link_speed_stub);
| ^~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /var/lib/dkms/amdgpu/3.5-32/build/amd/amdkcl/kcl_pci.c:3:
/var/lib/dkms/amdgpu/3.5-32/build/amd/amdkcl/kcl_common.h:23:63: note: expected ‘void *’ but argument is of type ‘const unsigned char *’
23 | static inline void *amdkcl_fp_setup(const char *symbol, void *fp_stup)
| ~~~~~~^~~~~~~
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdkcl/dma-resv.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/ttm/ttm_bo_util.o
/var/lib/dkms/amdgpu/3.5-32/build/amd/amdkcl/kcl_reservation.c: In function ‘amdkcl_reservation_init’:
/var/lib/dkms/amdgpu/3.5-32/build/amd/amdkcl/kcl_reservation.c:58:10: warning: passing argument 2 of ‘amdkcl_fp_setup’ discards ‘const’ qualifier from pointer target type [-Wdiscarded-array-qualifiers]
58 | &_kcl_reservation_seqcount_string_stub);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /var/lib/dkms/amdgpu/3.5-32/build/amd/amdkcl/kcl_reservation.c:32:
/var/lib/dkms/amdgpu/3.5-32/build/amd/amdkcl/kcl_common.h:23:63: note: expected ‘void *’ but argument is of type ‘const char (*)[21]’
23 | static inline void *amdkcl_fp_setup(const char *symbol, void *fp_stup)
| ~~~~~~^~~~~~~
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/ttm/ttm_bo_vm.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdkcl/kcl_suspend.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/ttm/ttm_module.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdkcl/kcl_workqueue.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdkcl/kcl_seq_file.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdkcl/kcl_connector.o
LD [M] /var/lib/dkms/amdgpu/3.5-32/build/scheduler/amd-sched.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/ttm/ttm_execbuf_util.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu/amdgpu_atombios.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu/atombios_crtc.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu/amdgpu_connectors.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/ttm/ttm_page_alloc.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu/atom.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/ttm/ttm_bo_manager.o
LD [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdkcl/amdkcl.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/ttm/ttm_agp_backend.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/ttm/ttm_page_alloc_dma.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu/amdgpu_fence.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu/amdgpu_ttm.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu/amdgpu_object.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu/amdgpu_gart.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu/amdgpu_encoders.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu/amdgpu_display.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu/amdgpu_i2c.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu/amdgpu_fb.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu/amdgpu_gem.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu/amdgpu_ring.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu/amdgpu_cs.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu/amdgpu_bios.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu/amdgpu_benchmark.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu/amdgpu_test.o
LD [M] /var/lib/dkms/amdgpu/3.5-32/build/ttm/amdttm.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu/amdgpu_pm.o
/var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu/amdgpu_fb.c: In function ‘amdgpufb_create’:
/var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu/amdgpu_fb.c:253:14: warning: assignment discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
253 | info->fbops = &amdgpufb_ops;
| ^
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu/atombios_dp.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu/amdgpu_afmt.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu/amdgpu_trace_points.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu/atombios_encoders.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu/amdgpu_sa.o
/var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu/amdgpu_bios.c: In function ‘amdgpu_read_platform_bios’:
/var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu/amdgpu_bios.c:200:9: error: implicit declaration of function ‘pci_platform_rom’ [-Werror=implicit-function-declaration]
200 | bios = pci_platform_rom(adev->pdev, &size);
| ^~~~~~~~~~~~~~~~
/var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu/amdgpu_bios.c:200:7: warning: assignment to ‘uint8_t *’ {aka ‘unsigned char *’} from ‘int’ makes pointer from integer without a cast [-Wint-conversion]
200 | bios = pci_platform_rom(adev->pdev, &size);
| ^
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu/atombios_i2c.o
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu/amdgpu_dma_buf.o
cc1: some warnings being treated as errors
CC [M] /var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu/amdgpu_vm.o
make[2]: *** [scripts/Makefile.build:270: /var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu/amdgpu_bios.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [scripts/Makefile.build:519: /var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu] Error 2
make: *** [Makefile:1762: /var/lib/dkms/amdgpu/3.5-32/build] Error 2
make: Leaving directory '/usr/src/linux-headers-5.4.0-91-generic'

Any suggestion? Thanks in advance.
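
On the "why is it being compiled?" question: DKMS packages ship driver source and build the kernel module against the headers of the running kernel, so a compile step is expected. The only hard error in the log above is the implicit declaration of pci_platform_rom, which suggests the headers for 5.4.0-91 do not declare that function the way the 3.5 driver source expects. The following is a minimal Python sketch, not part of the guide, for pulling the `error:` lines out of a long make.log. The function name and the sample text are illustrative assumptions.

```python
import re
from typing import Dict, List

# Matches GCC diagnostics of the form "path/file.c:200:9: error: message".
ERROR_RE = re.compile(
    r"^(?P<file>[^:\s]+):(?P<line>\d+):(?P<col>\d+): error: (?P<msg>.*)$"
)


def find_compile_errors(log_text: str) -> List[Dict[str, object]]:
    """Return one dict per GCC 'error:' diagnostic; warnings are skipped."""
    errors = []
    for raw in log_text.splitlines():
        match = ERROR_RE.match(raw.strip())
        if match:
            errors.append(
                {
                    "file": match.group("file"),
                    "line": int(match.group("line")),
                    "message": match.group("msg"),
                }
            )
    return errors


# Two lines copied from the log above: one warning, one error.
SAMPLE = (
    "/var/lib/dkms/amdgpu/3.5-32/build/amd/amdkcl/kcl_fence.c:29:1: warning: "
    "‘dma_fence_test_signaled_any’ defined but not used [-Wunused-function]\n"
    "/var/lib/dkms/amdgpu/3.5-32/build/amd/amdgpu/amdgpu_bios.c:200:9: error: "
    "implicit declaration of function ‘pci_platform_rom’ "
    "[-Werror=implicit-function-declaration]\n"
)
```

To use it on a real build, read the log file named in the DKMS failure message and pass its contents to find_compile_errors.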

librccl.so.1: No such file or directory

I followed your instructions on Ubuntu 20.04.1, but when I tested the Python code, I got the error below:

ImportError: librccl.so.1: cannot open shared object file: No such file or directory
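
The ImportError means the dynamic loader could not find librccl.so.1 when TensorFlow was imported. A quick way to check which libraries the loader can resolve, independent of TensorFlow, is sketched below. This is a generic check using Python's ctypes module, not something from the guide. Whether the missing library comes from a separate ROCm package or from a library path that is not configured is an assumption to verify on your system.

```python
import ctypes
from typing import Dict, Iterable


def can_load(soname: str) -> bool:
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        # Raised when the library is not found or cannot be loaded.
        return False


def report(sonames: Iterable[str]) -> Dict[str, bool]:
    """Map each library name to whether it could be loaded."""
    return {name: can_load(name) for name in sonames}
```

For example, report(["librccl.so.1"]) returns {"librccl.so.1": False} on a machine where the loader cannot find that library.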

The guide should state that the installed numpy version must be 1.19.5 and protobuf==3.20.*

When I tried to import tensorflow, I got the following error:

Python 3.8.10 (default, Mar 13 2023, 10:26:41) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/refeed/.local/lib/python3.8/site-packages/tensorflow/__init__.py", line 41, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/home/refeed/.local/lib/python3.8/site-packages/tensorflow/python/__init__.py", line 53, in <module>
    from tensorflow.core.framework.graph_pb2 import *
  File "/home/refeed/.local/lib/python3.8/site-packages/tensorflow/core/framework/graph_pb2.py", line 16, in <module>
    from tensorflow.core.framework import function_pb2 as tensorflow_dot_core_dot_framework_dot_function__pb2
  File "/home/refeed/.local/lib/python3.8/site-packages/tensorflow/core/framework/function_pb2.py", line 16, in <module>
    from tensorflow.core.framework import attr_value_pb2 as tensorflow_dot_core_dot_framework_dot_attr__value__pb2
  File "/home/refeed/.local/lib/python3.8/site-packages/tensorflow/core/framework/attr_value_pb2.py", line 16, in <module>
    from tensorflow.core.framework import tensor_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__pb2
  File "/home/refeed/.local/lib/python3.8/site-packages/tensorflow/core/framework/tensor_pb2.py", line 16, in <module>
    from tensorflow.core.framework import resource_handle_pb2 as tensorflow_dot_core_dot_framework_dot_resource__handle__pb2
  File "/home/refeed/.local/lib/python3.8/site-packages/tensorflow/core/framework/resource_handle_pb2.py", line 16, in <module>
    from tensorflow.core.framework import tensor_shape_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__shape__pb2
  File "/home/refeed/.local/lib/python3.8/site-packages/tensorflow/core/framework/tensor_shape_pb2.py", line 36, in <module>
    _descriptor.FieldDescriptor(
  File "/home/refeed/.local/lib/python3.8/site-packages/google/protobuf/descriptor.py", line 561, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
>>> 

The solution for the above error is to downgrade protobuf, as the message itself suggests: pip uninstall protobuf && pip install "protobuf==3.20.*"

After that I got the below error:

Python 3.8.10 (default, Mar 13 2023, 10:26:41) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
/home/refeed/.local/lib/python3.8/site-packages/tensorflow/python/framework/dtypes.py:513: FutureWarning: In the future `np.object` will be defined as the corresponding NumPy scalar.
  np.object,
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/refeed/.local/lib/python3.8/site-packages/tensorflow/__init__.py", line 41, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/home/refeed/.local/lib/python3.8/site-packages/tensorflow/python/__init__.py", line 64, in <module>
    from tensorflow.python.framework.framework_lib import *  # pylint: disable=redefined-builtin
  File "/home/refeed/.local/lib/python3.8/site-packages/tensorflow/python/framework/framework_lib.py", line 25, in <module>
    from tensorflow.python.framework.ops import Graph
  File "/home/refeed/.local/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 54, in <module>
    from tensorflow.python.framework import dtypes
  File "/home/refeed/.local/lib/python3.8/site-packages/tensorflow/python/framework/dtypes.py", line 513, in <module>
    np.object,
  File "/home/refeed/.local/lib/python3.8/site-packages/numpy/__init__.py", line 305, in __getattr__
    raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'object'.
`np.object` was a deprecated alias for the builtin `object`. To avoid this error in existing code, use `object` by itself. Doing this will not modify any behavior and is safe. 
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

I solved that one by running pip uninstall numpy && pip install numpy==1.19.5

I guess this happens because the old tensorflow release does not put upper bounds on the versions of its dependencies. It might be good to include this in the guide.
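
A small sketch for checking the two pins mentioned in this issue before importing tensorflow. It uses importlib.metadata from the standard library (Python 3.8+). The PINS table simply restates the versions reported above. Matching with fnmatch is a simplification of pip's version specifiers (for instance, it would not treat "3.20" as matching "3.20.*").

```python
from fnmatch import fnmatchcase
from importlib import metadata  # standard library since Python 3.8
from typing import Dict, Optional, Tuple

# Versions reported as working in this issue.
PINS = {"numpy": "1.19.5", "protobuf": "3.20.*"}


def matches(installed: str, pattern: str) -> bool:
    """Glob-style comparison of an installed version against a pin."""
    return fnmatchcase(installed, pattern)


def check_pins(pins: Dict[str, str] = PINS) -> Dict[str, Tuple[Optional[str], bool]]:
    """Return {package: (installed version or None, whether it satisfies the pin)}."""
    result = {}
    for name, pattern in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        ok = installed is not None and matches(installed, pattern)
        result[name] = (installed, ok)
    return result
```

Calling check_pins() in the same interpreter that runs tensorflow shows which of the two packages, if any, is outside the expected range.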

apt-key certificate not valid

I have to use
wget -q -O - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -
instead of
wget -q -O - http://repo.radeon.com/rocm/apt/3.5.1/rocm.gpg.key | sudo apt-key add -
otherwise I get an error, when doing apt update, saying that the certificate is not valid.

Maybe you can help me with another problem as well. My default kernel is 5.4.0-81 and I cannot downgrade to 5.4.0-42. If I do, my keyboard stops working and my second screen stays black. I suspect the newer kernel is why I can't get ROCm to work.

How to grant video card permissions to all users

Hello there,
I followed your guide and installed ROCm on my server. I am trying to find a way to apply what these two commands do to every user on the server (including those who will join in the future).

sudo usermod -a -G video $LOGNAME
sudo usermod -a -G render $LOGNAME

I don't know much about Linux, please help me.
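
For the existing accounts, one possible approach is to generate the same usermod call for every regular user listed in /etc/passwd. The sketch below only builds the command strings so they can be reviewed before running anything. The UID range 1000 to 60000 for "regular" users is an assumption based on common Debian/Ubuntu defaults, and the function names are illustrative. For future accounts, the EXTRA_GROUPS and ADD_EXTRA_GROUPS settings in /etc/adduser.conf are worth looking at, though that only applies to users created with adduser.

```python
from typing import Iterable, List


def regular_users(passwd_text: str, min_uid: int = 1000, max_uid: int = 60000) -> List[str]:
    """Return login names whose UID falls in the assumed regular-user range."""
    users = []
    for line in passwd_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 7:
            continue  # not a well-formed passwd entry
        name, uid = fields[0], int(fields[2])
        if min_uid <= uid <= max_uid:
            users.append(name)
    return users


def usermod_commands(users: Iterable[str], groups=("video", "render")) -> List[str]:
    """Build one 'usermod -a -G' command per user, adding all groups at once."""
    group_list = ",".join(groups)
    return ["sudo usermod -a -G {} {}".format(group_list, user) for user in users]


# Hypothetical passwd content used only for illustration.
SAMPLE_PASSWD = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "alice:x:1000:1000:Alice:/home/alice:/bin/bash\n"
    "bob:x:1001:1001:Bob:/home/bob:/bin/bash\n"
    "nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin\n"
)
```

On the server, the input would be the contents of /etc/passwd, and the resulting commands can be pasted into a root shell after a quick review.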

ROCk module is NOT loaded, possibly no GPU devices

Hi,
My kernel version is 5.4.0-53-generic.
When I run sudo apt dist-upgrade, the kernel is upgraded to 5.4.0-88-generic, and then the ROCm installation fails.

So I locked the kernel at 5.4.0-53-generic, and the ROCm installation succeeded.
But when I run rocminfo, I get the error below:

ROCk module is NOT loaded, possibly no GPU devices
Unable to open /dev/kfd read-write: No such file or directory
user is member of video group
hsa api call failure at: /src/rocminfo/rocminfo.cc:1142
Call returned HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to allocate the necessary resources. This error may also occur when the core runtime library needs to spawn threads or create internal OS-specific events.

Did I miss something?
Do I need to install the RX 580 driver first?

Thanks!!!
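
The rocminfo output points at two things to check: whether the amdgpu kernel module is loaded and whether /dev/kfd exists. A minimal sketch of that check follows. It takes the text of /proc/modules and a boolean as input so it can be tried without touching the system. The hint strings and function names are illustrative, and a possible cause (the DKMS module not having been built for the running kernel, or a missing reboot) is only a guess.

```python
from typing import List


def amdgpu_loaded(proc_modules_text: str) -> bool:
    """Return True if a line of /proc/modules starts with the module name amdgpu."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "amdgpu":
            return True
    return False


def diagnose(proc_modules_text: str, kfd_exists: bool) -> List[str]:
    """Collect human-readable hints for the two conditions rocminfo complains about."""
    hints = []
    if not amdgpu_loaded(proc_modules_text):
        hints.append("amdgpu kernel module is not loaded")
    if not kfd_exists:
        hints.append("/dev/kfd is missing")
    return hints
```

On the affected machine, pass the contents of /proc/modules and the result of os.path.exists("/dev/kfd"); an empty list means both checks passed.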

Issue with switching to kernel 5.4.0-42-generic

I've been looking around (since this isn't an issue with the guide) but couldn't find an answer to my question.

When installing Ubuntu 20.04.3 fresh, I start with kernel 5.11.0-34-generic. Then after following through the steps, I have no other available kernels in the grub boot menu. I'm fairly new to Ubuntu/Linux, so I have no idea what else to do.

I have an RX 580, of course, with an i5 9400F and a Gigabyte Z390 SLI.

Kernel restarts in Spyder and Jupyter notebook.

The installation worked as expected for a few days, but later the kernel started restarting while training models like ResNet. I did a fresh installation of Ubuntu and then ROCm, but the issue remains. Interestingly, the benchmark test works as expected with the same tensorflow-rocm on the GPU. The benchmark script also works in Spyder.

This is the last output from running my own ResNet model as a Python script (without Spyder or Jupyter Notebook).

2021-09-22 00:27:52.519822: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libMIOpen.so
1/114 [..............................] - ETA: 0s - loss: 66690.9219 - accuracy: 0.0312
Memory access fault by GPU node-1 (Agent handle: 0x5cd9760) on address 0x500000000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)

The benchmark test, by contrast, gives the output below. The overall images/sec value is also low. Could this be a hardware issue?

Done warm up
Step Img/sec total_loss
1 images/sec: 43.8 +/- 0.0 (jitter = 0.0) 7.731
10 images/sec: 43.8 +/- 0.1 (jitter = 0.0) 8.055
20 images/sec: 43.8 +/- 0.0 (jitter = 0.0) 7.789
30 images/sec: 43.8 +/- 0.0 (jitter = 0.1) 7.974
40 images/sec: 43.8 +/- 0.0 (jitter = 0.1) 7.578
50 images/sec: 43.8 +/- 0.0 (jitter = 0.1) 7.568
60 images/sec: 43.9 +/- 0.0 (jitter = 0.1) 7.821
70 images/sec: 43.9 +/- 0.0 (jitter = 0.1) 7.802
80 images/sec: 43.9 +/- 0.0 (jitter = 0.1) 7.790
90 images/sec: 43.9 +/- 0.0 (jitter = 0.1) 8.034
100 images/sec: 43.9 +/- 0.0 (jitter = 0.1) 8.026

total images/sec: 43.86

it works on terminal but doesn't work on jupyter notebook

I installed rocm-tensorflow in anaconda but when I use rocm in jupyter notebook an error occurs:

 404 GET /nbextensions/widgets/notebook/js/extension.js?v=20210914191622 (127.0.0.1) 1.620000ms referer=http://localhost:8888/notebooks/dlaicourse/Hello_World_Layers.ipynb
2021-09-14 19:24:48.338830: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libhip_hcc.so'; dlerror: libhip_hcc.so: cannot open shared object file: No such file or directory
2021-09-14 19:24:48.338858: F tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch value instead of handling error Failed precondition: Could not load dynamic library 'libhip_hcc.so'; dlerror: libhip_hcc.so: cannot open shared object file: No such file or directory

But it worked well in the terminal. I also exported lib_hcc.so and changed the bashrc file. Can anyone tell me how to use it in Jupyter Notebook? Thanks very much.

In hcc_docs it tells me to clone the GPU module to a new Anaconda environment:

module load anaconda
conda activate tensorflow-gpu-1.14-custom
conda install <packages>

But when I type this in the terminal, the result is that module is not defined.
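
One thing that may be worth ruling out is an environment difference: exports in bashrc only reach processes started from a shell that read it, so a Jupyter server launched another way might not see the same LD_LIBRARY_PATH. The sketch below compares two path-style values and lists the entries the notebook is missing. The function names and the example path /opt/rocm/lib are assumptions for illustration, not taken from the guide.

```python
from typing import List


def split_path(value: str, sep: str = ":") -> List[str]:
    """Split a colon-separated path variable, dropping empty entries."""
    return [entry for entry in value.split(sep) if entry]


def missing_in_notebook(terminal_value: str, notebook_value: str) -> List[str]:
    """Return entries present in the terminal value but absent in the notebook value."""
    seen = set(split_path(notebook_value))
    return [entry for entry in split_path(terminal_value) if entry not in seen]
```

To try it, copy the output of echo $LD_LIBRARY_PATH from the working terminal as the first argument, and pass os.environ.get("LD_LIBRARY_PATH", "") evaluated inside a notebook cell as the second.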

GPG Key error

After adding the repository, a GPG key error is shown when running sudo apt update. I think it is an issue on the ROCm side, as I tried the same steps with the 3.7 repository and got the same error.

Get:11 http://repo.radeon.com/rocm/apt/3.5.1 xenial InRelease [1,819 B]
Err:11 http://repo.radeon.com/rocm/apt/3.5.1 xenial InRelease
  The following signatures were invalid: EXPKEYSIG 9386B48A1A693C5C James Adrian Edwards (ROCm Release Manager) <[email protected]>
Reading package lists... Done
W: GPG error: http://repo.radeon.com/rocm/apt/3.5.1 xenial InRelease: The following signatures were invalid: EXPKEYSIG 9386B48A1A693C5C James Adrian Edwards (ROCm Release Manager) <[email protected]>
E: The repository 'http://repo.radeon.com/rocm/apt/3.5.1 xenial InRelease' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.

PS: I downgraded the kernel from 5.11.0 to 5.4.0-42-generic.

Precision never increases on Radeon R9 FURY

Hello,
Sorry to trouble you again; maybe someone can help me.
I use a Fiji [Radeon R9 FURY / NANO Series] with ROCm 3.5.1.
I tried several models, but precision does not increase.
If I execute the same code on the CPU using with tf.device('/CPU:0'), precision increases, but it never increases when I use the GPU.

Does anybody have the same problem?
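
A possible first diagnostic is to check whether the GPU produces the same numbers as the CPU for a single operation, before looking at the training loop. The sketch below multiplies the same two random matrices on both devices and reports the largest difference. It assumes a TensorFlow 2.x install with eager execution and a visible GPU. The matrix size and the tolerance are arbitrary illustrative choices, and tensorflow is imported inside the function so the helper can be loaded without it.

```python
def within_tolerance(max_diff: float, tol: float = 1e-2) -> bool:
    """Return True if the CPU/GPU difference is small enough to blame rounding."""
    return max_diff <= tol


def compare_matmul(size: int = 256, seed: int = 0) -> float:
    """Multiply the same matrices on CPU and GPU; return the largest absolute difference."""
    import tensorflow as tf  # imported lazily; requires TensorFlow 2.x

    tf.random.set_seed(seed)
    with tf.device("/CPU:0"):
        a = tf.random.normal((size, size))
        b = tf.random.normal((size, size))
        cpu_result = tf.matmul(a, b)
    with tf.device("/GPU:0"):
        gpu_result = tf.matmul(a, b)
    return float(tf.reduce_max(tf.abs(cpu_result - gpu_result)))
```

If within_tolerance(compare_matmul()) is False, the GPU is returning different values for a basic operation, which would point at the driver or hardware stack rather than at the model code.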

no pubkey available

$ sudo apt update
Get:1 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
Get:2 http://repo.radeon.com/rocm/apt/3.5.1 xenial InRelease [1.819 B]         
Hit:3 http://es.archive.ubuntu.com/ubuntu focal InRelease
Get:4 http://es.archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]
Err:2 http://repo.radeon.com/rocm/apt/3.5.1 xenial InRelease
  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 9386B48A1A693C5C
Get:5 http://es.archive.ubuntu.com/ubuntu focal-backports InRelease [101 kB]
Reading package lists... Done     
W: GPG error: http://repo.radeon.com/rocm/apt/3.5.1 xenial InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 9386B48A1A693C5C
E: The repository 'http://repo.radeon.com/rocm/apt/3.5.1 xenial InRelease' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
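
NO_PUBKEY means apt has no copy of the signing key at all, while the EXPKEYSIG error in the earlier issue means the copy apt has is expired. In both cases the key needs to be imported again, and another issue on this page reports that the key at http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key was accepted. A small sketch for pulling the key problems out of apt update output follows. The function name is illustrative and the regular expression only covers these two error codes.

```python
import re
from typing import List, Tuple

# Matches the two error codes seen on this page followed by a 16-hex-digit key ID.
KEY_ERROR_RE = re.compile(r"\b(EXPKEYSIG|NO_PUBKEY)\s+([0-9A-F]{16})\b")


def key_problems(apt_output: str) -> List[Tuple[str, str]]:
    """Return the unique (error code, key ID) pairs found in apt output, sorted."""
    return sorted(set(KEY_ERROR_RE.findall(apt_output)))


# Two lines copied from the output above.
SAMPLE = (
    "  The following signatures couldn't be verified because the public key "
    "is not available: NO_PUBKEY 9386B48A1A693C5C\n"
    "W: GPG error: http://repo.radeon.com/rocm/apt/3.5.1 xenial InRelease: "
    "The following signatures couldn't be verified because the public key "
    "is not available: NO_PUBKEY 9386B48A1A693C5C\n"
)
```

Running key_problems on the full apt update output gives a short list of which keys are missing or expired.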
