nvidia / yum-packaging-precompiled-kmod
NVIDIA precompiled kernel module packaging for RHEL
License: Apache License 2.0
Currently a feature exists to lock to an XXX driver branch for the precompiled streams. Add equivalent streams for DKMS (non-precompiled).
I am currently on the latest precompiled kmod available for 8.9 that was released on 4/2
Kernel version: 4.18.0-513.24.1.el8_9.x86_64
Nvidia driver: 550.54.15
RHEL 8 has gone into maintenance mode with kernel version 4.18.0-553.el8_10 but the Nvidia drivers haven't quite caught up on the "latest" dnf module stream.
no kernel module package kmod-nvidia-555.42.02-4.18.0-553 for kernel version 4.18.0-553.el8_10 and NVIDIA driver 550.54.15 could be found
Am I missing something?
Thanks!
The spec file (at least for RHEL 7) has:
Requires: nvidia-driver-%{_named_version} = %{?epoch:%{epoch}:}%{kmod_driver_version}
Conflicts: kmod-nvidia-latest-dkms
If I check, for example, nvidia-driver-branch-450-450.80.02-1.el7.x86_64 (but it's not specific to this version):
$ rpm -qp nvidia-driver-branch-450-450.80.02-1.el7.x86_64.rpm --requires
...
kmod-nvidia-latest-dkms = 3:450.80.02
...
So on one hand the RPM conflicts with kmod-nvidia-latest-dkms, but on the other it pulls in a dependency that requires kmod-nvidia-latest-dkms. How is this supposed to work?
The kmod packages for 555.42.02, 550.90.07, and 535.183.01 are on the way for kernel 5.14.0-427.20.1
In this line https://github.com/NVIDIA/yum-packaging-precompiled-kmod/blob/main/yum-kmod-nvidia.spec#L75 it seems the package is flagged as an installonly package. This leads to yum attempting an install when we request a package update:
Package 3:kmod-nvidia-latest-3.10.0-1160.49.1.r470.82.01.el7.post1.x86_64 is allowed multiple installs
This eventually causes a conflict: to do an install, yum needs to remove the old driver, which would break the older version of the installed kmod.
It might be by design (installonly is documented to be used to prevent packages from being updated), but if so it does cause issues with automation tools like yum-cron and auter. Is this intended or is it an oversight?
On my RHEL 8 machine when I try to update my machine, I get the following error:
Updating Subscription Management repositories.
Last metadata expiration check: 4:38:34 ago on Fri 06 May 2022 09:56:38 AM MDT.
NOTE: Skipping kernel installation since no kernel module package kmod-nvidia-510.47.03-4.18.0-348.23.1 for kernel version 4.18.0-348.23.1.el8_5 and NVIDIA driver 510.47.03 could be found
Error:
Problem: package kernel-modules-4.18.0-348.23.1.el8_5.x86_64 requires kernel-uname-r = 4.18.0-348.23.1.el8_5.x86_64, but none of the providers can be installed
- cannot install the best update candidate for package kernel-modules-4.18.0-348.20.1.el8_5.x86_64
- package kernel-core-4.18.0-348.23.1.el8_5.x86_64 is filtered out by exclude filtering
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
I have been getting this message for over a week, which is longer than the ~24h that is advertised. Is there an estimate for when the kmod will be available for RHEL 8 kernel 4.18.0-348.23.1?
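The "no kernel module package ... could be found" message above names the package it was looking for. As a rough sketch, that name can be reconstructed from the driver version and the kernel release, assuming the pattern visible in the messages quoted here (kernel release with everything from the `.el` dist tag onward removed). The function name `expected_kmod_name` is hypothetical and is not part of the NVIDIA dnf plugin.

```shell
#!/bin/sh
# Hypothetical helper (not part of any NVIDIA tooling): build the kmod package
# name prefix that appears in the "NOTE: Skipping kernel installation" message.
# $1 = NVIDIA driver version, $2 = kernel release (for example from `uname -r`).
expected_kmod_name() {
    driver="$1"
    kernel="$2"
    # Drop the dist tag and anything after it, e.g. ".el8_5.x86_64"
    printf 'kmod-nvidia-%s-%s\n' "$driver" "${kernel%%.el[0-9]*}"
}
```

The result can then be used as a glob to see whether a matching package has been published yet, for example `dnf list --showduplicates "$(expected_kmod_name 510.47.03 4.18.0-348.23.1.el8_5.x86_64)*"`.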
We know that RHEL kernels are kABI-stable throughout the lifecycle of a release, so kmods will keep working when the kernel is updated.
So the kmod should be built only once for each version of the NVIDIA driver. This would reduce the need for frequent updates.
Thanks for simplifying driver installation process. Unfortunately my display is blank after the installation. I can log in remotely and successfully run nvidia-smi and execute CUDA scripts. I'm a bit of a newbie so guidance would be appreciated.
NVIDIA driver version: 470.57.02
RHEL kernel version: 8.4
modularity stream: latest
modularity profile: default
Hello
It is not possible to install all the branches with dnf because the signature is missing for some of the rpms. Trying to update driver with the following commands
dnf remove nvidia-driver
dnf module reset nvidia-driver
dnf module list nvidia-driver
dnf module install nvidia-driver:470
ends with error message
...
[SKIPPED] nvidia-driver-cuda-libs-470.57.02-1.el8.x86_64.rpm: Already downloaded
[SKIPPED] nvidia-driver-libs-470.57.02-1.el8.x86_64.rpm: Already downloaded
[SKIPPED] nvidia-libXNVCtrl-470.57.02-1.el8.x86_64.rpm: Already downloaded
[SKIPPED] nvidia-libXNVCtrl-devel-470.57.02-1.el8.x86_64.rpm: Already downloaded
Package kmod-nvidia-470.57.02-4.18.0-305.7.1-470.57.02-3.el8_4.x86_64.rpm is not signed
The downloaded packages were saved in cache until the next successful transaction.
You can remove cached packages by executing 'dnf clean packages'.
Error: GPG check FAILED
Running rpm -K $file on the latest kmod-nvidia for each branch shows this
kmod-nvidia-418.211.00-4.18.0-305.7.1-418.211.00-3.el8_4.x86_64.rpm: digests OK
kmod-nvidia-440.118.02-4.18.0-240.8.1-440.118.02-3.el8_3.x86_64.rpm: digests signatures OK
kmod-nvidia-450.142.00-4.18.0-305.7.1-450.142.00-3.el8_4.x86_64.rpm: digests OK
kmod-nvidia-455.45.01-4.18.0-305-455.45.01-3.el8.x86_64.rpm: digests signatures OK
kmod-nvidia-460.91.03-4.18.0-305.7.1-460.91.03-3.el8_4.x86_64.rpm: digests OK
kmod-nvidia-465.19.01-4.18.0-305.7.1-465.19.01-3.el8_4.x86_64.rpm: digests signatures OK
kmod-nvidia-470.57.02-4.18.0-305.7.1-470.57.02-3.el8_4.x86_64.rpm: digests OK
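To pick the unsigned packages out of a longer listing, the `rpm -K` output can be filtered. This is a minimal sketch that assumes the output format shown above, where a signed package reports "digests signatures OK" and an unsigned one reports only "digests OK"; other rpm versions may word the result differently. The function name `list_unsigned` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `rpm -K` output on stdin and print the file names
# whose result does not mention "signatures" (assumed to mean: not signed).
list_unsigned() {
    awk -F': ' 'NF > 1 && $2 !~ /signatures/ { print $1 }'
}
```

A possible invocation is `rpm -K kmod-nvidia-*.rpm | list_unsigned`.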
Need genmodules.py changes for the /fm modularity profile to address the rename of the fabric manager package from NVIDIA/yum-packaging-fabric-manager#2
When upgrading to a new kernel, depmod runs against the old kernel. However, for precompiled packages it should run for the kernel that the kmod package is compiled against. I thought this was fixed in #11, but the .%{arch} suffix was missing from the directory name, causing this error:
depmod: ERROR: could not open directory /lib/modules/4.18.0-YYY.el8_2: No such file or directory
depmod: FATAL: could not search modules: No such file or directory
warning: %post(kmod-nvidia-418.XXX.XX-4.18.0-YYY-3:418.XXX.XX-3.el8_2.x86_64) scriptlet failed, exit status 1
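The directory under /lib/modules is named after the full kernel release, which on RHEL ends with the architecture. A small sketch of how the path is put together, with the hypothetical helper `modules_dir` and example values taken from package names elsewhere on this page:

```shell
#!/bin/sh
# Hypothetical helper: build the /lib/modules directory for a kernel.
# $1 = kernel version (e.g. 4.18.0-305.7.1)
# $2 = dist tag       (e.g. el8_4)
# $3 = architecture   (e.g. x86_64); omitting it gives the failing path above
modules_dir() {
    printf '/lib/modules/%s.%s.%s\n' "$1" "$2" "$3"
}
```

depmod accepts the kernel release as an argument, so the scriptlet would need to pass the same full string, for example `depmod -a 4.18.0-305.7.1.el8_4.x86_64`.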
following the recommendation in this discussion,
https://access.redhat.com/solutions/4134401#comment-2124501
I am reporting this issue here since I just got
NOTE: Skipping kernel installation since no kernel module package kmod-nvidia-470.57.02-4.18.0-305.17.1 for kernel version 4.18.0-305.17.1.el8_4 and NVIDIA driver 470.57.02 could be found
Error:
Problem 1: package kernel-modules-4.18.0-305.17.1.el8_4.x86_64 requires kernel-uname-r = 4.18.0-305.17.1.el8_4.x86_64, but none of the providers can be installed
- cannot install the best update candidate for package kernel-modules-4.18.0-305.3.1.el8_4.x86_64
- package kernel-core-4.18.0-305.17.1.el8_4.x86_64 is filtered out by exclude filtering
Problem 2: package kernel-modules-4.18.0-305.17.1.el8_4.x86_64 requires kernel-uname-r = 4.18.0-305.17.1.el8_4.x86_64, but none of the providers can be installed
- cannot install the best update candidate for package kernel-modules-4.18.0-305.7.1.el8_4.x86_64
- package kernel-core-4.18.0-305.17.1.el8_4.x86_64 is filtered out by exclude filtering
Problem 3: package kernel-modules-4.18.0-305.17.1.el8_4.x86_64 requires kernel-uname-r = 4.18.0-305.17.1.el8_4.x86_64, but none of the providers can be installed
- cannot install the best update candidate for package kernel-modules-4.18.0-305.12.1.el8_4.x86_64
- package kernel-core-4.18.0-305.17.1.el8_4.x86_64 is filtered out by exclude filtering
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
which makes me sad :-(
hi,
Nice to see you sharing your work.
Will you also target CentOS Stream as best effort since the pipeline might be suited for automation?
Thanks
Tru
Now that RHEL 9 released are there any plans for supporting it with official builds?
Not sure why, but this one also appears to be delayed. I had previously updated the GPG key:
# rpm -q gpg-pubkey --qf '%{NAME}-%{VERSION}-%{RELEASE}\t%{SUMMARY}\n'|grep nvidia
gpg-pubkey-d42d0685-62589a51 gpg(cudatools <[email protected]>)
# dnf update
Last metadata expiration check: 2:20:10 ago on Mon 18 Jul 2022 07:05:59 AM CDT.
NOTE: Skipping kernel installation since no kernel module package kmod-nvidia-515.48.07-4.18.0-372.16.1 for kernel version 4.18.0-372.16.1.el8_6 and NVIDIA driver 515.48.07 could be found
Error:
Problem 1: package kernel-modules-4.18.0-372.16.1.el8_6.x86_64 requires kernel-uname-r = 4.18.0-372.16.1.el8_6.x86_64, but none of the providers can be installed
- cannot install the best update candidate for package kernel-modules-4.18.0-348.20.1.el8_5.x86_64
- package kernel-core-4.18.0-372.16.1.el8_6.x86_64 is filtered out by exclude filtering
Problem 2: package kernel-modules-4.18.0-372.16.1.el8_6.x86_64 requires kernel-uname-r = 4.18.0-372.16.1.el8_6.x86_64, but none of the providers can be installed
- cannot install the best update candidate for package kernel-modules-4.18.0-372.9.1.el8.x86_64
- package kernel-core-4.18.0-372.16.1.el8_6.x86_64 is filtered out by exclude filtering
Problem 3: package kernel-modules-4.18.0-372.16.1.el8_6.x86_64 requires kernel-uname-r = 4.18.0-372.16.1.el8_6.x86_64, but none of the providers can be installed
- cannot install the best update candidate for package kernel-modules-4.18.0-372.13.1.el8_6.x86_64
- package kernel-core-4.18.0-372.16.1.el8_6.x86_64 is filtered out by exclude filtering
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
It might be that I am a bit early, but maybe a CI machine has not acted as it should, as in #27, so I am posting this today already.
following the recommendation in this discussion,
https://access.redhat.com/solutions/4134401#comment-2124501
I am reporting this issue here since I just got
NOTE: Skipping kernel installation since no kernel module package kmod-nvidia-470.57.02-4.18.0-305.19.1 for kernel version 4.18.0-305.19.1.el8_4 and NVIDIA driver 470.57.02 could be found
Error:
Problem 1: package kernel-modules-4.18.0-305.19.1.el8_4.x86_64 requires kernel-uname-r = 4.18.0-305.19.1.el8_4.x86_64, but none of the providers can be installed
- cannot install the best update candidate for package kernel-modules-4.18.0-305.7.1.el8_4.x86_64
- package kernel-core-4.18.0-305.19.1.el8_4.x86_64 is filtered out by exclude filtering
Problem 2: package kernel-modules-4.18.0-305.19.1.el8_4.x86_64 requires kernel-uname-r = 4.18.0-305.19.1.el8_4.x86_64, but none of the providers can be installed
- cannot install the best update candidate for package kernel-modules-4.18.0-305.12.1.el8_4.x86_64
- package kernel-core-4.18.0-305.19.1.el8_4.x86_64 is filtered out by exclude filtering
Problem 3: package kernel-modules-4.18.0-305.19.1.el8_4.x86_64 requires kernel-uname-r = 4.18.0-305.19.1.el8_4.x86_64, but none of the providers can be installed
- cannot install the best update candidate for package kernel-modules-4.18.0-305.17.1.el8_4.x86_64
- package kernel-core-4.18.0-305.19.1.el8_4.x86_64 is filtered out by exclude filtering
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
Running RHEL 8.6 with the kernel and two different driver versions:
The GPU I have is an NVIDIA RTX 3050 Ti mobile. The drivers install fine, but on reboot the system gets stuck at the Dell/RHEL splash screen. Do these drivers work with "switchable graphics"? Thank you for the great instructions; the process is easy to follow!
The postld file defined in the rpm spec file here https://github.com/NVIDIA/yum-packaging-precompiled-kmod/blob/main/yum-kmod-nvidia.spec#L40 is in the rpm database with permissions 0644.
However, in the postinstall section here https://github.com/NVIDIA/yum-packaging-precompiled-kmod/blob/main/yum-kmod-nvidia.spec#L172 the file gets an execute bit set. This makes our security tooling unhappy, as the file is flagged as modified since install (i.e. possible malware). The postld file should be defined with the desired permissions in the rpm spec rather than having them set in the %post section.
In the RHEL7 branch the spec sets the mode of ld.gold.nvidia as 755
https://github.com/NVIDIA/yum-packaging-precompiled-kmod/blob/515-rhel7/kmod-nvidia.spec#L255
However the spec also sets defattr for files as 644
https://github.com/NVIDIA/yum-packaging-precompiled-kmod/blob/515-rhel7/kmod-nvidia.spec#L260
This causes rpm -q --dump to report the mode as 0100644 while in reality the file is laid down as 755 when installed, which in turn causes integrity scanners to warn about the inconsistency.
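To reproduce the mismatch by hand, the mode recorded in the rpm database can be reduced to its permission bits and compared with the mode on disk. A minimal sketch, assuming a seven-digit mode string like the 0100644 quoted above; the helper name `dump_mode_to_perms` is hypothetical, and it deliberately ignores setuid/setgid/sticky bits.

```shell
#!/bin/sh
# Hypothetical helper: keep only the last three octal digits of an
# `rpm -q --dump` mode string, e.g. 0100644 -> 644.
# Special bits (setuid, setgid, sticky) are dropped by this sketch.
dump_mode_to_perms() {
    printf '%s\n' "$1" | sed 's/^.*\(...\)$/\1/'
}
```

The output can be compared with `stat -c '%a' FILE` (GNU coreutils) for the installed file; per the report above, the database side would give 644 while the installed file shows 755.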
(redhat satellite fails to sync a repo without it)
Hi,
Trying to run build.sh from main on NVIDIA-Linux-x86_64-525.60.11.run. It gets to the copy_rpms stage and fails.
==> copy_rpms(http://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64)
:: repodata/3690e13e1224a040cac6f8c9a3dbe96551659ec22cabee088ed0b7daa8b515cb-primary.xml.gz
ERROR: Unable to locate 525.60.11 driver packages in repository for rhel8/x86_64
Looks like only 525.60.13 is available in the cuda repository, but that version is not available as a run file.
On a machine that has not been upgraded in a long time I have installed from the yum repos:
[root@chewey ~]# rpm -qa | grep nvidia
nvidia-driver-devel-418.67-4.el7.x86_64
nvidia-driver-cuda-libs-418.67-4.el7.x86_64
nvidia-xconfig-418.67-1.el7.x86_64
nvidia-driver-NVML-418.67-4.el7.x86_64
nvidia-libXNVCtrl-devel-418.67-1.el7.x86_64
nvidia-libXNVCtrl-418.67-1.el7.x86_64
nvidia-detect-460.73.01-1.el7.elrepo.x86_64
nvidia-driver-NvFBCOpenGL-418.67-4.el7.x86_64
nvidia-modprobe-418.67-1.el7.x86_64
nvidia-driver-libs-418.67-4.el7.x86_64
yum-plugin-nvidia-0.5-1.el7.noarch
nvidia-settings-418.67-1.el7.x86_64
dkms-nvidia-418.67-1.el7.x86_64
nvidia-persistenced-418.67-1.el7.x86_64
nvidia-driver-418.67-4.el7.x86_64
nvidia-driver-cuda-418.67-4.el7.x86_64
Repo is defined as:
[root@chewey ~]# cat /etc/yum.repos.d/cuda.repo
[cuda]
name=cuda
baseurl=http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64
enabled=1
gpgcheck=1
gpgkey=http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/7fa2af80.pub
When I do a yum upgrade it fails with:
--> Processing Conflict: 3:nvidia-driver-branch-440-440.118.02-1.el7.x86_64 conflicts nvidia-driver > 440.118.02
--> Processing Conflict: 3:nvidia-driver-latest-dkms-465.19.01-1.el7.x86_64 conflicts nvidia-driver > 465.19.01
--> Finished Dependency Resolution
Error: nvidia-driver-branch-440 conflicts with 3:nvidia-driver-latest-dkms-465.19.01-1.el7.x86_64
Error: Package: 3:nvidia-driver-branch-440-440.118.02-1.el7.x86_64 (cuda)
Requires: kmod-nvidia-latest-dkms = 3:440.118.02
Available: 3:kmod-nvidia-latest-dkms-418.87.00-1.el7.x86_64 (cuda)
kmod-nvidia-latest-dkms = 3:418.87.00-1.el7
Available: 3:kmod-nvidia-latest-dkms-418.87.00-2.el7.x86_64 (cuda)
kmod-nvidia-latest-dkms = 3:418.87.00-2.el7
Available: 3:kmod-nvidia-latest-dkms-418.87.01-1.el7.x86_64 (cuda)
kmod-nvidia-latest-dkms = 3:418.87.01-1.el7
Available: 3:kmod-nvidia-latest-dkms-418.116.00-1.el7.x86_64 (cuda)
kmod-nvidia-latest-dkms = 3:418.116.00-1.el7
Available: 3:kmod-nvidia-latest-dkms-418.126.02-1.el7.x86_64 (cuda)
kmod-nvidia-latest-dkms = 3:418.126.02-1.el7
Available: 3:kmod-nvidia-latest-dkms-418.152.00-1.el7.x86_64 (cuda)
kmod-nvidia-latest-dkms = 3:418.152.00-1.el7
Available: 3:kmod-nvidia-latest-dkms-418.165.02-1.el7.x86_64 (cuda)
kmod-nvidia-latest-dkms = 3:418.165.02-1.el7
Available: 3:kmod-nvidia-latest-dkms-418.181.07-1.el7.x86_64 (cuda)
kmod-nvidia-latest-dkms = 3:418.181.07-1.el7
Available: 3:kmod-nvidia-latest-dkms-418.197.02-1.el7.x86_64 (cuda)
kmod-nvidia-latest-dkms = 3:418.197.02-1.el7
Available: 3:kmod-nvidia-latest-dkms-440.33.01-1.el7.x86_64 (cuda)
kmod-nvidia-latest-dkms = 3:440.33.01-1.el7
Available: 3:kmod-nvidia-latest-dkms-440.64.00-1.el7.x86_64 (cuda)
kmod-nvidia-latest-dkms = 3:440.64.00-1.el7
Available: 3:kmod-nvidia-latest-dkms-440.95.01-1.el7.x86_64 (cuda)
kmod-nvidia-latest-dkms = 3:440.95.01-1.el7
Available: 3:kmod-nvidia-latest-dkms-440.118.02-1.el7.x86_64 (cuda)
kmod-nvidia-latest-dkms = 3:440.118.02-1.el7
Available: 3:kmod-nvidia-latest-dkms-450.36.06-1.el7.x86_64 (cuda)
kmod-nvidia-latest-dkms = 3:450.36.06-1.el7
Available: 3:kmod-nvidia-latest-dkms-450.51.05-1.el7.x86_64 (cuda)
kmod-nvidia-latest-dkms = 3:450.51.05-1.el7
Available: 3:kmod-nvidia-latest-dkms-450.51.06-1.el7.x86_64 (cuda)
kmod-nvidia-latest-dkms = 3:450.51.06-1.el7
Available: 3:kmod-nvidia-latest-dkms-450.80.02-1.el7.x86_64 (cuda)
kmod-nvidia-latest-dkms = 3:450.80.02-1.el7
Available: 3:kmod-nvidia-latest-dkms-450.102.04-1.el7.x86_64 (cuda)
kmod-nvidia-latest-dkms = 3:450.102.04-1.el7
Available: 3:kmod-nvidia-latest-dkms-450.119.03-1.el7.x86_64 (cuda)
kmod-nvidia-latest-dkms = 3:450.119.03-1.el7
Available: 3:kmod-nvidia-latest-dkms-450.119.04-1.el7.x86_64 (cuda)
kmod-nvidia-latest-dkms = 3:450.119.04-1.el7
Available: 3:kmod-nvidia-latest-dkms-455.23.05-1.el7.x86_64 (cuda)
kmod-nvidia-latest-dkms = 3:455.23.05-1.el7
Available: 3:kmod-nvidia-latest-dkms-455.32.00-1.el7.x86_64 (cuda)
kmod-nvidia-latest-dkms = 3:455.32.00-1.el7
Available: 3:kmod-nvidia-latest-dkms-455.45.01-1.el7.x86_64 (cuda)
kmod-nvidia-latest-dkms = 3:455.45.01-1.el7
Available: 3:kmod-nvidia-latest-dkms-460.27.04-1.el7.x86_64 (cuda)
kmod-nvidia-latest-dkms = 3:460.27.04-1.el7
Available: 3:kmod-nvidia-latest-dkms-460.32.03-1.el7.x86_64 (cuda)
kmod-nvidia-latest-dkms = 3:460.32.03-1.el7
Available: 3:kmod-nvidia-latest-dkms-460.73.01-1.el7.x86_64 (cuda)
kmod-nvidia-latest-dkms = 3:460.73.01-1.el7
Installing: 3:kmod-nvidia-latest-dkms-465.19.01-1.el7.x86_64 (cuda)
kmod-nvidia-latest-dkms = 3:465.19.01-1.el7
Error: nvidia-driver-latest-dkms conflicts with 3:nvidia-driver-branch-440-440.118.02-1.el7.x86_64
You could try using --skip-broken to work around the problem
Why is it trying to install both 440 and 465 driver files?
Source: https://forums.developer.nvidia.com/t/error-syncing-rhel8-cuda-repo/176276
There have been multiple reports of being unable to mirror/sync the https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/ repository with pulp since about a month ago.
"traceback"=>
"Traceback (most recent call last):\n" +
" File \"/usr/lib/python2.7/site-packages/celery/app/trace.py\", line 367, in trace_task\n" +
" R = retval = fun(*args, **kwargs)\n" +
" File \"/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py\", line 688, in __call__\n" +
" return super(Task, self).__call__(*args, **kwargs)\n" +
" File \"/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py\", line 110, in __call__\n" +
" return super(PulpTask, self).__call__(*args, **kwargs)\n" +
" File \"/usr/lib/python2.7/site-packages/celery/app/trace.py\", line 622, in __protected_call__\n" +
" return self.run(*args, **kwargs)\n" +
" File \"/usr/lib/python2.7/site-packages/pulp/server/controllers/repository.py\", line 860, in sync\n" +
" raise pulp_exceptions.PulpExecutionException(_('Importer indicated a failed response'))\n" +
"PulpExecutionException: Importer indicated a failed response\n",
The modules.yaml document is valid UTF-8, but the error suggests that parsing is failing at the stream key-values.
"distribution"=>
{"items_total"=>0,
"state"=>"FINISHED",
"error_details"=>[],
"items_left"=>0},
"modules"=>
{"state"=>"FAILED",
"error"=>
"strings in documents must be valid UTF-8: '\\x8c\\x01\\x00\\x00\\x04465-dkms\\x00\\x14\\x00\\x00\\x00\\x020\\x00\\x08\\x00\\x00\\x00default\\x00\\x00\\x04418-dkms\\x00\\x14\\x00\\x00\\x00\\x020\\x00\\x08\\x00\\x00\\x00default\\x00\\x00\\x04450\\x00\\x14\\x00\\x00\\x00\\x020\\x00\\x08\\x00\\x00\\x00default\\x00\\x00\\x04440\\x00\\x14\\x00\\x00\\x00\\x020\\x00\\x08\\x00\\x00\\x00default\\x00\\x00\\x04460\\x00\\x14\\x00\\x00\\x00\\x020\\x00\\x08\\x00\\x00\\x00default\\x00\\x00\\x04455\\x00\\x14\\x00\\x00\\x00\\x020\\x00\\x08\\x00\\x00\\x00default\\x00\\x00\\x04465\\x00\\x14\\x00\\x00\\x00\\x020\\x00\\x08\\x00\\x00\\x00default\\x00\\x00\\x04440-dkms\\x00\\x14\\x00\\x00\\x00\\x020\\x00\\x08\\x00\\x00\\x00default\\x00\\x00\\x04455-dkms\\x00\\x14\\x00\\x00\\x00\\x020\\x00\\x08\\x00\\x00\\x00default\\x00\\x00\\x04460-dkms\\x00\\x14\\x00\\x00\\x00\\x020\\x00\\x08\\x00\\x00\\x00default\\x00\\x00\\x04latest-dkms\\x00\\x14\\x00\\x00\\x00\\x020\\x00\\x08\\x00\\x00\\x00default\\x00\\x00\\x04450-dkms\\x00\\x14\\x00\\x00\\x00\\x020\\x00\\x08\\x00\\x00\\x00default\\x00\\x00\\x04418\\x00\\x14\\x00\\x00\\x00\\x020\\x00\\x08\\x00\\x00\\x00default\\x00\\x00\\x04latest\\x00\\x14\\x00\\x00\\x00\\x020\\x00\\x08\\x00\\x00\\x00default\\x00\\x00\\x00'"},
"errata"=>{"state"=>"NOT_STARTED"},
"metadata"=>{"state"=>"FINISHED"}}},
Looking at another modularity-enabled repository, strings are enclosed in double-quotes ...
NVIDIA-provided precompiled kmod RPMs only officially support RHEL kernels. These are built and tested on Red Hat Enterprise Linux for that specific kernel release. This blog post goes into more detail.
A frequently asked question concerns the technical reasons why other RHEL-like kernels would not be compatible. The primary blocker is that, to avoid any potential ABI incompatibility, the precompiled design requires an exact match of the kernel version string.
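The exact-match requirement can be pictured as a whole-line string lookup. The sketch below is only an illustration and is not the actual logic of the NVIDIA dnf plugin; `has_kmod_for_kernel` is a hypothetical name, and the list of kernels that have a kmod is assumed to arrive on stdin, one release per line.

```shell
#!/bin/sh
# Hypothetical illustration of an exact kernel-string match.
# $1    = kernel release to be installed
# stdin = kernel releases for which a precompiled kmod exists, one per line
# Succeeds only when some line is identical to $1 (fixed string, whole line).
has_kmod_for_kernel() {
    grep -Fxq -- "$1"
}
```

With a list containing only 4.18.0-372.16.1.el8_6, a lookup for 4.18.0-372.16.1.el8_6 succeeds, while 4.18.0-372.16.1.el8_6.0.1 fails even if the kernel source is the same, because the strings differ.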
Let's look at some kernel-core
data for
Rocky Linux and Alma Linux both archive packages from previous y-stream releases, so first enable those repos.
# rockylinux:8
old_releases=('8.6' '8.5' '8.4' '8.3')
# rockylinux:9 (use this assignment instead of the one above)
# old_releases=('9.0')
rockyvault="https://dl.rockylinux.org/vault/rocky"
# Write one repo file per archived y-stream release
for ver in "${old_releases[@]}"; do
repo="$rockyvault/$ver/BaseOS/x86_64/os"
echo -e "[Rocky-Vault-$ver]\nname=Rocky-Vault-$ver\nbaseurl=$repo/\ngpgcheck=1\nenabled=1\ngpgkey=$repo/RPM-GPG-KEY-rockyofficial" | tee "/etc/yum.repos.d/Rocky-Vault-$ver.repo"
done
# almalinux:8
old_releases=('8.6' '8.5' '8.4' '8.3')
# almalinux:9 (use this assignment instead of the one above)
# old_releases=('9.0')
almavault="https://repo.almalinux.org/vault"
# This key URL names AlmaLinux 8; a 9.x system presumably needs the matching key for 9
almagpg="https://repo.almalinux.org/almalinux/RPM-GPG-KEY-AlmaLinux-8"
# Write one repo file per archived y-stream release
for ver in "${old_releases[@]}"; do
repo="$almavault/$ver/BaseOS/x86_64/os"
echo -e "[Alma-Vault-$ver]\nname=Alma-Vault-$ver\nbaseurl=$repo/\ngpgcheck=1\nenabled=1\ngpgkey=$almagpg" | tee "/etc/yum.repos.d/Alma-Vault-$ver.repo"
done
dnf list kernel-core --showduplicates
# Filter output
dnf list kernel-core --showduplicates | awk '{print $2}' | grep "\.el" | sort -uV
-------------A--B--C--D------------------A--B--C--D---
| 1| 8.0 [+][ ][+][ ] |30| [+][+][+][+]
| 2| [+][ ][+][ ] |31| [+][+][+][+]
| 3| [+][ ][+][ ] |32| [+][+][+][+]
| 4| [+][ ][+][ ] |33| [+][+][+][+]
| 5| [+][ ][+][ ] |34| [+][+][+][+]
| 6| [+][ ][+][ ] |35| [+][+][+][+]
| 7| [+][ ][+][ ] |36| 8.5 [+][+][+][+]
| 8| [+][ ][+][ ] |37| [+][+][+][+]
| 9| 8.1 [+][ ][+][ ] |38| [+][+][+][+]
|10| [+][ ][+][ ] |39| [+][+][+][+]
|11| [+][ ][+][ ] |40| [+][+][+][+]
|12| [+][ ][+][ ] |41| [+][+][+][+]
|13| [+][ ][+][ ] |42| 8.6 [+][+][+][+]
|14| [+][ ][+][ ] |43| [+][+][ ][+]
|15| 8.2 [+][ ][+][ ] |44| [ ][ ][+][ ]
|16| [+][ ][+][ ] |45| [+][+][ ][+]
|17| [+][ ][+][ ] |46| [ ][+][ ][ ]
|18| [+][ ][+][ ] |47| [ ][ ][+][ ]
|19| [+][ ][+][ ] |48| [+][+][ ][+]
|20| [+][ ][+][ ] |49| [ ][ ][+][ ]
|21| [+][ ][+][ ] |50| [+][+][ ][+]
|22| 8.3 [+][ ][+][+] |51| [ ][ ][+][ ]
|23| [+][ ][+][ ] |52| [+][+][ ][+]
|24| [+][ ][+][ ] |53| [ ][ ][+][ ]
|25| [+][ ][+][ ] |54| 8.7 [+][+][+][+]
|26| [+][ ][+][+] |55| [+][+][+][+]
|27| [+][+][+][+] |56| [+][+][+][+]
|28| 8.4 [+][ ][+][+] |57| [+][+][+][+]
|29| [+][+][+][+]
------------------------------------------------------
A. Red Hat Enterprise Linux
B. Rocky Linux
C. Oracle Linux
D. Alma Linux
-------------A--B--C--D--------------
| 1| 9.0 [+][ ][ ][+]
| 2| [ ][ ][+][ ]
| 3| [+][ ][ ][+]
| 4| [ ][ ][+][ ]
| 5| [+][ ][ ][+]
| 6| [ ][ ][+][ ]
| 7| [+][ ][ ][+]
| 8| [ ][ ][+][ ]
| 9| [+][+][ ][+]
|10| [ ][ ][+][ ]
|11| 9.1 [+][ ][+][+]
|12| [+][ ][+][+]
|13| [+][ ][+][+]
|14| [+][+][+][+]
-------------------------------------
A. Red Hat Enterprise Linux
B. Rocky Linux
C. Oracle Linux
D. Alma Linux
While there is some overlap between kernel versions, there is often no overlap (missing kernels, different versioning, etc.). This results in non-deterministic install behavior, depending on when the dnf transaction occurs.
To put it another way: assume the kernels are aligned today and the precompiled install succeeds on machine A (RHEL-like). If a new kernel is released next week, the install may not succeed on machine B (RHEL-like) because no compatible kmod package is available.
As such, attempts to use the precompiled modular streams provided in the CUDA repository on non-RHEL distros result in a degraded user experience and are not supported by NVIDIA.
Instead, sysadmins are encouraged to build DIY precompiled kmod RPMs using the instructions provided in this git repo; otherwise the DKMS modular streams may be used.
Hello,
This ticket is a more formalized method of reviewing #21 for
CentOS Stream 9 (and following).
I’d like to outline my understanding of your process, to make sure I
know what resources CentOS needs to provide for this to be
reconsidered. I'm hopeful this pile of information will help in the
process of reviewing and building for CentOS Stream.
Each pre-compiled kmod is specific to a released kernel. The kABI is
not used in part because of the limited introspection, in part because
1:1 mappings are much easier to validate, and in part because the DRM
subsystem is not on the stable kABI list.
Much of what I’ve got written here is covered in much more detail at
https://wiki.centos.org/Events/Dojo/FOSDEM2022 “Tracking Kernel Rate of
Change”. I’ve gone into (painful?) detail on exact change metrics on
real world data within that presentation.
At minor releases within the RHEL product lifecycle (8.x, 9.x) the RHEL
kernel is updated
with changes published in the CentOS Stream kernel since the last minor
release (x-1).
I believe the clearest benefit to NVIDIA adding pre-compiled kmods for
CentOS Stream 9 is in reducing the time pressure on engineering for
steps 2-4 regarding RHEL sync up.
Since the existing drivers compile against the kernel.org kernel,
engineering is probably spending most of its time on step 2. The
changes to the RHEL kernel come as backports from the kernel.org
kernel, so I suspect much of the codebase is ready by the time those
patches make it to RHEL. There are doubtless places where new
connective hooks need to be inserted or the complex “ifdef” maze of
supporting multiple kernels.
I believe CentOS Stream 9 can help this in two ways:
A) You can see exactly how the kernel is changing in RHEL 9 at
https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9
B) You can run your validation process against what will change in RHEL
early.
On element (B), this is perhaps most clearly expressed in the RHEL 8.5
kernel vs the CentOS Stream 8 kernel. The RHEL 8.5 GA kernel released
into CentOS Stream 8 a total of 19 days before RHEL 8.5 GA. All kmods
tracking CentOS Stream 8 were ready for the RHEL 8.5 release kernel
weeks before RHEL 8.5 was published. The two kernels are identical at
that point.
There are some kernels released in CentOS Stream 8 that serve as a
“stepping stone” to the next big kernel. However, I view this as an
advantage to folks who need to review the actual function changes –
these “stepping stone” kernels have smaller patch sets, and thus less
to review. The changes are backported from kernel.org, this isn’t like
supporting a separate distribution kernel as the code comes from
kernel.org (which already works) and goes into RHEL (which you are
targeting).
If we can establish a clear method for notification when new packages
are published in CentOS Stream, is this something you’d be willing to
explore - packaging for CentOS Stream in addition to RHEL?
I don't love the idea of regular polling, but today anyone can pull
down
https://git.centos.org/api/0/rpms/kernel/git/tags?with_commits=True
and filter against 'imports/c8s/' or 'imports/c9s/' to get the list of
current tagged kernels. The commit hashes listed there can be
transformed into a kernel source via the scripts in centos-git-common
or centpkg. In theory a repoquery of BaseOS should provide a clear
list of kernels in the release to be compared against the listed tags.
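The filtering step described above can be sketched as a simple prefix match. This assumes the tag names have already been extracted from the API response, one per line (the JSON handling is not shown), and `filter_stream_tags` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: keep only the tags of one CentOS Stream release.
# $1    = stream id, c8s or c9s
# stdin = tag names, one per line (assumed already extracted from the JSON)
filter_stream_tags() {
    grep -- "^imports/$1/"
}
```

Running the extracted tag list through `filter_stream_tags c9s` leaves only the tags under imports/c9s/, which could then be compared with the kernels returned by a repoquery of BaseOS.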
With the existing published content for CentOS Stream 9, you could
begin prep work for RHEL 9 and reduce any delay in support for that
platform after launch.
We have another repository, https://github.com/NVIDIA/cuda-repo-management, that contains the scripts for managing updates to RPM and Debian repository metadata. That is the better fit, so genmodules.py is going to be moved to that repo.
From repo :
http://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/
Latest kmod available is
kmod-nvidia-535.54.03-5.14.0-284.18.1-535.54.03-3.el9_2.x86_64.rpm
It seems we are missing :
kmod-nvidia-535.54.03-5.14.0-284.18.1-535.86.10-1.el9_2.x86_64.rpm
yum update cannot upgrade to 535
Hi folks,
Just wanted to quickly jump in on this issue. The current Rocky Linux kernel is not ahead of RHEL, but the Release tags can differ for a few reasons. Mostly, and in this case, it's because we've had to republish the same version because of a non-technical change. Because we never publish the same NVR twice, we can add .rocky or .0.1 in this case to the Release field.
kernel-4.18.0-372.16.1.el8_6.0.1 and kernel-4.18.0-372.16.1.el8_6 are in fact equal.
I'm not sure if there is a good way to make this work with kmods. Thankfully though we do keep all published artifacts so users can generally use the first released version using the dnf install method you provided above @kmittman.
Thanks,
Mustafa Gezen
Release Engineering lead @ Rocky Linux
Originally posted by @mstg in #35 (comment)