Comments (7)

kmittman avatar kmittman commented on July 28, 2024

Hi @namupatel
Can you please indicate which NVIDIA GPU model/SKU you are using?

This week, we released CUDA 11.5.0 with NVIDIA driver 495.29.05
https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/precompiled/

If you were on the latest stream, you would have been upgraded to 495 by now.

However, the last driver branch that supports [many] Kepler GPUs is 470. If your GPU is a Kepler model, then I suggest removing the current driver and switching to the 470 stream, see: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/#removing-cuda-tk-and-driver

sudo dnf remove nvidia-driver
sudo dnf module reset nvidia-driver
sudo dnf module install nvidia-driver:470

That will keep you on the precompiled 470 driver branch, which should be supported with updates for a very long time.
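
To confirm which stream ended up enabled after the commands above, the `dnf module list nvidia-driver` output can be filtered for the `[e]` marker. This is a minimal sketch, assuming the output has the same column layout as the listings later in this thread; the function name `enabled_nvidia_stream` is made up for illustration.

```shell
# Hypothetical helper: read `dnf module list nvidia-driver` output on stdin
# and print the stream whose marker column contains [e] (enabled).
# Assumes the Name / Stream / marker column layout shown in this thread.
enabled_nvidia_stream() {
    awk '$1 == "nvidia-driver" && $3 ~ /\[e\]/ { print $2 }'
}

# Usage (left commented out so the snippet has no side effects):
# sudo dnf module list nvidia-driver | enabled_nvidia_stream
```

If this prints 470, the system should stay on the 470 branch; if it prints nothing, no stream is marked as enabled in the listing.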

from yum-packaging-precompiled-kmod.

kmittman avatar kmittman commented on July 28, 2024

More information about this:

NVIDIA Driver support for Kepler is removed beginning with R495. CUDA Toolkit development support for Kepler continues through CUDA 11.x.

R470
Long Term Support Branch
EOL: July 2024

Anyway, if that's not the case and you are still on 470.57.02 and RHEL 8.4 kernel 4.18.0-305.19.1, then I will need to take another look.

kmittman avatar kmittman commented on July 28, 2024

I tested the installation on my GTX 650 (Kepler GPU) using dnf module install nvidia-driver:470 with kernel 4.18.0-305.19.1. After rebooting, the GNOME desktop works just fine on my system.

$ lspci | grep VGA
01:00.0 VGA compatible controller: NVIDIA Corporation GK107 [GeForce GTX 650] (rev a1)
$ lsmod | grep -e nouveau -e nvidia
nvidia_drm             57344  6
nvidia_modeset       1155072  13 nvidia_drm
nvidia_uvm           1069056  0
nvidia              34709504  682 nvidia_uvm,nvidia_modeset
drm_kms_helper        233472  2 nvidia_drm,i915
drm                   569344  12 drm_kms_helper,nvidia,nvidia_drm,i915
$ rpm -qa | grep nvidia | sort
dnf-plugin-nvidia-2.0-1.el8.noarch
kmod-nvidia-470.57.02-4.18.0-305.19.1-470.57.02-3.el8_4.x86_64
nvidia-driver-470.57.02-1.el8.x86_64
nvidia-driver-cuda-470.57.02-1.el8.x86_64
nvidia-driver-cuda-libs-470.57.02-1.el8.x86_64
nvidia-driver-devel-470.57.02-1.el8.x86_64
nvidia-driver-libs-470.57.02-1.el8.x86_64
nvidia-driver-NvFBCOpenGL-470.57.02-1.el8.x86_64
nvidia-driver-NVML-470.57.02-1.el8.x86_64
nvidia-kmod-common-470.57.02-1.el8.noarch
nvidia-libXNVCtrl-470.57.02-1.el8.x86_64
nvidia-libXNVCtrl-devel-470.57.02-1.el8.x86_64
nvidia-modprobe-470.57.02-1.el8.x86_64
nvidia-persistenced-470.57.02-1.el8.x86_64
nvidia-settings-470.57.02-1.el8.x86_64
nvidia-xconfig-470.57.02-1.el8.x86_64
$ rpm -qa | grep kernel | grep $(uname -r) | sort
kernel-4.18.0-305.19.1.el8_4.x86_64
kernel-core-4.18.0-305.19.1.el8_4.x86_64
kernel-modules-4.18.0-305.19.1.el8_4.x86_64
kernel-tools-4.18.0-305.19.1.el8_4.x86_64
kernel-tools-libs-4.18.0-305.19.1.el8_4.x86_64
$ sudo dnf nvidia-plugin
installed kernel: kernel-4.18.0-305.19.1.el8_4.x86_64
installed kmod(s): kmod-nvidia-470.57.02-4.18.0-305.19.1-3:470.57.02-3.el8_4.x86_64
$ sudo dnf module list nvidia-driver
Last metadata expiration check: 0:10:02 ago on Fri 22 Oct 2021 01:51:53 PM PDT.
cuda-rhel8-x86_64
Name           Stream           Profiles                      Summary                              
nvidia-driver  latest           default [d], fm, ks, src      Nvidia driver for latest branch      
nvidia-driver  latest-dkms [d]  default [d], fm, ks           Nvidia driver for latest-dkms branch 
nvidia-driver  418              default [d], fm, ks, src      Nvidia driver for 418 branch         
nvidia-driver  418-dkms         default [d], fm, ks           Nvidia driver for 418-dkms branch    
nvidia-driver  440              default [d], fm, ks, src      Nvidia driver for 440 branch         
nvidia-driver  440-dkms         default [d], fm, ks           Nvidia driver for 440-dkms branch    
nvidia-driver  450              default [d], fm, ks, src      Nvidia driver for 450 branch         
nvidia-driver  450-dkms         default [d], fm, ks           Nvidia driver for 450-dkms branch    
nvidia-driver  455              default [d], fm, ks, src      Nvidia driver for 455 branch         
nvidia-driver  455-dkms         default [d], fm, ks           Nvidia driver for 455-dkms branch    
nvidia-driver  460              default [d], fm, ks, src      Nvidia driver for 460 branch         
nvidia-driver  460-dkms         default [d], fm, ks           Nvidia driver for 460-dkms branch    
nvidia-driver  465              default [d], fm, ks, src      Nvidia driver for 465 branch         
nvidia-driver  465-dkms         default [d], fm, ks           Nvidia driver for 465-dkms branch    
nvidia-driver  470 [e]          default [d] [i], fm, ks, src  Nvidia driver for 470 branch         
nvidia-driver  470-dkms         default [d], fm, ks           Nvidia driver for 470-dkms branch    
nvidia-driver  495              default [d], fm, ks, src      Nvidia driver for 495 branch         
nvidia-driver  495-dkms         default [d], fm, ks           Nvidia driver for 495-dkms branch    

Hint: [d]efault, [e]nabled, [x]disabled, [i]nstalled
$ glxgears
300 frames in 5.0 seconds = 59.851 FPS
301 frames in 5.0 seconds = 60.002 FPS

namupatel avatar namupatel commented on July 28, 2024

Hi @kmittman,

Thanks for following up. I'm using a Tesla K40c (NVIDIA Corporation GK180GL). I've switched over to driver version 470 on Red Hat 8.4 with kernel 4.18.0-305.19.1.el8_4 and reinstalled CUDA 11.1. CUDA code is running successfully. I'm not in front of the machine today, but will check tomorrow whether graphics are up now.

Looking at the checks you ran, I see that the VGA GPU listed in your case is the GTX 650. My machine has 2 GPUs which might be causing the problem (will verify if graphics are up tomorrow and run the glxgears test):

$ lspci | grep NVIDIA
83:00.0 3D controller: NVIDIA Corporation GK180GL [Tesla K40c] (rev a1)
84:00.0 VGA compatible controller: NVIDIA Corporation GK106GL [Quadro K4000] (rev a1)
84:00.1 Audio device: NVIDIA Corporation GK106 HDMI Audio Controller (rev a1)
$ lsmod | grep -e nouveau -e nvidia
nvidia_drm             57344  6
nvidia_modeset       1155072  4 nvidia_drm
nvidia_uvm           1069056  0
nvidia              34709504  197 nvidia_uvm,nvidia_modeset
drm_kms_helper        233472  1 nvidia_drm
drm                   569344  10 drm_kms_helper,nvidia,nvidia_drm

$ rpm -qa | grep nvidia | sort
dnf-plugin-nvidia-2.0-1.el8.noarch
kmod-nvidia-470.57.02-4.18.0-305.19.1-470.57.02-3.el8_4.x86_64
nvidia-driver-470.57.02-1.el8.x86_64
nvidia-driver-cuda-470.57.02-1.el8.x86_64
nvidia-driver-cuda-libs-470.57.02-1.el8.x86_64
nvidia-driver-devel-470.57.02-1.el8.x86_64
nvidia-driver-libs-470.57.02-1.el8.x86_64
nvidia-driver-NvFBCOpenGL-470.57.02-1.el8.x86_64
nvidia-driver-NVML-470.57.02-1.el8.x86_64
nvidia-kmod-common-470.57.02-1.el8.noarch
nvidia-libXNVCtrl-470.57.02-1.el8.x86_64
nvidia-libXNVCtrl-devel-470.57.02-1.el8.x86_64
nvidia-modprobe-470.57.02-1.el8.x86_64
nvidia-persistenced-470.57.02-1.el8.x86_64
nvidia-settings-470.57.02-1.el8.x86_64
nvidia-xconfig-470.57.02-1.el8.x86_64
$ rpm -qa | grep kernel | grep $(uname -r) | sort
kernel-4.18.0-305.19.1.el8_4.x86_64
kernel-core-4.18.0-305.19.1.el8_4.x86_64
kernel-devel-4.18.0-305.19.1.el8_4.x86_64
kernel-headers-4.18.0-305.19.1.el8_4.x86_64
kernel-modules-4.18.0-305.19.1.el8_4.x86_64
kernel-tools-4.18.0-305.19.1.el8_4.x86_64
kernel-tools-libs-4.18.0-305.19.1.el8_4.x86_64
$ sudo dnf nvidia-plugin
installed kernel: kernel-4.18.0-305.19.1.el8_4.x86_64
installed kmod(s): kmod-nvidia-470.57.02-4.18.0-305.19.1-3:470.57.02-3.el8_4.x86_64
$  sudo dnf module list nvidia-driver
Updating Subscription Management repositories.
Last metadata expiration check: 3:20:53 ago on Mon 25 Oct 2021 06:24:22 AM EDT.
cuda-rhel8-x86_64
Name          Stream          Profiles                    Summary
nvidia-driver latest          default [d], fm, ks, src    Nvidia driver for latest branch
nvidia-driver latest-dkms [d] default [d], fm, ks         Nvidia driver for latest-dkms branch
nvidia-driver 418             default [d], fm, ks, src    Nvidia driver for 418 branch
nvidia-driver 418-dkms        default [d], fm, ks         Nvidia driver for 418-dkms branch
nvidia-driver 440             default [d], fm, ks, src    Nvidia driver for 440 branch
nvidia-driver 440-dkms        default [d], fm, ks         Nvidia driver for 440-dkms branch
nvidia-driver 450             default [d], fm, ks, src    Nvidia driver for 450 branch
nvidia-driver 450-dkms        default [d], fm, ks         Nvidia driver for 450-dkms branch
nvidia-driver 455             default [d], fm, ks, src    Nvidia driver for 455 branch
nvidia-driver 455-dkms        default [d], fm, ks         Nvidia driver for 455-dkms branch
nvidia-driver 460             default [d], fm, ks, src    Nvidia driver for 460 branch
nvidia-driver 460-dkms        default [d], fm, ks         Nvidia driver for 460-dkms branch
nvidia-driver 465             default [d], fm, ks, src    Nvidia driver for 465 branch
nvidia-driver 465-dkms        default [d], fm, ks         Nvidia driver for 465-dkms branch
nvidia-driver 470 [e]         default [d] [i], fm, ks, src Nvidia driver for 470 branch
nvidia-driver 470-dkms        default [d], fm, ks         Nvidia driver for 470-dkms branch
nvidia-driver 495             default [d], fm, ks, src    Nvidia driver for 495 branch
nvidia-driver 495-dkms        default [d], fm, ks         Nvidia driver for 495-dkms branch

Hint: [d]efault, [e]nabled, [x]disabled, [i]nstalled
$ glxgears
57 frames in 5.1 seconds = 11.284 FPS

namupatel avatar namupatel commented on July 28, 2024

The terminal was blank, so I rebooted. After selecting the OS version to boot, I briefly saw a gray screen with three dots before the display went blank. Any more suggestions as to what might help? Thanks.

kmittman avatar kmittman commented on July 28, 2024

Hi @namupatel
The three dots are the Plymouth boot splash in fallback mode; normally it would display the distro's logo.

I consulted with our driver team and the hypothesis is that the X display server is starting on the headless GPU (the Tesla SKU has no VGA/DVI/HDMI/DP output). The Plymouth splash appearing momentarily on the K4000 seems to support this.
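
One way to see which NVIDIA GPU has display outputs is to look at the PCI class in the lspci output: in the listing earlier in this thread, the Tesla shows up as "3D controller" while the Quadro is a "VGA compatible controller". A rough sketch of that filter follows, assuming lspci prints lines in that same format; the function name `nvidia_display_gpus` is made up for illustration.

```shell
# Hypothetical helper: read `lspci` output on stdin and print the PCI
# address (bus:device.function, hexadecimal) of every NVIDIA device whose
# class is "VGA compatible controller". Devices listed as "3D controller"
# (such as the Tesla K40c in this thread) are skipped.
nvidia_display_gpus() {
    grep 'NVIDIA' | grep 'VGA compatible controller' | cut -d' ' -f1
}

# Usage (left commented out so the snippet has no side effects):
# lspci | nvidia_display_gpus
```

On the machine described above this should print only the Quadro K4000's address, which is the one the X server needs to use.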

One way to solve this is by explicitly adding the BusID of the Quadro GPU to the /etc/X11/xorg.conf file, see: https://stackoverflow.com/a/18382758

In your case, lspci shows the Quadro on bus 84 in hexadecimal, while the BusID in xorg.conf is written in decimal, so 0x84 -> 132:

Section "Device"
    Identifier     "Device1"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:132:0:0"
EndSection
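
The hexadecimal-to-decimal conversion can also be scripted. This is a small sketch, assuming the address is given in the short bus:device.function form that lspci prints by default (no PCI domain prefix); the function name `pci_to_busid` is made up for illustration.

```shell
# Hypothetical helper: convert an lspci-style address such as 84:00.0
# (hexadecimal fields) into the decimal "PCI:bus:device:function" form
# used for BusID in xorg.conf.
pci_to_busid() {
    local addr="$1"            # e.g. 84:00.0
    local bus="${addr%%:*}"    # text before the first ':'  -> 84
    local rest="${addr#*:}"    # text after the first ':'   -> 00.0
    local dev="${rest%%.*}"    # text before the '.'        -> 00
    local fn="${rest#*.}"      # text after the '.'         -> 0
    # printf interprets the 0x-prefixed arguments as hexadecimal numbers.
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}

pci_to_busid 84:00.0   # prints PCI:132:0:0
```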

Alternatively, you can use nvidia-xconfig with the --busid= and --device= parameters to generate the configuration.

If that does not work, then please attach a nvidia-bug-report.log file, generated using nvidia-bug-report.sh

kmittman avatar kmittman commented on July 28, 2024

@namupatel it's been a while, so I'm closing this. Feel free to re-open if you are still experiencing this issue.
