vgpu_unlock

Unlock vGPU functionality for consumer-grade Nvidia GPUs.

Important!

This tool is not guaranteed to work out of the box, so use it at your own risk.

Description

This tool enables the use of GeForce and Quadro GPUs with the NVIDIA vGPU graphics virtualization technology. By design, NVIDIA vGPU supports only a handful of datacenter Tesla and professional Quadro GPUs; consumer graphics cards are excluded through a software limitation. The vgpu_unlock tool aims to remove this limitation on Linux-based systems, thus enabling most Maxwell, Pascal, Volta (untested), and Turing based GPUs to use the vGPU technology. Ampere support is currently a work in progress.

A community-maintained Wiki written by Krutav Shah, with a lot more information, is available here.

Dependencies:

  • This tool requires Python 3 and python3-pip; the latest version is recommended.
  • The Python package "frida" is required: pip3 install frida.
  • The NVIDIA GRID vGPU driver is required.
  • "dkms" is required, as it greatly simplifies the process of rebuilding the driver. Install DKMS with the package manager of your OS.

Installation:

In the following instructions, <path_to_vgpu_unlock> needs to be replaced with the path to this repository on the target system, and <version> with the version of the NVIDIA GRID vGPU driver.

Install the NVIDIA GRID vGPU driver; make sure to install it as a dkms module.

./nvidia-installer --dkms

Modify the line beginning with ExecStart= in /lib/systemd/system/nvidia-vgpud.service and /lib/systemd/system/nvidia-vgpu-mgr.service to use vgpu_unlock as the executable and to pass the original executable as the first argument. Example:

ExecStart=<path_to_vgpu_unlock>/vgpu_unlock /usr/bin/nvidia-vgpud

Reload the systemd daemons:

systemctl daemon-reload

Modify the file /usr/src/nvidia-<version>/nvidia/os-interface.c and add the following line after the lines beginning with #include at the beginning of the file.

#include "<path_to_vgpu_unlock>/vgpu_unlock_hooks.c"

Modify the file /usr/src/nvidia-<version>/nvidia/nvidia.Kbuild and add the following line at the bottom of the file.

ldflags-y += -T <path_to_vgpu_unlock>/kern.ld

Remove the nvidia kernel module using dkms:

dkms remove -m nvidia -v <version> --all

Rebuild and reinstall the nvidia kernel module using dkms:

dkms install -m nvidia -v <version>

Reboot.


NOTE

This script only works with graphics cards from the same generation as their professional Tesla counterparts. As a result, only Maxwell and newer generation Nvidia GPUs are supported. It is not designed to be used with low-end graphics card models, so not all cards are guaranteed to work smoothly with vGPU. For the best experience, it is recommended to use graphics cards with the same chip model as the Tesla cards. The same applies to the operating system: certain bleeding-edge Linux distributions may not work well with vGPU software.


How it works

vGPU supported?

In order to determine whether a certain GPU supports the vGPU functionality, the driver looks at the PCI device ID. This identifier, together with the PCI vendor ID, is unique for each type of PCI device. In order to enable vGPU support, we need to tell the driver that the PCI device ID of the installed GPU is one of the device IDs used by a vGPU-capable GPU.
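
As a hedged illustration of that idea (this is not the script's actual code), the remapping amounts to a lookup from consumer device IDs to a vGPU-capable one. The IDs below are the GP104 examples quoted in the issues further down this page:

#include <stdint.h>

/* Illustrative only: map a consumer PCI device ID to a vGPU-capable one.
 * The real script covers many more GPUs and generations. */
static uint16_t spoof_devid(uint16_t actual_devid)
{
    switch (actual_devid) {
    case 0x1b80: /* GTX 1080 */
    case 0x1b81: /* GTX 1070 */
    case 0x1b82: /* GTX 1070 Ti */
    case 0x1b83: /* GTX 1060 6GB */
    case 0x1b84: /* GTX 1060 3GB */
    case 0x1bb0: /* Quadro P5000 */
        return 0x1bb3; /* Tesla P4, a vGPU-capable GP104 board */
    default:
        return actual_devid; /* leave unknown IDs untouched */
    }
}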

Userspace script: vgpu_unlock

The userspace services nvidia-vgpud and nvidia-vgpu-mgr use the ioctl syscall to communicate with the kernel module. Specifically, they read the PCI device ID and determine whether the installed GPU is vGPU capable.

The python script vgpu_unlock intercepts all ioctl syscalls between the executable specified as the first argument and the kernel. The script then modifies the kernel's responses to indicate a PCI device ID with vGPU support, making the installed GPU appear vGPU capable.
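
The actual script attaches to the target process with frida. Purely to illustrate the interception point, here is a minimal LD_PRELOAD-style sketch in C (a different mechanism than the one the script uses) of wrapping ioctl and patching the response after the kernel has filled it in; the patch location is hypothetical:

/* build: cc -shared -fPIC hook.c -o hook.so -ldl */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdarg.h>

int ioctl(int fd, unsigned long request, ...)
{
    static int (*real_ioctl)(int, unsigned long, ...);
    va_list ap;
    void *argp;

    va_start(ap, request);
    argp = va_arg(ap, void *);          /* the request's argument buffer */
    va_end(ap);

    if (!real_ioctl)
        real_ioctl = (int (*)(int, unsigned long, ...))
                     dlsym(RTLD_NEXT, "ioctl");

    int ret = real_ioctl(fd, request, argp);  /* let the kernel answer first */

    /* A response that carries the PCI device ID could now be rewritten in
     * argp to report a vGPU-capable ID (offsets omitted here). */
    return ret;
}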

Kernel module hooks: vgpu_unlock_hooks.c

In order to exchange data with the GPU the kernel module maps the physical address space of the PCI bus into its own virtual address space. This is done using the ioremap* kernel functions. The kernel module then reads and writes data into that mapped address space. This is done using the memcpy kernel function.

By including the vgpu_unlock_hooks.c file in the os-interface.c file, we can use C preprocessor macros to replace and intercept calls to the ioremap and memcpy functions. Doing this allows us to maintain a view of what is mapped where and what data is being accessed.
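
A minimal sketch of that trick, assuming hook implementations roughly along these lines (this is not the actual contents of vgpu_unlock_hooks.c):

/* Kernel context: record each mapping, then divert all later calls in
 * os-interface.c to the hooks via the preprocessor. */
static void *vgpu_unlock_ioremap(unsigned long long phys, unsigned long size)
{
    void *virt = ioremap(phys, size);   /* still calls the real function */
    /* ...remember (phys, virt, size) for later address lookups... */
    return virt;
}

static void *vgpu_unlock_memcpy(void *dst, const void *src, unsigned long n)
{
    /* ...check whether src falls inside an interesting mapping... */
    return memcpy(dst, src, n);         /* still calls the real function */
}

/* Because this file is included before the rest of os-interface.c, every
 * subsequent ioremap/memcpy call there now resolves to the hooks. */
#define ioremap(phys, size)  vgpu_unlock_ioremap(phys, size)
#define memcpy(dst, src, n)  vgpu_unlock_memcpy(dst, src, n)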

Kernel module linker script: kern.ld

This is a modified version of the default linker script provided by gcc. The script is modified to place the .rodata section of nv-kernel.o into the .data section instead of .rodata, making it writable. The script also provides the symbols vgpu_unlock_nv_kern_rodata_beg and vgpu_unlock_nv_kern_rodata_end to let us know where that section begins and ends.
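
From C, those two boundary symbols can be declared and used to bounds-check pointers; a minimal sketch:

/* Exported by kern.ld; they mark addresses and hold no data of their own. */
extern char vgpu_unlock_nv_kern_rodata_beg[];
extern char vgpu_unlock_nv_kern_rodata_end[];

/* True if p points into nv-kernel.o's (now writable) .rodata copy. */
static int in_nv_kern_rodata(const void *p)
{
    return (const char *)p >= vgpu_unlock_nv_kern_rodata_beg &&
           (const char *)p <  vgpu_unlock_nv_kern_rodata_end;
}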

How it all comes together

After boot, the nvidia-vgpud service queries the kernel for all installed GPUs and checks for vGPU capability. This call is intercepted by the vgpu_unlock python script and the GPU is made vGPU capable. If a vGPU-capable GPU is found, then nvidia-vgpud creates an MDEV device and the /sys/class/mdev_bus directory is created by the system.

vGPU devices can now be created by echoing UUIDs into the create files in the mdev bus representation. This will create additional structures representing the new vGPU device on the MDEV bus. These devices can then be assigned to VMs, and when a VM starts it will open the MDEV device. This causes nvidia-vgpu-mgr to start communicating with the kernel using ioctl. Again, these calls are intercepted by the vgpu_unlock python script, and when nvidia-vgpu-mgr asks whether the GPU is vGPU capable, the answer is changed to yes. After that check it attempts to initialize the vGPU device instance.
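
As a sketch, creating such a device boils down to a single write; the PCI address, type ID, and UUID below are taken from logs elsewhere on this page and will differ per system:

#include <stdio.h>

int main(void)
{
    /* Equivalent of:
     *   echo 00000000-0000-0000-0000-000000000100 > \
     *     /sys/class/mdev_bus/0000:01:00.0/mdev_supported_types/nvidia-260/create */
    const char *create = "/sys/class/mdev_bus/0000:01:00.0/"
                         "mdev_supported_types/nvidia-260/create";
    FILE *f = fopen(create, "w");
    if (!f) {
        perror(create);
        return 1;
    }
    fputs("00000000-0000-0000-0000-000000000100\n", f);
    return fclose(f) ? 1 : 0;
}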

Initialization of the vGPU device is handled by the kernel module, which performs its own check for vGPU capability; this check is a bit more complicated.

The kernel module maps the physical PCI address range 0xf0000000-0xf1000000 into its virtual address space and then performs some magical operations whose purpose we don't really know. What we do know is that after these operations it accesses a 128-bit value at physical address 0xf0029624, which we call the magic value. The kernel module also accesses a 128-bit value at physical address 0xf0029634, which we call the key value.

The kernel module then has two lookup tables for the magic value: one for vGPU-capable GPUs and one for the others. The kernel module looks for the magic value in both of these lookup tables, and if it is found, that table entry also contains a set of AES-128-encrypted data blocks and an HMAC-SHA256 signature.

The signature is then validated by using the key value mentioned earlier to calculate the HMAC-SHA256 signature over the encrypted data blocks. If the signature is correct, then the blocks are decrypted using AES-128 and the same key.
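
A userspace sketch of that verify-then-decrypt step using OpenSSL (the kernel module does this internally; the AES mode and the no-padding choice are assumptions made purely for illustration):

#include <string.h>
#include <openssl/evp.h>
#include <openssl/hmac.h>

int verify_and_decrypt(const unsigned char key[16],
                       const unsigned char *blocks, size_t blocks_len,
                       const unsigned char sig[32],
                       unsigned char *plain)
{
    unsigned char mac[32];
    unsigned int mac_len = sizeof mac;

    /* 1. Recompute the HMAC-SHA256 of the encrypted blocks with the key
     *    value and compare it against the stored signature. */
    if (!HMAC(EVP_sha256(), key, 16, blocks, blocks_len, mac, &mac_len) ||
        mac_len != sizeof mac || memcmp(mac, sig, sizeof mac) != 0)
        return -1;                        /* signature mismatch */

    /* 2. Decrypt the blocks with AES-128 using the same key
     *    (ECB assumed here purely for the sketch). */
    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
    int out_len = 0, fin_len = 0, ok = 0;
    if (ctx &&
        EVP_DecryptInit_ex(ctx, EVP_aes_128_ecb(), NULL, key, NULL) &&
        EVP_CIPHER_CTX_set_padding(ctx, 0) &&
        EVP_DecryptUpdate(ctx, plain, &out_len, blocks, (int)blocks_len) &&
        EVP_DecryptFinal_ex(ctx, plain + out_len, &fin_len))
        ok = 1;
    EVP_CIPHER_CTX_free(ctx);
    return ok ? 0 : -1;
}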

Inside the decrypted data is, once again, the PCI device ID.

So, in order for the kernel module to accept the GPU as vGPU capable, the magic value has to be present in the table of vGPU-capable magic values, the key has to generate a valid HMAC-SHA256 signature, and the AES-128-decrypted data blocks have to contain a vGPU-capable PCI device ID. If any of these checks fail, the error code 0x56 "Call not supported" is returned.

In order to make these checks pass, the hooks in vgpu_unlock_hooks.c do the following:

  • Look for an ioremap call that maps the physical address range containing the magic and key values, and recalculate the addresses of those values in the kernel module's virtual address space.
  • Monitor memcpy operations reading at those addresses and, when such an operation occurs, keep a copy of the value until both are known.
  • Locate the lookup tables in the .rodata section of nv-kernel.o, find the signature and data blocks, validate the signature, and decrypt the blocks.
  • Edit the PCI device ID in the decrypted data, re-encrypt the blocks, regenerate the signature, and insert the magic value, blocks, and signature into the table of vGPU-capable magic values.
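
A sketch of the first two steps, using the constants from the text (the bookkeeping is illustrative, not the actual hook code):

#include <stdint.h>
#include <string.h>

#define MAGIC_PHYS 0xf0029624ULL   /* physical address of the magic value */
#define KEY_PHYS   0xf0029634ULL   /* physical address of the key value   */

static uint8_t *magic_virt, *key_virt; /* set once the range is mapped */
static uint8_t  magic[16], key[16];
static int      have_magic, have_key;

/* Called from the ioremap hook for every new mapping. */
static void on_ioremap(uint64_t phys, void *virt, uint64_t size)
{
    if (phys <= MAGIC_PHYS && KEY_PHYS + 16 <= phys + size) {
        magic_virt = (uint8_t *)virt + (MAGIC_PHYS - phys);
        key_virt   = (uint8_t *)virt + (KEY_PHYS - phys);
    }
}

/* Called from the memcpy hook for every copy the module performs. */
static void on_memcpy(void *dst, const void *src, uint64_t n)
{
    (void)dst;
    if (magic_virt && src == (const void *)magic_virt && n >= 16) {
        memcpy(magic, src, 16);        /* keep a copy of the magic value */
        have_magic = 1;
    }
    if (key_virt && src == (const void *)key_virt && n >= 16) {
        memcpy(key, src, 16);          /* ...and of the key value */
        have_key = 1;
    }
    /* Once have_magic && have_key: locate the tables between
     * vgpu_unlock_nv_kern_rodata_beg/end, verify, decrypt, patch the
     * device ID, re-encrypt, re-sign, and insert the new entry. */
}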


vgpu_unlock's Issues

Titan V support?

Hi,
it seems there is a Tesla V100, so Titan V support should be possible, right?
Thanks.

Does cuda support vgpu driver ?

Hello, I encountered a problem. The GPU I used was a 2070 and the driver version was Linux KVM 7.9. When I install the vgpu driver and CUDA in the virtual machine, there is a problem: the vgpu driver and CUDA do not match. Does CUDA support vGPU? If it does, what version of CUDA is compatible with the vGPU driver? Thank you very much!

Hardcoded PCI address range

In the readme you wrote
"Physical PCI address range 0xf0000000-0xf1000000"
in my case the range turned out to be 0x4810000000-0x4811ffffff

The address range can be found in dmesg, in lines like this:
pci 0000:0a:00.0: reg 0x1c: [mem 0x4810000000-0x4811ffffff 64bit pref]

This is probably due to Above 4G Decoding (not 100% sure, but it's my best guess).

NVIDIA GRID vGPU driver

Thanks for the work you did on this! I'm personally a bit confused as to which driver I need to be using here. Is this publicly available for download?

GTX970 on Proxmox

Hi! First of all, great project, big thanks for pulling this off.

I've been trying to use the tool on a Proxmox instance: I followed the instructions, installed the driver with dkms enabled, edited the 4 files, uninstalled/reinstalled with dkms, and rebooted, and mdevctl recognises the card. I've also added a uuid to the VM config.

However, when I try to start a VM, the web gui throws this error at me:

kvm: -device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:01:00.0/00000000-0000-0000-0000-000000000100,id=hostpci0,bus=pci.0,addr=0x10: vfio 00000000-0000-0000-0000-000000000100: error getting device from group 13: Input/output error
Verify all devices in group 13 are bound to vfio-<bus> or pci-stub and not already in use
TASK ERROR: start failed: QEMU exited with code 1

The vgpu is created, and it also shows up in the mdevctl list output; in iommu group 13 it is the only device:

ll /sys/kernel/iommu_groups/13/devices/
total 0
lrwxrwxrwx 1 root root 0 Jul  4 12:31 00000000-0000-0000-0000-000000000100 -> ../../../../devices/pci0000:00/0000:00:01.1/0000:01:00.0/00000000-0000-0000-0000-000000000100

vgpu-mgr logs:

journalctl -u nvidia-vgpu-mgr.service 
-- Logs begin at Sun 2021-07-04 12:11:18 CEST, end at Sun 2021-07-04 12:14:06 CEST. --
Jul 04 12:11:23 pve1 systemd[1]: Starting NVIDIA vGPU Manager Daemon...
Jul 04 12:11:29 pve1 systemd[1]: Started NVIDIA vGPU Manager Daemon.
Jul 04 12:11:30 pve1 bash[1055]: vgpu_unlock loaded.
Jul 04 12:11:31 pve1 nvidia-vgpu-mgr[1055]: vgpu_unlock loaded.
Jul 04 12:11:31 pve1 nvidia-vgpu-mgr[1097]: vgpu_unlock loaded.
Jul 04 12:11:31 pve1 nvidia-vgpu-mgr[1097]: notice: vmiop_env_log: nvidia-vgpu-mgr daemon started
Jul 04 12:14:05 pve1 nvidia-vgpu-mgr[1796]: vgpu_unlock loaded.
Jul 04 12:14:05 pve1 nvidia-vgpu-mgr[1808]: vgpu_unlock loaded.
Jul 04 12:14:05 pve1 nvidia-vgpu-mgr[1808]: notice: vmiop_env_log: vmiop-env: guest_max_gpfn:0x0
Jul 04 12:14:05 pve1 nvidia-vgpu-mgr[1808]: notice: vmiop_env_log: (0x0): Received start call from nvidia-vgpu-vfio module: mdev uuid 00000000-0000-0000-0000-000000000100 GPU PCI id 00:01
Jul 04 12:14:05 pve1 nvidia-vgpu-mgr[1808]: notice: vmiop_env_log: (0x0): pluginconfig: vgpu_type_id=11
Jul 04 12:14:05 pve1 nvidia-vgpu-mgr[1808]: notice: vmiop_env_log: Successfully updated env symbols!
Jul 04 12:14:05 pve1 nvidia-vgpu-mgr[1808]: op_type: 0x20801322 failed.
Jul 04 12:14:05 pve1 nvidia-vgpu-mgr[1808]: op_type: 0x2080014b failed.
Jul 04 12:14:05 pve1 nvidia-vgpu-mgr[1808]: error: vmiop_log: (0x0): Mixed density FB regions not supported
Jul 04 12:14:05 pve1 nvidia-vgpu-mgr[1808]: error: vmiop_log: Assertion Failed at 0xb4133a03:10561
Jul 04 12:14:05 pve1 nvidia-vgpu-mgr[1808]: error: vmiop_log: 12 frames returned by backtrace
Jul 04 12:14:05 pve1 nvidia-vgpu-mgr[1808]: error: vmiop_log: /lib/x86_64-linux-gnu/libnvidia-vgpu.so(_nv005021vgpu+0x18) [0x7f0ab41793c8]
Jul 04 12:14:05 pve1 nvidia-vgpu-mgr[1808]: error: vmiop_log: /lib/x86_64-linux-gnu/libnvidia-vgpu.so(+0xa64ef) [0x7f0ab41324ef]
Jul 04 12:14:05 pve1 nvidia-vgpu-mgr[1808]: error: vmiop_log: /lib/x86_64-linux-gnu/libnvidia-vgpu.so(+0xa7a03) [0x7f0ab4133a03]
Jul 04 12:14:05 pve1 nvidia-vgpu-mgr[1808]: error: vmiop_log: /lib/x86_64-linux-gnu/libnvidia-vgpu.so(+0xa813b) [0x7f0ab413413b]
Jul 04 12:14:05 pve1 nvidia-vgpu-mgr[1808]: error: vmiop_log: /lib/x86_64-linux-gnu/libnvidia-vgpu.so(+0xa98c7) [0x7f0ab41358c7]
Jul 04 12:14:05 pve1 nvidia-vgpu-mgr[1808]: error: vmiop_log: vgpu() [0x413e72]
Jul 04 12:14:05 pve1 nvidia-vgpu-mgr[1808]: error: vmiop_log: vgpu() [0x4140e9]
Jul 04 12:14:05 pve1 nvidia-vgpu-mgr[1808]: error: vmiop_log: vgpu() [0x40e9d7]
Jul 04 12:14:05 pve1 nvidia-vgpu-mgr[1808]: error: vmiop_log: vgpu() [0x40c2c9]
Jul 04 12:14:05 pve1 nvidia-vgpu-mgr[1808]: error: vmiop_log: vgpu() [0x40bc7c]
Jul 04 12:14:05 pve1 nvidia-vgpu-mgr[1808]: error: vmiop_log: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb) [0x7f0ab462d09b]
Jul 04 12:14:05 pve1 nvidia-vgpu-mgr[1808]: error: vmiop_log: vgpu() [0x4033ba]
Jul 04 12:14:05 pve1 nvidia-vgpu-mgr[1808]: error: vmiop_log: (0x0): Failed to get FB region information
Jul 04 12:14:05 pve1 nvidia-vgpu-mgr[1808]: error: vmiop_log: (0x0): Initialization: Failed to get static information error 1
Jul 04 12:14:05 pve1 nvidia-vgpu-mgr[1808]: error: vmiop_log: (0x0): init_device_instance failed for inst 0 with error 1 (unable to setup host connection state)
Jul 04 12:14:05 pve1 nvidia-vgpu-mgr[1808]: error: vmiop_log: (0x0): Initialization: init_device_instance failed error 1
Jul 04 12:14:05 pve1 nvidia-vgpu-mgr[1808]: error: vmiop_log: display_init failed for inst: 0
Jul 04 12:14:05 pve1 nvidia-vgpu-mgr[1808]: error: vmiop_env_log: (0x0): vmiope_process_configuration: plugin registration error
Jul 04 12:14:05 pve1 nvidia-vgpu-mgr[1808]: error: vmiop_env_log: (0x0): vmiope_process_configuration failed with 0x1f

vgpu_unlock: latest
Nvidia drivers I tried (the error message is the same with all versions):

  • NVIDIA-Linux-x86_64-450.102-vgpu-kvm.run
  • NVIDIA-Linux-x86_64-450.124-vgpu-kvm.run
  • NVIDIA-Linux-x86_64-460.73.02-vgpu-kvm.run

Kernel: Linux 5.4.73-1-pve #1 SMP PVE 5.4.73-1

If possible, could you help me resolve this issue? If more details are needed, I'm more than happy to provide them.

README.md needs fix

README.md says to open nvidia-vgpu.service, which is a file that doesn't exist. Instead, the correct file would be nvidia-vgpud.service.

Edit: Where in the file the line ldflags-y += -T <path_to_vgpu_unlock>/kern.ld needs to be placed is not specified. Is it the beginning or the bottom of the file? Please specify this in the README as well. Thanks.

I hooked an RTX 2070 Super to vGPU; it looks like everything is OK, but once the VM starts it fails with an error

Hello, I have hooked an RTX 2070 Super to vGPU and it looks like everything is OK, but once the VM starts, it fails with an error.
My environment:
Linux localhost.localdomain 3.10.0-957.el7.x86_64
NVIDIA-Linux-x86_64-460.73.01.run
1) The qemu error is as follows:
2021-04-21T09:42:33.814146Z qemu-kvm: -device vfio-pci,id=hostdev0,sysfsdev=/sys/bus/mdev/devices/4bca2ed2-bf47-4a06-af38-103c5c22d1c6,display=off,bus=pci.6,addr=0x0: vfio error: 4bca2ed2-bf47-4a06-af38-103c5c22d1c6: error getting device from group 14: Input/output error
Verify all devices in group 14 are bound to vfio- or pci-stub and not already in use
2) The nvidia-vgpu-mgr error is as follows:
17:42:33 localhost.localdomain nvidia-vgpu-mgr[8626]: notice: vmiop_env_log: (0x0): pluginconfig: vgpu_type_id=259
7:42:33 localhost.localdomain nvidia-vgpu-mgr[8626]: notice: vmiop_env_log: Successfully updated env symbols!
17:42:33 localhost.localdomain nvidia-vgpu-mgr[8626]: error: vmiop_log: (0x0): vGPU is supported only on VGX capable boards
17:42:33 localhost.localdomain nvidia-vgpu-mgr[8626]: error: vmiop_log: (0x0): init_device_instance failed for inst 0 with error 1 (vGPU validation of the GPU failed)
17:42:33 localhost.localdomain nvidia-vgpu-mgr[8626]: error: vmiop_log: (0x0): Initialization: init_device_instance failed error 1
17:42:33 localhost.localdomain nvidia-vgpu-mgr[8632]: vgpu_unlock loaded.
17:42:33 localhost.localdomain nvidia-vgpu-mgr[8630]: vgpu_unlock loaded.
17:42:33 localhost.localdomain nvidia-vgpu-mgr[8654]: vgpu_unlock loaded.
17:42:33 localhost.localdomain nvidia-vgpu-mgr[8655]: vgpu_unlock loaded.
17:42:33 localhost.localdomain nvidia-vgpu-mgr[8654]: error: vmiop_env_log: Failed to get VM UUID from QEMU command-line 0x57
3) The vGPU looks like it is OK:
[root@localhost mdev_supported_types]# cat nvidia-*/name
GRID RTX6000-1Q
GRID RTX6000-2Q
GRID RTX6000-3Q
GRID RTX6000-4Q
GRID RTX6000-6Q
GRID RTX6000-8Q
GRID RTX6000-12Q
GRID RTX6000-24Q
GRID RTX6000-4C
GRID RTX6000-6C
GRID RTX6000-8C
GRID RTX6000-12C
GRID RTX6000-24C
GRID RTX6000-1B
GRID RTX6000-2B
GRID RTX6000-1A
GRID RTX6000-2A
GRID RTX6000-3A
GRID RTX6000-4A
GRID RTX6000-6A
GRID RTX6000-8A
GRID RTX6000-12A
GRID RTX6000-24A

What should I do? Please help me, thank you!

Does this still require an Nvidia license server?

I'm looking to set this up on my proxmox server but have a couple of questions...
Does this still require a license server?
Will this work on Proxmox 6.3-6?
I've got two cards in my test server, a Quadro P2000 and a GTX 1660 Super. Is there some way to use only one for this project, or will both be affected should I build this solution?
And I did sign up at nvidia for a 90-day trial license, but there are a lot of different drivers; which one should I use for my hardware?
Thanks for this work, this is very exciting stuff!!

error: unknown type name ‘uint64_t’

Building module:
cleaning build area...
'make' -j12 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=5.4.114-1-pve IGNORE_CC_MISMATCH='' modules.....(bad exit status: 2)
Error! Bad return status for module build on kernel: 5.4.114-1-pve (x86_64)
Consult /var/lib/dkms/nvidia/460.73.02/build/make.log for more information.

/opt/vgpu_unlock/vgpu_unlock_hooks.c:1173:35: error: unknown type name ‘uint64_t’
/opt/vgpu_unlock/vgpu_unlock_hooks.c:791:17: warning: ‘vgpu_unlock_bar3_end’ defined but not used [-Wunused-variable]
static uint64_t vgpu_unlock_bar3_end;

A few details please

First, great project!

Second - could someone specify the vGPU driver version (and the client versions) that works with this hack? I'm pretty sure that within days, Nvidia will release a new driver which will block this hack from working.

Another (somewhat unrelated) question - if I recall correctly, you need to pay a monthly/yearly license to use the vGPU feature, or is it something else?

Thanks

Built-in VNC and black screen

Hi.

OS: CentOS 8.3 + KVM

If I create a guest (Windows 10 Pro), then by default I can connect to it via VNC. But as soon as I add a vGPU

<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci'>
  <source>
    <address uuid='30820a6f-b1a5-4503-91ca-0c10ba58692a'/>
  </source>
</hostdev>

, then when I connect through VNC, I see only a black screen.

Why is this happening?


Is it possible to direct the GPU output (HDMI) to a VM only?

In a single-GPU scenario, most people do not need the host to have continuous display output after boot-up.
In the old days, we passed through the entire GPU, together with its display output, to a VM, and the host then detached from the display.

When it comes to vgpu_unlock, we still want the display output at the VM level; is that possible with the existing development?
I saw in the wiki that we can have a merged driver so that both the host and the VMs can share the vGPU while the host keeps its display. Is it possible to designate the output to a single VM, while the other VMs still share the GPU?

Or is it a must to install X on the host (e.g. Proxmox) and share the framebuffer? Thanks.

X not started

I have an RTX 2080 (connected to a monitor) and an Intel UHD 630 (primary, connected to a monitor). The host is AlmaLinux 8.4 (a RHEL 8 clone); the guest OS is Ubuntu 20.04.
I used NVIDIA-GRID-Linux-KVM-460.32.04-460.32.03-461.33.zip
and installed NVIDIA-Linux-x86_64-460.32.04-vgpu-kvm.run.
After installation I have the following on the host:

mdevctl types
[user@localhost work]$ mdevctl types
0000:01:00.0
nvidia-256
  Available instances: 0
  Device API: vfio-pci
  Name: GRID RTX6000-1Q
  Description: num_heads=4, frl_config=60, framebuffer=1024M, max_resolution=5120x2880, max_instance=24
nvidia-257
  Available instances: 0
  Device API: vfio-pci
  Name: GRID RTX6000-2Q
  Description: num_heads=4, frl_config=60, framebuffer=2048M, max_resolution=7680x4320, max_instance=12
nvidia-258
  Available instances: 0
  Device API: vfio-pci
  Name: GRID RTX6000-3Q
  Description: num_heads=4, frl_config=60, framebuffer=3072M, max_resolution=7680x4320, max_instance=8
nvidia-259
  Available instances: 0
  Device API: vfio-pci
  Name: GRID RTX6000-4Q
  Description: num_heads=4, frl_config=60, framebuffer=4096M, max_resolution=7680x4320, max_instance=6
nvidia-260
  Available instances: 3
  Device API: vfio-pci
  Name: GRID RTX6000-6Q
  Description: num_heads=4, frl_config=60, framebuffer=6144M, max_resolution=7680x4320, max_instance=4
First, why RTX6000 with TU102GL and not TU104GL? My card has a TU104 GPU.

Next, in KVM I created a new VM with Ubuntu 20.04 and added a new PCI device with XML.
I installed NVIDIA-Linux-x86_64-460.32.03-grid.run.
After reboot there are no graphics, only a black screen.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GRID RTX6000-6Q     On   | 00000000:04:00.0 Off |                  N/A |
| N/A   N/A    P8    N/A /  N/A |    432MiB /  6144MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                             
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
lsmod | grep nvidia
nvidia_drm             57344  0
nvidia_modeset       1228800  1 nvidia_drm
nvidia              34050048  1 nvidia_modeset
drm_kms_helper        217088  4 qxl,nvidia_drm
drm                   552960  6 drm_kms_helper,qxl,drm_ttm_helper,nvidia_drm,ttm
Xorg.0.log
[     3.818] (--) Log file renamed from "/var/log/Xorg.pid-895.log" to "/var/log/Xorg.0.log"
[     3.818] 
X.Org X Server 1.20.9
X Protocol Version 11, Revision 0
[     3.836] Build Operating System: Linux 4.15.0-130-generic x86_64 Ubuntu
[     3.836] Current Operating System: Linux user-KVM 5.8.0-59-generic #66~20.04.1-Ubuntu SMP Thu Jun 17 11:14:10 UTC 2021 x86_64
[     3.836] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.8.0-59-generic root=UUID=7cbeea00-f21b-4a76-ba88-0649e9195180 ro quiet splash vt.handoff=7
[     3.836] Build Date: 17 January 2021  09:13:31AM
[     3.837] xorg-server 2:1.20.9-2ubuntu1.2~20.04.1 (For technical support please see http://www.ubuntu.com/support) 
[     3.837] Current version of pixman: 0.38.4
[     3.837] 	Before reporting problems, check http://wiki.x.org
  to make sure that you have the latest version.
[     3.837] Markers: (--) probed, (**) from config file, (==) default setting,
  (++) from command line, (!!) notice, (II) informational,
  (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[     3.839] (==) Log file: "/var/log/Xorg.0.log", Time: Wed Jul  7 16:54:24 2021
[     3.839] (==) Using config file: "/etc/X11/xorg.conf"
[     3.839] (==) Using system config directory "/usr/share/X11/xorg.conf.d"
[     3.839] (==) ServerLayout "Layout0"
[     3.839] (**) |-->Screen "Screen0" (0)
[     3.840] (**) |   |-->Monitor "Monitor0"
[     3.842] (**) |   |-->Device "Device0"
[     3.842] (**) |-->Input Device "Keyboard0"
[     3.842] (**) |-->Input Device "Mouse0"
[     3.842] (==) Automatically adding devices
[     3.842] (==) Automatically enabling devices
[     3.842] (==) Automatically adding GPU devices
[     3.843] (==) Automatically binding GPU devices
[     3.847] (==) Max clients allowed: 256, resource mask: 0x1fffff
[     3.847] (WW) The directory "/usr/share/fonts/X11/cyrillic" does not exist.
[     3.847] 	Entry deleted from font path.
[     3.847] (WW) The directory "/usr/share/fonts/X11/100dpi/" does not exist.
[     3.847] 	Entry deleted from font path.
[     3.847] (WW) The directory "/usr/share/fonts/X11/75dpi/" does not exist.
[     3.847] 	Entry deleted from font path.
[     3.847] (WW) The directory "/usr/share/fonts/X11/100dpi" does not exist.
[     3.847] 	Entry deleted from font path.
[     3.847] (WW) The directory "/usr/share/fonts/X11/75dpi" does not exist.
[     3.848] 	Entry deleted from font path.
[     3.848] (==) FontPath set to:
  /usr/share/fonts/X11/misc,
  /usr/share/fonts/X11/Type1,
  built-ins
[     3.848] (==) ModulePath set to "/usr/lib/xorg/modules"
[     3.848] (WW) Hotplugging is on, devices using drivers 'kbd', 'mouse' or 'vmmouse' will be disabled.
[     3.848] (WW) Disabling Keyboard0
[     3.848] (WW) Disabling Mouse0
[     3.848] (II) Loader magic: 0x556d1102f020
[     3.848] (II) Module ABI versions:
[     3.856] 	X.Org ANSI C Emulation: 0.4
[     3.860] 	X.Org Video Driver: 24.1
[     3.860] 	X.Org XInput driver : 24.1
[     3.860] 	X.Org Server Extension : 10.0
[     3.861] (++) using VT number 1

[     3.863] (II) systemd-logind: took control of session /org/freedesktop/login1/session/c1
[     3.864] (II) xfree86: Adding drm device (/dev/dri/card0)
[     3.868] (II) systemd-logind: got fd for /dev/dri/card0 226:0 fd 12 paused 0
[     3.877] (II) xfree86: Adding drm device (/dev/dri/card1)
[     3.878] (II) systemd-logind: got fd for /dev/dri/card1 226:1 fd 13 paused 0
[     3.909] (--) PCI:*(0@0:1:0) 1b36:0100:1af4:1100 rev 4, Mem @ 0xf0000000/67108864, 0xf4000000/67108864, 0xfcc14000/8192, I/O @ 0x0000c040/32, BIOS @ 0x????????/131072
[     3.909] (--) PCI: (4@0:0:0) 10de:1e30:10de:1329 rev 161, Mem @ 0xfa000000/16777216, 0xd0000000/268435456, 0xf8000000/33554432
[     3.910] (II) LoadModule: "glx"
[     3.910] (II) Loading /usr/lib/xorg/modules/extensions/libglx.so
[     3.910] (II) Module glx: vendor="X.Org Foundation"
[     3.910] 	compiled for 1.20.9, module version = 1.0.0
[     3.911] 	ABI class: X.Org Server Extension, version 10.0
[     3.911] (II) LoadModule: "nvidia"
[     3.911] (II) Loading /usr/lib/xorg/modules/drivers/nvidia_drv.so
[     3.911] (II) Module nvidia: vendor="NVIDIA Corporation"
[     3.911] 	compiled for 1.6.99.901, module version = 1.0.0
[     3.911] 	Module class: X.Org Video Driver
[     3.911] (II) NVIDIA dlloader X Driver  460.32.03  Sun Dec 27 18:56:00 UTC 2020
[     3.912] (II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
[     3.912] (II) systemd-logind: releasing fd for 226:1
[     3.912] (II) Loading sub module "fb"
[     3.912] (II) LoadModule: "fb"
[     3.913] (II) Loading /usr/lib/xorg/modules/libfb.so
[     3.913] (II) Module fb: vendor="X.Org Foundation"
[     3.913] 	compiled for 1.20.9, module version = 1.0.0
[     3.913] 	ABI class: X.Org ANSI C Emulation, version 0.4
[     3.913] (II) Loading sub module "wfb"
[     3.913] (II) LoadModule: "wfb"
[     3.913] (II) Loading /usr/lib/xorg/modules/libwfb.so
[     3.914] (II) Module wfb: vendor="X.Org Foundation"
[     3.914] 	compiled for 1.20.9, module version = 1.0.0
[     3.914] 	ABI class: X.Org ANSI C Emulation, version 0.4
[     3.914] (II) Loading sub module "ramdac"
[     3.914] (II) LoadModule: "ramdac"
[     3.914] (II) Module "ramdac" already built-in
[     3.915] (EE) No devices detected.
[     3.915] (II) Applying OutputClass "nvidia" to /dev/dri/card1
[     3.915] 	loading driver: nvidia
[     4.058] (==) Matched qxl as autoconfigured driver 0
[     4.058] (==) Matched nvidia as autoconfigured driver 1
[     4.058] (==) Matched nouveau as autoconfigured driver 2
[     4.058] (==) Matched modesetting as autoconfigured driver 3
[     4.058] (==) Matched fbdev as autoconfigured driver 4
[     4.058] (==) Matched vesa as autoconfigured driver 5
[     4.058] (==) Assigned the driver to the xf86ConfigLayout
[     4.059] (II) LoadModule: "qxl"
[     4.059] (II) Loading /usr/lib/xorg/modules/drivers/qxl_drv.so
[     4.059] (II) Module qxl: vendor="X.Org Foundation"
[     4.059] 	compiled for 1.20.7, module version = 0.1.5
[     4.059] 	Module class: X.Org Video Driver
[     4.059] 	ABI class: X.Org Video Driver, version 24.1
[     4.059] (II) LoadModule: "nvidia"
[     4.059] (II) Loading /usr/lib/xorg/modules/drivers/nvidia_drv.so
[     4.059] (II) Module nvidia: vendor="NVIDIA Corporation"
[     4.059] 	compiled for 1.6.99.901, module version = 1.0.0
[     4.059] 	Module class: X.Org Video Driver
[     4.059] (II) UnloadModule: "nvidia"
[     4.059] (II) Unloading nvidia
[     4.059] (II) Failed to load module "nvidia" (already loaded, 0)
[     4.059] (II) LoadModule: "nouveau"
[     4.059] (II) Loading /usr/lib/xorg/modules/drivers/nouveau_drv.so
[     4.059] (II) Module nouveau: vendor="X.Org Foundation"
[     4.059] 	compiled for 1.20.3, module version = 1.0.16
[     4.059] 	Module class: X.Org Video Driver
[     4.059] 	ABI class: X.Org Video Driver, version 24.0
[     4.059] (II) LoadModule: "modesetting"
[     4.059] (II) Loading /usr/lib/xorg/modules/drivers/modesetting_drv.so
[     4.059] (II) Module modesetting: vendor="X.Org Foundation"
[     4.059] 	compiled for 1.20.9, module version = 1.20.9
[     4.059] 	Module class: X.Org Video Driver
[     4.059] 	ABI class: X.Org Video Driver, version 24.1
[     4.059] (II) LoadModule: "fbdev"
[     4.059] (II) Loading /usr/lib/xorg/modules/drivers/fbdev_drv.so
[     4.059] (II) Module fbdev: vendor="X.Org Foundation"
[     4.059] 	compiled for 1.20.1, module version = 0.5.0
[     4.059] 	Module class: X.Org Video Driver
[     4.059] 	ABI class: X.Org Video Driver, version 24.0
[     4.059] (II) LoadModule: "vesa"
[     4.059] (II) Loading /usr/lib/xorg/modules/drivers/vesa_drv.so
[     4.059] (II) Module vesa: vendor="X.Org Foundation"
[     4.059] 	compiled for 1.20.4, module version = 2.4.0
[     4.059] 	Module class: X.Org Video Driver
[     4.059] 	ABI class: X.Org Video Driver, version 24.0
[     4.059] (II) NVIDIA dlloader X Driver  460.32.03  Sun Dec 27 18:56:00 UTC 2020
[     4.059] (II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
[     4.059] (II) qxl: Driver for QXL virtual graphics: QXL 1
[     4.059] (II) NOUVEAU driver Date:   Mon Jan 28 23:25:58 2019 -0500
[     4.059] (II) NOUVEAU driver for NVIDIA chipset families :
[     4.059] 	RIVA TNT            (NV04)
[     4.059] 	RIVA TNT2           (NV05)
[     4.059] 	GeForce 256         (NV10)
[     4.059] 	GeForce 2           (NV11, NV15)
[     4.059] 	GeForce 4MX         (NV17, NV18)
[     4.060] 	GeForce 3           (NV20)
[     4.060] 	GeForce 4Ti         (NV25, NV28)
[     4.060] 	GeForce FX          (NV3x)
[     4.060] 	GeForce 6           (NV4x)
[     4.060] 	GeForce 7           (G7x)
[     4.060] 	GeForce 8           (G8x)
[     4.060] 	GeForce 9           (G9x)
[     4.060] 	GeForce GTX 2xx/3xx (GT2xx)
[     4.060] 	GeForce GTX 4xx/5xx (GFxxx)
[     4.060] 	GeForce GTX 6xx/7xx (GKxxx)
[     4.060] 	GeForce GTX 9xx     (GMxxx)
[     4.060] 	GeForce GTX 10xx    (GPxxx)
[     4.060] (II) modesetting: Driver for Modesetting Kernel Drivers: kms
[     4.060] (II) FBDEV: driver for framebuffer: fbdev
[     4.060] (II) VESA: driver for VESA chipsets: vesa
[     4.060] (WW) Falling back to old probe method for modesetting
[     4.060] (WW) Falling back to old probe method for fbdev
[     4.060] (WW) Falling back to old probe method for modesetting
[     4.060] (WW) Falling back to old probe method for fbdev
[     4.060] (II) [KMS] Kernel modesetting enabled.
[     4.060] (EE) No devices detected.
[     4.060] (EE) 
Fatal server error:
[     4.060] (EE) no screens found(EE) 
[     4.060] (EE) 
Please consult the The X.Org Foundation support 
   at http://wiki.x.org
for help. 
[     4.060] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[     4.060] (EE) 
[     4.066] (EE) Server terminated with error (1). Closing log file.

GP107GL

I don't see any explicit support for the Quadro P400 GPU (based on the GP107GL chip)...should we assume these "lesser" variants are supported or not?

DKMS will not build for the latest vGPU drivers (460.32.04)

While DKMS will build for driver 450, it will not with the newer versions. Here is the error:

[root@rhel-test vgpu]# dkms install -m nvidia -v 460.32.04

Creating symlink /var/lib/dkms/nvidia/460.32.04/source ->
/usr/src/nvidia-460.32.04

DKMS: add completed.

Kernel preparation unnecessary for this kernel. Skipping...

Building module:
cleaning build area...
'make' -j4 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=4.18.0-240.10.1.el8_3.x86_64 IGNORE_CC_MISMATCH='' modules...........(bad exit status: 2)
Error! Bad return status for module build on kernel: 4.18.0-240.10.1.el8_3.x86_64 (x86_64)
Consult /var/lib/dkms/nvidia/460.32.04/build/make.log for more information.

Would this work with a Quadro P2000 card?

Hello all,

I have a Quadro P2000 card in a Proxmox box and would love to get some vGPUs out of it, since all it currently does is transcode for Emby. Would this "unlock" work for this card? Would it work with Proxmox?

Does vgpu_unlock support the RTX 3080?

Does vgpu_unlock currently support the RTX 3080? If it does, what vGPU types are generated for the RTX 3080, and what driver version should be used in the virtual machine?

Tesla M40 Problems & Memory Allocation Limit with Tesla M40 24GB -> Tesla M60 remapping

First and primary:
I'm coming from a setup where I was using a GTX 1060 with vgpu_unlock just fine, but figured I'd step it up so that I could support more VMs. So, I'm currently trying to use a Tesla M40. Being a Tesla card, you might expect it not to need vgpu_unlock, but this is one of the few Teslas that doesn't support vGPU natively. So, I'm trying to use nvidia-18 types from the M60 profiles with my VMs. I'm aware that I should be using a slightly older driver to match my host driver. However, I'm still getting a code 43 when I load my guest. I would provide some logs here, but I'm not sure what I can include, since the entries for the two vgpu services both seem to be fine with no errors, other than nvidia-vgpu-mgr[2588]: notice: vmiop_log: display_init inst: 0 successful at the end of trying to initialize the mdev device when the VM starts up. Please let me know any other information that I can provide to help debug/troubleshoot.
Second:
This is probably one of the few instances where this is a problem, since most GeForce/Quadro cards have less memory than their vGPU-capable counterparts. However, I have a Tesla M40 GPU that has 24 GB of vRAM (in two separate memory regions, I would guess, although this SKU isn't listed on the Nvidia graphics processing units Wikipedia page, so I'm not 100% sure). This is in comparison to the Tesla M60's 2x8GB configuration, of which only 8GB is available for allocation in vGPU.
I'm not sure whether the max_instance quantity, as seen in mdevctl types, is defined on the Nvidia driver side, in the vgpu_unlock side, or if it's a mix and the vgpu_unlock side might be able to do something about it.
What I'm asking here, though, is whether this value can be redefined so that I can utilize all 24 GB of my available vRAM or, if not that, then at least the 12 GB that I presume is available in the GPU's primary memory.

Does not work with the NVIDIA 460.32.03 driver

On Ubuntu 20.04 with 5.8.0.48 kernel.

I tried to install the 460.32.03 driver downloaded from the portal (NVIDIA-Linux-x86_64-460.32.03-grid.run).

Minor issue: I had to fix nvidia/nv-vgpu-vmbus.c, renaming PAGE_KERNEL_RX to PAGE_KERNEL_ROX, to make dkms happy.

  1. I cannot find the aforementioned vgpud.service and vgpu-mgr.service; only gridd.service is available.
  2. Upon switching to vgpu_unlock to launch gridd.service, I still get a license issue:
# systemctl status nvidia-gridd.service

Apr 10 00:41:28 sz77 systemd[1]: Starting NVIDIA Grid Daemon...
Apr 10 00:41:28 sz77 systemd[1]: Started NVIDIA Grid Daemon.
Apr 10 00:41:29 sz77 nvidia-gridd[1685]: Started (1685)
Apr 10 00:41:32 sz77 nvidia-gridd[1685]: Licensing not supported for GPUs in the system
Apr 10 00:41:32 sz77 nvidia-gridd[1685]: Failed to handle license change events
Apr 10 00:41:32 sz77 nvidia-gridd[1685]: Licensing not supported for GPUs in the system
Apr 10 00:41:32 sz77 nvidia-gridd[1685]: Failed to unlock PID file: Bad file descriptor
Apr 10 00:41:32 sz77 nvidia-gridd[1685]: Failed to close PID file: Bad file descriptor
Apr 10 00:41:32 sz77 nvidia-gridd[1685]: Shutdown (1685)
Apr 10 00:41:32 sz77 systemd[1]: nvidia-gridd.service: Succeeded.

Problem with type 'C' instances

I'm having some trouble getting type C instances working. This is with a GV100, in CentOS. Type Q instances work without issue.
These type C instances are listed as available using mdevctl types, and I can create them using mdevctl without any problems.

The following appears in the logs when I try and start a VM with a C type instance attached:

Jun 14 09:21:23 hostname nvidia-vgpu-mgr[66999]: notice: vmiop_env_log: vmiop-env: guest_max_gpfn:0x0
Jun 14 09:21:23 hostname nvidia-vgpu-mgr[66999]: notice: vmiop_env_log: (0x0): Received start call from nvidia-vgpu-vfio module: mdev uuid 9ebb3727-9bfe-492a-8adf-4fe1d1381401 GPU PCI id 00:65:00.0 config params vgpu_type_id=312
Jun 14 09:21:23 hostname nvidia-vgpu-mgr[66999]: notice: vmiop_env_log: (0x0): pluginconfig: vgpu_type_id=312
Jun 14 09:21:23 hostname nvidia-vgpu-mgr[66999]: notice: vmiop_env_log: Successfully updated env symbols!
Jun 14 09:21:23 hostname nvidia-vgpu-mgr[66999]: error: vmiop_log: (0x0): Guest BAR1 is of invalid length (g: 0x200000000, h: 0x10000000)
Jun 14 09:21:23 hostname nvidia-vgpu-mgr[66999]: error: vmiop_log: (0x0): init_device_instance failed for inst 0 with error 1 (error setting vGPU configuration information from RM)
Jun 14 09:21:23 hostname nvidia-vgpu-mgr[66999]: error: vmiop_log: (0x0): Initialization: init_device_instance failed error 1
Jun 14 09:21:23 hostname nvidia-vgpu-mgr[66999]: error: vmiop_log: display_init failed for inst: 0
Jun 14 09:21:23 hostname nvidia-vgpu-mgr[66999]: error: vmiop_env_log: (0x0): vmiope_process_configuration: plugin registration error
Jun 14 09:21:23 hostname nvidia-vgpu-mgr[66999]: error: vmiop_env_log: (0x0): vmiope_process_configuration failed with 0x1f

It's only possible to start one instance (and also, it's not possible to start any instance of type C)

Hi all.

I am using a 1080 Ti, and no matter the size of the instance I choose, I can only assign the vGPU to one machine. When I try a second one, I get the error:

kvm: -device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:07:00.0/00000000-0000-0000-0000-000000000101,id=hostpci0,bus=pci.0,addr=0x10,rombar=0: vfio 00000000-0000-0000-0000-000000000101: error getting device from group 29: Connection timed out
Verify all devices in group 29 are bound to vfio-<bus> or pci-stub and not already in use

Also, instances of type C (for computing) don't work, with the very same error.

Finally, it says it's possible to create instances with sizes up to 24 GB, although the 1080 Ti only has 11 GB of VRAM.

Can't find the nvidia files after successfully installing the 460.32.04 linux kvm driver

Hello everyone,
I followed the tutorial from the vGPU wiki to set up an Ubuntu 20.04 system. It worked well until step 8, modifying the four nvidia files.
I can't find any file named nvidia inside /lib/systemd/system/, and I am not sure why there are no such files there.
I also removed the nvidia driver and reinstalled it; every time it says the driver installed successfully, but the files still can't be found.
Can anyone help me?

host system: ubuntu 20.04
cpu: amd 3600xt
gpu: 2060 super
driver: nvidia-linux-x86_64-460.32.04-vgpu-kvm.run (from nvidia licensing software download)(use --dkms install)
iommu: enable
svm: enable
grub_cmdline_linux_default: amd_iommu=on iommu=pt

Windows 10 VM problem

Hi,
I used vgpu_unlock to successfully virtualize the 2080S, and also successfully booted a Windows 10 VM with a vGPU and passed the license verification. But since my VM added the vGPU mdev, it feels a bit laggy, as if something is wrong. I have tried the B-series and Q-series profiles, and disabled frame rate limiting.

Have you encountered a similar problem?
Can you also share the full XML file of your VM?
How do you connect to the VM with vGPU? I use RDP and TightVNC.

h264/HEVC Encode/Decode Support

NVIDIA's documentation indicates most of the consumer GPUs don't support the same number of encoders/decoders, nor unlimited encode/decode sessions... can you comment on whether this driver mod addresses that limitation in any way, or are we still bound by it? In other words, are the encoder/decoder differences actually in silicon, or are they just product segmentation implemented in drivers?

The reason I ask is that I'm trying to ascertain whether hardware-accelerated VDI is actually feasible using this modification, or if on lower-end hardware I'd be stuck with a single h264/HEVC encode session, thereby still imposing a 1:1 ratio between physical GPUs and VMs.

ESXi support?

Can this hack be used on vSphere/ESXi? How can I set it up?

Maxwell support

Is it possible to introduce the GeForce 900 series to this project? The GTX 970 seems to have the same GM204 chip as the Tesla M6, which should support vGPU.

Lower tier SKUs for vGPU not advisable

I noticed that cards like the 1060 3GB are included in the list of GPUs that can be marked as "vGPU capable", which is probably not the best idea. If a user launches a profile that uses the entire video memory, or multiple profiles on the same card without adequate memory for all of them to remain stable under load, the GPU host driver may fail and make the card inaccessible until reboot, or crash the system outright.

I don't know if it is possible to prevent the user from doing this within the vgpu_unlock_hooks.c file, but it may be possible to prevent profiles from being loaded at boot if they exceed the available memory. For example, prevent loading of P4-8Q on a 1060 3GB.

It's a rather minor issue, and the host driver may have protection capability, but if it doesn't, this can become an issue on certain cards.

ModuleNotFoundError: No module named 'frida' even when it's installed

So I finished the installation of this fork, but when booting up it fails to start both "NVIDIA vGPU Daemon" and "NVIDIA vGPU Manager Daemon".

journalctl -u nvidia-vgpu-mgr.service gives:

Jul 10 19:01:19 archtower systemd[1]: Starting NVIDIA vGPU Manager Daemon...
Jul 10 19:01:19 archtower vgpu_unlock[414]: Traceback (most recent call last):
Jul 10 19:01:19 archtower vgpu_unlock[414]:   File "/opt/vgpu_unlock_5.12/vgpu_unlock", line 13, in <module>
Jul 10 19:01:19 archtower vgpu_unlock[414]:     import frida
Jul 10 19:01:19 archtower vgpu_unlock[414]: ModuleNotFoundError: No module named 'frida'
Jul 10 19:01:19 archtower systemd[1]: nvidia-vgpu-mgr.service: Control process exited, code=exited, status=1/FAILURE
Jul 10 19:01:19 archtower systemd[1]: nvidia-vgpu-mgr.service: Failed with result 'exit-code'.
Jul 10 19:01:19 archtower systemd[1]: Failed to start NVIDIA vGPU Manager Daemon.

I don't code that much, but IIRC this is because of a missing python library, right?
After installing frida with pip, it keeps giving the same error.
I can even import frida from a separate python3 console...

asciinema: https://asciinema.org/a/EA1Twe22z0Z2dGN1jwlKYKxbC

Maxwell 2.0 support? + where is the driver?

Basically, the Tesla M60 has GM204 (like the GTX 970), so in theory you can just add both product IDs and it should work.
(I would test it myself but I can't find a driver.)

By the way, I feel like an idiot, but I can't find a GRID vGPU driver. On the Nvidia download page, when I select GRID > vGPU, there is no Linux under operating system; there are XenServer and vSphere, which contain Windows drivers plus NVIDIA-vGPU-kepler-xenserver-version.rpm.
So, should I use the Tesla driver, or where can I find the proper driver?

Question: Turn all GeForce cards into TCC capable devices?

Hello!

Using a Linux hypervisor, could your tricks be used to turn all GeForce cards (from Pascal and up) into TCC-capable devices for a Windows guest? This is the feature we would need. If the answer is yes but it needs modification of your code, we are prepared to pay for modifications for the ArchLinux virtualization project we are working on.

Let me know! Thanks for your time!

The hooks use a hard-coded value for GRID P40-6Q.

There is a single hard-coded value on line 804 of vgpu_unlock_hooks.c that makes the resulting lookup table match the GRID P40-6Q device:

pci_info[3] = 0x11ec;

This value is printed next to the PCI device ID in the list of supported vGPUs printed by nvidia-vgpud:

nvidia-vgpud[674]: Supported VGPU 0x32: max 4
nvidia-vgpud[674]: VGPU Type 0x32: GRID P40-6Q Class: Quadro
nvidia-vgpud[674]: DevId: 0x10de / 0x1b38 / 0x10de / 0x11ec
nvidia-vgpud[674]: Framebuffer: 0x164000000
nvidia-vgpud[674]: Mappable video size: 0x400000
nvidia-vgpud[674]: Framebuffer reservation: 0x1c000000
nvidia-vgpud[674]: FRL configuration: 0x3c
nvidia-vgpud[674]: CUDA enabled: 0x1
nvidia-vgpud[674]: ECC supported: 0x1
nvidia-vgpud[674]: Multi vGPU supported: 0x0
nvidia-vgpud[674]: Encoder Capacity: 0x64
nvidia-vgpud[674]: BAR1 Length: 0x100
nvidia-vgpud[674]: Frame Rate Limiter enabled: 0x1
nvidia-vgpud[674]: Number of Displays: 4
nvidia-vgpud[674]: Max pixels: 58982400
nvidia-vgpud[674]: Display: width 7680, height 4320
nvidia-vgpud[674]: License: Quadro-Virtual-DWS,5.0;GRID-Virtual-WS,2.0;GRID-Virtual-WS-Ext,2.0

As a result, vgpu_unlock will only support the GRID P40-6Q device. We will need a lookup table of all these device IDs per GPU so that they can all be inserted into the lookup tables in the kernel module.

A workaround is to edit the hard-coded value and then remove and reinstall the kernel module using dkms.

Document more explicitly that you have to add a custom PCI ID for GPUs that are supported but are not in the default list

Hi everyone,

Thanks very much for this work. I've been wanting to try out vGPUs for a very, very long time, and this might make my dreams come true, so it's very exciting.

I attempted to follow the instructions and nvidia-vgpud said I had an unsupported vGPU (I have a GTX 1060 6GB, which should be supported, right?).

I added this to the vgpu_unlock script, which made nvidia-vgpud "work" (as in, it exits with an error code of zero).

                // GP104
                if(actual_devid == 0x1b80 || // GTX 1080
                   actual_devid == 0x1b81 || // GTX 1070
                   actual_devid == 0x1b82 || // GTX 1070 Ti
                   actual_devid == 0x1c03 || // GTX 1060 6GB, **mine**
                   actual_devid == 0x1b83 || // GTX 1060 6GB
                   actual_devid == 0x1b84 || // GTX 1060 3GB
                   actual_devid == 0x1bb0) { // Quadro P5000
                    spoofed_devid = 0x1bb3; // Tesla P4
                }

Here are the systemd logs for what I mean by nvidia-vgpud exiting:

Apr 09 22:37:29 localhost nvidia-vgpud[5660]: Number of Displays: 1
Apr 09 22:37:29 localhost nvidia-vgpud[5660]: Max pixels: 8847360
Apr 09 22:37:29 localhost nvidia-vgpud[5660]: Display: width 4096, height 2160
Apr 09 22:37:29 localhost nvidia-vgpud[5660]: License: NVIDIA-vComputeServer,9.0;Quadro-Virtual-DWS,5.0
Apr 09 22:37:29 localhost nvidia-vgpud[5660]: PID file unlocked.
Apr 09 22:37:29 localhost nvidia-vgpud[5660]: PID file closed.
Apr 09 22:37:29 localhost nvidia-vgpud[5660]: Shutdown (5660)

I'm not certain this is what's supposed to happen (shouldn't it keep running?)

I went and created an mdev, following the instructions here.

When I added the mdev to libvirt, I used the following XML

<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci'>
  <source>
    <address uuid='84c0de53-9363-478d-876c-b298956a4af1 '/>
  </source>
</hostdev>

I get the following error when starting the VM, though:

qemu-system-x86_64: -device vfio-pci,id=hostdev0,sysfsdev=/sys/bus/mdev/devices/84c0de53-9363-478d-876c-b298956a4af1,display=off,bus=pci.5,addr=0x0: vfio 84c0de53-9363-478d-876c-b298956a4af1: error getting device from group 8: Input/output error

Verify all devices in group 8 are bound to vfio-<bus> or pci-stub and not already in use

Dmesg says:

[nvidia-vgpu-vfio] 84c0de53-9363-478d-876c-b298956a4af1: start failed. status: 0x1

Did I do something wrong? Should I be using CentOS/RHEL instead of openSUSE?

I then found out that the systemd service nvidia-vgpu-mgr is a thing.
These were the logs:

Apr 09 22:37:29 localhost notice: vmiop_env_log: vmiop-env: guest_max_gpfn:0x0
Apr 09 22:37:29 localhost nvidia-vgpu-mgr[5614]: notice: vmiop_env_log: (0x0): Received start call from nvidia-vgpu-vfio module: mdev uuid 84c0de53-9363-478d-876c-b29>
Apr 09 22:37:29 localhost nvidia-vgpu-mgr[5614]: notice: vmiop_env_log: (0x0): pluginconfig: vgpu_type_id=63
Apr 09 22:37:29 localhost nvidia-vgpu-mgr[5614]: notice: vmiop_env_log: Successfully updated env symbols!
Apr 09 22:37:29 localhost nvidia-vgpu-mgr[5614]: error: vmiop_log: (0x0): vGPU is supported only on VGX capable boards
Apr 09 22:37:29 localhost nvidia-vgpu-mgr[5614]: error: vmiop_log: (0x0): init_device_instance failed for inst 0 with error 1 (vGPU validation of the GPU failed)
Apr 09 22:37:29 localhost nvidia-vgpu-mgr[5614]: error: vmiop_log: (0x0): Initialization: init_device_instance failed error 1
Apr 09 22:37:29 localhost nvidia-vgpu-mgr[5614]: error: vmiop_log: display_init failed for inst: 0
Apr 09 22:37:29 localhost nvidia-vgpu-mgr[5614]: error: vmiop_env_log: (0x0): vmiope_process_configuration: plugin registration error

I set ExecStart to /opt/vgpu_unlock/vgpu_unlock /usr/bin/nvidia-vgpu-mgr (in hopes that it would help),
and now I have:

 nvidia-vgpu-mgr[2314]: notice: vmiop_env_log: Successfully updated env symbols!
Apr 09 22:49:09 localhost nvidia-vgpu-mgr[2314]: error: vmiop_log: NVOS status 0x56
Apr 09 22:49:09 localhost nvidia-vgpu-mgr[2314]: error: vmiop_log: Assertion Failed at 0x8429e183:293
Apr 09 22:49:09 localhost nvidia-vgpu-mgr[2314]: error: vmiop_log: 10 frames returned by backtrace
Apr 09 22:49:09 localhost nvidia-vgpu-mgr[2314]: error: vmiop_log: /usr/lib64/libnvidia-vgpu.so(_nv004938vgpu+0x26) [0x7f49842ee6a6]
Apr 09 22:49:09 localhost nvidia-vgpu-mgr[2314]: error: vmiop_log: /usr/lib64/libnvidia-vgpu.so(+0x88a7a) [0x7f498429ca7a]
Apr 09 22:49:09 localhost nvidia-vgpu-mgr[2314]: error: vmiop_log: /usr/lib64/libnvidia-vgpu.so(+0x8a183) [0x7f498429e183]
Apr 09 22:49:09 localhost nvidia-vgpu-mgr[2314]: error: vmiop_log: vgpu() [0x4119f1]
Apr 09 22:49:09 localhost nvidia-vgpu-mgr[2314]: error: vmiop_log: vgpu() [0x412955]
Apr 09 22:49:09 localhost nvidia-vgpu-mgr[2314]: error: vmiop_log: vgpu() [0x40d1fc]

Is there something I'm missing, is my setup just wrong/not supported, did I mess something up, or… is this a bug that keeps my GPU from working?

iommu_group: No such file or directory (solved)

For those who encounter this error while using virt-manager or qemu-system-*.
Ensure that the kernel modules vfio_mdev and vfio_iommu_type1 are loaded. On my system (openSUSE Leap 15.2) they weren't, and after a modprobe the issue was gone.

Notebook variants of GPUs not supported?

Hello!

It seems that the notebook PCI IDs are not supported (specifically, I have a GTX 1060 6GB with Max-Q Design). Are they completely unable to work with vGPU, or can they be patched to work too? They seem to have the ID 1C20.

Non-DKMS Method Possibility

As someone who does not currently use NVidia graphics, this is more of a "food for thought" question; however, would this be possible without relying on DKMS?

For example, on distros that do not employ DKMS, such as Solus.

Type Q with more than one VM

We use an RTX 4000 and created 4 Q-type devices:
1f1d02f7-d9ca-4b09-a26a-9049061a2416 0000:03:00.0 nvidia-260
04492aab-6cf2-4b03-ab85-2381bf94c9c5 0000:03:00.0 nvidia-260
5c1d9150-c134-4cf6-a070-3e61b0ba4bfa 0000:03:00.0 nvidia-260
7dd5ad5b-af5e-411d-88f3-2c6c827b2680 0000:03:00.0 nvidia-260

but while one VM is running with 04492aab-6cf2-4b03-ab85-2381bf94c9c5 and we try to start another, it shows an error.
kernel:
[11764.207909] [nvidia-vgpu-vfio] 1f1d02f7-d9ca-4b09-a26a-9049061a2416: start failed. status: 0x1

QEMU:
qemu-kvm: -device vfio-pci,id=hostdev0,sysfsdev=/sys/bus/mdev/devices/1f1d02f7-d9ca-4b09-a26a-9049061a2416,bus=pci.0,addr=0xc: vfio 1f1d02f7-d9ca-4b09-a26a-9049061a2416: error getting device from group 53: Input/output error
Verify all devices in group 53 are bound to vfio- or pci-stub and not already in use

We tried to see if there are more devices in the iommu group; it seems there are not:

[root@node1 michael]# ls /sys/bus/mdev/devices/04492aab-6cf2-4b03-ab85-2381bf94c9c5/iommu_group/devices/
04492aab-6cf2-4b03-ab85-2381bf94c9c5
[root@node1 michael]# ls /sys/bus/mdev/devices/*/iommu_group/devices/
/sys/bus/mdev/devices/04492aab-6cf2-4b03-ab85-2381bf94c9c5/iommu_group/devices/:
04492aab-6cf2-4b03-ab85-2381bf94c9c5

/sys/bus/mdev/devices/1f1d02f7-d9ca-4b09-a26a-9049061a2416/iommu_group/devices/:
1f1d02f7-d9ca-4b09-a26a-9049061a2416

/sys/bus/mdev/devices/5c1d9150-c134-4cf6-a070-3e61b0ba4bfa/iommu_group/devices/:
5c1d9150-c134-4cf6-a070-3e61b0ba4bfa

/sys/bus/mdev/devices/7dd5ad5b-af5e-411d-88f3-2c6c827b2680/iommu_group/devices/:
7dd5ad5b-af5e-411d-88f3-2c6c827b2680
