geerlingguy commented on May 5, 2024

@vegedb - Right now my goal is to get any card working, and some seem to have a better chance than others :D

Once we can prove that some card actually works, then at that point, I'll start thinking about use cases like media transcoding, AI/ML, gaming, etc.

from raspberry-pi-pcie-devices.

geerlingguy commented on May 5, 2024

So... trying this again today as I thought I'd be wrapping up work on a video for Friday, but as always, things behave differently if you look at them sideways—in this case, I have a camera on it, so it's doing different things than it did late last year.

This time around, since the BAR space issue was resolved in newer Pi OS kernels, and since the 64-bit Pi OS now has a proper headers package that can be installed via apt instead of compiling it myself, I'm taking another stab at installing Nvidia's latest proprietary AArch64 driver from https://www.nvidia.com/en-us/drivers/unix/linux-aarch64-archive/

First I flashed my Pi's drive with the 64-bit beta release, then I booted it and ran:

sudo apt-get update
sudo apt-get -y dist-upgrade
sudo apt-get install -y raspberrypi-kernel-headers
sudo reboot

If an X server is running (that is, you're logged into a GUI) and you can't log out of it, run sudo systemctl stop lightdm from SSH or a terminal. Nvidia's driver can't be installed while an X server is running.

I copied Nvidia's driver .run file to the Pi, then I ran:

chmod +x NVIDIA-Linux-aarch64-450.119.03.run
sudo ./NVIDIA-Linux-aarch64-450.119.03.run

After a reboot, everything seemed to be coming up and then, about 30 seconds later, before X logged me in, I got the following via dmesg—but thankfully the whole system didn't lock up, and I could still access the Pi via SSH.

[   39.313959] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000b7
[   39.313974] Mem abort info:
[   39.313983]   ESR = 0x96000005
[   39.313995]   EC = 0x25: DABT (current EL), IL = 32 bits
[   39.314003]   SET = 0, FnV = 0
[   39.314012]   EA = 0, S1PTW = 0
[   39.314019] Data abort info:
[   39.314027]   ISV = 0, ISS = 0x00000005
[   39.314036]   CM = 0, WnR = 0
[   39.314047] user pgtable: 4k pages, 39-bit VAs, pgdp=00000000483fe000
[   39.314056] [00000000000000b7] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[   39.314096] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[   39.314102] Modules linked in: rfcomm bnep hci_uart btbcm bluetooth ecdh_generic ecc 8021q garp stp llc nvidia_drm(PO) nvidia_modeset(PO) nvidia(PO) brcmfmac brcmutil vc4 cec cfg80211 v3d bcm2835_v4l2(C) bcm2835_codec(C) bcm2835_isp(C) raspberrypi_hwmon gpu_sched rfkill videobuf2_vmalloc drm_kms_helper bcm2835_mmal_vchiq(C) v4l2_mem2mem videobuf2_dma_contig snd_bcm2835(C) videobuf2_memops videobuf2_v4l2 videobuf2_common vc_sm_cma(C) videodev drm mc drm_panel_orientation_quirks snd_soc_core snd_compress snd_pcm_dmaengine snd_pcm snd_timer snd rpivid_mem syscopyarea sysfillrect sysimgblt fb_sys_fops backlight uio_pdrv_genirq uio i2c_dev aes_neon_bs sha256_generic aes_neon_blk crypto_simd cryptd ip_tables x_tables ipv6
[   39.314353] CPU: 2 PID: 528 Comm: Xorg Tainted: P         C O      5.10.17-v8+ #1403
[   39.314358] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
[   39.314368] pstate: 40000005 (nZcv daif -PAN -UAO -TCO BTYPE=--)
[   39.318042] pc : _nv036670rm+0x0/0x110 [nvidia]
[   39.321855] lr : _nv029673rm+0x11c/0x9a0 [nvidia]
[   39.321862] sp : ffffffc012a4b500
[   39.321867] x29: ffffffc012a4b500 x28: ffffff8047215808 
[   39.321879] x27: ffffff8048754008 x26: ffffff8047215808 
[   39.321890] x25: ffffff8043c8c008 x24: ffffff8043c8c008 
[   39.321900] x23: ffffff8047215808 x22: 0000000000000000 
[   39.321911] x21: ffffff8048754008 x20: ffffff8047216008 
[   39.321922] x19: ffffffc00a1b4000 x18: 0000000000800000 
[   39.321932] x17: ffffffc0095de650 x16: ffffffc0095de718 
[   39.321943] x15: ffffffc0095defe0 x14: ffffffc0095df0f8 
[   39.321953] x13: ffffffc0095df128 x12: ffffffc0113d3a80 
[   39.321964] x11: 0000000000019560 x10: 0000000000000000 
[   39.321974] x9 : ffffffc0091c4b1c x8 : 0000000000000000 
[   39.321985] x7 : 0000000000000001 x6 : ffffffc012a4b450 
[   39.321995] x5 : 0000000000000000 x4 : ffffffc009535500 
[   39.322006] x3 : ffffffc009539278 x2 : 0000000000000001 
[   39.322016] x1 : ffffffc0097ba228 x0 : ffffff8047216008 
[   39.322028] Call trace:
[   39.325874]  _nv036670rm+0x0/0x110 [nvidia]
[   39.329692]  _nv029705rm+0x1c4/0x2e0 [nvidia]
[   39.333476]  _nv029672rm+0x5c/0x238 [nvidia]
[   39.337292]  _nv030359rm+0x90/0x130 [nvidia]
[   39.341076]  _nv009458rm+0x5c/0x790 [nvidia]
[   39.344901]  _nv019777rm+0xb8/0x1a8 [nvidia]
[   39.348712]  _nv020023rm+0x28/0x78 [nvidia]
[   39.352566]  _nv000732rm+0xee4/0x19e0 [nvidia]
[   39.356416]  rm_init_adapter+0xa8/0xb8 [nvidia]
[   39.360232]  nv_open_device+0x420/0x6e8 [nvidia]
[   39.364044]  nvidia_open+0x100/0x3a0 [nvidia]
[   39.367851]  nvidia_frontend_open+0x74/0xc0 [nvidia]
[   39.367873]  chrdev_open+0xb0/0x1a8
[   39.367883]  do_dentry_open+0x134/0x398
[   39.367892]  vfs_open+0x34/0x40
[   39.367900]  path_openat+0xa24/0xe20
[   39.367907]  do_filp_open+0x84/0x100
[   39.367915]  do_sys_openat2+0x1f8/0x2a8
[   39.367921]  do_sys_open+0x60/0xa8
[   39.367927]  __arm64_sys_openat+0x2c/0x38
[   39.367939]  el0_svc_common.constprop.2+0xac/0x1d0
[   39.367947]  do_el0_svc+0x2c/0x98
[   39.367956]  el0_svc+0x20/0x30
[   39.367964]  el0_sync_handler+0x90/0xb8
[   39.367970]  el0_sync+0x174/0x180
[   39.367986] Code: a94363b7 17ffffbc 52800580 17ffffbc (3942d844) 
[   39.367997] ---[ end trace e68ea2ca7b20909f ]---

The attached display (HDMI0 on the Pi) showed a solid cursor, so it seems the display system may still have locked up.
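
The ESR value in the oops above can be unpacked by hand. Here is a minimal sketch, assuming the ARMv8-A ESR_EL1 layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0) and, for data aborts, DFSC in ISS bits 5:0 and WnR in bit 6. The helper name decode_esr and the partial lookup table are illustrative only, not part of any kernel tooling:

```python
# Decode the ESR value printed in the oops (ESR = 0x96000005).
# Field layout is assumed from the ARMv8-A ESR_EL1 definition.

# Partial table of data fault status codes (DFSC); only a few entries listed.
DFSC_NAMES = {
    0x04: "translation fault, level 0",
    0x05: "translation fault, level 1",
    0x06: "translation fault, level 2",
    0x07: "translation fault, level 3",
}


def decode_esr(esr: int) -> dict:
    """Split an ESR_EL1 value into the fields the kernel prints."""
    iss = esr & 0x1FFFFFF          # instruction specific syndrome, bits 24:0
    dfsc = iss & 0x3F              # fault status code, ISS bits 5:0
    return {
        "ec": (esr >> 26) & 0x3F,  # exception class, bits 31:26
        "il_bits": 32 if (esr >> 25) & 1 else 16,
        "iss": iss,
        "wnr": (iss >> 6) & 1,     # 0 = read, 1 = write
        "dfsc": dfsc,
        "fault": DFSC_NAMES.get(dfsc, "other"),
    }


fields = decode_esr(0x96000005)
print(fields)
```

The decoded fields agree with what the kernel printed (EC = 0x25, IL = 32 bits, ISS = 0x5, WnR = 0): a read from the near-NULL address 0xb7 hit a level 1 translation fault while executing inside the nvidia module.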

jamesy0ung commented on May 5, 2024

Nice, thanks Jeff!

vegedb commented on May 5, 2024

This is awesome; however, it's more interesting to have a Quadro P400 for Plex/Emby servers. It's the cheapest Quadro for transcoding, costing around 50-100 on eBay. I feel that if this works, it will be the ultimate low-power Plex server.

Benefits:

Power
P400 = 30W
GTX 750 TI = 60-75W

Codecs
P400 = 6th Gen NVENC (supports H.265/HEVC)
GTX 750 TI = 4th Gen NVENC

Screenshots
P400
image

GTX 750 TI
image

4k Streams
image

Additional info
https://developer.nvidia.com/video-encode-and-decode-gpu-support-matrix-new
https://www.elpamsoft.com/?p=Plex-Hardware-Transcoding

geerlingguy commented on May 5, 2024
01:00.0 VGA compatible controller: NVIDIA Corporation GM107 [GeForce GTX 750 Ti] (rev a2) (prog-if 00 [VGA controller])
	Subsystem: eVga.com. Corp. GM107 [GeForce GTX 750 Ti]
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 0
	Region 0: Memory at 618000000 (32-bit, non-prefetchable) [disabled] [size=16M]
	Region 1: Memory at 600000000 (64-bit, prefetchable) [disabled] [size=256M]
	Region 3: Memory at 610000000 (64-bit, prefetchable) [disabled] [size=32M]
	Region 5: I/O ports at <unassigned> [disabled]
	[virtual] Expansion ROM at 619000000 [disabled] [size=512K]
	Capabilities: [60] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [78] Express (v2) Legacy Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR+, OBFF Via message
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [100 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
			Status:	NegoPending- InProgress-
	Capabilities: [250 v1] Latency Tolerance Reporting
		Max snoop latency: 0ns
		Max no snoop latency: 0ns
	Capabilities: [258 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=255us PortTPowerOnTime=10us
		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
			   T_CommonMode=0us LTR1.2_Threshold=262144ns
		L1SubCtl2: T_PwrOn=10us
	Capabilities: [128 v1] Power Budgeting <?>
	Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Capabilities: [900 v1] #19

01:00.1 Audio device: NVIDIA Corporation Device 0fbc (rev a1)
	Subsystem: eVga.com. Corp. Device 3751
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin B routed to IRQ 0
	Region 0: Memory at 619080000 (32-bit, non-prefetchable) [disabled] [size=16K]
	Capabilities: [60] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [78] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR+, OBFF Via message
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
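
The LnkCap and LnkSta lines show how far the trained link is from what the card can do. Here is a rough sketch of the arithmetic, assuming the standard PCIe line encodings (8b/10b for 2.5 and 5.0 GT/s, 128b/130b for 8.0 GT/s). The name link_gbps is made up for this example:

```python
# Usable PCIe bandwidth = transfer rate x lanes x encoding efficiency.
# Encoding efficiencies are assumptions based on the PCIe generations:
# Gen1/Gen2 use 8b/10b, Gen3 uses 128b/130b.
ENCODING = {2.5: 8 / 10, 5.0: 8 / 10, 8.0: 128 / 130}


def link_gbps(gt_per_s: float, lanes: int) -> float:
    """Return usable bandwidth in Gb/s for a link speed and width."""
    return gt_per_s * lanes * ENCODING[gt_per_s]


capable = link_gbps(8.0, 16)   # LnkCap: Speed 8GT/s, Width x16
trained = link_gbps(2.5, 1)    # LnkSta: Speed 2.5GT/s, Width x1

print(f"capable: {capable:.2f} Gb/s")
print(f"trained: {trained:.2f} Gb/s")
print(f"ratio:   {capable / trained:.0f}x")
```

The same formula gives 4 Gb/s for a 5.0 GT/s x1 link, so even at the Pi's full Gen2 speed the card would see only a small fraction of the bandwidth it was designed for.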

geerlingguy commented on May 5, 2024
[    1.257396] brcm-pcie fd500000.pcie: host bridge /scb/pcie@7d500000 ranges:
[    1.260080] brcm-pcie fd500000.pcie:   No bus range found for /scb/pcie@7d500000, using [bus 00-ff]
[    1.262848] brcm-pcie fd500000.pcie:      MEM 0x0600000000..0x063fffffff -> 0x00c0000000
[    1.265605] brcm-pcie fd500000.pcie:   IB MEM 0x0000000000..0x00ffffffff -> 0x0200000000
[    1.314966] brcm-pcie fd500000.pcie: link up, 5.0 GT/s PCIe x1 (SSC)
[    1.317945] brcm-pcie fd500000.pcie: PCI host bridge to bus 0000:00
[    1.320502] pci_bus 0000:00: root bus resource [bus 00-ff]
[    1.323035] pci_bus 0000:00: root bus resource [mem 0x600000000-0x63fffffff] (bus address [0xc0000000-0xffffffff])
[    1.325717] pci 0000:00:00.0: [14e4:2711] type 01 class 0x060400
[    1.328558] pci 0000:00:00.0: PME# supported from D0 D3hot
[    1.334679] pci 0000:00:00.0: bridge configuration invalid ([bus ff-ff]), reconfiguring
[    1.337626] pci 0000:01:00.0: [10de:1380] type 00 class 0x030000
[    1.340201] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00ffffff]
[    1.342879] pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x0fffffff 64bit pref]
[    1.345534] pci 0000:01:00.0: reg 0x1c: [mem 0x00000000-0x01ffffff 64bit pref]
[    1.348120] pci 0000:01:00.0: reg 0x24: [io  0x0000-0x007f]
[    1.350701] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
[    1.353566] pci 0000:01:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0000:00:00.0 (capable of 126.016 Gb/s with 8.0 GT/s PCIe x16 link)
[    1.356323] pci 0000:01:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[    1.359054] pci 0000:01:00.1: [10de:0fbc] type 00 class 0x040300
[    1.361747] pci 0000:01:00.1: reg 0x10: [mem 0x00000000-0x00003fff]
[    1.368030] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
[    1.370696] pci 0000:00:00.0: BAR 9: assigned [mem 0x600000000-0x617ffffff 64bit pref]
[    1.373239] pci 0000:00:00.0: BAR 8: assigned [mem 0x618000000-0x6197fffff]
[    1.375813] pci 0000:01:00.0: BAR 1: assigned [mem 0x600000000-0x60fffffff 64bit pref]
[    1.378416] pci 0000:01:00.0: BAR 3: assigned [mem 0x610000000-0x611ffffff 64bit pref]
[    1.380948] pci 0000:01:00.0: BAR 0: assigned [mem 0x618000000-0x618ffffff]
[    1.383428] pci 0000:01:00.0: BAR 6: assigned [mem 0x619000000-0x61907ffff pref]
[    1.385888] pci 0000:01:00.1: BAR 0: assigned [mem 0x619080000-0x619083fff]
[    1.388276] pci 0000:01:00.0: BAR 5: no space for [io  size 0x0080]
[    1.390562] pci 0000:01:00.0: BAR 5: failed to assign [io  size 0x0080]
[    1.392998] pci 0000:00:00.0: PCI bridge to [bus 01]
[    1.395355] pci 0000:00:00.0:   bridge window [mem 0x618000000-0x6197fffff]
[    1.397746] pci 0000:00:00.0:   bridge window [mem 0x600000000-0x617ffffff 64bit pref]
[    1.400377] pci 0000:01:00.1: D0 power state depends on 0000:01:00.0

That new BAR space increase must've landed in the kernel I built, because I didn't have to manually tweak the BAR space anymore!
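
To see how much room the card actually needs, the ranges in the log can be added up. The ranges are inclusive, so each size is end - start + 1. This is a small sketch with values copied from the dmesg output above; the helper and variable names are illustrative:

```python
# Check that the BARs assigned in the boot log fit inside the host bridge
# memory window. All ranges are copied from the dmesg output; they are
# inclusive, so size = end - start + 1.

def size(start: int, end: int) -> int:
    return end - start + 1


WINDOW = (0x600000000, 0x63FFFFFFF)   # root bus resource [mem ...]

ASSIGNED = {
    "01:00.0 BAR 1": (0x600000000, 0x60FFFFFFF),
    "01:00.0 BAR 3": (0x610000000, 0x611FFFFFF),
    "01:00.0 BAR 0": (0x618000000, 0x618FFFFFF),
    "01:00.0 BAR 6": (0x619000000, 0x61907FFFF),
    "01:00.1 BAR 0": (0x619080000, 0x619083FFF),
}

MIB = 1024 * 1024
window_mib = size(*WINDOW) // MIB
total_kib = sum(size(*r) for r in ASSIGNED.values()) // 1024

for name, (start, end) in ASSIGNED.items():
    inside = WINDOW[0] <= start and end <= WINDOW[1]
    print(f"{name}: {size(start, end) // 1024} KiB, inside window: {inside}")

print(f"window: {window_mib} MiB, assigned: {total_kib} KiB")
```

The window works out to 1024 MiB, and the five memory BARs together need a little over 304 MiB, with the 256 MiB prefetchable BAR 1 being the largest. Only the I/O BAR (BAR 5) failed to get space.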

geerlingguy commented on May 5, 2024

Downloading the proprietary Nvidia AARCH64 Driver first: https://www.nvidia.com/en-us/drivers/unix/linux-aarch64-archive/

$ sudo ./NVIDIA-Linux-aarch64-460.27.04.run
ERROR: Unable to find the kernel source tree for the currently running kernel...

But it exists, inside /usr/src/linux-headers-5.10.1-v8+ (in my case). So I ran:

$ sudo ./NVIDIA-Linux-aarch64-460.27.04.run --kernel-source-path /usr/src/linux-headers-5.10.1-v8+

(Just noting that I had previously run the gist to compile kernel headers for 64-bit Pi OS as directed in this comment: #40 (comment)).
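
The path passed to --kernel-source-path follows the running kernel's release string. Here is a small sketch, assuming headers live under /usr/src/linux-headers-<release> as in the path above. headers_path is a hypothetical helper, and other distributions may use a different layout:

```python
import platform
from pathlib import Path


def headers_path(release: str) -> Path:
    """Build the expected kernel headers directory for a release string.

    Assumes the /usr/src/linux-headers-<release> layout seen above.
    """
    return Path("/usr/src") / f"linux-headers-{release}"


# platform.release() returns the same string as `uname -r`.
path = headers_path(platform.release())
print(path, "(exists)" if path.is_dir() else "(missing)")
```

If the directory is reported missing, the installer will most likely fail with the same "Unable to find the kernel source tree" error until matching headers are installed.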

geerlingguy commented on May 5, 2024

Full log with the errors in the build: https://gist.github.com/geerlingguy/33539fd16a1b2ec7cabc6d86d0e75cd9

geerlingguy commented on May 5, 2024

Trying a cross-compile with the Nouveau driver (inside menuconfig, under Device Drivers > Graphics support).

After copying the files, I created /etc/modprobe.d/blacklist-nouveau.conf with the contents:

blacklist nouveau

And after reboot I'll see what happens when I try loading the module with sudo modprobe nouveau.

geerlingguy commented on May 5, 2024
$ dmesg --follow
[   57.389199] pci 0000:00:00.0: enabling device (0000 -> 0002)
[   57.389235] nouveau 0000:01:00.0: enabling device (0000 -> 0002)
[   57.389369] nouveau 0000:01:00.0: NVIDIA GM107 (117000a2)
[   57.651947] nouveau 0000:01:00.0: bios: version 82.07.55.00.29
[   59.761566] nouveau 0000:01:00.0: fb: 2048 MiB GDDR5
[   59.761591] nouveau 0000:01:00.0: bus: MMIO read of 00000000 FAULT at 3e6684 [ IBUS ]
...

And as with all the other video cards, the entire Pi completely locks up at this point and is unresponsive to input or to any remote commands. Even the little flashing cursor at the CLI prompt stops flashing: a complete system halt.

geerlingguy commented on May 5, 2024

Someone else with that exact same fault: https://bugzilla.kernel.org/show_bug.cgi?id=202731 — but I'm guessing it could be like with the Radeon 5350, where it's actually failing somewhere else.

geerlingguy commented on May 5, 2024

Also in https://bugs.launchpad.net/nouveau/+bug/1684123 (migrated to https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/-/issues/335).

geerlingguy commented on May 5, 2024

Trying now with a powered external riser. And a note:

  • I tried the PCE164P-NO3 ver 888, and it resulted in a kernel panic (no boot). Tried 5x before giving up.
  • I tried the PCE164P-NO6 ver 008S, and it resulted in kernel panics (no boot). Tried 3x.
  • I tried the PCE164P-NO3 ver 006, and it resulted in kernel panics if I used the USB 3 cable it came with, but if I swapped in the beefier USB 3 cable that came with my ver 888 board, it booted.

Go figure. Cheap junk doesn't work wonderfully. And the 888 board somehow fried my 2.5G network adapter yesterday; I wish I had that on video. Lots of smoke!

geerlingguy commented on May 5, 2024

With the ver 006 riser, after I run sudo modprobe nouveau, I get:

[  172.199242] pci 0000:00:00.0: enabling device (0000 -> 0002)
[  172.199279] nouveau 0000:01:00.0: enabling device (0000 -> 0002)
[  172.199411] nouveau 0000:01:00.0: NVIDIA GM107 (117000a2)
[  172.461834] nouveau 0000:01:00.0: bios: version 82.07.55.00.29

And then the system freezes completely. Watching with an external display on the console, I do see a full kernel panic. How can I get that output in text form? Here's what it looks like on the diminutive screen I have hooked up:

IMG_3132

Edit: I was hopeful I could use the Ubuntu Linux Crash Dump Guide, but that package is not available on the Pi.

6by9 commented on May 5, 2024

> Watching with an external display on the console, I do see a full kernel panic. How can I get that output in text form?

Use a UART adapter like https://www.amazon.co.uk/Serial-Converter-Adapter-Prolific-Windows-Black/dp/B08DKM6Q63/ref=sr_1_3 on GPIOs 14 & 15 + GND, configured as a console (use raspi-config).
It does depend on how quickly the full kernel gets killed, as it takes a little while to dribble everything out at 115200 baud.

There is also a kernel config option NOUVEAU_DEBUG that can be set with menuconfig or similar. Crank it up before doing your cross-compile to get lots of debug out. That was the next step I was intending to do in my conversation on the nouveau kernel mailing list, but other things became the priority :-/

geerlingguy commented on May 5, 2024

@6by9 - Thanks; it seems like that's the long-term route to go, but seeing as I don't have one of those cables handy and have everything plugged in today, I was thinking maybe I could use netconsole...

I added the following in /boot/cmdline.txt:

And I made sure my Mac (142 / target) was running netcat on UDP port 6666:

nc -u -l 6666

The Pi would boot, but I don't see any output making its way to my Mac. I confirmed if I ran nc -u 10.0.100.142 6666 on the Pi directly and typed in text, it made it across to my Mac.

Is netconsole not supported on Pi OS?

I'll also crank up NOUVEAU_DEBUG and see what that gets me.
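
For reference, the nc -u -l 6666 listener can also be written in a few lines of Python, which makes it easy to timestamp or save whatever arrives. This is a sketch of the receiving side only: it assumes the Pi is sending UDP datagrams to port 6666 and does nothing to explain why none arrived. The function names are made up for the example:

```python
import socket


def open_listener(host: str = "0.0.0.0", port: int = 6666) -> socket.socket:
    """Bind a UDP socket, like `nc -u -l 6666`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive(sock: socket.socket, timeout: float = 5.0) -> str:
    """Wait for one datagram and return it as text ('' on timeout)."""
    sock.settimeout(timeout)
    try:
        data, _sender = sock.recvfrom(65535)
    except socket.timeout:
        return ""
    return data.decode("utf-8", errors="replace")
```

To watch for messages, call open_listener() once and then print the result of receive() in a loop. Since the manual nc test from the Pi worked, a listener like this should show that traffic too, which would point at the netconsole configuration rather than the network.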

geerlingguy commented on May 5, 2024

Set debug levels to max:

Screen Shot 2020-12-23 at 12 37 23 PM

Pushing the code over to the Pi now...

geerlingguy commented on May 5, 2024

Here's the entire dump of NOUVEAU output: https://gist.github.com/geerlingguy/7a021f3fecf198bf7020b85244e772ee

But I suspect it just got cut off at the kernel panic with more output queued up in a buffer like the Radeon did when we were debugging it. It seems like it was in the middle of the devinit process, as the last main section was:

[  182.675218] nouveau 0000:01:00.0: sw: preinit running...
[  182.675223] nouveau 0000:01:00.0: sw: preinit completed in 0us
[  182.675233] nouveau 0000:01:00.0: devinit: running init tables
...

Each time I try, it gets to a different line before the output is cut off.

@6by9 - I ordered Adafruit's serial cable from Amazon instead of direct, just because Amazon ships tomorrow :D

geerlingguy commented on May 5, 2024

Received the ADAFRUIT Industries 954 USB-to-TTL Serial Cable today from Amazon (less than 18 hours after ordering it!).

  1. Install the Silicon Labs driver for macOS.

  2. Connect the cable to the outside GPIO pins: black to ground (3rd pin), white to UART0_TXD (4th pin), green to UART0_RXD (5th pin).

  3. Connect the USB end to your Mac.

  4. Add enable_uart=1 to the bottom of /boot/config.txt, and reboot.

  5. Check serial ports on the Mac:

    $ ls /dev/cu.*
    /dev/cu.Bluetooth-Incoming-Port /dev/cu.SLAB_USBtoUART          /dev/cu.usbserial-0001
    
  6. Connect using screen: screen /dev/cu.SLAB_USBtoUART 115200

  7. Start working on the Pi like you would over SSH or via keyboard locally.

  8. (Hit CTRL+a and then 'k' to kill the session, or 'd' to just detach but leave it running.)

In my case, I ran: sudo modprobe nouveau, and this is the result:

[  113.322557] SError Interrupt on CPU1, code 0xbf000002 -- SError
[  113.322559] CPU: 1 PID: 598 Comm: modprobe Tainted: G         C        5.10.2-v8+ #1
[  113.322561] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
[  113.322563] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
[  113.322564] pc : init_rd32+0xf8/0x330 [nouveau]
[  113.322566] lr : init_rd32+0x58/0x330 [nouveau]
[  113.322567] sp : ffffffc011f8b350
[  113.322568] x29: ffffffc011f8b350 x28: ffffff8042889000 
[  113.322573] x27: ffffff804a06c238 x26: ffffffc011f8b808 
[  113.322577] x25: 0000000000000002 x24: 0000000000021328 
[  113.322580] x23: 0000000000000008 x22: ffffff8045ed3800 
[  113.322583] x21: ffffff8044c89400 x20: ffffffc011f8b4c0 
[  113.322586] x19: 0000000000021328 x18: 0000000000000030 
[  113.322590] x17: 0000000000000000 x16: 0000000000000000 
[  113.322593] x15: ffffffffffffffff x14: 3078302026205d38 
[  113.322596] x13: 323331323078305b x12: ffffffc0112a2ff8 
[  113.322599] x11: 0000000000000003 x10: ffffffc01125a6b8 
[  113.322602] x9 : ffffffc0091fd018 x8 : 0000000000005c88 
[  113.322606] x7 : c0000000fffff3db x6 : ffffffc011f8af60 
[  113.322609] x5 : ffffff80fb7a58e0 x4 : 0000000000000000 
[  113.322612] x3 : 0000000000000000 x2 : 00000000deaddead 
[  113.322615] x1 : ffffffc015000000 x0 : ffffffc015021328 
[  113.322619] Kernel panic - not syncing: Asynchronous SError Interrupt
[  113.322621] CPU: 1 PID: 598 Comm: modprobe Tainted: G         C        5.10.2-v8+ #1
[  113.322622] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
[  113.322623] Call trace:
[  113.322625]  dump_backtrace+0x0/0x1b8
[  113.322626]  show_stack+0x20/0x70
[  113.322627]  dump_stack+0xf0/0x158
[  113.322629]  panic+0x18c/0x38c
[  113.322630]  nmi_panic+0x6c/0xa0
[  113.322631]  arm64_serror_panic+0x7c/0x90
[  113.322632]  do_serror+0x38/0x98
[  113.322634]  el1_error+0x84/0x104
[  113.322635]  init_rd32+0xf8/0x330 [nouveau]
[  113.322637]  init_condition_met+0xc0/0x150 [nouveau]
[  113.322638]  init_condition+0x64/0xe0 [nouveau]
[  113.322640]  nvbios_exec+0x5c/0x120 [nouveau]
[  113.322641]  init_sub_direct+0x98/0x160 [nouveau]
[  113.322642]  nvbios_exec+0x5c/0x120 [nouveau]
[  113.322644]  nvbios_post+0xac/0x180 [nouveau]
[  113.322645]  nv04_devinit_post+0x1c/0x28 [nouveau]
[  113.322647]  nvkm_devinit_post+0x40/0xb8 [nouveau]
[  113.322648]  nvkm_device_init+0xd4/0x230 [nouveau]
[  113.322649]  nvkm_udevice_init+0x68/0xa0 [nouveau]
[  113.322651]  nvkm_object_init+0x64/0x198 [nouveau]
[  113.322652]  nvkm_ioctl_new+0x1a4/0x288 [nouveau]
[  113.322653]  nvkm_ioctl+0xd4/0x278 [nouveau]
[  113.322655]  nvkm_client_ioctl+0x18/0x28 [nouveau]
[  113.322656]  nvif_object_ioctl+0x5c/0x70 [nouveau]
[  113.322658]  nvif_object_ctor+0xcc/0x160 [nouveau]
[  113.322659]  nvif_device_ctor+0x30/0x78 [nouveau]
[  113.322660]  nouveau_cli_init+0x168/0x568 [nouveau]
[  113.322662]  nouveau_drm_device_init+0x88/0x898 [nouveau]
[  113.322663]  nouveau_drm_probe+0x15c/0x1f8 [nouveau]
[  113.322665]  pci_device_probe+0xc0/0x190
[  113.322666]  really_probe+0xec/0x3b8
[  113.322667]  driver_probe_device+0x60/0xc0
[  113.322669]  device_driver_attach+0x7c/0x88
[  113.322670]  __driver_attach+0x60/0xe8
[  113.322671]  bus_for_each_dev+0x7c/0xd0
[  113.322672]  driver_attach+0x2c/0x38
[  113.322674]  bus_add_driver+0x194/0x1f8
[  113.322675]  driver_register+0x6c/0x128
[  113.322676]  __pci_register_driver+0x4c/0x58
[  113.322678]  nouveau_drm_init+0x180/0x1000 [nouveau]
[  113.322679]  do_one_initcall+0x54/0x2c8
[  113.322680]  do_init_module+0x60/0x240
[  113.322682]  load_module+0x1f20/0x2160
[  113.322683]  __do_sys_finit_module+0xbc/0xf8
[  113.322684]  __arm64_sys_finit_module+0x28/0x38
[  113.322686]  el0_svc_common.constprop.2+0x9c/0x1a8
[  113.322687]  do_el0_svc+0x2c/0x98
[  113.322688]  el0_svc+0x20/0x30
[  113.322690]  el0_sync_handler+0x90/0xb8
[  113.322691]  el0_sync+0x158/0x180
[  113.322709] SMP: stopping secondary CPUs
[  113.322710] Kernel Offset: disabled
[  113.322712] CPU features: 0x0240022,61002000
[  113.322713] Memory Limit: none
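
A few registers in that dump can be read directly. Below is a sketch of the arithmetic. The interpretation is an assumption, not something the log states: x1 looks like the base of the card's MMIO mapping and x0 the address being read, while x13 and x14 appear to hold ASCII text from a debug message that was being formatted.

```python
# Values copied from the register dump in the panic above.
x0 = 0xFFFFFFC015021328
x1 = 0xFFFFFFC015000000
x13 = 0x323331323078305B
x14 = 0x3078302026205D38

# Offset of the access relative to x1 (assumed to be the MMIO base).
offset = x0 - x1
print(f"offset: {offset:#08x}")

# The kernel runs little-endian, so the lowest byte is the first character.
text = (x13.to_bytes(8, "little") + x14.to_bytes(8, "little")).decode("ascii")
print(f"text:   {text!r}")
```

The offset 0x21328 also shows up in x19 and x24, and the decoded text names the same register, which suggests init_rd32 was reading register 0x021328 when the SError arrived. x2 holds 0xdeaddead, though the log does not show where that value came from.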

geerlingguy commented on May 5, 2024

Digging around quite a bit, I found little clues in a few places:

Glancing especially at that last result, I tried setting nouveau.runpm=0 in /boot/cmdline.txt, and rebooted, but same result. Seems like init_rd32 is the final culprit?

static u32
init_rd32(struct nvbios_init *init, u32 reg)
{
	struct nvkm_device *device = init->subdev->device;
	reg = init_nvreg(init, reg);
	if (reg != ~0 && init_exec(init))
		return nvkm_rd32(device, reg);
	return 0x00000000;
}

From: https://github.com/raspberrypi/linux/blob/rpi-5.10.y/drivers/gpu/drm/nouveau/nvkm/subdev/bios/init.c#L181-L189

geerlingguy commented on May 5, 2024

I might also try the proprietary driver once more for fun on the latest precompiled 64-bit kernel, since kernel headers are now available via package and don't have to be built from source.

jamesy0ung commented on May 5, 2024

> I might also try the proprietary driver once more for fun on the latest precompiled 64-bit kernel, since kernel headers are now available via package and don't have to be built from source.

Hey, how is it going? I have a GT 730 from the same generation (last time I checked), and I'm hoping it will work.

geerlingguy commented on May 5, 2024

Finding similar reports here: gnab/rtl8812au#92, and here: raspberrypi/linux#3222

Though it could be a variety of things, I really think the driver is doing something like memcpy() and it's breaking similarly to how it broke on the RX 550 and on the Broadcom MegaRAID. We got the latter fixed/patched to work around the memory addressing limitations on 64-bit Pi OS and a PCIe bug (see: raspberrypi/linux#4158), though I'm not sure if the PCIe thing is the problem here.

geerlingguy commented on May 5, 2024

From @uruhakomachin on Twitter:

@geerlingguy I had some tries using Nvidia graphics cards with the official arm64 driver. I had the same/quite similar error as you got, RmInitAdapter failed! (0x25:0x54:1262).

So I checked the rm_init_adapter function, and it seemed that the driver tried to init the card with some probably x86 code. 0x400000 is the usual base address for 32-bit applications. Given other asm code here, this func seems to save some regs and then load/exec some init code from the graphics card.

Also, this kind of error has been existed for a long time. Probably have to wait for NVIDIA to fix it...

For now, I think open source drivers have more chances to succeed

uruhakomachin_2021-Apr-24

geerlingguy commented on May 5, 2024

Going to see what happens with nouveau on Pi 5.

The proprietary drivers weren't built for arm64 (only amd64) back when this card was supported...

geerlingguy commented on May 5, 2024
pi@pi5:~ $ dmesg | grep nouveau
[    3.121503] nouveau 0000:01:00.0: enabling device (0000 -> 0002)
[    3.121599] nouveau 0000:01:00.0: unknown chipset (ffffffff)
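
For context on the two IDs nouveau printed in this thread, here is a sketch of how the value in parentheses splits into chipset and revision. The bit masks are a recollection of nouveau's device probe logic for NV10 and later cards, so treat them as an assumption. split_boot0 is a made-up name:

```python
# Split the 32-bit ID nouveau prints, e.g. "NVIDIA GM107 (117000a2)".
# The masks below are assumed from nouveau's probe logic, not taken from
# this thread.

def split_boot0(boot0: int) -> tuple:
    """Return (chipset, revision) from the raw ID register value."""
    chipset = (boot0 & 0x1FF00000) >> 20
    revision = boot0 & 0xFF
    return chipset, revision


for raw in (0x117000A2, 0xFFFFFFFF):
    chipset, revision = split_boot0(raw)
    print(f"{raw:08x}: chipset {chipset:#05x}, revision {revision:#04x}")
```

On the CM4 the value split into chipset 0x117 (GM107) and revision 0xa2, matching the "rev a2" in the lspci output. A value of ffffffff is what a PCI read typically returns when the device does not respond, so on the Pi 5 nouveau may not have been reading the card's registers at all.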