geerlingguy commented on May 5, 2024

@darkbasic - It's a hardware issue with the BCM2711 PCIe implementation that can't be changed, however it could be possible to work around the problem in software.

As with the MegaRAID driver, the problem stems from the fact that 64-bit PCIe accesses expect certain things to work certain ways—and they do, on other ARM64 devices, and on Intel/AMD64—but they don't work at all and crash the Pi. So in software you kind of have to hack around things just for the Pi if you want them to work on the Pi.

So far this seems to affect GPUs, Coral TPUs, and storage controllers the most—some of the newer or more advanced/complex cards.

from raspberry-pi-pcie-devices.

geerlingguy commented on May 5, 2024

Just wanted to update here that I have been able to get much further on a Raspberry Pi 5; see https://www.jeffgeerling.com/blog/2023/testing-pcie-on-raspberry-pi-5

Right now it seems like a clock sync/retiming issue with the PCIe bus. I keep getting these messages:

[  362.772240] pcieport 0000:00:00.0: AER: Corrected error received: 0000:00:00.0
[  362.772252] pcieport 0000:00:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[  362.772255] pcieport 0000:00:00.0:   device [14e4:2712] error status/mask=00001000/00002000
[  362.772258] pcieport 0000:00:00.0:    [12] Timeout               
[  372.628183] pcieport 0000:00:00.0: AER: Corrected error received: 0000:00:00.0
[  372.628199] pcieport 0000:00:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[  372.628204] pcieport 0000:00:00.0:   device [14e4:2712] error status/mask=00001000/00002000
[  372.628209] pcieport 0000:00:00.0:    [12] Timeout               
[  373.268131] apex 0000:01:00.0: RAM did not enable within timeout (12000 ms)
[  373.268141] apex 0000:01:00.0: Error in device open cb: -110
[  373.268160] apex 0000:01:00.0: Apex performance not throttled due to temperature

One Pi engineer said a prototype Pi 5 board with a different PCIe connection worked with his Coral TPU, but he hasn't gotten it working with the standard FPC connection (there are timeouts when you try using the hardware).

Perhaps a better FFC will fix the issue; we'll see!

geerlingguy commented on May 5, 2024

Blog post with full summary: A PCIe Coral TPU FINALLY works on Raspberry Pi 5.

geerlingguy commented on May 5, 2024

Following the default instructions for setting up the Coral M.2 PCIe card, I get the following kernel panic when the device manager starts:

[Photo of the kernel panic: IMG_3633]

(No problem prior to installing the Coral packages gasket-dkms and libedgetpu1-std.)

geerlingguy commented on May 5, 2024

@StuartIanNaylor - I have tried at PCIe Gen 3.0, 2.0, and 1.0, and have encountered the exact same issue on all three speeds.

geerlingguy commented on May 5, 2024

Just as a note, I have heard the final version of the FFC (the little flat cable that goes from Pi 5 to a HAT or expansion board) will be impedance-controlled—the one I'm using in my testing is not.

It seems like that may solve a lot of the little issues I'm seeing, especially with the Coral and SATA storage controllers.

StuartIanNaylor commented on May 5, 2024

Radxa likely did the same and then swapped it out. The symptoms were similar: faults occurred under load, but everything seemed OK at startup.

mikegapinski commented on May 5, 2024

@geerlingguy can you add the verbose flag to the example? Mirek has our Coral so I won't be able to test until next week.

from pycoral.pybind._pywrap_coral import SetVerbosity as set_verbosity
set_verbosity(10)

If it's anything mmap-related, just use sudo. I ended up not using Docker, since I wanted to avoid any potential issues from it. You'd need to purge the Python that comes with Raspbian and manually install version 3.9. After that it works normally.

geerlingguy commented on May 5, 2024

Tried 4k page sizes by switching from my custom kernel to kernel8. Edited /boot/config.txt and added kernel=kernel8.img. Rebooted, verified I was on the 4k kernel:

pi@pi5:~ $ uname -a
Linux pi5 6.1.0-rpi4-rpi-v8 #1 SMP PREEMPT Debian 1:6.1.54-1+rpt2 (2023-10-05) aarch64 GNU/Linux
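uname -a identifies the kernel, but it does not print the page size itself. A minimal Python sketch for confirming the 4K-vs-16K question directly, assuming Python 3 on the Pi (the shell equivalent is getconf PAGESIZE). The helper names are hypothetical; the labels reflect the general knowledge that kernel8.img uses 4K pages and the Pi 5's default kernel_2712.img uses 16K pages.

```python
import mmap
import os


def page_size_bytes() -> int:
    """Return the page size of the running kernel, in bytes."""
    # mmap.PAGESIZE is the granularity the kernel uses for memory mappings.
    return mmap.PAGESIZE


def describe_page_size(size: int) -> str:
    """Map a page size to a short label (4K: kernel8.img, 16K: kernel_2712.img)."""
    labels = {4096: "4K pages", 16384: "16K pages", 65536: "64K pages"}
    return labels.get(size, f"{size}-byte pages")


if __name__ == "__main__":
    size = page_size_bytes()
    # os.sysconf reports the same value through a different API; a cheap cross-check.
    assert size == os.sysconf("SC_PAGE_SIZE")
    print(describe_page_size(size))
```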

Then I rebuilt the DKMS modules using the one-liner: ls /var/lib/initramfs-tools | sudo xargs -n1 /usr/lib/dkms/dkms_autoinstaller start (without this, I don't have the kernel module for apex available).
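The one-liner runs DKMS's autoinstaller once for every kernel version listed in /var/lib/initramfs-tools. A Python sketch of the same loop, for readers who prefer to see the steps spelled out; the two paths are copied from the one-liner, the function names are hypothetical, and it defaults to a dry run so nothing executes unless you ask for it.

```python
import os
import subprocess
from typing import List

INITRAMFS_DIR = "/var/lib/initramfs-tools"  # one entry per installed kernel version
AUTOINSTALLER = "/usr/lib/dkms/dkms_autoinstaller"


def autoinstall_commands(kernel_versions: List[str]) -> List[List[str]]:
    """Build one 'dkms_autoinstaller start <version>' command per kernel version."""
    return [[AUTOINSTALLER, "start", version] for version in sorted(kernel_versions)]


def rebuild_dkms(dry_run: bool = True) -> List[List[str]]:
    """Rebuild DKMS modules (such as gasket/apex) for every installed kernel."""
    commands = autoinstall_commands(os.listdir(INITRAMFS_DIR))
    for cmd in commands:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            # Needs root, like the sudo in the original one-liner.
            subprocess.run(cmd, check=True)
    return commands
```

Calling rebuild_dkms() first prints the commands; rebuild_dkms(dry_run=False) under sudo would actually run them.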

Rebooted, and verified /dev/apex_0 was present (and dmesg logs are same, no warnings about interrupts).

Ran docker container and...

pi@pi5:~ $ sudo docker run -it --device /dev/apex_0:/dev/apex_0 coral /bin/bash
root@20dff9d9ed96:~# python3 /usr/share/edgetpu/examples/classify_image.py --model /usr/share/edgetpu/examples/models/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite --label /usr/share/edgetpu/examples/models/inat_bird_labels.txt --image /usr/share/edgetpu/examples/images/bird.bmp
---------------------------
Poecile atricapillus (Black-capped Chickadee)
Score :  0.44140625
---------------------------
Poecile carolinensis (Carolina Chickadee)
Score :  0.29296875

IT WORKS!!!

mikegapinski commented on May 5, 2024

KUDOS!!!!!!!!!

I knew it was something with the newer kernel; the old one was a 4K one too.

timonsku commented on May 5, 2024

No worries, glad to see more people having an interest in seeing this working :)

Valdiolus commented on May 5, 2024

Great, looking forward to solving this issue in my project!

geerlingguy commented on May 5, 2024

It has arrived! (Picture uploaded).

I also got a couple other goodies today, though, so I'm going to have to wait to start testing it at least a couple days :( (ah, if only time were infinite!).

Earnest-Williams commented on May 5, 2024

Surely time is infinite in Catholic canon. ;)

geerlingguy commented on May 5, 2024

@Geofferic - Haha, well yes... but I'm imagining if I can make it to the point where it is indeed infinite—I don't think testing a Coral TPU is going to be my highest priority 😆

geerlingguy commented on May 5, 2024
$ sudo lspci -vvvv -d 1ac1:089a
05:00.0 System peripheral: Device 1ac1:089a (prog-if ff)
	Subsystem: Device 1ac1:089a
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 0
	Region 0: Memory at 600900000 (64-bit, prefetchable) [disabled] [size=16K]
	Region 2: Memory at 600800000 (64-bit, prefetchable) [disabled] [size=1M]
	Capabilities: [80] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 25.000W
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #1, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s, Width x1, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [d0] MSI-X: Enable- Count=128 Masked-
		Vector table: BAR=2 offset=00046800
		PBA: BAR=2 offset=00046068
	Capabilities: [e0] MSI: Enable- Count=1/32 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [f8] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [100 v1] Vendor Specific Information: ID=1556 Rev=1 Len=008 <?>
	Capabilities: [108 v1] Latency Tolerance Reporting
		Max snoop latency: 0ns
		Max no snoop latency: 0ns
	Capabilities: [110 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
			   T_CommonMode=0us LTR1.2_Threshold=0ns
		L1SubCtl2: T_PwrOn=10us
	Capabilities: [200 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr+ BadTLP- BadDLLP+ Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-

geerlingguy commented on May 5, 2024

(Also posted this over to google-coral/edgetpu#280 (comment)).

geerlingguy commented on May 5, 2024

Latest update from someone over in the Coral issue queue:

For now, the plan is to wait until the office is open so we can use a PCIe analyzer and confirm this hypothesis. But there doesn't appear to be any additional changes that we can do in SW - the device expecting a host to be able to perform 64-bit read/write is built into the hardware.

USB is still the recommendation for the CM4. USB2.0 is possible out of box, and USB3.0 may be possible although extra design considerations are required (more info here: https://coral.ai/products/accelerator-module/).

It looks doubtful (though still in the realm of possibility) the PCI Express version of the Coral TPU will work on the current generation of Raspberry Pi. Though I still wonder if it's a similar issue to the 64/32-bit discrepancy that Broadcom had to work around for the MegaRAID card.

If so, there's a possibility the driver could add a one-off to work around the PCIe limitation on the Pi 4, but it would be much nicer for Pi OS or the firmware to somehow make it work more as expected :P

StuartIanNaylor commented on May 5, 2024

@geerlingguy the Waveshare carriers have an M.2 slot, which should fit a Coral B+M.

I'm confused by the '64/32-bit discrepancy': why use 32-bit Raspberry Pi OS? And what is the PCIe limitation of the Pi 4?

geerlingguy commented on May 5, 2024

@StuartIanNaylor - Right now the Coral drivers don't seem to work on either 32-bit or 64-bit Pi OS. But note that the OS type is not always related to the weird issues you get on the PCI Express bus (though for some things, the 32-bit Pi OS behaves more consistently).

wbreiler commented on May 5, 2024

@geerlingguy has this been resolved or is this still an issue?

darkbasic commented on May 5, 2024

AFAIK it's a hardware issue which cannot be solved.

wbreiler commented on May 5, 2024

Ah, alright

darkbasic commented on May 5, 2024

That would be awesome, I'd love to be able to use an M.2 Coral TPU.

grigio commented on May 5, 2024

Which SoC or motherboard is recommended for the Google Coral TPU M.2 Accelerator (M.2 E-key)?

Valdiolus commented on May 5, 2024

Which SoC or motherboard is recommended for the Google Coral TPU M.2 Accelerator (M.2 E-key)?

Try NXP, like the one on the Coral Dev Board. But the cost is quite high.

vukitoso commented on May 5, 2024

Hello. Do you use heatsinks and fans to cool the power IC (PMIC) and the Edge TPU?
The datasheets say that cooling is necessary:
https://coral.ai/docs/m2/datasheet/
https://coral.ai/docs/m2-dual-edgetpu/datasheet/

If I plan to use the Coral Edge TPU M.2 for 24/7 operation, will I need cooling?

aiden1989acw commented on May 5, 2024

I'd recommend following the description in section 5.3 of your first linked datasheet, possibly cooling both the lower and upper sides of the TPU for maximum performance, especially since you want to run it 24/7. Also take special note of possible transient current spikes of up to 3A.

vukitoso commented on May 5, 2024

@aiden1989acw, thx.

spikes in power up to 3A

For M.2 boards, that's at 3.3V:

https://coral.ai/docs/m2/datasheet/ - 3.2 Power consumption

Although the average current drawn from the 3.3V supply is typically less than 500 mA, brief current transients that occur during inferencing can reach roughly 3 A. These spikes also occur suddenly: even a simple model can generate current transients in excess of 1 A/μs. However, these numbers are representative of only the models tested at Google, and your numbers will vary. To determine the actual peak supply current, you should observe the current when running the models you will deploy in production.
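Converting those datasheet currents into watts (P = V × I) shows how far the peaks sit above the average. A small Python sketch; the 3.3 V, 500 mA and 3 A figures come from the quoted datasheet text, and the rest is plain arithmetic (it prints average < 1.65 W, peaks ~ 9.9 W (6x)).

```python
def power_watts(volts: float, amps: float) -> float:
    """Electrical power P = V * I."""
    return volts * amps


SUPPLY_V = 3.3   # M.2 module supply rail, per the datasheet
AVERAGE_A = 0.5  # "typically less than 500 mA"
PEAK_A = 3.0     # transients "can reach roughly 3 A"

average_w = power_watts(SUPPLY_V, AVERAGE_A)
peak_w = power_watts(SUPPLY_V, PEAK_A)
print(f"average < {average_w:.2f} W, peaks ~ {peak_w:.1f} W ({PEAK_A / AVERAGE_A:.0f}x)")
```

The peak figure is what the supply path has to deliver instantaneously, which is why the datasheet asks you to measure with your own production models.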

geerlingguy commented on May 5, 2024

There is still hope, however small: google-coral/edgetpu#280 (comment)

jveitchmichaelis commented on May 5, 2024

Which SoC or motherboard is recommended for the Google Coral TPU M.2 Accelerator (M.2 E-key)?

@grigio You can use the Jetson Nano with the M.2 module, see setup notes here: https://github.com/jveitchmichaelis/edgetpu-yolo/blob/main/hardware.md It works pretty well!

UcefMountacer commented on May 5, 2024

What about this?
https://pipci.jeffgeerling.com/boards_cm/tinycar-cm4-markus-kasten.html

grigio commented on May 5, 2024

@jveitchmichaelis it's out of stock in Europe

jveitchmichaelis commented on May 5, 2024

Yes unfortunately almost all the popular dev boards are out of stock due to the chip shortage/supply chain issues. If you just want to play with the module then the Coral Dev Board Minis are more available.

StuartIanNaylor commented on May 5, 2024

@geerlingguy Is that PCIe 3.0 or 2.0? Do you not need to set it back to 2.0?
Radxa tried an FPC on the Rock Pi 4 at first, but that was PCIe 2.0 x4, and it had the same problem until they made a revision.

geerlingguy commented on May 5, 2024

I have a somewhat cleaner PCIe signal now (more on that later), but I'm still seeing this when I try using the Coral:

[  337.156485] apex 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Transmitter ID)
[  337.156488] apex 0000:01:00.0:   device [1ac1:089a] error status/mask=000010c1/00006000
[  337.156491] apex 0000:01:00.0:    [ 0] RxErr                  (First)
[  337.156494] apex 0000:01:00.0:    [ 6] BadTLP                
[  337.156496] apex 0000:01:00.0:    [ 7] BadDLLP               
[  337.156498] apex 0000:01:00.0:    [12] Timeout               
[  337.156507] pcieport 0000:00:00.0: AER: Corrected error received: 0000:00:00.0
[  337.156511] pcieport 0000:00:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[  337.156513] pcieport 0000:00:00.0:   device [14e4:2712] error status/mask=00001000/00002000
[  337.156516] pcieport 0000:00:00.0:    [12] Timeout               
[  337.156726] apex 0000:01:00.0: Couldn't reinit interrupts: -28
[  337.156729] pcieport 0000:00:00.0: AER: Corrected error received: 0000:00:00.0
[  337.156734] pcieport 0000:00:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[  337.156737] pcieport 0000:00:00.0:   device [14e4:2712] error status/mask=00001000/00002000
[  337.156739] pcieport 0000:00:00.0:    [12] Timeout               
[  337.156744] apex 0000:01:00.0: Permission checking failed.
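The error status/mask values in these AER lines are bitfields from the PCIe Correctable Error Status register, and the bracketed numbers are bit positions: 0x10c1 has bits 0, 6, 7 and 12 set, which matches the RxErr, BadTLP, BadDLLP and Timeout lines. A Python sketch that decodes them; the names follow the labels the kernel prints above, and the positions for bits 8 and 13 to 15 are taken from the PCIe specification rather than from these logs. The function names are hypothetical.

```python
from typing import List

# Correctable Error Status bits, labelled the way the kernel's AER driver prints them.
CORRECTABLE_BITS = {
    0: "RxErr",         # Receiver Error
    6: "BadTLP",
    7: "BadDLLP",
    8: "Rollover",      # REPLAY_NUM Rollover
    12: "Timeout",      # Replay Timer Timeout
    13: "NonFatalErr",  # Advisory Non-Fatal Error
    14: "CorrIntErr",   # Corrected Internal Error
    15: "HeaderOF",     # Header Log Overflow
}


def decode(value: int) -> List[str]:
    """Return the names of the bits set in a correctable-error status or mask word."""
    return [name for bit, name in sorted(CORRECTABLE_BITS.items()) if value & (1 << bit)]


def unmasked(status: int, mask: int) -> List[str]:
    """Errors set in the status word that the mask does not suppress from reporting."""
    return decode(status & ~mask)


if __name__ == "__main__":
    print(decode(0x10C1))            # ['RxErr', 'BadTLP', 'BadDLLP', 'Timeout']
    print(decode(0x6000))            # ['NonFatalErr', 'CorrIntErr']
    print(unmasked(0x1000, 0x2000))  # ['Timeout'] (the pcieport lines)
```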

Earlier in boot:

[    2.797046] apex 0000:01:00.0: Couldn't initialize interrupts: -28

And this issue (google-coral/edgetpu#223) also mentions that could occur when you're on a system that doesn't support MSI-X interrupts...

Also see this related thread on the Odroid M1 forum.

I opened up a new topic on the Raspberry Pi forum: Coral TPU, PCIe on Pi 5
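Kernel drivers report failures as negative errno values, so the -28 in "Couldn't initialize interrupts" (and the -110 in "Error in device open cb" near the top of this thread) can be looked up directly. A small Python sketch with the Linux errno numbers hard-coded, so it gives the same answer on any host; the table is deliberately limited to the codes seen in this thread, and the helper name is hypothetical. In an interrupt-setup path, ENOSPC usually means no more vectors could be allocated, which is consistent with the MSI-X discussion above, but that reading is an interpretation, not something the log states.

```python
# Linux (asm-generic) errno values for the codes seen in these dmesg logs.
LINUX_ERRNO = {
    28: ("ENOSPC", "No space left on device"),
    110: ("ETIMEDOUT", "Connection timed out"),
}


def explain_kernel_error(code: int) -> str:
    """Translate a negative kernel return code such as -28 into its errno name."""
    name, message = LINUX_ERRNO.get(abs(code), ("UNKNOWN", "not in this table"))
    return f"{code} = -{name} ({message})"


print(explain_kernel_error(-28))   # interrupt setup: typically no vectors left to allocate
print(explain_kernel_error(-110))  # "RAM did not enable within timeout"
```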

geerlingguy commented on May 5, 2024

From jdb on the Pi forums:

There are two possible mappings for routing of message-signalled interrupts - the BCM root complex MSI target, and a MSI-x target external to the RC that is completely transparent to software. External MSI-x interrupts are translated into GIC SPIs of which there are limited number on the chip.

What happens if you swap the msi-parent target from &mip1 to &pcie1 in the devicetree?

https://github.com/raspberrypi/linux/blob/6137fb168c08bd8c41c8421bf26f09ed29479f08/arch/arm/boot/dts/bcm2712.dtsi#L1015C19-L1015C23

I haven't tested that yet, but I also pinged the folks over at Pineberry Pi (they sent over a few impedance-controlled FPC cables I can test with, they seem more reliable than the flat/straight cable I was using originally...), and @mikegapinski mentioned:

diff --git a/arch/arm64/configs/bcm2712_defconfig b/arch/arm64/configs/bcm2712_defconfig
index 8ad2775f5..ff2c619c7 100644
--- a/arch/arm64/configs/bcm2712_defconfig
+++ b/arch/arm64/configs/bcm2712_defconfig
@@ -452,9 +452,10 @@ CONFIG_RFKILL_INPUT=y
 CONFIG_NET_9P=m
 CONFIG_NFC=m
 CONFIG_PCI=y
+CONFIG_PCI_MSI=y
 CONFIG_PCIEPORTBUS=y
 CONFIG_PCIEAER=y
-CONFIG_PCIEASPM_POWERSAVE=y
+CONFIG_PCIEASPM_PERFORMANCE=y
 CONFIG_PCIE_DPC=y
 CONFIG_UEVENT_HELPER=y
 CONFIG_DEVTMPFS=y

PCIEASPM / CONFIG_PCIEASPM_POWERSAVE needs to be off. This will get you past all the pcie related errors in DMESG. Only errors I am getting now are from the driver itself after I run the classifier.

There could still be other bugs lurking, but that does make sense. There are some other PCIe devices that seem to barf when you run any kind of powersaving options, even on CM4.

I will test out these theories tomorrow (fingers crossed!).

geerlingguy commented on May 5, 2024

If I just add pcie_aspm=off to /boot/cmdline.txt and reboot, it gets rid of the annoying PCIe Bus Error messages. This is after a few minutes booted, where normally there's a message every 5 seconds:

pi@pi5:~ $ dmesg | grep apex
[    2.800975] apex 0000:01:00.0: enabling device (0000 -> 0002)
[    2.801855] apex 0000:01:00.0: Couldn't initialize interrupts: -28
[    7.901248] apex 0000:01:00.0: Apex performance not throttled due to temperature

Can you just force performance with pcie_aspm.policy=performance?

The Coral still gives me:

[  307.543226] apex 0000:01:00.0: Couldn't reinit interrupts: -28
[  307.543245] apex 0000:01:00.0: Permission checking failed.
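/boot/cmdline.txt is a single line of space-separated kernel parameters, so adding pcie_aspm=off means appending to that line rather than adding a new one. A hedged Python sketch of an idempotent edit; the function names are hypothetical, the default path is the one used in this comment (adjust it if your image keeps the file under /boot/firmware/), and you should keep a backup before rewriting it.

```python
from pathlib import Path


def add_kernel_param(cmdline: str, param: str) -> str:
    """Append a kernel parameter to a cmdline.txt string, replacing any same-key entry."""
    params = cmdline.split()
    key = param.split("=", 1)[0]
    # Drop an existing setting for the same key only (e.g. an older pcie_aspm=...);
    # other repeated keys such as console= are left alone.
    params = [p for p in params if p.split("=", 1)[0] != key]
    params.append(param)
    # cmdline.txt must stay on a single line.
    return " ".join(params) + "\n"


def update_cmdline_file(path: str = "/boot/cmdline.txt", param: str = "pcie_aspm=off") -> None:
    """Rewrite cmdline.txt in place (run as root)."""
    file = Path(path)
    file.write_text(add_kernel_param(file.read_text(), param))
```

Running it twice leaves the file unchanged, so it is safe to include in a setup script.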

geerlingguy commented on May 5, 2024

I'm now attempting to swap the msi-parent target from &mip1 to &pcie1 in the devicetree:

# Back up the current dtb
sudo cp /boot/firmware/bcm2712-rpi-5-b.dtb /boot/firmware/bcm2712-rpi-5-b.dtb.bak

# Decompile the current dtb (ignore warnings)
dtc -I dtb -O dts /boot/firmware/bcm2712-rpi-5-b.dtb -o ~/test.dts

# Edit the file
nano ~/test.dts

# Change the line: msi-parent = <0x2f>; (under `pcie@110000`)
# To: msi-parent = <0x66>;
# Then save the file.

# Recompile the dtb and move it back to the firmware directory
dtc -I dts -O dtb ~/test.dts -o ~/test.dtb
sudo mv ~/test.dtb /boot/firmware/bcm2712-rpi-5-b.dtb
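The manual nano edit can also be scripted so that only the msi-parent inside pcie@110000 changes. A Python sketch, assuming dtc's default decompiled layout: one property per line, node headers ending in `{`, and each node closed by a bare `};` line. The function name is hypothetical, and the phandle values are the ones from this comment; they differ between kernel versions (a later comment in this thread reports 0x67 on 6.6), so check your own test.dts first.

```python
import re
from typing import Tuple

MSI_PARENT = re.compile(r"^(\s*msi-parent = <)(0x[0-9a-fA-F]+)(>;)\s*$")


def set_msi_parent(dts: str, node: str, old: str, new: str) -> Tuple[str, int]:
    """Replace msi-parent = <old>; with <new> directly inside `node` only.

    Returns the edited text and the number of lines changed.
    """
    out, depth, changed = [], 0, 0
    for line in dts.splitlines(keepends=True):
        stripped = line.strip()
        if depth == 0 and stripped == f"{node} {{":
            depth = 1  # entered the target node
        elif depth > 0:
            if stripped.endswith("{"):
                depth += 1  # a child node; its properties are left alone
            elif stripped == "};":
                depth -= 1  # closing a child node, or the target node itself
            elif depth == 1:
                m = MSI_PARENT.match(line.rstrip("\n"))
                if m and m.group(2) == old:
                    newline = "\n" if line.endswith("\n") else ""
                    line = f"{m.group(1)}{new}{m.group(3)}{newline}"
                    changed += 1
        out.append(line)
    return "".join(out), changed
```

Usage would be roughly: read ~/test.dts, call set_msi_parent(text, "pcie@110000", "0x2f", "0x66"), confirm it reports exactly one change, write the file back, then recompile with dtc as above.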

I rebooted, and...

pi@pi5:~ $ dmesg | grep apex
[    2.867959] apex 0000:01:00.0: enabling device (0000 -> 0002)
[    7.901233] apex 0000:01:00.0: Apex performance not throttled due to temperature

That interrupt message is gone!

Trying out some inference...

# dmesg
[   67.872746] apex 0000:01:00.0: mapping size 0x1000 must be page aligned
[   67.872757] apex 0000:01:00.0: mapping size 0x1000 must be page aligned

# Python output
root@c10211e66ef2:~# python3 /usr/share/edgetpu/examples/classify_image.py --model /usr/share/edgetpu/examples/models/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite --label /usr/share/edgetpu/examples/models/inat_bird_labels.txt --image /usr/share/edgetpu/examples/images/bird.bmp
F :39] Attempting to fetch value instead of handling error Failed precondition: Could not map pages : 6 (Invalid argument)
Aborted (core dumped)
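The "mapping size 0x1000 must be page aligned" lines fit the page-size theory behind the kernel8.img switch described earlier in this thread: 0x1000 is 4096 bytes, a whole page on a 4K-page kernel but only a quarter of a page on a 16K-page kernel. A tiny Python check of that arithmetic; it assumes the Pi 5's default 16K-page kernel was still running for this test, which is an inference rather than something the log states.

```python
def is_page_aligned(size: int, page_size: int) -> bool:
    """True if size is a whole number of pages (page_size must be a power of two)."""
    return size & (page_size - 1) == 0


MAPPING_SIZE = 0x1000  # from the apex dmesg lines above

for page_size in (4 * 1024, 16 * 1024):
    print(f"{page_size // 1024}K pages: aligned={is_page_aligned(MAPPING_SIZE, page_size)}")
```

This prints aligned=True for 4K pages and aligned=False for 16K pages.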

geerlingguy commented on May 5, 2024

@mikegapinski - Regarding purging Python, how did you do that specifically? Every time I've tried messing with changing system Python versions on Debian I end up with a tangled mess and regret my decisions.

Edit: trying to compile an alternate install: https://www.linuxcapable.com/how-to-install-python-3-9-on-debian-linux/

geerlingguy commented on May 5, 2024

That test was with

dtparam=pciex1_gen=1

I am now testing on:

dtparam=pciex1_gen=3

Got a few errors:

[   44.349265] pcieport 0000:00:00.0: AER: Corrected error received: 0000:00:00.0
[   44.349277] pcieport 0000:00:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[   44.349281] pcieport 0000:00:00.0:   device [14e4:2712] error status/mask=00001000/00002000
[   44.349284] pcieport 0000:00:00.0:    [12] Timeout 

(Repeated 7x). Everything worked fine though. Trying again on Gen 2.

No link errors on Gen 1 or Gen 2. (Note that I'm using the 100mm impedance-controlled FPC in this testing instead of the shorter one).

mikegapinski commented on May 5, 2024

I had issues with it on Gen 3, but it's not a Gen 3 device, so it doesn't really matter. I have a feeling that Gen 3 will be tricky on the Pi 5, from the software side, with anything other than an NVMe drive.
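For context on the pciex1_gen settings: the lspci output earlier in this thread lists the Coral's link capability as Speed 5GT/s, Width x1, i.e. a Gen 2 x1 device, so forcing Gen 3 cannot make it faster. A Python sketch of approximate x1 throughput per generation, using the standard line encodings (8b/10b for Gen 1 and 2, 128b/130b for Gen 3) and ignoring packet and protocol overhead, so real-world numbers are lower. The function name is hypothetical.

```python
from fractions import Fraction

# Raw transfer rate (GT/s) and line-encoding efficiency per PCIe generation.
PCIE_GENS = {
    1: (Fraction(5, 2), Fraction(8, 10)),  # 2.5 GT/s, 8b/10b
    2: (Fraction(5), Fraction(8, 10)),     # 5 GT/s, 8b/10b
    3: (Fraction(8), Fraction(128, 130)),  # 8 GT/s, 128b/130b
}


def lane_mb_per_s(gen: int, lanes: int = 1) -> float:
    """Approximate bandwidth in MB/s after line encoding (no protocol overhead)."""
    gt_per_s, efficiency = PCIE_GENS[gen]
    gbit_per_s = gt_per_s * efficiency * lanes  # one bit per transfer per lane
    return float(gbit_per_s * 1000 / 8)  # Gbit/s -> MB/s (decimal units)


for gen in PCIE_GENS:
    print(f"pciex1_gen={gen}: ~{lane_mb_per_s(gen):.0f} MB/s per x1 link")
```

That works out to roughly 250, 500 and 985 MB/s for Gen 1, 2 and 3.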

mhaligowski commented on May 5, 2024

@geerlingguy congrats on running the Coral! I'm more on the ML side than embedded systems, so I'm not sure I understand correctly what's going on here: is there no way of connecting an M.2 device to the RPi 5 without the kinda ridiculous chain of hardware? I'm looking into https://coral.ai/products/m2-accelerator-dual-edgetpu/ for now; do I understand correctly that I need a dedicated HAT?

geerlingguy commented on May 5, 2024

@mhaligowski - You would need to either use an M.2 A+E key HAT (like the AiHat from Pineberry), or adapters with an M-key HAT, or a USB Coral.

mhaligowski commented on May 5, 2024

Just saw the AiHat, it's exactly what I need! Even more excited since I'm Polish living in the US, so proud of being able to support business from PL!

IuliuNovac commented on May 5, 2024

Just saw the AiHat, it's exactly what I need! Even more excited since I'm Polish living in the US, so proud of being able to support business from PL!

I have the Ai Hat and can confirm it works, you just need to follow this guide https://www.jeffgeerling.com/blog/2023/pcie-coral-tpu-finally-works-on-raspberry-pi-5 .

The part that caught me off guard was the change to the dtb, since without it you get access denied.

It seems to be working fine with C++; I haven't tried Python, though.

Skillnoob commented on May 5, 2024

Has anyone gotten the coral usb accelerator working yet?
I've run into massive amounts of issues trying to get it working and had no success so far.
I've posted about the issue on the TensorFlow issue tracker, but the issue is quite stale:
tensorflow/tensorflow#62371

geerlingguy commented on May 5, 2024

@Skillnoob that would be a different issue (this issue is just for the A+E key PCIe accelerator). I have successfully used the USB accelerator with Frigate on two different Pi 4s and two different Pi 5s now, so at a hardware level it should be fine. Note that Python libraries are all over the place, and you have to use older Python versions to get it working with Python.

Skillnoob commented on May 5, 2024

@geerlingguy sorry about that.
I just haven't found any resource to get it working yet and I've tried the recommended python 3.9 without it working.
(the example from pycoral tflite also segfaults)

mikegapinski commented on May 5, 2024

If you don't have a PCIe switch and a special board that connects the second lane, it'll work as a standard single TPU.

r3po-1s-Tr3e commented on May 5, 2024

Blog post with full summary: A PCIe Coral TPU FINALLY works on Raspberry Pi 5.

@geerlingguy, amazing guide. But we have been facing a problem: we are not able to install the PCIe gasket drivers after changing the device tree settings. The Pi is not able to update its kernel headers after changing the device tree settings.
I have tried downgrading Raspberry Pi OS from kernel 6.6 to 6.1, but it still doesn't work.
If I don't change the device tree settings, the gasket drivers install correctly, but then the image classification sample throws an error that it can't access /dev/apex_0.

Hardware: Raspberry Pi 5 4GB, NVMe Base for RPi, Coral M.2 TPU B+M key

Any insight will be appreciated, thanks!

mikegapinski commented on May 5, 2024

@r3po-1s-Tr3e follow this thread, it has all the info for Coral. I have not tested the B+M recently, but the A+E works OK, so this one should too.

https://gist.github.com/dataslayermedia/714ec5a9601249d9ee754919dea49c7e?permalink_comment_id=4989560#gistcomment-4989560

r3po-1s-Tr3e commented on May 5, 2024

@mikegapinski It worked! I changed a few things; if anyone is interested, here are the details:

Hardware:
Raspberry pi 5 4gb
Coral TPU M.2 Accelerator B+M key
NVMe Base PCIe extension HAT for Raspberry pi 5
OS: 6.1.0-rpi4-rpi

Procedure I followed (I omitted a few steps from Jeff Geerling's guide and changed the sequence of actions):

echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
sudo apt-get update

sudo apt-get install gasket-dkms libedgetpu1-std

sudo sh -c "echo 'SUBSYSTEM==\"apex\", MODE=\"0660\", GROUP=\"apex\"' >> /etc/udev/rules.d/65-apex.rules"
sudo groupadd apex
sudo adduser $USER apex

Now REBOOT, after reboot check if TPU is detected by:
lspci -nn | grep 089a #Output: 03:00.0 System peripheral: Device 1ac1:089a
Check if pcie driver is loaded:
ls /dev/apex_0 #Output: /dev/apex_0
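The two manual checks (the card showing up as 1ac1:089a in lspci, and the /dev/apex_0 node existing) can be combined into one script. A Python sketch; it shells out to lspci -nn as above, and the helper names are hypothetical. A missing lspci binary simply counts as "not found".

```python
import os
import shutil
import subprocess

CORAL_ID = "1ac1:089a"     # vendor:device ID shown by lspci -nn above
APEX_NODE = "/dev/apex_0"  # device node created by the gasket/apex driver


def coral_in_lspci(lspci_output: str) -> bool:
    """True if any lspci -nn line lists the Coral Edge TPU's PCI ID."""
    return any(CORAL_ID in line for line in lspci_output.splitlines())


def check_coral() -> dict:
    """Report whether the card enumerated on PCIe and whether the apex node exists."""
    lspci = shutil.which("lspci")
    output = ""
    if lspci:
        output = subprocess.run([lspci, "-nn"], capture_output=True, text=True).stdout
    return {
        "pcie_device_found": coral_in_lspci(output),
        "apex_node_present": os.path.exists(APEX_NODE),
    }


if __name__ == "__main__":
    print(check_coral())
```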

If I had tried to follow these steps after changing the device tree, I would not have been able to install the gasket and PCIe drivers, because the kernel headers could not be installed.
Further steps:

echo "kernel=kernel8.img" | sudo tee -a /boot/config.txt
Omitted other steps regarding config changes which were mentioned in the guide
Then changed the Device tree settings: https://www.jeffgeerling.com/blog/2023/how-customize-dtb-device-tree-binary-on-raspberry-pi
(NOTE: If you are on kernel version 6.6, which is currently the latest, change the msi-parent settings to 0x67 instead of 0x66. Source: https://gist.github.com/dataslayermedia/714ec5a9601249d9ee754919dea49c7e?permalink_comment_id=4989560#gistcomment-4989560)

Installed docker:
curl -sSL https://get.docker.com | sh

Then followed the steps from https://www.jeffgeerling.com/blog/2023/testing-coral-tpu-accelerator-m2-or-pcie-docker to create and run the Docker image.

Tested by running the TPU's sample image classification script.
