Comments (19)

albydnc commented on May 5, 2024

Since you have two, you should try to see if RoCE can work. It would be awesome to have an RDMA-enabled Pi, and it would help with the IRQ issue you had in the other videos.
If all of this works, I'd be pleased to see some InfiniBand networking on it :)

from raspberry-pi-pcie-devices.

geerlingguy commented on May 5, 2024

@albydnc - Heh, I'll see what I can do!

albydnc commented on May 5, 2024

@geerlingguy let me know if you need some help; I work on InfiniBand and RDMA.

geerlingguy commented on May 5, 2024

$ sudo lspci -vvvv
01:00.0 Ethernet controller: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] (rev b0)
	Subsystem: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s]
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 0
	Region 0: Memory at 600800000 (64-bit, non-prefetchable) [disabled] [size=1M]
	Region 2: Memory at 600000000 (64-bit, prefetchable) [disabled] [size=8M]
	[virtual] Expansion ROM at 600900000 [disabled] [size=1M]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [48] Vital Product Data
		Product Name: ConnectX-2 SFP+
		Read-only fields:
			[PN] Part number: MNPA19-XTR           
			[EC] Engineering changes: A2
			[SN] Serial number: MT1148X12321            
			[V0] Vendor specific: PCIe Gen2 x8    
			[RV] Reserved: checksum good, 0 byte(s) reserved
		Read/write fields:
			[V1] Vendor specific: N/A   
			[YA] Asset tag: N/A                             
			[RW] Read-write area: 105 byte(s) free
		End
	Capabilities: [9c] MSI-X: Enable- Count=128 Masked-
		Vector table: BAR=0 offset=0007c000
		PBA: BAR=0 offset=0007d000
	Capabilities: [60] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #8, Speed 5GT/s, Width x8, ASPM L0s, Exit Latency L0s unlimited, L1 unlimited
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s, Width x1, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
		ARICap:	MFVC- ACS-, Next Function: 0
		ARICtl:	MFVC- ACS-, Function Group: 0
	Capabilities: [148 v1] Device Serial Number 00-02-c9-03-00-53-00-fa
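The two lines that matter most above are LnkCap (the card supports 5GT/s at Width x8) and LnkSta (the link actually trained at 5GT/s, Width x1). As a minimal Python sketch, assuming the `lspci -vvvv` text layout shown here, that pair can be pulled out like this; `link_cap_and_status` is a hypothetical helper, not part of pciutils:

```python
import re

# Matches e.g. "Speed 5GT/s, Width x8" (lspci prints no space before "GT/s").
SPEED_WIDTH_RE = re.compile(r"Speed ([\d.]+)GT/s.*?Width x(\d+)")


def link_cap_and_status(lspci_text):
    """Return ((cap_gts, cap_width), (sta_gts, sta_width)) from lspci -vvvv text.

    Either element is None if the corresponding line is missing.
    """
    cap = sta = None
    for line in lspci_text.splitlines():
        stripped = line.strip()
        match = SPEED_WIDTH_RE.search(stripped)
        if match is None:
            continue
        parsed = (float(match.group(1)), int(match.group(2)))
        # "LnkSta2:" does not start with "LnkSta:", so it is skipped here.
        if stripped.startswith("LnkCap:"):
            cap = parsed
        elif stripped.startswith("LnkSta:"):
            sta = parsed
    return cap, sta


if __name__ == "__main__":
    sample = (
        "\t\tLnkCap:\tPort #8, Speed 5GT/s, Width x8, ASPM L0s\n"
        "\t\tLnkSta:\tSpeed 5GT/s, Width x1, TrErr- Train- SlotClk-\n"
    )
    cap, sta = link_cap_and_status(sample)
    print(f"capable of x{cap[1]}, running at x{sta[1]}")
```

A capability wider than the negotiated status is expected here, since the Pi only exposes a single lane.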

geerlingguy commented on May 5, 2024

$ dmesg
...
[    1.217953] brcm-pcie fd500000.pcie: host bridge /scb/pcie@7d500000 ranges:
[    1.217995] brcm-pcie fd500000.pcie:   No bus range found for /scb/pcie@7d500000, using [bus 00-ff]
[    1.218079] brcm-pcie fd500000.pcie:      MEM 0x0600000000..0x063fffffff -> 0x00c0000000
[    1.218178] brcm-pcie fd500000.pcie:   IB MEM 0x0000000000..0x00ffffffff -> 0x0100000000
[    1.282343] brcm-pcie fd500000.pcie: link up, 5.0 GT/s PCIe x1 (SSC)
[    1.282710] brcm-pcie fd500000.pcie: PCI host bridge to bus 0000:00
[    1.282742] pci_bus 0000:00: root bus resource [bus 00-ff]
[    1.282770] pci_bus 0000:00: root bus resource [mem 0x600000000-0x63fffffff] (bus address [0xc0000000-0xffffffff])
[    1.282866] pci 0000:00:00.0: [14e4:2711] type 01 class 0x060400
[    1.283113] pci 0000:00:00.0: PME# supported from D0 D3hot
[    1.286752] pci 0000:00:00.0: bridge configuration invalid ([bus ff-ff]), reconfiguring
[    1.400521] pci 0000:01:00.0: [15b3:6750] type 00 class 0x020000
[    1.400803] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x000fffff 64bit]
[    1.400979] pci 0000:01:00.0: reg 0x18: [mem 0x00000000-0x007fffff 64bit pref]
[    1.401254] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x000fffff pref]
[    1.402247] pci 0000:01:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0000:00:00.0 (capable of 32.000 Gb/s with 5.0 GT/s PCIe x8 link)
[    1.405844] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
[    1.405903] pci 0000:00:00.0: BAR 9: assigned [mem 0x600000000-0x6007fffff 64bit pref]
[    1.405933] pci 0000:00:00.0: BAR 8: assigned [mem 0x600800000-0x6009fffff]
[    1.405965] pci 0000:01:00.0: BAR 2: assigned [mem 0x600000000-0x6007fffff 64bit pref]
[    1.406122] pci 0000:01:00.0: BAR 0: assigned [mem 0x600800000-0x6008fffff 64bit]
[    1.406275] pci 0000:01:00.0: BAR 6: assigned [mem 0x600900000-0x6009fffff pref]
[    1.406303] pci 0000:00:00.0: PCI bridge to [bus 01]
[    1.406334] pci 0000:00:00.0:   bridge window [mem 0x600800000-0x6009fffff]
[    1.406363] pci 0000:00:00.0:   bridge window [mem 0x600000000-0x6007fffff 64bit pref]
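The bandwidth line in that log is plain arithmetic: PCIe 2.0 runs at 5 GT/s per lane with 8b/10b encoding, so each lane carries 4 Gb/s of data, and an x8 link would carry 32 Gb/s. A small Python sketch reproducing the kernel's numbers from that log line (function names are illustrative; protocol overhead is ignored, as it is in the kernel message):

```python
import re


def encoding_efficiency(gt_per_s):
    # PCIe 1.x (2.5 GT/s) and 2.x (5 GT/s) use 8b/10b line coding; PCIe 3.0
    # through 5.0 use 128b/130b. PCIe 6.0's FLIT encoding is not modeled here.
    return 8 / 10 if gt_per_s <= 5.0 else 128 / 130


def usable_gbps(gt_per_s, lanes):
    """Data bandwidth in Gb/s after line coding, before packet overhead."""
    return gt_per_s * lanes * encoding_efficiency(gt_per_s)


# Matches e.g. "5.0 GT/s PCIe x1" in the kernel's bandwidth message.
LINK_RE = re.compile(r"([\d.]+) GT/s PCIe x(\d+)")


def links_in_line(line):
    """Return [(gt_per_s, lanes), ...] for each link mentioned in a dmesg line."""
    return [(float(gt), int(width)) for gt, width in LINK_RE.findall(line)]


if __name__ == "__main__":
    line = ("pci 0000:01:00.0: 4.000 Gb/s available PCIe bandwidth, limited by "
            "5.0 GT/s PCIe x1 link at 0000:00:00.0 (capable of 32.000 Gb/s "
            "with 5.0 GT/s PCIe x8 link)")
    for gt, lanes in links_in_line(line):
        print(f"{gt} GT/s x{lanes}: {usable_gbps(gt, lanes):.3f} Gb/s")
```

Either way, the Pi's single Gen 2 lane caps this 10 GbE card at roughly 4 Gb/s before any Ethernet or PCIe packet overhead.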

geerlingguy commented on May 5, 2024

Trying this driver first: https://www.mellanox.com/products/ethernet-drivers/linux/mlnx_en

$ wget http://www.mellanox.com/downloads/ofed/MLNX_EN-5.1-1.0.4.0/mlnx-en-5.1-1.0.4.0-debian10.3-aarch64.tgz
$ tar xvf mlnx-en-5.1-1.0.4.0-debian10.3-aarch64.tgz
$ cd mlnx-en-5.1-1.0.4.0-debian10.3-aarch64/
$ sudo ./install
Error: The current mlnx-en is intended for debian10.3

How unfortunate :P

geerlingguy commented on May 5, 2024

Digging through the installer, I found --skip-distro-check as an available option.

$ sudo ./install --skip-distro-check
System has one or more unsupported device, see below.
MLNX_OFED / mlnx_en 5.1 and above supports only ConnectX-4 or newer devices.
This device could become unavailable which might result in loss of connectivity.
Use --skip-unsupported-devices-check to skip this check.
Aborting.
* 01:00.0 Ethernet controller [0200]: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] [15b3:6750] (rev b0)

No support for older cards? What madness! Let's try with:

$ sudo ./install --skip-distro-check --skip-unsupported-devices-check

Now it's attempting to install extra stuff:

Checking SW Requirements...
One or more required packages for installing mlnx-en are missing.
/lib/modules/5.10.3-v8+/build/scripts is required for the Installation.
Attempting to install the following missing packages:
autotools-dev graphviz autoconf chrpath linux-headers-5.10.3-v8+ dpatch lsof dkms m4 automake quilt debhelper swig libltdl-dev
Failed command: apt-get install -y autotools-dev graphviz autoconf chrpath linux-headers-5.10.3-v8+ dpatch lsof dkms m4 automake quilt debhelper swig libltdl-dev
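Before letting the installer reach for apt-get, it may be worth checking whether the directory it complains about exists for the running kernel. A minimal Python sketch; `kernel_build_dir` and `has_build_scripts` are hypothetical helper names, and the `/lib/modules/<release>/build/scripts` path is taken from the error above:

```python
import os
from pathlib import Path


def kernel_build_dir(release=None, modules_root="/lib/modules"):
    """Path to the kernel build tree the installer looks for.

    Defaults to the running kernel's release string, like `uname -r`.
    """
    release = release or os.uname().release
    return Path(modules_root) / release / "build"


def has_build_scripts(release=None, modules_root="/lib/modules"):
    # The installer error names <build>/scripts specifically, so check that.
    return (kernel_build_dir(release, modules_root) / "scripts").is_dir()


if __name__ == "__main__":
    build = kernel_build_dir()
    status = "present" if has_build_scripts() else "MISSING"
    print(f"{build}/scripts: {status}")
```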

Why don't all these device manufacturers account for Raspberry Pi OS 64-bit beta? 🤔

Anyway, I'm going to stop for now and get back to it later. At least I have the card identified. It works through my 1x-to-16x adapter, but it didn't show up when I tried powering it through my external adapter...

I also received a Noctua fan PWM controller today. Nice for my ears to not run the 12V fan at maximum speed all day :D

geerlingguy commented on May 5, 2024

@albydnc - I may ask for some help figuring out a good test / benchmark for RDMA, as I know a lot of people may be interested in whether the Pi can support it.

albydnc commented on May 5, 2024

@geerlingguy you can use the default benchmarks available with the Mellanox driver: perftest.
You can also look at the source on GitHub. This is the optimal condition for testing network performance, since the benchmarks are written against the low-level C API for RDMA (InfiniBand verbs).
For a more general test, I suggest using MPI tests, so you can easily compare various technologies; you will see a drop in performance, but it shouldn't be significant.
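perftest's benchmarks (ib_write_bw, ib_read_bw, ib_send_lat, and friends) run as a pair: start one with no address on the server, then run the same command with the server's address on the client. A minimal Python sketch that builds those two command lines; the device name `mlx4_0` and the address `10.0.0.1` are placeholders (`ibv_devices` lists the real device names), and only the `-d` device option is used:

```python
def ib_write_bw_cmd(device, server=None):
    """Build an ib_write_bw command line (from the perftest suite).

    With no server address the tool waits as the server; with an address it
    connects to that server as the client. `-d` selects the RDMA device.
    """
    cmd = ["ib_write_bw", "-d", device]
    if server is not None:
        cmd.append(server)
    return cmd


if __name__ == "__main__":
    # Placeholder device name and address; adjust for the actual setup.
    print(" ".join(ib_write_bw_cmd("mlx4_0")))              # on the server Pi
    print(" ".join(ib_write_bw_cmd("mlx4_0", "10.0.0.1")))  # on the client Pi
```

Each list can be handed to subprocess.run once the RDMA stack is working on both machines.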

mi-hol commented on May 5, 2024

Quoting @geerlingguy:

$ sudo ./install
Error: The current mlnx-en is intended for debian10.3

How unfortunate :P

you lost me here :(
Why would the available debian10.0 driver have no chance of working?

geerlingguy commented on May 5, 2024

@mi-hol - It seems like that install script is a giant bash script that has a lot of points of entanglement where it's looking for exact strings in returned information. Pi OS, and especially Pi OS 64-bit beta, don't behave identically to Debian 10.3 / Debian 10.

The Ubuntu installer might have better success, but honestly, the drivers have a ton of warnings and checks that try to force you to use a ConnectX-4 or later card... I'm thinking compiling the driver into the kernel would be easier, since that route isn't as preachy about making you buy the latest generation of card.
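If you go the compile-it-into-the-kernel route, the ConnectX-2/ConnectX-3 driver sits behind the mlx4 Kconfig symbols. A minimal Python sketch that checks a kernel `.config` for them; the symbol list is an assumption about which options matter here (CONFIG_MLX4_CORE and CONFIG_MLX4_EN for Ethernet, CONFIG_MLX4_INFINIBAND for RDMA), and the sample text stands in for a real file:

```python
import re

# Assumed set of relevant mlx4 options; extend as needed.
MLX4_SYMBOLS = ("CONFIG_MLX4_CORE", "CONFIG_MLX4_EN", "CONFIG_MLX4_INFINIBAND")

SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text):
    """Map each CONFIG_ symbol in .config text to its value ('y', 'm', ...) or 'n'."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if m := SET_RE.match(line):
            values[m.group(1)] = m.group(2)
        elif m := UNSET_RE.match(line):
            values[m.group(1)] = "n"
    return values


def mlx4_status(text):
    values = parse_kconfig(text)
    # Symbols absent from the file are treated as disabled.
    return {sym: values.get(sym, "n") for sym in MLX4_SYMBOLS}


if __name__ == "__main__":
    # Stand-in for open(".config").read() in the kernel source tree.
    sample = "CONFIG_MLX4_EN=m\nCONFIG_MLX4_CORE=m\n# CONFIG_MLX4_INFINIBAND is not set\n"
    for sym, val in mlx4_status(sample).items():
        print(f"{sym}={val}")
```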

albydnc commented on May 5, 2024

So, ConnectX-2 cards, while still interesting, are not something you'll want to waste your time on.
Mellanox dropped driver support for them ages ago, and they lack the only interesting feature of Mellanox NICs: RDMA.
You should be able to get a ConnectX-3 on eBay for cheap and get all the nice modern features.
If you want to try it, I'm willing to buy it for you @geerlingguy

geerlingguy commented on May 5, 2024

@albydnc - I figured as much... and I would gladly take you up on that offer! If you can DM me on Twitter, or email me (my email is on my website about page), I can sort out the details. And I'll happily plug your Twitter/name/whatever in an eventual video I make on 10G networking on the Pi (whether or not I can get the X3 working! I already have the ASUS card going).

ianfitchet commented on May 5, 2024

Jeff, just to (re-)pique your interest in the Mellanox cards, I have the dual NIC versions of the same venerable beasties:

$ lspci -nn | grep Mellanox
01:00.0 Ethernet controller [0200]: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] [15b3:6750] (rev b0)

CentOS 7 worked a treat, but CentOS Stream dropped support, as I discovered when upgrading a couple of weeks ago. Note that Linux, at least, uses the MLX4 driver for these parts.

In my case the issue was "as simple as" the drivers having GEN2 support #define'd out. I wrote some notes to self.

On my RPi4 running 64-bit, there's barely any Ethernet driver support:

$ ls /lib/modules/$(uname -r)/kernel/drivers/net/ethernet/
microchip  qualcomm  wiznet

However, there are some 4.x kernels lying about (no idea why) that do have full MLX4 support. In particular, you can grep for this card using the PCI vendor and product IDs from lspci -nn above:

$ modinfo /lib/modules/4.19.0-16-arm64/kernel/drivers/net/ethernet/mellanox/mlx4/mlx4_core.ko | grep -i 15b3 | grep -i 6750
alias:          pci:v000015B3d00006750sv*sd*bc*sc*i*

So I'm going to guess that support is entirely feasible.
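The `alias:` line works because modules declare shell-style glob patterns that are matched against each device's modalias string (readable at /sys/bus/pci/devices/<address>/modalias). A minimal Python sketch of that matching, assuming the kernel's PCI modalias layout `pci:v<vendor>d<device>sv<subvendor>sd<subdevice>bc<class>sc<subclass>i<prog-if>` in uppercase hex; the helper names are hypothetical and the subsystem device ID is a placeholder:

```python
import fnmatch

# Alias reported by modinfo for mlx4_core, copied from the output above.
MLX4_ALIAS = "pci:v000015B3d00006750sv*sd*bc*sc*i*"


def pci_modalias(vendor, device, subvendor, subdevice, cls):
    """Build a PCI modalias string; cls is the 24-bit class code (e.g. 0x020000)."""
    base, sub, prog_if = (cls >> 16) & 0xFF, (cls >> 8) & 0xFF, cls & 0xFF
    return (f"pci:v{vendor:08X}d{device:08X}sv{subvendor:08X}sd{subdevice:08X}"
            f"bc{base:02X}sc{sub:02X}i{prog_if:02X}")


def alias_matches(modalias, alias):
    # Case-sensitive glob match; the hex digits must be uppercase on both sides.
    return fnmatch.fnmatchcase(modalias, alias)


if __name__ == "__main__":
    # 15b3:6750 comes from lspci -nn and class 0x020000 from dmesg above;
    # the subsystem device ID 0x0000 is a placeholder.
    alias = pci_modalias(0x15B3, 0x6750, 0x15B3, 0x0000, 0x020000)
    print(alias, alias_matches(alias, MLX4_ALIAS))
```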

Until last week I'd never compiled anything kernel-y, but I'd guess the process is similar on the RPi.

geerlingguy commented on May 5, 2024

@ianfitchet - Thanks! I'll keep that in mind next time I get back to this card. For now I'm switching my sights over to the ConnectX-3 I just got (see #143).

geerlingguy commented on May 5, 2024

Just tried the same freshly-compiled kernel I tested in #143 with a ConnectX-3 adapter, and I'm getting the exact same error:

[   28.219483] mlx4_en: eth1: Link Up
[   43.997574] ------------[ cut here ]------------
[   43.997620] NETDEV WATCHDOG: eth1 (mlx4_core): transmit queue 0 timed out
[   43.997703] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:443 dev_watchdog+0x3a0/0x3a8
[   43.997710] Modules linked in: bnep hci_uart btbcm bluetooth ecdh_generic ecc mlx4_en 8021q garp stp llc vc4 brcmfmac cec brcmutil drm_kms_helper v3d cfg80211 gpu_sched bcm2835_codec(C) rfkill bcm2835_v4l2(C) drm bcm2835_isp(C) v4l2_mem2mem bcm2835_mmal_vchiq(C) videobuf2_vmalloc videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 videobuf2_common drm_panel_orientation_quirks raspberrypi_hwmon videodev mlx4_core vc_sm_cma(C) mc snd_bcm2835(C) i2c_brcmstb snd_soc_core snd_compress snd_pcm_dmaengine snd_pcm snd_timer snd syscopyarea rpivid_mem sysfillrect sysimgblt fb_sys_fops backlight uio_pdrv_genirq uio nvmem_rmem aes_neon_bs sha256_generic aes_neon_blk crypto_simd cryptd ip_tables x_tables ipv6
[   43.998043] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G         C        5.10.39-v8+ #1
[   43.998050] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
[   43.998062] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
[   43.998071] pc : dev_watchdog+0x3a0/0x3a8
[   43.998078] lr : dev_watchdog+0x3a0/0x3a8
[   43.998085] sp : ffffffc0115bbd10
[   43.998092] x29: ffffffc0115bbd10 x28: ffffff804b0d3f40 
[   43.998108] x27: 0000000000000004 x26: 0000000000000140 
[   43.998124] x25: 00000000ffffffff x24: 0000000000000002 
[   43.998139] x23: ffffffc011286000 x22: ffffff804b0a03dc 
[   43.998154] x21: ffffff804b0a0000 x20: ffffff804b0a0480 
[   43.998168] x19: 0000000000000000 x18: 0000000000000000 
[   43.998183] x17: 0000000000000000 x16: 0000000000000000 
[   43.998198] x15: ffffffffffffffff x14: ffffffc011288948 
[   43.998213] x13: ffffffc01146ebd0 x12: ffffffc011315430 
[   43.998227] x11: 0000000000000003 x10: ffffffc0112fd3f0 
[   43.998242] x9 : ffffffc0100e5358 x8 : 0000000000017fe8 
[   43.998256] x7 : c0000000ffffefff x6 : 0000000000000003 
[   43.998270] x5 : 0000000000000000 x4 : 0000000000000000 
[   43.998285] x3 : 0000000000000103 x2 : 0000000000000102 
[   43.998299] x1 : 730045c0bcfb7500 x0 : 0000000000000000 
[   43.998314] Call trace:
[   43.998324]  dev_watchdog+0x3a0/0x3a8
[   43.998339]  call_timer_fn+0x38/0x200
[   43.998349]  run_timer_softirq+0x298/0x548
[   43.998358]  __do_softirq+0x1a8/0x510
[   43.998369]  irq_exit+0xe8/0x108
[   43.998378]  __handle_domain_irq+0xa0/0x110
[   43.998386]  gic_handle_irq+0xb0/0xf0
[   43.998393]  el1_irq+0xc8/0x180
[   43.998407]  arch_cpu_idle+0x18/0x28
[   43.998416]  default_idle_call+0x58/0x1d4
[   43.998427]  do_idle+0x25c/0x270
[   43.998437]  cpu_startup_entry+0x30/0x70
[   43.998448]  secondary_start_kernel+0x170/0x180
[   43.998456] ---[ end trace 257c7cb4ef196f12 ]---
[   43.998490] mlx4_en: eth1: TX timeout on queue: 0, QP: 0x208, CQ: 0x84, Cons: 0xffffffff, Prod: 0x1
[   44.046185] mlx4_en: eth1: Steering Mode 1
[   44.052169] mlx4_en: eth1: Link Down
[   46.301966] mlx4_en: eth1: Link Up
[   61.917527] mlx4_en: eth1: TX timeout on queue: 2, QP: 0x20a, CQ: 0x86, Cons: 0xffffffff, Prod: 0x1
[   61.949949] mlx4_en: eth1: Steering Mode 1
[   61.970419] mlx4_en: eth1: Link Down
[   64.379433] mlx4_en: eth1: Link Up

The lights flash, things seem to work, but it keeps re-connecting :(

$ ip a
...
4: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:02:c9:4e:e2:fa brd ff:ff:ff:ff:ff:ff
    inet 169.254.135.78/16 brd 169.254.255.255 scope global noprefixroute eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::25c8:7bfd:2254:dad4/64 scope link 
       valid_lft forever preferred_lft forever
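To see the pattern in a longer log, the mlx4_en messages can be tallied: each TX timeout above is followed by a Link Down/Link Up cycle. A minimal Python sketch, assuming the exact message wording shown in this dmesg output (`summarize` is a hypothetical helper):

```python
import re
from collections import Counter

# Message formats copied from the mlx4_en lines in the dmesg output above.
TX_TIMEOUT_RE = re.compile(r"mlx4_en: \S+: TX timeout on queue: (\d+)")
LINK_RE = re.compile(r"mlx4_en: \S+: Link (Up|Down)")


def summarize(dmesg_text):
    """Return (Counter of TX timeouts per queue, number of Link Down events)."""
    timeouts = Counter()
    link_downs = 0
    for line in dmesg_text.splitlines():
        if m := TX_TIMEOUT_RE.search(line):
            timeouts[int(m.group(1))] += 1
        elif (m := LINK_RE.search(line)) and m.group(1) == "Down":
            link_downs += 1
    return timeouts, link_downs


if __name__ == "__main__":
    sample = "\n".join([
        "[   28.219483] mlx4_en: eth1: Link Up",
        "[   43.998490] mlx4_en: eth1: TX timeout on queue: 0, QP: 0x208, CQ: 0x84, Cons: 0xffffffff, Prod: 0x1",
        "[   44.052169] mlx4_en: eth1: Link Down",
        "[   46.301966] mlx4_en: eth1: Link Up",
        "[   61.917527] mlx4_en: eth1: TX timeout on queue: 2, QP: 0x20a, CQ: 0x86, Cons: 0xffffffff, Prod: 0x1",
        "[   61.970419] mlx4_en: eth1: Link Down",
    ])
    timeouts, downs = summarize(sample)
    print(f"TX timeouts per queue: {dict(timeouts)}, link drops: {downs}")
```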

geerlingguy commented on May 5, 2024

Marking this as done... can't find any way to get the thing working, unfortunately.

geerlingguy commented on May 5, 2024

(I have since confirmed these cards work fine in a few different PCs, though.)

kainz commented on May 5, 2024

If you still have one of these cards around, turning off TX/RX flow control (pause frames) with ethtool -A DEVICE rx off tx off may work; I've had to do that for ConnectX-2 cards on some PC installations that showed a similar timeout and link down/up behavior. The card is also designed to spread traffic across CPUs using multiple TX/RX queues. Maybe that's interfering with something as well; you could check the channel count with ethtool -l DEVICE and change it with ethtool -L DEVICE rx RXCHNUM tx TXCHNUM.

Incidentally, I've also had these cards silently fail when I try to use an MTU larger than 4032 on Ethernet, but I don't know if you've tried anything in that regard.
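The tweaks suggested above can be collected into a dry-run helper that prints the commands before anything touches the interface. A minimal Python sketch; `eth1` matches the interface in the earlier log, the one-channel setting is an arbitrary example, and the 4032 MTU ceiling is only the anecdotal limit from this comment, not a documented one:

```python
import subprocess

# Anecdotal limit from this thread, not a documented hardware maximum.
MAX_REPORTED_SAFE_MTU = 4032


def tuning_commands(dev, rx_channels=None, tx_channels=None, mtu=None):
    """Build the ethtool/ip commands suggested above, without running them."""
    cmds = [["ethtool", "-A", dev, "rx", "off", "tx", "off"]]  # disable pause frames
    if rx_channels is not None or tx_channels is not None:
        cmd = ["ethtool", "-L", dev]
        if rx_channels is not None:
            cmd += ["rx", str(rx_channels)]
        if tx_channels is not None:
            cmd += ["tx", str(tx_channels)]
        cmds.append(cmd)
    if mtu is not None:
        if mtu > MAX_REPORTED_SAFE_MTU:
            raise ValueError(f"MTU {mtu} exceeds the {MAX_REPORTED_SAFE_MTU} reported to work")
        cmds.append(["ip", "link", "set", "dev", dev, "mtu", str(mtu)])
    return cmds


def apply(cmds):
    # Runs each command with sudo; stops at the first failure.
    for cmd in cmds:
        subprocess.run(["sudo", *cmd], check=True)


if __name__ == "__main__":
    # Dry run: print what would be executed for eth1 with 1 RX and 1 TX channel.
    for cmd in tuning_commands("eth1", rx_channels=1, tx_channels=1):
        print(" ".join(cmd))
```

Running `ethtool -l eth1` first shows the current and maximum channel counts, which bounds what the `-L` values can be.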
