Comments (78)

GreenReaper commented on May 5, 2024

MSI-X is showing as disabled in the dmesg output above, and that may be why interrupts aren't being distributed.

The single address used by original MSI was found to be restrictive for some architectures. In particular, it made it difficult to target individual interrupts to different processors, which is helpful in some high-speed networking applications. MSI-X allows a larger number of interrupts and gives each one a separate target address and data word.

More specifically:

MSI-X interrupts are the preferred method, especially for NICs that support multiple RX queues. This is because each RX queue can have its own hardware interrupt assigned, which can then be handled by a specific CPU (with irqbalance or by modifying /proc/irq/IRQ_NUMBER/smp_affinity).
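For a NIC that does get MSI-X vectors, spreading them across cores by hand looks roughly like this (a dry-run sketch that only prints the commands it would execute; the IRQ numbers 56-59 and the four-core layout are assumptions, so check /proc/interrupts for the real values):

```shell
# Assign each NIC IRQ its own core by writing a one-hot hex mask to
# /proc/irq/<n>/smp_affinity. Dry run: prints the commands instead of running them.
cpu=0
for irq in 56 57 58 59; do
  mask=$(printf '%x' $((1 << cpu)))
  echo "echo $mask | sudo tee /proc/irq/$irq/smp_affinity"
  cpu=$(( (cpu + 1) % 4 ))
done
```

Dropping the echo wrapper (and running as root, or via the printed sudo tee) applies the masks for real; irqbalance automates the same idea.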

It is likely that the PCIe interface is to blame. This commit suggests it does not support MSI-X:

  • Last MSI cleanups, notably removing MSIX flag

This is possibly because the interface is PCIe Gen2 and per kernel documentation:

The MSI-X capability was also introduced with PCI 3.0.

But also, the Broadcom PCIe controller's driver does not even support Multi Message MSI (though the hardware does) which would be indicated by MSI_FLAG_MULTI_PCI_MSI. I think this means that you are limited to one interrupt, and thus one CPU core handling it. If that core is waiting for the PCIe bus, it might go to 100%.

Honestly I think you are doing pretty well already, considering PCIe 2.0 is 5 GT/s, and that is reduced by the 8b/10b encoding to 4 Gbps. It is unlikely that this maximum can be practically achieved, given data link and transport framing, which is presumably why the card has an x4 interface.

from raspberry-pi-pcie-devices.

geerlingguy commented on May 5, 2024

Oops, MTU was still 1500 on the Intel side. Must've reverted after a few reboots and re-plugs today. Had to run:

sudo ip link set dev eth1 mtu 9000
sudo ip link set dev eth2 mtu 9000
sudo ip link set dev eth3 mtu 9000
sudo ip link set dev eth4 mtu 9000
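Since the MTU keeps reverting, one way to persist it would be re-applying it at boot, e.g. from /etc/rc.local or a systemd oneshot (a sketch; the interface names and MTU mirror the commands above, run as root):

```shell
# Re-apply jumbo frames to the four card interfaces at boot.
for dev in eth1 eth2 eth3 eth4; do
  ip link set dev "$dev" mtu 9000
done
```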

Ran the benchmark playbook again, and what do you know?

Interface Bandwidth
eth0 (built-in) 942 Mbps
eth1 804 Mbps
eth2 799 Mbps
eth3 802 Mbps
eth4 803 Mbps
TOTAL: 4.15 Gbps

Testing just the four interfaces on the card, I get:

Interface Bandwidth
eth1 801 Mbps
eth2 805 Mbps
eth3 806 Mbps
eth4 803 Mbps
TOTAL: 3.22 Gbps

Testing just three of the card's interfaces:

Interface Bandwidth
eth1 990 Mbps
eth2 991 Mbps
eth3 991 Mbps
TOTAL: 2.97 Gbps

And testing three of the card's interfaces plus the onboard:

Interface Bandwidth
eth0 (built-in) 941 Mbps
eth1 990 Mbps
eth2 989 Mbps
eth3 989 Mbps
TOTAL: 3.91 Gbps

So, it seems logical that a PCIe 2.0 x1 lane maxes out right around 3.2 Gbps. Also confirmed that with jumbo packets / MTU 9000, the IRQ CPU usage was hovering around 50% maximum with all 5 interfaces going full throttle. It was 99% before, causing the slowdown.

And also, I can confirm the Pi Compute Module 4 can be overclocked to 2.20 GHz, just like the Pi 400. Though it needs better cooling to stay running that fast ;)

4.15 Gbps ain't too shabby on a single Pi. I remember back in my early Pi cluster days when I salivated over getting 200-300 Mbps... 😂

geerlingguy commented on May 5, 2024

Rebooted and...

$ sudo ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether b8:27:eb:5c:89:43 brd ff:ff:ff:ff:ff:ff
3: eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether 90:e2:ba:33:72:64 brd ff:ff:ff:ff:ff:ff
4: eth2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether 90:e2:ba:33:72:65 brd ff:ff:ff:ff:ff:ff
5: eth3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether 90:e2:ba:33:72:66 brd ff:ff:ff:ff:ff:ff
6: eth4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether 90:e2:ba:33:72:67 brd ff:ff:ff:ff:ff:ff
7: wlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DORMANT group default qlen 1000
    link/ether b8:27:eb:74:f2:6c brd ff:ff:ff:ff:ff:ff

BINGO!

Now, time to pull out a gigabit switch and start testing the maximum bandwidth I can pump through everything with iperf! But that will have to wait for another day, as I'm pretty much running on fumes right now.

geerlingguy commented on May 5, 2024

So one thing I would like to take a look at is OpenWRT, and just seeing how many bits per second I can push through the Pi using iperf. Any other uses people can think of for having 5 interfaces on one Pi? (Besides the obvious, like having an Apache HTTP site running on each IP address, which is not as interesting to me nowadays.)

Edit: Some ideas to test:

mayli commented on May 5, 2024

In bigger, multi-socket systems, I've seen a notable performance increase when processes are assigned to CPUs that are directly connected to the network interfaces (numactl helps with that), and I'm wondering if locking each iperf process together with the networking interrupts would help the scheduler work more efficiently.

A way more high-effort task would be to run the networking drivers in user space. There's a research group that has written a paper and drivers, but those are for the X520: https://github.com/emmericp/ixy

That's why I suggest performing some tests without letting packets reach userspace, by using a bridge or setting up iptables packet forwarding. Using an RPi as a switch/router/firewall or gateway is a common use case.
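A minimal kernel-only forwarding test along those lines could bridge two of the card's ports (a sketch; the interface and bridge names are assumptions, run as root):

```shell
# Bridge eth1 and eth2 so traffic between their link partners is forwarded
# entirely in the kernel, never touching userspace.
ip link add name br0 type bridge
ip link set dev eth1 master br0
ip link set dev eth2 master br0
ip link set dev br0 up
# Then run iperf between the two hosts attached to eth1 and eth2.
```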

geerlingguy commented on May 5, 2024

@annunity you and me both! I just got it plugged in for my initial "I should be getting to bed but I'll test this instead" routine :)

geerlingguy commented on May 5, 2024

Hehehe this is so dumb but fills me with joy:

[photo: IMG_2556]

I don't even have enough ports on my little office switch, I'm going to have to go into storage to grab my bigger switch tomorrow.

$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether b8:27:eb:5c:89:43 brd ff:ff:ff:ff:ff:ff
    inet 10.0.100.120/24 brd 10.0.100.255 scope global dynamic noprefixroute eth0
       valid_lft 86197sec preferred_lft 75397sec
    inet6 fe80::2f77:1eba:a042:9f2a/64 scope link 
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 90:e2:ba:33:72:64 brd ff:ff:ff:ff:ff:ff
    inet 10.0.100.45/24 brd 10.0.100.255 scope global dynamic noprefixroute eth1
       valid_lft 86391sec preferred_lft 75591sec
    inet6 fe80::88c:a286:503d:490c/64 scope link 
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 90:e2:ba:33:72:65 brd ff:ff:ff:ff:ff:ff
    inet 10.0.100.46/24 brd 10.0.100.255 scope global dynamic noprefixroute eth2
       valid_lft 86345sec preferred_lft 75545sec
    inet6 fe80::10fb:5fd6:67f2:168e/64 scope link 
       valid_lft forever preferred_lft forever
5: eth3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 90:e2:ba:33:72:66 brd ff:ff:ff:ff:ff:ff
6: eth4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 90:e2:ba:33:72:67 brd ff:ff:ff:ff:ff:ff
7: wlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether b8:27:eb:74:f2:6c brd ff:ff:ff:ff:ff:ff
$ dmesg | grep igb
[    4.192441] igb: loading out-of-tree module taints kernel.
[    4.200685] igb 0000:01:00.0: enabling device (0140 -> 0142)
[    4.201190] igb 0000:01:00.0: Failed to initialize MSI-X interrupts. Falling back to MSI interrupts.
[    4.376204] igb 0000:01:00.0: Intel(R) Gigabit Ethernet Linux Driver
[    4.376225] igb 0000:01:00.0: eth1: (PCIe:5.0GT/s:Width x1) 
[    4.376244] igb 0000:01:00.0 eth1: MAC: 90:e2:ba:33:72:64
[    4.376337] igb 0000:01:00.0: eth1: PBA No: E81600-008
[    4.376353] igb 0000:01:00.0: LRO is disabled
[    4.376371] igb 0000:01:00.0: Using MSI interrupts. 1 rx queue(s), 1 tx queue(s)
[    4.379455] igb 0000:01:00.1: enabling device (0140 -> 0142)
[    4.379940] igb 0000:01:00.1: Failed to initialize MSI-X interrupts. Falling back to MSI interrupts.
[    4.558290] igb 0000:01:00.1: Intel(R) Gigabit Ethernet Linux Driver
[    4.558310] igb 0000:01:00.1: eth2: (PCIe:5.0GT/s:Width x1) 
[    4.558329] igb 0000:01:00.1 eth2: MAC: 90:e2:ba:33:72:65
[    4.558423] igb 0000:01:00.1: eth2: PBA No: E81600-008
[    4.558438] igb 0000:01:00.1: LRO is disabled
[    4.558456] igb 0000:01:00.1: Using MSI interrupts. 1 rx queue(s), 1 tx queue(s)
[    4.558847] igb 0000:01:00.2: enabling device (0140 -> 0142)
[    4.559353] igb 0000:01:00.2: Failed to initialize MSI-X interrupts. Falling back to MSI interrupts.
[    4.736590] igb 0000:01:00.2: Intel(R) Gigabit Ethernet Linux Driver
[    4.736611] igb 0000:01:00.2: eth3: (PCIe:5.0GT/s:Width x1) 
[    4.736639] igb 0000:01:00.2 eth3: MAC: 90:e2:ba:33:72:66
[    4.736731] igb 0000:01:00.2: eth3: PBA No: E81600-008
[    4.736746] igb 0000:01:00.2: LRO is disabled
[    4.736765] igb 0000:01:00.2: Using MSI interrupts. 1 rx queue(s), 1 tx queue(s)
[    4.737172] igb 0000:01:00.3: enabling device (0140 -> 0142)
[    4.737659] igb 0000:01:00.3: Failed to initialize MSI-X interrupts. Falling back to MSI interrupts.
[    4.926383] igb 0000:01:00.3: Intel(R) Gigabit Ethernet Linux Driver
[    4.926403] igb 0000:01:00.3: eth4: (PCIe:5.0GT/s:Width x1) 
[    4.926421] igb 0000:01:00.3 eth4: MAC: 90:e2:ba:33:72:67
[    4.926513] igb 0000:01:00.3: eth4: PBA No: E81600-008
[    4.926529] igb 0000:01:00.3: LRO is disabled
[    4.926546] igb 0000:01:00.3: Using MSI interrupts. 1 rx queue(s), 1 tx queue(s)

And after plugging in another connection to one of the jacks on the board:

[  200.034885] igb 0000:01:00.3 eth4: igb: eth4 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[  200.035107] IPv6: ADDRCONF(NETDEV_CHANGE): eth4: link becomes ready

geerlingguy commented on May 5, 2024

@mayli / @jwbensley - Ah, using atop I did notice all the IRQs went to the first CPU:

[screenshot: atop showing all IRQ load on the first CPU]

Going to try manually assigning...

mayli commented on May 5, 2024

Also from HN:

So it's running Gen2 x1, which is good. I was afraid that it might have downshifted to Gen1. Other threads point to your CPU being pegged, and I would tend to agree with that.
What direction are you running the streams in? In general, sending is much more efficient than receiving ("it's better to give than to receive"). From your statement that ksoftirqd is pegged, I'm guessing you're receiving.
I'd first see what bandwidth you can send at with iperf when you run the test in reverse, so this Pi is sending. Then, to eliminate memory bandwidth as a potential bottleneck, you could use sendfile. I don't think iperf ever supported sendfile (but it's been years since I've used it). I'd suggest installing netperf on this Pi, running netserver on its link partners, then running netperf -tTCP_SENDFILE -H othermachine to all 5 peers and seeing what happens.

iperf3 can do zero-copy sending using -Z.
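For example (the IP is one of the addresses used earlier in this thread; the 30-second duration is an arbitrary choice):

```shell
# -Z / --zerocopy asks iperf3 to use a zero-copy send path (sendfile) on the client.
iperf3 -c 10.0.100.48 -Z -t 30
```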

geerlingguy commented on May 5, 2024

Posted a new video: 4+ Gbps Ethernet on the Raspberry Pi Compute Module 4

annunity commented on May 5, 2024

Haha nice. Probably still less messy than using 4 USB NICs. This is definitely one of those "because you can" sort of projects.

annunity commented on May 5, 2024

Just a heads up: OpenWrt for the Pi 4 still only has daily snapshots, or you can compile it yourself. The snapshots work fine if you are just testing, but you will have trouble updating packages after a while. The main appeal for me is to use load balancing or VLANs in OpenWrt. Beats having a large, expensive switch.

markbirss commented on May 5, 2024

@geerlingguy
Have you considered any of these cards ?

https://www.amazon.com/QNAP-Dual-Band-Wireless-Expansion-QWA-AC2600/dp/B07DD86XW4 (ath10k driver)
https://deviwiki.com/wiki/QNAP_QWA-AC2600

https://www.amazon.com/QNAP-QXG-10G1T-Single-Port-Low-Profile-Full-Height/dp/B07CW2C2J1

Or some Mellanox ConnectX-3 cards from Ebay (mlx4_core kernel module)

https://www.ebay.ie/itm/Mellanox-MCX311A-XCAT-CX311A-ConnectX-3-EN-10G-Ethernet-10GbE-SFP/193394513380

geerlingguy commented on May 5, 2024

Aha! We might be getting closer. The CPU itself still seems to have headroom, but I noticed the process ksoftirqd was hitting 99%-100% CPU the whole time, and I'm guessing it's single-threaded.

I just created a new Ansible playbook to run these tests a lot more quickly/easily so I can better observe the board while the tests are running (I'll add it to an 'extras' dir in this repository soon), and I'm now seeing that running everything at exactly the same time (I had a small lag when I wasn't using Ansible's immediate forks) gets me:

  • 559 Mbps
  • 560 Mbps
  • 560 Mbps
  • 561 Mbps
  • 573 Mbps

...which is still 2.81 Gbps total. In the words of the young'ns playing Among Us, that ksoftirqd process looks a little 'sus':

[screenshot: ksoftirqd pegged near 100% CPU]

As good an answer as any from @luciang:

Your computer communicates with the devices attached to it through IRQs (interrupt requests). When an interrupt comes from a device, the operating system pauses what it was doing and starts addressing that interrupt.

In some situations IRQs come very very fast one after the other and the operating system cannot finish servicing one before another one arrives. This can happen when a high speed network card receives a very large number of packets in a short time frame.

Because the operating system cannot handle IRQs as they arrive (because they arrive too fast one after the other), the operating system queues them for later processing by a special internal process named ksoftirqd.

If ksoftirqd is taking more than a tiny percentage of CPU time, this indicates the machine is under heavy interrupt load.
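One quick way to watch that interrupt load directly is the kernel's per-CPU softirq counters:

```shell
# NET_RX counts grow on whichever CPU services network receive softirqs;
# one huge column means a single core is absorbing all the interrupt work.
grep -E 'CPU|NET_RX' /proc/softirqs
```

Comparing two snapshots taken a second apart shows the rate per core.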

theofficialgman commented on May 5, 2024

Just to add to the conversation: I also see a high percentage (18%) of CPU time on my MIPS-based routers while running iperf3 (only getting 178 Mbps because the CPU in them is very slow), but not at all on my x86_64-based machine. So basically this says the CPU is too slow, and continuing to hammer it only makes the situation worse, as I see it. Interesting.

What happens if you just use 2 ports (so only 2gbps), does the ksoftirqd usage go away entirely or does it scale somewhat linearly?

mayli commented on May 5, 2024

Try echo 1 | sudo tee /proc/irq/56/smp_affinity instead.
The sudo in sudo echo 1 > /proc/irq/56/smp_affinity only applies to echo 1; the redirection > /proc/irq/56/smp_affinity is performed by the current (unprivileged) shell.

alexforencich commented on May 5, 2024

Transferring 1500 byte Ethernet frames via PCIe DMA on a PCIe gen 2 x1 link with 128 byte max TLP size has a theoretical maximum bandwidth considering all protocol overheads of about 3.0 Gbps if you want to run both RX and TX at the same time, or about 3.3 Gbps if you want to run RX or TX in isolation. Overclocking the CPU may also overclock the PCIe bus, depending on exactly how the overclocking was done. No software configuration changes will improve the situation here, you either need a gen 3 link or more lanes to do any better.

tl;dr: you got 3.0 Gbps, you're done. It's not possible to do any better as this is the limit of what you can push over PCIe.

alexforencich commented on May 5, 2024

Max throughput over a PCIe gen 2 x1 link with 128 byte TLPs is 5 Gbps * 8/10 (encoding) * 128/(128+24) (TLP headers) = 3.37 Gbps. TLPs less than 128 bytes, link layer traffic for flow control, read request TLPs, and descriptor traffic will reduce this further.
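Spelling the same arithmetic out (nothing new, just the numbers from the comment):

```shell
# 5 GT/s line rate x 8b/10b encoding x 128-byte payload per (128+24)-byte TLP.
awk 'BEGIN { printf "%.2f Gbps\n", 5.0 * 8/10 * 128/(128 + 24) }'
```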

geerlingguy commented on May 5, 2024

Latest result with MTU 9000: 3.36 Gbps

geerlingguy commented on May 5, 2024

@chinmaythosar - So... I also decided it would be fun to give the 2-port 2.5 GbE NIC a try; see #46

annunity commented on May 5, 2024

Looking forward to the results. I am curious if it will work with something like openwrt and make a killer home router. Cheers

geerlingguy commented on May 5, 2024
$ lspci
00:00.0 PCI bridge: Broadcom Limited Device 2711 (rev 20)
01:00.0 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
01:00.1 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
01:00.2 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
01:00.3 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)

So far so good. One of the interfaces using -vv:

$ sudo lspci -vv
...
01:00.3 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
	Subsystem: Intel Corporation Ethernet Server Adapter I340-T4
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin D routed to IRQ 0
	Region 0: Memory at 600200000 (32-bit, non-prefetchable) [disabled] [size=512K]
	Region 3: Memory at 60028c000 (32-bit, non-prefetchable) [disabled] [size=16K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
		Address: 0000000000000000  Data: 0000
		Masking: 00000000  Pending: 00000000
	Capabilities: [70] MSI-X: Enable- Count=10 Masked-
		Vector table: BAR=3 offset=00000000
		PBA: BAR=3 offset=00002000
	Capabilities: [a0] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 5GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <4us, L1 <32us
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Capabilities: [140 v1] Device Serial Number 90-e2-ba-ff-ff-33-72-64
	Capabilities: [1a0 v1] Transaction Processing Hints
		Device specific mode supported
		Steering table in TPH capability structure

geerlingguy commented on May 5, 2024

PCIe interface seems happy with the default BAR space:

[    0.870492] brcm-pcie fd500000.pcie: host bridge /scb/pcie@7d500000 ranges:
[    0.870517] brcm-pcie fd500000.pcie:   No bus range found for /scb/pcie@7d500000, using [bus 00-ff]
[    0.870589] brcm-pcie fd500000.pcie:      MEM 0x0600000000..0x0603ffffff -> 0x00f8000000
[    0.870660] brcm-pcie fd500000.pcie:   IB MEM 0x0000000000..0x00ffffffff -> 0x0100000000
[    0.926496] brcm-pcie fd500000.pcie: link up, 5 GT/s x1 (SSC)
[    0.926820] brcm-pcie fd500000.pcie: PCI host bridge to bus 0000:00
[    0.926839] pci_bus 0000:00: root bus resource [bus 00-ff]
[    0.926860] pci_bus 0000:00: root bus resource [mem 0x600000000-0x603ffffff] (bus address [0xf8000000-0xfbffffff])
[    0.926920] pci 0000:00:00.0: [14e4:2711] type 01 class 0x060400
[    0.927192] pci 0000:00:00.0: PME# supported from D0 D3hot
[    0.930550] PCI: bus0: Fast back to back transfers disabled
[    0.930573] pci 0000:00:00.0: bridge configuration invalid ([bus ff-ff]), reconfiguring
[    0.930801] pci 0000:01:00.0: [8086:150e] type 00 class 0x020000
[    0.930890] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x0007ffff]
[    0.930961] pci 0000:01:00.0: reg 0x1c: [mem 0x00000000-0x00003fff]
[    0.931032] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
[    0.931278] pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
[    0.931374] pci 0000:01:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5 GT/s x1 link at 0000:00:00.0 (capable of 16.000 Gb/s with 5 GT/s x4 link)
[    0.931607] pci 0000:01:00.1: [8086:150e] type 00 class 0x020000
[    0.931691] pci 0000:01:00.1: reg 0x10: [mem 0x00000000-0x0007ffff]
[    0.931762] pci 0000:01:00.1: reg 0x1c: [mem 0x00000000-0x00003fff]
[    0.932046] pci 0000:01:00.1: PME# supported from D0 D3hot D3cold
[    0.932329] pci 0000:01:00.2: [8086:150e] type 00 class 0x020000
[    0.932412] pci 0000:01:00.2: reg 0x10: [mem 0x00000000-0x0007ffff]
[    0.932482] pci 0000:01:00.2: reg 0x1c: [mem 0x00000000-0x00003fff]
[    0.932765] pci 0000:01:00.2: PME# supported from D0 D3hot D3cold
[    0.933045] pci 0000:01:00.3: [8086:150e] type 00 class 0x020000
[    0.933128] pci 0000:01:00.3: reg 0x10: [mem 0x00000000-0x0007ffff]
[    0.933199] pci 0000:01:00.3: reg 0x1c: [mem 0x00000000-0x00003fff]
[    0.933482] pci 0000:01:00.3: PME# supported from D0 D3hot D3cold
[    0.936779] PCI: bus1: Fast back to back transfers disabled
[    0.936799] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
[    0.936845] pci 0000:00:00.0: BAR 8: assigned [mem 0x600000000-0x6002fffff]
[    0.936884] pci 0000:01:00.0: BAR 0: assigned [mem 0x600000000-0x60007ffff]
[    0.936910] pci 0000:01:00.0: BAR 6: assigned [mem 0x600080000-0x6000fffff pref]
[    0.936928] pci 0000:01:00.1: BAR 0: assigned [mem 0x600100000-0x60017ffff]
[    0.936953] pci 0000:01:00.2: BAR 0: assigned [mem 0x600180000-0x6001fffff]
[    0.936978] pci 0000:01:00.3: BAR 0: assigned [mem 0x600200000-0x60027ffff]
[    0.937002] pci 0000:01:00.0: BAR 3: assigned [mem 0x600280000-0x600283fff]
[    0.937027] pci 0000:01:00.1: BAR 3: assigned [mem 0x600284000-0x600287fff]
[    0.937052] pci 0000:01:00.2: BAR 3: assigned [mem 0x600288000-0x60028bfff]
[    0.937076] pci 0000:01:00.3: BAR 3: assigned [mem 0x60028c000-0x60028ffff]
[    0.937107] pci 0000:00:00.0: PCI bridge to [bus 01]
[    0.937131] pci 0000:00:00.0:   bridge window [mem 0x600000000-0x6002fffff]
[    0.937449] pcieport 0000:00:00.0: enabling device (0140 -> 0142)
[    0.937692] pcieport 0000:00:00.0: PME: Signaling with IRQ 55
[    0.938097] pcieport 0000:00:00.0: AER: enabled with IRQ 55

geerlingguy commented on May 5, 2024

However, not seeing the new interfaces by default:

$ ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether b8:27:eb:5c:89:43 brd ff:ff:ff:ff:ff:ff
3: wlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DORMANT group default qlen 1000
    link/ether b8:27:eb:74:f2:6c brd ff:ff:ff:ff:ff:ff

geerlingguy commented on May 5, 2024

Glancing around in make menuconfig I didn't see any relevant Intel networking drivers... so I went to Intel's website and found the "Intel® Ethernet Adapter Complete Driver Pack", which says it's compatible with the "Intel® Ethernet Server Adapter I340-T4". The README does mention that 32-bit Linux is supported, but I could find no mention of ARM anywhere.

To download the .zip archive, you have to do it in a browser, since it requires manually accepting a download license. That's annoying, because I had to download it to my Mac and then copy it over to the Pi via SCP.

geerlingguy commented on May 5, 2024

All right, so after digging into that process and realizing that the download includes the entire universe of Windows and Linux drivers in one giant archive, I found that the Intel Linux base driver page mentions:

igb-x.x.x.tar.gz driver: Supports all 82575/6, 82580, I350, I354, and I210/I211 based gigabit network connections.

So I downloaded that driver and manually copied it to the Pi, then:

$ tar zxf igb-5.4.6.tar.gz
$ cd igb-5.4.6/src/
$ make install
make[1]: Entering directory '/home/pi/linux'
  CC [M]  /home/pi/igb-5.4.6/src/igb_main.o
/home/pi/igb-5.4.6/src/igb_main.c: In function ‘igb_get_os_driver_version’:
/home/pi/igb-5.4.6/src/igb_main.c:10044:7: error: implicit declaration of function ‘isdigit’ [-Werror=implicit-function-declaration]
   if(!isdigit(*c) && *c != '.')
       ^~~~~~~
At top level:
/home/pi/igb-5.4.6/src/igb_main.c:9439:12: warning: ‘igb_resume’ defined but not used [-Wunused-function]
 static int igb_resume(struct device *dev)
            ^~~~~~~~~~
/home/pi/igb-5.4.6/src/igb_main.c:9413:12: warning: ‘igb_suspend’ defined but not used [-Wunused-function]
 static int igb_suspend(struct device *dev)
            ^~~~~~~~~~~
cc1: some warnings being treated as errors
make[2]: *** [scripts/Makefile.build:266: /home/pi/igb-5.4.6/src/igb_main.o] Error 1
make[1]: *** [Makefile:1732: /home/pi/igb-5.4.6/src] Error 2
make[1]: Leaving directory '/home/pi/linux'
make: *** [Makefile:86: default] Error 2

geerlingguy commented on May 5, 2024

After seeing noseka1/linuxband#12, I monkey-patched the igb_main.c and added #include <linux/ctype.h> alongside the other includes, and that got things a bit further.

However, now it's complaining about permissions when it tries to copy over the man pages:

...
  MODPOST 1 modules
  CC [M]  /home/pi/igb-5.4.6/src/igb.mod.o
  LD [M]  /home/pi/igb-5.4.6/src/igb.ko
make[1]: Leaving directory '/home/pi/linux'
Copying manpages...
install: cannot create regular file '/usr/share/man/man7/igb.7.gz': Permission denied
make: *** [Makefile:120: install] Error 1

So trying with sudo make install:

$ sudo make install
make[1]: Entering directory '/home/pi/linux'
  Building modules, stage 2.
  MODPOST 1 modules
make[1]: Leaving directory '/home/pi/linux'
Copying manpages...
Installing modules...
make[1]: Entering directory '/home/pi/linux'
  INSTALL /home/pi/igb-5.4.6/src/igb.ko
  DEPMOD  5.4.72-v7l+
make[1]: Leaving directory '/home/pi/linux'
Running depmod...
Updating initramfs...

geerlingguy commented on May 5, 2024

Just testing one interface with iperf:

On the Pi:

$ sudo apt install -y iperf
$ iperf --bind 10.0.100.48 --server

On my Mac:

$ iperf -c 10.0.100.48
------------------------------------------------------------
Client connecting to 10.0.100.48, TCP port 5001
TCP window size:  201 KByte (default)
------------------------------------------------------------
[  4] local 10.0.100.118 port 56661 connected with 10.0.100.48 port 5001
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  1.10 GBytes   941 Mbits/sec

Just doing a quick test on my Mac with two interfaces, I found that I got:

483 Mbits/sec # 10.0.100.48
481 Mbits/sec # 10.0.100.120

So 964 Mbits/sec total, which just shows that, as expected, the single 1 Gbps link from my Mac to the switch is the bottleneck.

Now, I think what I'll do to try to test all five GigE ports at once is to have my Pi Dramble (4x Pi 4s) connecting directly to each of the ports on the Intel NIC, with a hardcoded IP address for each connection.

Then have my Mac connected over the network to the wired Ethernet on the CM4.

Then set up five instances of iperf --bind [interface ip] --server on the CM4, and then have Ansible trigger iperf -c [interface ip] for each of the Pis, plus localhost, at the same time (--forks 5).

This could get messy. I know the cabling at least will get messy.

geerlingguy commented on May 5, 2024

For each Pi, I prepped the boot volume with:

# First, flash 64-bit Pi OS lite to it via Raspberry Pi Imager.

# Second, touch ssh file so I can access it.
$ touch /Volumes/boot/ssh

# Third, write a wpa_supplicant file so it connects to my WiFi.
$ cat > /Volumes/boot/wpa_supplicant.conf << EOF
country=US
ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
update_config=1

network={
  scan_ssid=1
  ssid="ssid_here"
  psk="password_here"
}
EOF

# Fourth, pop card in Pi, boot pi, and watch for it using `fing` on Mac.

# Fifth, mark in separate document the IP and MAC for the Pi.

# Sixth, copy my SSH key to the Pi.
ssh-copy-id pi@[ip address here]

# Seventh, log into the Pi.
ssh pi@[ip address here]

# Eighth, configure a static IP in /etc/dhcpcd.conf.
$ sudo nano /etc/dhcpcd.conf
(I'm using range 192.168.0.4-192.168.0.11)

This is a tedious and annoying process, but this is research for science so it is worth it.

The next step will be building a small Ansible playbook that creates little networks for each of the interfaces, using the same IP and netmask between the individual Pis and the Intel NIC interfaces... or maybe doing it all by hand. It's one of those XKCD 1319 issues. (Edit: I went manual for this part.)

geerlingguy commented on May 5, 2024

Now I added a mapping of static IPs to /etc/dhcpcd.conf on the CM4:

interface eth1
static ip_address=192.168.0.11/24
static routers=192.168.0.1

interface eth2
static ip_address=192.168.0.10/24
static routers=192.168.0.1

interface eth3
static ip_address=192.168.0.9/24
static routers=192.168.0.1

interface eth4
static ip_address=192.168.0.8/24
static routers=192.168.0.1

And rebooted. It looks like all the links are up, so it's time to start seeing if I can hit multiple interfaces at the same time and get more than 1 Gigabit pumping through this Pi!

EDIT: That mapping doesn't work, because the independent networks each need different IP ranges, so I set up a new mapping:

# TO PI 4
interface eth1
static ip_address=192.168.0.8/24
static routers=192.168.0.1

# TO PI 3
interface eth2
static ip_address=172.16.0.8/24
static routers=172.16.0.1

# TO PI 2
interface eth3
static ip_address=198.51.100.8/24
static routers=198.51.100.1

# TO PI 1
interface eth4
static ip_address=203.0.113.8/24
static routers=203.0.113.1

And I ran the iperf server on all the interfaces on the CM4:

iperf --bind 192.168.0.8 --server
iperf --bind 172.16.0.8 --server
iperf --bind 198.51.100.8 --server
iperf --bind 203.0.113.8 --server
iperf --bind 10.0.100.120 --server
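
Those five servers can also be launched in one shot; a minimal sketch (the log and PID file paths are my own choices):

```shell
#!/bin/bash
# Start one iperf server per CM4 interface IP (addresses from the
# mapping above), each in the background, recording PIDs for cleanup.
IPS="192.168.0.8 172.16.0.8 198.51.100.8 203.0.113.8 10.0.100.120"
PIDFILE=/tmp/iperf-servers.pid
: > "$PIDFILE"   # truncate any previous run

for ip in $IPS; do
  iperf --bind "$ip" --server > "/tmp/iperf-$ip.log" 2>&1 &
  echo "$!" >> "$PIDFILE"
done

# Later, to stop them all:
# xargs kill < /tmp/iperf-servers.pid
```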

from raspberry-pi-pcie-devices.

geerlingguy avatar geerlingguy commented on May 5, 2024

For each of the Dramble Pis, I had to get iperf installed:

sudo ifconfig eth0 down
sudo apt update
sudo apt install -y iperf
sudo ifconfig eth0 up

Then I set up five iperf servers (one on each interface on the CM4), and manually triggered a bunch of iperf -c [ip] on each of the Pis individually as well as on my Mac to the primary interface, and... TODO

from raspberry-pi-pcie-devices.

geerlingguy avatar geerlingguy commented on May 5, 2024

Er... I think I need to use a different IP range for each of the interfaces. Ugh, my little text file mapping is going all bonkers now. As I said, tedious!

from raspberry-pi-pcie-devices.

geerlingguy avatar geerlingguy commented on May 5, 2024

So first quick result, firing off each benchmark one by one:

  1. 626 Mbps
  2. 612 Mbps
  3. 608 Mbps
  4. 606 Mbps
  5. 614 Mbps

Total: 3.06 Gbps

Then I tried just hitting three interfaces on the NIC only, and got:

  1. 935 Mbps
  2. 942 Mbps
  3. 941 Mbps

Total: 2.82 Gbps

Then hitting all four interfaces on the NIC:

  1. 768 Mbps
  2. 769 Mbps
  3. 769 Mbps
  4. 768 Mbps

Total: 3.07 Gbps

Hitting three on the NIC and one on the Pi:

  1. 714 Mbps
  2. 709 Mbps
  3. 709 Mbps
  4. 709 Mbps

Total: 2.84 Gbps

So it seems like there's an upper ceiling around 3 Gbps for total throughput through a Compute Module 4's PCIe slot... and when you couple that with the onboard network interface, there must be a tiny bit more overhead (maybe the Broadcom Ethernet chip isn't quite as efficient, or maybe the kernel having to switch between the Intel chip and the Broadcom chip causes that 8% bandwidth penalty?).

I ran all of these sets of tests twice, and they were extremely close in each case.

Now I'm interested to see if I'd be able to pump any more bits through this thing with a 10GbE adapter, or if 2.5 GbE over any individual interface is about as high as it'll go before we hit limits.

from raspberry-pi-pcie-devices.

paulwratt avatar paulwratt commented on May 5, 2024

[ 0.931374] pci 0000:01:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5 GT/s x1 link at 0000:00:00.0 (capable of 16.000 Gb/s with 5 GT/s x4 link)

So yeah, there is overhead loss somewhere (a 3.1 Gbps limit at the moment). To improve this, you might look at those cards that offload work from the kernel to the card driver (see next paragraph). Or it might just be that the drivers are not optimised for working together (on ARM). Do you get any connection over 940-ish Mbps? That seems to be about the limit per connection, which is pretty good under the circumstances (and on par with other platforms).

I predict 6 months' worth of work (kernel and driver patches/updates) would have this running reliably at max speeds for actual real-world use. But you would obviously need a reason for this real-world use case (and probably an income at the end :) - eg a High Performance Dramble Cluster 5: 1xCM4 + 4xRPi4 with a 4x1Gb PCIe ethernet kit (and no need for a switch).

I am wondering what sort of reliability comes out of sustained-use testing (24/48 hours), particularly thermals: can the CM4 and host board handle sustained high-load conditions, or does something slow down or heat up too much?

from raspberry-pi-pcie-devices.

geerlingguy avatar geerlingguy commented on May 5, 2024

I predict 6 months' worth of work (kernel and driver patches/updates) would have this running reliably at max speeds for actual real-world use.

Heh, don't tempt me!

I've definitely considered building myself a custom Pi switch / router with this card... though I have some other much more pressing projects right now, especially after the response from my last video.

I am wondering what sort of reliability comes out of sustained-use testing (24/48 hours), particularly thermals: can the CM4 and host board handle sustained high-load conditions, or does something slow down or heat up too much.

In the few hours that I was testing it today, it got slightly warm, but not hot (fan blowing over the whole rig throughout). The SATA card I'm testing got much hotter, though I keep forgetting to pull out my Seek thermal cam to take a few pictures on the front and back of the board.

from raspberry-pi-pcie-devices.

geerlingguy avatar geerlingguy commented on May 5, 2024

Another note on the build: if running the 64-bit beta of Pi OS, it seems that you might run into:

$ make install
common.mk:82: *** Kernel header files not in any of the expected locations.
common.mk:83: *** Install the appropriate kernel development package, e.g.
common.mk:84: *** kernel-devel, for building kernel modules and try again.  Stop.

And that seems to be because even if you install the headers with sudo apt install -y raspberrypi-kernel-headers, they are in a nonstandard location (I had the same issue with compiling the Nvidia graphics card drivers).

from raspberry-pi-pcie-devices.

theofficialgman avatar theofficialgman commented on May 5, 2024

iperf can be quite CPU limited since it is a single-threaded process. You might be running into a bottleneck because of a lack of CPU performance on the Pi CM4. The client in iperf has to do more CPU work than the server, so with the CM4 as the server you are getting about as good performance as your setup allows. I find iperf3 to be more performant than iperf in general, so you might want to try that.

Also, the -Z/--zerocopy option of iperf3 can allow for better performance as well (because it uses fewer CPU cycles).
(copied from my YouTube comment)

from raspberry-pi-pcie-devices.

theofficialgman avatar theofficialgman commented on May 5, 2024

one more thing: PCIe 2.0 at x1 is limited to 500 MB/s (or 4 Gb/s) in each direction. It might be interesting to do a role reversal (-R) on a couple of the clients to see what happens in terms of total bandwidth utilized (RX+TX), since right now all the bandwidth is in one direction.

from raspberry-pi-pcie-devices.

geerlingguy avatar geerlingguy commented on May 5, 2024

@theofficialgman - Good suggestions! I have the rig running currently and I'm going to do a few more tests with some suggestions I've gotten in the comments.

from raspberry-pi-pcie-devices.

geerlingguy avatar geerlingguy commented on May 5, 2024

Further:

  • If I use 2 ports, I don't see ksoftirqd go above 1% CPU.
  • If I use 3 ports, I see ksoftirqd spike to 100% every time (same with 4 or 5 ports).

@theofficialgman - Ha, saw your comment right after I finished typing this in. Basically ksoftirqd only becomes an issue once I go to 3 interfaces. At 2 interfaces, it's not even on the radar.

from raspberry-pi-pcie-devices.

istoOi avatar istoOi commented on May 5, 2024

Now I added a mapping of static IPs to /etc/dhcpcd.conf on the CM4:

interface eth1
static ip_address=192.168.0.11/24
static routers=192.168.0.1

interface eth2
static ip_address=192.168.0.10/24
static routers=192.168.0.1

interface eth3
static ip_address=192.168.0.9/24
static routers=192.168.0.1

interface eth4
static ip_address=192.168.0.8/24
static routers=192.168.0.1

And rebooted. It looks like all the links are up, so it's time to start seeing if I can hit multiple interfaces at the same time and get more than 1 Gigabit pumping through this Pi!

EDIT: That mapping doesn't work, because the independent networks each need different IP ranges, so I set up a new mapping:

# TO PI 4
interface eth1
static ip_address=192.168.0.8/24
static routers=192.168.0.1

# TO PI 3
interface eth2
static ip_address=172.16.0.8/24
static routers=172.16.0.1

# TO PI 2
interface eth3
static ip_address=198.51.100.8/24
static routers=198.51.100.1

# TO PI 1
interface eth4
static ip_address=203.0.113.8/24
static routers=203.0.113.1

And I ran the iperf server on all the interfaces on the CM4:

iperf --bind 192.168.0.8 --server
iperf --bind 172.16.0.8 --server
iperf --bind 198.51.100.8 --server
iperf --bind 203.0.113.8 --server
iperf --bind 10.0.100.120 --server

A tip for future projects:
You can create a lot of separate networks with just one private segment.
You used 10.0.100.0/24, so every 10.0.100.x address is in the same network.
By changing the 2nd or 3rd octet, you would have created separate networks, like:
eth1 10.0.100.1/24 (usable IP range: 10.0.100.1 - 10.0.100.254)
eth2 10.1.100.1/24 (usable IP range: 10.1.100.1 - 10.1.100.254)
eth3 10.2.100.1/24 (usable IP range: 10.2.100.1 - 10.2.100.254)

or 10.0.100.1, 10.0.101.1,....

You can also vary the network mask. /30 would give you a network with 2 usable IPs.
The magic word is CIDR -> https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing
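
To make the subnet arithmetic concrete, a tiny sketch:

```shell
#!/bin/bash
# Usable IPv4 hosts in a prefix: 2^(32-prefix) minus the network and
# broadcast addresses (ignoring the /31 point-to-point special case).
usable_hosts() {
  echo $(( (1 << (32 - $1)) - 2 ))
}

usable_hosts 24   # prints 254 (e.g. 10.0.100.1 - 10.0.100.254)
usable_hosts 30   # prints 2, the smallest normal point-to-point net
```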

from raspberry-pi-pcie-devices.

geerlingguy avatar geerlingguy commented on May 5, 2024

Also, after sudo apt install -y linux-perf, then hacking the perf alias to work with the mismatched kernel (just to get this test done), I got the following:

# perf top
...
  15.96%  [kernel]                      [k] _raw_spin_unlock_irqrestore
  12.81%  [kernel]                      [k] mmiocpy
   6.26%  [kernel]                      [k] __copy_to_user_memcpy
   6.02%  [kernel]                      [k] __local_bh_enable_ip
   5.13%  [igb]                         [k] igb_poll

When it hit full blast, I started getting "Events are being lost, check IO/CPU overload!"

from raspberry-pi-pcie-devices.

geerlingguy avatar geerlingguy commented on May 5, 2024

Also just noting:

$ ethtool -c eth1
Coalesce parameters for eth1:
Adaptive RX: off  TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

rx-usecs: 3
rx-frames: 0
rx-usecs-irq: 0
rx-frames-irq: 0

tx-usecs: 0
tx-frames: 0
tx-usecs-irq: 0
tx-frames-irq: 128

rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0

rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0

from raspberry-pi-pcie-devices.

geerlingguy avatar geerlingguy commented on May 5, 2024

I tried setting the MTU to the highest available option (get minmtu and maxmtu with ip -d link list):

sudo ip link set dev eth1 mtu 9216
sudo ip link set dev eth2 mtu 9216
sudo ip link set dev eth3 mtu 9216
sudo ip link set dev eth4 mtu 9216

But it still resulted in the same behavior, even when only hitting the PCIe interfaces on the Intel card, giving 2.96 Gbps.

from raspberry-pi-pcie-devices.

geerlingguy avatar geerlingguy commented on May 5, 2024

Also, as noted by Kevin Malec:

igb is in the kernel source. It's called igb, however, which doesn't have "intel" in the name.

Still, quicker to install the driver from Intel than to recompile the kernel to get it... at least for this driver.

from raspberry-pi-pcie-devices.

durapensa avatar durapensa commented on May 5, 2024

one more thing: PCIe 2.0 at x1 is limited to 500 MB/s (or 4 Gb/s) in each direction. It might be interesting to do a role reversal (-R) on a couple of the clients to see what happens in terms of total bandwidth utilized (RX+TX), since right now all the bandwidth is in one direction.

I came here to wonder about the same thing.

https://en.wikipedia.org/wiki/PCI_Express#History_and_revisions

from raspberry-pi-pcie-devices.

geerlingguy avatar geerlingguy commented on May 5, 2024

So... another thing I quickly tried: overclocking the Raspberry Pi CM4 to its maximum setting of 2.147 GHz CPU / 750 MHz GPU:

And I got 3.43 Gbps! So, from 3.06 to 3.43 == 11% difference! (for a 36% clock speed difference)

So it looks like what some have speculated is true—besides the PCIe bus limitation, the interrupts are limited by CPU on the Pi 4/CM4.
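
For reference, that overclock is set in /boot/config.txt with the usual firmware keys; a sketch (the over_voltage value is an assumption, adjust for stability on your board):

```text
# /boot/config.txt
over_voltage=6     # assumed value; raise/lower for stability
arm_freq=2147      # CPU clock in MHz
gpu_freq=750       # core/GPU clock in MHz
```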

from raspberry-pi-pcie-devices.

mayli avatar mayli commented on May 5, 2024

So, first try switching to iperf3 instead of iperf.

Second, install atop and check irq% on each core. It's likely you are doing all the work on a single core, so manually assigning each interrupt to a different core might work, with something like this:

echo 1 >/proc/irq/32/smp_affinity
echo 2 >/proc/irq/33/smp_affinity
echo 4 >/proc/irq/34/smp_affinity
echo 8 >/proc/irq/35/smp_affinity

or if you are lazy, try enabling irqbalance or use https://gist.github.com/SaveTheRbtz/8875474
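
The per-IRQ echoes above generalize to a small script; this sketch only prints the writes (drop the outer echo to apply them), and note that on NICs sharing a single MSI vector the kernel rejects the write with EINVAL:

```shell
#!/bin/bash
# Round-robin each ethN interrupt across the 4 cores by writing a CPU
# bitmask to smp_affinity (CPU n -> mask 2^n).
mask_for_cpu() {
  printf '%x\n' $(( 1 << $1 ))
}

cpu=0
awk '/eth[0-9]/ {sub(":", "", $1); print $1}' /proc/interrupts |
while read -r irq; do
  # Printed for review; remove the leading echo to actually apply.
  echo "echo $(mask_for_cpu $cpu) > /proc/irq/$irq/smp_affinity"
  cpu=$(( (cpu + 1) % 4 ))
done
```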

There are a few things you can try to reduce IRQ load, including adjusting adaptive-tx/rx using ethtool, if your NIC supports it:

ethtool -C $ETH_NAME adaptive-tx on
ethtool -C $ETH_NAME adaptive-rx on

You can also try increasing the MTU to 9000, since those NICs should be capable of handling jumbo frames, and that could reduce the number of IRQs hitting the CPU.

Other tuning includes Receive Packet Steering, but that might not help a lot.

from raspberry-pi-pcie-devices.

mayli avatar mayli commented on May 5, 2024

Adaptive RX: off TX: off

Yeah, your Intel NIC supports that; try enabling it to save a few IRQs by handling multiple packets in a batch.

from raspberry-pi-pcie-devices.

geerlingguy avatar geerlingguy commented on May 5, 2024

@mayli - Hmm...

pi@raspberrypi:~ $ sudo ethtool -C eth1 adaptive-tx on
Cannot set device coalesce parameters: Unknown error 524
pi@raspberrypi:~ $ sudo ethtool -C eth1 adaptive-rx on
Cannot set device coalesce parameters: Unknown error 524

from raspberry-pi-pcie-devices.

jwbensley avatar jwbensley commented on May 5, 2024

Interesting project, thanks for sharing!

If you're checking for theoretical max speed, try using iperf3 in UDP mode (I haven't used it in a while, but I think in UDP mode it defaults to 1 Mbps, so you also need to set the bandwidth higher).
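
A sketch of what that might look like; the target address is a placeholder, and the helper below just builds the command string so you can review it before running:

```shell
#!/bin/bash
# Build an iperf3 UDP client invocation with an explicit offered load,
# since the UDP default (~1 Mbit/s) says nothing about the link limit.
udp_test_cmd() {
  local target=$1 rate=${2:-2G}   # -b 0 means "as fast as possible"
  echo "iperf3 -c $target -u -b $rate -t 30"
}

udp_test_cmd 192.168.0.8       # 2 Gbit/s offered load (placeholder IP)
udp_test_cmd 192.168.0.8 0     # unlimited offered load
```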

Alternatively try a tool I wrote (although I don't have a Pi to test it on, so no idea if it will work on a Pi - but I don't see why it wouldn't). Spread the interrupts for each NIC over each core (https://github.com/jwbensley/net-stats/blob/master/irq_balance.sh) then run a separate instance of EtherateMT with a single worker thread each pinned to a separate CPU core and interface: https://github.com/jwbensley/EtherateMT

Remove any IPs from the interfaces first, they aren't needed.

./irq_balance.sh
./etherate_mt -c 1 -x 1 -i eth1 -p4
./etherate_mt -c 1 -x 2 -i eth2 -p4
./etherate_mt -c 1 -x 3 -i eth3 -p4
./etherate_mt -c 1 -x 4 -i eth4 -p4

Something like that, I'm on a mobile right now with nothing to test nearby.

from raspberry-pi-pcie-devices.

mayli avatar mayli commented on May 5, 2024

I'm guessing it's single-threaded.

Yes, it's single-threaded, but there are 4 of them, and you can distribute the work to different cores to avoid that bottleneck.

From the screenshot I notice each iperf is taking 25% CPU to handle ~500 Mbps; that's the userspace overhead of iperf, so the 4 iperf processes together are taking a full core at 100%, which is a huge overhead. (Kernel -context switching-> User -> handle)

For the routing/firewall or NAT use case, the packets don't leave the kernel. To benchmark that, you can create a bridge across all 4 Intel NICs and make the CM4 work as a switch. Then start the iperf3 server/client on the RPi4 side; that should achieve better numbers than running iperf on the CM4 directly.
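
A sketch of that bridge with the modern ip tooling (brctl also works); it just prints the commands so they can be reviewed before piping them to sudo sh on the CM4:

```shell
#!/bin/bash
# Bridge the four Intel ports so the CM4 forwards frames entirely
# in-kernel, with no userspace iperf copy on the CM4 itself.
bridge_cmds() {
  echo "ip link add name br0 type bridge"
  for i in 1 2 3 4; do
    echo "ip link set eth$i master br0"
    echo "ip link set eth$i up"
  done
  echo "ip link set br0 up"
}

bridge_cmds        # review, then: bridge_cmds | sudo sh
```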

from raspberry-pi-pcie-devices.

geerlingguy avatar geerlingguy commented on May 5, 2024

For each interface:

# grep eth1 /proc/interrupts
 56:      10966          0          0          0  BRCM STB PCIe MSI 524288 Edge      eth1

// Take the '56' and that's the irq number:
# cat /proc/irq/56/smp_affinity
f

// 'f' means the interrupt can be handled by any CPU. We want to assign this interface's IRQ to CPU 0:
# sudo echo 1 >/proc/irq/56/smp_affinity
bash: echo: write error: Invalid argument

Hmm... it seems I also can't use ./set_irq_affinity -x 0 eth1 as that gives a printf: write error: Invalid argument :-/

from raspberry-pi-pcie-devices.

geerlingguy avatar geerlingguy commented on May 5, 2024

Though that all may be a bit of a misleading path, as Nils mentions on SE:

I do not think that moving interrupts to different CPUs - especially for handling network events - would increase the performance.

The contrary will happen, since the network-code can not be held in a specific CPU any longer.

So as long as you do not experience dropped packets on your network interface, I would say - this is quite normal behaviour for a network that serves many packets.

You need to lower the number of interrupts - moving them around will not help (on the contrary, as I tried to outline).

@mayli - That was copied from earlier—I tried all the commands as root directly, and was getting the same issue.

from raspberry-pi-pcie-devices.

mayli avatar mayli commented on May 5, 2024

That's a weird error; the Intel driver should work, unless there is a limitation on the ARM platform.
It's also weird that ethtool can report Adaptive RX: off TX: off but is unable to turn them on.
How about trying to reduce the CPU usage by:

  • use brctl to create a bridge across the 4 NICs, avoiding userspace packet handling on the CM4's CPUs
  • change the MTU to 9000 to save a few interrupts
  • tune the ethtool coalesce settings to merge more packets into a single interrupt (explanation)
    rx-usecs: 30 rx-frames: 100 rx-usecs-irq: 100 rx-frames-irq: 128
    tx-usecs: 30 tx-frames: 100 tx-usecs-irq: 100 tx-frames-irq: 128
    I just made up those numbers, but they might work :)

and some generic nic tuning

  • increase tx/rx queue: ethtool -G ethN rx 4096 tx 4096
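
Pulled together, the suggestions in this comment might look like the following sketch. The values are the made-up ones above, and each step is allowed to fail independently, since the igb driver on this board already rejected the adaptive coalescing settings with error 524:

```shell
#!/bin/bash
# Apply the suggested tuning to each Intel port; report but tolerate
# per-command failures. Set APPLY=1 to actually run it on the CM4.
tune_nic() {
  local dev=$1
  ip link set dev "$dev" mtu 9000             || echo "mtu failed on $dev"
  ethtool -C "$dev" rx-usecs 30 rx-frames 100 || echo "coalesce failed on $dev"
  ethtool -G "$dev" rx 4096 tx 4096           || echo "ring resize failed on $dev"
}

if [ "${APPLY:-0}" = 1 ]; then
  for dev in eth1 eth2 eth3 eth4; do
    tune_nic "$dev"
  done
fi
```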

from raspberry-pi-pcie-devices.

geerlingguy avatar geerlingguy commented on May 5, 2024

Also from HN:

So it's running Gen2 x1, which is good. I was afraid that it might have downshifted to Gen1. Other threads point to your CPU being pegged, and I would tend to agree with that.

What direction are you running the streams in? In general, sending is much more efficient than receiving ("it's better to give than to receive"). From your statement that ksoftirqd is pegged, I'm guessing you're receiving.

I'd first see what bandwidth you can send at with iperf when you run the test in reverse so this Pi is sending. Then, to eliminate memory bandwidth as a potential bottleneck, you could use sendfile. I don't think iperf ever supported sendfile (but it's been years since I've used it). I'd suggest installing netperf on this Pi, running netserver on its link partners, and running netperf -tTCP_SENDFILE -H othermachine to all 5 peers to see what happens.

from raspberry-pi-pcie-devices.

happosade avatar happosade commented on May 5, 2024

In bigger, multi-socket systems, I've seen a notable performance increase when processes have been assigned to CPUs directly connected to the network interfaces (numactl helps with that), and I'm wondering if locking each iperf process together with the networking interrupts would help the scheduler work more efficiently.

A much higher-effort task would be to run the networking drivers in user space. There's a research group that has written a paper and drivers, but those are for the X520. https://github.com/emmericp/ixy

from raspberry-pi-pcie-devices.

ryan-haver avatar ryan-haver commented on May 5, 2024

I assume the compute module's eMMC storage is a limiting factor here as well. Have you tested disk benchmarks to ensure it isn't contributing significantly to the bottleneck you are seeing with your iPerf testing?

Edit: never mind it looks like you were running iPerf in memory mode so the eMMC flash storage speed wouldn’t be a limiting factor.

from raspberry-pi-pcie-devices.

johnp789 avatar johnp789 commented on May 5, 2024

Have you thought about bonding or teaming with this interface, to see if you can exceed 940 Mbps on a single IP? Samba or minio at more than gigabit rates would be neat.

from raspberry-pi-pcie-devices.

russor avatar russor commented on May 5, 2024

Though that all may be a bit of a misleading path, as Nils mentions on SE:

I do not think that moving interrupts to different CPUs - especially for handling network events - would increase the performance.
The contrary will happen, since the network-code can not be held in a specific CPU any longer.
So as long as you do not experience dropped packets on your network interface, I would say - this is quite normal behaviour for a network that serves many packets.
You need to lower the number of interrupts - moving them around will not help (on the contrary, as I tried to outline).

@mayli - That was copied from earlier—I tried all the commands as root directly, and was getting the same issue.

I don't think the SE advice is correct. To get the most throughput out of your system, you want to use all of the CPUs, and you want to make sure they can work independently, with the minimum amount of communication between them. With 4 NICs, the simplest way to set it up is to pin each NIC and the iperf command for that NIC to one CPU. If you want to also use the onboard NIC, I guess I'd leave that unpinned and see where that gets you. If you wanted to operationalize this, you'd want to look into Receive Side Scaling (RSS), with each NIC having one queue per CPU, and some more work to get connections pinned to the same CPU for userland and kernel work; but for just iperfing, keeping it simple is better.
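
That simple per-CPU pinning might be sketched like this, using the server addresses from the earlier mapping. It only prints the commands, since on this board only the userspace half can be pinned anyway (the IRQ-side write fails because all four ports share one MSI vector):

```shell
#!/bin/bash
# russor's scheme: CPU n owns the iperf server for eth(n+1).
ADDRS=(192.168.0.8 172.16.0.8 198.51.100.8 203.0.113.8)

pin_cmds() {
  for cpu in 0 1 2 3; do
    echo "taskset -c $cpu iperf --bind ${ADDRS[$cpu]} --server &"
  done
}

pin_cmds           # review, then: pin_cmds | sh
```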

from raspberry-pi-pcie-devices.

johnp789 avatar johnp789 commented on May 5, 2024

Without doing any specific configuration, here's what /proc/interrupts looks like on my i7-870 with an HP NC365T quad-port NIC (enp5s0), based on the Intel 82580. Only the first three ports are connected, so the fourth port (f3) doesn't appear in the list. The motherboard NIC is enp3s0.

$ cat /proc/interrupts 
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7       
  0:    1325041          0          0          0          0          0          0          0   IO-APIC   2-edge      timer
  8:          0          0          0          1          0          0          0          0   IO-APIC   8-edge      rtc0
  9:          0          0          0          0          0          0          0          0   IO-APIC   9-fasteoi   acpi
 16:          0          0          0          0          0          0          0          0   IO-APIC  16-fasteoi   uhci_hcd:usb5, uhci_hcd:usb11, ahci[0000:02:00.0]
 17:          0          0          0          0          0          2          0          0   IO-APIC  17-fasteoi   firewire_ohci, pata_jmicron
 18:          0          0          0          0         56          0          0          0   IO-APIC  18-fasteoi   ehci_hcd:usb1, uhci_hcd:usb7, uhci_hcd:usb10, i801_smbus
 19:          0          0          1          0          0          0          0          0   IO-APIC  19-fasteoi   uhci_hcd:usb9, pci-das6402/16
 21:          0          0          0          0          0          0          0          0   IO-APIC  21-fasteoi   uhci_hcd:usb6
 23:          0          0          0          0          0          0          0          0   IO-APIC  23-fasteoi   ehci_hcd:usb3, uhci_hcd:usb8
 24:    4519864          0          0          0          0          0          0          0  HPET-MSI   3-edge      hpet3
 25:          0    4818953          0          0          0          0          0          0  HPET-MSI   4-edge      hpet4
 26:          0          0    5027553          0          0          0          0          0  HPET-MSI   5-edge      hpet5
 27:          0          0          0    5136508          0          0          0          0  HPET-MSI   6-edge      hpet6
 28:          0          0          0          0    4998057          0          0          0  HPET-MSI   7-edge      hpet7
 34:          0          0          0          0          0          0      76420          0   PCI-MSI 512000-edge      ahci[0000:00:1f.2]
 35:          0          0          0          0          0          0          0          0   PCI-MSI 2097152-edge      xhci_hcd
 36:          0          0          0          0          0          0          0          0   PCI-MSI 2097153-edge      xhci_hcd
 37:          0          0          0          0          0          0          0          0   PCI-MSI 2097154-edge      xhci_hcd
 38:          0          0          0          0          0          0          0          0   PCI-MSI 2097155-edge      xhci_hcd
 39:          0          0          0          0          0          0          0          0   PCI-MSI 2097156-edge      xhci_hcd
 40:          0          0          0          0          0          0          0          0   PCI-MSI 2097157-edge      xhci_hcd
 41:          0          0          0          0          0          0          0          0   PCI-MSI 2097158-edge      xhci_hcd
 42:          0          0          0          0          0          0          0          0   PCI-MSI 2097159-edge      xhci_hcd
 43:          0          0          0         21          0          0          0          0   PCI-MSI 360448-edge      mei_me
 44:          7          0          0          0          0          0          0          0   PCI-MSI 2621440-edge      enp5s0f0
 45:          0   18472518          0          0          0          0          0          0   PCI-MSI 2621441-edge      enp5s0f0-TxRx-0
 46:          0          0   19181163          0          0          0          0          0   PCI-MSI 2621442-edge      enp5s0f0-TxRx-1
 47:          0          0          0   15900954          0          0          0          0   PCI-MSI 2621443-edge      enp5s0f0-TxRx-2
 48:          0          0          0          0   16366358          0          0          0   PCI-MSI 2621444-edge      enp5s0f0-TxRx-3
 49:          0          0          0          0          0   17742523          0          0   PCI-MSI 2621445-edge      enp5s0f0-TxRx-4
 50:          0          0          0          0          0          0   15637822          0   PCI-MSI 2621446-edge      enp5s0f0-TxRx-5
 51:          0          0          0          0          0          0          0   19780624   PCI-MSI 2621447-edge      enp5s0f0-TxRx-6
 52:   16868733          0          0          0          0          0          0          0   PCI-MSI 2621448-edge      enp5s0f0-TxRx-7
 53:          0          0          0          0      55225          0          0          0   PCI-MSI 1572864-edge      enp3s0
 54:          0          1          0          0          0          0          0          0   PCI-MSI 2623488-edge      enp5s0f1
 55:          0          0     407486          0          0          0          0          0   PCI-MSI 2623489-edge      enp5s0f1-TxRx-0
 56:          0          0          0     987569          0          0          0          0   PCI-MSI 2623490-edge      enp5s0f1-TxRx-1
 57:          0          0          0          0    1420090          0          0          0   PCI-MSI 2623491-edge      enp5s0f1-TxRx-2
 58:          0          0          0          0          0     216953          0          0   PCI-MSI 2623492-edge      enp5s0f1-TxRx-3
 59:          0          0          0          0          0          0    1324590          0   PCI-MSI 2623493-edge      enp5s0f1-TxRx-4
 60:          0          0          0          0          0          0          0    1192697   PCI-MSI 2623494-edge      enp5s0f1-TxRx-5
 61:    1658867          0          0          0          0          0          0          0   PCI-MSI 2623495-edge      enp5s0f1-TxRx-6
 62:          0    1264706          0          0          0          0          0          0   PCI-MSI 2623496-edge      enp5s0f1-TxRx-7
 63:          0          0          0          0          0          0          0          7   PCI-MSI 2625536-edge      enp5s0f2
 64:   18264909          0          0          0          0          0          0          0   PCI-MSI 2625537-edge      enp5s0f2-TxRx-0
 65:          0   17995682          0          0          0          0          0          0   PCI-MSI 2625538-edge      enp5s0f2-TxRx-1
 66:          0          0   17641425          0          0          0          0          0   PCI-MSI 2625539-edge      enp5s0f2-TxRx-2
 67:          0          0          0   18821332          0          0          0          0   PCI-MSI 2625540-edge      enp5s0f2-TxRx-3
 68:          0          0          0          0   18104087          0          0          0   PCI-MSI 2625541-edge      enp5s0f2-TxRx-4
 69:          0          0          0          0          0   18161722          0          0   PCI-MSI 2625542-edge      enp5s0f2-TxRx-5
 70:          0          0          0          0          0          0   16260403          0   PCI-MSI 2625543-edge      enp5s0f2-TxRx-6
 71:          0          0          0          0          0          0          0   18749833   PCI-MSI 2625544-edge      enp5s0f2-TxRx-7
 72:          0          0          0          0          0        659          0          0   PCI-MSI 442368-edge      snd_hda_intel:card0
 82:          0          0          0          0          0          0    4722227          0   PCI-MSI 524288-edge      nvkm
NMI:       1091       1115       1149       1114       1058       1037       1089       1026   Non-maskable interrupts
LOC:         17         16         16         16         16    4747911    4963347    4649474   Local timer interrupts
SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
PMI:       1091       1115       1149       1114       1058       1037       1089       1026   Performance monitoring interrupts
IWI:          0          0          3          2          1          0          0          0   IRQ work interrupts
RTR:          0          0          0          0          0          0          0          0   APIC ICR read retries
RES:     770012     678623     661112     654779     633194     738232     735896     745955   Rescheduling interrupts
CAL:      68649      67826      68639      68468      68503      68563      68903      68773   Function call interrupts
TLB:      90286      89945      90338      90837      90557      91443      91874      91422   TLB shootdowns
TRM:          0          0          0          0          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0          0          0          0          0   Threshold APIC interrupts
DFR:          0          0          0          0          0          0          0          0   Deferred Error APIC interrupts
MCE:          0          0          0          0          0          0          0          0   Machine check exceptions
MCP:        174        175        175        175        175        175        175        175   Machine check polls
ERR:          0
MIS:          0
PIN:          0          0          0          0          0          0          0          0   Posted-interrupt notification event
NPI:          0          0          0          0          0          0          0          0   Nested posted-interrupt event
PIW:          0          0          0          0          0          0          0          0   Posted-interrupt wakeup event

from raspberry-pi-pcie-devices.

wallace11 avatar wallace11 commented on May 5, 2024

Earlier this year I did a little write up about IRQs on the Odroid XU4, which is another popular SBC.
It's probably not going to bring any news, since IRQs and the irqbalance tool have already been discussed here but I thought I'd throw a link in for anyone who wants to see all the information summed up in one place.
https://my-take-on.tech/2020/01/12/setting-irq-cpu-affinities-to-improve-performance-on-the-odroid-xu4/

On another note, I haven't gone through the whole thread so I don't know if this was mentioned already, but in the past I have run into hardware that I couldn't get to perform as it should because of bad Linux drivers, while a Windows machine got the expected top performance.
So to rule this possibility out, I'd first stick the card into a desktop to confirm that it actually can reach 5 Gbps.

UnKnoWn-Consortium avatar UnKnoWn-Consortium commented on May 5, 2024

Transferring 1500 byte Ethernet frames via PCIe DMA on a PCIe gen 2 x1 link with 128 byte max TLP size has a theoretical maximum bandwidth considering all protocol overheads of about 3.0 Gbps if you want to run both RX and TX at the same time, or about 3.3 Gbps if you want to run RX or TX in isolation. Overclocking the CPU may also overclock the PCIe bus, depending on exactly how the overclocking was done. No software configuration changes will improve the situation here, you either need a gen 3 link or more lanes to do any better.

tl;dr: you got 3.0 Gbps, you're done. It's not possible to do any better as this is the limit of what you can push over PCIe.

Indeed that would be reasonable if it were just the I340-T4 cranking out 3 Gbps on its own. But the case here is that the I340-T4 and the RGMII/SGMII BCM54210 in the SoC together crank out 3.06 Gbps. The CPU bottleneck shown above may very well be the culprit here.
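
The ~3.3 Gbps one-direction figure can be sanity-checked with a rough back-of-the-envelope calculation. The 24-byte per-TLP overhead below is an assumption covering the TLP header plus framing and the DLLP share; real overhead varies with the link configuration:

```shell
# Rough usable bandwidth of a PCIe gen 2 x1 link moving 128-byte TLPs.
awk 'BEGIN {
    raw = 5.0 * 8 / 10            # 5 GT/s with 8b/10b encoding -> 4.0 Gbps raw
    payload = 128                 # max TLP payload on this platform, in bytes
    overhead = 24                 # assumed header + framing + DLLP share per TLP
    printf "%.2f Gbps\n", raw * payload / (payload + overhead)
}'
```

That lands close to the ~3.3 Gbps quoted above; running RX and TX simultaneously adds further flow-control and ACK traffic, which pulls it down toward 3.0 Gbps.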

6by9 avatar 6by9 commented on May 5, 2024

Intel drivers should be in the mainline Linux kernel.
[ 0.930801] pci 0000:01:00.0: [8086:150e] type 00 class 0x020000
8086 is Intel's Vendor ID.
150e is the Product ID.

https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/intel/igb/e1000_hw.h#L28 looks hopeful
#define E1000_DEV_ID_82580_COPPER 0x150E

https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/intel/Kconfig#L87

config IGB
	tristate "Intel(R) 82575/82576 PCI-Express Gigabit Ethernet support"

I suspect the help text just hasn't been updated to include 82580 and I340.
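
One can confirm the match from userspace as well; the slot address below is taken from the boot log quoted above, and the IDs are from the same line:

```shell
# List any device with Intel's vendor ID (8086) and the 82580 copper device ID (150e).
lspci -nn -d 8086:150e

# Show which kernel driver claimed the first port; should report "Kernel driver in use: igb".
lspci -k -s 01:00.0
```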

geerlingguy avatar geerlingguy commented on May 5, 2024

@GreenReaper - Good sleuthing! I was suspicious it may not be supported since my efforts to make any irq changes were fruitless, but it's good to know it's not just my mistakes.

I think getting ~3.0-3.2 Gbps through PCIe is about as good as it's gonna get, and 3.4 Gbps when combined with the internal interface is pretty awesome.

I think I could get more if I could overclock further, but it doesn't seem possible currently.

russor avatar russor commented on May 5, 2024

If the driver isn't able to support separate MSI interrupts per PCI function, it's still possible that you can configure the built-in NIC (which doesn't seem to be connected to the PCIe bus?) to interrupt a different CPU than the igb NICs. If that works, maybe you can still get 3 Gbps over PCIe, and closer to 1 Gbps over the built-in NIC.

Also worth trying (depending on your level of investment): the Intel drivers support Interrupt Moderation, it looks like you can set InterruptThrottleRate as a module parameter to play with that. Reducing interrupt rate can increase throughput at the cost of latency. The default value (1) uses a dynamic algorithm to adjust the maximum interrupt rate, with a floor on the limit of 4000 interrupts/second; setting a fixed value of 2000 or 1000 might be interesting.

Also, it's a terrible idea, but I wonder if forcing legacy interrupts (INTx) might allow for separate interrupts; the intel driver should accept IntMode=0 to force legacy interrupts.

(From what I can tell on my random amd64 system, the in-tree igb driver may not support these parameters, but the Intel documentation mentions them specifically; since you've installed the driver from out of tree, you should be able to adjust them. https://downloadmirror.intel.com/20927/eng/e1000.htm ... also see the "Multiple Interfaces on Same Ethernet Broadcast Network" note, which may be of interest to you)
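
If the out-of-tree build does accept those parameters, experimenting would look something like the following. The parameter names come from the Intel documentation linked above and the throttle values are just examples, so check `modinfo igb` first to see what this particular build exposes:

```shell
# See which module parameters this build of the driver actually supports.
modinfo igb | grep ^parm

# Reload the driver with a fixed interrupt throttle rate (one value per port).
sudo modprobe -r igb
sudo modprobe igb InterruptThrottleRate=2000,2000,2000,2000

# Or force legacy INTx interrupts instead of MSI (the "terrible idea" above):
# sudo modprobe igb IntMode=0
```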

geerlingguy avatar geerlingguy commented on May 5, 2024

It looks like the Pi's default Broadcom network driver doesn't allow reconfiguring the MTU, so it's locked in at 1500. To get a higher MTU, you have to recompile the kernel with the changes outlined by waryishe on the Pi forums here: increase MTU to 9000 on Raspberry Pi 4's built-in Ethernet interface.
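
You can see the limitation without recompiling anything; on the stock driver the first command should fail (eth0 as the name of the built-in interface is an assumption):

```shell
# Attempt jumbo frames on the built-in NIC; the stock driver rejects this.
sudo ip link set dev eth0 mtu 9000 || echo "driver refused MTU change"

# Confirm the MTU currently in effect.
ip link show dev eth0 | grep -o 'mtu [0-9]*'
```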

geerlingguy avatar geerlingguy commented on May 5, 2024

Ooh... I finally successfully cross-compiled Raspberry Pi OS Linux from my Mac (inside VirtualBox) over to one of the Pi 4s. Adding a Vagrantfile and README to assist others with that process very soon... should probably also do a video on it, or talk about the process in the follow-up networking video.

2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 9000 qdisc mq state DOWN group default qlen 1000
    link/ether dc:a6:32:02:6f:de brd ff:ff:ff:ff:ff:ff

Edit: But it seems like it might not be working — plugging it directly into the CM4 I still get DOWN. Might have to do a little more digging; I was hoping for a quick win, but it's never easy, is it?

Edit 2: Hahaha, it helps if you plug in the network card, doesn't it?

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
        inet 203.0.113.7  netmask 255.255.255.0  broadcast 203.0.113.255
        inet6 fe80::7f85:35b4:5263:1b54  prefixlen 64  scopeid 0x20<link>
        ether dc:a6:32:02:6f:de  txqueuelen 1000  (Ethernet)
        RX packets 266  bytes 49777 (48.6 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 181  bytes 30761 (30.0 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

mayli avatar mayli commented on May 5, 2024

Does that number include the onboard NIC? Quite impressive, since there are folks pointing out that it could be the bottleneck of the PCIe x1 link.

JulyIghor avatar JulyIghor commented on May 5, 2024

Ooh... I finally successfully cross-compiled Raspberry Pi OS Linux from my Mac (inside VirtualBox) over to one of the Pi 4s.

Using a VM is always slower than running on the host.
You can build GCC as a cross-compiler for any target architecture directly on macOS.
The most important thing is to use a case-sensitive file system when compiling a Linux kernel.
I have cross compilers for various target CPUs on macOS, and they work great with no VMs and with gdb support.
Let me know if you'd like to know more; I'll be happy to share instructions.
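
For the case-sensitive file system part, a sketch on modern macOS (the volume name and size are arbitrary):

```shell
# Create a case-sensitive sparse image to hold the kernel tree.
hdiutil create -size 20g -type SPARSE -fs "Case-sensitive APFS" \
    -volname LinuxSrc ~/LinuxSrc

# Mount it; the volume appears at /Volumes/LinuxSrc.
hdiutil attach ~/LinuxSrc.sparseimage
```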

chinmaythosar avatar chinmaythosar commented on May 5, 2024

So about this: STH has reviewed an adapter (https://www.servethehome.com/syba-dual-2-5-gigabit-ethernet-adapter-review/) that could make it possible to do two 2.5Gbps ports. I don't know how much the CPU can handle, but that would effectively give you two 2.5Gbps ports for routing with a cheap switch (https://www.anandtech.com/show/15916/at-last-a-25gbps-consumer-network-switch-qnap-releases-qsw11055t-5port-switch) and the original port for the ISP...

In fact, I would actually be interested in making a board with just that: compute module, the ASMedia PCIe switch connected to two Intel 2.5GbE NICs, the third gigabit port, a microSD slot for those without eMMC, and a barrel connector so that you can find cheap power adapters. Currently the same can be achieved with that adapter and the compute module board, but it would be great to fit all that in a box the size of an EdgeRouter X.

geerlingguy avatar geerlingguy commented on May 5, 2024

FYI I'm testing a 2.5GbE adapter in #40.

chinmaythosar avatar chinmaythosar commented on May 5, 2024

FYI I'm testing a 2.5GbE adapter in #40.

Just checked that thread ... awesome thanks ...

geerlingguy avatar geerlingguy commented on May 5, 2024

Closing issues where testing is at least mostly complete, to keep the issue queue tidy.

jordibiosca avatar jordibiosca commented on May 5, 2024

So first quick result, firing off each benchmark one by one:

1. 626 Mbps

2. 612 Mbps

3. 608 Mbps

4. 606 Mbps

5. 614 Mbps

Total: 3.06 Gbps

Then I tried just hitting three interfaces on the NIC only, and got:

1. 935 Mbps

2. 942 Mbps

3. 941 Mbps

Total: 2.82 Gbps

Then hitting all four interfaces on the NIC:

1. 768 Mbps

2. 769 Mbps

3. 769 Mbps

4. 768 Mbps

Total: 3.07 Gbps

Hitting three on the NIC and one on the Pi:

1. 714 Mbps

2. 709 Mbps

3. 709 Mbps

4. 709 Mbps

Total: 2.84 Gbps

So it seems like there's an upper ceiling around 3 Gbps for total throughput you can get through a Compute Module 4's PCIe slot... and when you couple that with the onboard network interface, it seems like there must be a tiny bit more overhead (maybe the Broadcom Ethernet chip isn't quite as efficient, or maybe having the kernel have to switch between the Intel chip and the Broadcom chip results in that 8% bandwidth penalty?).

I ran all of these sets of tests twice, and they were extremely close in each case.

Now I'm interested to see if I'd be able to pump any more bits through this thing with a 10GbE adapter, or if 2.5 GbE over any individual interface is about as high as it'll go before we hit limits.

Hey Jeff!

Very interesting and inspiring. Actually, this Ethernet server adapter allows for hardware timestamping, and therefore more precise values when measuring latency, so that may be another use case where this setup could be interesting. But my question is: would it be possible for the Raspberry Pi to also handle, at the same time, traffic coming from the USB 3.0 port?

Jordi
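
For anyone wanting to reproduce the multi-interface runs quoted above, they can be scripted roughly like this (the iperf3 server addresses on each link are placeholders):

```shell
# Fire one iperf3 client per interface in parallel, then wait for all of them.
for ip in 10.0.1.2 10.0.2.2 10.0.3.2 10.0.4.2; do
    iperf3 -c "$ip" -t 30 --logfile "iperf-$ip.log" &
done
wait

# Pull the per-link summary lines; sum them by hand for the total.
grep -h receiver iperf-*.log
```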

geerlingguy avatar geerlingguy commented on May 5, 2024

@jordibiosca - Unfortunately, when you go through USB 3.0, the bus overhead slows things down and you get a lot less bandwidth/throughput.

maruohon avatar maruohon commented on May 5, 2024

I'm currently debating changing my current home server setup, which is just one Intel i7-7700K based PC, to a combination of a "main server" of a Pi 4 Compute 8 GB RAM, with the i7-7700K being a secondary server only used for some heavier loads (game servers), which would be in sleep most of the time and only woken up when it's actually needed.

Towards this goal I would need the Pi to have 4 Ethernet ports (1 to connect to the outside internet, 2 to share the connection to the existing other PCs, and the fourth for the i7-7700K which would join the "inside network"), which is why this Intel board is interesting to me.

But for this change to actually make any sense for me, what I'm mostly interested in is the power consumption of

    1. the Pi 4 Compute Module including the I/O board
    2. how much the 4-port Intel Ethernet board adds on top of that

And how much does that consumption change between idle and full load, both for the Pi's CPU and for the Intel Ethernet card under heavy network load? I don't really remember seeing power consumption being featured in your YouTube videos for the Pi-based systems, but that is one thing that would be interesting, at least for me.

If you were to do some power consumption testing, then also be aware of some (cheap?) power meters not really taking into account the power factor, which might throw off the results at least if the measured power supply does not do much or any power factor correction. (I've gotten some super funky results from some PC systems with certain power meters like the PM-300).

geerlingguy avatar geerlingguy commented on May 5, 2024

@maruohon - I haven't measured power with this card installed, but typically for other devices (I tested with 2.5G and dual 2.5G), the Pi uses a maximum of 15-20W (10-15W at idle, or 5-10W without a PCIe card). Only hard drives and more intense storage-related cards seem to really push that up higher.
