ido / libvma-old

Automatically exported from code.google.com/p/libvma

License: Other

Python 0.56% Shell 4.74% C 56.52% C++ 37.83% Bison 0.35%

libvma-old's Introduction

Hi, I'm Ido Rosen. I work on things that intersect economics/finance, machine learning/"AI", statistics, optimization, and/or distributed systems. If you're trying to reach me, email 📫 is the best way. (If I don't respond within 48 hours, please try again - my spam filter can be over-eager.)


libvma-old's Issues

VMA PANIC when IPv6 is disabled

What steps will reproduce the problem?
1. Disable IPv6 kernel module, by adding "install ipv6 /bin/true" to modprobe 
configuration.
2. Install MLNX_OFED and libvma
3. Run an App with LD_PRELOAD=libvma.so (e.g. netcat or telnet)

What is the expected output? What do you see instead?
Expected: app to work
Output: VMA_PANIC because /dev/infiniband/rdma_cm cannot be found

What version of the product are you using? On what operating system?
libvma 6.4.11;
MLNX_OFED 2.0-3.0.0-rhel6_4-x86-64
Scientific linux 6.4 (Carbon) - kernel 2.6.32-358.23.2.el6.x86_64

Please provide any additional information below.

The OFED RDMA-CM module fails to load after installation:

# service openibd restart
Unloading HCA driver:                                      [  OK  ]
Loading HCA driver and Access Layer:                       [FAILED]

Please open an issue in the http://bugs.openfabrics.org and attach 
/tmp/ib_debug_info.log

Original issue reported on code.google.com by [email protected] on 5 Nov 2013 at 2:09

Attachments:

ib_debug_info.log

wakeup: Add return_from_sleep() method

wakeup::remove_wakeup_fd() both decremented m_is_sleeping and removed
the wakeup fd from the epoll list.  In the vast majority of cases there
is no wakeup fd present which causes all blocking epoll_wait calls to
perform an unnecessary epoll_ctl() system call.

This change adds a return_from_sleep() method which handles the
decrementing of m_is_sleeping and moves the remove_wakeup_fd() call to
only happen when there is a wakeup fd present in the events returned by
epoll_wait().

This change additionally makes a small optimization to
epoll_wait_call::_wait() by using a for loop and not copying struct
epoll_event to shrink the array.

On my hardware this increases a netperf UDP_RR test with VMA_RX_POLL=0
and VMA_SELECT_POLL=0 from 89948 transactions per second to 94307
transactions per second.

Original issue reported on code.google.com by [email protected] on 31 Jan 2014 at 11:25

Attachments:

0001-wakeup-Add-return_from_sleep-method.patch
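
The idea described in this patch can be illustrated with a minimal sketch. This is not the patch itself: the function names and the assumption that every descriptor was registered with its fd stored in `data.fd` are hypothetical. It scans the events returned by epoll_wait() once and issues epoll_ctl() only when the wakeup fd actually appears.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/epoll.h>

/* Drop entries for wakeup_fd from events[0..n) in place and report whether
 * one was seen. Returns the number of remaining application events.
 * Assumes each descriptor was registered with its fd stored in data.fd. */
int compact_events(struct epoll_event *events, int n, int wakeup_fd,
                   bool *saw_wakeup)
{
    int kept = 0;

    *saw_wakeup = false;
    for (int i = 0; i < n; i++) {
        if (events[i].data.fd == wakeup_fd) {
            *saw_wakeup = true;
            continue;
        }
        if (kept != i)
            events[kept] = events[i];   /* copy only when a gap exists */
        kept++;
    }
    return kept;
}

/* Blocking wait that pays for epoll_ctl() only when the wakeup fd fired. */
int wait_without_extra_syscall(int epfd, int wakeup_fd,
                               struct epoll_event *events, int max_events,
                               int timeout_ms)
{
    bool saw_wakeup;
    int n = epoll_wait(epfd, events, max_events, timeout_ms);

    if (n <= 0)
        return n;
    n = compact_events(events, n, wakeup_fd, &saw_wakeup);
    if (saw_wakeup)
        epoll_ctl(epfd, EPOLL_CTL_DEL, wakeup_fd, NULL);
    return n;
}
```

In the common case (no wakeup fd in the result) the loop touches each event once and makes no additional system call.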

Add configure check for IBV_ACCESS_ALLOCATE_MR

Not all verbs versions appear to have IBV_ACCESS_ALLOCATE_MR so add a
configure check for it.  When it is not present we simply fall back on
non-contiguous memory allocations.

Original issue reported on code.google.com by [email protected] on 19 Dec 2013 at 11:00

Attachments:

0001-Add-configure-check-for-IBV_ACCESS_ALLOCATE_MR.patch
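
A rough sketch of how such a check is usually consumed in C code, assuming the configure script uses AC_CHECK_DECLS for IBV_ACCESS_ALLOCATE_MR (which defines HAVE_DECL_IBV_ACCESS_ALLOCATE_MR to 1 or 0). The macro USE_CONTIG_ALLOC and the function name are hypothetical and may not match the attached patch.

```c
/* Assumption: config.h (generated by configure) defines
 * HAVE_DECL_IBV_ACCESS_ALLOCATE_MR to 1 when the verbs headers declare
 * IBV_ACCESS_ALLOCATE_MR, and to 0 otherwise. */
#if defined(HAVE_DECL_IBV_ACCESS_ALLOCATE_MR) && HAVE_DECL_IBV_ACCESS_ALLOCATE_MR
#define USE_CONTIG_ALLOC 1
#else
#define USE_CONTIG_ALLOC 0
#endif

/* Returns 1 when contiguous registration can be requested, 0 when the
 * build must fall back on non-contiguous allocations. */
int use_contiguous_alloc(void)
{
    return USE_CONTIG_ALLOC;
}
```

A plain `#ifdef IBV_ACCESS_ALLOCATE_MR` is not a reliable test if the symbol is an enumerator rather than a preprocessor macro, which is one reason to probe for it at configure time.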

RDMAV_HUGEPAGES_SAFE set after ibv_fork_init()

Commit 0f4c81655332f94313020e667158bc56b983a4b4 "issue: 365538 Loading VMA with 
Redis server give a seg-fault" has moved setting RDMAV_HUGEPAGES_SAFE=1 to 
do_global_ctors() which occurs after ibv_fork_init() has run in main_init().  
This causes ibv_reg_mr() to fail when running with Huge Pages.

# LD_PRELOAD=/usr/lib64/libvma.so.6.5.9 ./timestamping eth4 SO_TIMESTAMPNS SOF_TIMESTAMPING_RX_SOFTWARE SOF_TIMESTAMPING_SOFTWARE
 VMA INFO   : ---------------------------------------------------------------------------
 VMA INFO   : VMA_VERSION: 6.5.9-0 Development Snapshot built on Jan 29 2014 14:06:15
 VMA INFO   : Cmd Line: ./timestamping eth4 SO_TIMESTAMPNS SOF_TIMESTAMPING_RX_SOFTWARE SOF_TIMESTAMPING_SOFTWARE
 VMA INFO   : Current Time: Wed Jan 29 14:06:36 2014
 VMA INFO   : Pid: 29456
sh: ofed_info: command not found
 VMA INFO   : OFED Version: � VMA INFO   : Architecture: x86_64
 VMA INFO   : Node: berbox13
 VMA INFO   : ---------------------------------------------------------------------------
 VMA INFO   : Log Level                      3                          [VMA_TRACELEVEL]
 VMA INFO   : ---------------------------------------------------------------------------
calling ibv_fork_init()
Setting RDMAV_HUGEPAGES_SAFE
 VMA WARNING: ib_ctx_collection88:mem_reg_on_all_devices() Failure in mem_reg: addr=0x2aaaaac00000, length=361600063, mr_pos=0, mr_array[mr_pos]=0, dev=0x700080, ibv_dev=mlx4_0
 VMA WARNING: bpool[0x702f40]:247:register_memory() Failed registering memory, This might happen due to low MTT entries. Please refer to README.txt for more info

Original issue reported on code.google.com by [email protected] on 29 Jan 2014 at 8:32
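
The ordering requirement in this report can be sketched as follows. The sketch assumes, as the report implies, that the environment variable is consulted when fork support is initialised. To stay buildable without libibverbs, the initialiser is passed in as a function pointer; `init_fork_support` and `fake_fork_init` are hypothetical names, and in real code the pointer would be ibv_fork_init.

```c
#include <stdlib.h>

/* Set RDMAV_HUGEPAGES_SAFE first, then run the fork initialiser.
 * fork_init stands in for ibv_fork_init(). */
int init_fork_support(int (*fork_init)(void))
{
    /* Third argument 0: keep a value the user may already have exported. */
    if (setenv("RDMAV_HUGEPAGES_SAFE", "1", 0) != 0)
        return -1;
    return fork_init();
}

/* Stand-in initialiser for demonstration only: succeeds only if the
 * variable is already visible at the time it is called. */
int fake_fork_init(void)
{
    return getenv("RDMAV_HUGEPAGES_SAFE") != NULL ? 0 : -1;
}
```

Setting the variable later, for example in a constructor that runs after initialisation, leaves the initialiser with the old environment, which matches the log order shown above.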

Please explain VMA error

What steps will reproduce the problem?
1. run a ping


What is the expected output? What do you see instead?

VMA ERROR  : rfs[0x7fe18ff494c0]:186:create_ibv_flow() Create of QP flow ID failed with flow dst:172.17.38.36:60005, src:205.209.217.135:1025, protocol:UDP


What version of the product are you using? On what operating system?

Linux aurarb01 2.6.32-431.el6.x86_64 #1 SMP Sun Nov 10 22:19:54 EST 2013 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 6.5 (Santiago)
VMA 6.5.9

Please provide any additional information below.

[denis@aurarb01 ~]$ sudo LD_PRELOAD=libvma.so ping 205.209.217.135
 VMA INFO   : ---------------------------------------------------------------------------
 VMA INFO   : VMA_VERSION: 6.5.9-0 Release built on 2013-12-23-16:14:27
 VMA INFO   : Cmd Line: ping 205.209.217.135
 VMA INFO   : Log Level                      3                          [VMA_TRACELEVEL]
 VMA INFO   : ---------------------------------------------------------------------------
mlx4: prefer_bf=1
mlx4: prefer_bf=1
 VMA ERROR  : rfs[0x7fe18ff494c0]:186:create_ibv_flow() Create of QP flow ID failed with flow dst:172.17.38.36:60005, src:205.209.217.135:1025, protocol:UDP
PING 205.209.217.135 (205.209.217.135) 56(84) bytes of data.
64 bytes from 205.209.217.135: icmp_seq=1 ttl=60 time=0.086 ms
64 bytes from 205.209.217.135: icmp_seq=2 ttl=60 time=0.049 ms
64 bytes from 205.209.217.135: icmp_seq=3 ttl=60 time=0.038 ms
64 bytes from 205.209.217.135: icmp_seq=4 ttl=60 time=0.042 ms
64 bytes from 205.209.217.135: icmp_seq=5 ttl=60 time=0.038 ms

Original issue reported on code.google.com by [email protected] on 10 Apr 2014 at 4:23

handle SOCK_NONBLOCK and SOCK_CLOEXEC socket() flags

Since Linux 2.6.27 the socket() system call accepts SOCK_NONBLOCK and
SOCK_CLOEXEC as flags ORed into the type argument.

Original issue reported on code.google.com by [email protected] on 6 Feb 2014 at 11:12

Attachments:

0001-handle-SOCK_NONBLOCK-and-SOCK_CLOEXEC-socket-flags.patch
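
A minimal sketch of the masking involved, with hypothetical helper names. Because the flags are ORed into the type argument, code that compares the type directly against SOCK_DGRAM or SOCK_STREAM has to strip them first. How the attached patch does this is not shown here.

```c
#include <sys/socket.h>

/* Base socket type with the SOCK_NONBLOCK / SOCK_CLOEXEC bits removed. */
int socket_base_type(int type)
{
    return type & ~(SOCK_NONBLOCK | SOCK_CLOEXEC);
}

/* Only the flag bits, so they can be applied to the new descriptor. */
int socket_type_flags(int type)
{
    return type & (SOCK_NONBLOCK | SOCK_CLOEXEC);
}
```

For example, `socket_base_type(SOCK_DGRAM | SOCK_NONBLOCK)` yields SOCK_DGRAM, so a UDP socket created with flags is still recognised as UDP.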

recvfrom_zcopy in epoll_wait loop

In an epoll_wait loop with multiple FDs registered, should I call 
recvfrom_zcopy only once per FD?

Otherwise, if there is a continuous stream of incoming UDP packets, 
recvfrom_zcopy always returns > 0 and I never get a chance to service the 
other FDs.

See attached file stest.c - line #91. Is it correct (and does it give optimal 
performance) to attempt to receive only once?

Thanks

Original issue reported on code.google.com by [email protected] on 13 Jul 2014 at 10:07

Attachments:

stest.c
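
One common middle ground between "receive once" and "receive until empty" is a per-descriptor budget. The sketch below is an assumption-laden illustration, not an answer from the VMA authors: it uses plain recv() with MSG_DONTWAIT in place of recvfrom_zcopy, and `drain_fd` and the budget value are hypothetical.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Read at most `budget` datagrams from fd without blocking, then return so
 * the caller can service the other descriptors reported by epoll_wait().
 * Returns the number of datagrams consumed, or -1 on a real error. */
int drain_fd(int fd, int budget)
{
    char buf[2048];
    int handled = 0;

    while (handled < budget) {
        ssize_t len = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);

        if (len < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                break;              /* queue is empty for now */
            if (errno == EINTR)
                continue;
            return -1;
        }
        handled++;                  /* a real application processes buf here */
    }
    return handled;
}
```

With level-triggered epoll (the default), data left in the queue makes the descriptor ready again on the next epoll_wait(), so a busy socket cannot starve the others.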

vlogger: pass buffer as string in fprintf


The buf may contain special characters like '%' which may be interpreted
by fprintf.  For example, this can occur when the command run under
VMA has escaped arguments:

LD_PRELOAD=libvma.so /bin/echo "ts_keywords=type:tr_delay%20host:foo%20venue:isld%20source:nasdaq"

Always pass the buffer as a string.

Original issue reported on code.google.com by [email protected] on 3 Jan 2014 at 11:52

Attachments:

0001-vlogger-pass-buffer-as-string-in-fprintf.patch
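
The fix described here is the standard remedy for a format-string bug. A minimal sketch (the function name `log_buffer` is hypothetical and is not taken from vlogger):

```c
#include <stdio.h>

/* Write buf verbatim. Passing buf as the format string instead, i.e.
 * fprintf(out, buf), would let sequences such as "%20h" be parsed as
 * conversion specifiers. Returns the number of characters written. */
int log_buffer(FILE *out, const char *buf)
{
    return fprintf(out, "%s", buf);
}
```

Calling `log_buffer(stderr, "type:tr_delay%20host:foo\n")` prints the text unchanged; fputs() would be an equally safe alternative when no formatting is needed.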

Support for 802.3ad bonding mode

What need is this feature going to satisfy?
1. support for 802.3ad bonding mode

What is the expected output? What do you have currently? Any workaround?
Currently only active-backup mode is available, with fail_over_mac=1

Please use labels and text to provide additional information.

Original issue reported on code.google.com by [email protected] on 13 Aug 2014 at 2:31

Expand raw packet QP error message to include alternatives

If you don't have disable_raw_qp_enforcement set, you can still run as root
or with CAP_NET_RAW.

Original issue reported on code.google.com by [email protected] on 19 Dec 2013 at 11:02

Attachments:

0001-Expand-raw-packet-QP-error-message-to-include-altern.patch

Please advise on optimal UDP/TCP traffic distribution

Dear Or,

I would be very grateful if you could advise on what would be optimal. I have 
two MLNX cards and four 10GB lines to the switch.

I need to receive a lot of multicast data combined into logical channels. Most 
of the data in a channel is incremental, delivered on an A multicast group and 
a B multicast group (there could be ~100-10000 msgs/sec).

Then I need to send short TCP packet or series of TCP packets (0.5-10 msgs/sec).

The question is: should I stream all the multicast RX data through one MLNX 
card and do TCP TX through the other, or is it better to consume the A 
multicast group on one MLNX card and the B multicast group on the other?

Target latencies ~3-10usec.

Does it even matter to the hardware/firmware/software - the separation of 
traffic I'm talking about? Or any combination of routing would yield pretty 
much the same performance?

Thx
Denis

Original issue reported on code.google.com by [email protected] on 28 Apr 2014 at 8:31

Support IPv6 (AF_INET6(10))

What need is this feature going to satisfy?
1. open socket(type = AF_INET6(10)) 
2. use IPv6 addressing to: bind(), connect(), sendto(), ...

What is the expected output? What do you have currently? Any workaround?
Provide IPv6 support for recv and send of offloaded traffic via VMA


Please use labels and text to provide additional information.
See more info in http://man7.org/linux/man-pages/man7/ipv6.7.html

Original issue reported on code.google.com by [email protected] on 19 Aug 2014 at 8:58

Silence ofed_info: command not found error

Now that flow steering is making its way into upstream you won't need to have 
OFED and thus won't have ofed_info.

 VMA INFO   : ---------------------------------------------------------------------------
 VMA INFO   : VMA_VERSION: 6.6.0-0 Development Snapshot built on Feb 12 2014 09:17:53
 VMA INFO   : Cmd Line: netperf --help
 VMA INFO   : Current Time: Fri Mar  7 14:39:13 2014
 VMA INFO   : Pid: 30353
sh: ofed_info: command not found
 VMA INFO   : OFED Version: � VMA INFO   : Architecture: x86_64
 VMA INFO   : Node: berbox2
 VMA INFO   : ---------------------------------------------------------------------------
 VMA INFO   : Log Level                      3                          [VMA_TRACELEVEL]
 VMA INFO   : ---------------------------------------------------------------------------

I've attached a patch that simply sends stderr to /dev/null to swallow the 
"command not found" error.  There are probably a number of other solutions to 
this problem, but this is the quick easy one.

Original issue reported on code.google.com by [email protected] on 7 Mar 2014 at 10:10

Attachments:

0001-Silence-ofed_info-command-not-found-error.patch

Delete rx channel from global fd collection

Previously rx channel fds were never explicitly removed from the global
fd collection, and the m_p_n_rx_channel_fds member was leaked.  Since
the rx channel fds were never removed, a subsequent call to
add_cq_channel_fd() with a new fd that happens to match an old no longer
existing rx channel fd, would trigger the removal of the fd from the
global collection and would not add the new fd.

This removes the rx channel fd from the global fd collection as the
completion channels are destroyed, and frees the m_p_n_rx_channel_fds
member so that it is not leaked.

Original issue reported on code.google.com by [email protected] on 17 Jan 2014 at 6:40

Attachments:

0001-Delete-rx-channel-from-global-fd-collection.patch

Add receive software packet timestamping support

These patches update libvma to support software receive packet timestamping, 
with SO_TIMESTAMP, SO_TIMESTAMPNS, and SO_TIMESTAMPING.

Additionally I'll mention that I really would like to have hardware packet 
timestamps but it is my understanding that the infrastructure to get and 
synchronize the hardware timestamps with the system time is still in 
development.

Original issue reported on code.google.com by [email protected] on 29 Jan 2014 at 7:32

Attachments:

Configurable message/packet number limits

Background: after the Knight Capital incident 
(http://en.wikipedia.org/wiki/Knight_Capital_Group#2012_stock_trading_disruption), 
a software-related trading loss of $400+ million incurred in minutes, a lot of 
people have concerns about out-of-control trading algorithms. Even ~500 packets 
a second sent by an out-of-control algorithm could create losses in the tens of 
millions before a human operator would even notice.

Of course libvma is by no means the proper place to put real risk management; 
it would be just one of the measures / lines of defence between the trading 
algorithm and the exchange. The idea is to limit the number of packets sent, 
since the most damaging scenario is an infinite loop in the application 
sending orders. Of course the application layer would do everything possible 
to prevent it, but any additional layer would help.

Other network layers can provide some protection features (through QoS), but 
switches and OFED are typically MB/s oriented, not packet/s oriented. The Linux 
kernel has "tc" (traffic control), but I'm sure it would add a lot of latency, 
and VMA could not be used alongside it.

From a technical perspective I see it as one or more counters per application / 
remote destination, configured in the VMA conf. Once a certain threshold value 
is exceeded, the socket needs to be closed.

1. A cap on the total number of packets allowed since some point (socket open 
or midnight local time); once the limit is exceeded, please close the socket. 
The application would typically write an entire trading order at once, so 
counting socket write calls is okay.
2. Even better would be a limit on messages per unit of time. This value could 
be computed, for example, as an exponential moving average (similar to the 
Linux load average) per application and network destination.
3. This could impact latency somewhat, but we could look for a certain pattern 
in the outgoing application flow, e.g. "8=FIX.4.2", and count occurrences of 
it instead of the number of socket writes.

Features 2, 3 are great to have but at least 1 would be of great help.
In the long term this feature could be a marketing advantage for Mellanox.

Best
Denis

Original issue reported on code.google.com by [email protected] on 30 Mar 2014 at 10:26
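
Items 1 and 2 of this request can be sketched as a small counter structure. Everything here is hypothetical (names, fields, the fixed sampling interval); it only illustrates the arithmetic of a hard cap combined with an exponential moving average sampled once per interval, in the style of the Linux load average.

```c
#include <stdbool.h>
#include <stdint.h>

struct send_limiter {
    uint64_t total;        /* packets sent since the counter was reset */
    uint64_t max_total;    /* hard cap (item 1) */
    uint64_t in_interval;  /* packets sent in the current interval */
    double   ema;          /* smoothed packets per interval (item 2) */
    double   max_rate;     /* trip when ema exceeds this value */
    double   alpha;        /* smoothing factor, 0 < alpha <= 1 */
    bool     tripped;      /* set once the rate limit was exceeded */
};

/* Call once per outgoing packet. Returns false when the caller should
 * stop sending (for example by closing the socket). */
bool limiter_on_send(struct send_limiter *l)
{
    l->total++;
    l->in_interval++;
    return l->total <= l->max_total && !l->tripped;
}

/* Call once per sampling interval (for example every second). */
void limiter_on_tick(struct send_limiter *l)
{
    l->ema = l->alpha * (double)l->in_interval + (1.0 - l->alpha) * l->ema;
    l->in_interval = 0;
    if (l->ema > l->max_rate)
        l->tripped = true;
}
```

Keeping the per-packet path to two increments and a comparison limits the latency cost; the floating-point update runs only on the periodic tick.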

Support AF_PACKET(17)

What need is this feature going to satisfy?
1. open socket(type = AF_PACKET(17)) 

What is the expected output? What do you have currently? Any workaround?
Expected: VMA to offload the traffic. 
Receive and send all raw packets via the VMA offloaded interface


Please use labels and text to provide additional information.
See more info in http://man7.org/linux/man-pages/man7/packet.7.html

Original issue reported on code.google.com by [email protected] on 19 Aug 2014 at 8:54

Add "source specific multicast" support to use with IP_ADD_SOURCE_MEMBERSHIP

Currently VMA only supports IGMPv2 with IP_ADD_MEMBERSHIP.

Original issue reported on code.google.com by [email protected] on 1 Oct 2013 at 2:20
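
For reference, this is what the requested socket option looks like from the application side on Linux. The helper names and the example addresses are hypothetical; the structure and option are the standard ones from the Linux socket API. Source-specific joins are signalled with IGMPv3, which is why IGMPv2-only support is not enough.

```c
#define _GNU_SOURCE             /* exposes struct ip_mreq_source on glibc */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Fill a source-specific membership request from dotted-quad strings.
 * Returns 0 on success, -1 if any address fails to parse. */
int fill_ssm_request(struct ip_mreq_source *req, const char *group,
                     const char *source, const char *local_if)
{
    memset(req, 0, sizeof(*req));
    if (inet_pton(AF_INET, group, &req->imr_multiaddr) != 1 ||
        inet_pton(AF_INET, source, &req->imr_sourceaddr) != 1 ||
        inet_pton(AF_INET, local_if, &req->imr_interface) != 1)
        return -1;
    return 0;
}

/* Join (source, group) on the given UDP socket. */
int join_ssm(int fd, const struct ip_mreq_source *req)
{
    return setsockopt(fd, IPPROTO_IP, IP_ADD_SOURCE_MEMBERSHIP,
                      req, sizeof(*req));
}
```

An offloading library that intercepts setsockopt() would need to recognise this option in addition to IP_ADD_MEMBERSHIP and filter received traffic by source address.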

VMA does not honor multiple routing tables with policy-based routing

What steps will reproduce the problem?
1. Given two local interfaces, mlx0 and mlx1, give them IP addresses in the 
same subnet.
2. Set up the proper routing tables and rules.
3. Write a simple test application; create a TCP socket, bind() it to a 
specific local IP address and try to connect() to an external IP like 8.8.8.8. 
Doing this without loading VMA succeeds. Once you try this with VMA it fails 
finding the gateway for that particular interface and is not able to connect, 
giving errors about not being able to find the destination gateway and 
subsequently saying it can't lookup the neigh for the destination. For the 
record, this fails with SO_BINDTODEVICE as well.

An example of the whole process:

sysctl net.ipv4.conf.all.arp_ignore=1
sysctl net.ipv4.conf.all.arp_announce=2

ip link set dev mlx0 up
ip link set dev mlx1 up

ip addr add dev mlx0 10.0.0.201/24 brd +
ip addr add dev mlx1 10.0.0.202/24 brd +

ip route add 10.0.0.0/24 dev mlx0 proto kernel scope link src 10.0.0.201 table 10
ip route add 10.0.0.0/24 dev mlx1 proto kernel scope link src 10.0.0.202 table 20

ip route add default via 10.0.0.254 dev mlx0 table 10
ip route add default via 10.0.0.254 dev mlx1 table 20

ip rule add from 10.0.0.201 pri 10000 lookup 10
ip rule add from 10.0.0.202 pri 10000 lookup 20

What is the expected output? What do you see instead?
The expected output is that VMA will lookup the proper route using the routing 
rules and the proper table (lookup rule for src ip 10.0.0.202 for example, see 
it resides in routing table 20, lookup routing table 20 and find the default gw 
there). Instead, VMA fails to find the proper route because it only looks at 
the main routing table and does not consider routing rules at all.

What version of the product are you using? On what operating system?
libvma v6.6.0, Debian with kernel 3.12-rt

Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 16 Jun 2014 at 11:12
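
The expected lookup order described in this report can be sketched as a toy model. This is a deliberate simplification with hypothetical names: it matches only on an exact source address, whereas the kernel walks all rules in priority order and supports many more selectors. The value 254 is the id of the main routing table on Linux.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_MAIN 254   /* id of the main routing table on Linux */

struct src_rule {
    uint32_t src;        /* source address the rule matches (host order) */
    uint32_t priority;   /* lower value is evaluated first */
    int      table;      /* routing table to consult on a match */
};

/* Pick the routing table for a bound source address: the matching rule
 * with the lowest priority value wins, otherwise fall back to main. */
int select_table(const struct src_rule *rules, size_t n, uint32_t src)
{
    const struct src_rule *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (rules[i].src != src)
            continue;
        if (best == NULL || rules[i].priority < best->priority)
            best = &rules[i];
    }
    return best != NULL ? best->table : TABLE_MAIN;
}
```

With the rules from the example above, a socket bound to 10.0.0.202 resolves to table 20, where the default gateway for mlx1 is found; a lookup that consults only the main table never sees it.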

net_device_table_mgr: Remove unnecessary epoll_wait

After epolling the global ring epoll fd and handling any ready CQ
channels the code does a second zero timeout epoll_wait() on the global
ring epoll fd.  This second epoll_wait() must be unnecessary since both
the return of epoll_wait() and the returned epoll_event array which is
stored on the stack are ignored.  Additionally to my knowledge there are
no secondary side effects by calling epoll_wait() so the comment
implying this is to "clear the nested set fd" must be erroneous or the
code must not be doing what the author intended.

Results of a blocking udp_lat test with and without this change:

Without change
$ VMA_SELECT_POLL=0 VMA_RX_POLL=0 LD_PRELOAD=libvma.so udp_lat -c -t 10 -f udp_lat_ips -F epoll
7.138 usec

Removing extra epoll_wait()
$ VMA_SELECT_POLL=0 VMA_RX_POLL=0 LD_PRELOAD=libvma.so udp_lat -c -t 10 -f udp_lat_ips -F epoll
6.827 usec

Original issue reported on code.google.com by [email protected] on 4 Feb 2014 at 6:08

Attachments:

0001-net_device_table_mgr-Remove-unnecessary-epoll_wait.patch

VMA_RX_POLL=-1 VMA_SELECT_POLL=-1 with thread affinity

What steps will reproduce the problem?
1. I have three TCP sessions, each with its own receiving thread (non-blocking)
2. Each receiving thread has its affinity set to the same CPU (e.g. all have 
cpumask = 3)
3. VMA_RX_POLL=-1 VMA_SELECT_POLL=-1
4. Application is on SCHED_FIFO

What is the expected output? What do you see instead?

Seems like everything hangs. If I leave only VMA_RX_POLL=-1 then it might 
occasionally (very rarely) hang as well.

Is it possible to selectively apply different VMA_RX/SELECT_POLL to UDP and TCP?

What version of the product are you using? On what operating system?

VMA 6.6.4 OFED 2.2 RHEL 6.5

Original issue reported on code.google.com by [email protected] on 8 Jul 2014 at 1:46

Blocked vs Non-blocked sockets

As far as I understand, if I see the following in vma_stats:
"UDP, Non-blocked, MC Loop Enabled , MC IF = [172.17.38.36]"

it means the socket is in "Non-blocked" mode, so RX_POLL does not apply, 
nothing is done on the RX path, and VMA gives little performance improvement. 
Is that correct? Is "Blocked" mandatory to achieve decent results?

Original issue reported on code.google.com by [email protected] on 28 Apr 2014 at 4:02

VMA ERROR when running on alias to VLAN interface

What steps will reproduce the problem?
1. bring up VLAN interface (e.g. eth0.10) with IP (e.g. 192.168.1.100)
2. bring up alias on previously created VLAN (e.g. eth0.10:2) with IP (e.g. 
192.168.1.200)
3. run application with VMA (e.g. LD_PRELOAD=libvma.so iperf -u -c 192.168.1.1)

> What is the expected output?

 VMA INFO   : ---------------------------------------------------------------------------
 VMA INFO   : VMA_VERSION: 6.7.2-0 Release built on 2014-08-21-15:28:25
 VMA INFO   : Cmd Line: iperf -u -c 192.168.1.1
 VMA INFO   : OFED Version: MLNX_OFED_LINUX-2.3-1.0.1:
 VMA INFO   : Log Level                      3                          [VMA_TRACELEVEL]
 VMA INFO   : ---------------------------------------------------------------------------
------------------------------------------------------------
Client connecting to 192.168.1.1, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size: 4.00 MByte (default)

> What do you see instead?

 VMA INFO   : ---------------------------------------------------------------------------
 VMA INFO   : VMA_VERSION: 6.7.2-0 Release built on 2014-08-21-15:28:25
 VMA INFO   : Cmd Line: iperf -u -c 192.168.1.1
 VMA INFO   : OFED Version: MLNX_OFED_LINUX-2.3-1.0.1:
 VMA INFO   : Log Level                      3                          [VMA_TRACELEVEL]
 VMA INFO   : ---------------------------------------------------------------------------
 VMA ERROR  : utils:229:priv_read_file() ERROR while opening file /sys/class/net/eth0.10/device/resource
------------------------------------------------------------
Client connecting to 192.168.1.1, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size: 4.00 MByte (default)


> What version of the product are you using? On what operating system?
MLNX_OFED_LINUX-2.3-1.0.1-ubuntu12.04-x86_64
Ubuntu 12.04 amd64
Kernel 3.13.0-36-generic


> Please provide any additional information below.
If the application binds to the aliased address, it crashes:

 VMA ERROR  : utils:229:priv_read_file() ERROR while opening file /sys/class/net/eth0.10/device/resource
 VMA PANIC  : ring[0x7f8c68040630]:170:create_resources() ibv_create_comp_channel for tx failed. m_p_tx_comp_event_channel = (nil) (errno=9 Bad file descriptor)

Original issue reported on code.google.com by [email protected] on 8 Oct 2014 at 6:54
