
disni's People

Contributors

asqasq, lynus, patrickstuedi, pepperjo, petro-rudenko, the-alchemist, vladsokolovsky, yuvaldeg

disni's Issues

work completion event received with wrong value

We tried implementing a server and client on top of send/recv operations and noticed that we sometimes get a receive completion event following a send operation (or vice versa).

I've managed to reproduce the issue on top of RDMAvsTcpBenchmark (changed files attached) by saving the last completion event and comparing each new event against it.
We expect the events to always alternate send-recv-send-recv-..., but we found that we sometimes get two consecutive events of the same type.

server output:
2305 [Thread-0] INFO com.ibm.disni - got event type + RDMA_CM_EVENT_ESTABLISHED, srcAddress /192.168.33.137:8881, dstAddress /192.168.33.137:51085
RDMAvsTcpBenchmarkServer::client connection accepted
2308 [Thread-1] INFO com.ibm.disni - cq processing, caught exception but keep going server got IBV_WC_SEND event twice in a row. last id = 2000, current id 2000****
java.lang.RuntimeException: server got IBV_WC_SEND event twice in a row. last id = 2000, current id 2000****
at com.ibm.disni.examples.SendRecvServer$CustomServerEndpoint.dispatchCqEvent(SendRecvServer.java:212)
at com.ibm.disni.RdmaActiveCqProcessor.dispatchCqEvent(RdmaActiveCqProcessor.java:37)
at com.ibm.disni.RdmaActiveCqProcessor.dispatchCqEvent(RdmaActiveCqProcessor.java:29)
at com.ibm.disni.RdmaCqProcessor.dispatchCqEvent(RdmaCqProcessor.java:106)
at com.ibm.disni.RdmaCqProcessor.run(RdmaCqProcessor.java:136)
at java.lang.Thread.run(Thread.java:745)
2309 [Thread-1] INFO com.ibm.disni - cq processing, caught exception but keep going server got IBV_WC_RECV event twice in a row. last id = 2001, current id 2001****
java.lang.RuntimeException: server got IBV_WC_RECV event twice in a row. last id = 2001, current id 2001****
at com.ibm.disni.examples.SendRecvServer$CustomServerEndpoint.dispatchCqEvent(SendRecvServer.java:212)
at com.ibm.disni.RdmaActiveCqProcessor.dispatchCqEvent(RdmaActiveCqProcessor.java:37)
at com.ibm.disni.RdmaActiveCqProcessor.dispatchCqEvent(RdmaActiveCqProcessor.java:29)
at com.ibm.disni.RdmaCqProcessor.dispatchCqEvent(RdmaCqProcessor.java:106)
at com.ibm.disni.RdmaCqProcessor.run(RdmaCqProcessor.java:136)
at java.lang.Thread.run(Thread.java:745)

changedFiles.zip
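
For readers without the attachment, the check described above (remember the last completion and compare the next one against it) could look roughly like the sketch below. The class name and the two opcode constants are assumptions made for illustration; the constants are assumed to mirror libibverbs' ibv_wc_opcode values, and this is not the code from changedFiles.zip. As far as I understand the verbs model, ordering is only defined within a single work queue, so strict alternation between send and receive completions on a shared CQ may be a stronger assumption than the hardware promises.

```java
// Hypothetical helper, not part of DiSNI: remembers the opcode of the previous
// work completion and reports when the same opcode arrives twice in a row.
final class CompletionOrderChecker {
    // Assumed to mirror libibverbs' ibv_wc_opcode values.
    static final int WC_SEND = 0;
    static final int WC_RECV = 128;

    private int lastOpcode = -1; // -1 means "no completion seen yet"

    /** Records one completion; returns false if it repeats the previous opcode. */
    boolean observe(int opcode) {
        boolean alternates = opcode != lastOpcode;
        lastOpcode = opcode;
        return alternates;
    }
}
```

Calling observe() from dispatchCqEvent() and logging when it returns false gives the same signal as the exception in the output above, without throwing inside the CQ processing thread.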

Any specific reason why the stateful SVCPostRecv.RecvWRMod is immutable?

In contrast to SVCPostSend.SendWRMod (which provides mutator methods for SGE, RDMA, and work request ID), the RecvWRMod interface is completely immutable and limited to getWr_id(). Is there any specific reason for this?

If not, I can offer my help extending it to be similar to SendWRMod.

Need proper handling for RDMA_CM_EVENT_ROUTE_ERROR

I ran into a situation where the client randomly receives RDMA_CM_EVENT_ROUTE_ERROR when it connects to the server at a relatively high frequency. DiSNI does not handle this case properly: it blocks indefinitely in RdmaEndpoint.connect().
I think a new 'CONN_STATE_ERROR' CM state should be introduced in RdmaEndpoint and handled properly in connect() and close().
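
To make the proposal concrete, here is a minimal sketch of how an error state could unblock a waiting connect(). Everything in it is hypothetical: the ConnState class, the constant values, and the timeout parameter are illustrations and not DiSNI's actual RdmaEndpoint internals.

```java
// Hypothetical sketch, not DiSNI's RdmaEndpoint: a connection state holder
// whose connect-side wait also wakes up when an error state is set.
final class ConnState {
    static final int CONN_STATE_INITIALIZED = 0;
    static final int CONN_STATE_CONNECTED = 1;
    static final int CONN_STATE_ERROR = 2; // the proposed new state

    private int state = CONN_STATE_INITIALIZED;

    /** Called from the CM event thread, e.g. on ESTABLISHED or ROUTE_ERROR. */
    synchronized void set(int newState) {
        state = newState;
        notifyAll();
    }

    /** Blocks until connected; throws instead of hanging on error or timeout. */
    synchronized void awaitConnected(long timeoutMillis)
            throws java.io.IOException, InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (state == CONN_STATE_INITIALIZED) {
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMillis <= 0) {
                throw new java.io.IOException("connect timed out");
            }
            wait(remainingMillis);
        }
        if (state == CONN_STATE_ERROR) {
            throw new java.io.IOException("connection setup failed");
        }
    }
}
```

The point of the sketch is only that the waiting loop exits on any state other than the initial one, so an error event turns into an IOException for the caller rather than an indefinite block.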

Use enum types instead of int

Use enum types instead of int e.g. in IbvWC. Requires an efficient implementation of valueOf, e.g. with switch/case or array lookup.
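
A rough sketch of the array-lookup variant, to show what is meant. The enum name WcOpcode and its layout are made up for illustration and are not DiSNI's API; the numeric codes are assumed to mirror libibverbs' ibv_wc_opcode values.

```java
// Illustrative enum, not DiSNI's API. The numeric codes are assumed to mirror
// libibverbs' ibv_wc_opcode values.
enum WcOpcode {
    SEND(0), RDMA_WRITE(1), RDMA_READ(2), COMP_SWAP(3), FETCH_ADD(4), BIND_MW(5),
    RECV(128), RECV_RDMA_WITH_IMM(129);

    private final int code;

    // Lookup table built once and indexed directly by the native code.
    private static final WcOpcode[] BY_CODE = new WcOpcode[130];
    static {
        for (WcOpcode op : values()) {
            BY_CODE[op.code] = op;
        }
    }

    WcOpcode(int code) {
        this.code = code;
    }

    int code() {
        return code;
    }

    /** Constant-time, allocation-free mapping from the native int. */
    static WcOpcode valueOf(int code) {
        if (code < 0 || code >= BY_CODE.length || BY_CODE[code] == null) {
            throw new IllegalArgumentException("unknown opcode " + code);
        }
        return BY_CODE[code];
    }
}
```

The table is sparse (codes 6 to 127 are unused), which costs a few hundred bytes but keeps the lookup to a bounds check and one array read, so it should be cheap enough for the completion path.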

performance

I am using the RDMA benchmark code to run a send/recv latency test with 100 Gb/s Mellanox cards that are directly connected.
I am seeing latencies of 100-700 us for a small string (12 bytes),
but qperf rc_lat reports about 4 us.
Is this what I should expect?

    sendBuf.asCharBuffer().put(msg);
    sendBuf.clear();
    postSend.getWrMod(0).getSgeMod(0).setLength(msg.length());
    postSend.execute();
    endpoint.getWcEvents().take();
    postRecv.execute();
    endpoint.getWcEvents().take();
    recvBuf.clear();

Our target subsystem name does not follow the NVMe spec, and it is already fixed; how can I change the jNVMf code to connect to our target?

Hello jNVMf folks,

Below is our target configuration. I know the subsystem names do not follow the NQN naming of the NVMe-oF spec, but we need to use jNVMf to connect to our target. Could you please point me to how to change the code, or could you help me do that?

o- ports ................................................................................................................... [...]
| o- 98 .................................................................................................................... [...]
| | o- referrals ........................................................................................................... [...]
| | o- subsystems .......................................................................................................... [...]
| | o- 16746e7c-32b4-47a1-b5ce-3428dea3ba22 .............................................................................. [...]
| o- 99 .................................................................................................................... [...]
| | o- referrals ........................................................................................................... [...]
| | o- subsystems .......................................................................................................... [...]
| | o- 3dd75b04-ff89-424e-b604-3767ce066dba .............................................................................. [...]
| o- 100 ................................................................................................................... [...]
| | o- referrals ........................................................................................................... [...]
| | o- subsystems .......................................................................................................... [...]
| | o- 4011675b-d20c-463f-a92b-0c15dc5c43cb .............................................................................. [...]
| o- 101 ................................................................................................................... [...]
| | o- referrals ........................................................................................................... [...]
| | o- subsystems .......................................................................................................... [...]
| | o- 7a25f7a6-7a3d-47d7-9605-e55ab5f1c295 .............................................................................. [...]
| o- 102 ................................................................................................................... [...]
| | o- referrals ........................................................................................................... [...]
| | o- subsystems .......................................................................................................... [...]
| | o- 8280409b-0742-485a-839f-4b3de2f99c60 .............................................................................. [...]
| o- 103 ................................................................................................................... [...]
| | o- referrals ........................................................................................................... [...]
| | o- subsystems .......................................................................................................... [...]
| | o- 838eef5d-a6b3-4f30-b2de-b060c894eef7 .............................................................................. [...]
| o- 104 ................................................................................................................... [...]
| | o- referrals ........................................................................................................... [...]
| | o- subsystems .......................................................................................................... [...]
| | o- 8615c539-a49a-40c7-b6fe-4772d2b511b8 .............................................................................. [...]
| o- 105 ................................................................................................................... [...]
| | o- referrals ........................................................................................................... [...]
| | o- subsystems .......................................................................................................... [...]
| | o- 8753506b-c705-4daa-94c0-a9a456961db7 .............................................................................. [...]
| o- 107 ................................................................................................................... [...]
| | o- referrals ........................................................................................................... [...]
| | o- subsystems .......................................................................................................... [...]
| | o- ae0ced01-f442-428d-a5cb-9e03359a2652 .............................................................................. [...]
| o- 108 ................................................................................................................... [...]
| | o- referrals ........................................................................................................... [...]
| | o- subsystems .......................................................................................................... [...]
| | o- c87dc647-a7e7-4b6a-854d-331cae7d51c1 .............................................................................. [...]
| o- 109 ................................................................................................................... [...]
| o- referrals ........................................................................................................... [...]
| o- subsystems .......................................................................................................... [...]
| o- e36b9b1a-e2ec-4120-92f6-18f50e3b554e .............................................................................. [...]
o- subsystems .............................................................................................................. [...]
o- 16746e7c-32b4-47a1-b5ce-3428dea3ba22 .................................................................................. [...]
| o- allowed_hosts ....................................................................................................... [...]
| o- namespaces .......................................................................................................... [...]
| o- 98 ................................................................................................................ [...]
o- 3dd75b04-ff89-424e-b604-3767ce066dba .................................................................................. [...]
| o- allowed_hosts ....................................................................................................... [...]
| o- namespaces .......................................................................................................... [...]
| o- 99 ................................................................................................................ [...]
o- 4011675b-d20c-463f-a92b-0c15dc5c43cb .................................................................................. [...]
| o- allowed_hosts ....................................................................................................... [...]
| o- namespaces .......................................................................................................... [...]
| o- 100 ............................................................................................................... [...]
o- 7a25f7a6-7a3d-47d7-9605-e55ab5f1c295 .................................................................................. [...]
| o- allowed_hosts ....................................................................................................... [...]
| o- namespaces .......................................................................................................... [...]
| o- 101 ............................................................................................................... [...]
o- 8280409b-0742-485a-839f-4b3de2f99c60 .................................................................................. [...]
| o- allowed_hosts ....................................................................................................... [...]
| o- namespaces .......................................................................................................... [...]
| o- 102 ............................................................................................................... [...]
o- 838eef5d-a6b3-4f30-b2de-b060c894eef7 .................................................................................. [...]
| o- allowed_hosts ....................................................................................................... [...]
| o- namespaces .......................................................................................................... [...]
| o- 103 ............................................................................................................... [...]
o- 8615c539-a49a-40c7-b6fe-4772d2b511b8 .................................................................................. [...]
| o- allowed_hosts ....................................................................................................... [...]
| o- namespaces .......................................................................................................... [...]
| o- 104 ............................................................................................................... [...]
o- 8753506b-c705-4daa-94c0-a9a456961db7 .................................................................................. [...]
| o- allowed_hosts ....................................................................................................... [...]
| o- namespaces .......................................................................................................... [...]
| o- 105 ............................................................................................................... [...]
o- ae0ced01-f442-428d-a5cb-9e03359a2652 .................................................................................. [...]
| o- allowed_hosts ....................................................................................................... [...]
| o- namespaces .......................................................................................................... [...]
| o- 107 ............................................................................................................... [...]
o- c87dc647-a7e7-4b6a-854d-331cae7d51c1 .................................................................................. [...]
| o- allowed_hosts ....................................................................................................... [...]
| o- namespaces .......................................................................................................... [...]
| o- 108 ............................................................................................................... [...]
o- e36b9b1a-e2ec-4120-92f6-18f50e3b554e .................................................................................. [...]
o- allowed_hosts ....................................................................................................... [...]
o- namespaces .......................................................................................................... [...]
o- 109 ............................................................................................................... [...]
/>

"setting up protection domain, no context found" error when running the examples

I am using disni-1.6, SoftiWARP (dev branch), linux kernel 4.13.0-43. When I run the ReadServer example as follows:

cd target
java -Djava.library.path=/usr/local/lib -cp "*" com.ibm.disni.examples.ReadServer -a localhost

I get the following error:

log4j:WARN No appenders could be found for logger (com.ibm.disni).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.io.IOException: setting up protection domain, no context found
at com.ibm.disni.rdma.RdmaEndpointProvider.createProtectionDomain(RdmaEndpointProvider.java:73)
at com.ibm.disni.rdma.RdmaEndpointProvider.createProtectionDomain(RdmaEndpointProvider.java:58)
at com.ibm.disni.rdma.RdmaEndpointGroup.createProtectionDomain(RdmaEndpointGroup.java:77)
at com.ibm.disni.rdma.RdmaEndpointGroup.createProtectionDomainRaw(RdmaEndpointGroup.java:202)
at com.ibm.disni.rdma.RdmaServerEndpoint.bind(RdmaServerEndpoint.java:90)
at com.ibm.disni.examples.ReadServer.run(ReadServer.java:58)
at com.ibm.disni.examples.ReadServer.launch(ReadServer.java:105)
at com.ibm.disni.examples.ReadServer.main(ReadServer.java:110)

I get the same error if I use 127.0.0.1 instead of localhost. Following the discussion here, I used my Ethernet IP instead of the loopback address, but then I got a "binding server address failed" error:

log4j:WARN No appenders could be found for logger (com.ibm.disni).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.io.IOException: binding server address /192.168.1.106:1919, failed
at com.ibm.disni.rdma.RdmaServerEndpoint.bind(RdmaServerEndpoint.java:85)
at com.ibm.disni.examples.ReadServer.run(ReadServer.java:58)
at com.ibm.disni.examples.ReadServer.launch(ReadServer.java:105)
at com.ibm.disni.examples.ReadServer.main(ReadServer.java:110)

Changing the port number didn't help either. The root of the problem seems to be that rdmaServerEndpoint.getIdPriv().getVerbs() returns null but I couldn't figure out why.

Output of ibv_devices:

device          	   node GUID
------          	----------------
siw_lo          	7369775f6c6f0000

ibv_devinfo:

hca_id:	siw_lo
	transport:			iWARP (1)
	fw_ver:				0.0.0
	node_guid:			7369:775f:6c6f:0000
	sys_image_guid:			0000:0000:0000:0000
	vendor_id:			0x626d74
	vendor_part_id:			0
	hw_ver:				0x0
	phys_port_cnt:			1
		port:	1
			state:			PORT_ACTIVE (4)
			max_mtu:		4096 (5)
			active_mtu:		4096 (5)
			sm_lid:			0
			port_lid:		0
			port_lmc:		0x00
			link_layer:		Ethernet

Usage of IbvMr.lKey vs. rKey on remote side when setting up WRITE/READ operation

Hi Patrick,

I noticed that the samples send the local memory region's LKey to the remote side; that is, on the remote side we pass the peer's LKey to SVCPostSend.RdmaMod.setRkey(). This seems to go against the Verbs API (which dictates use of the RKey). The strange thing is that the examples work fine. Is there any explanation for this?

For example, JVerbsReadClient.java, line 80.

Thanks, Andy

UNKNOWN, srcAddress /0.0.0.0:0

I'm attempting to run crail-spark.
I'm set up as a container running spark with workers, and attempting to just access crail store.
Running either crail fs -ls -R / or say terasort, I end up at the same error.

INFO disni: got event type + UNKNOWN, srcAddress /0.0.0.0:0, dstAddress /192.168.3.100:4420

I've set disni in crail to log DEBUG but I don't get any additional info.

It appears that DiSNI is attempting to set up a QP but is unable to determine the local RNIC's address? I have the container set up with a bridged network rather than a host network, which I'm guessing could be the issue. I do see all the RDMA nodes from the container, however. I tried to use a host network, but then got Spark errors because the workers can't have unique hostnames when attached to the host network.

$ ibv_devices
device node GUID
------ ----------------
i40iw0 0cc47afc00ed0000
mlx5_2 98039b0300989ab6
mlx5_0 98039b0300989b0e
i40iw1 0cc47afc00ec0000
mlx5_3 98039b0300989ab7
mlx5_1 98039b0300989b0f

Snippet of the console output prior to the hang:

19/06/12 08:49:10 INFO crail: CrailHadoopFileSystem construction
19/06/12 08:49:10 INFO crail: creating singleton crail file system
19/06/12 08:49:10 INFO crail: crail.version 3101
19/06/12 08:49:10 INFO crail: crail.directorydepth 16
19/06/12 08:49:10 INFO crail: crail.tokenexpiration 10
19/06/12 08:49:10 INFO crail: crail.blocksize 1048576
19/06/12 08:49:10 INFO crail: crail.cachelimit 0
19/06/12 08:49:10 INFO crail: crail.cachepath /dev/hugepages/cache
19/06/12 08:49:10 INFO crail: crail.user crail
19/06/12 08:49:10 INFO crail: crail.shadowreplication 1
19/06/12 08:49:10 INFO crail: crail.debug true
19/06/12 08:49:10 INFO crail: crail.statistics true
19/06/12 08:49:10 INFO crail: crail.rpctimeout 1000
19/06/12 08:49:10 INFO crail: crail.datatimeout 1000
19/06/12 08:49:10 INFO crail: crail.buffersize 1048576
19/06/12 08:49:10 INFO crail: crail.slicesize 524288
19/06/12 08:49:10 INFO crail: crail.singleton true
19/06/12 08:49:10 INFO crail: crail.regionsize 1073741824
19/06/12 08:49:10 INFO crail: crail.directoryrecord 512
19/06/12 08:49:10 INFO crail: crail.directoryrandomize true
19/06/12 08:49:10 INFO crail: crail.cacheimpl org.apache.crail.memory.MappedBufferCache
19/06/12 08:49:10 INFO crail: crail.locationmap
19/06/12 08:49:10 INFO crail: crail.namenode.address crail://192.168.1.164:9060
19/06/12 08:49:10 INFO crail: crail.namenode.blockselection roundrobin
19/06/12 08:49:10 INFO crail: crail.namenode.fileblocks 16
19/06/12 08:49:10 INFO crail: crail.namenode.rpctype org.apache.crail.namenode.rpc.tcp.TcpNameNode
19/06/12 08:49:10 INFO crail: crail.namenode.log
19/06/12 08:49:10 INFO crail: crail.storage.types org.apache.crail.storage.nvmf.NvmfStorageTier
19/06/12 08:49:10 INFO crail: crail.storage.classes 2
19/06/12 08:49:10 INFO crail: crail.storage.rootclass 0
19/06/12 08:49:10 INFO crail: crail.storage.keepalive 2
19/06/12 08:49:10 INFO crail: buffer cache, allocationCount 0, bufferCount 1024
19/06/12 08:49:10 INFO crail: Initialize Nvmf storage client
19/06/12 08:49:10 INFO crail: crail.storage.nvmf.ip 192.168.3.100
19/06/12 08:49:10 INFO crail: crail.storage.nvmf.port 4420
19/06/12 08:49:10 INFO crail: crail.storage.nvmf.nqn nqn.2018-12.com.StorEdgeSystems:cntlr13
19/06/12 08:49:10 INFO crail: crail.storage.nvmf.hostnqn nqn.2014-08.org.nvmexpress:uuid:1b4e28ba-2fa1-11d2-883f-0016d3cca420
19/06/12 08:49:10 INFO crail: crail.storage.nvmf.allocationsize 1073741824
19/06/12 08:49:10 INFO crail: crail.storage.nvmf.queueSize 64
19/06/12 08:49:10 INFO narpc: new NaRPC server group v1.0, queueDepth 32, messageSize 512, nodealy true
19/06/12 08:49:10 INFO crail: crail.namenode.tcp.queueDepth 32
19/06/12 08:49:10 INFO crail: crail.namenode.tcp.messageSize 512
19/06/12 08:49:10 INFO crail: crail.namenode.tcp.cores 1
19/06/12 08:49:10 INFO crail: connected to namenode(s) /192.168.1.164:9060
19/06/12 08:49:10 INFO crail: CrailHadoopFileSystem fs initialization done..
19/06/12 08:49:10 INFO crail: lookupDirectory: path /
19/06/12 08:49:10 INFO crail: lookup: name /, success, fd 0
19/06/12 08:49:10 INFO crail: lookupDirectory: path /
19/06/12 08:49:10 INFO crail: lookup: name /, success, fd 0
19/06/12 08:49:10 INFO crail: getDirectoryList: /
19/06/12 08:49:10 INFO crail: CoreInputStream: open, path  /, fd 0, streamId 1, isDir true, readHint 0
19/06/12 08:49:10 INFO crail: Connecting to NVMf target at Transport address = /192.168.3.100:4420, subsystem NQN = nqn.2018-12.com.StorEdgeSystems:cntlr13
19/06/12 08:49:10 INFO disni: creating  RdmaProvider of type 'nat'
19/06/12 08:49:10 INFO disni: jverbs jni version 32
19/06/12 08:49:10 INFO disni: sock_addr_in size mismatch, jverbs size 28, native size 16
19/06/12 08:49:10 INFO disni: IbvRecvWR size match, jverbs size 32, native size 32
19/06/12 08:49:10 INFO disni: IbvSendWR size mismatch, jverbs size 72, native size 128
19/06/12 08:49:10 INFO disni: IbvWC size match, jverbs size 48, native size 48
19/06/12 08:49:10 INFO disni: IbvSge size match, jverbs size 16, native size 16
19/06/12 08:49:10 INFO disni: Remote addr offset match, jverbs size 40, native size 40
19/06/12 08:49:10 INFO disni: Rkey offset match, jverbs size 48, native size 48
19/06/12 08:49:10 INFO disni: createEventChannel, objId 140229751834160
19/06/12 08:49:10 INFO disni: launching cm processor, cmChannel 0
19/06/12 08:49:10 INFO disni: createId, id 140229751892832
19/06/12 08:49:10 INFO disni: new client endpoint, id 0, idPriv 0
19/06/12 08:49:10 INFO disni: resolveAddr, addres /192.168.3.100:4420
19/06/12 08:49:10 INFO disni: got event type + UNKNOWN, srcAddress /0.0.0.0:0, dstAddress /192.168.3.100:4420

Register memory larger than 4 GB

Recently, I extended DiSNI to enable the On Demand Paging (ODP) feature. With this feature, it becomes practical to register large memory regions whose size should be held in a long instead of an int. In fact, the native call ibv_reg_mr() takes a size_t length, which is 64 bits wide on 64-bit systems.
So what would you think of breaking the API and changing IbvMr.length to type long?

P.S. I encountered some pitfalls while enabling ODP. API incompatibility is just one of them.
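
To illustrate the int-versus-long concern, here is a small standalone sketch of what happens when a large region size is squeezed into an int. The RegionLength class is hypothetical and does not touch IbvMr; it only shows the Java narrowing behaviour.

```java
// Standalone illustration (not DiSNI code) of why an int length field cannot
// describe a registration larger than Integer.MAX_VALUE bytes.
final class RegionLength {
    /** Narrowing cast, as would happen if a long size were stored in an int field. */
    static int asInt(long bytes) {
        return (int) bytes;
    }

    /** Fails loudly with ArithmeticException instead of silently truncating. */
    static int checkedInt(long bytes) {
        return Math.toIntExact(bytes);
    }
}
```

For a 5 GiB region, RegionLength.asInt(5L << 30) yields 1073741824 (1 GiB), because only the low 32 bits survive, while RegionLength.checkedInt(5L << 30) throws. Neither is usable for a large ODP registration, which is why the field type itself would need to change.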

using rdma write to specified offset position of the remote buffer

When I use one-sided operations RDMA write, I attempted to perform partial writes on the buffer corresponding to the remote endpoint.

The code on the remote side is as follows:

ByteBuffer sendBuf = endpoint.getSendBuf();
IbvMr dataMr = endpoint.getDataMr();
sendBuf.putLong(dataMr.getAddr());
sendBuf.putInt(dataMr.getLength());
sendBuf.putInt(dataMr.getLkey());
sendBuf.clear();
endpoint.postSendExecute();
endpoint.takeEvent();
ByteBuffer dataBuf = endpoint.getDataBuf();
dataBuf.clear();
endpoint.postRecvExecute();
endpoint.takeEvent();
System.out.println("WriteServer::write from client 1: " + dataBuf.get());
System.out.println("WriteServer::write from client 2: " + dataBuf.get());
System.out.println("WriteServer::write from client 3: " + dataBuf.get());

The code on the local side is as follows:

endpoint.pollUntil();
ByteBuffer recvBuf = endpoint.getRecvBuf();
recvBuf.clear();
long addr = recvBuf.getLong();
int length = recvBuf.getInt();
int lkey = recvBuf.getInt();
recvBuf.clear();
System.out.println("WriteClient, receiving rdma information, addr " + addr + ", length " + length + ", lkey " + lkey);
System.out.println("WriteClient, preparing write operation...");

IbvSendWR dataWR = endpoint.getDataWR();
dataWR.setWr_id(1001);
dataWR.setOpcode(IbvSendWR.IBV_WR_RDMA_WRITE);
dataWR.setSend_flags(IbvSendWR.IBV_SEND_SIGNALED);
dataWR.getRdma().setRemote_addr(addr);
dataWR.getRdma().setRkey(lkey);

ByteBuffer dataBuf = endpoint.getDataBuf();
dataBuf.clear();
dataBuf.put((byte)5);
endpoint.postDataExecute();
endpoint.pollUntil();

This works: the remote side reads "5 0 0" from the data buffer, so the write succeeds.
But when I change dataWR.getRdma().setRemote_addr(addr) to dataWR.getRdma().setRemote_addr(addr+1) on the local side, I expect the remote side to read "0 5 0". It does not; it reads "0 0 0", which suggests the write failed.

What went wrong? Why can't the local side write data to a specified offset in the remote buffer?
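
For reference, the address arithmetic behind a partial write (the addr+1 case above) can be sketched as follows. RemoteRegion is a hypothetical helper, not a DiSNI class; its fields stand for the address, length, and key that the remote side advertised in its send buffer. It only checks that the target range stays inside the advertised region and does not explain why the write in this report is not visible.

```java
// Hypothetical helper, not part of DiSNI: computes the remote address for a
// partial RDMA write and rejects ranges outside the advertised region.
final class RemoteRegion {
    final long addr;  // base address advertised by the remote side
    final int length; // region length advertised by the remote side
    final int key;    // key advertised by the remote side

    RemoteRegion(long addr, int length, int key) {
        this.addr = addr;
        this.length = length;
        this.key = key;
    }

    /** Returns the remote address for writing writeLength bytes at offset. */
    long addressAt(int offset, int writeLength) {
        if (offset < 0 || writeLength < 0 || (long) offset + writeLength > length) {
            throw new IndexOutOfBoundsException(
                "offset " + offset + " + length " + writeLength + " exceeds region of " + length);
        }
        return addr + offset;
    }
}
```

With such a helper, the client side would call something like dataWR.getRdma().setRemote_addr(region.addressAt(1, 1)) and also set the local SGE length to the same number of bytes, so that the bounds of the write are explicit on both ends.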

Affinity >64

Cannot set affinity for hardware threads > 64 since affinity mask is a long
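
One possible direction, sketched under the assumption that the affinity set is eventually handed to native code as an array of 64-bit words. The CpuAffinity class is hypothetical and not part of DiSNI; it only relies on java.util.BitSet from the JDK.

```java
// Hypothetical representation, not DiSNI's API: an affinity set backed by
// java.util.BitSet, so CPU indices are not limited to the 64 bits of a long.
final class CpuAffinity {
    private final java.util.BitSet cpus = new java.util.BitSet();

    /** Adds one hardware thread index to the set. */
    CpuAffinity add(int cpu) {
        if (cpu < 0) {
            throw new IllegalArgumentException("negative cpu index " + cpu);
        }
        cpus.set(cpu);
        return this;
    }

    boolean contains(int cpu) {
        return cpu >= 0 && cpus.get(cpu);
    }

    /** 64-bit words, least significant first, e.g. for passing to native code. */
    long[] toWords() {
        return cpus.toLongArray();
    }
}
```

For example, new CpuAffinity().add(3).add(70).toWords() returns two words: the first has bit 3 set and the second has bit 6 set (70 - 64), which a native layer could copy into a cpu_set_t-style mask.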

Server blocks when a client uses TCP to connect to the RDMA server

Hi team,

Normally, when the client and server both use an RDMA connection, everything works well.

However, when a client uses TCP to connect to the RDMA server, the server-side thread blocks.

Is there an option to reject the connection instead of blocking the thread? Any ideas?

Error when running configure

[root@localhost libdisni]# ./configure --with-jdk=/usr/java/jdk1.8.0_191-amd64
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking how to print strings... printf
checking for style of include used by make... GNU
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking dependency style of gcc... gcc3
checking for a sed that does not truncate output... /usr/bin/sed
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for fgrep... /usr/bin/grep -F
checking for ld used by gcc... /usr/bin/ld
checking if the linker (/usr/bin/ld) is GNU ld... yes
checking for BSD- or MS-compatible name lister (nm)... /usr/bin/nm -B
checking the name lister (/usr/bin/nm -B) interface... BSD nm
checking whether ln -s works... yes
checking the maximum length of command line arguments... 1572864
checking whether the shell understands some XSI constructs... yes
checking whether the shell understands "+="... yes
checking how to convert x86_64-unknown-linux-gnu file names to x86_64-unknown-linux-gnu format... func_convert_file_noop
checking how to convert x86_64-unknown-linux-gnu file names to toolchain format... func_convert_file_noop
checking for /usr/bin/ld option to reload object files... -r
checking for objdump... objdump
checking how to recognize dependent libraries... pass_all
checking for dlltool... no
checking how to associate runtime and link libraries... printf %s\n
checking for ar... ar
checking for archiver @file support... @
checking for strip... strip
checking for ranlib... ranlib
checking command to parse /usr/bin/nm -B output from gcc object... ok
checking for sysroot... no
checking for mt... no
checking if : is a manifest tool... no
checking how to run the C preprocessor... gcc -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for dlfcn.h... yes
checking for objdir... .libs
checking if gcc supports -fno-rtti -fno-exceptions... no
checking for gcc option to produce PIC... -fPIC -DPIC
checking if gcc PIC flag -fPIC -DPIC works... yes
checking if gcc static flag -static works... no
checking if gcc supports -c -o file.o... yes
checking if gcc supports -c -o file.o... (cached) yes
checking whether the gcc linker (/usr/bin/ld -m elf_x86_64) supports shared libraries... yes
checking whether -lc should be explicitly linked in... no
checking dynamic linker characteristics... GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking whether stripping libraries is possible... yes
checking if libtool supports shared libraries... yes
checking whether to build shared libraries... yes
checking whether to build static libraries... no
checking for g++... g++
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes
checking dependency style of g++... gcc3
checking how to run the C++ preprocessor... g++ -E
checking for ld used by g++... /usr/bin/ld -m elf_x86_64
checking if the linker (/usr/bin/ld -m elf_x86_64) is GNU ld... yes
checking whether the g++ linker (/usr/bin/ld -m elf_x86_64) supports shared libraries... yes
checking for g++ option to produce PIC... -fPIC -DPIC
checking if g++ PIC flag -fPIC -DPIC works... yes
checking if g++ static flag -static works... no
checking if g++ supports -c -o file.o... yes
checking if g++ supports -c -o file.o... (cached) yes
checking whether the g++ linker (/usr/bin/ld -m elf_x86_64) supports shared libraries... yes
checking dynamic linker characteristics... (cached) GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking for gcc... (cached) gcc
checking whether we are using the GNU C compiler... (cached) yes
checking whether gcc accepts -g... (cached) yes
checking for gcc option to accept ISO C89... (cached) none needed
checking dependency style of gcc... (cached) gcc3
checking how to run the C preprocessor... gcc -E
checking for ibv_get_device_list in -libverbs... no
configure: error: disni requires libibverbs

Problem with RDMAvsTcpBenchmark of DiSNI over SoftRoCE

Hello,

After a long and unsuccessful attempt to compile Soft-iWARP, I found that the RXE kernel module (kernel: 4.9.0-8-amd64) and Soft-RoCE work without any problems (sudo apt-get -t stretch-backports install rdma-core ibverbs-providers ibverbs-utils libibverbs-dev librdmacm-dev).

I've successfully compiled and installed DiSNI on the virtual machines (Debian on VirtualBox).

The basic, simple example (com.ibm.disni.examples.SendRecv*) works like a charm.
However, I have a problem with com.ibm.disni.benchmarks.RDMAvsTcpBenchmark*.

Sometimes it finishes as it should, sometimes it hangs. I've added some diagnostic output to the classes, and the point at which it hangs varies: sometimes after 2000 iterations of the RDMA loop, sometimes after 1942 iterations, and so on.
I have also changed the value that is being sent. The client sends the current iteration number, and the server sends the negated current iteration number. The received value is not always the next expected value (e.g. 0, 1, 1, 3, 3, 4, ... or -1, -2, -4, -5, -5, -6, -7, -8, -9, ...).
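To make the check concrete, here is a minimal sketch of how the received values can be compared against the expected progression. It is a standalone illustration and not part of DiSNI or of the benchmark: the class name SequenceCheck and the method mismatches are hypothetical, and the input arrays are just the visible prefixes of the two sequences quoted above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of DiSNI): reports the positions at which a
// received sequence deviates from the expected arithmetic progression.
final class SequenceCheck {

    // Returns every index i where received[i] != start + i * step.
    static List<Integer> mismatches(int[] received, int start, int step) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < received.length; i++) {
            int expected = start + i * step;
            if (received[i] != expected) {
                bad.add(i);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Values sent by the client (expected 0, 1, 2, ...).
        int[] fromClient = {0, 1, 1, 3, 3, 4};
        // Values sent by the server (expected -1, -2, -3, ...).
        int[] fromServer = {-1, -2, -4, -5, -5, -6, -7, -8, -9};

        System.out.println("client mismatches at " + mismatches(fromClient, 0, 1));
        System.out.println("server mismatches at " + mismatches(fromServer, -1, -1));
    }
}
```

A check like this only shows where the payload differs from the expected one. It does not say whether a message was lost, duplicated, or read from a buffer before the receive had actually completed.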

I have no idea where to look for the cause of the problem, so I'm unable to find a solution.

Best regards,
Marek

---cut---

server-101$ java -cp disni-2.0-jar-with-dependencies.jar:disni-tests.jar com.ibm.disni.benchmarks.RDMAvsTcpBenchmarkServer -a 192.168.56.101 -k 3000
client-102$ java -cp disni-2.0-jar-with-dependencies.jar:disni-tests.jar com.ibm.disni.benchmarks.RDMAvsTcpBenchmarkClient -a 192.168.56.101 -k 3000

...

jstack output (GC and other JVM threads omitted) on the server:
"main" #1 prio=5 os_prio=0 tid=0x00007f577800a800 nid=0x773 waiting on condition [0x00007f577fa75000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000f5c061f0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:403)
        at com.ibm.disni.benchmarks.RDMAvsTcpBenchmarkServer.runRDMA(RDMAvsTcpBenchmarkServer.java:116)  # clientEndpoint.getWcEvents().take();
        at com.ibm.disni.benchmarks.RDMAvsTcpBenchmarkServer.launch(RDMAvsTcpBenchmarkServer.java:159)
        at com.ibm.disni.benchmarks.RDMAvsTcpBenchmarkServer.main(RDMAvsTcpBenchmarkServer.java:48)
"Thread-1" #10 prio=5 os_prio=0 tid=0x00007f5778189000 nid=0x784 runnable [0x00007f57575bc000]
   java.lang.Thread.State: RUNNABLE
        at com.ibm.disni.verbs.impl.NativeDispatcher._getCqEvent(Native Method)
        at com.ibm.disni.verbs.impl.RdmaVerbsNat.getCqEvent(RdmaVerbsNat.java:165)
        at com.ibm.disni.verbs.IbvCompChannel.getCqEvent(IbvCompChannel.java:77)
        at com.ibm.disni.RdmaCqProcessor.run(RdmaCqProcessor.java:120)
        at java.lang.Thread.run(Thread.java:748)
"Thread-0" #9 prio=5 os_prio=0 tid=0x00007f57781f2800 nid=0x783 runnable [0x00007f575c1c8000]
   java.lang.Thread.State: RUNNABLE
        at com.ibm.disni.verbs.impl.NativeDispatcher._getCmEvent(Native Method)
        at com.ibm.disni.verbs.impl.RdmaCmNat.getCmEvent(RdmaCmNat.java:193)
        at com.ibm.disni.verbs.RdmaEventChannel.getCmEvent(RdmaEventChannel.java:75)
        at com.ibm.disni.RdmaCmProcessor.run(RdmaCmProcessor.java:68)
        at java.lang.Thread.run(Thread.java:748)

on the client:
"main" #1 prio=5 os_prio=0 tid=0x00007f274000a800 nid=0x748 waiting on condition [0x00007f2749ece000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000f5c24ab0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:403)
        at com.ibm.disni.benchmarks.RDMAvsTcpBenchmarkClient.runRDMA(RDMAvsTcpBenchmarkClient.java:115)  # endpoint.getWcEvents().take();
        at com.ibm.disni.benchmarks.RDMAvsTcpBenchmarkClient.launch(RDMAvsTcpBenchmarkClient.java:156)
        at com.ibm.disni.benchmarks.RDMAvsTcpBenchmarkClient.main(RDMAvsTcpBenchmarkClient.java:46)
"Thread-1" #10 prio=5 os_prio=0 tid=0x00007f2740209800 nid=0x759 runnable [0x00007f26f9bf4000]
   java.lang.Thread.State: RUNNABLE
        at com.ibm.disni.verbs.impl.NativeDispatcher._getCqEvent(Native Method)
        at com.ibm.disni.verbs.impl.RdmaVerbsNat.getCqEvent(RdmaVerbsNat.java:165)
        at com.ibm.disni.verbs.IbvCompChannel.getCqEvent(IbvCompChannel.java:77)
        at com.ibm.disni.RdmaCqProcessor.run(RdmaCqProcessor.java:120)
        at java.lang.Thread.run(Thread.java:748)
"Thread-0" #9 prio=5 os_prio=0 tid=0x00007f2740202000 nid=0x758 runnable [0x00007f26f9cf5000]
   java.lang.Thread.State: RUNNABLE
        at com.ibm.disni.verbs.impl.NativeDispatcher._getCmEvent(Native Method)
        at com.ibm.disni.verbs.impl.RdmaCmNat.getCmEvent(RdmaCmNat.java:193)
        at com.ibm.disni.verbs.RdmaEventChannel.getCmEvent(RdmaEventChannel.java:75)
        at com.ibm.disni.RdmaCmProcessor.run(RdmaCmProcessor.java:68)
        at java.lang.Thread.run(Thread.java:748)
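
In both dumps the main thread is parked in ArrayBlockingQueue.take(), which waits indefinitely when no completion event arrives. As a diagnostic aid, the blocking take() could be replaced by a timed poll(), so that a missing event shows up as a reported timeout instead of a silent hang. The sketch below only illustrates that idea with a plain queue of strings standing in for the work completion queue: the class name TimedTake, the method takeOrNull, and the 100 ms timeout are assumptions for the example and are not DiSNI API.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not part of DiSNI): waits for a queue element with a
// deadline instead of parking forever as BlockingQueue.take() does.
final class TimedTake {

    // Returns the next element, or null if none arrives within timeoutMs
    // or the waiting thread is interrupted.
    static <T> T takeOrNull(BlockingQueue<T> queue, long timeoutMs) {
        try {
            return queue.poll(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return null;
        }
    }

    public static void main(String[] args) {
        // Stand-in for the completion event queue; only one event is queued.
        BlockingQueue<String> events = new ArrayBlockingQueue<>(4);
        events.offer("completion-0");

        for (int i = 0; i < 2; i++) {
            String event = takeOrNull(events, 100);
            if (event == null) {
                System.out.println("iteration " + i + ": no event within 100 ms");
            } else {
                System.out.println("iteration " + i + ": got " + event);
            }
        }
    }
}
```

Logging the iteration number on timeout would show exactly which send or receive completion never arrived on each side.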
