When I use one-sided RDMA WRITE operations, I attempted to perform partial writes on t…

Using RDMA write to a specified offset position of the remote buffer (disni)

ShiningChuang commented on August 24, 2024

using rdma write to specified offset position of the remote buffer


PepperJo commented on August 24, 2024

OK, it seems you have addressed most of the issues I raised.
When you increase the remote address, do you also decrease the length by 1? Otherwise you might try to write outside the registered area. Given your statement above, it seems you have already checked this.
Do you check each completion queue entry to see whether it was successful? If you get an error there, that might be easier to debug.
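PepperJo's question about the length can be made concrete with a small bounds check. This is a plain-Java sketch with no DiSNI calls. `RemoteWritePlan`, `at` and `maxPayloadAt` are hypothetical names. It assumes the remote side advertised a region [base, base + regionLength). It computes the remote address and the scatter/gather length for a write starting `offset` bytes into that region, and it rejects any write that would run past the end of the region.

```java
/**
 * Hypothetical helper (not part of DiSNI): plans an RDMA WRITE into a sub-range
 * of a remote region that was advertised as [base, base + regionLength).
 */
final class RemoteWritePlan {
    final long remoteAddr; // value for the work request's remote address
    final int length;      // value for the scatter/gather element's length

    private RemoteWritePlan(long remoteAddr, int length) {
        this.remoteAddr = remoteAddr;
        this.length = length;
    }

    /**
     * Plans a write of {@code payloadLength} bytes starting at {@code offset}.
     * Throws if [offset, offset + payloadLength) does not fit in the region,
     * because such a write would touch memory the rkey does not cover.
     */
    static RemoteWritePlan at(long base, int regionLength, int offset, int payloadLength) {
        if (offset < 0 || payloadLength < 0) {
            throw new IllegalArgumentException("offset and length must be non-negative");
        }
        // Long arithmetic so offset + payloadLength cannot overflow an int.
        long end = (long) offset + payloadLength;
        if (end > regionLength) {
            throw new IllegalArgumentException(
                "write [" + offset + ", " + end + ") exceeds region of "
                + regionLength + " bytes");
        }
        return new RemoteWritePlan(base + offset, payloadLength);
    }

    /** Largest payload that still fits when the write starts at {@code offset}. */
    static int maxPayloadAt(int regionLength, int offset) {
        if (offset < 0 || offset > regionLength) {
            throw new IllegalArgumentException("offset outside region");
        }
        return regionLength - offset;
    }
}
```

With a 64-byte region, moving the start from offset 0 to offset 1 while keeping the length at 64 is rejected. The length has to drop to 63. In the thread's DiSNI code, these two values would feed `dataWR.getRdma().setRemote_addr(...)` and `sgeSend.setLength(...)`. A length that is set once at setup and never recomputed per write is exactly what this check would catch.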

PepperJo commented on August 24, 2024

I see a few problems with your code. On the remote side:

1. You should always send the rkey, not the lkey, to the remote side (although for most devices these keys are identical).
2. You should clear dataBuf before you send the buffer information (address, length, rkey) to the remote side. Otherwise, the remote write could happen just before you clear the buffer.
3. The post receive is not necessary here; in fact, it will not do anything. You only need to post receive buffers when the remote side uses SEND. For WRITE you don't need to post a receive buffer. If you want to use SEND, you need to make sure that the receive buffer is posted before you send the buffer information. Otherwise, the remote side might issue a send before any buffer has been posted.
4. You assume that once you get the receive event, the data is in the buffer. However, the (first) event only tells you that the receive buffer has been posted. You will never get an event saying that data was received into this buffer, because, as explained above, you are not using SEND on the remote side.
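The first two points (advertise the rkey, and clear dataBuf before advertising it) can be sketched as the data side's setup step. This is plain Java with no DiSNI types. `BufferDescriptor` and `prepare` are hypothetical names. The field order (address, length, key) is an illustrative choice, not a format DiSNI defines. In the real code, the address and rkey would come from the registered memory region.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical buffer descriptor (not a DiSNI class): the address, length and
 * remote key that the data side advertises so the peer can RDMA WRITE into it.
 */
final class BufferDescriptor {
    static final int SIZE = Long.BYTES + Integer.BYTES + Integer.BYTES; // 16 bytes

    final long address;
    final int length;
    final int rkey;

    BufferDescriptor(long address, int length, int rkey) {
        this.address = address;
        this.length = length;
        this.rkey = rkey;
    }

    /** Serializes the descriptor; the field order is an illustrative choice. */
    void writeTo(ByteBuffer out) {
        out.putLong(address);
        out.putInt(length);
        out.putInt(rkey); // advertise the rkey, never the lkey
    }

    static BufferDescriptor readFrom(ByteBuffer in) {
        long address = in.getLong();
        int length = in.getInt();
        int rkey = in.getInt();
        return new BufferDescriptor(address, length, rkey);
    }

    /**
     * Zeroes the data buffer first and only then builds the descriptor message.
     * The descriptor therefore cannot be sent (and the peer cannot start
     * writing) before the buffer has been cleared.
     */
    static ByteBuffer prepare(ByteBuffer dataBuf, long address, int rkey) {
        dataBuf.clear();                 // position = 0, limit = capacity
        while (dataBuf.hasRemaining()) {
            dataBuf.put((byte) 0);
        }
        dataBuf.clear();
        ByteBuffer msg = ByteBuffer.allocate(SIZE);
        new BufferDescriptor(address, dataBuf.capacity(), rkey).writeTo(msg);
        msg.flip();                      // ready to be copied into the send buffer
        return msg;
    }
}
```

The peer reads the same three fields back with `readFrom`. It then uses the address and rkey for its WRITE, and it keeps every write inside the advertised length.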

The local side looks OK. Just be aware that the event you get after a successful WRITE only indicates that the WRITE has been successfully sent to the remote side. It doesn't necessarily mean that it has completed on the remote side (i.e., that the data has been placed into memory).

You should try to understand the difference between one-sided (WRITE) and two-sided (SEND) operations, so that you can see which operation makes the most sense in your case. Generally speaking, one-sided operations are faster but harder to use.

ShiningChuang commented on August 24, 2024

Thanks for your corrections! I'll respond point by point.

1. I know I should send the rkey, but this code was adapted from src/test/java/com/ibm/disni/benchmarks/ReadServer.java, which writes sendBuf.putInt(dataMr.getLkey()) at line 161. Since the rkey and lkey are equal and this is not the main problem, I ignored it.
2. This does create the risk that the remote write could happen just before I clear the buffer. I will pay attention to this later, but it is also not the main problem.
3. I know that posting a receive is unnecessary for WRITE. I just want the sender to write the data and then notify the receiver through SEND, because I'm worried that the receiver might start reading before the sender has written. Maybe that's not something to worry about? Either way, this is not the main problem either.
4. I understand this point.

But none of these seems to be the main problem causing the receiver to read no data when I change dataWR.getRdma().setRemote_addr(addr) to dataWR.getRdma().setRemote_addr(addr+1).

I originally thought that the rkey only granted access to the first address of the remote data buffer. In that case, after I changed the address to addr+1, I could not access it with the rkey and would need a key for addr+1. But after checking RDMA references, I found that the rkey grants access to every address in the whole region [address, address+len), so this was not the problem.

Then I suspected that the remote data buffer was not contiguous in memory, so that addr+1 does not correspond to dataBuf.position(1). But dataBuf is allocated through ByteBuffer.allocateDirect(), which should give contiguous memory. I also checked the contiguity of this address range through Unsafe, and it was indeed OK.

So I don't understand what went wrong. I don't think the problem is my use of WRITE, because with addr the receiver can read the data; only with addr+1 it cannot.

ShiningChuang commented on August 24, 2024

Oh, that's exactly what I missed, and it's the main problem: the length! sgeSend.setLength() is set to a fixed buffer length in init().

Thanks a lot! :)

PepperJo commented on August 24, 2024

No problem. As mentioned above, I would always check the completion queue results to see whether the command actually succeeded.
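Checking every completion, as suggested here, can be sketched in plain Java without DiSNI types. `CompletionCheck` and `requireSuccess` are hypothetical names. The status is modeled as the raw integer from libibverbs' `enum ibv_wc_status`, where 0 is `IBV_WC_SUCCESS` and 10 is `IBV_WC_REM_ACCESS_ERR`. A WRITE whose range reaches past the region covered by the rkey would typically complete with a remote access error. Checking the status therefore tends to expose a stale length immediately, rather than leaving the receiver to find no data.

```java
/**
 * Hypothetical completion checker (not a DiSNI API). Status codes are the
 * integer values of libibverbs' enum ibv_wc_status.
 */
final class CompletionCheck {
    static final int IBV_WC_SUCCESS = 0;

    /** Readable names for a few common status codes. */
    static String name(int status) {
        switch (status) {
            case 0:  return "IBV_WC_SUCCESS";
            case 1:  return "IBV_WC_LOC_LEN_ERR";
            case 4:  return "IBV_WC_LOC_PROT_ERR";
            case 5:  return "IBV_WC_WR_FLUSH_ERR";
            case 10: return "IBV_WC_REM_ACCESS_ERR";
            case 12: return "IBV_WC_RETRY_EXC_ERR";
            default: return "status " + status;
        }
    }

    /** Throws unless the completion for work request {@code wrId} succeeded. */
    static void requireSuccess(long wrId, int status) {
        if (status != IBV_WC_SUCCESS) {
            throw new IllegalStateException(
                "work request " + wrId + " failed: " + name(status));
        }
    }
}
```

In the real program, the status would come from each work completion returned when polling the completion queue. The check would run for every entry, not just the last one.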
