
Comments (5)

PepperJo commented on June 20, 2024

OK, it seems you have addressed most of the issues I raised.
When you increase the remote address by 1, do you also decrease the length by 1? Otherwise the write may run past the end of the registered area. From your statement above, it sounds like you have already checked this.
Do you check whether each completion queue result was successful? An error reported there would probably be easier to debug.
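
The length question above is plain bounds arithmetic, and a minimal sketch may make it concrete. This is ordinary Java, not DiSNI code: `RemoteRegion`, `contains`, and `maxLengthAt` are hypothetical names, and the address and size are made-up values.

```java
/** Hypothetical description of the region advertised by the remote side. */
final class RemoteRegion {
    final long addr;   // start address advertised by the remote side
    final int length;  // number of bytes registered starting at addr

    RemoteRegion(long addr, int length) {
        this.addr = addr;
        this.length = length;
    }

    /** True if [targetAddr, targetAddr + writeLength) lies inside the region. */
    boolean contains(long targetAddr, int writeLength) {
        return writeLength >= 0
                && targetAddr >= addr
                && targetAddr + writeLength <= addr + length;
    }

    /** Largest length that still fits when the write starts at addr + offset. */
    int maxLengthAt(int offset) {
        if (offset < 0 || offset > length) {
            throw new IllegalArgumentException("offset outside region: " + offset);
        }
        return length - offset;
    }
}

final class WriteBoundsDemo {
    public static void main(String[] args) {
        RemoteRegion region = new RemoteRegion(0x1000L, 64); // made-up values
        // A full-length write at the base address fits.
        System.out.println(region.contains(region.addr, 64));
        // The same length at addr + 1 overruns the region by one byte.
        System.out.println(region.contains(region.addr + 1, 64));
        // Shrinking the length by the offset makes the write fit again.
        System.out.println(region.contains(region.addr + 1, region.maxLengthAt(1)));
    }
}
```

In the real code the equivalent step would be to pass the reduced length to the scatter/gather element whenever the remote address is shifted.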

from disni.

PepperJo commented on June 20, 2024

I see a few problems with your code. On the remote side:

  1. You should always send the rkey, not the lkey, to the remote side (although on most devices the two keys are identical).
  2. You should clear dataBuf before you send the buffer information (address, length, rkey) to the remote side. Otherwise, the remote write could happen just before you clear the buffer.
  3. The post receive is not necessary here; in fact, it does nothing. You only need to post receive buffers when the remote side uses SEND; for WRITE, no receive buffer is needed. If you want to use SEND, make sure that the receive buffer is posted before you send the buffer information. Otherwise, the remote side may issue a SEND before any buffer has been posted.
  4. You assume that once you get the receive event, the data is in the buffer. But the (first) event only tells you that the receive buffer has been posted. You will never get an event saying that data was received into this buffer because, as explained above, the remote side is not using SEND.
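
Points 1 and 2 can be sketched in plain Java as follows. This is only an illustration under stated assumptions: `BufferAdvert` and `prepare` are hypothetical names, the wire layout (8-byte address, 4-byte length, 4-byte rkey) is assumed, and the address and key values are made up. The actual SEND of the message is left out.

```java
import java.nio.ByteBuffer;

final class BufferAdvert {
    // Assumed layout: address (long), length (int), rkey (int).
    static final int BYTES = Long.BYTES + Integer.BYTES + Integer.BYTES;

    /** Zeroes the data buffer first, then serializes (address, length, rkey). */
    static ByteBuffer prepare(ByteBuffer dataBuf, long addr, int rkey) {
        // Step 1: clear the target area before the peer can learn its address.
        for (int i = 0; i < dataBuf.capacity(); i++) {
            dataBuf.put(i, (byte) 0);
        }
        // Step 2: only now build the message that advertises the buffer.
        ByteBuffer sendBuf = ByteBuffer.allocate(BYTES);
        sendBuf.putLong(addr);
        sendBuf.putInt(dataBuf.capacity());
        sendBuf.putInt(rkey); // the remote key, not the local key
        sendBuf.flip();       // make the message readable from the start
        return sendBuf;
    }

    public static void main(String[] args) {
        ByteBuffer dataBuf = ByteBuffer.allocateDirect(16);
        dataBuf.put(0, (byte) 42); // stale content from an earlier run
        ByteBuffer msg = prepare(dataBuf, 0x2000L, 7); // made-up address and key
        System.out.println("first data byte after clearing: " + dataBuf.get(0));
        System.out.println("advert size in bytes: " + msg.remaining());
    }
}
```

Because the clearing happens before the message even exists, a remote WRITE cannot land in the buffer and then be wiped.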

The local side looks OK. Just be aware that the event you get after a successful WRITE only indicates that the WRITE has been successfully sent to the remote side. It does not necessarily mean that the WRITE has completed on the remote side (that is, that the data has been placed into memory).

You should try to understand the difference between one-sided (WRITE) and two-sided (SEND) operations, so that you can see which operation makes the most sense in your case. Generally speaking, one-sided operations are faster but harder to use.


ShiningChuang commented on June 20, 2024

Thanks for your correction! I will respond to each of your points.

  1. I know I should send the rkey, but this code was adapted from src/test/java/com/ibm/disni/benchmarks/ReadServer.java, which writes sendBuf.putInt(dataMr.getLkey()) on line 161. Since rkey and lkey are equal and this is not the main problem, I ignored it.
  2. This does introduce the problem that the remote write could happen just before I clear the buffer. I will pay attention to this later, but it is also not the main problem.
  3. I know that posting a receive is unnecessary for WRITE. I just want the sender to write the data and then notify the receiver through SEND, because I'm worried about the receiver starting to read before the sender has written. Maybe that's not something to worry about? Again, this is not the main problem.
  4. I understand this point.

But none of these seems to be the main problem, which is that the receiver can't read any data when I change dataWR.getRdma().setRemote_addr(addr) to dataWR.getRdma().setRemote_addr(addr+1).

I originally thought that the rkey only granted access to the first address of the remote data buffer, so that after changing the address to addr+1 I could no longer access it with that rkey and would need a key for addr+1. But after reading up on RDMA, I found that the rkey grants access to the entire range [address, address+len), so this was not the problem.

Then I suspected that the memory of the data buffer on the remote side is not contiguous, so that addr+1 does not correspond to dataBuf.position(1). But dataBuf is allocated through ByteBuffer.allocateDirect(), which should be contiguous memory. I checked the contiguity of the addresses through Unsafe, and it was indeed OK.
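
The offset correspondence described above can also be checked without Unsafe, at least at the ByteBuffer level. The sketch below is plain Java with a hypothetical helper, `viewFrom`. It shows that a view starting at position 1 shares memory with the original direct buffer one byte in, but it does not inspect native addresses.

```java
import java.nio.ByteBuffer;

final class OffsetViewDemo {
    /** Returns a view of buf that starts offset bytes in and shares its memory. */
    static ByteBuffer viewFrom(ByteBuffer buf, int offset) {
        ByteBuffer dup = buf.duplicate(); // own position and limit, same memory
        dup.position(offset);
        return dup.slice();               // index 0 of the slice is buf[offset]
    }

    public static void main(String[] args) {
        ByteBuffer dataBuf = ByteBuffer.allocateDirect(8);
        ByteBuffer shifted = viewFrom(dataBuf, 1);
        shifted.put(0, (byte) 9); // write through the shifted view
        System.out.println("dataBuf[1] = " + dataBuf.get(1));
        System.out.println("shifted capacity = " + shifted.capacity());
    }
}
```

Note that the shifted view is one byte shorter than the original buffer.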

So I don't understand what went wrong. I don't think this is a problem with my use of WRITE, because data written to addr can be read; only addr+1 fails.


ShiningChuang commented on June 20, 2024

Oh, that's exactly what I missed, the main problem! The length. sgeSend.setLength() is set to a fixed buffer length in init().

Thanks a lot! :)


PepperJo commented on June 20, 2024

No problem. As mentioned above, I would always check the completion queue results to see whether the command actually succeeded.
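
A rough sketch of such a check is below. It deliberately does not use DiSNI classes: `Completion` and `CompletionCheck` are hypothetical stand-ins for whatever the polled work completions look like in the real code. The one fact taken from the verbs API is that a status of 0 means success (IBV_WC_SUCCESS); the failing status value in the demo is just a sample.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for one polled work completion. */
final class Completion {
    static final int SUCCESS = 0; // IBV_WC_SUCCESS is 0 in the verbs API
    final long wrId;  // id of the work request this completion belongs to
    final int status; // raw completion status

    Completion(long wrId, int status) {
        this.wrId = wrId;
        this.status = status;
    }
}

final class CompletionCheck {
    /** Throws on the first completion whose status is not SUCCESS. */
    static void requireSuccess(List<Completion> polled) {
        for (Completion c : polled) {
            if (c.status != Completion.SUCCESS) {
                throw new IllegalStateException(
                        "work request " + c.wrId + " failed with status " + c.status);
            }
        }
    }

    public static void main(String[] args) {
        requireSuccess(Arrays.asList(new Completion(1L, Completion.SUCCESS)));
        try {
            // Sample non-zero status standing in for any error code.
            requireSuccess(Arrays.asList(new Completion(2L, 10)));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing loudly on the first bad status makes an error such as an out-of-bounds write visible at the point where it happens, rather than as missing data on the other side.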

