
bobzhuyb / ns3-rdma

234.0 234.0 115.0 9.56 MB

NS3 simulator for RDMA over Converged Ethernet v2 (RoCEv2), including implementations of DCQCN, TIMELY, PFC, ECN, and a shared-buffer switch

License: GNU General Public License v2.0

Makefile 0.05% Python 62.66% C++ 36.32% Shell 0.01% MATLAB 0.05% Click 0.04% C 0.22% Perl 0.63% Gnuplot 0.02% Batchfile 0.01% Perl 6 0.01%

ns3-rdma's People

Contributors

bobzhuyb

ns3-rdma's Issues

WARNING: Drop because egress Q buffer full

I use BCube as my network topology, with 8 switches and 16 nodes. This warning appeared when I ran the simulation. Why does it occur, and how can I fix it? Thank you very much.

Some questions about PFC pause frames

Hi, I'm interested in your project.
I want to know how a pause frame reaches the upstream device (NIC). Pause frames are transmitted at L2. Did you use global routing?

How to compile and run with WAF under Ubuntu?

Hello, I learned a lot from your project. I have a question about building with WAF. I have run it successfully under Windows, and now I want to merge it into an ns-3.29 project under Ubuntu. What should I do? I look forward to your reply!

About DCQCN on RDMA READ flows

As far as I understand, DCQCN relies on ECN to detect network congestion and uses marked ACKs to notify the sender to reduce its sending rate. However, for RDMA READ operations, both the payload and the ACKs are carried in the response messages. Moreover, under the IB transport, there are no further ACKs for READ responses. So how does DCQCN control the rate of RDMA READ flows? Does the DCQCN NP implement an additional ACK mechanism beyond the original IB transport? Thanks.
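As described in the DCQCN paper, the notification point (NP) does not piggyback congestion feedback on transport ACKs: when ECN-marked packets of a flow arrive, it sends a separate congestion notification packet (CNP) to the sender, at most once per flow every 50 µs. Under that reading, whichever side receives the marked data (for a READ, the requester receiving the read responses) would act as the NP. The sketch below is a hypothetical illustration of that per-flow CNP rate limit; `CnpGenerator` and its members are not taken from ns3-rdma.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical DCQCN notification-point (NP) sketch: decides whether an
// arriving data packet should trigger a CNP back to the flow's sender.
// Not taken from ns3-rdma; the 50 us interval follows the DCQCN paper.
class CnpGenerator {
 public:
  explicit CnpGenerator(uint64_t intervalNs = 50000) : intervalNs_(intervalNs) {
    assert(intervalNs_ > 0);
  }

  // Returns true if a CNP should be sent for this packet.
  bool OnPacket(uint32_t flowId, bool ecnMarked, uint64_t nowNs) {
    if (!ecnMarked) return false;  // only CE-marked packets generate feedback
    auto it = lastCnpNs_.find(flowId);
    if (it != lastCnpNs_.end() && nowNs - it->second < intervalNs_) {
      return false;  // a CNP was sent for this flow less than one interval ago
    }
    lastCnpNs_[flowId] = nowNs;  // assumes nowNs never goes backwards
    return true;
  }

 private:
  uint64_t intervalNs_;
  std::unordered_map<uint32_t, uint64_t> lastCnpNs_;  // flow id -> time of last CNP
};
```

Because the CNP is its own packet, this mechanism does not depend on whether the transport generates ACKs for the data being marked.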

Runtime error

Hello, sir!
I have a question for you. I opened config.txt, but the run still failed. Could you suggest what to modify?
I hope to hear from you. Thanks!

Questions about some WARNING and ERROR messages

Hi, Yibo,

I have ported your Windows version to Linux, based on ns-3.18, but I am not quite sure whether I have done everything right.

Anyway, the build completes successfully.

However, I get some ERROR and WARNING messages when I run the default application (third.cc with mix/config.txt):

ERROR: Sendingbuffer miss!
WARNING: shouldn't reach here -- socket.h

So, is the simulation still correct? What do these messages indicate?

How to read and analyse the output trace file?

I ran the example configuration file and got the following output in mix.tr:
2.000002 /1 1.2>1.1 u 29348 0 3
...
What does each number mean? Where did you define them?

Thank you.

No output in the cmd

Hello,
I have successfully run the simulator on my Windows machine, and the default program generated the trace file.
However, when I run the hello-simulator or first project, I do not get any output in cmd.
I tried adding a std::clog statement to the project, and it worked fine.
Is NS_LOG_UNCOND deactivated in this version, or is something else wrong?
Thanks.

Run time error "cannot add the same kind of tag twice"

Hi Yibo,

Have you ever encountered the error "cannot add the same kind of tag twice"?
I get this error whenever a CNP passes through two switch nodes.

This error is thrown at src/point-to-point/model/qbb-net-device.cc, lines 459-461:

if (ipv4h.GetProtocol() != 0xFE) //not PFC
{
        packet->AddPacketTag(FlowIdTag(m_ifIndex));
......

It seems that when a CNP (ipv4h.GetProtocol() == 0xFF) arrives at a switch node, the packet tag is added.

But the tag is not removed when the packet leaves the switch through this branch (src/point-to-point/model/qbb-net-device.cc, line 349):

if (m_queue->GetLastQueue() == qCnt - 1)//this is a pause or cnp, send it immediately!

Tracing it, I found that the error occurs when the CNP arrives at the next switch node.

I then added a snippet that removes the tag inside that 'if' block (src/point-to-point/model/qbb-net-device.cc, line 349):

if (m_queue->GetLastQueue() == qCnt - 1)//this is a pause or cnp, send it immediately!
  {
+      if (h.GetProtocol() != 0xFE) //not PFC , here h refers to ipv4header
+        {
+            p->RemovePacketTag(t);
+        }
       TransmitStart(p);
  }

The error does not occur now.

I wonder whether this is a bug or whether I have missed something in the configuration.
I hit this in an incast test scenario.

Thanks,
Ge

Some questions about the implementation of TIMELY

Hi,
I have studied the code in the timely branch, but I am confused.
Where is the TIMELY algorithm implemented?
In the main function, can I simulate TIMELY without using qbb-device?
Thank you for your reply.

Some questions about how modules work in the simulator

Hi.
I have run main.exe successfully, and now I want to implement an algorithm on the simulator.
Could you explain how the simulator works, for example the relationship between qbb-device and broadcom-node, and the role of qbb-device in the simulation?
I would also like to know where, and by which component, the decision to send a PFC packet is made.
Thank you for your reply.

How to run with visual studio 2015?

Hello, I'm very impressed with the work you've done. However, I still have some questions about how to use this project in Visual Studio 2015. Could you write a tutorial showing how to use it?

Other versions of ns-3

Hello, is there a way to simulate DCQCN on other versions of ns-3, such as ns-3.30, ns-3.35, or ns-3.36?

Missing build file

Hello, I have a question about the link to the build file. The link does not work right now; could you update it? It would really help me.

Build errors with Visual Studio 2015 Community

Hi,

I am using Visual Studio 2015 Community and tried to build the solution, but got errors like these.

There are three kinds of errors: C1083, MSB3073, and LNK2001.
I searched for solutions, but none of them helped.
Can you help? Or did I do something wrong?
The C1083 error is also confusing, because I didn't change any of the code.

Thx. Nice day :)

What does the hops in qbb-net-device mean?

Hi Yibo,

I am reading the code of qbb-net-device. I see many per-flow variables also have hop indexes (e.g., m_alpha[fCnt][maxHop], m_targetRate[fCnt][maxHop]). What does the hop mean here? I do not see this concept in the DCQCN paper.

Thanks,
Yuliang

Questions about TIMELY

Sorry to bother you, but I really need some help. I built the solution in Timely, but main.exe did not appear in Release; instead, it was placed in a new directory, "Debug". I ran it using config.txt, but it did not generate a trace file. I don't know whether this is normal or something went wrong.
The output shows that it only generated some files in Debug.


Flow-level ECMP may not work properly

Hi,
I tried to use flow-level ECMP in a fat-tree topology, but the receiver's total throughput fell short of the maximum. I believe you use ns-3's original flow-level ECMP code, so I ran the same topology with TCP in ns-3 and visualized it. Surprisingly, only some of the core switches were used, so I suspect that flow-level ECMP may not work properly and is the main cause of the low throughput. Have you noticed this, or am I using it wrongly?

Thanks.
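Flow-level ECMP generally hashes header fields that stay constant for a flow (here, the 5-tuple) and takes the result modulo the number of equal-cost next hops, so every packet of a flow stays on one path. The sketch below illustrates that idea with an FNV-1a hash; `FiveTuple`, `EcmpSelect`, and the choice of hash are assumptions and are not the code used by ns-3 or ns3-rdma.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 5-tuple identifying one flow.
struct FiveTuple {
  uint32_t srcIp, dstIp;
  uint16_t srcPort, dstPort;
  uint8_t protocol;
};

// Feed the low `bytes` bytes of `value` into an FNV-1a hash state.
static uint32_t Fnv1a(uint32_t h, uint64_t value, int bytes) {
  for (int i = 0; i < bytes; ++i) {
    h ^= static_cast<uint8_t>(value >> (8 * i));
    h *= 16777619u;  // FNV prime (32-bit)
  }
  return h;
}

// Flow-level ECMP: every packet with the same 5-tuple maps to the same path.
std::size_t EcmpSelect(const FiveTuple& t, std::size_t numPaths) {
  assert(numPaths > 0);
  uint32_t h = 2166136261u;  // FNV offset basis (32-bit)
  h = Fnv1a(h, t.srcIp, 4);
  h = Fnv1a(h, t.dstIp, 4);
  h = Fnv1a(h, t.srcPort, 2);
  h = Fnv1a(h, t.dstPort, 2);
  h = Fnv1a(h, t.protocol, 1);
  return h % numPaths;
}
```

Because each flow is pinned to one path, a small number of flows can collide on the same uplinks while others stay idle, so uneven core-switch usage can appear even when hashing behaves as designed; whether that explains the throughput observed above is not established here.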

How to implement the RSVP protocol?

Recently, I have been studying the emulation of various commonly used protocols. Can ns-3 emulate the RSVP protocol? I have not found anyone online who has implemented RSVP, so I am skeptical about its feasibility.

Some questions about DCQCN and TIMELY

Hello, I have read your paper analyzing DCQCN and TIMELY. TIMELY has a control engine that inserts delays between segments to achieve the target rate. How does DCQCN achieve its target rate in a real NIC or in ns-3? Thank you.
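For reference, the DCQCN paper describes the reaction point (RP) keeping a current rate R_C, at which the NIC paces the flow, and a target rate R_T. On each CNP, R_T is set to R_C, R_C is multiplied by (1 - α/2), and α moves toward 1 with gain g; when a period passes without a CNP, α decays. The sketch below encodes only these decrease rules; the struct, member names, and the g = 1/256 default are assumptions and may differ from ns3-rdma's qbb-net-device.cc.

```cpp
#include <cassert>

// Hypothetical DCQCN reaction-point (RP) state for one flow.
struct DcqcnRp {
  double rateCurrent;     // R_C: rate the NIC paces this flow at (bps)
  double rateTarget;      // R_T: rate to recover toward (bps)
  double alpha = 1.0;     // congestion estimate in [0, 1]
  double g = 1.0 / 256;   // gain for the alpha update

  explicit DcqcnRp(double lineRate) : rateCurrent(lineRate), rateTarget(lineRate) {
    assert(lineRate > 0);
  }

  // A CNP arrived: remember the old rate, cut the current rate by alpha/2,
  // and move alpha toward 1.
  void OnCnp() {
    rateTarget = rateCurrent;
    rateCurrent *= (1 - alpha / 2);
    alpha = (1 - g) * alpha + g;
  }

  // The alpha timer expired with no CNP in the last period: decay alpha.
  void OnAlphaTimer() { alpha = (1 - g) * alpha; }
};
```

Starting at line rate with α = 1, the first CNP halves R_C; as α decays during quiet periods, later cuts become gentler.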

A small bug in the ReceiverCheckSeq function

Hi, Yibo,
I think I found a small bug in the ReceiverCheckSeq function in qbb-net-device.cc: it does nothing when seq < expected, i.e., when the NIC receives a duplicate data packet. Consider the case where ack(n=4000) is lost and the sender never receives it. The sender waits for a timeout and then retransmits, but the receiver does nothing when it receives the duplicate data packets, so the sender never receives ack(n=4000). I hit exactly this when I set a deterministic loss rate of 0.01 (the switch drops 1 of every 100 packets it forwards) and ack(n=16000) was lost, which livelocked the network. I think the function should still check seq even when seq < expected, and when (seq+1) % m_chunk == 0, the receiver should send a "duplicate" ACK back to the sender.
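The fix proposed above can be sketched as a go-back-N receiver check that re-acknowledges duplicates at chunk boundaries. This is a hypothetical stand-alone function (`ReceiverCheckSeqSketch`, `RxAction`), not the actual ReceiverCheckSeq in qbb-net-device.cc; it assumes sequence numbers count packets and that an ACK is due whenever (seq + 1) % chunk == 0, as in the report.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical receiver actions; the names are illustrative only.
enum class RxAction { kNone, kAck, kNack, kDuplicateAck };

// Go-back-N receiver check with the proposed fix: a duplicate packet that
// ends a chunk is acknowledged again, so a lost ACK is regenerated when the
// sender retransmits after its timeout.
RxAction ReceiverCheckSeqSketch(uint32_t seq, uint32_t& expected, uint32_t chunk) {
  assert(chunk > 0);
  if (seq == expected) {
    ++expected;  // in-order packet: advance the expected sequence number
    return (seq + 1) % chunk == 0 ? RxAction::kAck : RxAction::kNone;
  }
  if (seq > expected) {
    return RxAction::kNack;  // gap detected: ask for retransmission from `expected`
  }
  // seq < expected: duplicate data. Instead of ignoring it, re-ACK at chunk ends.
  return (seq + 1) % chunk == 0 ? RxAction::kDuplicateAck : RxAction::kNone;
}
```

With this change, if the ACK for a chunk is lost, the retransmitted last packet of that chunk triggers a fresh ACK, which removes the livelock described above.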

Some questions about how RDMA is realized on the server

Is DCQCN over RoCEv2 the only way RDMA is realized here? I compared the code with ns-3's basic UDP code, but I cannot find any code for kernel bypass. Have you implemented it?
Thanks for your reply.

Can I turn off buffer sharing among ports on a switch?

Hi Yibo,
From your replies in the issue "Some questions about how modules work in simulator", I learned that all the ports on a switch share the same queue buffer: "A node can have multiple qbb-net-device (especially on a switch), which share the same m_broadcom and m_queue."
Can I turn off buffer sharing and give each port its own fixed-size buffer? That way I could control the buffer resources allocated to each switch port.

Thanks~
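One way to compare the two policies is an admission check that either lets a port take any free space in a shared pool or caps each port at a fixed, equal share. `SwitchBuffer` is a hypothetical model, not ns3-rdma's broadcom-node; real shared-buffer switches often use dynamic per-port thresholds rather than this unrestricted pool.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical switch buffer with two admission modes:
// shared pool vs. static equal partition per port.
class SwitchBuffer {
 public:
  SwitchBuffer(uint32_t numPorts, uint64_t totalBytes, bool shared)
      : used_(numPorts, 0), total_(totalBytes), shared_(shared) {
    assert(numPorts > 0);
  }

  // Try to admit a packet destined to `port`; returns false if it must be dropped.
  bool Admit(uint32_t port, uint64_t bytes) {
    if (shared_) {
      if (TotalUsed() + bytes > total_) return false;  // pool exhausted
    } else if (used_[port] + bytes > total_ / used_.size()) {
      return false;  // this port's fixed share is exhausted
    }
    used_[port] += bytes;
    return true;
  }

  // Free space when a packet leaves `port`.
  void Release(uint32_t port, uint64_t bytes) { used_[port] -= bytes; }

 private:
  uint64_t TotalUsed() const {
    uint64_t sum = 0;
    for (uint64_t u : used_) sum += u;
    return sum;
  }

  std::vector<uint64_t> used_;  // bytes currently held per port
  uint64_t total_;              // total buffer size in bytes
  bool shared_;
};
```

In the shared mode one congested port can absorb most of the pool; in the partitioned mode it is capped at its share even while other ports sit empty.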

How many times do the queues drop packets and mark ECN?

I found that network/utils/broadcom-egress-queue.cc::BEgressQueue has a queue limit, and network/model/broadcom-node.cc also has a buffer limit.
1. What is the difference?
2. If I change the queue's MaxBytes in network/utils/broadcom-egress-queue.cc, what else should I change?
3. I need to count drops and ECN marks; where should I add the counters?
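For question 3, a common pattern is to increment a counter exactly where the drop or mark decision is made. The sketch below shows that on a simplified egress queue; `EgressQueueSketch`, `QueueStats`, and the single ECN threshold are assumptions (DCQCN deployments typically mark probabilistically between two thresholds) and are not BEgressQueue's actual code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-queue statistics.
struct QueueStats {
  uint64_t drops = 0;
  uint64_t ecnMarks = 0;
};

// Simplified egress queue that counts drops and ECN marks at the point
// where each decision is taken.
class EgressQueueSketch {
 public:
  EgressQueueSketch(uint64_t maxBytes, uint64_t ecnThresholdBytes)
      : maxBytes_(maxBytes), ecnThreshold_(ecnThresholdBytes) {
    assert(ecnThresholdBytes <= maxBytes);
  }

  // Returns false if the packet is dropped; sets `marked` if it gets CE.
  bool Enqueue(uint64_t bytes, bool& marked) {
    marked = false;
    if (queued_ + bytes > maxBytes_) {
      ++stats_.drops;  // count here, where the drop decision is made
      return false;
    }
    if (queued_ > ecnThreshold_) {
      marked = true;
      ++stats_.ecnMarks;  // count here, where the mark decision is made
    }
    queued_ += bytes;
    return true;
  }

  void Dequeue(uint64_t bytes) { queued_ -= bytes; }
  const QueueStats& Stats() const { return stats_; }

 private:
  uint64_t maxBytes_, ecnThreshold_, queued_ = 0;
  QueueStats stats_;
};
```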

Hi, I tried to run it, but something seems wrong

Hi friend.
I used VS 2015 to build the project, and it built successfully.
But when I run the simulation with the command in the README, it takes a very long time: it has been running for 10 hours and is still running. Is this normal? Also, I can't see the trace file in mix.


Hi, I ran into some problems building the solution in VS2012

Hi,
I opened the file ns-3-dev.sh in VS2012, but when I build the solution I get this error, many times over:

c:\users\ns3-rdma-master\windows\ns-3-dev\headers\ns3\nstime.h(145): error C3861: 'lround': identifier not found
Could you suggest a solution? That would be great!
Thanks

M1 MacBook installation

What is the best way to build this on M1 Macs? Is a Windows VM or Visual Studio 2015 necessary? Would it be easier to modify the waf build and adapt it for M1 Macs?

Some questions about the output

Can you tell me where the content of mix.tr comes from? What does qFb mean in the attached figure? Thank you.

How to visualize the simulation?

Hi,
On Ubuntu, PyViz and NetAnim can be used to visualize simulations. Does ns3-rdma support any tool for visualizing simulations? I tried generating an .xml file from the simulation and opening it in Ubuntu, but NetAnim reported "This XML format is not supported. Minimum Version:3.106" (I used NetAnim 3.107 and ns-3.26). Do you have any suggestions for visualization?
Thank you in advance!

Can you tell me what the output is?

Hello, friend. I ran your code and got the output file mix.tr, but I don't know what the output means. Could you tell me what each field means? Thank you very much.

How to adjust the sending rate?

Could you tell me how to set a sender's sending rate to a fixed value? Where should I change the code?
Thank you in advance.
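Holding a sender at a fixed rate amounts to pacing: after each packet, the sender waits one serialization time at the chosen rate before the next one. The hypothetical helpers below (`InterPacketGapNs`, `NextSendTimeNs`) show only that arithmetic, in integer nanoseconds; which part of ns3-rdma would need to change to apply it is not covered here.

```cpp
#include <cassert>
#include <cstdint>

// Gap between packet starts needed to send at a fixed rate:
// the packet's serialization time at that rate, in nanoseconds.
uint64_t InterPacketGapNs(uint64_t packetBytes, uint64_t rateBps) {
  assert(rateBps > 0);
  // bits * 1e9 / (bits per second) = ns; multiply before dividing to keep precision.
  return packetBytes * 8 * 1000000000ULL / rateBps;
}

// Earliest time the sender may start its next packet.
uint64_t NextSendTimeNs(uint64_t lastSendNs, uint64_t packetBytes, uint64_t rateBps) {
  return lastSendNs + InterPacketGapNs(packetBytes, rateBps);
}
```

For example, 1000-byte packets at 40 Gbps need a 200 ns gap (8,000 bits / 40 Gbps), and 1500-byte packets at 10 Gbps need 1200 ns.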

A question about simulation (fluid model)

Hi, Yibo,
I am trying to use ns-3 to verify the fluid model you proposed in 'Congestion Control for Large-Scale RDMA Deployments', but I get some strange behavior.
For example, when I change the parameter BYTE_COUNTER to 10MB, the value from your paper, the host's rate does not converge and the queue length at the bottleneck varies a lot. I also found some parameters that I don't understand:
CLAMP_TARGET_RATE
CLAMP_TARGET_RATE_AFTER_TIMER
If I set both to 0, the host's rate converges, but both it and the bottleneck queue length still oscillate a lot.
I attached a figure of the performance (2 flows).


Thank you!

Output interpretation

Hello,

I was wondering what a single entry in mix.tr signifies.

2.000002 /1 1.2>1.1 u 25671 0 6

2.000002 is the timestamp
1.2 is the source
1.1 is the destination
u is for UDP
25671: I am not sure what it stands for.
0 is the packet number
6 is the priority

Please correct me if I am wrong. Also, if possible, could anyone let me know what 25671 signifies?
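To post-process mix.tr under the field layout guessed above, a small parser can split each line into its seven whitespace-separated fields. This sketch assumes that layout, keeps the uncertain fifth field as `unknown`, and does not establish what that field or the "/1" field means; `TraceRecord` and `ParseTraceLine` are hypothetical names.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// One mix.tr line, split according to the field positions guessed above.
struct TraceRecord {
  double time = 0;        // timestamp, e.g. 2.000002
  std::string node;       // e.g. "/1" (meaning not confirmed)
  std::string src, dst;   // e.g. "1.2" and "1.1"
  char proto = '?';       // 'u' for UDP, per the guess above
  long unknown = 0;       // e.g. 25671 (meaning not confirmed)
  long packetNumber = 0;  // per the guess above
  int priority = 0;       // per the guess above
};

// Returns false if the line does not have the expected seven fields.
bool ParseTraceLine(const std::string& line, TraceRecord& r) {
  std::istringstream in(line);
  std::string flow;  // "src>dst"
  if (!(in >> r.time >> r.node >> flow >> r.proto >> r.unknown >> r.packetNumber >> r.priority)) {
    return false;
  }
  std::string::size_type arrow = flow.find('>');
  if (arrow == std::string::npos) return false;
  r.src = flow.substr(0, arrow);
  r.dst = flow.substr(arrow + 1);
  return true;
}
```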

A specific derivation of R_AI

Hello, I would like to ask about the rate increment R_AI used in the additive increase stage of the DCQCN algorithm. Is there a specific derivation for R_AI, and what factors determine this increment? Is there a closed-form expression? It doesn't seem to be mentioned in the paper.
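The DCQCN paper's rate-increase rules at least show where R_AI enters, even though they do not say how its value was chosen. In the sketch below, the first F = 5 timer or byte-counter events perform fast recovery (R_C moves halfway to R_T), and later events first raise R_T by R_AI (additive increase). Stage selection is simplified, hyper increase is omitted, and the struct and names are assumptions rather than ns3-rdma's implementation.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical sketch of DCQCN rate increase at the reaction point (RP).
// The counters would be reset to 0 when a CNP cuts the rate (not shown).
struct DcqcnIncrease {
  double rateCurrent;          // R_C (bps)
  double rateTarget;           // R_T (bps)
  double lineRate;             // upper bound for R_T (bps)
  double rai;                  // R_AI: additive-increase step (bps)
  int fastRecoverySteps = 5;   // F in the DCQCN paper
  int timerCount = 0;
  int byteCount = 0;

  DcqcnIncrease(double current, double target, double line, double raiBps)
      : rateCurrent(current), rateTarget(target), lineRate(line), rai(raiBps) {
    assert(current <= target && target <= line);
  }

  void OnTimerExpired() { ++timerCount; Increase(); }       // rate-increase timer fired
  void OnByteCounterExpired() { ++byteCount; Increase(); }  // BYTE_COUNTER bytes were sent

 private:
  void Increase() {
    if (std::max(timerCount, byteCount) >= fastRecoverySteps) {
      // Additive increase: raise the target by R_AI (hyper increase omitted).
      rateTarget = std::min(rateTarget + rai, lineRate);
    }
    // Fast recovery and additive increase both move R_C halfway toward R_T.
    rateCurrent = (rateTarget + rateCurrent) / 2;
  }
};
```

In this structure R_AI sets how fast R_T, and therefore R_C, climbs once congestion clears: a larger step reclaims bandwidth sooner but can rebuild the queue faster.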

PAUSE never triggered

Hi,
I was running a simulation with an N:1 incast topology (N is large), and the buffer quickly fills up, leading to egress drops without triggering PAUSE.
So I went back to the code and found that PAUSE generation is checked (checkqueuefull()) inside the send() function rather than receive(); it is also checked after the ingress and egress admission checks, which are performed at the same time.
I thought that receive() should first check PAUSE generation and ingress admission, and then send() should check egress admission.
So I am confused, since the simulation keeps reporting that the buffer is full without ever generating a PAUSE.
Thank you.
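The ordering suggested above, where ingress accounting and the PAUSE decision happen on packet arrival, can be sketched with XOFF/XON hysteresis on a per-ingress-queue byte count. In PFC, the resulting PAUSE frame goes back over the link the packet arrived on, to the upstream neighbor for that priority. The class and thresholds below are hypothetical; in practice XOFF has to leave enough headroom below the drop point to absorb packets already in flight on the link.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical PFC state for one (ingress port, priority) pair,
// updated on packet arrival and departure.
class PfcIngressSketch {
 public:
  PfcIngressSketch(uint64_t xoffBytes, uint64_t xonBytes)
      : xoff_(xoffBytes), xon_(xonBytes) {
    assert(xonBytes < xoffBytes);  // hysteresis avoids PAUSE/RESUME flapping
  }

  // Account an arriving packet; returns true if a PAUSE (XOFF) should be
  // sent upstream on this ingress link now.
  bool OnArrive(uint64_t bytes) {
    ingressBytes_ += bytes;
    if (!paused_ && ingressBytes_ >= xoff_) {
      paused_ = true;
      return true;
    }
    return false;
  }

  // Account a departing packet; returns true if a RESUME (XON) should be sent.
  bool OnDepart(uint64_t bytes) {
    ingressBytes_ -= bytes;
    if (paused_ && ingressBytes_ <= xon_) {
      paused_ = false;
      return true;
    }
    return false;
  }

 private:
  uint64_t xoff_, xon_, ingressBytes_ = 0;
  bool paused_ = false;
};
```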

Some questions about the TCP implementation

Hello, Yibo.
I have tested the TCP flow in the project, but the sequence number in mix.tr is always 42. Does this mean that the receiver cannot receive the packets and sends back a NACK?
Also, is the code that generates NACKs and ACKs in qbb-net-device used for UDP only?
I look forward to your reply.
