soheil-ab / orca
Orca: Towards Mastering Congestion Control In the Internet
License: MIT License
Hi,
I am trying to reproduce the training curve of Orca (the score curve). I simplified training to the one-actor case and found that the reward (score) can reach 60 in the early stages but then drops. Usually the reward ends up fluctuating between ~38 and ~3.
Did you encounter this before? Do you have any insights why this happens?
Thank you
Hello, I want to integrate Orca into Pantheon, but I have some questions.
I commented out the mahimahi and client parts in orca-server-mahimahi.cc (lines 152-155, 243) to use it as the sender, and used the client as before.
However, when I run them in Pantheon, I get confusing results:
pantheon_report.pdf
It seems Orca can't run correctly inside Pantheon, even though it runs successfully outside of it.
I noticed you mentioned in earlier issues that you planned to integrate Orca into Pantheon.
Is there an easier way to integrate it, and where did my approach go wrong?
Thanks for your help!
Hello, thank you very much for your contribution to the community. I would like to ask a question: what is the purpose of setting target_ratio = 1.1 * orca_info.cwnd on each pass through the slow-start stage?
You do not seem to mention this in the paper.
Thank you very much for your answer.
Here is the corresponding code snippet,
if (!slow_start_passed)
{
    //got_no_zero=1;
    tcp_info_pre = orca_info;
    t0 = timestamp();
    target_ratio = 1.1 * orca_info.cwnd;
    ret1 = setsockopt(sock_for_cnt[i], IPPROTO_TCP, TCP_CWND,
                      &target_ratio, sizeof(target_ratio));
    if (ret1 < 0)
    {
        DBGPRINT(0, 0, "setsockopt: for index:%d flow_index:%d ... %s (ret1:%d)\n",
                 i, flow_index, strerror(errno), ret1);
        return ((void *)0);
    }
}
Hello @Soheil-ab ,
Thank you for your work!
I'm currently trying to evaluate Orca under different network conditions, but I am unsure where the code loads pre-trained models from.
I see the models being saved to ./train_dir/learner0/model*.ckpt - and I see a load_model and save_model in agent.py - but these functions aren't used anywhere. Where would be the right place to call load_model?
Also, replay_memory is initialized but never used - is this on purpose? And how do I continue training from a previous model using this option?
TL;DR: Could you please explain how to go about loading different models to evaluate Orca's performance?
Thank you!
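One plausible approach, sketched under assumptions: checkpoints named model.ckpt-&lt;step&gt; encode the global step in the filename, so the newest one can be located by parsing the names and then handed to load_model before evaluation. The directory layout is taken from the question above; the idea that load_model accepts such a prefix is an assumption, not Orca's confirmed API.

```python
import os
import re

def latest_checkpoint_prefix(ckpt_dir):
    """Return the checkpoint prefix with the highest global step,
    e.g. 'model.ckpt-1283529' for a file named model.ckpt-1283529.index."""
    candidates = []
    for name in os.listdir(ckpt_dir):
        m = re.match(r"(model\.ckpt-(\d+))\.index$", name)
        if m:
            candidates.append((int(m.group(2)), m.group(1)))
    if not candidates:
        return None
    return os.path.join(ckpt_dir, max(candidates)[1])
```

One would then call something like agent.load_model(latest_checkpoint_prefix("./train_dir/learner0")) (hypothetical call) so evaluation starts from trained weights rather than a fresh initialization.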
Hi @Soheil-ab ,
I want to evaluate Orca's performance by tuning the parameters of the reward function, so I need to re-train the DRL model. I followed the instructions and ran the "./orca.sh 4 44444" and "./orca.sh 1 44444" commands to do that. However, I didn't find any new checkpoints in /models, and there is no new content in rl-module/log/sum-*.
Could you please tell me whether there is anything wrong with my procedure? Also, how can I verify that the training process is running smoothly?
In Figure 10 of the Orca paper, I see that Orca's overhead is significantly lower than Aurora's.
However, running Orca means running Cubic and the RL model at the same time, while Aurora only runs the RL model to adjust the CWND. How does Orca achieve this low overhead? With a smaller model architecture, or a longer MTP? I couldn't get a clue from the paper.
Hi, @Soheil-ab .
Thanks for sharing the work. Lately I've been reading the paper and the source code, but I have not found which parameter controls the Monitoring Time Period. Could you point me to the part of the code that controls the MTP?
Thanks!
Hello, I am trying to use your algorithm, but I ran into difficulties when patching Orca's kernel following option 1.
I use Ubuntu 14.04 on Tencent Cloud and my Linux kernel is 3.13.0-128-generic.
my /etc/default/grub is
# If you change this file, run 'update-grub' afterwards to update
# /boot/grub/grub.cfg.
# For full documentation of the options in this file, see:
# info -f grub -n 'Simple configuration'
GRUB_DEFAULT=0
#GRUB_HIDDEN_TIMEOUT=0
GRUB_HIDDEN_TIMEOUT_QUIET=true
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
#GRUB_CMDLINE_LINUX_DEFAULT=""
GRUB_CMDLINE_LINUX_DEFAULT="crashkernel=1800M-4G:128M,4G-:168M panic=5"
GRUB_CMDLINE_LINUX="console=ttyS0,9600n8 console=tty0"
GRUB_SERIAL_COMMAND="serial --speed=9600 --unit=0 --word=8 --parity=no --stop=1"
# Uncomment to enable BadRAM filtering, modify to suit your needs
# This works with Linux (no patch required) and with any kernel that obtains
# the memory map information from GRUB (GNU Mach, kernel of FreeBSD ...)
#GRUB_BADRAM="0x01234567,0xfefefefe,0x89abcdef,0xefefefef"
# Uncomment to disable graphical terminal (grub-pc only)
#GRUB_TERMINAL=console
GRUB_TERMINAL="console serial"
# The resolution used on graphical terminal
# note that you can use only modes which your graphic card supports via VBE
# you can see them in real GRUB with the command `vbeinfo'
#GRUB_GFXMODE=640x480
# Uncomment if you don't want GRUB to pass "root=UUID=xxx" parameter to Linux
#GRUB_DISABLE_LINUX_UUID=true
# Uncomment to disable generation of recovery mode menu entries
#GRUB_DISABLE_RECOVERY="true"
# Uncomment to get a beep at grub start
#GRUB_INIT_TUNE="480 440 1"
GRUB_RECORDFAIL_TIMEOUT=5
GRUB_GFXPAYLOAD_LINUX=text
I can successfully perform the first two steps, but after I reboot I get an error.
It has really troubled me for a long time; could you give me some suggestions?
Thanks for your help!
Hey , @Soheil-ab
I observed the following phenomena when using the model:
1. In the bandwidth-drop test (24 Mbps to 12 Mbps), when the RTT is large (RTT > 100 ms), the throughput oscillates after the bandwidth decreases.
2. In some experiments (bw = 5 Mbps, 10 Mbps, 12 Mbps), the throughput shows a 'tail-up' phenomenon.
3. When continuing training from the original model (training trace: bandwidth 24 Mbps -> 12 Mbps), the actor loss and critic loss kept rising, and the results on the test trace (bandwidth 24 Mbps -> 12 Mbps) after training were worse than with the original model.
Here are my concrete results
I would like to ask you if you encountered any of the problems I mentioned above during your previous use or testing? If so, how did you solve these problems?
Thank you very much for your answer.
Hello @Soheil-ab, sorry to bother you. I'm having some problems with shared memory. Following your program, I tried to use shared memory to communicate between C and Python, but my program crashes after a while; if I remove the shared memory, it does not crash. I have done a lot of experiments and modifications but still have not solved the problem or found the cause. Have you encountered this problem? Looking forward to your help, thank you very much!
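A common cause of crashes in C-to-Python shared-memory setups is one side detaching or unlinking the segment while the other is still using it, or unsynchronized reads racing writes. As a hedged illustration of the lifecycle both sides must respect (this is not Orca's actual mechanism, which uses its own C-side shared memory), Python's stdlib multiprocessing.shared_memory shows the create/attach/close/unlink pattern:

```python
from multiprocessing import shared_memory
import struct

# Writer side: create a segment and publish one double.
shm = shared_memory.SharedMemory(create=True, size=8)
struct.pack_into("d", shm.buf, 0, 12.5)   # e.g. a cwnd value

# Reader side: another process would attach by the same name.
peer = shared_memory.SharedMemory(name=shm.name)
value = struct.unpack_from("d", peer.buf, 0)[0]

# Every attacher must close(); exactly one side calls unlink(), and only
# after all users are done - unlinking early is a classic crash source.
peer.close()
shm.close()
shm.unlink()
```

In a real C/Python pair the same rule applies with shmget/shmat and shmctl(IPC_RMID): removing the segment while the peer still holds a pointer into it leads to exactly the intermittent crashes described above.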
@Soheil-ab Sorry to bother you, I'm a beginner. I am having some difficulties running the code and would like to ask for your help. I followed the step-by-step installation as described in the code. However, when running the sample test, a "no process found" error appears as shown in the figure below.
I have tried many ways to solve it, and my colleagues encounter the same problem. Although the question may be naive, I hope to get your help.
One more question: I read in your paper that you also implemented a purely DRL-based version. I wonder how to switch to that version. I see that the reward in the GYM_Env_Wrapper function in envwrapper.py is fixed to 10 - is this where I should make the switch? Looking forward to your help, thanks!
Hi,
from the server's code, it looks like the congestion window is updated in two instances:
In the second case, how does the update rule relate to the window increase presented in the paper, i.e. 2^(alpha) * cwnd?
Thanks for the clarification.
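As I read the paper, the rule 2^(alpha) * cwnd scales the underlying window multiplicatively, where alpha is the agent's action: alpha = 0 leaves Cubic's decision untouched, positive alpha grows the window, negative alpha shrinks it. A minimal sketch of that mapping (the clamping bounds here are my own assumption, not from the paper):

```python
def apply_action(cwnd, alpha, min_cwnd=2, max_cwnd=2**20):
    """Multiplicative update from the paper: new_cwnd = 2^alpha * cwnd.
    alpha = 0 keeps the underlying (Cubic) window unchanged."""
    new_cwnd = int(cwnd * (2.0 ** alpha))
    # Clamp to sane bounds (assumed here for the sketch).
    return max(min_cwnd, min(new_cwnd, max_cwnd))
```

The 1.1x slow-start update in the code above is a separate, fixed multiplicative rule that applies only before slow start has been passed; the 2^alpha rule is the learned update applied afterwards.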
Using the first method, when replacing the kernel, initializing the ramdisk keeps hanging; it has been stuck for several days with no result. What should I do?
In the paper it seems Orca used Pantheon for testing, so is there any support for integration with Pantheon, like a wrapper.py for Pantheon?
Thanks a lot !
Hi, I want to test Orca in my own transport protocol.
My system has its own transport function and an integrated Cubic; I only need the output of Orca's DRL module.
Could you tell me how to run the DRL module by itself, and the format/location of its input and output?
Thanks a lot :)
Hello, I have a question about getting the network data.
In define.h you define "#define TCP_ORCA_INFO 46".
Checking socket.h, 46 means SO_BUSY_POLL ("#define SO_BUSY_POLL 46").
I cannot understand how "getsockopt( sk, SOL_TCP, TCP_ORCA_INFO, (void *)info, (socklen_t *)&tcp_info_length );" gets the data into info.
Maybe the question is very silly; I am sorry for that :)
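Socket option numbers are only unique within a level: SO_BUSY_POLL is 46 at the SOL_SOCKET level, while Orca's patched kernel (as I understand it) adds its own option 46 at the SOL_TCP/IPPROTO_TCP level, so there is no clash. On a stock kernel the TCP-level call simply fails with an unknown-option error. A hedged sketch of the namespacing (the value 46 and the option name are taken from the question; the buffer size is a guess, and the call only succeeds on a patched kernel):

```python
import socket

TCP_ORCA_INFO = 46   # from Orca's define.h: a TCP-level option
# socket.SO_BUSY_POLL is also 46, but it lives at the SOL_SOCKET level.

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    # Equivalent of getsockopt(sk, SOL_TCP, TCP_ORCA_INFO, ...):
    info = s.getsockopt(socket.IPPROTO_TCP, TCP_ORCA_INFO, 512)
    print("patched kernel: got %d bytes of orca_info" % len(info))
except OSError as e:
    # Expected on an unpatched kernel: option 46 is unknown at the
    # TCP level (typically ENOPROTOOPT).
    print("stock kernel:", e)
finally:
    s.close()
```

So the patched kernel's TCP code, not the generic socket layer, interprets option 46 when the level is SOL_TCP and fills info with Orca's extended statistics.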
From the file model.ckpt-1283529.index, I notice that you trained the model for more than 1,000,000 steps. Could you give me some information on how long it took to train this model?
I want to run Orca server and Orca client on two separate machines which are connected by the third one (router) and emulate different network environments on the router by linux tc and netem. How should I run Orca without mahimahi?
Hey , @Soheil-ab
I found some of the following phenomena when using the model,
1. Orca is far less competitive than Cubic.
2. There is also a gap between Orca's competitiveness and Cubic's.
Here are my concrete results
I would like to ask you if you encountered any of the problems I mentioned above during your previous use or testing? If so, how did you solve these problems?
Thank you very much for your answer.
Hi @Soheil-ab,
I had a few questions regarding training Orca.
The paper mentions you use 256 actors interacting with different environments - can you please give some insight on:
Thank you for taking the time to read and answer these questions!
Hi, I'm trying to run Orca in remote mode without Mahimahi.
I have a dumb question here: it seems that the client (compiled from client.c) is a simple receiving/acknowledging tool. Can it run on a Linux kernel without the Orca patch?
Thanks!
@Soheil-ab
Hi, Soheil.
I set the following network topology:
client ---------- router------------- server
and set the bottleneck bandwidth with 128Mbps, RTT with 150ms, and queue length with 1BDP at the router using Linux tc and netem.
I found that Orca's maximum throughput is only ~60 Mbps, while pure Cubic can reach 100+ Mbps.
Do I need to configure Orca specifically for this environment? (I previously asked you how to run Orca without mahimahi.)
Sorry to interrupt. I want to test the throughput of the No-model and the Clean-Slate model in the same network environment as the sample code. How should I modify the sample code to do this? Looking forward to your answer, thanks!
I would like to ask a question: while trying to reproduce Figure 9 in your paper, I found that the maximum value of cwnd reaches 2147483648. Could it be that target_ratio = 1.1 * orca_info.cwnd causes cwnd to increase too aggressively?
Here are my Orca vs. Cubic results for 24 Mbps -> 12 Mbps.
Thank you very much for your answer.
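For context on how fast that can blow up: applying target_ratio = 1.1 * cwnd once per control interval is geometric growth, so even a modest starting window crosses 2^31 (2147483648, the value observed above) within a few hundred intervals if nothing ends the slow-start phase. A quick back-of-the-envelope check (the starting window of 10 is an arbitrary assumption):

```python
def intervals_to_reach(start_cwnd, factor, limit):
    """Count how many multiplicative updates it takes to reach `limit`."""
    cwnd, n = float(start_cwnd), 0
    while cwnd < limit:
        cwnd *= factor
        n += 1
    return n

# Starting from cwnd = 10 with a 1.1x update each interval:
n = intervals_to_reach(10, 1.1, 2**31)
```

This lands at roughly 200 intervals, which supports the suspicion that unbounded 1.1x growth, rather than the learned 2^alpha updates, could explain a cwnd of exactly 2^31.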
Dear Soheil,
Have you ever tested Orca in a 5G wireless scenario, where bandwidth changes quickly and unpredictably, and how does Orca perform there? My second question is whether we can train Orca in a new, unseen environment to enhance performance in that scenario while still performing inference to guide the classic CC. In other words, I want to train Orca in real time between two MTPs (Monitoring Time Periods) and run inference at each MTP. What do you think of this proposal?
Many thanks!
Best regards,
Eric