Comments (17)
I experimented with a few more threshold values. The setup is:

- `[a-d]*` instances (enough instances to be representative of benchmark results)
- 10 seeds
- greedy, rollout, and rollout long (better simulation solutions). Note that the number of simulations in rollout long is roughly equal to the number of simulations in rollout.
The thresholds used for rollout are 15% (original), 20%, 30% and 40%. There is one slightly confusing thing due to a mistake of mine:

- 15% and 20% were run with 120s epoch time limits
- 30% and 40% were run with 60s epoch time limits

AFAICT the time limit does not play a huge role: the only significant difference is how many simulations you get, and 60s per epoch is often already enough.
Here is a summarized table of results:
The best scenario, `rollout_long30`, now improves over greedy by 20K! Insane. That means that there is definitely merit in trying to obtain better solutions.
Crazy stuff! I'm stoked 😄
from euro-neurips-2022.
> Sorry that's a bit confusing indeed. I extrapolate from a single "normal" rollout simulation how long that would take. Then I calculate how many simulations that would have resulted in. This is precisely the number of "long" simulations that we are allowed to do in rollout long.

> So do I understand correctly, that this means the "rollout long" is relevant as a test, but not possible within 60s per epoch?
That's correct. These experiments show that, if we can get "rollout long quality" solutions within 60s, then we can improve our dynamic performance even further. At the moment, this is not possible, but to be honest, this is not a very big challenge to achieve (see my comment above with 3 suggestions on how to improve solution quality). I also think that tuning HGS will work, but I have a very bad understanding of how to tune these parameters.
The table below shows the average results over 10 seeds for `[a-d]*-id` instances for greedy, rollout and 'rollout-long', with a 60 second epoch time limit. See this commit if you want to see the adjustments in detail.
Rollout-long is rollout where the simulation instances are solved using 500 iterations with HGS and slightly adjusted config. The number of simulations between rollout and rollout-long is similar so that the only difference is the quality of the solution.
There is not really an epoch time limit for rollout long. The only restriction is that the dispatch instance is solved in half the epoch time, just like rollout.
Having better quality simulation solutions gives a ~400 improvement. This is not a lot, given that the solutions from rollout long are quite a bit better than those from rollout. I don't have hard numbers on this, but these "long" solutions are on average 5% better for the first few epochs. In the later epochs this goes down to 0.1-0.5%, because the simulation instances become smaller.
@leonlan what am I looking at in the above results? I guess dynamic performance, but how did you increase the epoch time limit to allow for more iterations?
@N-Wouda I updated my earlier comment! There's some merit in improving the solution quality, but the current experiment shows that this is only very little. I'll try a few more experiments in the next days.
I ran a similar experiment to one mentioned earlier: 10 seeds, `[a-d]*` instances. But this time I used `N_LOOKAHEAD=8` and 120s epoch time limits.
Just like before, rollout long improves over rollout, but only by a little (about 500). The results are very similar to the previous experiment, even though we have 120s epoch time limits and more lookaheads. As already said before somewhere, more simulations generally don't lead to better improvements in rollout (i.e., we already have enough simulations). Moreover, more than 3 lookahead periods don't produce many more requests, and those extra requests are hard to combine (tight time windows and release dates).
In summary:

- Solving simulation instances much better only yields a small improvement in reward. In the numerical experiments I ran, this was about 400-500 points on average, using 4x more computation time. This seems to indicate that having "good enough" solutions is sufficient, even though they may be 10-15% off from the optimal solution. The general conclusion is: improve solving the simulation instance where possible, but don't overdo it. Here are some low-effort ideas to improve simulation solution quality:
  - Initialize solutions using the postponed routes from the previous epoch.
  - Improve release time-based checks in the static solver. Most things, like neighborhood computation, greedy repair, etc., do not take release times into account yet. This is an easy win.
  - Re-use the previous simulation as starting solution.
- Increasing `N_LOOKAHEAD` doesn't seem to help a lot, so this is a parameter tuning problem.
- Increasing the number of simulations doesn't seem to help a lot either. This means that we already simulate enough with the current configuration.

An important note is that all these observations have to be interpreted within the context of the threshold dispatching criterion. If we use some other dispatching criterion based on costs, this may result in a completely different story. See #101 for more on that.
I leave this issue open for a while to discuss these experiments. Also let me know if you have any suggestions for numerical experiments to try out.
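The first low-effort idea above (initializing from the previous epoch's postponed routes) could look roughly like this. This is a sketch under assumed data structures (routes as lists of request ids), not the repo's actual API; all names are illustrative:

```python
def warm_start_routes(postponed_routes, epoch_requests):
    """Build an initial solution for the new epoch from routes that were
    postponed last epoch. A route is kept only if all of its requests
    still exist in the current epoch; leftover requests are returned so
    the caller can insert them greedily."""
    available = set(epoch_requests)
    init_routes, used = [], set()
    for route in postponed_routes:
        if all(r in available for r in route):
            init_routes.append(list(route))
            used.update(route)
    unassigned = [r for r in epoch_requests if r not in used]
    return init_routes, unassigned
```

For example, with postponed routes `[[1, 2], [3, 9]]` and current requests `[1, 2, 3, 4]`, the first route survives intact while the second is dropped because request 9 no longer exists, leaving 3 and 4 to be inserted greedily.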
@leonlan do you have any idea/intuition for what happens when we change the threshold?
Edit: ah, that's issue #101 :-)
Re. tuning. It's easiest to tune if we pass the settings in via function arguments, rather than the constants file we have now. For this I suggest we refactor fairly substantially, removing `run_dispatch` and the other dynamic strategies (unless anyone wants to continue with those, but since there's less than a month left in the competition, I suspect not). Then we can have a single `rollout/` module implementing just rollout, taking its arguments directly from the solver script.

I'll see if I can write out a tuning script with SMAC next week, both for static and dynamic/rollout.
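Passing settings via function arguments makes the tuning objective trivial to define. A minimal sketch of what a tuner like SMAC would minimize; the `solve` entry point, its parameters, and the config dict are placeholders, not the repo's actual interface:

```python
def tuning_objective(solve, config, instances, seeds):
    """Average cost over instances and seeds for one candidate parameter
    configuration. `solve` is whatever entry point runs rollout with
    explicit keyword arguments (a placeholder here); a black-box tuner
    such as SMAC repeatedly evaluates this and minimizes the result."""
    costs = [solve(inst, seed=seed, **config)
             for inst in instances for seed in seeds]
    return sum(costs) / len(costs)
```

With settings exposed as keyword arguments like this, the same objective works for both the static and the dynamic/rollout tuning runs.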
> Re. tuning. It's easiest to tune if we pass the settings in via function arguments, rather than the constants file we have now. For this I suggest we refactor fairly substantially, removing `run_dispatch` and the other dynamic strategies (unless anyone wants to continue with those, but since there's less than a month left in the competition, I suspect not). Then we can have a single `rollout/` module implementing just rollout, taking its arguments directly from the solver script. I'll see if I can write out a tuning script with SMAC next week, both for static and dynamic/rollout.
Sure. We should keep greedy as baseline reference but we can just make a module for that as well.
> @leonlan do you have any idea/intuition for what happens when we change the threshold?
> Edit: ah, that's issue #101 :-)
I changed `DISPATCH_THRESHOLD` from 0.15 to 0.20. Other factors remained the same as in the last experiment.

The differences between rollout and rollout long are now much larger: a 1500 difference, up from 500 before. Interesting! What's also interesting: rollout with a 0.15 threshold vs a 0.20 threshold is still about the same.
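For reference, the criterion being varied here works roughly as follows. This is a hypothetical sketch of frequency-based threshold dispatching as described in this thread; function and variable names are mine, and the repo's actual rule may differ in details:

```python
def select_dispatch(counts, n_simulations, threshold=0.15):
    """Indices of requests to dispatch this epoch: those that appeared in
    a dispatched position in at least `threshold` fraction of the
    simulation solutions (sketch of the DISPATCH_THRESHOLD rule)."""
    return [i for i, c in enumerate(counts)
            if c / n_simulations >= threshold]
```

Raising the threshold from 0.15 to 0.20 thus makes dispatching more conservative: a request must be dispatched in a larger share of simulations before it is committed to the current epoch.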
Very nice!! Could you share the exact configuration you are using for rollout_long30?
> Very nice!! Could you share the exact configuration you are using for rollout_long30?
Here you can find it: `Euro-NeurIPS-2022/dynamic/rollout/rollout_long.py`, lines 52 to 58, at commit `5cc1067`.

Note that it still uses an older rollout code version (but with the same performance). The solutions with rollout_long are between 2-8% better than rollout in the earlier epochs; in the later epochs this falls off because the instances become smaller. Actually, I even fall back on regular rollout if rollout long does not improve the solutions. But these are all (insignificant) details, I hope 😅.
> Note that the number of simulations in rollout long is roughly equal to the number of simulations in rollout.

How is that possible? The time to run a single simulation is 20x as long, right? Or what do I miss?
> Note that the number of simulations in rollout long is roughly equal to the number of simulations in rollout.
>
> How is that possible? The time to run a single simulation is 20x as long, right? Or what do I miss?
Sorry that's a bit confusing indeed. I extrapolate from a single "normal" rollout simulation how long that would take. Then I calculate how many simulations that would have resulted in. This is precisely the number of "long" simulations that we are allowed to do in rollout long.
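The budget calculation described here can be sketched as follows. The function and parameter names are illustrative, not the repo's actual code:

```python
def allowed_long_simulations(normal_sim_time_s, budget_s):
    """Number of 'long' simulations rollout long may run: the count of
    'normal' rollout simulations that would have fit in the same time
    budget, extrapolated from the measured duration of a single normal
    simulation."""
    if normal_sim_time_s <= 0:
        raise ValueError("simulation time must be positive")
    return max(1, int(budget_s // normal_sim_time_s))
```

For example, if one normal simulation takes 0.5s and 30s of the epoch is available, rollout long is allowed 60 long simulations, even though each of those takes far longer than 0.5s. That is exactly why rollout long does not fit within the real 60s epoch limit.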
> Sorry that's a bit confusing indeed. I extrapolate from a single "normal" rollout simulation how long that would take. Then I calculate how many simulations that would have resulted in. This is precisely the number of "long" simulations that we are allowed to do in rollout long.
So do I understand correctly, that this means the "rollout long" is relevant as a test, but not possible within 60s per epoch?
Unfortunately, the results for `rollout_long30` were incomplete. A single instance set was missing, meaning that the comparison of results was faulty, since the other methods did have results for all instance sets. The sole reason why `rollout_long30` had better results was the missing data.

This means that my conclusion that improving the simulation solution quality could lead to drastic improvements was incorrect :-( There are small gains (1-2K), which are already merged by #132.
I'll close this issue, but feel free to open again when relevant.