
Comments (17)

leonlan commented on June 25, 2024

I experimented with a few more threshold values. The setup is:

  • instances [a-d]* (enough instances to be representative of benchmark results)
  • 10 seeds
  • greedy, rollout, and rollout long (better simulation solutions). Note that the number of simulations in rollout long is roughly equal to the number of simulations in rollout.

The thresholds used for rollout are 15% (original), 20%, 30% and 40%. There is one slightly confusing thing due to a mistake of mine:

  • 15 and 20% were run with 120s epoch time limits
  • 30 and 40% were run with 60s epoch time limits

AFAICT the time limit does not play a huge role: the only significant difference is the number of simulations you get, and that is often already enough within the 60s epoch time limit.

Here is a summarized table of results:
[image: summarized results table]

The best scenario, rollout_long30, now improves over greedy by 20K! Insane. That means there is definitely merit in trying to obtain better solutions.

Crazy stuff! I'm stoked 😄

from euro-neurips-2022.

leonlan commented on June 25, 2024

> I extrapolate from a single "normal" rollout simulation how long that would take. Then I calculate how many simulations that would have resulted in. This is precisely the number of "long" simulations that we are allowed to do in rollout long.

> So do I understand correctly, that this means the "rollout long" is relevant as a test, but not possible within 60s per epoch?

That's correct. These experiments show that, if we can get "rollout long quality" solutions within 60s, then we can improve our dynamic performance even further. At the moment this is not possible, but to be honest, achieving it is not a very big challenge (see my comment above with 3 suggestions on how to improve solution quality). I also think that tuning HGS will work, but I have a poor understanding of how to tune these parameters.

leonlan commented on June 25, 2024

The table below shows the average results over 10 seeds for [a-d]*-id instances for greedy, rollout and 'rollout-long', 60 second "epoch time limit". See this commit if you want to see the adjustments in detail.

Rollout-long is rollout where the simulation instances are solved using 500 HGS iterations and a slightly adjusted config. The number of simulations is similar between rollout and rollout-long, so the only difference is the quality of the solutions.

There is not really an epoch time limit for rollout long. The only restriction is that the dispatch instance is solved in half the epoch time, just like rollout.
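
The time-budget rule described here can be sketched as follows (a minimal illustration; `split_epoch_budget` is a hypothetical name, not the repo's actual code):

```python
def split_epoch_budget(epoch_time_limit):
    """Half the epoch time limit goes to solving the dispatch instance;
    the remainder is available for the rollout simulations."""
    dispatch_budget = epoch_time_limit / 2
    simulation_budget = epoch_time_limit - dispatch_budget
    return dispatch_budget, simulation_budget
```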

[image: results table]

Having better-quality simulation solutions gives a ~400 improvement. This is not a lot, given that the solutions from rollout long are quite a bit better than those from rollout. I don't have hard numbers on this, but these "long" solutions are on average 5% better for the first few epochs. In the later epochs this goes down to 0.1-0.5%, because the simulation instances become smaller.

N-Wouda commented on June 25, 2024

@leonlan what am I looking at in the above results? I guess dynamic performance, but how did you increase the epoch time limit to allow for more iterations?

leonlan commented on June 25, 2024

@N-Wouda I updated my earlier comment! There's some merit in improving the solution quality, but the current experiment shows that the gain is only very small. I'll try a few more experiments in the next days.

leonlan commented on June 25, 2024

I ran an experiment similar to the one mentioned earlier: 10 seeds, [a-d]* instances, but this time with N_LOOKAHEAD=8 and 120s epoch time limits.

[image: results table]

Just like before, rollout long improves on rollout, but only by a little (500-ish).

The results are very similar to the previous experiment, even though we have 120s epoch time limits and more lookaheads. As said before, more simulations generally don't lead to better improvements in rollout (i.e., we already have enough simulations). Moreover, more than 3 lookahead periods don't produce many more requests, and those extra requests are hard to combine (tight time windows and release dates).

leonlan commented on June 25, 2024

In summary:

  • Solving simulation instances much better only yields a small improvement in reward. In the numerical experiments I ran, this was about 400-500 points on average, using 4x more computation time. This seems to indicate that having "good enough" solutions is sufficient, even though they may be 10-15% off from the optimal solution. The general conclusion is: improve solving the simulation instances where possible, but don't overdo it. Here are some low-effort ideas to improve simulation solution quality:
    • Initialize solutions using the postponed routes from previous epoch.
    • Improve release time-based checks in the static solver. Most things, like neighborhood computation, greedy repair, etc., do not take into account release times yet. This is an easy win.
    • Re-use previous simulation as starting solution.
  • Increasing N_LOOKAHEAD doesn't seem to help a lot. So this is a parameter tuning problem.
  • Increasing the number of simulations doesn't seem to help a lot either. This means that we already simulate enough with the current configuration.
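
The first idea above (warm-starting from the previous epoch's postponed routes) could look roughly like this; all names here are illustrative, not the repo's actual API:

```python
def initial_solutions(postponed_routes, random_solution, pop_size):
    """Warm-start sketch: seed the solver's initial population with the
    postponed routes from the previous epoch, then fill the rest of the
    population with freshly generated solutions."""
    seeds = [postponed_routes] if postponed_routes else []
    while len(seeds) < pop_size:
        seeds.append(random_solution())
    return seeds
```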

An important note is that all these observations have to be interpreted within the context of the threshold dispatching criterion. If we use some other dispatching criterion based on costs, the outcome may be completely different. See #101 for more on that.

I leave this issue open for a while to discuss these experiments. Also let me know if you have any suggestions for numerical experiments to try out.

N-Wouda commented on June 25, 2024

@leonlan do you have any idea/intuition for what happens when we change the threshold?

Edit: ah, that's issue #101 :-)

N-Wouda commented on June 25, 2024

Re. tuning. It's easiest to tune if we pass the settings in via the function arguments, rather than the constants file we have now. For this I suggest we refactor fairly substantially, removing run_dispatch and the other dynamic strategies (unless anyone wants to continue with those, but since there's less than a month left in the competition, I suspect not). Then we can have a single rollout/ module implementing just rollout, and taking its arguments directly from the solver script.

I'll see if I can write out a tuning script with SMAC next week, both for static and dynamic/rollout.
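
The proposed refactor amounts to something like the following sketch (names and defaults are illustrative, not the actual code): rollout takes its settings as explicit keyword arguments instead of reading the constants file, so a tuner like SMAC can pass candidate values directly.

```python
def rollout(instance, *, n_lookahead=3, dispatch_threshold=0.15,
            sim_iterations=500, epoch_time_limit=60):
    """Sketch of a tunable rollout entry point: all strategy parameters
    come in via arguments rather than module-level constants."""
    config = dict(n_lookahead=n_lookahead,
                  dispatch_threshold=dispatch_threshold,
                  sim_iterations=sim_iterations,
                  epoch_time_limit=epoch_time_limit)
    return config  # placeholder: a real version would run the simulations
```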

leonlan commented on June 25, 2024

> Re. tuning. It's easiest to tune if we pass the settings in via the function arguments, rather than the constants file we have now. For this I suggest we refactor fairly substantially, removing run_dispatch and the other dynamic strategies (unless anyone wants to continue with those, but since there's less than a month left in the competition, I suspect not). Then we can have a single rollout/ module implementing just rollout, and taking its arguments directly from the solver script.
>
> I'll see if I can write out a tuning script with SMAC next week, both for static and dynamic/rollout.

Sure. We should keep greedy as a baseline reference, but we can just make a module for that as well.

leonlan commented on June 25, 2024

> @leonlan do you have any idea/intuition for what happens when we change the threshold?
>
> Edit: ah, that's issue #101 :-)

I changed DISPATCH_THRESHOLD from 0.15 to 0.20. Other factors remained the same as in the last experiment.
[image: results table]

The differences between rollout and rollout long are now much larger: a 1500 difference, up from 500 before. Interesting! What's also interesting: rollout with a 0.15 threshold vs. a 0.20 threshold is still about the same.
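
For reference, a common way to implement such a threshold rule (a hedged sketch, not necessarily the repo's exact criterion): dispatch a request if it was dispatched in at least DISPATCH_THRESHOLD fraction of the simulation solutions.

```python
from collections import Counter

def dispatch_by_threshold(simulation_actions, threshold=0.15):
    """simulation_actions: one set of dispatched request ids per simulation.
    Returns the requests dispatched in at least `threshold` fraction of them."""
    n = len(simulation_actions)
    counts = Counter(r for actions in simulation_actions for r in actions)
    return {r for r, c in counts.items() if c / n >= threshold}
```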

jmhvandoorn commented on June 25, 2024

Very nice!! Could you share the exact configuration you are using for rollout_long30?

leonlan commented on June 25, 2024

> Very nice!! Could you share the exact configuration you are using for rollout_long30?

Here you can find it:

sim_sol_long, cost_long, is_feas_long = solve_simulation(
    sim_inst,
    500,  # 500 iterations is often around 5% from bks.
    initialTimeWarpPenalty=50,
    nbPenaltyManagement=20,
    generationSize=30,
    minPopSize=20,
)

Note that it still uses an older rollout code version (but with the same performance). The solutions with rollout_long are 2-8% better than rollout in the earlier epochs; in the later epochs this falls off because the instances become smaller. Actually, I even fall back on regular rollout if rollout long does not improve the solutions. But these are all (insignificant) details, I hope 😅.

jmhvandoorn commented on June 25, 2024

> Note that the number of simulations in rollout long is roughly equal to the number of simulations in rollout.

How is that possible? The time to run a single simulation is 20x as long right? Or what do I miss?

leonlan commented on June 25, 2024

> Note that the number of simulations in rollout long is roughly equal to the number of simulations in rollout.

> How is that possible? The time to run a single simulation is 20x as long right? Or what do I miss?

Sorry that's a bit confusing indeed. I extrapolate from a single "normal" rollout simulation how long that would take. Then I calculate how many simulations that would have resulted in. This is precisely the number of "long" simulations that we are allowed to do in rollout long.
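
The extrapolation described here can be sketched roughly as follows (the function name is illustrative, not the repo's actual code): time one normal simulation, work out how many would have fit in the budget, and grant rollout long that same count of its slower simulations.

```python
import time

def n_long_simulations_allowed(simulate_once, budget_seconds):
    """Time one "normal" rollout simulation and extrapolate how many
    would fit in the budget; rollout long then gets the same *count*
    of simulations, regardless of how much longer each one takes."""
    start = time.perf_counter()
    simulate_once()  # one normal rollout simulation
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid zero division
    return max(1, int(budget_seconds / elapsed))
```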

jmhvandoorn commented on June 25, 2024

> Sorry that's a bit confusing indeed. I extrapolate from a single "normal" rollout simulation how long that would take. Then I calculate how many simulations that would have resulted in. This is precisely the number of "long" simulations that we are allowed to do in rollout long.

So do I understand correctly, that this means the "rollout long" is relevant as a test, but not possible within 60s per epoch?

leonlan commented on June 25, 2024

Unfortunately, the results for rollout_long30 were incomplete: a single instance set was missing, which made the comparison faulty, since the other methods did have results for all instance sets. rollout_long30 only appeared to have better results because of this missing data.

This means that my conclusion that improving the simulation solution quality could lead to drastic improvements was incorrect :-( There are small gains (1-2K), which are already merged in #132.

I'll close this issue, but feel free to open again when relevant.
