Git Product home page Git Product logo

Comments (12)

rcurtin avatar rcurtin commented on August 31, 2024 2

g++-10.2.1
armadillo-10.1.2
libblas3-3.9.0
Intel(R) Core(TM) i9-10920X CPU @ 3.50GHz (24 cores)
128GB RAM

iter: 1
==============================================
Speed in milliseconds for ZDT3
NSGA2: 19437.87915 ms or 19.43787915 seconds.
MOEAD: 759.300531 ms or 0.759300531 seconds.
MOEAD is faster than NSGA2 by: 25.59971758 times!
==============================================
iter: 2
==============================================
Speed in milliseconds for ZDT3
NSGA2: 19618.96229 ms or 19.61896229 seconds.
MOEAD: 749.125403 ms or 0.749125403 seconds.
MOEAD is faster than NSGA2 by: 26.18915632 times!
==============================================
iter: 3
==============================================
Speed in milliseconds for ZDT3
NSGA2: 19503.08402 ms or 19.50308402 seconds.
MOEAD: 755.964217 ms or 0.755964217 seconds.
MOEAD is faster than NSGA2 by: 25.7989513 times!
==============================================
iter: 4
==============================================
Speed in milliseconds for ZDT3
NSGA2: 19427.15078 ms or 19.42715078 seconds.
MOEAD: 756.711937 ms or 0.756711937 seconds.
MOEAD is faster than NSGA2 by: 25.67311263 times!
==============================================
iter: 5
==============================================
Speed in milliseconds for ZDT3
NSGA2: 19050.98517 ms or 19.05098517 seconds.
MOEAD: 754.264042 ms or 0.754264042 seconds.
MOEAD is faster than NSGA2 by: 25.25771362 times!
==============================================

I didn't realize that I didn't have OpenBLAS, so I installed that and tried again.

iter: 1
==============================================
Speed in milliseconds for ZDT3
NSGA2: 22651.73566 ms or 22.65173566 seconds.
MOEAD: 767.95823 ms or 0.76795823 seconds.
MOEAD is faster than NSGA2 by: 29.49605171 times!
==============================================
iter: 2
==============================================
Speed in milliseconds for ZDT3
NSGA2: 21999.72798 ms or 21.99972798 seconds.
MOEAD: 753.613641 ms or 0.753613641 seconds.
MOEAD is faster than NSGA2 by: 29.19231658 times!
==============================================
iter: 3
==============================================
Speed in milliseconds for ZDT3
NSGA2: 22076.3243 ms or 22.0763243 seconds.
MOEAD: 756.145916 ms or 0.756145916 seconds.
MOEAD is faster than NSGA2 by: 29.19585206 times!
==============================================
iter: 4
==============================================
Speed in milliseconds for ZDT3
NSGA2: 22188.43106 ms or 22.18843106 seconds.
MOEAD: 756.922088 ms or 0.756922088 seconds.
MOEAD is faster than NSGA2 by: 29.31402242 times!
==============================================
iter: 5
==============================================
Speed in milliseconds for ZDT3
NSGA2: 22429.88484 ms or 22.42988484 seconds.
MOEAD: 758.182905 ms or 0.758182905 seconds.
MOEAD is faster than NSGA2 by: 29.58373856 times!
==============================================

I noticed that this still only ran singlethreaded, so I added the -fopenmp flag to the Makefile to see if it made a difference.

iter: 1
==============================================
Speed in milliseconds for ZDT3
NSGA2: 22344.18103 ms or 22.34418103 seconds.
MOEAD: 763.240758 ms or 0.763240758 seconds.
MOEAD is faster than NSGA2 by: 29.27540333 times!
==============================================
iter: 2
==============================================
Speed in milliseconds for ZDT3
NSGA2: 22285.86647 ms or 22.28586647 seconds.
MOEAD: 752.802438 ms or 0.752802438 seconds.
MOEAD is faster than NSGA2 by: 29.60387127 times!
==============================================
iter: 3
==============================================
Speed in milliseconds for ZDT3
NSGA2: 22511.27473 ms or 22.51127473 seconds.
MOEAD: 753.323705 ms or 0.753323705 seconds.
MOEAD is faster than NSGA2 by: 29.88260502 times!
==============================================
iter: 4
==============================================
Speed in milliseconds for ZDT3
NSGA2: 22437.68501 ms or 22.43768501 seconds.
MOEAD: 749.727739 ms or 0.749727739 seconds.
MOEAD is faster than NSGA2 by: 29.92777757 times!
==============================================
iter: 5
==============================================
Speed in milliseconds for ZDT3
NSGA2: 22283.38624 ms or 22.28338624 seconds.
MOEAD: 767.106755 ms or 0.767106755 seconds.
MOEAD is faster than NSGA2 by: 29.04861167 times!
==============================================

Yet, that still ran primarily singlethreaded, so this optimizer must not be using many parallelized OpenBLAS functions. Why not one more? This time with -DARMA_NO_DEBUG and -march=native:

g++ main.cpp -larmadillo -march=native -DARMA_NO_DEBUG -O3 -o main -fopenmp
iter: 1
==============================================
Speed in milliseconds for ZDT3
NSGA2: 22678.61628 ms or 22.67861628 seconds.
MOEAD: 596.203135 ms or 0.596203135 seconds.
MOEAD is faster than NSGA2 by: 38.03840495 times!
==============================================
iter: 2
==============================================
Speed in milliseconds for ZDT3
NSGA2: 21937.10597 ms or 21.93710597 seconds.
MOEAD: 615.782141 ms or 0.615782141 seconds.
MOEAD is faster than NSGA2 by: 35.62478434 times!
==============================================
iter: 3
==============================================
Speed in milliseconds for ZDT3
NSGA2: 21995.86473 ms or 21.99586473 seconds.
MOEAD: 615.038286 ms or 0.615038286 seconds.
MOEAD is faster than NSGA2 by: 35.7634073 times!
==============================================
iter: 4
==============================================
Speed in milliseconds for ZDT3
NSGA2: 22471.14412 ms or 22.47114412 seconds.
MOEAD: 603.519221 ms or 0.603519221 seconds.
MOEAD is faster than NSGA2 by: 37.23351857 times!
==============================================
iter: 5
==============================================
Speed in milliseconds for ZDT3
NSGA2: 22628.71482 ms or 22.62871482 seconds.
MOEAD: 637.63713 ms or 0.63763713 seconds.
MOEAD is faster than NSGA2 by: 35.48838949 times!
==============================================

Okay I'm having fun here, what if we reduce OpenBLAS's overhead by setting OMP_NUM_THREADS=1 so it doesn't allocate any OpenMP threads at all?

g++ main.cpp -larmadillo -march=native -DARMA_NO_DEBUG -O3 -o main -fopenmp
iter: 1
==============================================
Speed in milliseconds for ZDT3
NSGA2: 19015.71507 ms or 19.01571507 seconds.
MOEAD: 597.977543 ms or 0.597977543 seconds.
MOEAD is faster than NSGA2 by: 31.80004884 times!
==============================================
iter: 2
==============================================
Speed in milliseconds for ZDT3
NSGA2: 18738.19142 ms or 18.73819142 seconds.
MOEAD: 595.199401 ms or 0.595199401 seconds.
MOEAD is faster than NSGA2 by: 31.48220812 times!
==============================================
iter: 3
==============================================
Speed in milliseconds for ZDT3
NSGA2: 18898.03075 ms or 18.89803075 seconds.
MOEAD: 596.83052 ms or 0.59683052 seconds.
MOEAD is faster than NSGA2 by: 31.66398185 times!
==============================================
iter: 4
==============================================
Speed in milliseconds for ZDT3
NSGA2: 19049.85148 ms or 19.04985148 seconds.
MOEAD: 611.494579 ms or 0.611494579 seconds.
MOEAD is faster than NSGA2 by: 31.15293599 times!
==============================================
iter: 5
==============================================
Speed in milliseconds for ZDT3
NSGA2: 18672.83906 ms or 18.67283906 seconds.
MOEAD: 608.351875 ms or 0.608351875 seconds.
MOEAD is faster than NSGA2 by: 30.69414237 times!
==============================================

And hey why not let's switch back to reference BLAS... (now -fopenmp and OMP_NUM_THREADS are not needed)

g++ main.cpp -larmadillo -march=native -DARMA_NO_DEBUG -O3 -o main
iter: 1
==============================================
Speed in milliseconds for ZDT3
NSGA2: 19276.34759 ms or 19.27634759 seconds.
MOEAD: 607.342601 ms or 0.607342601 seconds.
MOEAD is faster than NSGA2 by: 31.73883663 times!
==============================================
iter: 2
==============================================
Speed in milliseconds for ZDT3
NSGA2: 18940.404 ms or 18.940404 seconds.
MOEAD: 595.555752 ms or 0.595555752 seconds.
MOEAD is faster than NSGA2 by: 31.80290667 times!
==============================================
iter: 3
==============================================
Speed in milliseconds for ZDT3
NSGA2: 19399.05679 ms or 19.39905679 seconds.
MOEAD: 613.459964 ms or 0.613459964 seconds.
MOEAD is faster than NSGA2 by: 31.62236808 times!
==============================================
iter: 4
==============================================
Speed in milliseconds for ZDT3
NSGA2: 19199.02058 ms or 19.19902058 seconds.
MOEAD: 597.695936 ms or 0.597695936 seconds.
MOEAD is faster than NSGA2 by: 32.12171846 times!
==============================================
iter: 5
==============================================
Speed in milliseconds for ZDT3
NSGA2: 18696.17214 ms or 18.69617214 seconds.
MOEAD: 607.011509 ms or 0.607011509 seconds.
MOEAD is faster than NSGA2 by: 30.80035858 times!
==============================================

Interesting, I guess the OpenMP overhead accounts for the runtime differences between OpenBLAS and reference BLAS.

Now I'm curious about what -O2 vs. -O3 does here. Sometimes it can make things worse to go to -O3, actually.

g++ main.cpp -larmadillo -march=native -DARMA_NO_DEBUG -O2 -o main
iter: 1
==============================================
Speed in milliseconds for ZDT3
NSGA2: 20159.83326 ms or 20.15983326 seconds.
MOEAD: 661.115805 ms or 0.661115805 seconds.
MOEAD is faster than NSGA2 by: 30.49364893 times!
==============================================
iter: 2
==============================================
Speed in milliseconds for ZDT3
NSGA2: 20188.41654 ms or 20.18841654 seconds.
MOEAD: 657.896879 ms or 0.657896879 seconds.
MOEAD is faster than NSGA2 by: 30.68629322 times!
==============================================
iter: 3
==============================================
Speed in milliseconds for ZDT3
NSGA2: 19748.83728 ms or 19.74883728 seconds.
MOEAD: 659.091739 ms or 0.659091739 seconds.
MOEAD is faster than NSGA2 by: 29.96371538 times!
==============================================
iter: 4
==============================================
Speed in milliseconds for ZDT3
NSGA2: 20082.95162 ms or 20.08295162 seconds.
MOEAD: 661.550343 ms or 0.661550343 seconds.
MOEAD is faster than NSGA2 by: 30.35740489 times!
==============================================
iter: 5
==============================================
Speed in milliseconds for ZDT3
NSGA2: 19833.78406 ms or 19.83378406 seconds.
MOEAD: 670.485473 ms or 0.670485473 seconds.
MOEAD is faster than NSGA2 by: 29.58122861 times!
==============================================

Looks like -O3 helps us out here a little bit.

It would be interesting to see what clang does, but, I probably should move on to other things...

Anyway I have absolutely no idea if all this information is useful, but I had fun playing with compiler flags. Results will not generalize past my particular machine. :)

from ensmallen.

zoq avatar zoq commented on August 31, 2024 1

armadillo-10.5.2
OpenBLAS 0.2.19-3
AMD Ryzen Threadripper 1920X 12-Core Processor, 4.0 GHz
64GB RAM

iter: 1
==============================================
Speed in milliseconds for ZDT3
NSGA2: 22230.23347 ms or 22.23023347 seconds.
MOEAD: 768.330896 ms or 0.768330896 seconds.
MOEAD is faster than NSGA2 by: 28.93315053 times!
==============================================
iter: 2
==============================================
Speed in milliseconds for ZDT3
NSGA2: 22473.75603 ms or 22.47375603 seconds.
MOEAD: 765.544132 ms or 0.765544132 seconds.
MOEAD is faster than NSGA2 by: 29.356578 times!
==============================================
iter: 3
==============================================
Speed in milliseconds for ZDT3
NSGA2: 22116.02249 ms or 22.11602249 seconds.
MOEAD: 751.838324 ms or 0.751838324 seconds.
MOEAD is faster than NSGA2 by: 29.41592864 times!
==============================================
iter: 4
==============================================
Speed in milliseconds for ZDT3
NSGA2: 22493.12273 ms or 22.49312273 seconds.
MOEAD: 764.744531 ms or 0.764744531 seconds.
MOEAD is faster than NSGA2 by: 29.41259704 times!
==============================================
iter: 5
==============================================
Speed in milliseconds for ZDT3
NSGA2: 22298.54773 ms or 22.29854773 seconds.
MOEAD: 763.427007 ms or 0.763427007 seconds.
MOEAD is faster than NSGA2 by: 29.20848689 times!
==============================================

from ensmallen.

zoq avatar zoq commented on August 31, 2024 1

Personally, I would not keep this open or pin it; what has been shown here is that MOEAD is faster than NSGA2 on the ZDT test suite, which aligns with the results in the paper. Looking at the results posted, they are all very similar, so I don't think we will see huge differences between the two methods on different setups.

One thing that currently isn't covered in your benchmark script is to start with different initial coordinates; maybe the one we are testing against right now are more beneficial for MOEAD, or maybe NSGA2 found a close solution before max generation is reached, what we can see here is that one MOEAD step is faster than NSGA2.

That said, what we should do is compile a suite of problems and benchmark different optimizers with different settings, similar to https://arxiv.org/abs/1910.05446, https://arxiv.org/abs/2007.01547 publish it in the form of a paper and link that one on the website.

from ensmallen.

zoq avatar zoq commented on August 31, 2024 1

I really like the real world problems - http://www-personal.umich.edu/~fioretto/cfp/OPTMAS18/papers/paper_7.pdf. But if we move forward with it, we should make sure the interface is there to easily add new problems.

from ensmallen.

jonpsy avatar jonpsy commented on August 31, 2024

Speed

Using the configurations mentioned above, I've ran the test setting random seed for DefaultMOEAD, BBSMOEAD, DirichletMOEAD. The results were more or less the same for all the ZDT test suites.

My specs:

Processor: Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz 1.80 GHz
Installed RAM: 8.00 GB (7.89 GB usable)
System type: 64-bit operating system, x64-based processor

DefaultMOEAD was slightly faster (average 32x) than BBSMOEAD and DirichletMOEAD(average 30x).

@zoq @coatless @say4n (or anyone else for that matter) I'd be grateful if you guys could confirm these results on your local PC by cloning the repository above along with your specs.

WSL2
image

Native Linux
image

Somehow native Linux is slower than WSL2?

from ensmallen.

jonpsy avatar jonpsy commented on August 31, 2024

Quality of Front

I've used IGD+ and Epsilon Indicators for this.

image

The variation seems to be too high to give out any concrete numbers. But I think we can say affirmly that MOEAD is a LOT better than NSGA-II in terms of "quality".

from ensmallen.

jonpsy avatar jonpsy commented on August 31, 2024

Plots

So I tested all of these in the portfolio optimization notebook, and here one could clearly see why the indicator values are so much better for MOEAD.

NSGA-II

image
Left to right, Generated Front vs Expected Front.

Note that the right image tracks the Pareto front for the entire optimization process. The final Pareto Front is the left most layer in the picture.

MOEAD

  1. DefaultMOEAD: MOEAD<Uniform, Tchebycheff>

image

Much better! The plot is continuous and the shape of the Pareto Front is perfect by all means.

  1. BBSMOEAD: MOEAD<BayesianBootstrap, Tchebycheff>
    image

Not bad, although there are gaps here and there.

  1. DirichletMOEAD: MOEAD<Dirichlet, Tchebycheff>

image

Good! The Front has very few "holes" compared to BBSMOEAD but still couldn't beat DefaultMOEAD.

In conclusion: DefaultMOEAD > DirichletMOEAD > BBSMOEAD >>> NSGA2.

from ensmallen.

jonpsy avatar jonpsy commented on August 31, 2024

@rcurtin I can't thank you enough for the detailed analysis you've provided me with. I have a request, can we keep this thread open indefinitely or pin this or add in the documentation? What's your thoughts?

from ensmallen.

jonpsy avatar jonpsy commented on August 31, 2024

Acknowledged. To re-iterate, we agree that the implemented algorithm is displaying results as was promised in the paper (perhaps more than promised). MOEAD has not only outdone classic NSGA-II in terms of Quality Metric but also speed. In conclusion, we have sufficient proof to say algorithm is implemented correctly.

Reg. Initial Point, if diving into that matter, would you suggest we start randomly or should making an educated guesses? If the latter, what should be the criterion? Moving on to "finding solution before max gen", we need a way to track this. I guess a callback would do? We can maybe modify StopAtMinLoss for our case. something like

template<typename IndicatorType, typename TestSuiteType ......>
FunctionName(const IndicatorType& indicator, const TestSuiteType suite)
{
  // For example suite = ZDT
   CubeType bestFront = suite.GetReferenceFront();
// For example indicator = IGD+
   currentQuality= indicator::Apply(currentParetoFront, bestFront);
  if( currentQuality < bestQuality)
  {
    ....
  }
  steps++
   .....
}

something like this?

Finally, do you have other suites in mind? We can try a combination of single-objective functions but not sure to what avail they'll test the optimizer's ability. We could port DLTZ and the test suite mentioned in the MOEAD paper as well.

Let me know your thoughts.

from ensmallen.

mlpack-bot avatar mlpack-bot commented on August 31, 2024

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍

from ensmallen.

say4n avatar say4n commented on August 31, 2024

Should we re-open this?

from ensmallen.

jonpsy avatar jonpsy commented on August 31, 2024

Should we re-open this?

+1. @zoq ?

from ensmallen.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.