
 NVIDIA RAPIDS.AI Solution of BBO NeurIPS 2020

We won 2nd place in the NeurIPS 2020 competition: Find the best black-box optimizer (BBO) for machine learning. 🎉 We proposed a simple ensemble algorithm of black-box optimizers that outperforms any single optimizer within the same time budget. Evaluating optimizers is a compute-intensive and time-consuming task, since the number of test cases grows exponentially with the number of models, datasets, and metrics. In our case, we needed to evaluate 15 optimizers, execute 4,230 jobs, train 2.7 million models, and run 541,440 optimizations (suggest-observe calls). Utilizing the RAPIDS libraries cuDF and cuML, our GPU-accelerated exhaustive search is capable of finding the best ensemble in reasonable time: on a DGX-1, the search time is reduced from more than 10 days on two 20-core CPUs to less than 24 hours on 8 GPUs.
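To illustrate the ensemble idea: each iteration, the per-batch suggestion budget is split among the member optimizers, and every member observes all evaluated points. This is a minimal sketch with random-search stand-ins for the members (the actual submission ensembles TuRBO and scikit-optimize); `RandomOpt` and `ensemble_round` are illustrative names, not the repo's API.

```python
import random

class RandomOpt:
    """Random-search stand-in for a real black-box optimizer."""
    def __init__(self, low, high):
        self.low, self.high = low, high
        self.history = []

    def suggest(self, n):
        return [random.uniform(self.low, self.high) for _ in range(n)]

    def observe(self, xs, ys):
        self.history.extend(zip(xs, ys))

def ensemble_round(optimizers, objective, batch_size):
    # Split the suggestion batch evenly among members, keeping the
    # same overall evaluation budget as a single optimizer.
    per_opt = batch_size // len(optimizers)
    xs = []
    for opt in optimizers:
        xs.extend(opt.suggest(per_opt))
    ys = [objective(x) for x in xs]
    # Every member observes all evaluated points, not just its own.
    for opt in optimizers:
        opt.observe(xs, ys)
    return min(ys)

random.seed(0)
opts = [RandomOpt(-5, 5), RandomOpt(-5, 5)]
best = min(ensemble_round(opts, lambda x: x * x, 8) for _ in range(16))
print(f"best value after 16 rounds: {best:.6f}")
```

Because all members share observations, a strong member's progress also informs the weaker ones, which is one intuition for why the ensemble can beat any single member under the same budget.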

Update

Our paper is published on arXiv! GPU Accelerated Exhaustive Search for Optimal Ensemble of Black-Box Optimization Algorithms

Please use the following BibTeX if you want to cite our work:

@misc{liu2020gpu,
    title={GPU Accelerated Exhaustive Search for Optimal Ensemble of Black-Box Optimization Algorithms},
    author={Jiwei Liu and Bojan Tunguz and Gilberto Titericz},
    year={2020},
    eprint={2012.04201},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

Introduction

In this competition, black-box optimization algorithms are evaluated on real-world objective functions, using a benchmark system built on top of the AutoML challenge workflow and the Bayesmark package. This competition has widespread impact, as black-box optimization is relevant for hyper-parameter tuning in almost every machine learning project (especially deep learning). The leaderboard is determined by optimization performance on held-out (hidden) objective functions, where the optimizer must run without human intervention.
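The benchmark drives each optimizer through a suggest-observe loop: it repeatedly asks for candidate configurations and reports back their objective values. The sketch below mirrors the shape of Bayesmark's optimizer interface (`suggest`/`observe`), but the toy optimizer, objective, and `run_study` driver are illustrative stand-ins, not Bayesmark's actual code.

```python
import random

class ToyOptimizer:
    def __init__(self, api_config):
        # api_config describes the search space, e.g. {"lr": (1e-4, 1e-1)}
        self.api_config = api_config

    def suggest(self, n_suggestions=1):
        # Propose candidate configurations (here: uniform random).
        return [{k: random.uniform(lo, hi)
                 for k, (lo, hi) in self.api_config.items()}
                for _ in range(n_suggestions)]

    def observe(self, X, y):
        # A real optimizer would update its surrogate model here.
        pass

def run_study(opt, objective, n_calls=8, batch=2):
    best = float("inf")
    for _ in range(n_calls):
        X = opt.suggest(batch)          # benchmark asks for candidates
        y = [objective(cfg) for cfg in X]  # expensive model evaluation
        opt.observe(X, y)               # benchmark reports results
        best = min(best, min(y))
    return best

random.seed(0)
opt = ToyOptimizer({"lr": (1e-4, 1e-1)})
score = run_study(opt, lambda cfg: (cfg["lr"] - 0.01) ** 2)
print(f"best objective: {score:.6f}")
```

In the competition, the expensive step is the objective evaluation (training and scoring a model), which is exactly what the GPU acceleration below targets.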

Final Submission

Our final submission is an ensemble of the optimizers TuRBO and scikit-optimize. The code is in example_submissions/turbosk.

Our ensemble method turbo-skopt (LB 92.9, ranked 2nd) improves significantly on the single optimizers it combines, namely turbo (LB 88.9, ranked 24th) and skopt (LB 88.08, ranked 36th), on the final leaderboard. We see a similar improvement in our local validation.

Solution Overview

Our solution includes two parts:

  • A multi-GPU optimized exhaustive search algorithm (this repo).
  • RAPIDS-enabled Bayesmark (rapids branch)

Figure: GPU acceleration of BBO. (a) GPUs execute compute-intensive function evaluations with the cuDF and cuML libraries. (b) Function evaluation and optimization run in parallel on multiple GPUs.
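The exhaustive-search part can be pictured as enumerating every k-member subset of the candidate optimizers and keeping the best-scoring one. In this sketch, `score_ensemble` is a stand-in for the expensive GPU-accelerated Bayesmark evaluation the repo performs, and the solo scores for optimizers other than turbo and skopt are made-up toy values.

```python
import itertools

def best_ensemble(optimizers, score_ensemble, k=2):
    # Enumerate all k-subsets; keep the one with the lowest score.
    best, best_score = None, float("inf")
    for combo in itertools.combinations(optimizers, k):
        s = score_ensemble(combo)
        if s < best_score:
            best, best_score = combo, s
    return best, best_score

# Toy stand-in scoring: leaderboard-style scores (higher is better) for
# turbo/skopt are from the README; the other two values are invented.
solo = {"turbo": 88.9, "skopt": 88.08, "opentuner": 87.0, "nevergrad": 86.5}

def score(combo):
    # Pretend an ensemble scores as the (negated) mean of its members,
    # so stronger members yield a lower (better) search score.
    return -sum(solo[o] for o in combo) / len(combo)

combo, s = best_ensemble(list(solo), score, k=2)
print(combo)
```

With 15 optimizers the number of subsets is small, but each evaluation means re-running the benchmark, which is why the per-evaluation GPU speedup dominates the total search time.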

Install Instructions

Create a conda Environment

  • conda create -n bbo_rapids python=3.7
  • conda activate bbo_rapids

Install cudf, cuml and pytorch

  • conda install "pytorch=1.6" "cudf=0.16" "cuml=0.16" cudatoolkit=10.2.89 -c pytorch -c rapidsai -c nvidia -c conda-forge -c defaults

Install optimization algorithms

Install rapids-enabled Bayesmark

How to Run the Code

  • Set the global variable NUM_GPUS in run_one_opt.py to match your machine.
  • Run a quick sanity-check experiment with python run_one_opt.py, which takes about 6 minutes on a DGX-1.
  • Run the exhaustive search with python run_exhaustive_search.py, which takes less than 24 hours on a DGX-1.
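One way the NUM_GPUS setting can fan work out across devices is to pin each worker process to a single GPU via CUDA_VISIBLE_DEVICES before it touches CUDA. This is a hedged sketch of that pattern, not the repo's actual scheduler; `run_job` is a placeholder for one (model, dataset, metric, optimizer) Bayesmark experiment.

```python
import os
from multiprocessing import Pool

NUM_GPUS = 8  # set to the number of GPUs on your machine

def run_job(args):
    job_id, gpu_id = args
    # Pin this worker to one device before any CUDA initialization.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    # ... launch one (model, dataset, metric, optimizer) experiment ...
    return job_id

def run_all(n_jobs):
    # Round-robin jobs over the available GPUs.
    work = [(j, j % NUM_GPUS) for j in range(n_jobs)]
    with Pool(NUM_GPUS) as pool:
        return pool.map(run_job, work)

print(run_all(16))
```

Setting CUDA_VISIBLE_DEVICES inside the worker only works if it happens before the CUDA context is created in that process, which is why the pinning is done first thing in `run_job`.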

Key Findings

  1. An ensemble of optimizers outperforms single optimizers in terms of generalization performance.

Figure: performance of optimization algorithms in terms of (a) the cross-validation score, which is visible to and minimized by the optimizers, and (b) the holdout validation score, which represents the generalization ability of the optimizer. The y-axis is the normalized mean score; lower is better. The top 5 optimizers are highlighted in each sub-figure.

  2. Different optimizers excel at different machine learning models.

Figure: generalization performance of optimizers on each cuML model. The best optimizer for each model (per row) is highlighted.

  3. The overall execution time is dominated by model evaluation rather than optimization.

Figure: (a) running time comparison between the proposed multi-GPU implementation and the multi-core CPU implementation. (b) Breakdown of execution time per iteration.

Figure: run time comparison of cuML vs. scikit-learn models.
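A normalized mean score of the kind plotted above typically rescales each raw loss against a random-search baseline and the best known value, so that scores are comparable across problems before averaging. This is a hedged sketch of that idea, not Bayesmark's exact formula.

```python
def normalized_score(loss, best, random_baseline):
    # 0.0 = matches the best known value; 1.0 = no better than the
    # random-search baseline for this problem.
    return (loss - best) / (random_baseline - best)

# Toy numbers: a loss of 0.2 on a problem where the best known loss is
# 0.1 and random search averages 0.6.
print(normalized_score(0.2, 0.1, 0.6))
```

Averaging these per-problem scores gives a single figure of merit in which lower is better, matching the y-axes in the figures above.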

We chose turbo-skopt as our final submission because:

  • it has a top-3 generalization score.
  • it converges faster than the single optimizers.
  • it achieves the best performance on a representative deep learning model.
