
lamcts's Introduction

Latent Action Monte Carlo Tree Search (LA-MCTS)

LA-MCTS is a new MCTS-based derivative-free meta-solver. It learns to partition the search space so that solvers such as Bayesian optimization or evolutionary algorithms can focus on a smaller region and find better solutions with fewer samples.

Please 🌟star🌟 the repo if you like our work, thank you.

Contributors

Linnan Wang (First Author), Yuandong Tian (Principal Investigator), Yiyang Zhao, Saining Xie, Teng Li and Rodrigo Fonseca.

What's in this release?

This release contains our implementation of LA-MCTS and its application to Neural Architecture Search (LaNAS), but it can also be applied to large-scale hyper-parameter optimization, reinforcement learning, scheduling, optimizing computer systems, and many others.

Neural Architecture Search (NAS)

Black-box optimization

  • Performance with baselines: 1-minute evaluations of LA-MCTS vs. Bayesian optimization and evolutionary search.
    In the NeurIPS-2020 black-box optimization challenge, the concept of LA-MCTS was used by the 3rd-place (JetBrains) and 8th-place (KAIST) teams. Check out the leaderboard here.

  • Mujoco Experiments: LA-MCTS on Mujoco environments.

Project Logs

Building the MCTS based NAS agent

Inspired by AlphaGo, we built the first NAS search algorithm based on Monte Carlo Tree Search (MCTS) in 2017, namely AlphaX. The action space is fixed (layer-by-layer construction), and MCTS is used to steer the search towards promising regions. We showed that the convolutional neural networks designed by AlphaX improve downstream applications such as detection, style transfer, image captioning, and many others.

Neural Architecture Search using Deep Neural Networks and Monte Carlo Tree Search
AAAI-2020, [code]
Linnan Wang (Brown), Yiyang Zhao(WPI), Yuu Jinnai(Brown), Yuandong Tian(FAIR), Rodrigo Fonseca(Brown)

From AlphaX to LaNAS

On AlphaX, we found that the action space used in MCTS significantly affects search efficiency, which motivated the idea of learning the action space for MCTS on the fly during the search. This led to LaNAS. LaNAS uses a linear classifier at each decision node of MCTS to learn good versus bad actions, and it evaluates each leaf node, which now represents a subregion of the search space rather than a single architecture, by uniformly sampling one architecture from that region and evaluating it. The first version of LaNAS implemented a distributed system that performed NAS by training each such sample from scratch on 500 GPUs. The second version, one-shot LaNAS, uses a single one-shot supernet to evaluate the quality of samples, trading accuracy for evaluation efficiency. One-shot LaNAS finds a reasonable solution within a few GPU days.

Sample-Efficient Neural Architecture Search by Learning Action Space for Monte Carlo Tree Search
TPAMI 2021
Linnan Wang (Brown), Saining Xie (FAIR), Teng Li (FAIR), Rodrigo Fonseca (Brown), Yuandong Tian (FAIR)
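
The node split can be sketched in a few lines. This is only an illustration of learning a linear good-versus-bad split from a node's evaluated samples, not the repository's implementation; the median threshold and the choice of classifier are assumptions.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative sketch: label a node's samples good/bad by accuracy relative to
# the median, then fit a linear boundary in architecture space that routes
# future samples to the good or bad child.
def split_node(arch_features, accuracies):
    labels = (accuracies >= np.median(accuracies)).astype(int)
    boundary = LogisticRegression().fit(arch_features, labels)
    good = arch_features[labels == 1]
    bad = arch_features[labels == 0]
    return boundary, good, bad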

From LaNAS to a generic solver LA-MCTS

Since LaNAS works very well on NAS datasets, e.g. NASBench-101, and the core of the algorithm generalizes easily to other problems, we extended it into a generic solver for black-box function optimization. LA-MCTS improves further by using a nonlinear classifier at each decision node in MCTS and a surrogate (e.g., a function approximator) to evaluate each sample in a leaf node. The surrogate can come from any existing black-box optimizer (e.g., Bayesian optimization). The details of LA-MCTS can be found in the following paper.

Learning Search Space Partition for Black-box Optimization using Monte Carlo Tree Search
NeurIPS 2020
Linnan Wang (Brown University), Rodrigo Fonseca (Brown University), Yuandong Tian (Facebook AI Research)
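
A minimal usage sketch of the released solver, assuming the MCTS class and the search(iterations=...) entry point that appear in the issue tracebacks below; the constructor keyword arguments are a guess inferred from the hyper-parameters mentioned in the issues (Cp, leaf_size, ninits, kernel_type) and should be checked against MCTS.py before running.

import numpy as np
from MCTS import MCTS   # MCTS.py in the LA-MCTS folder of this repo

# Hypothetical 20-d objective; LA-MCTS is set up here as a minimizer.
class Quadratic:
    def __init__(self, dims=20):
        self.dims = dims
        self.lb = -5 * np.ones(dims)
        self.ub = 5 * np.ones(dims)
    def __call__(self, x):
        return float(np.sum(np.square(x)))

f = Quadratic()
# Keyword arguments below are assumptions, not the confirmed signature.
agent = MCTS(lb=f.lb, ub=f.ub, dims=f.dims, ninits=40, func=f,
             Cp=1, leaf_size=20, kernel_type="rbf")
agent.search(iterations=200)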

From one-shot NAS to few-shot NAS

To overcome the issues of one-shot NAS, we propose few-shot NAS, which uses multiple supernets, each covering a different region of the search space specified by the intermediate nodes of the search tree. Extensive experiments show that few-shot NAS significantly improves upon one-shot methods. See the paper below for details.

Few-shot Neural Architecture Search
[code]
Yiyang Zhao (WPI), Linnan Wang (Brown), Yuandong Tian (FAIR), Rodrigo Fonseca (Brown), Tian Guo (WPI)

License

LA-MCTS is released under the CC-BY-NC 4.0 license.


lamcts's Issues

Some questions about how function values are computed in search

I noticed that in the collect_samples function the function value is multiplied by -1. My understanding is that this is done to find the minimum of the function, but then why is np.absolute(curt_best_value) used when printing the best f(x)?
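
(For context, a small illustration of the sign convention the question refers to; the wrapper below is hypothetical, not the repository's code.)

import numpy as np

# To maximize f with a minimizer, optimize g(x) = -f(x). The minimizer's best
# value is then -max f(x), and np.absolute(best) recovers max f(x) only when
# f is non-negative at the optimum.
def negated(f):
    def g(x):
        return -f(x)
    return g

f = lambda x: 10 - np.sum((np.asarray(x) - 1.0) ** 2)   # hypothetical objective
g = negated(f)
best_g = g([1.0, 1.0])          # -10.0: what the minimizer would report
print(np.absolute(best_g))      # 10.0: equals f's maximum since f >= 0 here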

lanet cutmix import not found

Hello, I have a problem running your code on CIFAR-10:
I cannot find the cutmix package imported by
from cutmix.cutmix import CutMix
from cutmix.utils import CutMixCrossEntropyLoss

A question on the readme.md

I found in the readme that the command python train.py --auxiliary --batch_size=32 --init_ch=128 --layer=24 sets the parameter layer to 24. However, in the actual code, the argument is defined as layers via parser.add_argument('--layers', type=int, default=20, help='total number of layers'). I am not sure whether using 20 layers can achieve an accuracy of 99.03%.
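
For reference, the command with the flag name matching the argparse definition quoted above would presumably be:

python train.py --auxiliary --batch_size=32 --init_ch=128 --layers=24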

Assertion error in Node.py update_kids() after increasing batch_size in TuRBO-1

Hello,

I am testing LA-MCTS together with TuRBO-1 and found it works well. I tried to increase the batch_size in TuRBO-1 to reduce the time spent on GP fitting. From what I can tell, the TuRBO paper indicates this is a legitimate way to increase overall computational efficiency without affecting the quality of the results.

To this end, I only modified collect_samples() in MCTS.py to handle the multiple values returned by each TuRBO-1 iteration. Nothing particularly ambiguous here.

Now, on rare occasions, I get an AssertionError thrown at line 47 in update_kids() in Node.py:

assert self.kids[0].classifier.get_mean() > self.kids[1].classifier.get_mean()

I'm not sure why just yet. And I don't understand why returning more samples from TuRBO-1 could trigger this.

Could you please indicate possible reasons for this?

I believe increasing batch_size in TuRBO-1 is something desirable and could benefit many.

Or am I misunderstanding something in the interplay between LA-MCTS and TuRBO-1?

Thanks for the great algo!

Does Cp support negative values?

The objective is to maximize my function f (which ranges from 100 to 300). To use LaMCTS, I minimize -f and return that to LaMCTS. Just want to double-check: in this case, should Cp be -10? Thank you.

Discrete Search Space

Hello,
I'd like to apply LaMCTS to a discrete but high-dimensional search space (e.g., 200 dimensions).
Each dimension, however, has only four options.
Would LaMCTS be suitable for this? How hard is it to adapt the code for this?
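
(One common adaptation, sketched below under assumed interfaces rather than taken from the repository: let the optimizer work in a continuous box and round to the discrete options inside the objective wrapper.)

import numpy as np

# Hypothetical adapter: the optimizer proposes continuous points in [0, 4)^200;
# the wrapper rounds each coordinate down to one of the four discrete options
# before evaluating the true objective.
def make_discrete_wrapper(discrete_objective, dims=200, options=4):
    lb = np.zeros(dims)
    ub = np.full(dims, options - 1e-6)      # keeps floor(x) in {0, 1, 2, 3}
    def wrapped(x):
        choices = np.floor(np.asarray(x)).astype(int)
        return discrete_objective(choices)
    return wrapped, lb, ub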

Clustering features

In the line

tmp = np.concatenate( (self.X, self.fX.reshape([-1, 1]) ), axis = 1 )
the clustering in nodes is done based on both x and f(x), though the paper says that clustering is performed based on f(x). Could you please clarify: is f(x) treated as just another variable of x for clustering?
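
(For reference, a standalone sketch of what that line feeds into: f(x) is appended as an extra column so that k-means clusters on position and value jointly; the data and cluster settings here are illustrative, not the repository's.)

import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(50, 9)                   # hypothetical samples
fX = np.random.rand(50)                     # their objective values
tmp = np.concatenate((X, fX.reshape([-1, 1])), axis=1)
plabel = KMeans(n_clusters=2, n_init=10).fit_predict(tmp)   # cluster on (x, f(x)) jointly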

The k-means predictions are all the same and cause an SVM error

Hi,

Recently I have been trying to use LA-MCTS to optimize my own task, but some errors occurred during execution.

My task has 9 dimensions, and the HP setting is:

  • Cp = 10 (following the suggestion from the paper, ≈10% of max f(x))
  • leaf_size = 10
  • ninits = 40
  • kernel_type = "rbf"

Here is the error log:

Traceback (most recent call last):
  File "mcts-exec.py", line 30, in <module>
    agent.search(iterations = args.iterations)
  File "/root/workspace/lamcts/MCTS.py", line 239, in search
    self.dynamic_treeify()
  File "/root/workspace/lamcts/MCTS.py", line 116, in dynamic_treeify
    bad_kid.update_bag(  bad_kid_data  )
  File "/root/workspace/lamcts/Node.py", line 83, in update_bag
    self.is_svm_splittable = self.classifier.is_splittable_svm()
  File "/root/workspace/lamcts/Classifier.py", line 55, in is_splittable_svm
    self.learn_boundary(plabel)
  File "/root/workspace/lamcts/Classifier.py", line 410, in learn_boundary
    self.svm.fit(self.X, plabel)
  File "/opt/app-root/lib/python3.6/site-packages/sklearn/svm/_base.py", line 173, in fit
    y = self._validate_targets(y)
  File "/opt/app-root/lib/python3.6/site-packages/sklearn/svm/_base.py", line 560, in _validate_targets
    " class" % len(cls))
ValueError: The number of classes has to be greater than one; got 1 class

Since it originates around Classifier.learn_clusters(), I printed the result of self.kmean.predict(tmp) and found some clues:

  • The error occurs when plabel (the result of self.kmean.predict(tmp)) contains only one class:
# normal
plabel: [0 1 1 0 0 0 1 0 0 1 1]
# exception
plabel: [0 0 0 0 0]

I temporarily avoid this exception by making is_splittable_svm return False when plabel contains only a single class.
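
(A sketch of that guard, using a hypothetical helper rather than the repository's exact code:)

import numpy as np

# Declare the node unsplittable when k-means puts every sample in one cluster,
# so the two-class SVM fit (which raises the ValueError in the log) is never attempted.
def can_fit_two_class_svm(plabel):
    return len(np.unique(plabel)) >= 2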

However, I would like to know whether this can happen in the general case, or whether it may be caused by my own objective function.

Thank you for the work & sharing your code!

Cannot reproduce the Ant-v2 results from the LAMCTS paper

Hi, recently we tried to reproduce the results of the Mujoco task Ant-v2 from your paper. However, we found that none of the algorithms, including LAMCTS, can find a reward above -1000 after 10000 iterations. (In the paper, LAMCTS reaches a reward above 1000.)

Following your paper, we use the mean and std for the linear policy from https://github.com/modestyachts/ARS and average the reward over 20 rollouts. (We use TuRBO as the local optimizer.)

We obtained results similar to the paper on simpler tasks like Swimmer, Hopper, and HalfCheetah, but for the Ant-888d task we cannot reproduce the result in your paper. Can you reproduce the Ant-v2 result using the current code? Is there anything we need to change to reproduce the results?

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

When I use the script:
python train.py --auxiliary --batch_size=32 --init_ch=128 --layer=24 --arch='[2, 2, 0, 2, 1, 2, 0, 2, 2, 3, 2, 1, 2, 0, 0, 1, 1, 1, 2, 1, 1, 0, 3, 4, 3, 0, 3, 1]' --model_ema --model-ema-decay 0.9999 --auto_augment --epochs 1500
to train on the CIFAR-10 dataset, I get this runtime error.
How can I fix this?
I am using a single P100 GPU;
my PyTorch version is 1.11.0+cu113.
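
(As the error message itself suggests, the usual fix is to replace .view() with .reshape(), or to call .contiguous() first; a generic illustration with hypothetical tensors, not a patch to train.py:)

import torch

x = torch.randn(4, 8).t()          # a transposed tensor is non-contiguous
# y = x.view(-1)                   # would raise the RuntimeError quoted above
y = x.reshape(-1)                  # reshape copies when a view is impossible
z = x.contiguous().view(-1)        # equivalent fix: make contiguous, then view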

Are there any other ways to get the CIFAR10 pre-trained checkpoint?

Hi,
When I tried to download the pre-trained checkpoint lanas_128_99.03.zip from the URL https://drive.google.com/file/d/1bZsEoG-sroVyYR4F_2ozGLA5W50CT84P/view?usp=sharing you provided, it failed. Are there any other ways to get the pre-trained checkpoint?

multi gpu run

Hi, is there a multi-GPU running option?
When I try to set --gpu=0,1 I get an error.

I'm asking because I saw that the checkpoint that reached 99% accuracy used a batch size of 96, and on one GPU I can't train with a batch size of 96.

Thanks

Genetic algorithm

I would like to try it with a genetic algorithm, but I'm not sure where the fitness function or the parent selection function would go. Do you have a demo that perhaps applies this?

Hi, can you reproduce the accuracy on CIFAR-10 now?

Hi
I am now trying to reproduce the 99% accuracy on CIFAR-10. Since the process is very slow and I have limited GPUs, and I also noticed the previous issue about LaNet, I would like to know which configs I should use.
Thanks a lot,
adamas

Use the K-Means labels or the SVM prediction labels to separate the samples into kid nodes?

Hello, I am re-implementing LA-MCTS according to the paper. My question is:
I use the labels predicted by the SVM to decide which of the parent's samples go to the good kid or the bad kid. However, the SVM-predicted labels are NOT always the same as the K-Means labels. In some extreme cases, the mean of the bad kid was bigger than that of the good kid, due to SVM classification errors.
I checked your code (below) and found that the K-Means labels are used instead of the SVM labels. I don't understand why the clustering labels should be used; they don't represent the split, right?

def split_data(self):
    good_samples = []
    bad_samples = []
    train_good_samples = []
    train_bad_samples = []
    if len(self.samples) == 0:
        return good_samples, bad_samples

    plabel = self.learn_clusters()   # K-Means labels over the node's samples
    self.learn_boundary(plabel)      # SVM boundary fitted on those labels

    for idx in range(0, len(plabel)):
        if plabel[idx] == 0:
            # ensure the consistency
            assert self.samples[idx][-1] - self.fX[idx] <= 1
            good_samples.append(self.samples[idx])
            train_good_samples.append(self.X[idx])
        else:
            bad_samples.append(self.samples[idx])
            train_bad_samples.append(self.X[idx])

    train_good_samples = np.array(train_good_samples)
    train_bad_samples = np.array(train_bad_samples)

    assert len(good_samples) + len(bad_samples) == len(self.samples)

    return good_samples, bad_samples

Questions about one-shot LaNAS

Hi, when I tried to replicate one-shot LaNAS with the source code, I could not get the curve shown in the paper.
Then I checked the code and found a few things that look strange; could you please help me work them out?
(1) In one-shot LaNAS/LaNAS/Classifier.py, it seems that the learning rate is too small and the learned linear model is far from optimal. I think the learning rate should be bigger (such as 0.01); is that true?
(2) With (1) resolved, I found that in the function search_samples_under_constraints in one-shot LaNAS/LaNAS/MCTS.py only a single pair of W and b is considered. Why? When I change the code to consider all Ws and bs, it seems hard to retrieve a sample satisfying all constraints. How can this be solved?
(3) In the search function in one-shot LaNAS/LaNAS/MCTS.py, the tree is updated after each sample is evaluated, but in other scripts (like Distributed_LaNAS and LaNAS_NASBench101) the tree is updated after evaluating 20/50 samples. How often should the "learning phase" be rerun (in other words, how many samples should be drawn in each "searching phase")?
(4) Which Cp should be chosen to replicate the one-shot LaNAS results? It is set very high (Cp=10) in the code, which seems a little unreasonable.

TuRBO experiment reproduction

Hi, I want to reproduce the experiments using TuRBO. It seems that following the steps in https://github.com/facebookresearch/LaMCTS/tree/main/LA-MCTS does not let me successfully run the commented-out code. I have some questions about the current code.

  1. Line 362 in Classifier.py: it seems that we need to pass initial points into the TuRBO agent, but the original TuRBO implementation does not support this customization.
  2. In the paper, you mention that "we restrict the TuRBO to uniformly sample from the intersection of the bounding box and $\Omega_{selected}$", but I could not find this implemented in the current code.

Could you give me some suggestions about TuRBO reproduction?
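
(On point 2, one way the restriction could be approximated is rejection sampling from the bounding box; the sketch below assumes a list of (SVM, expected label) pairs along the selected path and is not taken from the repository.)

import numpy as np

# Hypothetical helper: draw uniform candidates in TuRBO's bounding box and keep
# only those that every SVM along the path to the selected leaf assigns to that
# region (Omega_selected).
def sample_in_intersection(lb, ub, path_classifiers, n_samples, max_tries=100000):
    kept = []
    for _ in range(max_tries):
        x = np.random.uniform(lb, ub)
        if all(svm.predict(x.reshape(1, -1))[0] == label
               for svm, label in path_classifiers):
            kept.append(x)
            if len(kept) == n_samples:
                break
    return np.array(kept)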

Cannot reproduce the results of Ackley-20d in the paper

Hi, thank you for your good work.

Now I am trying to reproduce the results for Ackley-20d in the LAMCTS paper (Fig. 4). I use the same hyper-parameters (Cp=1, leaf size=20) and run:
cd LA-MCTS
python run.py --func ackley --dims 20 --iterations 1000

I have repeated the process 20 times, but I cannot obtain the results reported in the paper. Is there anything I need to change to reproduce them?
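
(For reference, the standard Ackley function being optimized, with its global minimum of 0 at the origin; the exact bounds and constants used in the repo's functions file may differ.)

import numpy as np

def ackley(x, a=20, b=0.2, c=2 * np.pi):
    # Standard d-dimensional Ackley function; global minimum f(0) = 0.
    x = np.asarray(x)
    d = len(x)
    return (-a * np.exp(-b * np.sqrt(np.sum(x ** 2) / d))
            - np.exp(np.sum(np.cos(c * x)) / d) + a + np.e)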

Question regarding the Mujoco environments

Dear authors,

Thank you for open-sourcing the code! I have a few questions regarding the Mujoco environments. First, I wonder how the mean and std arrays of

class Hopper:
    def __init__(self):
        self.mean = np.array([1.41599384, -0.05478602, -0.25522216, -0.25404721,
                              0.27525085, 2.60889529, -0.0085352, 0.0068375,
                              -0.07123674, -0.05044839, -0.45569644])
        self.std = np.array([0.19805723, 0.07824488, 0.17120271, 0.32000514,
                             0.62401884, 0.82814161, 1.51915814, 1.17378372,
                             1.87761249, 3.63482761, 5.7164752])

are computed?

I am also hoping to figure out how to construct the equivalent function-evaluation classes for Walker2d, HalfCheetah, Ant, and Humanoid as reported in the paper. As these are not provided in the repo, could you let me know how to do so? Do they also require hard-coded mean and std values to normalize the policy matrix M?

Thank you very much!
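
(Those arrays look like per-dimension observation statistics used to normalize states, as in ARS. A sketch of how such values could be computed by rolling out a policy and aggregating the visited states; the environment id, the rollout count, the random actions, and the old Gym API below are all assumptions, not the authors' procedure.)

import gym
import numpy as np

# Roll out a policy (random actions here, purely for illustration), collect the
# visited observations, and compute per-dimension mean/std, which could then be
# hard-coded as in the Hopper class above.
env = gym.make("Hopper-v2")
states = []
for _ in range(20):                          # illustrative number of rollouts
    obs = env.reset()
    done = False
    while not done:
        obs, reward, done, info = env.step(env.action_space.sample())
        states.append(obs)
states = np.array(states)
print(states.mean(axis=0))
print(states.std(axis=0))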

Reproducing CIFAR10 experiment

Hi,

firstly, thank you for great work and for sharing your code!

I'm trying to train the best architecture on CIFAR-10, yet I cannot seem to reach the reported numbers, so I guess I must be missing something. I tried both 1500 epochs and the 600 you suggested in another ticket, but the accuracy still seems lower than 99%. Is there perhaps another hyper-parameter setting I need to adjust?

python train.py --auxiliary --batch_size=32 --init_ch=128 --layer=24 --arch='[2, 2, 0, 2, 1, 2, 0, 2, 2, 3, 2, 1, 2, 0, 0, 1, 1, 1, 2, 1, 1, 0, 3, 4, 3, 0, 3, 1]' --model_ema --model-ema-decay 0.9999 --auto_augment --epochs 600

2021-01-28 03:44:52,831 gpu device = 0
2021-01-28 03:44:52,831 args = Namespace(arch='[2, 2, 0, 2, 1, 2, 0, 2, 2, 3, 2, 1, 2, 0, 0, 1, 1, 1, 2, 1, 1, 0, 3, 4, 3, 0, 3, 1]', auto_augment=True, auxiliary=True, auxiliary_weight=0.4, batch_size=32, cutout=False, cutout_length=16, data='../data', drop_path_prob=0.2, epochs=600, exp_path='exp/cifar10', gpu=0, grad_clip=5, init_ch=128, layers=24, lr=0.025, model_ema=True, model_ema_decay=0.9999, model_ema_force_cpu=False, model_path='saved_models', momentum=0.9, report_freq=50, save='checkpoints/auto_aug-600-[2, 2, 0, 2, 1, 2, 0, 2, 2, 3, 2, 1, 2, 0, 0, 1, 1, 1, 2, 1, 1, 0, 3, 4, 3, 0, 3, 1]-128-True-0.9999-0.2-24-16', seed=0, track_ema=False, wd=0.0003)
2021-01-28 03:44:56,852 param size = 44.591498MB
2021-01-28 03:44:59,182 epoch 0 lr 2.499966e-02
...
2021-02-03 00:52:40,405 valid_acc: 98.410000
2021-02-03 00:52:40,405 best_acc: 98.490000
2021-02-03 00:52:42,316 epoch 599 lr 0.000000e+00

or for 1500 epochs respectively:

2021-01-11 10:40:08,600 gpu device = 0
2021-01-11 10:40:08,600 args = Namespace(arch='[2, 2, 0, 2, 1, 2, 0, 2, 2, 3, 2, 1, 2, 0, 0, 1, 1, 1, 2, 1, 1, 0, 3, 4, 3, 0, 3, 1]', auto_augment=True, auxiliary=True, auxiliary_weight=0.4, batch_size=32, cutout=False, cutout_length=16, data='../data', drop_path_prob=0.2, epochs=1500, exp_path='exp/cifar10', gpu=0, grad_clip=5, init_ch=128, layers=24, lr=0.025, model_ema=True, model_ema_decay=0.9999, model_ema_force_cpu=False, model_path='saved_models', momentum=0.9, report_freq=50, save='checkpoints/auto_aug-1500-[2, 2, 0, 2, 1, 2, 0, 2, 2, 3, 2, 1, 2, 0, 0, 1, 1, 1, 2, 1, 1, 0, 3, 4, 3, 0, 3, 1]-128-True-0.9999-0.2-24-16', seed=0, track_ema=False, wd=0.0003)
2021-01-11 10:40:11,371 param size = 44.591498MB
2021-01-11 10:40:12,744 epoch 0 lr 2.499995e-02


2021-01-27 07:02:55,357 valid_acc: 98.250000
2021-01-27 07:02:55,358 best_acc: 98.310000
2021-01-27 07:02:57,400 epoch 1499 lr 5.483112e-08

Thank you!
Lukas

Python package

Are there any plans to add setuptools scripts (or similar), so that LA-MCTS and LaNAS can be packaged and installed through pip? It would greatly simplify the development of projects that would like to use them as dependencies.

I would be happy to help out with this.
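
A minimal packaging sketch for anyone picking this up; the package name, version, and dependency list are placeholders, not an official manifest.

# setup.py (hypothetical, placed at the LA-MCTS folder root)
from setuptools import setup, find_packages

setup(
    name="lamcts",
    version="0.1.0",
    packages=find_packages(),
    install_requires=["numpy", "scikit-learn", "torch"],
)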
