
far-ho's Introduction

FAR-HO

Gradient-based hyperparameter optimization and meta-learning package based on TensorFlow

This is the new package that implements the algorithms presented in the paper Forward and Reverse Gradient-Based Hyperparameter Optimization. For the older package see RFHO. FAR-HO features simplified interfaces, additional capabilities and a tighter integration with TensorFlow.

  • Reverse hypergradient (ReverseHG), a generalization of the algorithms presented in Domke [2012] and Maclaurin et al. [2015] (without reversible dynamics and "reversible dtype"); includes the truncated version ReverseHG.truncated, see Truncated Back-propagation for Bilevel Optimization
  • Forward hypergradient (ForwardHG)
  • Online versions of the two previous algorithms: Real-Time HO (RTHO) and Truncated-Reverse HO (TRHO)
  • Implicit differentiation (ImplicitHG), which can be used to implement HOAG algorithm [Pedregosa, 2016]

These algorithms compute, with different procedures, the (approximate) gradient of an outer objective, such as a validation error, with respect to the outer variables (e.g. hyperparameters). We call this gradient the hypergradient. The "online" algorithms may perform several updates of the outer variables before the final iteration of the inner problem, and are in general much faster than their "batch" counterparts. This procedure is linked to warm restarts for solving the inner optimization problem, but the resulting hypergradient is, in general, biased.

IMPORTANT NOTE: This is not a plug-and-play hyperparameter optimization package, but rather a research package that collects some useful methods aimed at simplifying the creation of experiments in gradient-based hyperparameter optimization and related areas. Compared to other HPO packages, a more specific problem structure is required here. Furthermore, depending on the specific problem, performance may be somewhat sensitive to algorithmic parameters. As an important example, the inner optimization dynamics should not diverge in order for the hypergradients to yield useful information [ Troubleshooting section coming soon! ].

NOTE II: In Italian, FARO means beacon or lighthouse (so... no "H", but the "H" is silent in Italian anyway!).

These algorithms are also useful in meta-learning, where the parameters of various meta-learners effectively play the role of outer variables, as explained in the workshop paper A Bridge Between Hyperparameter Optimization and Learning-to-learn and in Bilevel Programming for Hyperparameter Optimization and Meta-Learning.

This package is also described in the workshop paper Far-HO: A Bilevel Programming Package for Hyperparameter Optimization and Meta-Learning, presented at the AutoML 2018 workshop at ICML.

Installation & Dependencies

Clone the repository and run the setup script.

git clone https://github.com/lucfra/FAR-HO.git
cd FAR-HO
python setup.py install

Besides the "usual" packages (numpy), FAR-HO is built upon TensorFlow. Some examples depend on the package experiment_manager, while automatic dataset download (Omniglot) requires datapackage.

Please note that required packages will not be installed automatically.

Overview

The aim of this package is to implement and develop gradient-based hyperparameter optimization (HO) techniques in TensorFlow, thus making them readily applicable to deep learning systems. These optimization techniques also find natural applications in the fields of meta-learning and learning-to-learn. Feel free to submit issues with comments, suggestions and feedback! You can email me at [email protected] .

Quick Start

Core Steps

  • Create a model1 with TensorFlow
  • Create the hyperparameters you wish to optimize2 with the function get_hyperparameter (these could also be variables of your model)
  • Define an inner objective (e.g. a training error) and an outer objective (e.g. a validation error) as scalar tensorflow.Tensors
  • Create an instance of HyperOptimizer, choosing a hypergradient computation algorithm among ForwardHG, ReverseHG and ImplicitHG (see next section)
  • Call the function HyperOptimizer.minimize, passing the outer and inner objectives, as well as an optimizer for the outer problem (which can be any optimizer from TensorFlow) and an optimizer for the inner problem (which must be an optimizer contained in this package; at the moment gradient descent, gradient descent with momentum and Adam are available, but it should be quite straightforward to implement other optimizers; email me if you're interested!)
  • Execute the HyperOptimizer.run(T, ...) function inside a tensorflow.Session to optimize the inner variables (parameters) and perform one step of optimization of the outer variables (hyperparameters).

Two scripts in the folder autoMLDemos showcase typical usage of this package:

import far_ho as far
import tensorflow as tf

model = create_model(...)  

lambda1 = far.get_hyperparameter('lambda1', ...)
lambda2 = far.get_hyperparameter('lambda2', ...)
io, oo = create_objective(...)

inner_problem_optimizer = far.GradientDescentOptimizer(lr=far.get_hyperparameter('lr', 0.1))
outer_problem_optimizer = tf.train.AdamOptimizer()

farho = far.HyperOptimizer() 
ho_step = farho.minimize(oo, outer_problem_optimizer,
                     io, inner_problem_optimizer)

T = 100
with tf.Session().as_default():
  for _ in range(100):
    ho_step(T)    

1 This is gradient-based optimization, and the computation of the hypergradients involves second-order derivatives of the training error (even though no Hessian matrix is explicitly formed at any time); therefore, all the ops used in the model should have a second-order derivative registered in TensorFlow.
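
To make this concrete, here is a minimal, package-independent sketch (toy names, not part of the FAR-HO API) of the kind of second-order computation TensorFlow performs under the hood: a Hessian-vector product obtained by differentiating twice, which only works if every op involved has a registered second-order gradient.

import tensorflow as tf

w = tf.Variable([1.0, 2.0])                 # toy model parameters
loss = tf.reduce_sum(tf.square(w))          # toy "training error"
v = tf.constant([0.5, 0.5])                 # vector to multiply with the Hessian

grad = tf.gradients(loss, w)[0]             # first-order gradient dL/dw
hvp = tf.gradients(grad * v, w)[0]          # Hessian-vector product H v, via a second differentiation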

2 For the hypergradients to make sense, hyperparameters should be real-valued. Moreover, while ReverseHG should handle generic rank-r tensor hyperparameters, ForwardHG requires scalar hyperparameters. Use the keyword argument scalar=True in get_hyperparameter to obtain a scalar splitting of a generic tensor.
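
For instance, a sketch following the call style and imports of the Quick Start above (the name 'gamma' and its initializer are made up for illustration):

# Hypothetical example: a vector of per-example weights treated as hyperparameters.
# scalar=True asks for a scalar splitting of the tensor, so it can also be used with ForwardHG.
gamma = far.get_hyperparameter('gamma', tf.zeros(100), scalar=True)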

Which Algorithm Do I Choose?

Forward and Reverse-HG compute the same hypergradient, so the choice is a matter of time versus memory!

The online versions of the algorithms can dramatically speed-up the optimization.
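
Whichever variant you pick, in code the choice amounts to which hypergradient object is passed to the HyperOptimizer constructor (the same pattern appears in the issue threads below). A rough sketch, following the analysis in the paper:

# ReverseHG (the default): memory grows with the number of inner iterations T,
# running time is roughly independent of the number of hyperparameters.
farho_reverse = far.HyperOptimizer(hypergradient=far.ReverseHG())

# ForwardHG: memory does not grow with T, but running time grows with the
# number of (scalar) hyperparameters.
farho_forward = far.HyperOptimizer(hypergradient=far.ForwardHG())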

The Idea Behind: Hyperparameter Optimization

The objective is to minimize some validation function E with respect to a vector of hyperparameters lambda. The validation error depends on the model output and thus on the model parameters w. w should be a minimizer of the training error, so the hyperparameter optimization problem can be naturally formulated as a bilevel optimization problem. Since these problems are rather hard to tackle, we explicitly take into account the learning dynamics used to obtain the model parameters (think, for instance, of stochastic gradient descent with momentum) and formulate HO as a constrained optimization problem. See the paper for details.
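
In symbols (a sketch in the notation of the paper, where E is the outer/validation objective, L the inner/training objective and Phi_t the optimization dynamics), the bilevel problem and its dynamics-constrained reformulation read:

\min_{\lambda} \; E(w_\lambda, \lambda) \quad \text{subject to} \quad w_\lambda \in \arg\min_{w} L(w, \lambda)

\min_{\lambda} \; E(w_T, \lambda) \quad \text{subject to} \quad w_{t+1} = \Phi_t(w_t, \lambda), \quad t = 0, \dots, T-1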

New features and differences from RFHO

  • Simplified interface: optimize parameters and hyperparameters with "just" a call of far.HyperOptimizer.minimize, create variables designated as hyperparameters with far.get_hyperparameter, no more need to vectorize the model weights; far.optimizers only need to specify the update as a list of pairs (v, v_{k+1})
  • Additional capabilities: set an initialization dynamics and optimize the (distribution of) initial weights (see the sketch after this list), explicit dependence of the outer objective on the hyperparameters is allowed, support for multiple outer objectives and multiple inner problems (episode batching, averaging, sampling from distributions, ...)
  • Tighter integration: collections for hyperparameters and hypergradients (use far.GraphKeys), use out-of-the-box models (no need to vectorize the model), use any TensorFlow optimizer for the outer objective (validation error)
  • Lighter package: only code for implementing the algorithms and running the examples
  • Forward hypergradient methods have been reimplemented with a double reverse mode trick, thanks to Jamie Townsend.
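
Regarding the initialization dynamics mentioned above, here is a speculative sketch built around the init_dynamics_dict argument of HyperOptimizer.minimize (the argument name appears in the package, but the dictionary format used here, mapping a model variable to a hyperparameter holding its initial value, is an assumption; please check the docstring and the examples before relying on it):

# Hypothetical usage: treat the initial value of a model variable as a hyperparameter,
# so that the outer problem optimizes the initialization (as in initialization-based meta-learning).
w = tf.get_variable('w', shape=(5,))              # a toy model variable
w0 = far.get_hyperparameter('w0', tf.zeros(5))    # its initial value, declared as a hyperparameter

farho = far.HyperOptimizer()
ho_step = farho.minimize(oo, outer_problem_optimizer,   # objectives and optimizers as in the Quick Start
                         io, inner_problem_optimizer,
                         init_dynamics_dict={w: w0})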

Citing

If you use this package, please cite:

@InProceedings{franceschi2017forward,
  title = 	 {Forward and Reverse Gradient-Based Hyperparameter Optimization},
  author = 	 {Luca Franceschi and Michele Donini and Paolo Frasconi and Massimiliano Pontil},
  booktitle = 	 {Proceedings of the 34th International Conference on Machine Learning},
  pages = 	 {1165--1173},
  year = 	 {2017},
  volume = 	 {70},
  series = 	 {Proceedings of Machine Learning Research},
  publisher = 	 {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v70/franceschi17a/franceschi17a.pdf},
}
For works on meta-learning, please also cite:
@InProceedings{franceschi2018bilevel,
  title = 	 {Bilevel Programming for Hyperparameter Optimization and Meta-learning},
  author = 	 {Luca Franceschi and Paolo Frasconi and Saverio Salzo and Riccardo Grazzi and Massimiliano Pontil},
  booktitle = 	 {Proceedings of the 35th International Conference on Machine Learning (ICML 2018)},
  year = 	 {2018},
  series = 	 {Proceedings of Machine Learning Research},
  publisher = 	 {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v80/franceschi18a/franceschi18a.pdf},
}
@article{franceschi2017bridge,
  title={A Bridge Between Hyperparameter Optimization and Learning-to-learn},
  author={Franceschi, Luca and Frasconi, Paolo and Donini, Michele and Pontil, Massimiliano},
  journal={arXiv preprint arXiv:1712.06283},
  year={2017}
}

This package has been used for the project LDS-GNN: the code for the ICML 2019 paper "Learning Discrete Structures for Graph Neural Networks".

far-ho's People

Contributors

jmikko, lucfra, prolearner, simonepanicucci


far-ho's Issues

20-way classification on MiniImageNet h5 error

Hi @lucfra ,
I ran the same code with CLASSES=20 with your h5 files. However, I get an error. Here is the code:

from far_ho.examples.hyper_representation import train, mini_imagenet_model

if __name__ == '__main__':
  CLASSES = 20
  SHOTS = 1
  META_BATCH_SIZE = 4
  from experiment_manager.datasets import load
  mini_imagenet = load.meta_mini_imagenet(std_num_classes=CLASSES,
                                          std_num_examples=(SHOTS*CLASSES, 15*CLASSES), h5=True)
  res = train(mini_imagenet, 'test', mini_imagenet_model, T=1, print_every=1000, MBS=META_BATCH_SIZE, n_episodes_testing=150, patience=20)

The error is related to the dataset.

Traceback (most recent call last):
  File "run_hyper.py", line 11, in <module>
    res = train(mini_imagenet, 'maml', mini_imagenet_model, T=1, print_every=500, MBS=META_BATCH_SIZE, n_episodes_testing=150, patience=20)
  File "/home/amir/.conda/envs/farho/lib/python3.5/site-packages/far_ho/examples/hyper_representation.py", line 245, in train
    farho.run(T[0], trfd, vfd)  # one iteration of optimization of representation variables (hyperparameters)
  File "/home/amir/.conda/envs/farho/lib/python3.5/site-packages/experiment_manager/savers/records.py", line 72, in _saver_wrapped
    self._execute_save(res, *args, **kwargs)
  File "/home/amir/.conda/envs/farho/lib/python3.5/site-packages/experiment_manager/savers/records.py", line 125, in _execute_save
    super()._execute_save(res, *args, **kwargs)
  File "/home/amir/.conda/envs/farho/lib/python3.5/site-packages/experiment_manager/savers/records.py", line 98, in _execute_save
    _res=res)
  File "/home/amir/.conda/envs/farho/lib/python3.5/site-packages/experiment_manager/savers/save_and_load.py", line 535, in save
    rss = _compute_value(pt, save_dict)
  File "/home/amir/.conda/envs/farho/lib/python3.5/site-packages/experiment_manager/savers/save_and_load.py", line 516, in _compute_value
    if callable(pt[1]) else _tf_run_catch_not_initialized(pt, _partial_save_dict)
  File "/home/amir/.conda/envs/farho/lib/python3.5/site-packages/experiment_manager/savers/save_and_load.py", line 492, in _maybe_call
    _out = _method()
  File "/home/amir/.conda/envs/farho/lib/python3.5/site-packages/far_ho/examples/hyper_representation.py", line 157, in <lambda>
    'FLAT', lambda: accs_and_errs(metasets),
  File "/home/amir/.conda/envs/farho/lib/python3.5/site-packages/far_ho/examples/hyper_representation.py", line 133, in accs_and_errs
    for _d in meta_dataset.generate(n_episodes_testing, batch_size=MBS, rand=0):
  File "/home/amir/.conda/envs/farho/lib/python3.5/site-packages/experiment_manager/datasets/structures.py", line 265, in generate
    yield self.generate_batch(batch_size, rand=rand, *args, **kwargs)
  File "/home/amir/.conda/envs/farho/lib/python3.5/site-packages/experiment_manager/datasets/structures.py", line 271, in generate_batch
    return [self.generate_datasets(rand, *args, **kwargs) for _ in range(batch_size)]
  File "/home/amir/.conda/envs/farho/lib/python3.5/site-packages/experiment_manager/datasets/structures.py", line 271, in <listcomp>
    return [self.generate_datasets(rand, *args, **kwargs) for _ in range(batch_size)]
  File "/home/amir/.conda/envs/farho/lib/python3.5/site-packages/experiment_manager/datasets/load.py", line 400, in generate_datasets
    random_classes = rand.choice(list(clss.keys()), size=(num_classes,), replace=False)
  File "mtrand.pyx", line 1437, in mtrand.RandomState.choice (numpy/random/mtrand/mtrand.c:17481)
ValueError: Cannot take a larger sample than population when 'replace=False'

Could you please let me know how to fix this?

pytorch support

This is a nice package for HPO, but it seems that it is built only on TF.
Just wondering, is there any version that supports PyTorch?
I think that would make the package more popular among HPO researchers!

'FAR-HO/tests/check forward.ipynb' is not working!

When I run this notebook I get the following error:

First I am told that the param lambda is not even connected with the model, which does not make any sense.

Then it fails at the assertion at line 113 in file utils.py.

Please help; it would also be really helpful if you could provide an example notebook with ForwardHG for MNIST, both with and without online learning ...

Thanks in advance ...
Habib

//----------------------------------------------------------------------------------------------------------

Hyperparameter <tf.Variable 'lambda_components/0:0' shape=() dtype=float32_ref> is detached from this optimization dynamics.


AssertionError Traceback (most recent call last)
in ()
----> 1 ss, farho, cost, oo = _test(far.ForwardHG)
2
3 tf.global_variables_initializer().run()
4
5 # execution with gradient descent! (looks ol)

in _test(method)
25 optim_oo = tf.train.AdamOptimizer(.01)
26 farho = far.HyperOptimizer(hypergradient=method())
---> 27 farho.minimize(oo, optim_oo, cost, io_optim)
28 return ss, farho, cost, oo

/volume1/scratch/r0605927/backMeUpPlz/lib/python2.7/site-packages/far_ho/hyper_parameters.pyc in minimize(self, outer_objective, outer_objective_optimizer, inner_objective, inner_objective_optimizer, hyper_list, var_list, init_dynamics_dict, global_step, aggregation_fn, process_fn)
132 """
133 optim_dict = self.inner_problem(inner_objective, inner_objective_optimizer, var_list, init_dynamics_dict)
--> 134 self.outer_problem(outer_objective, optim_dict, outer_objective_optimizer, hyper_list, global_step)
135 return self.finalize(aggregation_fn=aggregation_fn, process_fn=process_fn)
136

/volume1/scratch/r0605927/backMeUpPlz/lib/python2.7/site-packages/far_ho/hyper_parameters.pyc in outer_problem(self, outer_objective, optim_dict, outer_objective_optimizer, hyper_list, global_step)
117 :return: itself
118 """
--> 119 hyper_list = self._hypergradient.compute_gradients(outer_objective, optim_dict, hyper_list=hyper_list)
120 self._h_optim_dict[outer_objective_optimizer].update(hyper_list)
121 self._global_step = global_step

/volume1/scratch/r0605927/backMeUpPlz/lib/python2.7/site-packages/far_ho/hyper_gradients.pyc in compute_gradients(self, outer_objective, optimizer_dict, hyper_list)
343 # d_E_T = dot(vectorize_all(d_oo_d_state), vectorize_all(zs))
344 d_E_T = [dot(d_oo_d_s, z) for d_oo_d_s, z in zip(d_oo_d_state, zs)
--> 345 if d_oo_d_s is not None and z is not None]
346 hg = maybe_add(tf.reduce_sum(d_E_T), d_oo_d_hyp) # this is right... the error is not here!
347 # hg = maybe_add(d_E_T, d_oo_d_hyp)

/volume1/scratch/r0605927/backMeUpPlz/lib/python2.7/site-packages/far_ho/utils.pyc in dot(a, b, name)
111 Dot product between vectors a and b with optional name
112 """
--> 113 assert a.shape.ndims == 1, '{} must be a vector'.format(a)
114 assert b.shape.ndims == 1, '{} must be a vector'.format(b)
115 with tf.name_scope(name, 'Dot', [a, b]):

AssertionError: Tensor("Mean_1_1/gradients/mul_5_grad/Reshape:0", shape=(2, 3), dtype=float32) must be a vector

HyperGradient Computation Methods Are Not Isolated...

Hi,

I wrote the following code to compare the hyper-gradient computed by ReverseHG and ForwardHG methods in the same file:

### ReverseHG
farho = far.HyperOptimizer()
hypergradient = farho.hypergradient
run = farho.minimize(val_loss, oo_optim, tr_loss, io_optim)
grads_hvars = [hypergradient.hgrads_hvars(hyper_list=hll)
    for opt, hll in farho._h_optim_dict.items()]
run(T, inner_objective_feed_dicts=tr_supplier, outer_objective_feed_dicts=val_supplier, _skip_hyper_ts=True)
grads_hvars_val = ss.run(grads_hvars, _opt_fd(farho._global_step, val_supplier))
print(grads_hvars_val)


### ForwardHG
hypergradient_fwd = far.ForwardHG()
farho_fwd = far.HyperOptimizer(hypergradient=hypergradient_fwd)
run_fwd = farho_fwd.minimize(val_loss, oo_optim, tr_loss, io_optim)
grads_hvars_fwd = [hypergradient_fwd.hgrads_hvars(hyper_list=hll)
    for opt, hll in farho_fwd._h_optim_dict.items()]
run_fwd(T, inner_objective_feed_dicts=tr_supplier, outer_objective_feed_dicts=val_supplier, _skip_hyper_ts=True)
grads_hvars_fwd_val = ss.run(grads_hvars_fwd, _opt_fd(farho_fwd._global_step, val_supplier))
print(grads_hvars_fwd_val)

They receive identical inputs and compute the hypergradient for the same hyper-variable (_skip_hyper_ts=True, so the hyperparameter remains unchanged), but for some reason their outputs are quite different. I noticed that if I run them in separate files (with a fixed random seed), or run the ForwardHG block before the ReverseHG block, their outputs are similar. I cannot see how reverse and forward hypergradient computations can affect each other, as they don't share any variables. Could you please explain how these two methods can be run in the same file?

I have also attached the complete python code for this experiment.

cp.py.zip

Issues about far.AdamOptimizer() with ReverseHG

We are working on a sparse logistic regression task on the 20newsgroups dataset and want to find the best regularization lambda. When we tried to use far.AdamOptimizer() as the inner optimizer and ReverseHG() as the hypergradient method, lambda goes to NaN. We found that:

  1. The hypergradient is an uninitialized value in this situation.
  2. The same setting works perfectly with ForwardHG() and far.AdamOptimizer().
  3. A small learning rate for far.AdamOptimizer() will work, but it is not efficient.
  4. A dense version fails too; the reason may be the large feature dimension?

Best regards,
Xiang Geng

Issues about optimizing other parameters besides learning rate

I have emailed Luca Franceschi about some issues with this library, and he asked me to share it.
I've been working on an MLP and wanted to optimize the following parameters, but found some problems:

  • Keep probability of a dropout layer: Luca explained to me that this is not possible since it has some non-differentiable points.
  • Regularization beta: We are using tf.nn.l2_loss, but can't optimize the beta.
  • AdamOptimizer: When we tried to use far.AdamOptimizer() for the inner optimizer the code started crashing. Apparently there are some undefined variables: _beta1_power and _beta2_power. I think this is an error in the library.
    Until now we have been able to optimize only the learning rate. It would be great if there could be a list of the things you can and can't do with this library.

Best regards,
Nicolás Zorzano.

Python Version

Hi,

Thanks for providing the code. It is well written and very easy to use. I was wondering which versions of Python and TensorFlow you used for testing?

Reproducing mini imagenet results

Hi, I was wondering if you could share code that can reproduce the mini imagenet results in your workshop paper. I have tried a couple of different learning rates and the best one-shot test accuracy I could get was around 43%. I used T=4 as mentioned in the paper.

Thanks,
Haamoon

hyper_representation.py OOM error on default miniImageNet settings

Hi @lucfra
While trying miniImageNet with a meta batch size > 1, I got out-of-memory errors (tried on GTX 1080 and Titan X GPUs). Below is the code:

from hyper_representation import train, mini_imagenet_model

if __name__ == '__main__':
  CLASSES = 5
  SHOTS = 1
  META_BATCH_SIZE = 2
  from experiment_manager.datasets import load
  mini_imagenet = load.meta_mini_imagenet(std_num_classes=CLASSES,
                                          std_num_examples=(SHOTS*CLASSES, 15*CLASSES), h5=False, load_all_images=True)
  res = train(mini_imagenet, 'maml', mini_imagenet_model, T=1, print_every=500, MBS=META_BATCH_SIZE, n_episodes_testing=150, patience=20)

Could you please let me know how you ran the miniImageNet experiments?

Thanks
