Meta-Sim: Learning to Generate Synthetic Datasets

PyTorch code for Meta-Sim (ICCV 2019). For technical details, please refer to:

Meta-Sim: Learning to Generate Synthetic Datasets
Amlan Kar, Aayush Prakash, Ming-Yu Liu, Eric Cameracci, Justin Yuan, Matt Rusiniak, David Acuna, Antonio Torralba, Sanja Fidler
ICCV, 2019 (Oral)
[Paper] [Video] [Project Page]

Abstract: Training models to high-end performance requires availability of large labeled datasets, which are expensive to get. The goal of our work is to automatically synthesize labeled datasets that are relevant for a downstream task. We propose Meta-Sim, which learns a generative model of synthetic scenes, and obtain images as well as its corresponding ground-truth via a graphics engine. We parametrize our dataset generator with a neural network, which learns to modify attributes of scene graphs obtained from probabilistic scene grammars, so as to minimize the distribution gap between its rendered outputs and target data. If the real dataset comes with a small labeled validation set, we additionally aim to optimize a meta-objective, i.e. downstream task performance. Experiments show that the proposed method can greatly improve content generation quality over a human-engineered probabilistic scene grammar, both qualitatively and quantitatively as measured by performance on a downstream task.

Note: This codebase is a reimplementation of Meta-Sim and currently contains the MNIST experiments from the paper. Some practices are omitted to keep the code simple to use and understand, e.g., testing by generating a static final dataset and training the task network offline, or creating separate validation data (used by the task network) and test data (used to report numbers) for the target distribution. Comments are provided at the appropriate locations for interested users, and the required changes should be simple.
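The validation/test separation mentioned in the note can be sketched as follows. This is a minimal illustration, not code from this repository: `split_target_data`, `val_fraction`, and the file names are hypothetical, and the sketch assumes the target data can be held as a plain Python list of examples.

```python
import random


def split_target_data(examples, val_fraction=0.5, seed=0):
    """Shuffle target examples and split them into disjoint validation and test sets.

    Hypothetical helper: the validation split would be used by the task
    network during training, the test split only to report final numbers.
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    indices = list(range(len(examples)))
    # A fixed seed keeps the split reproducible across runs.
    random.Random(seed).shuffle(indices)
    n_val = int(len(indices) * val_fraction)
    val = [examples[i] for i in indices[:n_val]]
    test = [examples[i] for i in indices[n_val:]]
    return val, test


if __name__ == "__main__":
    # Placeholder names standing in for generated target images.
    target = [f"img_{i:04d}.png" for i in range(1000)]
    val_set, test_set = split_target_data(target, val_fraction=0.5)
    print(len(val_set), len(test_set))
```

Keeping the two splits disjoint means the numbers reported on the test split are not influenced by any tuning done against the validation split.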

Citation

If you use this code, please cite:

@inproceedings{kar2019metasim,
title={Meta-Sim: Learning to Generate Synthetic Datasets},
author={Kar, Amlan and Prakash, Aayush and Liu, Ming-Yu and Cameracci, Eric and Yuan, Justin and Rusiniak, Matt and Acuna, David and Torralba, Antonio and Fidler, Sanja},
booktitle={ICCV},
year={2019}
}

Environment Setup

All the code has been run and tested on Ubuntu 16.04 with Python 3.7 and NVIDIA Titan V GPUs.

  • Clone repository
git clone git@github.com:nv-tlabs/meta-sim.git
cd meta-sim
  • Set up Python environment
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt
export PYTHONPATH=$PWD:$PYTHONPATH
  • Download assets
./scripts/data/download_assets.sh
  • Create target data
python scripts/data/generate_dataset.py --config data/generator/config/mnist_val.json
python scripts/data/generate_dataset.py --config data/generator/config/bigmnist_val.json

Training

First, define an experiment file, such as mnist_rot.yaml. Then run train.py as follows:

# For MNIST rotation of digits experiment
python scripts/train/train.py --exp experiments/mnist_rot.yaml

The synthetic images generated for the task network in each training epoch should be available in the {logdir} inside the appropriate experiment directory. The model should gradually learn to rotate the digits, which is visible in the generated images as training progresses.

Getting Started: To get your hands dirty, train.py is the place to start.

Tips:

  • Training with the task loss is slow, since each gradient update requires a lot of computation. For larger experiments, we train with MMD alone first, then finetune with the task loss. Here, both are enabled by default. Depending on initialization, training can take a long time to converge, but in our experience it always converges eventually.
  • Distribution matching needs enough target data to work properly. Here, for example, we synthetically generate 1000 examples to use as target data, which is sometimes not enough because the diversity of the generated data varies with the randomness of generation. If you face issues, try increasing the size by modifying the config file used by the data generation script.
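To make the distribution-matching term in the tips above concrete, here is a minimal, dependency-free sketch of a squared MMD estimate between two sets of feature vectors. It is an illustration under stated assumptions, not the implementation used in this repository: the RBF kernel, the `bandwidth` value, the 2-D toy features, and all function names are hypothetical choices.

```python
import math
import random


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two feature vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared MMD between samples xs and ys."""

    def mean_kernel(a, b):
        total = sum(rbf_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def sample_gaussian(n, mean, rng):
    """Toy 2-D features drawn from an isotropic unit-variance Gaussian."""
    return [[rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    target = sample_gaussian(100, 0.0, rng)
    matched = sample_gaussian(100, 0.0, rng)  # same distribution as target
    shifted = sample_gaussian(100, 3.0, rng)  # clearly different distribution
    print("matched:", mmd_squared(target, matched))
    print("shifted:", mmd_squared(target, shifted))
```

Even for two samples from the same distribution the estimate is not exactly zero, and this residual tends to shrink as the sample size grows, which is one way to see why a small target set can make distribution matching noisy.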

meta-sim's Issues

Why sample actions from a Gaussian distribution instead of using continuous control methods, e.g., DDPG?

Hi developers,

Thank you for releasing the code.

While reading your code, I found it interesting that you sample actions from a Gaussian distribution and control the mean of that Gaussian (link).

There are continuous control alternatives that could also be applied to this problem, e.g., PPO or DDPG. Could you please elaborate on why you chose your method and how it helps the Meta-Sim framework?

Best
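For readers unfamiliar with the setup the question describes, the sketch below shows what sampling an action from a Gaussian with a learnable mean can look like, and how the mean can be updated with a score-function (REINFORCE-style) gradient. It is a toy illustration only: the use of a score-function estimator, the fixed standard deviation, the quadratic reward, and all names are assumptions made for this example, not confirmed details of the repository, and it does not answer the question about PPO or DDPG.

```python
import random


def grad_log_prob_wrt_mean(action, mean, std):
    """Derivative of log N(action; mean, std^2) with respect to the mean."""
    return (action - mean) / std ** 2


def reinforce_step(mean, std, reward_fn, rng, lr=0.05, n_samples=128):
    """One score-function update of the Gaussian mean, with a mean-reward baseline."""
    actions = [rng.gauss(mean, std) for _ in range(n_samples)]
    rewards = [reward_fn(a) for a in actions]
    baseline = sum(rewards) / n_samples  # subtracting it reduces gradient variance
    grad = sum(
        (r - baseline) * grad_log_prob_wrt_mean(a, mean, std)
        for a, r in zip(actions, rewards)
    ) / n_samples
    return mean + lr * grad  # gradient ascent on the expected reward


def toy_reward(action):
    """Hypothetical reward that peaks at action = 2.0."""
    return -(action - 2.0) ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    mean = 0.0
    for _ in range(300):
        mean = reinforce_step(mean, 0.5, toy_reward, rng)
    print("learned mean:", mean)
```

The reward function is only ever evaluated, never differentiated, so this style of update also applies when the reward comes from a non-differentiable process.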
