
EmbodiedQA

Code for the paper

Embodied Question Answering
Abhishek Das, Samyak Datta, Georgia Gkioxari, Stefan Lee, Devi Parikh, Dhruv Batra
arxiv.org/abs/1711.11543
CVPR 2018 (Oral)

In Embodied Question Answering (EmbodiedQA), an agent is spawned at a random location in a 3D environment and asked a question (e.g. "What color is the car?"). In order to answer, the agent must intelligently navigate to explore the environment, gather the necessary visual information through first-person vision, and then answer the question ("orange").

This repository provides code for programmatic question generation, a pretrained CNN feature extractor, and training code for the visual question answering, navigation, and end-to-end EmbodiedQA models described below.

If you find this code useful, consider citing our work:

@inproceedings{embodiedqa,
  title={{E}mbodied {Q}uestion {A}nswering},
  author={Abhishek Das and Samyak Datta and Georgia Gkioxari and Stefan Lee and Devi Parikh and Dhruv Batra},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2018}
}

Setup

virtualenv -p python3 .env
source .env/bin/activate
pip install -r requirements.txt

Download the SUNCG v1 dataset and install House3D.

NOTE: This code uses a fork of House3D with a few changes to support arbitrary map discretization resolutions.

Question generation

Questions for EmbodiedQA are generated programmatically, in a manner similar to CLEVR (Johnson et al., 2017).

NOTE: Pre-generated EQA v1 questions are available for download here.

Generate questions for all templates in EQA v1 and v1-extended

cd data/question-gen
./run_me.sh MM_DD

List defined question templates

from engine import Engine

E = Engine()
for i in E.template_defs:
    print(i, E.template_defs[i])

Generate questions for a particular template (say location)

from house_parse import HouseParse
from engine import Engine

Hp = HouseParse(dataDir='/path/to/suncg')
Hp.parse('0aa5e04f06a805881285402096eac723')

E = Engine()
E.cacheHouse(Hp)
qns = E.executeFn(E.template_defs['location'])

print(qns[0]['question'], qns[0]['answer'])
# what room is the clock located in? bedroom

Pretrained CNN

We trained a shallow encoder-decoder CNN from scratch in the House3D environment, for RGB reconstruction, semantic segmentation and depth estimation. Once trained, we throw away the decoders, and use the encoder as a frozen feature extractor for navigation and question answering. The CNN is available for download here:

wget https://www.dropbox.com/s/ju1zw4iipxlj966/03_13_h3d_hybrid_cnn.pt

The training code expects the checkpoint to be present in training/models/.
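For reference, a minimal sketch of loading this checkpoint and freezing the encoder (the MultitaskCNN class name appears in training/models.py, but the constructor arguments, checkpoint layout, and input resolution below are assumptions):

import torch
from models import MultitaskCNN  # class name from training/models.py

cnn = MultitaskCNN()  # hypothetical: actual constructor args may differ
ckpt = torch.load('training/models/03_13_h3d_hybrid_cnn.pt', map_location='cpu')
cnn.load_state_dict(ckpt)  # assumed: the checkpoint may instead nest a state dict
cnn.eval()
for p in cnn.parameters():
    p.requires_grad = False  # frozen feature extractor

frames = torch.randn(5, 3, 224, 224)  # assumed input resolution
with torch.no_grad():
    feats = cnn(frames)  # encoder features fed to the nav/VQA models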

Supervised Learning

Download and preprocess the dataset

Download EQA v1 and shortest path navigations:

wget https://www.dropbox.com/s/6zu1b1jzl0qt7t1/eqa_v1.json
wget https://www.dropbox.com/s/lhajthx7cdlnhns/a-star-500.zip
unzip a-star-500.zip

If this is the first time you are using SUNCG, you will have to clone the SUNCG toolbox and use it to generate obj + mtl files for the houses in EQA:

cd utils
python make_houses.py \
    -eqa_path /path/to/eqa.json \
    -suncg_toolbox_path /path/to/SUNCGtoolbox \
    -suncg_data_path /path/to/suncg/data_root

NOTE: Shortest paths have been updated. Earlier we computed shortest paths using a discrete grid world, but we found that these shortest paths were sometimes inaccurate. Old shortest paths are here.

Preprocess the dataset for training

cd training
python utils/preprocess_questions_pkl.py \
    -input_json /path/to/eqa_v1.json \
    -shortest_path_dir /path/to/shortest/paths/a-star-500 \
    -output_train_h5 data/train.h5 \
    -output_val_h5 data/val.h5 \
    -output_test_h5 data/test.h5 \
    -output_data_json data/data.json \
    -output_vocab data/vocab.json
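To sanity-check the preprocessing output, the generated HDF5 files can be inspected without assuming specific key names; a quick sketch:

import h5py

# Print every dataset in the preprocessed file with its shape and dtype.
with h5py.File('data/train.h5', 'r') as f:
    f.visititems(lambda name, obj: print(name,
                 getattr(obj, 'shape', ''), getattr(obj, 'dtype', '')))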

Visual question answering

Update the pretrained CNN path in models.py.

python train_vqa.py -input_type ques,image -identifier ques-image -log -cache

This model computes question-conditioned attention over the last 5 frames from oracle navigation (shortest paths) and predicts the answer. Assuming shortest paths are optimal for answering the question -- which holds for most questions in EQA v1 (location, color, place preposition), with the exception of a few location questions that may need more visual context than walking right up to the object -- this can be thought of as an upper bound on expected accuracy; performance will degrade when navigation trajectories are sampled from trained policies.
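To illustrate the mechanism, a minimal sketch of question-conditioned attention over the last 5 frame features (the dimensions and the single projection layer are assumptions, not the repository's exact architecture):

import torch
import torch.nn as nn
import torch.nn.functional as F

class QuestionConditionedAttention(nn.Module):
    """Score each frame feature against the question encoding and
    return the attention-weighted sum of the frames."""
    def __init__(self, img_dim=3200, ques_dim=64):  # assumed sizes
        super().__init__()
        self.proj = nn.Linear(ques_dim, img_dim)

    def forward(self, frame_feats, ques_enc):
        # frame_feats: (T=5, img_dim); ques_enc: (ques_dim,)
        q = self.proj(ques_enc)                   # project question into image space
        attn = F.softmax(frame_feats @ q, dim=0)  # attention over the T frames
        return attn @ frame_feats                 # (img_dim,) pooled feature

att = QuestionConditionedAttention()
pooled = att(torch.randn(5, 3200), torch.randn(64))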

A pretrained VQA model is available for download here. This gets a top-1 accuracy of 61.54% on val, and 58.46% on test (with GT navigation).

Note that keeping the cache flag ON caches images as they are rendered during the first training epoch, so subsequent epochs are very fast. This is memory-intensive, though, and consumes ~100-120 GB of RAM.
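The caching amounts to memoizing rendered frames keyed by house and position; a toy sketch of the idea (render_frame() is a hypothetical stand-in for the House3D render call, not the repository's actual data loader):

import numpy as np

def render_frame(house_id, position):
    # stand-in for the expensive House3D render
    return np.zeros((224, 224, 3), dtype=np.uint8)

_frame_cache = {}

def get_frame(house_id, position):
    key = (house_id, tuple(position))
    if key not in _frame_cache:
        _frame_cache[key] = render_frame(house_id, position)  # slow, first epoch only
    return _frame_cache[key]  # served from RAM in later epochs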

Navigation

Download potential maps for evaluating navigation and training with REINFORCE.

wget https://www.dropbox.com/s/53edqtr04jts4q0/target-obj-conn-maps-500.zip
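For intuition, a potential map of this kind can be built by breadth-first search outward from the target object's cells over the free-space grid; the d_T navigation metric is then just the map value at the agent's final cell. A self-contained sketch (the grid encoding and function name are assumptions):

from collections import deque
import numpy as np

def potential_map(free, targets):
    # free: 2D bool array of traversable cells
    # targets: list of (row, col) cells at/adjacent to the target object
    dist = np.full(free.shape, np.inf)
    queue = deque()
    for t in targets:
        dist[t] = 0
        queue.append(t)
    while queue:  # BFS: expand in rings of increasing distance
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < free.shape[0] and 0 <= nc < free.shape[1]
                    and free[nr, nc] and dist[nr, nc] == np.inf):
                dist[nr, nc] = dist[r, c] + 1
                queue.append((nr, nc))
    return dist  # d_T = dist[final_row, final_col]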

Planner-controller policy

python train_nav.py -model_type pacman -identifier pacman -log
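For orientation, the planner-controller (PACMAN) decomposition works roughly as follows: the planner picks a high-level action (forward, turn-left, turn-right, stop), and the controller then decides, step by step, whether to keep executing that action or hand control back to the planner. A schematic sketch with hypothetical planner/controller/env interfaces, not the repository's exact ones:

def run_episode(planner, controller, env, max_steps=100):
    obs, steps = env.reset(), 0
    while steps < max_steps:
        action = planner.act(obs)      # high-level action choice
        if action == 'stop':
            break                      # planner decides the agent is done
        while steps < max_steps:
            obs = env.step(action)     # execute one primitive step
            steps += 1
            if not controller.keep_going(obs, action):
                break                  # return control to the planner
    return obs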

REINFORCE

python train_eqa.py \
    -nav_checkpoint_path /path/to/nav/ques-image-pacman/checkpoint.pt \
    -ans_checkpoint_path /path/to/vqa/ques-image/checkpoint.pt \
    -identifier ques-image-eqa \
    -log
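The REINFORCE update itself reduces to scaling the negative log-probabilities of the sampled actions by the received reward minus a baseline; a minimal PyTorch sketch (names are illustrative):

import torch

def reinforce_loss(log_probs, reward, baseline=0.0):
    # log_probs: list of 0-dim tensors, one per sampled action in the rollout
    advantage = reward - baseline  # baseline reduces gradient variance
    return -(torch.stack(log_probs) * advantage).sum()

# usage: loss = reinforce_loss(episode_log_probs, episode_reward)
#        loss.backward(); optimizer.step()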

Changelog

09/07

  • We added the baseline models from the CVPR paper (Reactive and LSTM).
  • With the LSTM model trained via behavior cloning (no reinforcement learning), we achieved d_T values of 0.74693 / 3.99891 / 8.10669 on the test set for d = 10 / 30 / 50 respectively.
  • We also updated the shortest paths to fix an issue with the shortest path algorithm we initially used. Code to generate shortest paths is here.

06/13

This code release contains the following changes over the CVPR version:

  • Larger dataset of questions + shortest paths
  • Color names as answers to color questions (earlier they were hex strings)

Acknowledgements

License

BSD


embodiedqa's Issues

TypeError: local_create_house() takes 2 positional arguments but 3 were given

After preparing all the data as described in README.md and modifying the House3D/tests/config.json file, I ran train_nav.py and it always gives me the error "TypeError: local_create_house() takes 2 positional arguments but 3 were given".

Specifically,

Process Process-2:
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 47, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
TypeError: local_create_house() takes 2 positional arguments but 3 were given
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "train_nav.py", line 817, in train
    train_loader = EqaDataLoader(**train_loader_kwargs)
  File "/home/jiayi/EmbodiedQA/training/data.py", line 890, in __init__
    max_actions=max_actions)
  File "/home/jiayi/EmbodiedQA/training/data.py", line 224, in __init__
    self._load_envs(start_idx=0, in_order=True)
  File "/home/jiayi/EmbodiedQA/training/data.py", line 326, in _load_envs
    self.all_houses = pool.starmap(local_create_house, _args)
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 268, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 608, in get
    raise self._value
TypeError: local_create_house() takes 2 positional arguments but 3 were given

Any help?

Supply new results on the provided dataset

These would be very nice to have so we don't have to retrain the model ourselves. Would it be possible for you to provide the same results as in the paper but on the uploaded dataset?

build_graph() in house3d.py takes a lot of memory

I am trying to generate shortest paths for environments without questions, but the build_graph() function takes up more than 64 GB of memory and more than an hour to finish. Is this normal, or is there something wrong with the code?

load_graph() returns empty graph

I used the load_graph() method from House3DUtils to load a previously computed graph (the file path was valid), but self.graph was empty after the following lines:

EmbodiedQA/utils/house3d.py, lines 272-273 (commit 9113156):

self.graph = Graph()
self.graph.load(path)

Not sure if anyone else has had this issue; in any case, the following workaround worked for me:

import pickle
from dijkstar import Graph

with open(path, 'rb') as f:
    g = pickle.load(f)
self.graph = Graph(g)

Errors running train_eqa.py

Hi, I encountered some problems while training with train_eqa.py. The mode is set to "train".

Traceback (most recent call last):
  File "train_eqa_em.py", line 961, in <module>
    train(0, args, shared_nav_model, shared_ans_model)
  File "train_eqa_em.py", line 641, in train
    action = planner_prob.multinomial().data
TypeError: multinomial() missing 1 required positional arguments: "num_samples"

Python is 3.7 from Anaconda3, and the torch version is 0.4.1.post2.
Any suggestions on this error? Thanks!
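For reference, newer PyTorch versions require num_samples to be passed to multinomial() explicitly, so a likely one-line fix (untested against this codebase) is:

# before (older PyTorch, implicit single sample):
#   action = planner_prob.multinomial().data
action = planner_prob.multinomial(num_samples=1).data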

train_vqa (and others) hanging when using multiprocessing

I'm trying to run this model as per the instructions and it keeps hanging, usually during or after optimizer.step() but sometimes in other places as well. I've found that completely removing the multiprocessing and just running train() on its own gets rid of the problem (I'm using a p3.2xlarge AWS instance, so memory/processing power is not an issue).

I also found this page, which appears to address a very similar issue with the data loader you are using in your code, so I am wondering if this could be the root of the problem. I have downloaded, installed, deleted, and reinstalled all the repositories and data numerous times, so I am fairly certain the issue is not on my end. Thanks!

Pretraining MultitaskCNN

Hi all,

I can't find code for pretraining the MultitaskCNN. Is it available somewhere?

Thanks!

Clarify eval/train parallel runs in vqa, nav, and eqa main

From what I can tell, in each of train_vqa, train_nav, and train_eqa, a model with shared parameters is fed to one thread running eval() and at least one thread running train() (more if specified via command-line arguments).

However, the train() method runs substantially slower, so for a fixed number of epochs (say, the default of 1000), on my machine eval() reaches the epoch cap when train() is only at around epoch 160. After that, the train() thread(s) keep running, but there's no eval() thread left to checkpoint them, so nothing is gained by letting them continue.

For the models presented in the paper, what was your training paradigm for dealing with this? Do you use the last (highest-accuracy) checkpoint that eval() saves, regardless of how far along the train() threads are?

Errors when running train_nav.py

Hi I encountered some errors when training the navigator.

Sometimes it shows:
AssertionError: [Environment] House object not found!
But when I cd into that house directory, there is a house.obj file.
The error is raised from House3D/core.py, line 47, in local_create_house.

And sometimes it shows:
python: vendor/csv.h:442: char * io::LineReader::next_line():
Assertion 'data_begin < data_end' failed

Where might these errors be coming from?
Thanks

Assertion failure when running vanilla train_nav.py

I'm running into an assertion error when running train_nav.py.

I'm able to train VQA models, but when I run train_nav.py (after setting the -target_obj_conn_map_dir to the appropriate path on my system), the code starts training on the first epoch, reaching about 2% before failing an assertion in training/data.py.

The command I issued matches the GitHub example:

python train_nav.py -to_log 1 -model_type pacman -identifier pacman

I assume I have failed to download or move some specific file, but as far as I can tell everything checks out, so I thought it could be a bug from the changes you have been making recently.

Let me know what you think; I'd really appreciate it so we can get the system running!

A question about the VQA part

if len(pos_queue) < 5:  
    pos_queue = train_loader.dataset.episode_pos_queue[len(
        pos_queue) - 5:] + pos_queue

In train_eqa.py, when the VQA model receives fewer than 5 input frames, the queue is padded with episode_pos_queue[len(pos_queue) - 5:]. Doesn't this end up using frames from the ground-truth (shortest-path) position queue?

For example, when the agent is randomly spawned far from the target object, it can stop immediately and still receive its final 5 frames from the ground-truth pos_queue, which leads to artificially high accuracy.

How about replacing it with the following code?

pos_queue = [pos_queue[0].copy() for _ in range(5 - len(pos_queue))] + pos_queue

Out of memory in train_vqa.py

train_vqa.py is running into out-of-memory issues. I managed to get some runs through, but now it consistently runs out of memory, and even removing the multiprocessing does not help. Any ideas?
I have enough memory to run train_nav.py and train_eqa.py, so it's not a lack of resources. I'm using an AWS p3.2xlarge.

While I greatly appreciate you releasing the code, it would be great if you could add a bit of documentation to it.

Out of memory in train_vqa.py with 8 num_processes

I have a Linux server with 93 GB of physical memory and 64 GB of swap, but it still hits the error below when I set num_processes=8:

File "/home/anaconda3/lib/python3.6/multiprocessing/popen_fork.py", line 73, in _launch 
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

Update: I found that each data-loader process needs about 65 GB of memory, and the data in the data loader is not shared between processes; that is just how multiprocessing works. Since A3C needs multiple workers for good performance, is there a way to run more num_processes by sharing the data loader's data between processes?
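One generic option (standard PyTorch, though whether it fits this data loader is an assumption): move the large tensors into shared memory with Tensor.share_memory_() before forking, so workers map the same pages instead of copying them:

import torch
import torch.multiprocessing as mp

def worker(rank, shared):
    # every process sees the same underlying storage -- no per-process copy
    print(rank, shared[0, :3])

if __name__ == '__main__':
    data = torch.randn(1000, 3200)  # stand-in for the large payload
    data.share_memory_()            # back the tensor with shared memory
    mp.spawn(worker, args=(data,), nprocs=4)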

Training log

The x-axis is the number of epochs. This is the log from evaluating on the validation set after every training epoch. My log shows that training with train_vqa.py is not very stable. Is it similar to yours?

suncg data unavailable

The SUNCG data is unavailable now. Is there another way to get the obj + mtl files for the houses in EQA?

Which version of the SUNCG dataset: v1, v2, or v2.1?

I downloaded the SUNCG v2.1 dataset and ran python make_houses.py. We only get 742 house.obj files, while eqa_v1.json lists a total of 770 environments. This causes errors during training and evaluation, e.g.:

house objects not found! objFile=</EmbodiedQA/data/suncg/house/8675a21d3eb31d8c69e85a945ceeec00/house.obj>

I am sure the house path is correct. Is this problem related to the SUNCG dataset version?
