landmark-exploration's People

Contributors

akumaraguru, alex-petrenko, gautams3, pelillian

Forkers

pelillian

landmark-exploration's Issues

Research: dense vs sparse rewards

Based on the distance metric, we can provide either sparse rewards (e.g. only +1 for discovering a new landmark) or dense rewards (e.g. a positive reward for getting further from the known landmarks).

Intuitively, sparse rewards should be more reliable, because dense rewards might introduce unexpected local maxima. On the other hand, dense rewards may give better sample efficiency.
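A minimal sketch of the two reward types described above. The function names, the threshold parameter, and the use of Euclidean distance as a stand-in for the learned distance metric are assumptions made for illustration, not the repository's code. Clipping the dense reward at zero is also an assumption.

```python
import math

def nearest_landmark_distance(obs, landmarks, distance):
    """Distance from obs to the closest known landmark."""
    return min(distance(obs, landmark) for landmark in landmarks)

def sparse_reward(obs, landmarks, distance, new_landmark_threshold):
    """+1 only when obs is far enough from every known landmark to count as new."""
    d = nearest_landmark_distance(obs, landmarks, distance)
    return 1.0 if d > new_landmark_threshold else 0.0

def dense_reward(prev_obs, obs, landmarks, distance):
    """Positive when the step moved the agent further from the known landmarks.

    Clipping at zero (no penalty for moving closer) is an assumption of this sketch.
    """
    d_prev = nearest_landmark_distance(prev_obs, landmarks, distance)
    d_now = nearest_landmark_distance(obs, landmarks, distance)
    return max(0.0, d_now - d_prev)

# Stand-in for the learned metric: Euclidean distance between 2D positions.
landmarks = [(0.0, 0.0)]
print(sparse_reward((3.0, 4.0), landmarks, math.dist, new_landmark_threshold=2.0))
print(dense_reward((1.0, 0.0), (3.0, 4.0), landmarks, math.dist))
```

The dense variant rewards every step away from the known landmarks, which is where unexpected local maxima could come from; the sparse variant only fires once a new landmark would be created.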

Baselines: PPO+ICM and PPO+RND

This should be pretty straightforward. We already have an implementation of ICM with A2C; we need to use it with the PPO algorithm. RND is similar, but has no inverse dynamics model to train: its bonus is the prediction error against a fixed, randomly initialized target network.

Ablation study: reachability metric vs embedding space

Space discretization can be implemented in two different ways:

  1. (current implementation) train a binary classifier: 0 means the observations are close, 1 means they are far apart
  2. learn an embedding space and use the distance in embedding space to determine the reachability

(2) has the potential to be more generalizable, and I've never seen people use it this way.
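The two options above can be contrasted with a small sketch. Everything here is hypothetical: `p_far` stands for the classifier's predicted probability of class 1 ("far apart"), the embeddings are plain tuples that some learned encoder would produce, and both thresholds are illustrative.

```python
import math

def reachable_by_classifier(p_far, threshold=0.5):
    """Option 1: observations count as close when the classifier's
    probability of the 'far apart' class is below the threshold."""
    return p_far < threshold

def reachable_by_embedding(emb_a, emb_b, max_distance):
    """Option 2: observations count as close when their embeddings
    are within max_distance of each other (Euclidean distance)."""
    return math.dist(emb_a, emb_b) <= max_distance

print(reachable_by_classifier(0.2))
print(reachable_by_embedding((0.0, 0.0), (3.0, 4.0), max_distance=6.0))
```

With option 2 the same embeddings give a graded distance rather than a single yes/no answer, so the threshold can be changed without retraining.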

Experiments: generalization

We randomly generate a large number of 3D mazes and split them into train, validation and test sets. The policy is then trained only on the train set.

We can open-source this dataset, which should be a good contribution.
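One way to make the split reproducible is to identify each maze by its generator seed and split the seeds. This is a sketch under that assumption; the function name, the 80/10/10 fractions, and the seed-per-maze scheme are illustrative and not taken from the repository.

```python
import random

def split_maze_seeds(num_mazes, train_frac=0.8, val_frac=0.1, rng_seed=0):
    """Shuffle maze seeds deterministically and cut them into three disjoint sets."""
    seeds = list(range(num_mazes))
    random.Random(rng_seed).shuffle(seeds)
    n_train = int(num_mazes * train_frac)
    n_val = int(num_mazes * val_frac)
    return {
        "train": seeds[:n_train],
        "val": seeds[n_train:n_train + n_val],
        "test": seeds[n_train + n_val:],  # whatever remains
    }

splits = split_maze_seeds(1000)
print({name: len(seeds) for name, seeds in splits.items()})
```

Publishing the seed lists along with the generator would be enough for others to recreate the same train, validation and test mazes.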

Research: learn a reliable locomotion policy

The agent should be able to robustly navigate between any pair of landmarks connected by an edge in the graph.
Possibilities are:

  1. Collect past trajectories from the exploration policy that initially discovered the landmarks and train locomotion using behaviour cloning.
  2. Train the locomotion policy with RL: a reward of +1 is given when the target is reached (based on the distance metric).
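Both possibilities above can be sketched in a few lines. The data layout is an assumption: a trajectory is taken to be a list of `(observation, action)` pairs, and `start_t` / `end_t` are the timesteps at which the exploration policy was at the two landmarks joined by an edge. All names are hypothetical.

```python
def bc_examples(trajectory, start_t, end_t):
    """Possibility 1: turn a stored exploration segment into
    (observation, goal_observation, action) triples for behaviour cloning.
    The goal is the observation at which the target landmark was reached."""
    goal_obs = trajectory[end_t][0]
    return [(obs, goal_obs, action) for obs, action in trajectory[start_t:end_t]]

def locomotion_reward(obs, target_landmark, distance, reach_threshold):
    """Possibility 2: +1 when the target counts as reached under the
    distance metric, 0 otherwise. Also returns the 'reached' flag."""
    reached = distance(obs, target_landmark) <= reach_threshold
    return (1.0 if reached else 0.0), reached

trajectory = [("o0", "a0"), ("o1", "a1"), ("o2", "a2"), ("o3", "a3")]
print(bc_examples(trajectory, start_t=1, end_t=3))
print(locomotion_reward(4.0, 5.0, lambda a, b: abs(a - b), reach_threshold=1.5))
```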

Research: graph pruning and reset policy

This mostly applies to training on a single environment. We need to learn an exploration policy (one that knows how to expand the graph), but we also need to make progress, so we should not reset the graph too often.

Research: sample the most "interesting" landmark and navigate to it

When exploration is stuck, we want to look at the graph and select the next target for exploration. A basic policy is to sample a landmark uniformly at random, although this might not be efficient for large environments, because we always want to be on the "frontier" of exploration.

One idea is to use the value estimate of the exploration policy as the "potential" of the landmark for further exploration. We can also use UCB or Thompson sampling to make sure we explore all landmarks, even those that don't seem promising now.
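A UCB1-style selection rule for the idea above might look like the following sketch. It assumes each landmark has a scalar value estimate (the "potential") and a count of how often it was chosen as a target; the function name and the exploration constant `c` are illustrative.

```python
import math

def select_landmark_ucb(values, visit_counts, c=1.0):
    """Return the index of the landmark with the highest UCB score.

    values[i] is the exploration policy's value estimate at landmark i,
    visit_counts[i] is how many times landmark i was selected as a target.
    Landmarks that were never selected are tried first.
    """
    total = sum(visit_counts)
    best_idx, best_score = None, -math.inf
    for idx, (value, count) in enumerate(zip(values, visit_counts)):
        if count == 0:
            return idx
        score = value + c * math.sqrt(math.log(total) / count)
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx

print(select_landmark_ucb([0.5, 0.6], [1, 100], c=2.0))
```

The bonus term shrinks as a landmark is selected more often, so landmarks with a low value estimate are still revisited occasionally.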

Baselines: episodic curiosity through reachability

This should be TMAX but without:

  1. Getting rewards for finding edges in the graph
  2. Using locomotion policy
  3. Using neighborhood encoder

I think we should just have a flag that would disable all of the above.
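As a sketch of how such a flag could work, the three components can be modelled as boolean parameters that one helper switches off together. The class and field names below are hypothetical and do not come from the repository.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class TmaxParams:
    """Hypothetical parameter set; each field toggles one TMAX component."""
    reward_for_new_edges: bool = True
    use_locomotion_policy: bool = True
    use_neighborhood_encoder: bool = True

def episodic_curiosity_baseline(params):
    """Return a copy of params with all three components disabled."""
    return replace(
        params,
        reward_for_new_edges=False,
        use_locomotion_policy=False,
        use_neighborhood_encoder=False,
    )

print(episodic_curiosity_baseline(TmaxParams()))
```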

Experiments: large fixed environments

The baseline methods were designed mostly to explore a single sparse-reward environment rather than to learn a general exploration policy.
We could generate a number of particularly challenging mazes (or use this Unity environment) to see how our method compares to baselines when trained and evaluated on a single environment.
