yandexdataschool / agentnet
Deep Reinforcement Learning library for humans
Home Page: http://agentnet.rtfd.org/
License: Other
Subj.
After all preparations are done, I have to advertise AgentNet a bit to let interested people know about it.
Current candidates for consideration:
Subj. Instead of training directly from experience, maintain a pool of the experience sessions that are most interesting for training. The pool is updated by regularly generating new sessions, while the least interesting sessions are removed from it.
Stage one - set up basic experiment
Stage two - find what kind of session filtering and training works best
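The pool idea can be sketched in plain Python. This is a minimal mock, not AgentNet code; the class name and the scalar "interest" score are assumptions about the design:

```python
import heapq
import random

class SessionPool:
    """Keeps the `capacity` most 'interesting' sessions (hypothetical sketch)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []     # min-heap: the least interesting session sits on top
        self._counter = 0   # tie-breaker so sessions themselves are never compared

    def add(self, session, interest):
        entry = (interest, self._counter, session)
        self._counter += 1
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
        else:
            # push the new session and evict the least interesting one
            heapq.heappushpop(self._heap, entry)

    def sample(self, k):
        """Draw a random training batch from the pool."""
        return random.sample([s for _, _, s in self._heap], k)

pool = SessionPool(capacity=3)
for i, interest in enumerate([0.1, 0.9, 0.5, 0.7, 0.2]):
    pool.add("session_%d" % i, interest)
# only the three most interesting sessions (1, 3, 2) survive
```

How "interest" is measured (e.g. TD error, as in prioritized experience replay) is exactly what stage two above would have to determine.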
Create (or at least mock) the KSfinder as an experiment setup.
This also requires finding somewhere to store the data, since adding 9 GB archives to GitHub is not really an option.
Maybe create a separate repo for the KSfinder experiment, as a demonstration that it's possible to do so?
Move whatever is not module-specific to the auxiliary/utils part.
Also move it into a folder of its own.
When trying to run experiments/wikicat for the first time, it tries to download the dataset but fails to write it to the installation folder:
IOError: [Errno 13] Permission denied: '/usr/local/lib/python2.7/dist-packages/agentnet-0.0.6-py2.7.egg/agentnet/experiments/wikicat/musicians_categorized.csv'
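A common fix is to download datasets into a user-writable cache directory instead of the package installation folder. A sketch; the AGENTNET_DATA variable and the ~/.agentnet/data location are assumptions, not existing behavior:

```python
import os

def get_data_dir():
    """Pick a user-writable directory for downloaded datasets.

    Checks a (hypothetical) AGENTNET_DATA env var first, then falls back
    to ~/.agentnet/data, creating the directory if needed.
    """
    path = os.environ.get(
        "AGENTNET_DATA",
        os.path.join(os.path.expanduser("~"), ".agentnet", "data"))
    if not os.path.isdir(path):
        os.makedirs(path)
    return path

# download target for the wikicat dataset would then be:
csv_path = os.path.join(get_data_dir(), "musicians_categorized.csv")
```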
Create a tool that allows evaluating custom theano expressions (or a/l layer outputs) for each session step.
Make a separate function for shaping experience replay outside theano. I know it slows everything down; just make it possible.
Also explain how to do that.
The idea is to train two networks: the first predicts the Q-values, and the second predicts how severely the first will err on each Q-value. Both are recurrent.
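In training terms: if the first network outputs q_pred and the TD target is q_target, the second network would regress on the error the first one made. A numpy sketch of the target construction; the absolute-error pairing is my interpretation of the issue, not existing code:

```python
import numpy as np

q_pred = np.array([1.0, 0.5, -0.2])    # Q-values from the first network
q_target = np.array([1.2, 0.1, -0.2])  # e.g. r + gamma * max_a' Q(s', a')

# regression target for the second ("error") network:
# how badly the first network erred on each Q-value
err_target = np.abs(q_pred - q_target)
```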
Create a tutorial on creating your own experiments.
Make a one-step LSTM memory cell out of the Lasagne LSTM layer
to be implemented as a learning method
Description here
http://arxiv.org/pdf/1509.02971v5.pdf
Right now the architecture only allows calling agent.get_sessions on the full session pool. It would be nice to be able to select a fraction of the session pool and train on it.
Maybe allow creating an environment over theano variables? Or add a sort of "mask" for session_pool that could be set? The former seems like the more long-term option.
That's it, no caps-lock needed.
Right now objective.get_reward_sequences uses scan to compute rewards, even though all operations could in fact be executed in parallel. This becomes a bottleneck when computing on the GPU.
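Since each step's reward depends only on that step's data, the scan could be replaced by one elementwise op over the whole [batch, time] tensor. A numpy illustration of the equivalence (the toy reward function is hypothetical):

```python
import numpy as np

# fake per-step data: [batch, time]
correct = np.array([[1, 0, 1],
                    [0, 1, 1]], dtype=float)

def reward_step(col):
    """Per-step reward, as a scan would compute it one time step at a time."""
    return 2.0 * col - 1.0   # +1 for a correct action, -1 otherwise

# scan-style: one call per time step
scanned = np.stack([reward_step(correct[:, t])
                    for t in range(correct.shape[1])], axis=1)

# parallel: one elementwise op over the whole tensor at once
vectorized = 2.0 * correct - 1.0

assert np.array_equal(scanned, vectorized)
```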
Designate some "Experiments" folder where all experiments can be stored. Also standardize the experiment definition to make adding new ones simpler. Reshape Wikicat (maybe several major versions) and the default logical setup as experiments.
Required for the BlackBox challenge.
Transform print_session (from examples) into a module of visualization tools that can be used to gain insight into what is happening.
Might also need to implement classical RL metrics (regret, action probabilities, etc.)
Convert the repo into a library that one can ./setup.py build install
Find some way to download experiments on demand.
Structure the modules into something sensible (Lasagne-like?)
Write some of the curious experiment results to the wiki
Add is_alive support to all learning algos, make one-function loss computation, rename learning as objectives like in Lasagne.
Right now it asserts with an unhelpful error
Right now, both Q-learning algorithms exist as separate methods inside objective.BaseObjective.
Since more varied Q-learning algos are planned, it would be better to detach them from BaseObjective into some "LearningAlgorithm"-style entity.
The idea is simple: you devote some of the agent's memory to predicting the environment's next state and reward (like the AlphaGo NN).
The env model gets trained on the agent's sessions, and the agent gets trained by interacting with this model in a tree-like fashion. Whenever the env model extrapolates some high-reward possibility, the agent will follow it and either arrive at a better policy or obtain a more accurate environment model.
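This is close in spirit to Dyna-Q: learn a model of (reward, next state) from real transitions, then run extra "imagined" Q-updates through it. A minimal tabular sketch, assuming a deterministic model for brevity:

```python
import numpy as np

n_states, n_actions = 3, 2
Q = np.zeros((n_states, n_actions))
model = {}  # (s, a) -> (r, s'), learned from the agent's real sessions

def real_step(s, a, r, s2, alpha=0.5, gamma=0.9):
    """Update both the env model and Q from one real transition."""
    model[(s, a)] = (r, s2)
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])

def imagined_updates(n, alpha=0.5, gamma=0.9):
    """Extra Q-updates that use the learned model instead of the real env."""
    keys = list(model)
    for i in range(n):
        s, a = keys[i % len(keys)]
        r, s2 = model[(s, a)]
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])

real_step(0, 1, r=1.0, s2=2)
imagined_updates(3)   # Q[0, 1] keeps improving without touching the real env
```

The tree-like search over model-extrapolated possibilities would replace the round-robin replay here, but the model-then-plan loop is the same.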
Find some place to store a getting-started guide to installing and running the thing, plus basic examples.
Make a simple tool to build custom LSTM-like layers
Implement an algorithm that learns common baseline for Q-values
There are plenty of such. Consider implementing them and comparing against one another.
Implement it and compare with others.
It might make sense to follow the entropy trick from here:
http://arxiv.org/pdf/1602.01783.pdf
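The "entropy trick" from that paper (A3C, Mnih et al.) adds the policy entropy H(pi) = -sum(pi * log pi), scaled by a coefficient beta, to the objective, discouraging premature collapse to a deterministic policy. A numpy sketch:

```python
import numpy as np

def entropy_bonus(policy_probs, beta=0.01):
    """beta * H(pi); add to the policy objective (or subtract from the loss)."""
    p = np.clip(policy_probs, 1e-8, 1.0)  # avoid log(0)
    return beta * -np.sum(p * np.log(p), axis=-1)

uniform = np.full(4, 0.25)                    # max-entropy policy, 4 actions
peaked = np.array([0.97, 0.01, 0.01, 0.01])   # nearly deterministic policy

# the uniform policy earns the larger bonus, so exploration is rewarded
assert entropy_bonus(uniform) > entropy_bonus(peaked)
```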
Make sure no currently working API breaks down.
Implement
BaseAgent -> Recurrence(inputs=[], input_inits='zeros', input_sequences=[], state_variables={}, outputs=[]), that does essentially the same but without an environment
MDPAgent -> a child of Recurrence that adds environment on top of resolver
Generator -> just a simpler interface for recurrence that is compatible with Stack RNN example.
Recurrence must implement MergeLayer
Make a method .as_layer that takes input LAYERS, sequence LAYERS and init LAYERS (if any), and returns layers for all outputs, done by applying self as a layer and then slicing the output tuple via ExpressionLayers.
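Conceptually, the proposed Recurrence is "apply a one-step function over input sequences, carrying state". A plain-Python mock of that contract (not actual AgentNet code; the step function here is a made-up example):

```python
def recurrence(step_fn, input_sequences, state_init):
    """Unroll step_fn over time, threading the state; collects per-step outputs."""
    state = step_fn.__self__ if False else state_init  # start from the init state
    outputs = []
    for inputs_t in zip(*input_sequences):
        state, out = step_fn(state, *inputs_t)
        outputs.append(out)
    return state, outputs

def step(state, x):
    """Example step: a running sum that also emits the doubled input."""
    return state + x, 2 * x

final_state, outs = recurrence(step, [[1, 2, 3]], state_init=0)
# final_state is the last carried state; outs holds one output per time step
```

In the real proposal the loop would be a theano scan and the states/outputs would be layers, but the data flow is the same.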
See if it is possible to implement a set of layers that trigger every k turns:
Counter() - a layer that stores a single integer as a state (time_tick)
EveryKTurnsLayer(counter, k, every_k, otherwise, lazy=True) - a layer that outputs every_k once every k turns and otherwise on all other turns.
By default the timer works using lazy ifelse. If lazy is False, it uses switch.
Make sure if_else works faster than switch, otherwise remove it
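The intended semantics can be checked with a tiny numpy mock: a time-tick counter state plus a switch between the two branch outputs (triggering on multiples of k is an assumption about the design):

```python
import numpy as np

def every_k_turns(time_tick, k, every_k_value, otherwise_value):
    """Mock of the proposed layer: emit every_k_value once per k turns."""
    return np.where(time_tick % k == 0, every_k_value, otherwise_value)

ticks = np.arange(6)   # the Counter() state over 6 consecutive turns
out = every_k_turns(ticks, k=3, every_k_value=1.0, otherwise_value=0.0)
# turns 0 and 3 fire; all other turns yield the 'otherwise' value
```

In theano terms, switch evaluates both branches elementwise, while the lazy ifelse evaluates only the taken branch, which is the trade-off the lazy flag is about.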
See if this can become a curious research:
Train 2 models on opensubtitles:
The first is an environment model / generator that tries to maximize its likelihood over the corpus.
The second is an agent that tries to "talk to" the first model.
The agent's objective is to maximize the expected sentiment of the env model's generated sequences, as estimated by a simple sentiment analysis model.
The expected result is that the agent learns to respond more "kindly" than a plain language model would.
Find out whether there is a difference in performance when storing shared variables as something other than CudaNdarray.
Fix examples so that they work on both CPU and GPU.
There are a number of things (aside from other tickets) to be done before this can be used comfortably:
"it runs" can be defined as "it can be installed with a single script without fighting machine-specific issues, and it allows the user to run the existing experiment and research notebooks without errors".
Create a resolver that chooses actions proportional to their b + k* (Q_a - Q_mean) / Q_variance
Where k and b are shared parameters
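A numpy sketch of such a resolver. Note one gap in the spec: "proportional to" is undefined for negative scores, so this sketch clips them to zero before normalizing, which is an assumption:

```python
import numpy as np

def resolver_probs(q_values, k, b):
    """Action probabilities proportional to b + k * (Q_a - Q_mean) / Q_variance.

    Negative scores are clipped to 0 before normalizing (an assumption,
    since proportional sampling needs non-negative weights).
    """
    q = np.asarray(q_values, dtype=float)
    scores = b + k * (q - q.mean()) / q.var()
    scores = np.clip(scores, 0.0, None)
    return scores / scores.sum()

probs = resolver_probs([1.0, 2.0, 3.0], k=0.5, b=1.0)
```

In the actual resolver, k and b would be theano shared variables so they can be trained alongside the network.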
Make a branch of wikicat where the NN is forced to predict the person's category by a separate net at the end.
How hard/beneficial would it be to convert the entire thing to TensorFlow?
Implement a memory that keeps track of the N last states/observations/whatever. Needed for the Atari demo.
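Outside theano, the behaviour is just a fixed-size rolling window. A stdlib sketch of the intended semantics (the class name and zero-padding choice are assumptions):

```python
from collections import deque
import numpy as np

class WindowMemory:
    """Keeps the last N observations, zero-padded until the window fills."""
    def __init__(self, n, obs_shape):
        self.window = deque([np.zeros(obs_shape)] * n, maxlen=n)

    def update(self, observation):
        # deque with maxlen drops the oldest observation automatically
        self.window.append(np.asarray(observation))

    def get(self):
        return np.stack(self.window)  # shape: [n, *obs_shape]

mem = WindowMemory(n=4, obs_shape=(2,))
for t in range(6):
    mem.update([t, t])
stacked = mem.get()   # now holds the observations for t = 2, 3, 4, 5
```

The AgentNet version would express the same shift-and-append as a one-step memory layer update.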
Make sure that all components except particular environments and objectives support non-integer and non-scalar actions [required for some control problems].
Planned via wrapping .get_action_results(...)
Also requires that time_tick is not provided as a default parameter
Subj. Right now it's evaluated as 30%
Find out if there are some profiling tools or tips for optimization. Find out how exactly shared values are stored (on which side?)
It shows something, but apparently not the actual actions [e.g. it does not stop with the last action, but stops arbitrarily]
e.g.
decades_active:2000(qv = 0.273086220026) -> 1.0(ref = 1.15275239944) | decades_active:2010(qv = 0.160792022943) -> 1.0(ref = 1.00328731537) | end_session_now(qv = 0.00346032530069) -> 0.0(ref = 0.0252232588828) | category:List_of_tenors_in_non-classical_music(qv = 0.0265507996082) -> 0.0(ref = 0.0) |
Maybe feature_names is broken? Maybe only for wikicat?
Python 3 compatibility has now landed, thanks to Andrey Sheka.
Todo: merge with develop and add py3 to the Docker automated build.
Compare the existing (and k-step, once implemented) reinforcement learning algorithms and their mixtures
Subj. Requires fixing.
is_alive indication must either be fully supported or moved outside the core for user-side tinkering. So far it seems that removing it entirely is better.
Known so far:
core:
agent, method docs
Examples: all examples
Util:
get_action_qvalues
The current version of ./examples is mostly unreadable for an unprepared English-speaking person.
Make a flask demo of wikicat categorical network.
Loading a CPU-trained model into a GPU-compiled neural network of the [likely] same size resulted in an AssertionError during get_history in the sessions demo.
Make sure persistency works for both [CPU,GPU] x [CPU,GPU] cases.
Implement State-Action-Reward-State-Action learning algorithm
https://en.wikipedia.org/wiki/State-Action-Reward-State-Action
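SARSA differs from Q-learning only in bootstrapping from the action actually taken in the next state (on-policy) rather than from the max. A tabular numpy sketch of a single update:

```python
import numpy as np

def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.99):
    """One SARSA update: bootstrap from (s2, a2), the action actually taken."""
    td_error = r + gamma * Q[s2, a2] - Q[s, a]
    Q[s, a] += alpha * td_error
    return Q

Q = np.zeros((2, 2))
# transition: state 0, action 1, reward 1.0, next state 1, next action 0
Q = sarsa_update(Q, s=0, a=1, r=1.0, s2=1, a2=0)
```

Replacing Q[s2, a2] with Q[s2].max() here would turn it back into the Q-learning update already in the library.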