yandexdataschool / agentnet
Deep Reinforcement Learning library for humans
Home Page: http://agentnet.rtfd.org/
License: Other
Subj.
After all preparations are done, I have to advertise AgentNet a bit to let interested people know about it.
Current candidates for consideration:
Subj. Instead of training directly from experience, maintain a pool of the experience sessions that are most interesting for training. The pool is updated by regularly generating new sessions, while the least interesting sessions are removed from it.
Stage one - set up basic experiment
Stage two - find what kind of session filtering and training works best
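The pool idea can be sketched in plain Python. This is a minimal mock, not AgentNet code; the class name and the scalar "interest" score are assumptions about the design:

```python
import heapq
import random

class SessionPool:
    """Keeps the `capacity` most 'interesting' sessions (hypothetical sketch)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []     # min-heap: the least interesting session sits on top
        self._counter = 0   # tie-breaker so sessions themselves are never compared

    def add(self, session, interest):
        entry = (interest, self._counter, session)
        self._counter += 1
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
        else:
            # push the new session and evict the least interesting one
            heapq.heappushpop(self._heap, entry)

    def sample(self, k):
        """Draw a random training batch from the pool."""
        return random.sample([s for _, _, s in self._heap], k)

pool = SessionPool(capacity=3)
for i, interest in enumerate([0.1, 0.9, 0.5, 0.7, 0.2]):
    pool.add("session_%d" % i, interest)
# only the three most interesting sessions (1, 3, 2) survive
```

How "interest" is measured (e.g. TD error, as in prioritized experience replay) is exactly what stage two above would have to determine.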
Create (or at least mock) the KSfinder as an experiment setup.
This also requires finding somewhere to store the data, since adding 9 GB archives to GitHub is not really an option.
Maybe create a separate repo for the KSfinder experiment, as a demonstration that it's possible to do so?
Move whatever is not module-specific to the auxiliary/utils part.
Also move it into a folder of its own.
When trying to run experiments/wikicat for the first time, it tries to download the dataset but fails to write it to the installation folder:
IOError: [Errno 13] Permission denied: '/usr/local/lib/python2.7/dist-packages/agentnet-0.0.6-py2.7.egg/agentnet/experiments/wikicat/musicians_categorized.csv'
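A common fix is to download datasets into a user-writable cache directory instead of the package installation folder. A sketch; the AGENTNET_DATA variable and the ~/.agentnet/data location are assumptions, not existing behavior:

```python
import os

def get_data_dir():
    """Pick a user-writable directory for downloaded datasets.

    Checks a (hypothetical) AGENTNET_DATA env var first, then falls back
    to ~/.agentnet/data, creating the directory if needed.
    """
    path = os.environ.get(
        "AGENTNET_DATA",
        os.path.join(os.path.expanduser("~"), ".agentnet", "data"))
    if not os.path.isdir(path):
        os.makedirs(path)
    return path

# download target for the wikicat dataset would then be:
csv_path = os.path.join(get_data_dir(), "musicians_categorized.csv")
```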
Create a tool that allows evaluating custom theano expressions (or a/l layer outputs) for each session step.
Make a separate function for shaping experience replay outside theano. I know it slows everything down; just make it possible.
Also explain how to do that.
The idea is to train two networks: the first predicts the Q-values, and the second predicts how severely the first will err on each Q-value. Both are recurrent.
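In training terms: if the first network outputs q_pred and the TD target is q_target, the second network would regress on the error the first one made. A numpy sketch of the target construction; the absolute-error pairing is my interpretation of the issue, not existing code:

```python
import numpy as np

q_pred = np.array([1.0, 0.5, -0.2])    # Q-values from the first network
q_target = np.array([1.2, 0.1, -0.2])  # e.g. r + gamma * max_a' Q(s', a')

# regression target for the second ("error") network:
# how badly the first network erred on each Q-value
err_target = np.abs(q_pred - q_target)
```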
Create a tutorial on creating your own experiments.
Make a one-step LSTM memory cell out of the Lasagne LSTM layer
to be implemented as a learning method
Description here
http://arxiv.org/pdf/1509.02971v5.pdf
Right now the architecture only allows calling agent.get_sessions on the full session pool. It would be nice to be able to select a fraction of the session pool and train on it.
Maybe allow creating an environment over theano variables? Or add a sort of "mask" for session_pool that could be set? The former seems like the more long-term option.
That's it, no caps-lock needed.
Right now objective.get_reward_sequences uses scan to compute rewards, even though all operations could in fact be executed in parallel. This becomes a bottleneck when computing on the GPU.
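Since each step's reward depends only on that step's data, the scan could be replaced by one elementwise op over the whole [batch, time] tensor. A numpy illustration of the equivalence (the toy reward function is hypothetical):

```python
import numpy as np

# fake per-step data: [batch, time]
correct = np.array([[1, 0, 1],
                    [0, 1, 1]], dtype=float)

def reward_step(col):
    """Per-step reward, as a scan would compute it one time step at a time."""
    return 2.0 * col - 1.0   # +1 for a correct action, -1 otherwise

# scan-style: one call per time step
scanned = np.stack([reward_step(correct[:, t])
                    for t in range(correct.shape[1])], axis=1)

# parallel: one elementwise op over the whole tensor at once
vectorized = 2.0 * correct - 1.0

assert np.array_equal(scanned, vectorized)
```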
Designate some "Experiments" folder where all experiments can be stored. Also standardize the experiment definition to make adding new ones simpler. Reshape Wikicat (maybe several major versions) and the default logical setup as experiments.
Required for the BlackBox challenge.
Transform print_session (from examples) into a module of visualization tools that can be used to gain insight into what is happening.
Might also need to implement classical RL metrics (regret, action probabilities, etc.)
Convert the repo into a library that one can ./setup.py build install
Find some way to download experiments on demand.
Structure the modules into something sensible (Lasagne-like?)
Write some of the curious experiment results to the wiki
Add is_alive support to all learning algos, make one-function loss computation, rename learning as objectives like in Lasagne.
Right now it asserts with an unhelpful error
Right now, both Q-learning algorithms exist as separate methods inside objective.BaseObjective.
Since more varied Q-learning algos are planned, it would be better to detach them from BaseObjective into some "LearningAlgorithm"-style entity.
The idea is simple: you devote some of the agent's memory to predicting the environment's next state and reward (like the AlphaGo NN).
The env model gets trained on the agent's sessions, and the agent gets trained by interacting with this model in a tree-like fashion. Whenever the env model extrapolates some high-reward possibility, the agent will follow it and either arrive at a better policy or obtain a more accurate environment model.
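This is close in spirit to Dyna-Q: learn a model of (reward, next state) from real transitions, then run extra "imagined" Q-updates through it. A minimal tabular sketch, assuming a deterministic model for brevity:

```python
import numpy as np

n_states, n_actions = 3, 2
Q = np.zeros((n_states, n_actions))
model = {}  # (s, a) -> (r, s'), learned from the agent's real sessions

def real_step(s, a, r, s2, alpha=0.5, gamma=0.9):
    """Update both the env model and Q from one real transition."""
    model[(s, a)] = (r, s2)
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])

def imagined_updates(n, alpha=0.5, gamma=0.9):
    """Extra Q-updates that use the learned model instead of the real env."""
    keys = list(model)
    for i in range(n):
        s, a = keys[i % len(keys)]
        r, s2 = model[(s, a)]
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])

real_step(0, 1, r=1.0, s2=2)
imagined_updates(3)   # Q[0, 1] keeps improving without touching the real env
```

The tree-like search over model-extrapolated possibilities would replace the round-robin replay here, but the model-then-plan loop is the same.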
Find some place to store a getting-started guide to installing and running the thing, plus basic examples.
Make a simple tool to build custom LSTM-like layers
Implement an algorithm that learns common baseline for Q-values
There are plenty of such. Consider implementing them and comparing against one another.
Implement it and compare with others.
It might make sense to follow the entropy trick from here:
http://arxiv.org/pdf/1602.01783.pdf
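The "entropy trick" from that paper (A3C, Mnih et al.) adds the policy entropy H(pi) = -sum(pi * log pi), scaled by a coefficient beta, to the objective, discouraging premature collapse to a deterministic policy. A numpy sketch:

```python
import numpy as np

def entropy_bonus(policy_probs, beta=0.01):
    """beta * H(pi); add to the policy objective (or subtract from the loss)."""
    p = np.clip(policy_probs, 1e-8, 1.0)  # avoid log(0)
    return beta * -np.sum(p * np.log(p), axis=-1)

uniform = np.full(4, 0.25)                    # max-entropy policy, 4 actions
peaked = np.array([0.97, 0.01, 0.01, 0.01])   # nearly deterministic policy

# the uniform policy earns the larger bonus, so exploration is rewarded
assert entropy_bonus(uniform) > entropy_bonus(peaked)
```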
Make sure no currently working API breaks down.
Implement
BaseAgent -> Recurrence(inputs=[], input_inits='zeros', input_sequences=[], state_variables={}, outputs=[]), that does essentially the same but without an environment
MDPAgent -> a child of Recurrence that adds environment on top of resolver
Generator -> just a simpler interface for recurrence that is compatible with Stack RNN example.
Recurrence must implement MergeLayer
Make a method .as_layer that takes input LAYERS, sequence LAYERS and init LAYERS (if any), and returns layers for all outputs, done by applying self as a layer and then slicing the output tuple via ExpressionLayers.
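Conceptually, the proposed Recurrence is "apply a one-step function over input sequences, carrying state". A plain-Python mock of that contract (not actual AgentNet code; the step function here is a made-up example):

```python
def recurrence(step_fn, input_sequences, state_init):
    """Unroll step_fn over time, threading the state; collects per-step outputs."""
    state = step_fn.__self__ if False else state_init  # start from the init state
    outputs = []
    for inputs_t in zip(*input_sequences):
        state, out = step_fn(state, *inputs_t)
        outputs.append(out)
    return state, outputs

def step(state, x):
    """Example step: a running sum that also emits the doubled input."""
    return state + x, 2 * x

final_state, outs = recurrence(step, [[1, 2, 3]], state_init=0)
# final_state is the last carried state; outs holds one output per time step
```

In the real proposal the loop would be a theano scan and the states/outputs would be layers, but the data flow is the same.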
See if it is possible to implement a set of layers that trigger every k turns:
Counter() - a layer that stores a single integer as a state (time_tick)
EveryKTurnsLayer(counter, k, every_k, otherwise, lazy=True) - a layer that outputs every_k once every k turns and otherwise on all other turns.
By default the timer works using lazy ifelse. If lazy is False, it uses switch.
Make sure if_else works faster than switch, otherwise remove it
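The intended semantics can be checked with a tiny numpy mock: a time-tick counter state plus a switch between the two branch outputs (triggering on multiples of k is an assumption about the design):

```python
import numpy as np

def every_k_turns(time_tick, k, every_k_value, otherwise_value):
    """Mock of the proposed layer: emit every_k_value once per k turns."""
    return np.where(time_tick % k == 0, every_k_value, otherwise_value)

ticks = np.arange(6)   # the Counter() state over 6 consecutive turns
out = every_k_turns(ticks, k=3, every_k_value=1.0, otherwise_value=0.0)
# turns 0 and 3 fire; all other turns yield the 'otherwise' value
```

In theano terms, switch evaluates both branches elementwise, while the lazy ifelse evaluates only the taken branch, which is the trade-off the lazy flag is about.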
See if this can become a curious research:
Train 2 models on opensubtitles:
The first is an environment model / generator that tries to maximize its likelihood over the corpus.
The second is an agent that tries to "talk to" the first model.
The agent's objective is to maximize the expected sentiment of the env model's generated sequences, as estimated by a simple sentiment analysis model.
The expected result is that the agent learns to respond more "kindly" than a plain language model would.
Find out whether there is a difference in performance when storing shared variables as something other than CudaNdarray.
Fix examples so that they work on both CPU and GPU.
There are a number of things (aside from other tickets) to be done before this can be used comfortably:
"it runs" can be defined as "it can be installed with a single script without fighting machine-specific issues, and it allows the user to run the existing experiment and research notebooks without errors".
Create a resolver that chooses actions proportional to their b + k* (Q_a - Q_mean) / Q_variance
Where k and b are shared parameters
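A numpy sketch of such a resolver. Note one gap in the spec: "proportional to" is undefined for negative scores, so this sketch clips them to zero before normalizing, which is an assumption:

```python
import numpy as np

def resolver_probs(q_values, k, b):
    """Action probabilities proportional to b + k * (Q_a - Q_mean) / Q_variance.

    Negative scores are clipped to 0 before normalizing (an assumption,
    since proportional sampling needs non-negative weights).
    """
    q = np.asarray(q_values, dtype=float)
    scores = b + k * (q - q.mean()) / q.var()
    scores = np.clip(scores, 0.0, None)
    return scores / scores.sum()

probs = resolver_probs([1.0, 2.0, 3.0], k=0.5, b=1.0)
```

In the actual resolver, k and b would be theano shared variables so they can be trained alongside the network.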
Make a branch of wikicat where the NN is forced to predict the person's category by a separate net at the end.
How hard/beneficial would it be to convert the entire thing to TensorFlow?
Implement a memory that keeps track of the N last states/observations/whatever. Needed for the Atari demo.
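Outside theano, the behaviour is just a fixed-size rolling window. A stdlib sketch of the intended semantics (the class name and zero-padding choice are assumptions):

```python
from collections import deque
import numpy as np

class WindowMemory:
    """Keeps the last N observations, zero-padded until the window fills."""
    def __init__(self, n, obs_shape):
        self.window = deque([np.zeros(obs_shape)] * n, maxlen=n)

    def update(self, observation):
        # deque with maxlen drops the oldest observation automatically
        self.window.append(np.asarray(observation))

    def get(self):
        return np.stack(self.window)  # shape: [n, *obs_shape]

mem = WindowMemory(n=4, obs_shape=(2,))
for t in range(6):
    mem.update([t, t])
stacked = mem.get()   # now holds the observations for t = 2, 3, 4, 5
```

The AgentNet version would express the same shift-and-append as a one-step memory layer update.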
Make sure that all components except particular environments and objectives support non-integer and non-scalar actions [required for some control problems].
Planned via wrapping .get_action_results(...)
Also requires that time_tick is not provided as a default parameter
Subj. Right now it's evaluated as 30%
Find out if there are some profiling tools or tips for optimization. Find out how exactly shared values are stored (on which side?)
It shows something, but apparently not the actual actions [e.g. it does not stop with the last action, but stops arbitrarily]
e.g.
decades_active:2000(qv = 0.273086220026) -> 1.0(ref = 1.15275239944) | decades_active:2010(qv = 0.160792022943) -> 1.0(ref = 1.00328731537) | end_session_now(qv = 0.00346032530069) -> 0.0(ref = 0.0252232588828) | category:List_of_tenors_in_non-classical_music(qv = 0.0265507996082) -> 0.0(ref = 0.0) |
Maybe feature_names is broken? Maybe only for wikicat?
Python 3 compatibility has now landed, thanks to Andrey Sheka.
Todo: merge with develop and add py3 to the Docker automated build.
Compare the existing (and k-step, once implemented) reinforcement learning algorithms and their mixtures
Subj. Requires fixing.
is_alive indication must either be fully supported or moved outside the core for user-side tinkering. So far it seems that removing it entirely is better.
Known so far:
core:
agent, method docs
Examples: all examples
Util:
get_action_qvalues
The current version of ./examples is mostly unreadable for an unprepared English-speaking person.
Make a flask demo of wikicat categorical network.
Loading a CPU-trained model into a GPU-compiled neural network of the [likely] same size resulted in an AssertionError during get_history in the sessions demo.
Make sure persistency works for both [CPU,GPU] x [CPU,GPU] cases.
Implement State-Action-Reward-State-Action learning algorithm
https://en.wikipedia.org/wiki/State-Action-Reward-State-Action
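SARSA differs from Q-learning only in bootstrapping from the action actually taken in the next state (on-policy) rather than from the max. A tabular numpy sketch of a single update:

```python
import numpy as np

def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.99):
    """One SARSA update: bootstrap from (s2, a2), the action actually taken."""
    td_error = r + gamma * Q[s2, a2] - Q[s, a]
    Q[s, a] += alpha * td_error
    return Q

Q = np.zeros((2, 2))
# transition: state 0, action 1, reward 1.0, next state 1, next action 0
Q = sarsa_update(Q, s=0, a=1, r=1.0, s2=1, a2=0)
```

Replacing Q[s2, a2] with Q[s2].max() here would turn it back into the Q-learning update already in the library.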