
agentnet's People

Contributors

arogozhnikov, avidereta, gitter-badger, justheuristic, mariewelt, persiyanov, sidorov-ks, tigerneil, tswr

agentnet's Issues

Getting published

Subj.
Once all preparations are done, I have to advertise AgentNet a bit to let interested people know about it.

Current candidates for consideration:

  • Awesome ML
  • blog posts @ Habr, VK public, Yandex ML
  • just write mails to the interested people I know

Session pool architecture (a.k.a. experience replay)

Subj. Instead of training directly from experience, maintain a pool of the experience sessions that are most interesting for training. The pool is updated by regularly generating new sessions, while the least interesting sessions are removed from it.

Stage one - set up basic experiment
Stage two - find what kind of session filtering and training works [best]
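
A minimal sketch of such a pool, assuming each session already comes with a scalar "interest" score (how to compute that score is exactly the open question of stage two). The class name `SessionPool` and its methods are hypothetical, not part of AgentNet:

```python
import heapq
import itertools


class SessionPool:
    """Keeps at most `capacity` sessions, evicting the least interesting one."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []                    # min-heap of (score, tie_breaker, session)
        self._counter = itertools.count()  # tie-breaker so sessions are never compared

    def add(self, session, score):
        """Insert a session; if the pool is full, drop the lowest-scoring entry."""
        entry = (score, next(self._counter), session)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
        else:
            # pushes the new entry and removes the smallest one in a single step
            heapq.heappushpop(self._heap, entry)

    def sessions(self):
        """Return stored sessions, most interesting first."""
        return [session for _, _, session in sorted(self._heap, reverse=True)]
```

With this layout a new session that scores lower than everything in a full pool is discarded immediately, so the pool only ever improves by the chosen score.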

KSfinder experiment setup

Create (or at least mock) the KSfinder as an experiment setup.

This also requires finding some place to store the data, since adding 9 GB archives to GitHub is not really an option.

Maybe create a separate repo for the KSfinder experiment as a demonstration that it's possible to do so?

could not download data to installation folder

When running experiments/wikicat for the first time, it tries to download the dataset but fails to write it into the installation folder:
IOError: [Errno 13] Permission denied: '/usr/local/lib/python2.7/dist-packages/agentnet-0.0.6-py2.7.egg/agentnet/experiments/wikicat/musicians_categorized.csv'
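
One possible workaround, sketched under the assumption that the dataset location can be chosen at download time: check whether the preferred folder is writable and otherwise fall back to a per-user directory. The helper name `writable_data_dir` and the `~/.agentnet_data` default are hypothetical, and the code uses Python 3 (`exist_ok`):

```python
import os


def writable_data_dir(preferred, fallback=None):
    """Return `preferred` if it is writable, otherwise a user-level fallback directory."""
    if os.path.isdir(preferred) and os.access(preferred, os.W_OK):
        return preferred
    if fallback is None:
        # hypothetical default location, not an existing AgentNet convention
        fallback = os.path.join(os.path.expanduser("~"), ".agentnet_data")
    os.makedirs(fallback, exist_ok=True)
    return fallback
```

The experiment would then build the path to musicians_categorized.csv from the returned directory instead of from the package folder.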

Adversarial architecture

The idea is to train two networks: the first predicts the Q-values, and the second predicts how gravely the first one will err on each Q-value. Both are recurrent.

LSTM memory cell

Make a one-step LSTM memory cell out of the Lasagne LSTM layer

Session training: defining batches on the fly

Right now the architecture only allows calling agent.get_sessions on the full session pool. It would be nice to be able to select a fraction of the session pool and train on it.
Maybe allow creating an environment over theano variables? Or make a sort of "mask" for session_pool and allow setting it? The former seems more long-term.
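
A rough illustration of the "mask" option in plain Python, assuming the pool can be viewed as a dict of per-session sequences. Both function names are hypothetical and nothing here touches theano:

```python
import random


def sample_session_indices(pool_size, fraction, rng=random):
    """Pick a random subset of session indices covering `fraction` of the pool."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    n_selected = max(1, int(round(pool_size * fraction)))
    return sorted(rng.sample(range(pool_size), n_selected))


def apply_mask(pool, indices):
    """Select the chosen sessions from every sequence stored in the pool."""
    return {name: [values[i] for i in indices] for name, values in pool.items()}
```

In a theano setting the same index list could be fed as an integer vector and used to index the shared pool variables, but that part is not shown here.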

Parallelize get_reward

Right now objective.get_reward_sequences uses scan to compute rewards, while in fact all operations can be executed in parallel. This becomes a bottleneck when computing on GPU.

  • unrolling scan won't help since it requires a predefined number of batches
  • maybe just make the user implement a fully vectorized get_reward?
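
To show what "fully vectorized" could mean, here is a NumPy sketch with a made-up reward rule (1 when the action matches a target, 0 otherwise). It only applies when the reward at each tick does not depend on earlier ticks; the function names and the reward rule are assumptions for illustration:

```python
import numpy as np


def get_reward_loop(actions, targets):
    """Step-by-step version: one time tick at a time, as a scan would do."""
    batch_size, n_ticks = actions.shape
    rewards = np.zeros((batch_size, n_ticks))
    for t in range(n_ticks):
        rewards[:, t] = (actions[:, t] == targets[:, t]).astype(float)
    return rewards


def get_reward_vectorized(actions, targets):
    """Same rewards computed for all sessions and ticks in one operation."""
    return (actions == targets).astype(float)
```

Both functions return a [batch, tick] matrix, so the vectorized one can replace the loop without changing downstream code.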

Wikicat & default experiment as experiments

Designate an "Experiments" folder where all experiments can be stored. Also standardize the experiment definition to make adding new ones simpler. Reshape Wikicat (maybe several major versions) and the default logical setup as experiments.

Generic metrics & visualization

Transform print_session (from examples) into a module of visualization tools that can be used to gain insight into what is happening.
Might also need to implement classical RL metrics (regret, action probabilities, etc.)

Assemble a python library

Convert the repo into a library that one can ./setup.py build install
Find some way to download experiments on demand.
Structure the modules into something sensible (Lasagne-like?)

Early tinkering -> wiki

Write some of the curious experiment results to the wiki

  • RMSProp vs momentum methods
  • Multilayer vs shallow

Learning refactor

Add is_alive support to all learning algos, make one-function loss computation, rename learning as objectives like in Lasagne.

detach learning from objective

Right now, both Q-learning algorithms exist as separate methods inside objective.BaseObjective.

Since more Q-learning variants are planned, it is better to detach them from BaseObjective into some "LearningAlgorithm"-style thing.
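
As a rough idea of what a detached algorithm might look like, here is one-step Q-learning written as free functions over plain Python lists for a single session. The function names and the list-based interface are assumptions for illustration, not the existing BaseObjective API:

```python
def qlearning_reference(qvalues, rewards, gamma=0.99):
    """One-step Q-learning targets for a single session.

    qvalues: list of per-tick lists of Q-values, one value per action
    rewards: list of rewards received after each tick's action
    """
    n_ticks = len(rewards)
    reference = []
    for t in range(n_ticks):
        if t + 1 < n_ticks:
            next_best = max(qvalues[t + 1])
        else:
            next_best = 0.0  # no future value after the last tick
        reference.append(rewards[t] + gamma * next_best)
    return reference


def squared_error_loss(qvalues, actions, reference):
    """Mean squared error between Q-values of taken actions and their targets."""
    errors = [(qvalues[t][a] - ref) ** 2
              for t, (a, ref) in enumerate(zip(actions, reference))]
    return sum(errors) / len(errors)
```

A different algorithm would then only need to swap `qlearning_reference` for another target function, leaving the objective itself untouched.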

Environment model agent training (a.k.a learning curiosity)

The idea is simple: you devote some of the agent's memory to predicting the environment's next state and reward (like the AlphaGo NN).

The env model gets trained on agent's sessions and agent gets trained interacting with this model in a tree-like fashion, so whenever there is some high reward possibility extrapolated by the env model, the agent will follow this possibility and either get to a better policy or obtain a more accurate environment model.

Docs & tutorials

Find some place to store an entry guide to installing and running the thing + basic examples.

  • Wiki?
  • Readthedocs?

Recurrence internal refactor as Lasagne layer

Make sure no currently working API breaks down.

Implement
BaseAgent -> Recurrence(inputs=[], input_inits='zeros', input_sequences=[], state_variables={}, outputs=[]), that does essentially the same but without an environment

MDPAgent -> a child of Recurrence that adds environment on top of resolver

Generator -> just a simpler interface for recurrence that is compatible with Stack RNN example.

Recurrence must implement MergeLayer

Make a method .as_layer that takes input LAYERS, sequence LAYERS and init LAYERS (if any) and returns layers for all outputs, done by applying self as a layer and then slicing the output tuple via ExpressionLayers.

Counter, switch, everyK (see if it works and saves time)

See if it is possible to implement a set of layers that trigger every k turns

Counter() - a layer that stores a single integer as a state (time_tick)
EveryKTurnsLayer(counter, k, every_k, otherwise, lazy=True)
that outputs every_k once in K turns, otherwise on all other turns.

By default that timer works using lazy ifelse. If lazy is False, it uses switch.

Make sure ifelse works faster than switch, otherwise remove it
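
The intended semantics can be sketched in plain Python, with callables standing in for the two branches. This is not Lasagne or theano code; it assumes the trigger fires on ticks 0, k, 2k, and so on, and the names follow the proposal above:

```python
class Counter:
    """Stores a single integer state (time_tick) and advances it each turn."""

    def __init__(self):
        self.time_tick = 0

    def step(self):
        tick = self.time_tick
        self.time_tick += 1
        return tick


def every_k_turns(tick, k, every_k, otherwise, lazy=True):
    """Return every_k() once in k turns and otherwise() on all other turns.

    lazy=True evaluates only the selected branch (like ifelse);
    lazy=False evaluates both branches and picks one (like switch).
    """
    trigger = tick % k == 0
    if lazy:
        return every_k() if trigger else otherwise()
    a, b = every_k(), otherwise()
    return a if trigger else b
```

The lazy variant only pays for the expensive branch on trigger turns, which is the potential time saving the ticket wants to measure.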

example: Learning to be kind

See if this can become a curious piece of research:

Train 2 models on OpenSubtitles:

The first is an environment model / generator that tries to maximize its likelihood over the corpus.

The second is an agent that tries to "talk to" the first model.

The objective of the agent is to maximize the expected sentiment of the sequence generated by the env model, as estimated by a simple sentiment analysis model.

The expected result is that the agent learns to respond "kinder" than a plain language model.

CudaNdArray shared vars

Find out whether there is a difference in performance when shared variables are not stored as CudaNdarray

Fix examples so that they work on both CPU and GPU.

Release preparations

There are a number of things (aside from other tickets) to be done before this can be used comfortably:

  • checking if it runs on fresh machines with various setups
  • docker-container
  • vagrant?

"it runs" can be defined as "it can be installed with a single script without fighting machine-specific issues, and it allows the user to run the existing experiment and research notebooks without errors"

Softmax resolver

Create a resolver that chooses actions proportional to their b + k * (Q_a - Q_mean) / Q_variance,
where k and b are shared parameters
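
One way to read this, given the ticket title, is that the expression serves as the logit of a softmax over actions. The sketch below follows that reading in plain Python; the function names and the small `eps` term are assumptions:

```python
import math
import random


def softmax_resolver_probs(qvalues, k=1.0, b=0.0):
    """Action probabilities from softmax over b + k * (Q_a - Q_mean) / Q_variance."""
    n = len(qvalues)
    q_mean = sum(qvalues) / n
    q_var = sum((q - q_mean) ** 2 for q in qvalues) / n
    eps = 1e-8  # assumption: guards against division by zero when all Q-values are equal
    logits = [b + k * (q - q_mean) / (q_var + eps) for q in qvalues]
    max_logit = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(logit - max_logit) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


def choose_action(qvalues, k=1.0, b=0.0, rng=random):
    """Sample an action index according to the resolver probabilities."""
    probs = softmax_resolver_probs(qvalues, k, b)
    return rng.choices(range(len(probs)), weights=probs)[0]
```

Note that under this softmax reading b shifts every logit equally and cancels out after normalization, so it would only matter if the scores are used without exponentiation.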

Window memory

Implement a memory that keeps track of the last N states/observations/whatever. Needed for the Atari demo.
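
The behaviour can be sketched outside of theano with a fixed-length deque. `WindowMemory` and its `pad_value` argument are hypothetical names, and padding the window before it fills up is an assumption:

```python
from collections import deque


class WindowMemory:
    """Keeps the N most recent items, padding with a default until full."""

    def __init__(self, window_size, pad_value=None):
        self._buffer = deque([pad_value] * window_size, maxlen=window_size)

    def push(self, item):
        """Append the newest item; the oldest one is dropped automatically."""
        self._buffer.append(item)

    def window(self):
        """Return the current window, oldest first."""
        return list(self._buffer)
```

For an Atari-style demo the items would be frames, and the window would be stacked into a single network input.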

Continuous/ndimensional action support

Make sure that all elements, except for particular environments and objectives, support non-integer and non-scalar actions [required for some control problems]

Find a way to maximize GPU utilization

Subj. Right now it's measured at 30%.

Find out if there are any profiling tools or tips for optimization. Find out how exactly shared_values are stored (on which side?)

Session printing broken

It shows something, but apparently not the actual actions [e.g. it does not stop at the last action, but stops arbitrarily]

e.g.
decades_active:2000(qv = 0.273086220026) -> 1.0(ref = 1.15275239944) | decades_active:2010(qv = 0.160792022943) -> 1.0(ref = 1.00328731537) | end_session_now(qv = 0.00346032530069) -> 0.0(ref = 0.0252232588828) | category:List_of_tenors_in_non-classical_music(qv = 0.0265507996082) -> 0.0(ref = 0.0) |

Maybe feature_names is broken? Maybe for wikicat only?

Add py3 to container

We now have Python 3 compatibility thanks to Andrey Sheka.

Todo - merge with develop and add py3 to the docker automated build

is_alive support

is_alive indication must either be fully supported or moved outside the core for user-side tinkering. So far it seems that removing it entirely is better.

Make examples presentable

The current version of ./examples is mostly unreadable for an unprepared English-speaking person

  • Half of the text is in Russian <- translate to English
  • Some explanations are missing, some are obsolete <- verify and fix
  • Examples contain loads of repeated code
    • probably better to make a separate "problem definition" notebook and import it later

Loading models from CPU to GPU and vice versa

Loading a CPU-trained model into a GPU-compiled neural network of the [likely] same size resulted in an AssertionError during get_history in the sessions demo.

Make sure persistency works for both [CPU,GPU] x [CPU,GPU] cases.
