htm-community / comportex
Hierarchical Temporal Memory in Clojure
I've been nagging Felix about trying what I call substitution/context pooling. This is based on the intuition I've posted about in the NuPIC-theory thread below and elsewhere:
http://lists.numenta.org/pipermail/nupic-theory_lists.numenta.org/2015-September/003191.html
"Intuitively, something which has identity, a pooled state, should be something which is independent of its environment. Which is the same thing as saying it will occur in a variety of contexts."
A naive first implementation of this might be pooling based on the number of alternate historical paths between two states in a sequence.
To do this we need a way to count historical paths between two points in a sequence. As a first suggestion we might allow all presented states in a sequence to pass activation to neighbouring states in historical sequences, and then count activations at different points in the presented sequence after some number of activation iterations. Then we might order (substitution) poolings based on the activations at each state in the sequence.
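To make the idea concrete, here is a toy sketch of counting alternate historical paths between two states (this is not Felix's ComportexViz implementation; `paths-between` and `alternate-path-count` are made-up names):

```clojure
;; A history is a vector of states. A "path" from a to b is the
;; subsequence of states strictly between an occurrence of a and a
;; later occurrence of b; we count the distinct such paths seen
;; across all recorded histories (every occurrence pair is checked).
(defn paths-between
  [history a b]
  (let [history (vec history)]
    (for [i (range (count history))
          :when (= a (nth history i))
          j (range (inc i) (count history))
          :when (= b (nth history j))]
      (subvec history (inc i) j))))

(defn alternate-path-count
  "Number of distinct intermediate paths from a to b in histories."
  [histories a b]
  (count (into #{} (mapcat #(paths-between % a b) histories))))
```

A state pair with a high count has occurred in many contexts, so by the intuition above it is a candidate for a pooled identity.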
Felix has already implemented some of this with a ComportexViz commit:
This "issue" is by way of drawing attention to this experiment, and inviting wider comment.
In Comportex the regions, layers etc are built to protocols, so technically everything can have alternative implementations. But it is awkward to provide a whole region/layer implementation when you just want to change a part of the core algorithms.
It would be useful to be able to specify alternative implementations (functions) for algorithm parts in "user" code. For example - spatial pooling, i.e. selecting the set of active columns.
This is complicated mainly by the need to be able to serialize a HTM model. So it couldn't just be a function parameter.
I imagine we could specify a namespaced symbol, resolved to a function at runtime:
{:spatial-pooling 'org.nfrac.comportex.cells/standard-spatial-pooling}
We would then resolve and call it, or use a multimethod dispatched on the spec key. Does that seem reasonable?
(It may also be worth remembering that engineering everything to be customisable is often a worse idea than having simple code that can be edited as needed.)
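A minimal sketch of the symbol-resolution approach (`resolve-fn` is a made-up helper, and the placeholder value is `clojure.core/identity` rather than a real spatial pooling function):

```clojure
;; Store the implementation as a namespaced symbol in the spec (which
;; keeps the spec serializable), and resolve it to a var at runtime.
(defn resolve-fn [sym]
  (let [v (do (require (symbol (namespace sym)))
              (resolve sym))]
    (assert v (str "No such function: " sym))
    @v))

;; Placeholder spec entry; a real one would name a comportex function.
(def spec {:spatial-pooling 'clojure.core/identity})

((resolve-fn (:spatial-pooling spec)) :active-columns)
;; => :active-columns
```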
(require '[org.nfrac.comportex.demos.isolated-2d :as demo])
(demo/n-region-model 2)
==> NullPointerException from clojure.lang.Numbers/add
Fewer than 3 dimensions are being passed into ThreeDTopology/index_of_coordinates.
Before 5a860d8, this code would cause a stack overflow, so this is just another layer of the onion.
I'm tentatively exploring the issue, but feel free to jump in if it's a no-brainer.
Numbers.java: 961 clojure.lang.Numbers/ops
Numbers.java: 126 clojure.lang.Numbers/add
topology.cljx: 83 org.nfrac.comportex.topology.ThreeDTopology/index_of_coordinates
columns.cljx: 61 org.nfrac.comportex.columns$uniform_ff_synapses$fn__11848/invoke
core.clj: 6353 clojure.core/mapv/fn
ArrayChunk.java: 58 clojure.lang.ArrayChunk/reduce
protocols.clj: 98 clojure.core.protocols/fn
protocols.clj: 19 clojure.core.protocols/fn/G
protocols.clj: 31 clojure.core.protocols/seq-reduce
protocols.clj: 54 clojure.core.protocols/fn
protocols.clj: 13 clojure.core.protocols/fn/G
core.clj: 6289 clojure.core/reduce
core.clj: 6353 clojure.core/mapv
columns.cljx: 55 org.nfrac.comportex.columns$uniform_ff_synapses/invoke
cells.cljx: 638 org.nfrac.comportex.cells$layer_of_cells/invoke
core.cljx: 94 org.nfrac.comportex.core$sensory_region/invoke
core.cljx: 317 org.nfrac.comportex.core$region_network$fn__12331/invoke
protocols.clj: 143 clojure.core.protocols/fn
protocols.clj: 19 clojure.core.protocols/fn/G
protocols.clj: 147 clojure.core.protocols/fn
protocols.clj: 19 clojure.core.protocols/fn/G
protocols.clj: 31 clojure.core.protocols/seq-reduce
protocols.clj: 54 clojure.core.protocols/fn
protocols.clj: 13 clojure.core.protocols/fn/G
core.clj: 6289 clojure.core/reduce
core.cljx: 324 org.nfrac.comportex.core$region_network/invoke
core.cljx: 345 org.nfrac.comportex.core$regions_in_series/invoke
isolated_2d.cljx: 111 org.nfrac.comportex.demos.isolated_2d$n_region_model/invoke
isolated_2d.cljx: 108 org.nfrac.comportex.demos.isolated_2d$n_region_model/invoke
REPL: 1 user/eval19149
Compiler.java: 6703 clojure.lang.Compiler/eval
Compiler.java: 6666 clojure.lang.Compiler/eval
core.clj: 2927 clojure.core/eval
main.clj: 239 clojure.main/repl/read-eval-print/fn
main.clj: 239 clojure.main/repl/read-eval-print
main.clj: 257 clojure.main/repl/fn
main.clj: 257 clojure.main/repl
RestFn.java: 1523 clojure.lang.RestFn/invoke
interruptible_eval.clj: 67 clojure.tools.nrepl.middleware.interruptible-eval/evaluate/fn
AFn.java: 152 clojure.lang.AFn/applyToHelper
AFn.java: 144 clojure.lang.AFn/applyTo
core.clj: 624 clojure.core/apply
core.clj: 1862 clojure.core/with-bindings*
RestFn.java: 425 clojure.lang.RestFn/invoke
interruptible_eval.clj: 51 clojure.tools.nrepl.middleware.interruptible-eval/evaluate
interruptible_eval.clj: 183 clojure.tools.nrepl.middleware.interruptible-eval/interruptible-eval/fn/fn
interruptible_eval.clj: 152 clojure.tools.nrepl.middleware.interruptible-eval/run-next/fn
AFn.java: 22 clojure.lang.AFn/run
ThreadPoolExecutor.java: 1142 java.util.concurrent.ThreadPoolExecutor/runWorker
ThreadPoolExecutor.java: 617 java.util.concurrent.ThreadPoolExecutor$Worker/run
Thread.java: 745 java.lang.Thread/run
Some encoders run non-trivial computations: currently the coordinate encoder and unique (random) encoder. We should cache the results for performance.
For the JVM there is the very nice core.cache, with a choice of expiry strategies: https://github.com/clojure/core.cache/
There doesn't seem to be a good ClojureScript port, but it would be easy to pull out LRUCache from https://github.com/clojure/core.cache/blob/master/src/main/clojure/clojure/core/cache.clj
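Failing a port, a minimal hand-rolled LRU in portable Clojure might look like this (a sketch, not core.cache's LRUCache; the names are made up):

```clojure
(defn lru-cache
  "Creates an LRU cache holding at most `limit` entries."
  [limit]
  (atom {:limit limit :tick 0 :entries {}}))  ; entries: key -> [last-used value]

(defn lru-lookup!
  "Returns the cached value for k, computing (compute-fn k) on a miss
  and evicting the least-recently-used entry when full. Single-threaded
  sketch: compute-fn may re-run if the swap! retries."
  [cache k compute-fn]
  (-> (swap! cache
        (fn [{:keys [limit tick entries] :as c}]
          (let [entries (cond
                          (contains? entries k) entries
                          (< (count entries) limit)
                          (assoc entries k [tick (compute-fn k)])
                          :else
                          (-> entries
                              (dissoc (key (apply min-key #(first (val %)) entries)))
                              (assoc k [tick (compute-fn k)])))
                entries (assoc-in entries [k 0] (inc tick))]
            (assoc c :tick (inc tick) :entries entries))))
      (get-in [:entries k 1])))
```

An encoder could then consult such a cache keyed on its input value before running the expensive computation.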
Hi.
I'm trying to follow through on the readme examples as per here: https://github.com/nupic-community/comportex/wiki/A-sample-workflow-with-the-REPL
I think some names have changed, eg:
(core/column-state-freqs (:r0 (:regions-map model)))
should now be:
(core/column-state-freqs (:rgn-0 (:regions model)))
Keep up the good work!
HTM model creation can be extremely slow. The time goes into creating the huge proximal synapse graphs containing all potential connections.
The problem of explicitly representing full potential synapse graphs is more acute in higher level layers because their input -- from cell layers -- is extremely sparse: column activation of 2% with depth 20 = 0.1% (except when bursting). With such sparsity, each column needs a lot of synapses in order to reach a reasonable stimulus threshold: to reach 5 active synapses, an average of 5000 random synapse connections are needed.
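Checking that arithmetic directly:

```clojure
;; 2% column activation with depth 20 (one active cell per active
;; column, except when bursting) gives ~0.1% active cell density.
(def cell-density (* 0.02 (/ 1 20)))   ; ~0.001

;; To expect 5 active synapses per column, a random potential pool
;; needs on average threshold / density connections.
(def needed (/ 5 cell-density))        ; ~5000
```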
This is mitigated to some extent by the learning mechanism which can grow additional synapses directly to the active inputs (same mechanism as on distal dendrite segments), but we still need a reasonable degree of initial connectivity to activate columns in the first place.
Lazy creation would only happen while previously unseen input bits continued to appear. But random growth and death of synapses could also continue indefinitely (either eagerly or lazily), giving a boosting effect.
Random numbers are used in several places; the encoders' uses require deterministic (seeded) random numbers.
Currently we use PPRNG since it presents a uniform API across ClojureScript and JVM Clojure. However I found, when testing the coordinate encoder, that the RNG produces a very similar bitset (i.e. random number sequence) when given different but related seeds: cljs hashes of vector tuples having the same first element, [x y] vs [x z].
I've worked around that for the moment, but the same problem would come up in the unique encoder, and it generally suggests problems (although I haven't looked into it). It may be that this kind of usage is not reasonable for an RNG and we should use a full-blown hashing algorithm instead?
Anyway if we need a new random number generator, a very nice splittable RNG has been implemented by Gary Fredericks for test.check
https://github.com/clojure/test.check/blob/master/src/main/clojure/clojure/test/check/random.clj
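As an interim workaround, one could pass related seeds through a strong bit-mixing finalizer before seeding the RNG, so nearby or correlated seeds yield unrelated streams. This sketch uses SplitMix64's mix function (the surrounding names are made up):

```clojure
;; SplitMix64 finalizer: scrambles a 64-bit seed so that related
;; inputs (e.g. hashes of [x y] vs [x z]) decorrelate.
(defn mix64 ^long [^long z]
  (let [z (unchecked-multiply (bit-xor z (unsigned-bit-shift-right z 30))
                              (unchecked-long 0xbf58476d1ce4e5b9))
        z (unchecked-multiply (bit-xor z (unsigned-bit-shift-right z 27))
                              (unchecked-long 0x94d049bb133111eb))]
    (bit-xor z (unsigned-bit-shift-right z 31))))

;; JVM-side illustration; a real fix would mix before seeding PPRNG.
(defn seeded-rng [seed]
  (java.util.Random. (mix64 seed)))
```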
I've just been reading through:
http://floybix.github.io/2014/11/05/htm-protocols
Pretty hard to grasp as an "outsider". I'd suggest visualising the architecture, with some kind of graph to show where each kind of protocol goes in the IO flow. I'd love to help out if I could.
Or maybe just link to some more/better articles/videos on HTM theory and how it fits in with data structures and info flows.
The first part is pretty clear: step implements the main HTM protocol and feeds the input to each layer in canonical order. However, how this fits into the regions is a bit of a mystery.
"Of course for all this to work it needs to call corresponding functions on individual regions, and within regions on layers of cells."
And then:
"And similarly within a region there are layer-activate, layer-learn, layer-depolarise functions."
Could you/we visualise this to make it more clear :)
I started exposing a JS public API here:
From the Readme: "The main API is in the namespaces core, protocols and encoders".
But the link to the blog post that is supposed to explain them is dead :(
Please help out! Thanks :)
I mainly need some help using clj->js and js->clj for correct interop.
Hi, I'm just getting started with Clojure and NuPIC via clortex and comportex.
I can't seem to figure out how to get editor support for the .cljc files. I'm using Lighttable.
Any suggestions? Thanks :)
Clojure 1.9 is stable, and the current core.async version causes issues with it. This is fixed at clojure/core.async@2f87bc7. Consider upgrading to core.async 0.3.442 or the latest core.async to support Clojure 1.9.
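Assuming a Leiningen project.clj, the upgrade would be a one-line dependency change:

```clojure
;; project.clj (Leiningen) -- bump core.async alongside Clojure 1.9:
:dependencies [[org.clojure/clojure "1.9.0"]
               [org.clojure/core.async "0.3.442"]]
```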
Thanks.
When beginning a sequence (or after a sequence reset/break), there is no distal input, so no basis for choosing a winner/learning cell in each column. Cells are then chosen at random.
That random selection is a problem because when the same sequence is presented several times (in isolation) they will begin on different cells; and will consequently not reinforce previous learning, but will have partial learning spread across several cells. This can be seen in repeated sequence demos, where the whole sequence is learned but it keeps bursting.
Proposal - I think it would be better to start on the same cell consistently. The first cell.
Perhaps more generally the choice of winner/learning cell (when there are no predictive cells in a column) should not be completely random but should be a deterministic function of the set of previously-active cells. And it should be a robust function, so that similar activity consistently selects the same cells.
Proposal - select the cell number as (mod depth) of each distal input bit, and take the mode of that. Offset by the current column number (mod depth again); otherwise all cells would be synchronised and we would lose combinatorial capacity (see #31).
Needs testing.
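A sketch of that proposal as a pure function (untested in a real layer, per the note above; the name is made up):

```clojure
;; Winner cell = mode of (mod bit depth) over active distal input
;; bits, offset by the column id so columns are not synchronised.
;; Similar distal activity thus consistently selects the same cell.
(defn deterministic-winner-cell
  [col depth distal-bits]
  (if (empty? distal-bits)
    (mod col depth)  ; no distal input at all: fall back to column offset
    (let [m (->> distal-bits
                 (map #(mod % depth))
                 frequencies
                 (apply max-key val)  ; mode (ties fall to map order here)
                 key)]
      (mod (+ m col) depth))))
```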
When a RegionNetwork's value is returned to the REPL, I'm immediately filled with regret :)
On the REPL wiki page, I dodged this by immediately def-ing it.
(def model-t1 (htm-step model))
But it would be nice if I could just do (htm-step model) and get quick, comprehensible output. Some sort of summary. That'd make it easier to explore the API.
(I haven't investigated whether the toString approaches for Clojure and ClojureScript are compatible...)
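One possible approach on the JVM side: override print-method so the REPL prints a one-line summary. The record and fields below are stand-ins, not comportex's actual RegionNetwork; on the ClojureScript side the analogue would presumably be IPrintWithWriter.

```clojure
;; Toy stand-in for the network value.
(defrecord RegionNetwork [regions inputs])

;; print-method dispatches on class, so pr/pr-str/REPL output all
;; pick this up instead of dumping the whole nested structure.
(defmethod print-method RegionNetwork
  [m ^java.io.Writer w]
  (.write w (str "#RegionNetwork{:regions " (count (:regions m))
                 ", :inputs " (count (:inputs m)) "}")))

(pr-str (->RegionNetwork {:rgn-0 {}, :rgn-1 {}} {:input {}}))
;; => "#RegionNetwork{:regions 2, :inputs 1}"
```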
Sorry, I didn't find a better way to communicate this: can we do htm.java updates?
I had thought regions might be a useful construct if there was some special interaction between layers in a region, like shared activation/inhibition of mini-columns.
But they've turned out to just get in the way, like OO cruft. I think it would be better to work with flexible (composable) layers, and implement any special interactions on top.
Also - make it easy to have lateral feedback between layers ("cortical columns").
Hi,
I tried to understand the use of focus-i. It seems to increase with the col value. I thought it designates the focused column in the input topology, around which we then choose :ff-potential-radius percent of nodes as potential synapse targets. But why can't line 126 then just be all-ids (vec (p/neighbours-indices itopo col radius))?
We know that synaptic connections are reinforced if the source cell fires just before target (LTP) and punished if the reverse occurs - target fires before source (LTD).
Currently only the LTP part is implemented, by selecting sources active on the previous time step to a target cell.
I am not sure how sequence learning should happen in a pooling layer. Because cells can remain active for many time steps during pooling, the current implementation does not work: it ends up with all active pooling cells reinforcing connections to each other, even though the connections are not predictive in any useful sense.
I propose, as an experiment, changing this to exclude any source cells that were also active on the same time step as target (i.e. current time step when learning). That would allow sequence transitions to be learned in a pooling layer only when the prior state cells turn off.
One could imagine looking at the following time step for an LTD implementation, but that involves deferring learning, which is doable but hopefully not necessary.
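The proposed exclusion rule can be sketched as a pure function (a sketch, not the comportex implementation):

```clojure
(require '[clojure.set :as set])

;; Learnable sources are cells active on the previous step but NOT
;; still active on the current step, so that transitions are learned
;; in a pooling layer only when the prior state's cells turn off.
(defn learnable-sources [prev-active curr-active]
  (set/difference prev-active curr-active))
```

With this rule a stable pooled state, whose cells stay active across steps, contributes no sources to itself, avoiding the mutual-reinforcement problem described above.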
FYI, I think this regressed in a recent comportex commit. I verified that my viz pull requests didn't cause this before I submitted them :). This was working recently.
I'm seeing the error when trying this demo in comportexviz.
Tentative plan:
- Move the truncation code from core.cljx to /comportex/src/org/nfrac/comportex/repl.cljx
- Require it from /comportex/dev/user.clj. Make it autoload truncation.
This opens the REPL code to other consumers (making user.clj simply a consumer of the API that anyone else can use).
Currently distal synapses refer to a source cell in the same layer. Need to allow the source to be an input bit either from below (motor) or above (feedback) in the hierarchy.
So the source set will become a set of integers which can be mapped back to local cells and/or input bits.
I'm working on this now.
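One illustrative id scheme (the names and layout here are made up, not comportex's actual mapping): lay out local cells first, then feedforward (motor) bits, then feedback bits, in a single integer range.

```clojure
;; Decode a combined source id back to its origin. n-cells is the
;; number of cells in this layer; n-ff the number of feedforward bits.
(defn source-of [n-cells n-ff id]
  (cond
    (< id n-cells)          [:local id]
    (< id (+ n-cells n-ff)) [:ff (- id n-cells)]
    :else                   [:fb (- id n-cells n-ff)]))
```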
Proposed namespaces:
- layer - renamed from current cells
- columns - should be moved into layer
- homeostasis - just algorithms, similar to inhibition
- hierarchy (or architecture / network) - renamed from current core
- geometry (or topography) - renamed from current topology
- api (or core)
We should have some better automated tests, while avoiding the unproductive mess of lots of naive example-based tests.
Thinking specifically of
Even simulation-based testing would arguably be appropriate for HTMs, but I'm not sure I want to invest too much time on that at the moment.
Any thoughts?
Encoders are the only things in Comportex HTM values that are not serializable. We should fix that so that models can be saved or sent over the wire.
Currently they are created using (reify), which makes a closure. But the main problem is how they get their particular data inputs out of the general, amorphous input-value provided to htm-step. That is done using pre-transform, which applies some arbitrary function to do the extraction.
Proposal: shift the task of formatting the data for encoding out to whatever is creating the input values. An input-value as provided to htm-step would then be a structured value directly providing the data required by each input's encoder. Each input already has a keyword id specified in core/RegionNetwork, so these keys could specify the input data for each.
Example: suppose there are 2 inputs (feeding into one or more regions), called :main-input and :motor. The former is a concatenation of a category and a coordinate encoding. The latter is a single linear number encoding.
The input-value might then look like:
{:motor 42
:main-input [:red {:coord [5.0 10.0], :radius 2.5}]
}
Each input encoder will be passed just its sub value.
The various encoder types can be made into Records or Types.
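For instance, a sketch of one encoder as a record (LinearEncoder and encode-bits are illustrations, not comportex's actual encoders): the record itself is plain serializable data, and the encoding function takes the record as an argument rather than being closed over in a reify.

```clojure
;; A serializable encoder: just data, no closures.
(defrecord LinearEncoder [n-bits n-active lower upper])

(defn encode-bits
  "Set bits for a linear value: a contiguous block of n-active bits
  whose position scales with x across [lower, upper]."
  [{:keys [n-bits n-active lower upper]} x]
  (let [x     (-> x (max lower) (min upper))
        span  (- n-bits n-active)
        start (long (* span (/ (- x lower) (- upper lower))))]
    (set (range start (+ start n-active)))))
```

Dispatching the encode operation on record type (e.g. via a protocol) would replace the pre-transform closure, and the whole model value becomes printable and readable.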
In bursting columns, the winner / learning cell is selected by maximum total excitation (including a penalty on cells with inactive segments, to encourage efficient use of all cells).
Currently ties are broken by keeping the first. The problem with this is, a set of blank columns bursting in
So after depth contexts, the next one will be fully identical to context A.
Instead we need to break ties randomly, unlocking the combinatorial capacity for representation.
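A sketch of seeded random tie-breaking (illustrative, not the comportex implementation; a seeded RNG keeps the choice reproducible):

```clojure
;; Choose the cell with maximum excitation, breaking ties by a random
;; pick among the tied cells instead of always taking the first.
(defn best-cell [excitation-by-cell ^java.util.Random rng]
  (let [best (apply max (vals excitation-by-cell))
        tied (->> excitation-by-cell
                  (filter #(== best (val %)))
                  (mapv key))]
    (nth tied (.nextInt rng (count tied)))))
```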
Red should be on outside.