htm-community / comportex
Hierarchical Temporal Memory in Clojure
I've been nagging Felix about trying what I call substitution/context pooling. This is based on the intuition I've posted about in the NuPIC-theory thread below and elsewhere:
http://lists.numenta.org/pipermail/nupic-theory_lists.numenta.org/2015-September/003191.html
"Intuitively, something which has identity, a pooled state, should be something which is independent of its environment. Which is the same thing as saying it will occur in a variety of contexts."
A naive first implementation of this might be pooling based on the number of alternate historical paths between two states in a sequence.
To do this we need a way to count historical paths between two points in a sequence. As a first suggestion we might allow all presented states in a sequence to pass activation to neighbouring states in historical sequences, and then count activations at different points in the presented sequence after some number of activation iterations. Then we might order (substitution) poolings based on the activations at each state in the sequence.
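To make the idea concrete, here is a toy sketch of counting alternate historical paths between two states (this is not Felix's ComportexViz implementation; `paths-between` and `alternate-path-count` are made-up names):

```clojure
;; A history is a vector of states. A "path" from a to b is the
;; subsequence of states strictly between an occurrence of a and a
;; later occurrence of b; we count the distinct such paths seen
;; across all recorded histories (every occurrence pair is checked).
(defn paths-between
  [history a b]
  (let [history (vec history)]
    (for [i (range (count history))
          :when (= a (nth history i))
          j (range (inc i) (count history))
          :when (= b (nth history j))]
      (subvec history (inc i) j))))

(defn alternate-path-count
  "Number of distinct intermediate paths from a to b in histories."
  [histories a b]
  (count (into #{} (mapcat #(paths-between % a b) histories))))
```

A state pair with a high count has occurred in many contexts, so by the intuition above it is a candidate for a pooled identity.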
Felix has already implemented some of this with a ComportexViz commit:
This "issue" is by way of drawing attention to this experiment, and inviting wider comment.
In Comportex the regions, layers etc are built to protocols, so technically everything can have alternative implementations. But it is awkward to provide a whole region/layer implementation when you just want to change a part of the core algorithms.
It would be useful to be able to specify alternative implementations (functions) for algorithm parts in "user" code. For example - spatial pooling, i.e. selecting the set of active columns.
This is complicated mainly by the need to be able to serialize a HTM model. So it couldn't just be a function parameter.
I imagine we could specify a namespaced symbol, resolved to a function at runtime:
{:spatial-pooling 'org.nfrac.comportex.cells/standard-spatial-pooling}
We would then resolve and call it, or use a multimethod dispatched on the spec key. Does that seem reasonable?
(It may also be worth remembering that engineering everything to be customisable is often a worse idea than having simple code that can be edited as needed.)
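A minimal sketch of the symbol-resolution approach (`resolve-fn` is a made-up helper, and the placeholder value is `clojure.core/identity` rather than a real spatial pooling function):

```clojure
;; Store the implementation as a namespaced symbol in the spec (which
;; keeps the spec serializable), and resolve it to a var at runtime.
(defn resolve-fn [sym]
  (let [v (do (require (symbol (namespace sym)))
              (resolve sym))]
    (assert v (str "No such function: " sym))
    @v))

;; Placeholder spec entry; a real one would name a comportex function.
(def spec {:spatial-pooling 'clojure.core/identity})

((resolve-fn (:spatial-pooling spec)) :active-columns)
;; => :active-columns
```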
(require '[org.nfrac.comportex.demos.isolated-2d :as demo])
(demo/n-region-model 2)
==> NullPointerException from clojure.lang.Numbers/add
Fewer than 3 dimensions are being passed into ThreeDTopology/index_of_coordinates.
Before 5a860d8, this code would cause a stack overflow, so this is just another layer of the onion.
I'm tentatively exploring the issue, but feel free to jump in if it's a no-brainer.
Numbers.java: 961 clojure.lang.Numbers/ops
Numbers.java: 126 clojure.lang.Numbers/add
topology.cljx: 83 org.nfrac.comportex.topology.ThreeDTopology/index_of_coordinates
columns.cljx: 61 org.nfrac.comportex.columns$uniform_ff_synapses$fn__11848/invoke
core.clj: 6353 clojure.core/mapv/fn
ArrayChunk.java: 58 clojure.lang.ArrayChunk/reduce
protocols.clj: 98 clojure.core.protocols/fn
protocols.clj: 19 clojure.core.protocols/fn/G
protocols.clj: 31 clojure.core.protocols/seq-reduce
protocols.clj: 54 clojure.core.protocols/fn
protocols.clj: 13 clojure.core.protocols/fn/G
core.clj: 6289 clojure.core/reduce
core.clj: 6353 clojure.core/mapv
columns.cljx: 55 org.nfrac.comportex.columns$uniform_ff_synapses/invoke
cells.cljx: 638 org.nfrac.comportex.cells$layer_of_cells/invoke
core.cljx: 94 org.nfrac.comportex.core$sensory_region/invoke
core.cljx: 317 org.nfrac.comportex.core$region_network$fn__12331/invoke
protocols.clj: 143 clojure.core.protocols/fn
protocols.clj: 19 clojure.core.protocols/fn/G
protocols.clj: 147 clojure.core.protocols/fn
protocols.clj: 19 clojure.core.protocols/fn/G
protocols.clj: 31 clojure.core.protocols/seq-reduce
protocols.clj: 54 clojure.core.protocols/fn
protocols.clj: 13 clojure.core.protocols/fn/G
core.clj: 6289 clojure.core/reduce
core.cljx: 324 org.nfrac.comportex.core$region_network/invoke
core.cljx: 345 org.nfrac.comportex.core$regions_in_series/invoke
isolated_2d.cljx: 111 org.nfrac.comportex.demos.isolated_2d$n_region_model/invoke
isolated_2d.cljx: 108 org.nfrac.comportex.demos.isolated_2d$n_region_model/invoke
REPL: 1 user/eval19149
Compiler.java: 6703 clojure.lang.Compiler/eval
Compiler.java: 6666 clojure.lang.Compiler/eval
core.clj: 2927 clojure.core/eval
main.clj: 239 clojure.main/repl/read-eval-print/fn
main.clj: 239 clojure.main/repl/read-eval-print
main.clj: 257 clojure.main/repl/fn
main.clj: 257 clojure.main/repl
RestFn.java: 1523 clojure.lang.RestFn/invoke
interruptible_eval.clj: 67 clojure.tools.nrepl.middleware.interruptible-eval/evaluate/fn
AFn.java: 152 clojure.lang.AFn/applyToHelper
AFn.java: 144 clojure.lang.AFn/applyTo
core.clj: 624 clojure.core/apply
core.clj: 1862 clojure.core/with-bindings*
RestFn.java: 425 clojure.lang.RestFn/invoke
interruptible_eval.clj: 51 clojure.tools.nrepl.middleware.interruptible-eval/evaluate
interruptible_eval.clj: 183 clojure.tools.nrepl.middleware.interruptible-eval/interruptible-eval/fn/fn
interruptible_eval.clj: 152 clojure.tools.nrepl.middleware.interruptible-eval/run-next/fn
AFn.java: 22 clojure.lang.AFn/run
ThreadPoolExecutor.java: 1142 java.util.concurrent.ThreadPoolExecutor/runWorker
ThreadPoolExecutor.java: 617 java.util.concurrent.ThreadPoolExecutor$Worker/run
Thread.java: 745 java.lang.Thread/run
Some encoders run non-trivial computations: currently the coordinate encoder and unique (random) encoder. We should cache the results for performance.
For the JVM there is the very nice core.cache, with a choice of expiry strategies: https://github.com/clojure/core.cache/
There doesn't seem to be a good ClojureScript port, but it would be easy to pull out LRUCache from https://github.com/clojure/core.cache/blob/master/src/main/clojure/clojure/core/cache.clj
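Failing a port, a minimal hand-rolled LRU in portable Clojure might look like this (a sketch, not core.cache's LRUCache; the names are made up):

```clojure
(defn lru-cache
  "Creates an LRU cache holding at most `limit` entries."
  [limit]
  (atom {:limit limit :tick 0 :entries {}}))  ; entries: key -> [last-used value]

(defn lru-lookup!
  "Returns the cached value for k, computing (compute-fn k) on a miss
  and evicting the least-recently-used entry when full. Single-threaded
  sketch: compute-fn may re-run if the swap! retries."
  [cache k compute-fn]
  (-> (swap! cache
        (fn [{:keys [limit tick entries] :as c}]
          (let [entries (cond
                          (contains? entries k) entries
                          (< (count entries) limit)
                          (assoc entries k [tick (compute-fn k)])
                          :else
                          (-> entries
                              (dissoc (key (apply min-key #(first (val %)) entries)))
                              (assoc k [tick (compute-fn k)])))
                entries (assoc-in entries [k 0] (inc tick))]
            (assoc c :tick (inc tick) :entries entries))))
      (get-in [:entries k 1])))
```

An encoder could then consult such a cache keyed on its input value before running the expensive computation.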
Hi.
I'm trying to follow through on the readme examples as per here: https://github.com/nupic-community/comportex/wiki/A-sample-workflow-with-the-REPL
I think some names have changed, eg:
(core/column-state-freqs (:r0 (:regions-map model)))
should now be:
(core/column-state-freqs (:rgn-0 (:regions model)))
Keep up the good work!
HTM model creation can be extremely slow. The time goes into creating the huge proximal synapse graphs containing all potential connections.
The problem of explicitly representing full potential synapse graphs is more acute in higher level layers because their input -- from cell layers -- is extremely sparse: column activation of 2% with depth 20 = 0.1% (except when bursting). With such sparsity, each column needs a lot of synapses in order to reach a reasonable stimulus threshold: to reach 5 active synapses, an average of 5000 random synapse connections are needed.
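Checking that arithmetic directly:

```clojure
;; 2% column activation with depth 20 (one active cell per active
;; column, except when bursting) gives ~0.1% active cell density.
(def cell-density (* 0.02 (/ 1 20)))   ; ~0.001

;; To expect 5 active synapses per column, a random potential pool
;; needs on average threshold / density connections.
(def needed (/ 5 cell-density))        ; ~5000
```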
This is mitigated to some extent by the learning mechanism which can grow additional synapses directly to the active inputs (same mechanism as on distal dendrite segments), but we still need a reasonable degree of initial connectivity to activate columns in the first place.
Lazy creation would only happen while previously unseen input bits continued to appear. But random growth and death of synapses could also continue indefinitely (either eagerly or lazily), giving a boosting effect.
Random numbers are used in several places; the encoders' uses require deterministic (seeded) random numbers.
Currently we use PPRNG since it presents a uniform API across ClojureScript and JVM Clojure. However I found, when testing the coordinate encoder, that the RNG produces a very similar bitset (i.e. random number sequence) when given different but related seeds: cljs hashes of vector tuples having the same first element, [x y] vs [x z].
I've worked around that for the moment, but the same problem would come up in the unique encoder, and it generally suggests problems (although I haven't looked into it). It may be that this kind of usage is not reasonable for an RNG and we should use a full-blown hashing algorithm instead?
Anyway if we need a new random number generator, a very nice splittable RNG has been implemented by Gary Fredericks for test.check
https://github.com/clojure/test.check/blob/master/src/main/clojure/clojure/test/check/random.clj
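As an interim workaround, one could pass related seeds through a strong bit-mixing finalizer before seeding the RNG, so nearby or correlated seeds yield unrelated streams. This sketch uses SplitMix64's mix function (the surrounding names are made up):

```clojure
;; SplitMix64 finalizer: scrambles a 64-bit seed so that related
;; inputs (e.g. hashes of [x y] vs [x z]) decorrelate.
(defn mix64 ^long [^long z]
  (let [z (unchecked-multiply (bit-xor z (unsigned-bit-shift-right z 30))
                              (unchecked-long 0xbf58476d1ce4e5b9))
        z (unchecked-multiply (bit-xor z (unsigned-bit-shift-right z 27))
                              (unchecked-long 0x94d049bb133111eb))]
    (bit-xor z (unsigned-bit-shift-right z 31))))

;; JVM-side illustration; a real fix would mix before seeding PPRNG.
(defn seeded-rng [seed]
  (java.util.Random. (mix64 seed)))
```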
I've just been reading through:
http://floybix.github.io/2014/11/05/htm-protocols
Pretty hard to grasp as an "outsider". I'd suggest visualising the architecture, with some kind of graph to show where each kind of protocol goes in the IO flow. I'd love to help out if I could.
Or maybe just link to some more/better articles/videos on HTM theory and how it fits in with data structures and info flows.
The first part is pretty clear: step implements the main HTM protocol and feeds the input to each layer in canonical order. However, how this fits into the regions is a bit of a mystery.
"Of course for all this to work it needs to call corresponding functions on individual regions, and within regions on layers of cells."
And then:
"And similarly within a region there are layer-activate, layer-learn, layer-depolarise functions."
Could you/we visualise this to make it more clear :)
I started exposing a JS public API here:
From the Readme: "The main API is in the namespaces core, protocols and encoders".
But the link to the blog post that is supposed to explain them is dead :(
Please help out! Thanks :)
I mainly need some help using clj->js and js->clj for correct interop.
Hi, I'm just getting started with Clojure and NuPIC via clortex and comportex.
I can't seem to figure out how to get editor support for the .cljc files. I'm using Lighttable.
Any suggestions? Thanks :)
Clojure 1.9 is stable, and the current core.async version causes issues with it. This is fixed at clojure/core.async@2f87bc7. Consider upgrading to core.async 0.3.442 or the latest core.async to support Clojure 1.9.
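Assuming a Leiningen project.clj, the upgrade would be a one-line dependency change:

```clojure
;; project.clj (Leiningen) -- bump core.async alongside Clojure 1.9:
:dependencies [[org.clojure/clojure "1.9.0"]
               [org.clojure/core.async "0.3.442"]]
```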
Thanks.
When beginning a sequence (or after a sequence reset/break), there is no distal input, so no basis for choosing a winner/learning cell in each column. Cells are then chosen at random.
That random selection is a problem because when the same sequence is presented several times (in isolation) they will begin on different cells; and will consequently not reinforce previous learning, but will have partial learning spread across several cells. This can be seen in repeated sequence demos, where the whole sequence is learned but it keeps bursting.
Proposal - I think it would be better to start on the same cell consistently. The first cell.
Perhaps more generally the choice of winner/learning cell (when there are no predictive cells in a column) should not be completely random but should be a deterministic function of the set of previously-active cells. And it should be a robust function, so that similar activity consistently selects the same cells.
Proposal - select the cell number as (mod depth) of each distal input bit, and take the mode of that. Offset by the current column number (mod depth again); otherwise all cells would be synchronised and we would lose combinatorial capacity (see #31).
Needs testing.
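A sketch of that proposal as a pure function (untested in a real layer, per the note above; the name is made up):

```clojure
;; Winner cell = mode of (mod bit depth) over active distal input
;; bits, offset by the column id so columns are not synchronised.
;; Similar distal activity thus consistently selects the same cell.
(defn deterministic-winner-cell
  [col depth distal-bits]
  (if (empty? distal-bits)
    (mod col depth)  ; no distal input at all: fall back to column offset
    (let [m (->> distal-bits
                 (map #(mod % depth))
                 frequencies
                 (apply max-key val)  ; mode (ties fall to map order here)
                 key)]
      (mod (+ m col) depth))))
```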
When a RegionNetwork's value is returned to the REPL, I'm immediately filled with regret :)
On the REPL wiki page, I dodged this by immediately def-ing it.
(def model-t1 (htm-step model))
But it would be nice if I could just do (htm-step model) and get quick, comprehensible output. Some sort of summary. That'd make it easier to explore the API.
(I haven't investigated whether the toString approaches for Clojure and ClojureScript are compatible...)
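One possible approach on the JVM side: override print-method so the REPL prints a one-line summary. The record and fields below are stand-ins, not comportex's actual RegionNetwork; on the ClojureScript side the analogue would presumably be IPrintWithWriter.

```clojure
;; Toy stand-in for the network value.
(defrecord RegionNetwork [regions inputs])

;; print-method dispatches on class, so pr/pr-str/REPL output all
;; pick this up instead of dumping the whole nested structure.
(defmethod print-method RegionNetwork
  [m ^java.io.Writer w]
  (.write w (str "#RegionNetwork{:regions " (count (:regions m))
                 ", :inputs " (count (:inputs m)) "}")))

(pr-str (->RegionNetwork {:rgn-0 {}, :rgn-1 {}} {:input {}}))
;; => "#RegionNetwork{:regions 2, :inputs 1}"
```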
Sorry, I didn't find a better way to communicate this: can we do htm.java updates?
I had thought regions might be a useful construct if there was some special interaction between layers in a region, like shared activation/inhibition of mini-columns.
But they've turned out to just get in the way, like OO cruft. I think it would be better to work with flexible (composable) layers, and implement any special interactions on top.
Also - make it easy to have lateral feedback between layers ("cortical columns").
Hi,
I tried to understand the use of focus-i. It seems to increase with the col value. I thought it designates the focused column in the input topology, around which we then choose :ff-potential-radius percent of nodes as potential synapse targets. But why can't line 126 then just be all-ids (vec (p/neighbours-indices itopo col radius))?
We know that synaptic connections are reinforced if the source cell fires just before target (LTP) and punished if the reverse occurs - target fires before source (LTD).
Currently only the LTP part is implemented, by selecting sources active on the previous time step to a target cell.
I am not sure how sequence learning should happen in a pooling layer. Because cells can remain active for many time steps during pooling, the current implementation does not work: it ends up with all active pooling cells reinforcing connections to each other, even though the connections are not predictive in any useful sense.
I propose, as an experiment, changing this to exclude any source cells that were also active on the same time step as target (i.e. current time step when learning). That would allow sequence transitions to be learned in a pooling layer only when the prior state cells turn off.
One could imagine looking at the following time step for an LTD implementation, but that involves deferring learning, which is doable but hopefully not necessary.
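The proposed exclusion rule can be sketched as a pure function (a sketch, not the comportex implementation):

```clojure
(require '[clojure.set :as set])

;; Learnable sources are cells active on the previous step but NOT
;; still active on the current step, so that transitions are learned
;; in a pooling layer only when the prior state's cells turn off.
(defn learnable-sources [prev-active curr-active]
  (set/difference prev-active curr-active))
```

With this rule a stable pooled state, whose cells stay active across steps, contributes no sources to itself, avoiding the mutual-reinforcement problem described above.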
FYI, I think this regressed in a recent comportex commit. I verified that my viz pull requests didn't cause this before I submitted them :). This was working recently.
I'm seeing the error when trying this demo in comportexviz.
Tentative plan:
- Move the truncation code from core.cljx to /comportex/src/org/nfrac/comportex/repl.cljx
- Require it from /comportex/dev/user.clj. Make it autoload truncation.
This opens the REPL code to other consumers (making user.clj simply a consumer of the API that anyone else can use).
Currently distal synapses refer to a source cell in the same layer. Need to allow the source to be an input bit either from below (motor) or above (feedback) in the hierarchy.
So the source set will become a set of integers which can be mapped back to local cells and/or input bits.
I'm working on this now.
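One illustrative id scheme (the names and layout here are made up, not comportex's actual mapping): lay out local cells first, then feedforward (motor) bits, then feedback bits, in a single integer range.

```clojure
;; Decode a combined source id back to its origin. n-cells is the
;; number of cells in this layer; n-ff the number of feedforward bits.
(defn source-of [n-cells n-ff id]
  (cond
    (< id n-cells)          [:local id]
    (< id (+ n-cells n-ff)) [:ff (- id n-cells)]
    :else                   [:fb (- id n-cells n-ff)]))
```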
Proposed namespaces:
- layer - renamed from current cells
- columns - should be moved into layer
- homeostasis - just algorithms, similar to inhibition
- hierarchy (or architecture / network) - renamed from current core
- geometry (or topography) - renamed from current topology
- api (or core)
We should have some better automated tests, while avoiding the unproductive mess of lots of naive example-based tests.
Thinking specifically of
Even simulation-based testing would arguably be appropriate for HTMs, but I'm not sure I want to invest too much time on that at the moment.
Any thoughts?
Encoders are the only things in Comportex HTM values that are not serializable. We should fix that so that models can be saved or sent over the wire.
Currently they are created using (reify), which makes a closure. But the main problem is how they get their particular data inputs out of the general, amorphous input-value provided to htm-step. That is done using pre-transform, which applies some arbitrary function to do the extraction.
Proposal: shift the task of formatting the data for encoding out to whatever is creating the input values. An input-value as provided to htm-step would then be a structured value directly providing the data required by each input's encoder. Each input already has a keyword id specified in core/RegionNetwork, so these keys could specify the input data for each.
Example: suppose there are 2 inputs (feeding into one or more regions), called :main-input and :motor. The former is a concatenation of a category and a coordinate encoding. The latter is a single linear number encoding.
The input-value might then look like:
{:motor 42
:main-input [:red {:coord [5.0 10.0], :radius 2.5}]
}
Each input encoder will be passed just its sub value.
The various encoder types can be made into Records or Types.
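For instance, a sketch of one encoder as a record (LinearEncoder and encode-bits are illustrations, not comportex's actual encoders): the record itself is plain serializable data, and the encoding function takes the record as an argument rather than being closed over in a reify.

```clojure
;; A serializable encoder: just data, no closures.
(defrecord LinearEncoder [n-bits n-active lower upper])

(defn encode-bits
  "Set bits for a linear value: a contiguous block of n-active bits
  whose position scales with x across [lower, upper]."
  [{:keys [n-bits n-active lower upper]} x]
  (let [x     (-> x (max lower) (min upper))
        span  (- n-bits n-active)
        start (long (* span (/ (- x lower) (- upper lower))))]
    (set (range start (+ start n-active)))))
```

Dispatching the encode operation on record type (e.g. via a protocol) would replace the pre-transform closure, and the whole model value becomes printable and readable.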
In bursting columns, the winner / learning cell is selected by maximum total excitation (including a penalty on cells with inactive segments, to encourage efficient use of all cells).
Currently ties are broken by keeping the first. The problem with this is, a set of blank columns bursting in
So after depth contexts, the next one will be fully identical to context A.
Instead we need to break ties randomly, unlocking the combinatorial capacity for representation.
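A sketch of seeded random tie-breaking (illustrative, not the comportex implementation; a seeded RNG keeps the choice reproducible):

```clojure
;; Choose the cell with maximum excitation, breaking ties by a random
;; pick among the tied cells instead of always taking the first.
(defn best-cell [excitation-by-cell ^java.util.Random rng]
  (let [best (apply max (vals excitation-by-cell))
        tied (->> excitation-by-cell
                  (filter #(== best (val %)))
                  (mapv key))]
    (nth tied (.nextInt rng (count tied)))))
```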
Red should be on outside.