Comments (5)
I know Jeff once described it as having the consistency of tapioca, but are there any papers describing, biologically, what happens with inter-regional communication that could perhaps provide a hint?
On Jun 26, 2015, at 11:42 PM, Felix Andrews [email protected] wrote:
HTM model creation can be extremely slow. The time goes into creating the huge proximal synapse graphs containing all potential connections.
The problem of explicitly representing full potential synapse graphs is more acute in higher-level layers because their input -- from cell layers -- is extremely sparse: 2% column activation with a depth of 20 cells gives 0.1% cell activation (except when bursting). With such sparsity, each column needs a lot of synapses in order to reach a reasonable stimulus threshold: to reach 5 active synapses, an average of 5000 random synapse connections are needed.
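For concreteness, the arithmetic can be checked directly (a quick sketch using the figures assumed above: 2% column activation, depth 20, and a target of 5 active synapses):

```clojure
;; Back-of-the-envelope check of the sparsity argument above.
;; Assumed figures: 2% column activation, 20 cells per column (depth),
;; and a target of 5 active proximal synapses per column.
(let [column-activation 0.02
      depth             20
      cell-activation   (/ column-activation depth)  ;; => 0.001, i.e. 0.1%
      target-active     5]
  ;; expected number of random synapses needed so that, on average,
  ;; `target-active` of them land on currently active input bits
  (/ target-active cell-activation))
;; => 5000.0
```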
This is mitigated to some extent by the learning mechanism which can grow additional synapses directly to the active inputs (same mechanism as on distal dendrite segments), but we still need a reasonable degree of initial connectivity to activate columns in the first place.
First proposal
Lazy creation of the proximal synapse graph: synapses are only created upon the first activation of each source bit.
That would be equivalent to the current behaviour except that lazy synapses would not be decremented until they come into existence.
We could bias the new synapses towards neglected columns, achieving boosting and also partially adjusting for the above point.
Second proposal
Lazy creation would only happen while previously unseen input bits continued to appear. But random growth and death of synapses could also continue indefinitely (either eagerly or lazily), giving a boosting effect.
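A rough sketch of what lazy creation might look like (hypothetical names and data layout, not comportex's actual API): the layer keeps, alongside the real synapse graph, a map from not-yet-seen source bits to the columns that would potentially connect to them, and materialises those synapses the first time a bit becomes active.

```clojure
;; Hypothetical sketch of lazy proximal synapse creation.
;; `synapse-graph` :: {column-id {source-bit permanence}}
;; `lazy-syns`     :: {source-bit #{column-id ...}} -- potential connections
;;                    that have not been materialised yet.
(defn grow-lazy-synapses
  [synapse-graph lazy-syns active-bits init-perm]
  (reduce (fn [[sg lz] bit]
            (if-let [cols (get lz bit)]
              ;; first activation of this source bit: create real synapses
              [(reduce #(assoc-in %1 [%2 bit] init-perm) sg cols)
               (dissoc lz bit)]
              ;; already materialised, or never a potential source: no-op
              [sg lz]))
          [synapse-graph lazy-syns]
          active-bits))
```

Until a bit has appeared in the input it has no real synapse, so there is nothing to decrement -- matching the "not decremented until they come into existence" behaviour described above.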
from comportex.
Boosting causes representations to be unstable, and to the extent they are unstable they are meaningless. I usually turn it off. I wonder if instead we could use the mechanism that we have for distal synapses (selecting winner cells in a column), but applied to proximal synapses (selecting columns):
- Set a stimulus threshold of, say, 10 proximal synapses, which will indicate clearly recognised patterns. The top 2% of columns become active if they reach the stimulus threshold.
- If no columns reach the stimulus threshold (or fewer than 2% do), choose random columns and have them grow new proximal synapses (see the sketch after this list).
- Actually, first check for matches on disconnected synapses, and give those matches priority. That gives the stability necessary for tentative synapses to be reinforced.
- Column matches are by the number of active connected proximal synapses, but could also include predictive cell depolarisation (Fergal's "prediction assistance").
- In this scheme we don't need to initialise the HTM with a million proximal synapses, just start empty like we do with distal synapses. So fast start up. But running would be slower. Maybe a lot slower.
- There is a problem. Partial matches (below the stimulus threshold) are ignored. If we set a low stimulus threshold, previously matched columns would be adapted a lot, to anything remotely similar, losing discriminability. If we set a high stimulus threshold, each new stimulus gets a unique representation, but we fail to represent the similarity between them.
- One solution: select a fraction of the columns as partially-matching ones in preference to random ones.
- Actually this problem applies to cell selection in a column too!
- For local topographic connections, only grow within a radius. And consider that when selecting random columns.
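Here's a rough sketch of that selection order -- recognised columns first, then partial / disconnected-synapse matches, then random columns -- where `n-active` would be e.g. 2% of the columns. All names are illustrative, not comportex's actual API:

```clojure
;; Hypothetical sketch of the column-selection scheme described above.
(defn select-active-columns
  [col-overlaps       ;; {column-id -> active connected proximal synapses}
   col-matches        ;; {column-id -> matches including disconnected synapses}
   n-columns n-active stimulus-threshold]
  (let [;; 1. columns reaching the stimulus threshold: clearly recognised patterns
        recognised (->> col-overlaps
                        (filter #(>= (val %) stimulus-threshold))
                        (sort-by val >)
                        (map key))
        ;; 2. partial matches (possibly on disconnected synapses) take priority
        ;;    over random columns, so tentative synapses can be reinforced
        partials   (->> col-matches
                        (sort-by val >)
                        (map key))
        ;; 3. finally, random columns, which will then grow new proximal synapses
        randoms    (shuffle (range n-columns))]
    (->> (concat recognised partials randoms)
         (distinct)
         (take n-active))))
```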
from comportex.
Thinking of a tall hierarchy, it's interesting to think about how this would change things. Starting with an untrained model, the first region would start activating. Then the second. Then the third. And so on.
Currently, with random initial connections, the entire hierarchy might light up on the first input. Every region will do proximal/distal/apical learning right from the start, shaping a pile of random connections into something meaningful. With this new approach, it'd be more of a blank slate.
from comportex.
> Thinking of a tall hierarchy, it's interesting to think about how this would change things. Starting with an untrained model, the first region would start activating. Then the second. Then the third. And so on.
Actually that's not obvious to me. I thought all layers should activate cells even if they don't have pre-existing proximal synapses - that is, even if those columns/cells are chosen randomly (and will then grow new synapses).
Maybe you mean, should we grow proximal synapses to bursting cells, or only to predicted cells? I'm leaning toward the former, given that first-level layers do not have predicted input (only sense input), but they still grow proximal synapses. The learning rate to predicted cells could be higher though.
On the other hand, it may not make much sense to learn a bursting signal since once the stimulus is learned/predicted in a lower layer it will have a different representation. But I think that could be OK if we have a low/slow learning rate. This paper (via Joseph Rocca) describes cortex as slowly learning to capture statistical properties of the world, in contrast with, and complementing, Hippocampus learning much faster: http://psych-www.colorado.edu/~oreilly/papers/OReillyRudy00_hippo.pdf
from comportex.
My experiments so far have shown that it is fatal to grow new proximal synapses directly to active sources. It results in column sets taking over -- masking -- multiple inputs, particularly if there are subset / overlap relationships between inputs. I guess a solution would be to enforce a unique sub-sampling of "potential synapses" on each column; i.e. some sort of local topographic radius, even if the inputs are not meaningfully topographic: even if the input bits are in fact randomly shuffled.
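One way to sketch that sub-sampling (hypothetical, using a cheap hash-based membership test rather than a real topographic radius):

```clojure
;; Hypothetical sketch of a per-column "potential pool": each column gets a
;; unique, fixed sub-sample of the input bits and may only grow proximal
;; synapses to active bits inside that pool -- even if the input bits are
;; randomly shuffled, i.e. without meaningful topography.
(defn in-pool?
  "Deterministic pseudo-random test: is `bit` in column `col`'s potential
  pool, where `frac` is the sub-sampling fraction (e.g. 0.5)?"
  [col bit frac]
  (< (mod (hash [col bit]) 1000) (long (* frac 1000))))

(defn growable-bits
  "Active input bits that column `col` is allowed to grow new synapses to."
  [col active-bits frac]
  (filter #(in-pool? col % frac) active-bits))
```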
Here's a completely different approach to the problem of boosting / decorrelating representations. Leabra's XCAL BCM rule is based on comparing the short-term and long-term average activations to apply a homeostatic stabilisation:
https://grey.colorado.edu/CompCogNeuro/index.php/CCNBook/Learning/Leabra
the BCM contrast or normalization is all about the receiver long-term average activity y_l, with the sending activity serving as the "conditioning" variable -- you only update the weights if the sending unit is active, and conditioned on that, compare the current receiver activity relative to the long-term average.
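A minimal sketch of that idea (a simplified BCM-style update, not the full XCAL rule; the names and the exact form of the floating threshold are assumptions):

```clojure
;; Simplified BCM-style homeostatic update, sketching the quoted idea.
;; Weights only change when the sending unit is active; the sign of the
;; change depends on whether the receiver's current activity is above or
;; below its own long-term average (the "floating threshold").
(defn bcm-delta
  "Weight change for one synapse. `x` sender activity (0 or 1), `y` current
  receiver activity, `y-long` receiver's long-term average activity."
  [x y y-long learning-rate]
  (if (pos? x)
    (* learning-rate y (- y y-long))
    0.0))

(defn update-y-long
  "Slow exponential moving average of the receiver's activity."
  [y-long y tau]
  (+ (* (- 1.0 tau) y-long) (* tau y)))
```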
from comportex.