
Comments (5)

cogmission commented on June 3, 2024

I know Jeff once described it as having the consistency of tapioca, but are there any papers which describe biologically what happens with interregional communication that could perhaps provide a hint?


On Jun 26, 2015, at 11:42 PM, Felix Andrews wrote:

HTM model creation can be extremely slow. The time goes into creating the huge proximal synapse graphs containing all potential connections.

The problem of explicitly representing full potential synapse graphs is more acute in higher-level layers because their input, which comes from cell layers, is extremely sparse: with 2% of columns active and a depth of 20 cells per column, only about 0.1% of cells are active (except when bursting). With such sparsity, each column needs a lot of synapses to reach a reasonable stimulus threshold: to reach 5 active synapses, an average of 5000 random synapse connections is needed.
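The arithmetic behind those figures can be checked with a short sketch. The function names here are hypothetical, and the calculation assumes exactly one active cell per active column (no bursting) and uniformly random synapse targets.

```python
def input_sparsity(column_activation, depth):
    """Fraction of cells active, assuming one active cell per active column."""
    return column_activation / depth


def synapses_needed(target_active, sparsity):
    """Expected number of random synapses for target_active to hit active cells."""
    return target_active / sparsity


sparsity = input_sparsity(0.02, 20)      # 2% of columns, 20 cells per column
needed = synapses_needed(5, sparsity)    # stimulus threshold of 5 active synapses
print(f"cell sparsity = {sparsity:.2%}, random synapses needed = {needed:.0f}")
```

Under bursting, every cell in an active column fires, so the sparsity rises back toward the column activation level and far fewer synapses would be needed.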

This is mitigated to some extent by the learning mechanism which can grow additional synapses directly to the active inputs (same mechanism as on distal dendrite segments), but we still need a reasonable degree of initial connectivity to activate columns in the first place.

First proposal

  • Lazy creation of the proximal synapse graph: synapses are only created upon the first activation of each source bit.
  • That would be equivalent to the current behaviour except that lazy synapses would not be decremented until they come into existence.
  • We could bias the new synapses towards neglected columns, achieving boosting and also partially adjusting for the above point.

Second proposal

Lazy creation would only happen while previously unseen input bits continued to appear. But random growth and death of synapses could also continue indefinitely (either eagerly or lazily), giving a boosting effect.
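A minimal sketch of the lazy creation idea from the first proposal, in Python rather than Comportex's own code. The class name, parameters (`fan_out`, `init_perm`, `connected_perm`) and data layout are all assumptions made for illustration; biasing new synapses towards neglected columns is left out.

```python
import random


class LazyProximalSynapses:
    """Proximal synapses that are created only when a source bit first fires."""

    def __init__(self, n_columns, fan_out, init_perm=0.2, seed=0):
        self.n_columns = n_columns
        self.fan_out = fan_out          # synapses grown per newly seen source bit
        self.init_perm = init_perm      # initial permanence (hypothetical value)
        self.rng = random.Random(seed)
        self.by_source = {}             # source bit -> {column: permanence}

    def synapses_for(self, bit):
        """Return the synapses from this bit, creating them on first use."""
        if bit not in self.by_source:
            cols = self.rng.sample(range(self.n_columns), self.fan_out)
            self.by_source[bit] = {c: self.init_perm for c in cols}
        return self.by_source[bit]

    def overlaps(self, active_bits, connected_perm=0.2):
        """Count active connected synapses per column for the given input."""
        counts = {}
        for bit in active_bits:
            for col, perm in self.synapses_for(bit).items():
                if perm >= connected_perm:
                    counts[col] = counts.get(col, 0) + 1
        return counts


syn = LazyProximalSynapses(n_columns=100, fan_out=10)
print(len(syn.by_source))          # nothing is allocated before any input
syn.overlaps([1, 2, 3])
print(sorted(syn.by_source))       # only the bits seen so far have synapses
```

Because a bit that has never been active has no synapses, there is nothing to decrement for it, which is the one behavioural difference from eager creation noted above.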



from comportex.

floybix commented on June 3, 2024

Boosting causes representations to be unstable, and to the extent they are unstable they are meaningless. I usually turn it off. I wonder if instead we could use the mechanism that we have for distal synapses (selecting winner cells in a column), but applied to proximal synapses (selecting columns):

  • Set a stimulus threshold of, say, 10 proximal synapses, that will indicate clearly recognised patterns. The top 2% of columns become active if they matched up to the stimulus threshold.
  • If no columns matched up to the stimulus threshold (or fewer than 2% did), choose random columns and have them grow new proximal synapses.
    • Actually, first check for matches on disconnected synapses, and give those matches priority. That gives the stability necessary for tentative synapses to be reinforced.
    • Column matches are by the number of active connected proximal synapses, but could also include predictive cell depolarisation (Fergal's "prediction assistance").
  • In this scheme we don't need to initialise the HTM with a million proximal synapses; we just start empty, as we do with distal synapses. So startup is fast, but running would be slower, maybe a lot slower.
  • There is a problem. Partial matches (below the stimulus threshold) are ignored. If we set a low stimulus threshold, previously matched columns would be adapted a lot, to anything remotely similar, losing discriminability. If we set a high stimulus threshold, each new stimulus gets a unique representation, but we fail to represent the similarity between them.
    • One solution: select a fraction of the columns as partially-matching ones in preference to random ones.
    • Actually this problem applies to cell selection in a column too!
  • For local topographic connections, only grow within a radius. And consider that when selecting random columns.
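The selection scheme in the list above could be sketched roughly as follows. This is an illustration under assumptions, not Comportex code: the function name and the `partial_fraction` parameter are hypothetical, `overlaps` is assumed to map each column to its count of active connected proximal synapses, and the priority for matches on disconnected synapses and the topographic radius are both omitted.

```python
import math
import random


def select_columns(overlaps, n_columns, threshold=10, sparsity=0.02,
                   partial_fraction=0.5, rng=None):
    """Return (active, recruited) column lists for one time step.

    active:    columns that reached the stimulus threshold (top `sparsity`).
    recruited: columns filling the shortfall; these would grow new synapses.
    """
    rng = rng or random.Random(0)
    n_active = max(1, round(n_columns * sparsity))

    # Clearly recognised patterns: at or above the stimulus threshold.
    matched = sorted((c for c, n in overlaps.items() if n >= threshold),
                     key=lambda c: overlaps[c], reverse=True)
    active = matched[:n_active]
    shortfall = n_active - len(active)

    # Fill part of the shortfall with the best partial matches.
    partial = sorted((c for c, n in overlaps.items() if 0 < n < threshold),
                     key=lambda c: overlaps[c], reverse=True)
    quota = math.ceil(shortfall * partial_fraction)
    recruited = partial[:quota]

    # Fill the rest with random columns not already chosen.
    taken = set(active) | set(recruited)
    pool = [c for c in range(n_columns) if c not in taken]
    remaining = min(shortfall - len(recruited), len(pool))
    recruited = recruited + rng.sample(pool, remaining)
    return active, recruited


active, recruited = select_columns({3: 12, 7: 4}, n_columns=100)
print(active, recruited)
```

Setting `partial_fraction` to 0 gives every unrecognised stimulus a fresh random representation, while 1 reuses partial matches wherever possible; the trade-off between those extremes is the discriminability problem described above.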


mrcslws commented on June 3, 2024

Thinking of a tall hierarchy, it's interesting to think about how this would change things. Starting with an untrained model, the first region would start activating. Then the second. Then the third. And so on.

Currently, with random initial connections, the entire hierarchy might light up on the first input. Every region will do proximal/distal/apical learning right from the start, shaping a pile of random connections into something meaningful. With this new approach, it'd be more of a blank slate.


floybix commented on June 3, 2024

> Thinking of a tall hierarchy, it's interesting to think about how this would change things. Starting with an untrained model, the first region would start activating. Then the second. Then the third. And so on.

Actually that's not obvious to me. I thought all layers should activate cells even if they don't have pre-existing proximal synapses - that is, even if those columns/cells are chosen randomly (and will then grow new synapses).

Maybe you mean, should we grow proximal synapses to bursting cells, or only to predicted cells? I'm leaning toward the former, given that first-level layers do not have predicted input (only sense input), but they still grow proximal synapses. The learning rate to predicted cells could be higher though.

On the other hand, it may not make much sense to learn a bursting signal, since once the stimulus is learned/predicted in a lower layer it will have a different representation. But I think that could be OK if we have a low/slow learning rate. This paper (via Joseph Rocca) describes cortex as slowly learning to capture statistical properties of the world, in contrast with, and complementing, the hippocampus, which learns much faster: http://psych-www.colorado.edu/~oreilly/papers/OReillyRudy00_hippo.pdf


floybix commented on June 3, 2024

My experiments so far have shown that it is fatal to grow new proximal synapses directly to active sources. It results in column sets taking over (masking) multiple inputs, particularly if there are subset / overlap relationships between inputs. I guess a solution would be to enforce a unique sub-sampling of "potential synapses" on each column, i.e. some sort of local topographic radius, even if the inputs are not meaningfully topographic: even if the input bits are in fact randomly shuffled.
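One way to picture the per-column sub-sampling is a sketch like the one below. It assumes a one-dimensional input and a fixed radius around a centre spread evenly across the input; the function names and that layout are hypothetical, not how Comportex defines its topography.

```python
def potential_pool(column, n_columns, n_inputs, radius):
    """Input indices a column may connect to: a window around its centre."""
    centre = round(column * (n_inputs - 1) / max(1, n_columns - 1))
    lo = max(0, centre - radius)
    hi = min(n_inputs - 1, centre + radius)
    return range(lo, hi + 1)


def growable_sources(column, active_bits, n_columns, n_inputs, radius):
    """Active bits this column is allowed to grow new proximal synapses to."""
    pool = potential_pool(column, n_columns, n_inputs, radius)
    return [b for b in active_bits if b in pool]


# Column 0 of 10, looking at 100 input bits with radius 5, may only grow
# synapses to active bits inside its own window.
print(growable_sources(0, [2, 50, 99], n_columns=10, n_inputs=100, radius=5))
```

Since each column sees a different window, no single set of columns can grow synapses to every active source, which is the property the sub-sampling is meant to enforce.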

Here's a completely different approach to the problem of boosting / decorrelating representations. Leabra's XCAL BCM rule is based on comparing the short term and long term average activations to apply a homeostatic stabilisation:
https://grey.colorado.edu/CompCogNeuro/index.php/CCNBook/Learning/Leabra

> the BCM contrast or normalization is all about the receiver long-term average activity y_l, with the sending activity serving as the "conditioning" variable -- you only update the weights if the sending unit is active, and conditioned on that, compare the current receiver activity relative to the long-term average.
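To make the quoted description concrete, here is a deliberately simplified sketch. It is a linear stand-in based only on the sentence above, not Leabra's actual XCAL function; the function names, learning rate and averaging rate are assumptions. `x` is the sending activity, `y` the receiving activity, and `y_long` the receiver's long-term average (y_l in the quote).

```python
def bcm_like_delta(x, y, y_long, lrate=0.1):
    """Weight change gated by the sender, signed by receiver vs its average.

    Simplified linear form (an assumption, not Leabra's XCAL):
    no update when the sender is silent; otherwise strengthen if the
    receiver is above its long-term average and weaken if below.
    """
    return lrate * x * (y - y_long)


def update_long_term_average(y_long, y, rate=0.01):
    """Slow running average of receiver activity."""
    return y_long + rate * (y - y_long)


y_long = 0.2
print(bcm_like_delta(0.0, 1.0, y_long))   # sender inactive: no change
print(bcm_like_delta(1.0, 0.8, y_long))   # receiver above average: increase
print(bcm_like_delta(1.0, 0.1, y_long))   # receiver below average: decrease
```

The homeostatic effect comes from the running average: a unit that stays highly active raises its own `y_long`, so the same activity later yields smaller increases or even decreases.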
