Hierarchy & Invariance


I may have another theory on why neurons have thousands of synapses - to classify many very different things as being the same thing.

A single cell represents a feature, e.g. one cell represents a nose, another a mouth, another eyes, etc. The union of these cells represents a face SDR (Cortical’s video is a great demo). The cells in this SDR can be combined as inputs to a cell higher in the hierarchy that represents the face feature. These cells learn to represent these features through feed-forward competitive self-organisation (unsupervised learning). However, this type of learning only allows cells to represent similar inputs (dimensionality reduction). But how could a face cell represent a face when faces can be very dissimilar? By dissimilar I mean that a face can be turned to the side, tilted up, in bright light, in strobe light, far away, up close, etc. All of these different versions of the face need to activate the same ‘face’ cell.

A way to achieve this is by combining feed-forward (bottom-up) unsupervised learning with feed-back (top-down) supervised learning. The supervision is not coming from any external teacher, it is simply coming from the region above. If only most of the features of the face were part of the input (eyes, mouth, hair) then the face cell will still activate. The face cell activation is passed back down to the lower region to the eyes, mouth, hair cells that were just active. It also passes activation to the other feature cells that were not active but usually are (nose, ears). Whatever feed-forward inputs are active during this feed-back get their synapses strengthened. It is essentially inferring that whatever feed-forward input occurred in place of a nose is a nose. If this happens enough then that input will become a feed-forward driver for a nose feature. This will allow many different variations of noses to be learned by the nose cell, allowing for invariance. In other words, given a stable context (a face) the variations (nose) are assigned to a class (nose class/feature).
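A minimal sketch of this teaching signal, assuming a single ‘nose’ cell with one permanence value per feed-forward synapse (the learning rate, thresholds and sizes are made-up illustration values, not anything from NuPIC):

```python
import numpy as np

rng = np.random.default_rng(0)

N_INPUTS = 64         # feed-forward input bits reaching the 'nose' cell
LEARN_RATE = 0.1
CONNECT_THRESH = 0.5  # a permanence above this counts as a connected synapse

# permanences start weak: no input bit drives the cell on its own yet
permanence = rng.uniform(0.0, 0.4, N_INPUTS)

def feedback_teach(permanence, ff_input, face_feedback_active):
    """When the 'face' cell above feeds back while a novel input occupies
    the nose slot, strengthen whatever feed-forward synapses were active
    (Hebbian: neurons that fire together wire together)."""
    if face_feedback_active:
        permanence[ff_input > 0] += LEARN_RATE
        np.clip(permanence, 0.0, 1.0, out=permanence)
    return permanence

# a novel nose variant appears inside a face context
novel_nose = np.zeros(N_INPUTS, dtype=int)
novel_nose[rng.choice(N_INPUTS, 10, replace=False)] = 1

connected_before = int(np.sum(permanence[novel_nose > 0] > CONNECT_THRESH))

for _ in range(5):  # repeated exposures, each with face feedback present
    feedback_teach(permanence, novel_nose, face_feedback_active=True)

connected_after = int(np.sum(permanence[novel_nose > 0] > CONNECT_THRESH))
# the novel variant has now become a feed-forward driver for the nose cell
```

Before the teaching, none of the novel variant’s bits are connected; after a few feedback-gated exposures they all are, so the variant can drive the cell bottom-up on its own.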

Imagine you have only ever seen one door in your life - it has a handle, a window and a frame. When you approach it and see only the frame and window, you will expect to see a handle. However this time the handle looks completely different. It has been mysteriously replaced with some other object. However, given that the object is within the context of a door you infer that it must be a handle. If the object were in any other context it could be something else entirely. This novel handle will become a new variant for the handle feature cell, contributing to its activity.

The “teacher” cell activates the feature cell, allowing it to form and reinforce synaptic connections to novel input features. Classic Hebbian learning: “Neurons that fire together wire together.”

I think this could be another reason why neurons have thousands of synapses - because the dendritic segments are learning hundreds of very different patterns so the cell can represent the same thing. In other words, they could learn up to hundreds of different variations of the same object. So all the very different variations of handles manage to activate the same handle cell. It is like a big OR operation.
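As a toy sketch of that OR behaviour, with sets of input indices standing in for the synapses on each segment (the input size, pattern size and threshold are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
INPUT_SIZE = 128
BITS_PER_PATTERN = 12
SEGMENT_THRESHOLD = 10  # active synapses needed to fire a segment

def random_pattern():
    return set(rng.choice(INPUT_SIZE, BITS_PER_PATTERN, replace=False).tolist())

# each dendritic segment has learned one very different 'handle' variant
variants = [random_pattern() for _ in range(3)]
segments = [set(v) for v in variants]

def cell_active(input_bits):
    """The cell fires if ANY one segment sees enough of its learned
    pattern: a big OR over dissimilar variants of the same feature."""
    return any(len(seg & input_bits) >= SEGMENT_THRESHOLD for seg in segments)

# a pattern sharing no bits with any learned variant, for comparison
unrelated = set(range(INPUT_SIZE)) - set().union(*variants)
unrelated = set(sorted(unrelated)[:BITS_PER_PATTERN])
```

Each of the three dissimilar variants activates the same cell through its own segment, while an unrelated pattern activates none of them.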


This could explain how different lighting, positioning, scaling, etc. of the same feature can activate the same feature cell.

This kind of reminds me of back-propagation in ANNs, except that the cortex is the teacher/supervisor, not another human. It is just mapping: given a set of different inputs you want a single output (i.e. hundreds of different images of cats, to get a single output activation of a cat cell).

It is almost like learning similarities going up the hierarchy, and teaching differences going down.


In this model, you essentially have feedback connecting proximally to lower levels of a hierarchy (because this feedback must cause activation, not prediction). I’m not really versed in the biology, but my limited understanding from various posts on the forum indicates that this is apical feedback, which would not by itself cause activation.

I think I would modify this model just a bit (essentially the same idea, but a few more interim steps). I recall that Jeff mentioned once that both distal and apical input can combine to cause activations in certain layers. If that is the case, then you could assume the “nose” representation resides in a pooling layer (SMI “output” layer) that has long distance distal connections. The other features in parallel regions (eyes, mouth, etc) cause the “nose” representation to be predicted from distal input. If enough of the parallel regions are activated representing features of a face, then the next level of the hierarchy will activate the “face” representation. This will then provide apical feedback to the lower level, and, combining with the distal input, the “nose” representation will activate.


Taking this thought experiment a bit further, the new novel “nose” in this case would itself have caused other cells in the pooling layer to activate, so the result would be activation of both the new and old “nose” representations, and both would connect to the “face” representation. These two combined representations would compete, and over time, if the new “nose” is encountered often enough, a new representation would emerge which contains pieces of the original two representations. This would essentially allow the concept of a “nose” (or whatever other pooled feature) to evolve over time.


I’m feeling there are lots of collisions between our ideas currently.
Even if you and I aren’t “yes, but”-ing at exactly the same things, it seems we end up with the same intuition.


Yes, I do not like the idea of random selection in the output layer. It doesn’t allow for generalization. Semantically similar objects, IMO, should have a percentage of overlap in their pooled representations, not either complete overlap or no overlap which you get with random/preset representations.
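One way to picture the difference, with made-up sizes: if two learned representations each inherit part of a common component (a shared ‘animal’ SDR in this sketch), their overlap is partial, whereas two random/preset SDRs overlap essentially not at all:

```python
import numpy as np

rng = np.random.default_rng(2)
SDR_SIZE, ACTIVE = 2048, 40

def random_sdr():
    return set(rng.choice(SDR_SIZE, ACTIVE, replace=False).tolist())

# preset/random pooled representations: overlap is essentially zero
cat_random, dog_random = random_sdr(), random_sdr()

# learned representations sharing a semantic component by construction:
# each takes half its bits from a common 'animal' SDR (with a shifted
# window, so the shared portion is partial, not total)
animal = sorted(random_sdr())
cat_learned = set(animal[:20]) | set(sorted(random_sdr() - set(animal))[:20])
dog_learned = set(animal[10:30]) | set(sorted(random_sdr() - set(animal))[:20])

def overlap(a, b):
    return len(a & b)

# partial overlap (>= 10 shared 'animal' bits), versus near-zero for random
```

The semantically related pair is guaranteed at least a quarter of its bits in common, while the random pair shares almost nothing; neither complete overlap nor none.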


This fits in quite nicely! These lateral connections between features can also help support another model, which I could explain later.

It seems it is possible for apical dendrites to cause bursting (ideal for a teaching signal):

Top-down dendritic input increases the gain of layer 5 pyramidal neurons

Gain increases were accompanied by a change of firing mode from isolated spikes to bursting where the timing of bursts coded the presence of coincident somatic and dendritic inputs. We propose that this dendritic gain modulation and the timing of bursts may serve to associate top-down and bottom-up input on different time scales.

Pyramidal neurons: dendritic structure and synaptic integration

Amplification of backpropagating action potentials by dendritic EPSPs can lead to bursting

Action potential initiation in a two-compartment model of pyramidal neuron mediated by dendritic Ca2+ spike

Delivering current to two chambers simultaneously increases the level of neuronal excitability and decreases the threshold of input-output relation. Here the back-propagating APs facilitate the initiation of dendritic Ca2+ spike and evoke BAC firing.

Learning Rules for Spike Timing-Dependent Plasticity Depend on Dendritic Synapse Location

The propagation of APs into the apical dendrite of layer 5 pyramidal neurons is modulated by dendritic depolarization, which can lead to a phenomenon called BAC-firing, in which single APs paired with dendritic depolarization generate dendritic calcium spikes and subsequent AP burst firing. These findings suggest that dendritic depolarization may influence the induction of STDP.

I primarily approached this problem from a computational perspective. I’m not a neuroscientist so I can’t really back any of this up sufficiently. Regardless, this is an interesting problem to take on given a rough & incomplete framework of cortical circuitry.

David Schneider Interview

For my part, I’m not disallowing the view of having overlaps on such presets. Since the rationale behind those ‘presets’ is that they’d come from an already-formed abstraction, evoked by other means (which we don’t model, and over a very long learning time themselves which we can’t currently model anyway)… you could come up with semantic overlaps as clever as you want. E.g. in the ABC scheme you could also have mommy specifying that this is an ‘uppercase’ letter, thus already giving a few overlapping bits for the SDR of ‘A’ vs ‘B’. And very large overlaps between possibly distinct ‘A’ and ‘a’.


Sorry I’m at work at the moment so I don’t have time to read that thread. I’m interested to know what your ideas are. Convergence can be a good thing.


The way I currently imagine semantics forming in an SMI output layer that can associate features across a physically wide network, is that first (chicken or the egg?) something is activated in the output layer from local activity in the input layer. This part of the activation is based entirely on the cells which are best connected to the input across multiple timesteps, with predicted cells having preference (these predictions could have originated from higher hierarchical levels).

Then additional activations in the output layer occur from predictions that are made both distally and apically (forming another level of context besides just the pooled feature itself). Varying percentages of difference between these two sources of activation will occur, and compete with each other. Over time, representations will emerge which incorporate pieces from both sources of activation. Thus the semantics of a feature are able to pick up associations with concepts that are very far away, which the local input itself never encounters.


(Sorry if any of the following is tangential to the current discussion. But that spawned some further thoughts…)

Barring your concerns of apical feedback inducing firing or not, we’re currently reasoning about that broad scheme:

    H = c (fixed)
    |
    L


where ‘L’ is an area processing at a quite low level on a given sensory pathway, and we’d wish to train it so that it evokes in ‘H’ a representation ‘c’ of our own choosing… so that we’re allowed to ‘bin’ the abstraction of all kinds of door handles together (seb’s example), or distinguish between letters when we don’t learn them as Mowgli (my example).

Now about your concerns, Paul, of having the SDR ‘c’ somewhat “too” fixed and/or unable to semantically overlap by itself, we could imagine, instead:

    H
   / \
  /   \
 L     P

Here, ‘L’ still stands for the same thing. ‘H’ is still some higher area, but now its ‘integration’ role is part of the actual, dynamic simulation: we bring your dislike for the constant ‘c’ somewhere down, onto a fixed ‘P’ area intended to represent the parallel pathway (audio in my example).

Now, prior to booting the model and trying to have L sort things out, H has already pooled over its input in P. So each fixed SDR in P is already processed to a somewhat fixed (but not constant, from now on!) SDR in H.
When we do turn learning online for L, H is part of the sim and ‘pools’ over both P (holding c) and L (experiencing the sensation we wish to wire to an understanding of c).

Could this work better?


Yes, I think we are imagining the same mechanism. I would only shy away from the idea of “turning learning on”. Learning should be continuous (IMO turning learning off is a hack to get around important pieces of a complete intelligence system that are missing from the model).


Sorry about that way of putting it.
Every simulated area here is in my mind ‘learning online’ while the sim is running.
So… I was speaking about starting to run the whole thing here (i.e. input layers of L experiencing some sensation/input, and both L and H ‘learning’…)
Still in opposition to maybe a previous run of H pooling from only P… (P is not a sim, it is abstracted away as arbitrary SDRs which happen to be other inputs for H).
Or we could even imagine prior pooling from both P and L, but with no input to L. (rationale : the H area was already there, and already processing stuff when we learned about the ‘A’ sound, even where there was no associated visual experience of the ‘A’ glyph)


I’m currently building a spatial pooler stack that acts as a hierarchy. I’ve revisited this thread as I believe the top-down teaching signal will be vital to practically deal with position/scale/rotation image variance. (I plan to use MNIST)

I remembered what you said here. Although it was advice on how this idea could be more ‘biologically fitting’, I feel there is more to it. The lateral connections could provide a simple mechanism for auto-association. In the context of HTM I’ve only really thought of lateral connections as functioning for TM. If they had other functions they could perform auto-completion (or auto-associative memory). For a ‘face’, suppose an incomplete set of features (‘eye’, ‘mouth’, ‘ears’) is fed in. It is likely the higher-level layers will still pool the input as a ‘face’, but at the level of the ‘eye’, ‘mouth’, ‘ears’ the lateral distal connections will naturally recruit the missing parts (‘nose’, ‘eyebrows’, ‘hair’, etc) over a number of recurrent excitations, much like a Hopfield network. These completed features can then participate in both bottom-up and top-down signalling, leading to further completion of other missing parts, or teaching of new features.
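A classic way to sketch that completion is a tiny Hopfield-style network: one unit per feature cell, lateral weights formed by a Hebbian outer product over stored patterns, and recurrent updates that recruit the missing members (the feature names and sizes here are purely illustrative):

```python
import numpy as np

features = ["eye", "brow", "nose", "mouth", "ear", "hair",
            "wheel", "door", "window", "seat", "roof", "engine"]
idx = {f: i for i, f in enumerate(features)}

def bipolar(active):
    """Encode a set of active features as a +1/-1 vector."""
    v = -np.ones(len(features))
    for f in active:
        v[idx[f]] = 1.0
    return v

face = bipolar({"eye", "brow", "nose", "mouth", "ear", "hair"})
car = bipolar({"wheel", "door", "window", "seat", "roof", "engine"})

# lateral weights: Hebbian outer products over the stored patterns,
# with no self-connections
W = np.outer(face, face) + np.outer(car, car)
np.fill_diagonal(W, 0.0)

def complete(cue, steps=3):
    """Recurrent excitation over lateral connections recruits the
    missing members of the best-matching stored pattern."""
    s = cue.copy()
    for _ in range(steps):
        s = np.sign(W @ s)
    return s

partial = bipolar({"eye", "mouth", "ear"})  # incomplete 'face' input
recalled = complete(partial)
recovered = {f for f in features if recalled[idx[f]] > 0}
# recovered now includes the recruited 'nose', 'brow' and 'hair'
```

Given only ‘eye’, ‘mouth’ and ‘ear’, the recurrence settles on the full ‘face’ pattern rather than the stored ‘car’ pattern, which is exactly the auto-completion behaviour described above.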

Maybe this is what you were hinting at. I’ve wondered for a while if/when HTM will include auto-associative memory. It just hadn’t really occurred to me that on one of a column’s layers the distal connections could be used for TM while on another layer they could be used for lateral pattern completion. There is so much out there that describes the idea of lateral auto-association, and it makes good sense too.

Hopefully, with some luck, I can demonstrate this in code, other than just pondering about it in theory.

EDIT: I had just realized that distal connections are only meant to depolarize the cells, not hyperpolarize. However, there would be top-down biasing which would be enough for activation.


While it would be more computationally expensive, this could be done by incorporating the idea of cell grids as a way of combining local distal input with self-reinforcing grid inputs for auto-association capability.


I have yet to catch up fully on cell grids. How will they help self-reinforcement over more ‘traditional’ models?


The main use in this case would be to allow longer distance activations (versus predictions). It is just another tool that could be utilized. I’ll draw up a visualization for how it could be applied here.


If I am not wrong, the majority of top-down influence on cortical layers from layer 1 are inhibitory. I believe this is to filter out the contexts that should not be considered [1]. It narrows the search space. While encouraging a potential context can work (via depolarizing that specific activation), discouraging the wrong ones allows for more exploration. I just do not think we can get away without some sort of hyperpolarization, top down or lateral.


I’ve not heard that before - that there are inhibitory top-down signals. That makes sense for excluding activity, as it is the opposite of excitatory biasing. It seems that, depending on what you read, there are many types of top-down signals (inhibitory, plateau potentials (to cause bursting), and biasing). Maybe that’s why there are so many feedback connections?


I suppose at some point you have to ask whether having the model completely biologically accurate is important for your particular use case. Are you testing a theory for how it is done in biology, or abstracting some things that don’t (for now at least) add value to your model?


I’m not going for biological plausibility, but I’m taking as many ideas from biology as I can. My main goal is to model unsupervised feedforward learning using a stack of spatial poolers. I’m approaching this with practicality as a priority while keeping biology secondary.
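For what it’s worth, a skeleton of such a stack might look like the following. This is a drastically simplified pooler (k-winners-take-all over overlap scores plus a Hebbian permanence update), and all the sizes, rates and thresholds are placeholder choices, not NuPIC’s:

```python
import numpy as np

rng = np.random.default_rng(4)

class ToySpatialPooler:
    """k-winners-take-all over overlap with connected synapses,
    plus a Hebbian-style permanence update on winning columns."""

    def __init__(self, n_inputs, n_columns, k, lr=0.05):
        self.perm = rng.uniform(0.3, 0.7, (n_columns, n_inputs))
        self.k, self.lr = k, lr

    def compute(self, x, learn=True):
        connected = self.perm > 0.5          # binary synapse matrix
        overlap = connected @ x              # overlap score per column
        winners = np.argsort(overlap)[-self.k:]
        if learn:
            # strengthen synapses to active inputs, weaken the rest
            self.perm[winners] += np.where(x > 0, self.lr, -self.lr)
            np.clip(self.perm, 0.0, 1.0, out=self.perm)
        out = np.zeros(self.perm.shape[0])
        out[winners] = 1.0
        return out

# a two-level stack: each pooler's output SDR is the next one's input
stack = [ToySpatialPooler(64, 128, k=8), ToySpatialPooler(128, 64, k=4)]

x = np.zeros(64)
x[rng.choice(64, 12, replace=False)] = 1.0   # a sparse input pattern
for sp in stack:
    x = sp.compute(x)
# x is now the top-level SDR with exactly k=4 active columns
```

The top-down teaching signal discussed earlier would be an extra input into `compute` that gates or boosts the permanence update, which is where I expect the interesting part of the experiment to be.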