Hierarchy & Invariance


Cool, that’s how I typically approach my own experiments as well. It can sometimes rub folks the wrong way when I describe connections that are completely wrong from a biological perspective, so figured I would clarify :slight_smile:


I’ve always thought of modeling this inhibitory action using a “winner takes all” strategy. So top-down influence would bias certain possibilities, which the local context then confirms to cause activation. These activations would compete with the FF activations in a winner-takes-all competition, and the losers are inhibited.

This should be roughly equivalent to top-down influence negatively biasing non-possibilities, but computationally less expensive.
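That strategy can be sketched in a few lines (a toy illustration only; all names and numbers are made up): the top-down bias is simply added to the feedforward scores, and a k-winners-take-all step zeroes out, i.e. inhibits, the losers.

```python
def k_winners_take_all(ff, td_bias, k):
    """Combine feedforward and top-down signals, keep the k strongest,
    and inhibit (zero out) the rest."""
    combined = [f + b for f, b in zip(ff, td_bias)]
    winners = set(sorted(range(len(combined)),
                         key=combined.__getitem__, reverse=True)[:k])
    return [combined[i] if i in winners else 0 for i in range(len(combined))]

ff = [6, 5, 4, 3]        # feedforward evidence for four candidate patterns
bias = [0, 3, 0, 0]      # top-down context favoring candidate 1
print(k_winners_take_all(ff, bias, k=1))  # [0, 8, 0, 0]: candidate 1 wins
```

Note that the top-down signal here is purely excitatory; the suppression of the non-possibilities falls out of the competition, rather than being wired in as explicit negative biases.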


I’ve stumbled across an interesting paper that has similarities to the ideas being explored here.

They have produced a model that is inspired by the visual cortex and learns invariance from internal error signals.

A few snippets that grabbed my attention:

The LVis model builds upon this feedforward processing foundation, and learns a very similar hierarchical solution to the object recognition problem. In our tests on 100-way object classification with reasonable levels of variability in location, rotation, size, and lighting, LVis performs in the same general range as these established feedforward models. Interestingly, it does so using a single unified, biologically based learning mechanism that leverages bidirectional recurrent processing between layers, to enable signals from other modalities and brain areas to shape visual object recognition during learning in important ways, supporting a form of error-driven learning

Recent evidence indeed suggests that neurons in IT cortex reflect significant higher-level “semantic” influences, in addition to the expected stimulus-driven similarities among objects
… we show how recurrent processing provides a mechanism via which this higher-level semantic information can be integrated with visual information during object processing, providing a mapping between perceptual and conceptual representations
… top-down signals should shape lower-level representations. For example, Kriegeskorte et al. (2008) showed that visual representations in inferotemporal (IT) cortex reflect semantic influences, for example, a distinction between living and non-living items

From this I read that IT has an ‘understanding’ of the input stimulus in the context of the overall scene. From this higher-level context it can influence downstream activity to ‘align’ to the semantics.

We hypothesized that these non-classical organizational properties of IT cortex are due to constraints imposed by recurrent connectivity with other neural systems over the course of learning. Simply put, recurrent connectivity allows error-driven learning signals about object properties to be circulated between neural systems, causing the similarity structure of non-visual systems to be reflected in visual areas. Semantic relationships between object categories have been suggested to be maintained by the anterior temporal pole (Patterson et al., 2007), which sends descending feedback to high-level ventral areas, and is thus a candidate structure responsible for the semantic organization observed in IT responses.

Have we found our teacher?

Thus, recurrent processing allows the visual properties of objects and non-visual semantic properties to be concurrently represented in the same neural substrate by simultaneously satisfying multiple bottom-up and top-down constraints during learning.
… the shaping of IT representations according to semantic structure enables the model to bidirectionally map between purely visual and purely semantic similarity spaces. Importantly, semantic similarity spaces have been shown to be distinctively non-visual (Kriegeskorte et al., 2008) and might very well contradict them. Thus, the relative position of IT cortex in the ventral visual hierarchy uniquely allows it to represent a balance of visual and non-visual properties and serve as an important translation point between these knowledge domains.

So IT is the middleman between the non-semantic (similarity-driven) visual stimulus and the semantic (context-driven) domains?

This dual mapping between semantic and visual information enables the network to understand the semantic implications of visual features, properly generalizing semantic information based on bottom-up visual features of novel object categories

This is similar to my OP but different in that I propose the high-level representation is stable due to feedforward alone (without any context from other regions). Both working together would be even better.

these dynamics contribute in a meaningful way to the brain’s robustness to visual degradations like partial occlusion by reinforcing probable “hypotheses” about the underlying stimulus through rapid recurrent processing. For example, an image of an occluded fish will weakly activate neural populations that are tuned to fish features (e.g., the dorsal fin, the tail, etc.) as well as neural populations that are tuned to other visually similar, but irrelevant, features (Wyatte et al., 2012b). Our model suggests that the brain could resolve this ambiguity via excitatory top-down connections by amplifying and filling-in neurons that are tuned to additional features that are consistent with the bottom-up inputs, but may not have been present in the actual stimulus. Competitive influences are equally important, which serve to suppress spurious activations that do not constitute valid category representations.

Hmm, this sounds very much like what we were discussing earlier.

Our results indicate that recurrent processing indeed modifies perceptual representations by allowing non-visual information from nearby associated brain areas to be incorporated into learning signals.

The more context, the better. If you hear, smell and feel a fish, and it kind of looks like a fish, then it is probably a fish. The lower-level features that were not a part of the fish feature will now be learned to support the fish feature in the future.


I feel both excitation and inhibition are useful together. For example, if you were to look at the image below:


If I were to say “that drawing is of a continent”, that would be excitatory top-down biasing. Of all the possible squiggly drawings it could be, the possibility space has shrunk to continents. (So far, this is what you’re talking about.) However, if I then said “the drawing is NOT of Australia”, that would be top-down inhibition. So the possibilities for the squiggly drawing = all ‘continents’ minus ‘Australia’ (I’m sure there’s a nice Bayesian expression for that). So this is sort of like combining a positive overlap and a negative overlap. Given that only a few other continents have similar bottom-up visual features, this narrows down the search. However, if I were to give you another top-down excitation (cue/clue), it might click immediately (if you haven’t already seen which continent it is). My top-down cues are ‘one of the hottest continents in the world’ and … ‘elephants’… and ‘lions’.

That’s just an intuition though. I’d like to hear @sunguralikaan’s thoughts.
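For what it’s worth, the Bayesian expression alluded to above is just conditioning on “NOT Australia”: zero out the excluded hypothesis and renormalize, so P(h | ¬Australia) = P(h) / (1 − P(Australia)) for every h ≠ Australia. A toy sketch with made-up priors:

```python
def eliminate(prior, excluded):
    """Condition a prior over hypotheses on 'NOT excluded' and renormalize."""
    posterior = {h: (0.0 if h == excluded else p) for h, p in prior.items()}
    total = sum(posterior.values())
    return {h: p / total for h, p in posterior.items()}

# Made-up prior from bottom-up visual similarity to the squiggly drawing.
prior = {"Africa": 0.4, "Australia": 0.3, "South America": 0.2, "Europe": 0.1}
posterior = eliminate(prior, "Australia")
print(posterior["Australia"])  # 0.0
print(posterior["Africa"])     # 0.4 / 0.7, roughly 0.571
```

Each further cue (‘hottest continent’, ‘elephants’, ‘lions’) would be another multiplicative likelihood update on what remains.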


Personally, as soon as you said it was not Australia, I would immediately see Australia :grin:

But I get your point. I can actively push one concept that I have already formed out of my head allowing me to explore other possibilities.

Another thing that comes to mind is that an extensive inhibitory network would also be useful for one of the things David Schneider was talking with @rhyolight about, which is inhibiting irrelevant sensory inputs that I am generating myself, like the sound of my own footsteps.


Italy maybe? ^^

I smiled at this good-heartedly. Then, on reflection, there is probably more truth than fun to it.

I’m interested in the idea (and evidence in L1) that inhibition is important to feedback. Yet if this turns out to be so important and ‘core’, I wouldn’t bet that it also has to solve such a high-level deduction process: reasoning by elimination, I mean.


Maybe we could add the concept of “exterminator” to Calvin’s evolution theory – an agent which goes out and actively lays waste to one species making room for others to emerge :rofl:





Yeah I probably should have described that example in less of a rush. Hopefully the points got through. Everyone thinking Africa though right? :stuck_out_tongue:


Nope. You inhibited my neurons to seek countries only :grin:


I think the example you gave is spot on and is in line with what @Paul_Lamb said:

If you interchange ‘irrelevant’ with ‘out of context’, it would naturally imply hyperpolarization in Temporal Memory terms. The main question here is whether depolarization + global inhibition is functionally equivalent to hyperpolarization, as @Paul_Lamb says.

Hyperpolarization may be computationally less expensive depending on the case. Let’s say the layer is at state A which can be followed by B, C, D, E or F in the next time step. If you somehow knew that the next state cannot be F because of the higher level context, you would want to consider only B, C, D, E. Hyperpolarization can target F selectively just like depolarization.

If you wanted to achieve the same thing with depolarization + global inhibition, you would have to depolarize all of B, C, D and E to make sure the winner is among them after the inhibition. So you would have to make all of them easier to fire relative to F. In this case, it looks like depolarization + global inhibition would require more synapses and computation to narrow the search space compared to hyperpolarization.

On the other hand, I do not have an example to showcase any functional difference between the two at the moment. That said, if the layer were running on local inhibition, achieving the same functionality with depolarization may be more inconsistent or complex. I am not sure about this, though; it needs more thinking.
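The synapse-count argument above can be made concrete with a toy calculation (purely illustrative; real costs depend heavily on the implementation): ruling out one successor by hyperpolarization targets only the excluded state, while achieving the same narrowing via depolarization + global inhibition targets every remaining valid state.

```python
def cost_hyperpolarize(n_successors, n_excluded):
    """One targeted inhibitory projection per excluded successor state."""
    return n_excluded

def cost_depolarize(n_successors, n_excluded):
    """One targeted excitatory projection per still-valid successor state."""
    return n_successors - n_excluded

# A -> {B, C, D, E, F}, and higher-level context rules out only F.
print(cost_hyperpolarize(5, 1))  # 1 targeted connection
print(cost_depolarize(5, 1))     # 4 targeted connections
```

The asymmetry flips when context rules out most successors: excluding four of five states takes four inhibitory targets but only one excitatory one, so neither scheme is cheaper in general.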



(I’m actually going to go back and edit that…)


Newbie question…
Speaking of hyperpolarization… isn’t it also hypothesized to favor subsequent bursting in some cases?
If so, the mechanism would not only prevent some signals but, once again, prepare others to fire strongly in the near future.


Timing is another thing that comes to mind. An input knows it is coming up, but needs to hold off for a bit.


Are you talking about how, if neurons are constantly hyperpolarized, their threshold is essentially lowered? The opposite of fatigue. … I’ve forgotten the actual term to describe that.

EDIT: I still can’t think of the term, but this is the general idea behind boosting in HTM poolers. Every cell is actively trying to participate in representing a feature. The homeostasis type of idea. If a cell/column is sufficiently inhibited then it is more ‘primed’ to activate to stimulus. Whereas cells/columns that are sufficiently excited have a greater level of fatigue.
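For reference, one common form of this homeostatic rule is the exponential boost used in some Spatial Pooler implementations (the exact formula varies by version, so treat this as a sketch rather than the canonical rule): columns active less often than the target density get a boost factor above 1, and overactive columns get one below 1.

```python
import math

def boost_factor(active_duty_cycle, target_density, boost_strength):
    """Under-active columns get boost > 1 (primed to activate);
    over-active columns get boost < 1 (damped, like fatigue)."""
    return math.exp(boost_strength * (target_density - active_duty_cycle))

print(boost_factor(0.00, 0.02, 10.0))  # > 1: a silent column gets primed
print(boost_factor(0.10, 0.02, 10.0))  # < 1: an over-active column is damped
```

A column exactly at the target duty cycle gets a factor of 1.0, i.e. no adjustment, which is the homeostatic fixed point.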


Well, the electrochemical matters are not my strong point, if I ever had one… but it could be something like that.


Thinking about how to implement active inhibition (not the preparation for future firing bit), I suppose you could do this without having to model a whole separate network of inhibition cells. Instead you could add concepts for “inhibited cells”, “previously inhibited cells”, etc. and have two types of synapses (one for activation and one for inhibition). I think I’ll play around with this idea and see what impact it has on the TM algorithm.


You mean… directly within the TM, or do you have a hierarchy for modelling the idea of an L1?


Initially I am thinking just within traditional TM to get a feel for how to work with it and how it affects the behavior of the system (before trying to add it to a more complicated implementation of hierarchy). It is essentially adding a NOT gate into the mix, which could have interesting effects.
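A minimal sketch of that NOT gate (hypothetical names, not an actual TM implementation): a cell enters the predictive state only if some excitatory distal segment is above threshold AND no inhibitory segment is, i.e. inhibitory synapses can veto a prediction.

```python
def is_predictive(active_cells, excitatory_segments, inhibitory_segments,
                  threshold):
    """AND-NOT gate: predicted iff an excitatory segment fires and no
    inhibitory segment does. Segments are lists of presynaptic cell ids."""
    def segment_active(segment):
        return sum(1 for presyn in segment if presyn in active_cells) >= threshold

    excited = any(segment_active(s) for s in excitatory_segments)
    inhibited = any(segment_active(s) for s in inhibitory_segments)
    return excited and not inhibited

active = {1, 2, 3}
print(is_predictive(active, [[1, 2]], [], threshold=2))        # True: excited
print(is_predictive(active, [[1, 2]], [[2, 3]], threshold=2))  # False: vetoed
```

“Previously inhibited cells” would then just be last step’s vetoed set carried forward, analogous to how the TM carries forward the previously active cells.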


Not a newbie question for me anyway.
If you mean bursting like two or three spikes in rapid succession (rather than bursting meaning the whole column activating), bursting can cause a long lasting hyperpolarization in the distal apical dendrite (which must be above threshold for the cell to burst), so hyperpolarization might help it burst again. I read on wikipedia today that slow hyperpolarizations (specifically slow AHPs) last a few seconds, hopefully referring to burst-triggered apical hyperpolarizations, which is interesting.

If bursting is for precise timing/sensory onset (followed by a plateau potential which contributes to subsequent firing), maybe hyperpolarization is like a reset for stuff starting.

Maybe bursting represents the first feature. Then, as more and more features are sensed, inhibitory input to the apical dendrite ends a feedback loop between voltage and apical calcium influx (through voltage-gated calcium channels, sustained by metabotropic receptors), allowing calcium-activated potassium channels to end the feedback loop and also make their resulting slow afterhyperpolarization noticeable at the soma (since the feedback loop ended). This prevents the cell from firing until attention shifts elsewhere (even briefly, when it stops attending the object) and everything is inhibited, resetting both plateau potentials (known parts of the object) and slow afterhyperpolarizations (known impossible parts of the object).

I don’t know if hyperpolarization by inhibitory input would remove the longer lasting burst-triggered hyperpolarization and end the calcium spike (a.k.a. plateau potential since I’m using those terms loosely), but it seems likely to me that it works that way.