HTM and Negative Reinforcement


You mentioned reinforcement learning. The amygdala is closely associated with the hippocampus and can form negative reinforcement, while basic HTM theory is built on coincidence detection. How does HTM theory accommodate “negative patterns”?


I don’t have any knowledge of neuroscience myself, but I am experimenting with this particular topic and can talk about how I am approaching the problem.

The approach I am using is to have temporal memory remember that a certain action in a certain context leads to a negative outcome. In my implementation I have separated the cells which represent actions into a second layer, distinct from the layer which does normal temporal memory on sensory input. And I have separated the cells which represent reinforcement (positive or negative) into a third layer, distinct from the other two.

The second layer grows distal connections with cells in the first layer (rather than with cells in its own layer), and the third layer grows distal connections with cells in the second layer. In this way, the first layer represents the sensory context and predicts future sensory input, the second layer represents motor commands in the current context and predicts the next motor commands, and the third layer represents the current reinforcement and can be used to predict the positivity/negativity of an action in the current context.
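
Here is a toy sketch of the layered arrangement just described. All names are illustrative, and each “layer” is reduced to a plain dict mapping a distal input pattern to the cells it predicts; real temporal memory uses sparse distributed representations and dendritic segments, so treat this purely as a shape of the idea, not an implementation:

```python
# Toy sketch of the three-layer idea. Each "layer" is just a dict from
# a distal input pattern to the set of cells it predicts. All names
# ("hot-stove", "touch", "negative") are invented for illustration.

class Layer:
    def __init__(self):
        self.distal = {}  # distal input pattern -> predicted cells

    def learn(self, distal_input, active_cells):
        self.distal.setdefault(distal_input, set()).update(active_cells)

    def predict(self, distal_input):
        return self.distal.get(distal_input, set())

sensory = Layer()        # layer 1: temporal memory on sensory input
motor = Layer()          # layer 2: distal input comes from layer 1
reinforcement = Layer()  # layer 3: distal input comes from layer 2

# In sensory context "hot-stove", the action "touch" was followed by a
# negative reinforcement cell becoming active.
motor.learn("hot-stove", {"touch"})
reinforcement.learn(("hot-stove", "touch"), {"negative"})

# To evaluate an action, manually activate it and read out the
# predicted reinforcement for that (context, action) pair.
print(reinforcement.predict(("hot-stove", "touch")))  # {'negative'}
```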

This is certainly different from how it works in neuroscience. In particular, the only way to make “positivity/negativity” predictions with the third layer is to manually activate columns representing motor commands in the second layer, to predict how good or bad those motor commands might be in the current context. I’m sure neuroscience will provide a more elegant solution to this problem, but for now this is a relatively simple way to use pure high-order sequence memory to remember how good or bad something is.

If you are not interested in motor commands being part of the system (i.e. if you just want a purely predictive implementation), you could do away with the second layer, and grow distal connections from the third layer to the first layer instead. This would eliminate the most biologically infeasible element of the design, but it could no longer be used to execute actions. It could be quite useful in an anomaly detection system, though, in which the system can not only identify anomalous input, but also learn how “bad” the anomaly is and predict it in the future – for example, are we about to have minor latency, crippling latency, or is the server about to go down?


Hi Bitking,

I think HTM is concerned specifically with the functionality of the neocortex, which does not have a mechanism to “pick” an activation on its own. For that you need other brain structures, most importantly a model of the basal ganglia.

The amygdala or basal ganglia (specifically the striatum) cannot do any reinforcement without input from the hippocampus, neocortex, or cerebellum, which all represent the possible set of actions at varying levels of abstraction. In this context it may be better to think of HTM as the possible-action pool generator for a given state.

Still, this does not mean the amygdala or ganglia cannot be modeled with multiple modified HTM layers, as @Paul_Lamb described at an abstract level.


I agree that the executive function to “drive” the HTM will come from the limbic structures.

The division of labor is not as clear.

The model I have at this point is a really stupid boss (the limbic system) with a really smart advisor (the neocortex) presenting very simplified choices for the boss to pick from. The boss, in turn, initiates an action which is elaborated and executed in the Neocortex.

I take it as a matter of faith that there must be activity in the cortex for something to enter consciousness; we are only aware of the boss’s decision after we have already begun to act on it. [1][2][3]

I suspect that trying to create behavior w/o this arrangement using “just” HTMs will be fruitless.

I am guessing that these executive functions (evaluating choices & initiating actions) are the exclusive realm of the limbic system, and that it needs positive and negative judgment to train whatever kind of computing structures are contained within. This implies a somewhat different organization than the basic HTM model - hence my question. I have read that the Hippocampus is somewhat like a 3-layer version of the cortex. I have not seen how the Amygdala is constructed, but studying it is on my ToDo list; I think the naughty/nice mechanism is found there.

If we had the same sort of breakthrough in understanding the limbic system that HTM brings to the cortex, the pairing could be very powerful. For example, there is a nice biological programming example [5] that describes the Amygdala modifying the learning rate. [4] This powerful adaptation means that meaningful events (such as a battle or courting) are remembered more clearly. It suggests an obvious modification to HTM theory: data that is flagged in some way is captured with rapid learning. The drawback is that what is learned more quickly is also subject to saturation through overlearning (PTSD). This “saturation and obliteration of competing memories” is the biggest problem in the human cortical-limbic interaction, as is often noted in stressful situations such as pitched battle or sexual assault.
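
To make the suggested modification concrete, here is a minimal sketch of a salience-gated permanence update, where a limbic “importance” flag scales the learning increment. The function name, constants, and the scalar salience signal are all invented for illustration; the saturation risk mentioned above shows up as permanences clamping at the cap:

```python
# Hypothetical salience-gated Hebbian update: a limbic flag scales the
# permanence increment so that significant events are learned faster.
# All names and constants are illustrative, not from HTM theory.

def update_permanence(perm, active, salience=1.0,
                      inc=0.05, dec=0.02, cap=1.0):
    """Permanence update with a salience multiplier on the increment."""
    if active:
        perm = min(cap, perm + inc * salience)  # saturation risk lives here
    else:
        perm = max(0.0, perm - dec)
    return perm

ordinary = update_permanence(0.30, active=True)              # ~0.35
flagged = update_permanence(0.30, active=True, salience=5.0)  # ~0.55
```

Note how a strongly flagged synapse saturates at the cap after only a few repetitions, which is the overlearning problem in miniature.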

The nature of my question is mostly this: can any of this evaluation of good/bad weighting and filtering be part of the HTM computation? Can the Limbic system’s projected (sampled?) activation be context for forming patterns in the cortical sheet?



Hopefully my comments are not too annoying given my ignorance of neuroscience, but from a naive perspective, I had a thought on how a boss and advisor system could work.

First, the boss wouldn’t need to understand anything about a particular representation, but could have its own temporal memory of outcomes. The advisor could provide a relatively dense representation of everything it could try (this representation would also need to encode contextual information).

The boss could then use its own temporal memory of outcomes to rank the cells and output a sparser representation containing those which it remembers to have the best outcomes. The other cells would be inhibited. This output SDR could then be translated into specific actions.
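
A naive sketch of that ranking step, with a scalar outcome memory standing in for the boss’s temporal memory (the cell names, scores, and top-k selection are my own simplification):

```python
# Sketch of the "stupid boss": it keeps a scalar outcome score per
# advisor cell and passes through only the top-k cells, inhibiting the
# rest. All names and numbers are hypothetical.

def boss_select(candidate_cells, outcome_memory, k=2):
    """Return the k candidate cells with the best remembered outcomes."""
    ranked = sorted(candidate_cells,
                    key=lambda c: outcome_memory.get(c, 0.0),
                    reverse=True)
    return set(ranked[:k])

outcomes = {"flee": 0.9, "freeze": 0.1, "approach": -0.5}
dense_proposal = {"flee", "freeze", "approach", "groom"}  # from the advisor
sparse_action = boss_select(dense_proposal, outcomes, k=2)
print(sparse_action)  # {'flee', 'freeze'} - the rest are inhibited
```

Unseen cells default to a neutral score of 0.0 here, which is one (crude) way to let the boss occasionally pass through novel proposals.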

Exploring Reinforcement Learning in HTM

I see the underlying algorithm of HTM as a general computation method rather than a specific solution for a specific task, so modeling other structures should be a lot easier once the neocortex is understood. Also, the hippocampus can be thought of as an early version of the neocortex with a faster learning rate, as Hawkins once said if I am not mistaken. I have read in multiple places that the neocortex evolved from the hippocampus.

I am not sure why you are focused on the Amygdala exactly, but the major boss is the Basal Ganglia. It modulates the thalamus and indirectly changes the activity of L5 through L1 apical dendrites [1].

To my knowledge there have been over 15 computational models of the basal ganglia in the literature [2]. One of the most popular and oldest ones, which you see referenced everywhere, utilizes a variant of what I would call neural networks [3].

So there is already a huge body of research trying to model the bosses. One major problem is that most of these models leave out the smart advisor, and they mostly try to mimic pathological cases: you break some parts and check whether the results mimic real biological anomalies in a functional way. Evaluating the boss without the smart advisor is hard.

I have spent the last couple of months extracting the functionality of the pathways of the ganglia and trying to mimic them with HTM layer variants. So far it looks promising, though it is not all rainbows and butterflies: one major missing part of HTM theory is that it does not take selective inhibitory connections into account. As a result you cannot really pull off reciprocal connections in a biologically plausible way. I believe these connections ping-pong information between structures within a loop and allow an invalid/missing representation to converge to a valid one as time passes. So these kinds of things need to be functionally emulated with only excitation, and from my experiments so far that seems possible.

The point is every step you take understanding neocortex is also a step understanding the bosses.



Sunguralikaan, I understand the attraction of pulling on the puzzle from the cortex end. I have also been thinking that the HTM model is a greatly simplified model of the cortex, and as such, key pointers to function and to interaction with the limbic system may be missing.

A prominent example of “missing” parts is the interplay between the Hippocampus and Cortex in memory consolidation in sleep/dreaming.

I have been working from the other direction - what the limbic system does could suggest what the cortical sheet is doing to interact with the older brain. The old brain worked fine for lizards; these older structures were good decision makers and pattern drivers. The older brain has always directed activity through much of the evolutionary path - I don’t see any reason why it ever would have stopped. It senses the body’s needs and, after processing, can project that as a sort of goal-directed sensory projection to the front edges of the napkin. The “interface” between the sensory hierarchy and motor selection seems to be at about the level of the Amygdala.

What I have read agrees with your statement that the Hippocampus seems to be a precursor to the modern cortex. Perhaps a better understanding of the Hippocampus circuit could shed light onto the function(s) performed in the cortex?

The Amygdala is particularly interesting to me as it has a unique location relative to the “highest” level maps of the sensory hierarchical chain - AND - it is directly adjacent to the Hippocampus. The Hippocampus connections seem to be important in consolidating memory, and multiple pointers suggest that the Amygdala is also important to the consolidation of memory. [1][2]

I hold it as a matter of faith that anything that can be experienced in consciousness must have a cortical representation. The cortical connections may make it possible for the Amygdala to participate in the consciousness experience. [3]



I definitely agree with your points. Maybe deciphering the hippocampus would help, but neuroscience research seems to focus more on the six-layer theory of the cortex. In addition, as far as I know there are no direct connections from the hippocampus to motor neurons in the human brain. That means you cannot create a sensorimotor inference system by functionally imitating only the hippocampus as the advisor and the ganglia as the boss. So there’s that.

I understand that from your perspective the amygdala seems intriguing because of its heavy involvement in memory consolidation. In my understanding, the connectivity of the cortex is the memory itself. So when I read about memory consolidation, I imagine the functions of the hippocampus (episodic memory, place cells, consolidation, experience replay, etc.) and the amygdala (novelty detection, emotional arousal, memory-emotion relations, etc.) are just using what is on the cortex and trying to get it into better shape, with emotions and desires as the guidance. Therefore understanding the cortex first seems the obvious pick for me.

It all depends on what your end goal is. The goal for Numenta is to reverse engineer the cortex. For me it is a simple but plausible autonomous agent.


I want Max Headroom.

I may not have made the point clearly - I don’t think that the Hippocampus drives motor function, just that its simpler structure may be key to understanding the cortex algorithm. If we can describe the Hippocampus to the level that we can the Cerebellum, then we can compare and contrast to see what is different about the cortex.

I see the Hippocampus as mirroring the episodic memory of the highest levels of the sensory hierarchy, with the capacity for about a day’s worth of experience. The learning rate in this structure is modified by projections from the rest of the limbic system, with special reinforcement from the Amygdala.

My current working theory is that the Hippocampus holds a type of “copy” of the Cortex it is mapped to. The content of learning is a delta between what it “knows” and what is projected from the cortex.
During sleep, the experience that is coded as the difference between the related cortex sheet and the Hippocampus map drives a “flattening” process in which the two structures are brought into alignment: the cortex is stimulated to grow synapses in the areas where the Hippocampus learned something. The delta is also used to flatten the local representation in the Hippocampus.

I see the Cortex as being modified primarily by a modulated Hebbian learning, where the Hippocampus is capable of good modulated one-shot learning with the Amygdala being a powerful modulating factor.

So in summary: The function of the Hippocampus in modern Cortex dominated brains is to capture a day’s worth of experience and play it back at night.
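
As a toy illustration of this delta/flattening idea, with sets of labeled “synapses” standing in for actual connectivity (everything here is a caricature of the theory, not biology):

```python
# Toy sketch of delta-based consolidation: the hippocampus holds a
# copy of cortical connectivity plus one day of experience; at
# "night" the delta is replayed into the cortex and both maps are
# flattened back into alignment. Labels are invented.

cortex = {"A", "B"}            # long-term cortical connectivity
hippocampus = {"A", "B", "C"}  # the copy, plus today's experience

def sleep_consolidate(cortex, hippocampus):
    delta = hippocampus - cortex   # what was learned today
    cortex |= delta                # grow cortical synapses from the delta
    hippocampus.clear()
    hippocampus |= cortex          # flatten: re-seed the hippocampal copy
    return delta

learned = sleep_consolidate(cortex, hippocampus)
print(learned)  # {'C'}
```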

I think I understand what you are saying with the cortex being memory for the limbic system.
Hopefully a clarifying question: How did that work in evolutionary “older” brains w/o a cortex?
Occam’s razor suggests that the structures still do what they always did with modifications projected in from the cortex. This raises some interesting questions about what the Hippocampus used to do before the Cortex came on the scene.


I have a wild thought to put out there.

You may know that I am looking to see how I can get a “negative pattern” as several interesting algorithms work better with that.

We all agree that pyramidal cells are excitatory cells - not a negative bone in their body!

And I know that everybody has their own pet theory on what the various cortex layers do; let me add this one to the pile.

  1. In HTM theory a central tenet is that a layer-one dendrite getting a hit on its SDR/receptive field primes the related cell body to fire faster than cells w/o an active predictive state. Call this the “GO” layer.
  2. When this speedy cell fires it triggers the inhibitory cells in the fabric of the cortex, preventing the rest of the column from firing (thus achieving temporal pattern matching). The GO layer then projects to faraway maps or motor efferents, and also splits back up into the surrounding map it lives in.
  3. Assume a pyramidal-cell STOP layer w/o a white matter projection (active only in this map) that samples a layer other than layer one, such as the layer 4 internal granule layer. Call this the STOP layer receptive field. It could be layer 2/3.
  4. Further, assume that this STOP layer cell is trained to recognize SDR/dendrite patterns in that inner STOP receptive layer and has the same priming function to fire quickly.
  5. When that speedy cell fires it could trigger the surrounding quenching cells to inhibit the white-matter-projecting pyramidal cells and prevent any cells in the column from firing. STOP!

The overall effect is that these STOP layer pattern-sensing cells could provide an inhibitory counterpart to the GO layer’s layer-one-sensing cells. The inhibitory STOP and excitatory GO patterns, and the related receptive learning, are superimposed in the same map space.

An unaddressed problem is how these different layers would be trained differently; my best stab at it is this:
6) Patient HM showed that the cortex is capable of straight Hebbian learning without the hippocampus portion of the limbic system. Assume that this is true for both STOP and GO layers, although they may have different learning rates.
7) The straight Hebbian learning/consolidation rate could be modified by chemical signals from the limbic system.
8) This chemical learning-rate-modulator messenger could be different for each cell layer. For example, the standard HTM model (GO layer) could be reinforced by an interesting episodic outcome - say during dreaming - while the inhibitory STOP layer could be driven by a bad outcome, with the good/bad call coming from the limbic system in an “emotional state.”
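
A very rough functional sketch of the GO/STOP interplay, with dictionaries standing in for trained pattern memories (this ignores all the real dendritic and inhibitory machinery; contexts and outputs are invented):

```python
# Functional caricature of the GO/STOP idea: both pattern memories
# share one column space. A STOP match fires first and quenches the
# column via the surrounding inhibitory fabric, so nothing projects.

go_patterns = {"see-food": "reach"}      # GO layer: context -> projected output
stop_patterns = {"see-food-with-wasp"}   # STOP layer: learned "bad" contexts

def column_output(context):
    if context in stop_patterns:
        return None                      # STOP cell quenches the column
    return go_patterns.get(context)      # GO cell projects its output

print(column_output("see-food"))            # reach
print(column_output("see-food-with-wasp"))  # None
```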

I am looking at references like this one to sort out the menagerie of cells in the cortex:


I am not sure I understand the theory, but there seems to be some confusion about “negative patterns”. Negative reinforcement does not mean inhibiting behaviors, as far as I am aware; it is not something that teaches avoidance. Assume you always end up in a painful situation, and then you discover some behavior that removes this negative outcome. Learning then reinforces that particular behavior, since it removes the negative outcome. So it is not a punishment, and not the avoidance of some behavior leading to a decrease in that behavior. I hope I did not misunderstand.

On the other hand, what you are talking about seems like avoidance to me, which also happens through inhibitory systems, but I could not connect all the pieces.

Can you clarify this a bit?

Are you talking about a cell in a layer that is in a different region?

Additionally, there is a paper suggesting that L1 cells (yes, L1 has cells) are mostly inhibitory, so that L1 reverses the activation coming from the thalamus.


By “faster” I mean the single matching predictive cell fires first. If there is none, the entire column goes into bursting.

Think of a “self-contained” layer that is using a different dendrite tangle than layer one - I picked layer 4 at random, as it has the right kind of horizontal interconnections. The cell bodies for this layer could be in any layer other than the white-matter-projecting cells. I’m not picky at this point.

No - the cells are in the same region, just at different layers within the same area of the cortical sheet.


Oh I see, so the first 2 points are a recap of your understanding of HTM theory.

I assume the speedy cell here is the cell in your point 2. Are white-matter-projecting cells not the same as these speedy cells? Or are you talking about all the white-matter-projecting cells that are going to be inhibited? We assumed the pyramidal cell layer does not have a white matter projection in point 3. Do you use “projecting to layer X” to mean sending input to layer X, or getting input from layer X? Maybe you could give these imaginary layers names.

Maybe it’s me who is not getting it so sorry for that.


I think I get it now, but what is the purpose? Are you just trying to come up with a possible inhibitory system that could override the modulatory activation in L1? Also, for this to happen, there has to be a loop between these 2 layers.

Also why are we trying to inhibit the activation? For the justification of an avoidance system?


Punishment/Reward modification of learning. I have edited the post with some labels.
No loop required - the connection between the STOP and GO layers is the surrounding fabric of non-pyramidal inhibitory cells; either layer can trigger the quench. Only one (the GO layer) drives a projecting output.


I think it would help to advance your ideas by reading about the direct and indirect pathways of the ganglia rather than trying to invent one. Some of the papers literally use GO and NOGO to describe inhibitory and excitatory commands on L5 through L1. What you are describing is already done by those pathways. Neocortex layers on their own cannot selectively excite or inhibit activations. They also have no direct way to get the reward signals needed for that from the midbrain or brain stem, where those signals originate. Without reward signal connections, the STOP layer cannot adapt its synapses to learn the activations that need to be inhibited.

Even if we imagine a layer capable of this, that layer should only learn the “negative patterns”. Something has to modulate what to learn and what not to learn, which implies a connection to this layer from a boss, directly or indirectly.

In general what you want is done by the Cortex->Ganglia->Thalamus->Cortex circuit. Decision making structures are decoupled from the model of the world (cortex) in some sense.
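
As a functional caricature of that circuit, hypothetical GO/NOGO weights could gate cortical proposals like this (all weights, action names, and the winner-take-all rule are invented; real pathway dynamics involve disinhibition through the thalamus):

```python
# Caricature of Cortex -> Ganglia -> Thalamus -> Cortex gating: the
# cortex proposes candidate actions; direct ("GO") and indirect
# ("NOGO") pathway weights compete, and only a net-positive winner
# is disinhibited back to cortex. All numbers are invented.

def ganglia_gate(candidates, go_weights, nogo_weights):
    """Return the candidate with the largest positive GO-minus-NOGO drive."""
    scored = {a: go_weights.get(a, 0.0) - nogo_weights.get(a, 0.0)
              for a in candidates}
    best = max(scored, key=scored.get)
    return best if scored[best] > 0 else None  # nothing released otherwise

go = {"reach": 0.8, "withdraw": 0.4}
nogo = {"reach": 0.9}  # punishment history suppresses "reach"
print(ganglia_gate({"reach", "withdraw"}, go, nogo))  # withdraw
```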


I was just reflecting on Paul_Lamb’s posts and some of the problems I am trying to solve, and how they are related to my questions, and this popped up.

In terms of coding - I am working to get to the point where I can code the cortical-io retina entirely in HTM/SDR neurons.

Since I am coding up the HTM model from scratch I can modify any part of it at will and consider that a degree of design freedom; I have been doing that for years with standard neural network models and old habits are hard to break.

You do have to agree that different layers in the cortical sheet interacting with each other via inhibition using the surrounding cells and having separate memory patterns is a nifty hack!

I suppose I am as guilty as everyone else of trying to tart up a single cortex map instead of working through the entire hierarchy of maps and the Stupid Boss/Expert Advisor interaction, even though that makes more sense. It takes longer to work out the answers I am looking for, but they will be better answers in the end.

While I am looking at the functions that the cortex can form and the needs of the underlying algorithms, inhibitory functions would simplify some of those functions. Inhibitory layers or functions sit over in the “that could be nifty” category for some algorithms, but have very weak biological support.

Using a dendrite pattern library ranges over into the “it just makes sense” category, like the basic HTM/SDR model. It’s biologically plausible and algorithmically very useful for making SOMs.

BTW: I have literally hundreds of papers in my reading list and I am working through them as fast as I can.
The Hippocampus/Cortex connections and the transfer/“consolidation of learning” are a very high priority on that list.