Right way to get output from an HTM system

The image that comes to mind is driving a nail with a Rolex watch.
I am sure that you can do it, but is it really the best tool?

There are many ways to get output from an HTM, but the simplest is to use a statistical classifier.

Both NuPIC and htm.core contain a “classifier”.
The classifier takes SDRs as input and outputs a label.
Internally, the classifier is a simple one-layer neural network that learns with backpropagation. There is a weight from every bit of the input SDR to every category of label.

See htm.core’s MNIST example: htm.core/mnist.py at master · htm-community/htm.core · GitHub
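
To make the "one weight per (input bit, label)" idea concrete, here is a minimal sketch in plain Python. It is not the NuPIC or htm.core implementation: the class name `OneLayerSDRClassifier`, its methods, and the learning rate are all made up for illustration, and it assumes a softmax output trained with a simple gradient step.

```python
import math


class OneLayerSDRClassifier:
    """Hypothetical one-layer softmax classifier over SDR active bits."""

    def __init__(self, num_bits, num_labels, learning_rate=0.1):
        # One weight from every input bit to every label category.
        self.weights = [[0.0] * num_labels for _ in range(num_bits)]
        self.num_labels = num_labels
        self.learning_rate = learning_rate

    def infer(self, active_bits):
        """Return a probability per label for the given active bit indices."""
        sums = [0.0] * self.num_labels
        for bit in active_bits:
            for label in range(self.num_labels):
                sums[label] += self.weights[bit][label]
        peak = max(sums)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in sums]
        total = sum(exps)
        return [e / total for e in exps]

    def learn(self, active_bits, label):
        """One gradient step toward the target label (assumed update rule)."""
        probabilities = self.infer(active_bits)
        for k in range(self.num_labels):
            error = (1.0 if k == label else 0.0) - probabilities[k]
            # Only weights of active bits change; inactive bits contribute nothing.
            for bit in active_bits:
                self.weights[bit][k] += self.learning_rate * error
```

Usage would be along the lines of calling `learn(active_bits, label)` on each training SDR and then reading the largest entry of `infer(active_bits)` as the predicted label.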


Thank you for the replies and insights.

After I posted I also found this presentation where Subutai describes the CLA classifier: Getting Predictions out of HTM (CLA Classifier) - YouTube

An adjunct classifier module definitely accomplishes the goal I outlined in the original post. But it wasn’t the answer I was driving at. It feels a bit like “inserting probes into a neocortex” as opposed to creating a simplified whole brain.

Here is a different formulation of the question: What is the architecture of the simplest HTM-based system capable of taking an action?

That action could be as simple as setting a one-bit output based on what it recognizes, but, as far as I understand, the neocortex itself isn’t capable of movement / aka output at all. 1000 Brains likens the neocortex to a map, and therefore there are some missing old-brain components that interpret and act based on that map.

So is there an information-flow schematic for a hyper-simplified system that uses HTM in a complete biologically-inspired sense-process-act loop?

Also, thanks @mraptor for the paper suggestions! I can definitely see how displacements, path integration, and reference frames become critical for any reasonably complex set of actions.

However, if we stub out the system’s ability to move down to just a tiny handful of actions, like “move left” and “move right” as Jeff suggested in the book, it seems like I can postpone implementing grid cells and reference frames for a little while.

I’m currently grokking “A Framework for Intelligence and Cortical Function Based on Grid Cells in the Neocortex” and will begin on the papers you recommended shortly.

Thanks again!


Yeah, I had a post about this. My current thinking is to skip the Numenta-style location layer and instead hardcode a Cartesian location layer that receives a command/action and emits an SDR-based location (with no feedback from Sense to Location).

This saves a lot of time…

Then I have to invent a mechanism that combines Sense + Location, feeds the result into temporal pooling, and uses the pooled SDR as a label later for classification.
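
A rough sketch of what such a hardcoded location layer might look like, in plain Python. Everything here is an assumption made for illustration: the names `CartesianLocationLayer`, `MOVES`, and `combine`, the SDR size and sparsity, the seeded-random encoding of (x, y), and the choice of simple concatenation for Sense + Location. The temporal pooling step is left out.

```python
import random

# Hypothetical command set: each command is a unit step on a 2D grid.
MOVES = {"left": (-1, 0), "right": (1, 0), "up": (0, 1), "down": (0, -1)}


class CartesianLocationLayer:
    """Tracks an (x, y) position and emits a fixed random SDR per position."""

    def __init__(self, size=256, active=16):
        self.size = size      # total bits in the location SDR
        self.active = active  # number of active bits
        self.x = 0
        self.y = 0

    def apply(self, command):
        """Update the position from a command and return the new location SDR."""
        dx, dy = MOVES[command]
        self.x += dx
        self.y += dy
        return self.location_sdr()

    def location_sdr(self):
        # Seeding with the coordinates gives the same SDR every time the
        # agent returns to the same position (no sensory feedback involved).
        rng = random.Random(f"{self.x},{self.y}")
        return frozenset(rng.sample(range(self.size), self.active))


def combine(sense_bits, location_bits, sense_size):
    """Concatenate Sense and Location by offsetting the location bit indices."""
    return frozenset(sense_bits) | frozenset(b + sense_size for b in location_bits)
```

Because the position is plain integer arithmetic, path integration comes for free: "right" followed by "left" lands on exactly the same location SDR.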


TBT does not have RL yet.

My current idea is:
A TM-like structure.
Dendrite input: State
FF: ??
Prediction: use the Action SDR (the predicted SDR bits == the actual Action SDR bits) and a REWARD to figure out which dendrites from t-1 to boost.

The update rule has to keep the last step’s active neurons and dendrites in a buffer, so that you can do the Q-value calculation the way you do in RL.

Sort of like ensemble RL that predicts the active bits of an SDR, which should match the Action SDR.

Then pass this Action to the CC. Sense is State.

So what is left is how you represent and apply a GOAL.


A relevant thread from 2017:


Thanks for the responses.

I’ve been pondering reinforcement learning in the context of HTM and I’m not convinced RL is inextricably linked to motor control. Don’t get me wrong, RL is definitely a necessary feature of a complete and competent intelligent system with agency. But my conjecture is that reinforcement learning and motor control are “peers” in the information-flow diagram. i.e. one doesn’t require the other. (How RL makes any sense at all without action probably needs its own post because I’ll need to be very clear about terminology. But the essence is what Jeff wrote in 1000 Brains: “Thinking is a form of movement”)

But back to motor control. Some relevant things I read / heard:

“invariant representation in motor cortex is, in some ways, the mirror image of invariant representation in sensory cortex”
-Jeff Hawkins, On Intelligence, Pg 54

“We don’t move to a position, we move to where we feel a sensation.”
-Joshua Brown

“layer 5 activity represents both the state and the action…”
“…the state maps itself to the action that results in itself.”
@sunguralikaan’s Master’s Thesis, pg. 40 & 41

So back to my previous question about a minimal HTM system with output - I think I see part of the answer now:

  • There needs to be a looping connection, connecting some columns’ activity state back around, effectively muxing it together with some components of the input vector. (the components associated with the output / motor commands)

  • I think Temporal Memory / Pooling is needed, but I’m not sure. Something may be possible with just spatial pooling, but I just think it makes more sense to include temporal memory.

  • Even without reinforcement learning, the HTM system would still learn associations between input patterns and the states of output neurons.
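
The looping connection in the first bullet could be wired up roughly like this. It is only a sketch of the information flow: `mux`, `run_loop`, and the `region_step` callable are hypothetical names, and `region_step` stands in for whatever HTM region (spatial pooler, temporal memory, or both) maps the combined input to output/motor bits.

```python
def mux(sensory_bits, motor_bits, sensory_size):
    """Combine sensory bits with fed-back motor bits into one input vector.

    Motor bits are placed after the sensory bits by offsetting their indices.
    """
    return sorted(set(sensory_bits) | {sensory_size + b for b in motor_bits})


def run_loop(region_step, sensory_stream, sensory_size):
    """Feed each sensory input, plus the previous output, through the region."""
    motor_bits = []  # nothing has been output before the first step
    history = []
    for sensory_bits in sensory_stream:
        combined = mux(sensory_bits, motor_bits, sensory_size)
        motor_bits = region_step(combined)  # the output loops back next step
        history.append((combined, motor_bits))
    return history
```

With a toy `region_step` such as `lambda bits: [0] if 0 in bits else [1]`, the second step’s input already contains the first step’s output, which is the whole point of the loop.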

But now to part B of my question in the original post: How do you get the HTM system not to rely on (or learn not to bother setting) the output, once you take away the training support? I think, for that, we would need some additional mechanism to be present.

In REM sleep, my brain blocks (most) motor impulses. But if my dreaming self were aware of that fact, the subject of every dream would be: “Oh no! I’m paralyzed!!!” So somehow we have some brain structures capable of modeling - if not our entire body kinematics - at least integrating our motor impulses and adjusting the perception of our body position / muscle extension. Of course this could simply be the spatio-temporal HTM circuit itself, which learns the way my body moves when I’m awake and simulates it when I’m asleep. That means my brain would need to tell that specific circuit not to learn (that I’d been paralyzed) when I am asleep.

So the missing mechanism is the ability to selectively disable learning on a fine-grained basis: region by region, if not column by column.

I’m not totally sure about that last part… Does this resonate with you guys or have I gone into the weeds?
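
For what it’s worth, a per-region learning switch is easy to express in software. As far as I know, the compute calls in NuPIC and htm.core already take a per-call learn flag; the toy below does not use either library. `GatedRegion` and `set_learning` are hypothetical names, and "learning" is reduced to remembering which patterns have been seen.

```python
class GatedRegion:
    """Toy region that memorizes patterns only while learning is enabled."""

    def __init__(self, name):
        self.name = name
        self.learning_enabled = True
        self.known_patterns = set()

    def compute(self, active_bits):
        """Return True if the pattern was already known; learn it if allowed."""
        pattern = frozenset(active_bits)
        recognized = pattern in self.known_patterns
        if self.learning_enabled:
            self.known_patterns.add(pattern)
        return recognized


def set_learning(regions, enabled_names):
    """Enable learning only for the named regions, disable it for the rest."""
    for region in regions:
        region.learning_enabled = region.name in enabled_names
```

In the sleep example, the body-model region would keep computing (so the dream still "moves") while its learning flag is off, and other regions would stay plastic.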


Update to my diagram:

                     Action         Reward
                      ^                V       
      -- State --> [  TM  -  State:Action  ]
                       Union of Actions 

Prediction happens via State + lateral TM connections.
The TM/S:A layer acts like a Temporal Memory that serves as a predictor/selector/filter for the incoming union of actions.
The neuron permanence is replaced with a Q-value, and the update uses the TD algorithm.

The difference from standard RL is that we have context-based S:A, i.e. multiple Sx:A pairs with the same State-X, which differ by how you got to that Sx state.
I don’t know the implications of that, except that it will learn more slowly but will be more context-specific, i.e. multiple policies at the same time.
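
To illustrate the context-dependent S:A idea, here is a tabular stand-in in plain Python. It is not a dendrite-level implementation: instead of Q-values stored on segments, it keeps an ordinary Q-table keyed by (previous state, current state), so the same State-X reached along different paths gets separate entries. The class name `ContextualQ` and the values of `alpha` and `gamma` are assumptions.

```python
from collections import defaultdict


class ContextualQ:
    """Q-learning where the key is (previous_state, state), not just state."""

    def __init__(self, actions, alpha=0.1, gamma=0.9):
        self.actions = list(actions)
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor
        self.q = defaultdict(float)   # (key, action) -> Q-value

    def best_action(self, key):
        """Greedy action for a context-dependent state key."""
        return max(self.actions, key=lambda a: self.q[(key, a)])

    def update(self, key, action, reward, next_key):
        """Standard TD (Q-learning) update; returns the TD error."""
        best_next = max(self.q[(next_key, a)] for a in self.actions)
        td_error = reward + self.gamma * best_next - self.q[(key, action)]
        self.q[(key, action)] += self.alpha * td_error
        return td_error
```

Here `("s0", "s1")` and `("s2", "s1")` are both "state s1" but learn independently, which matches the slower-but-more-context-specific trade-off mentioned above.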


The cortex probably has the full sensory-motor simulation outside of the old brain… so a simple switch-off will be enough.

I don’t follow. So I agree that there is part of the cortex modeling the body and capable of integrating motor impulses into an updated sensory state.

Where I get lost is - wouldn’t that part of the cortex predict changes to the actual sensory inputs in response to the observed motor impulses? And wouldn’t it then start freaking out (bursting) when those predictions weren’t met? i.e. when the actual sensory input didn’t change in response to the movement impulses?

“But…” begins an imaginary voice in my head, “Couldn’t that cortex learn that the body behaves differently when I am dreaming? Learn two modes when predicting the body’s movement, based on the is_sleeping context?”

If that were true, however, wouldn’t the is_sleeping context bit also cause that cortical system to learn to predict no change in sensory input in response to movement? So now everything makes sense to that part of the cortex. But what then for another higher-level part of the cortex?

We’ve just kicked the problem up to another level. Now another piece of cortex encounters a paralyzed body that doesn’t move in line with predictions. Following this to its logical conclusion, dreaming, and perhaps also constructing hypotheticals and other imaginative processes, don’t work.

Turning off learning on a fine-grained level feels like it addresses this, but I really want to understand your thinking because there may be a simpler / better way.


Dreaming is the process where the EC/HC contents are pushed back onto the cortex to solidify long-term memory.

Indeed. But it seems to me that the mechanism by which that happens involves stimulating different cortical regions in a manner very similar to the way they’d be stimulated when awake. So how do some cortical regions learn what they are meant to learn, while others don’t learn that the body no longer responds to movement commands?

See spindle waves. It is not the same as awake and attending.

I’m aware of the existence of sleep spindles / sigma waves generally, but I don’t understand how differences in brain-wide activity patterns necessarily imply what’s going on within a small region of cortex.

Can you elaborate a bit more on which research you are referring to and how it fits into HTM / TB theory? Apologies if this is a big ask - I’m just trying to develop an internally consistent understanding of TBT that explains the phenomena that I observe my own brain doing, and this (motor control, but more generally the interfaces between macrocolumns) is an area I can’t quite reconcile.

Forget dreaming, because that’s taken the conversation into the weeds. Sensory gating generally is an important function for brains, both asleep and awake. So how does sensory gating fit into TBT? That’s my (sub)question in a nutshell. It seems like sensory gating would require a concomitant “learning gate”.


You got there before I did.

Sorry - I was just responding to post #9 where you brought up REM sleep.

This is where the push-back from the EC/HC is happening.

The connections to the EC/HC are bidirectional and the learned events of the day are being played back to the cortex to consolidate the learning from recent experience.

I don’t pretend to explain how that fits with the sensor/motor theory, just that many experiments seem to indicate that this is what is happening.

To the best of my knowledge all mammals have a requirement to sleep and this seems to be necessary to form long term memories.

While you may say that this is “taking the discussion into the weeds” any successful theory will have to include and explain this consolidation of long term memories.


Tangentially related to needing some equivalent of sleep is the potential need to define and assign a reward after the fact. As I’ve been getting deeper into spiking neural networks and reading a lot of the work of Eugene M. Izhikevich (who seems to be an expert in neurodynamic systems), I found a paper of his that makes a decent attempt to show how the brain might assign rewards to actions that took place in the recent past (the past few seconds). Anything that makes it into a dream might first need to clear this reward threshold as well (hence my suggestion that this is tangentially related to the topic of dreams and getting output from HTM).

With Hebbian learning rules (see: spike timing dependent plasticity), neurons only learn when they activate. So inhibiting a neuron will also prevent it from learning.

  • Note: I’m ignoring the HTM learning rule’s “predictedSegmentDecrement” because it is often zero, and always at least an order of magnitude smaller than the other parameters. *In general*: neurons only learn when they activate.
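
Here is a simplified sketch of that rule for a single distal segment, in plain Python. The parameter names loosely mirror the Temporal Memory parameters (permanenceIncrement, permanenceDecrement, predictedSegmentDecrement), but the function `adapt_segment`, the dict-based data layout, and the default values are assumptions for illustration; synapse growth and the handling of bursting columns are left out.

```python
def adapt_segment(permanences, prev_active_cells, column_active,
                  increment=0.10, decrement=0.10,
                  predicted_segment_decrement=0.0):
    """Return updated permanences for one segment that predicted its cell.

    permanences: dict mapping presynaptic cell id -> permanence in [0, 1].
    prev_active_cells: set of cell ids that were active at t-1.
    column_active: whether the segment's column actually became active.
    """
    updated = {}
    for cell, permanence in permanences.items():
        was_active = cell in prev_active_cells
        if column_active:
            # Prediction confirmed: Hebbian reinforce / weaken.
            delta = increment if was_active else -decrement
        else:
            # Prediction failed: only the (usually tiny or zero) punishment
            # applies, and only to synapses that contributed to the prediction.
            delta = -predicted_segment_decrement if was_active else 0.0
        updated[cell] = min(1.0, max(0.0, permanence + delta))
    return updated
```

With `predicted_segment_decrement` left at zero, a segment whose prediction fails is returned unchanged, which is the "neurons only learn when they activate" case.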

The brain uses Reinforcement Learning to control which motor neurons to inhibit, thus controlling the motor behavior. For more see:


Thanks @dmac for clarifying that for me. I had been assuming (incorrectly it seems) that dendritic segments learned whenever they succeeded in putting a neuron into a predictive state, regardless of whether that neuron’s prediction was validated or not.

Basically I was applying the same learning algorithm as the spatial pooler to the dendritic segment. Since the proximal connections learn whenever a column activates, I was assuming the distal connections would learn whenever a segment caused a neuron to become predictive.

If a neuron’s dendritic connections don’t learn anything unless the neuron actually activates, I see how it solves that problem I mentioned earlier about “learning to ignore the input”. But it replaces it with a different problem - I don’t see how dendritic segments get “reined in” and don’t start over-predicting in incorrect contexts, given they only learn when their neuron activates and their neuron only activates when the column activates. (either with a correct prediction or a burst)

So what causes a penalty / down-regulation for a segment that continually predicts the neuron will activate in the next time step?

(I’m specifically speaking about the software neurons modeled in HTM)

Thank you for your help and patience.
