How to incorporate goals in HTM: discussion

Hoping to start a discussion on this (IMO) interesting topic:

Everything I know of HTM centers on learning to recognize, represent, and predict patterns. But the same systems that process incoming data also send output signals to the motor neurons.

At a high level, these commands have to incorporate a goal of some kind, even something as simple as ‘move towards food’ or ‘move away from danger’.

But how do you make the system learn to produce action outputs in response to its sensory inputs that are in accordance with those goals? It seems to me that you need some kind of supervised learning mechanism that modulates the HTM algorithm. How does the brain accomplish this?

Spitballing here… from my tiny, anecdotal knowledge of neuroscience, the brain uses neurotransmitters to help interpret different kinds of inputs as ‘good’, ‘bad’, etc., and these somehow affect the learning that follows an experience. (For example, our brains are hardwired to enjoy sugar, however the actual ‘taste’ qualia are encoded.)

Similarly, perhaps we could designate certain HTM inputs as desirable or undesirable a priori and use reinforcement learning methods, e.g. determine the sign of the permanence updates in the motor control region (layers 5 and 6?) based on the desirability of the outcome.
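To make the idea concrete, here is a minimal sketch of a reward-signed permanence update. It is not the standard HTM learning rule and not anything from an existing HTM library: the function name `reward_modulated_update`, the `increment` size, and the choice to touch only synapses with active inputs are all assumptions made for illustration.

```python
import numpy as np

def reward_modulated_update(permanences, active_inputs, reward,
                            increment=0.05, lo=0.0, hi=1.0):
    """Return new permanences after one reward-signed learning step.

    permanences   -- 1-D float array, one value per synapse
    active_inputs -- 1-D bool array, True where the input cell was active
    reward        -- > 0 for a desirable outcome, < 0 for an undesirable one
    """
    # Hypothetical rule: only synapses whose input was active are changed,
    # and the sign of the change follows the sign of the reward.
    delta = increment * np.sign(reward) * active_inputs.astype(float)
    return np.clip(permanences + delta, lo, hi)

perms = np.array([0.50, 0.50, 0.98])
active = np.array([True, False, True])

after_good = reward_modulated_update(perms, active, reward=+1.0)
after_bad = reward_modulated_update(perms, active, reward=-1.0)
```

With a positive reward the two active synapses are strengthened (the third is clipped at 1.0); with a negative reward the same synapses are weakened. A zero reward leaves everything unchanged.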


In this post I suggest an organization that shows the relationship between the cortex and subcortical structures:

In this related post I suggest how the subcortical structures provide the goals you are asking about:

Please let me know if this is what you are looking for or if you have further questions.


The Hopfield network is a very interesting idea for how goals are defined and acted upon. If I understand correctly (correct me if I’m misinterpreting), you’re suggesting that most basic survival-type activities (e.g. eating, jerking a hand away from a hot stove) are governed by these lower ‘old-brain’ systems, which have different structures than the cortex. But I recall reading that humans have a greater than normal share of motor control given to the cortex, so there must still be some learning regarding the execution of actions happening there. Which brings us back to my original question: how is the growth and decay of synapses in the cortex affected by goals? Are these goals defined in the lower brain, do they sometimes arise in the cortex itself, or both? And how can we translate that into the HTM algorithm?


I think this post should answer your questions on how the subcortical structures train up the cortex:

And this little bit …


This right here, this is what I’m curious about. Specifically, “learned the rituals that keep it alive”. That learning right there. I understand that the system uses feedback between the motor outputs and the sensory inputs to learn about its body and discover action sequences. But how does it learn which action sequences are “good”? What is the mechanism the brain uses to tell itself “this is an action sequence that you should repeat because X” and “this one should not be repeated because Y”, thereby reinforcing the neurons that contribute to X and suppressing those that induce Y?


I think emotions (mind-body) play that role here.

  • you touch something hot -> it hurts you
  • you are starving, find something to eat -> you feel good

These are driven by the old, “reptilian” brain, and we use it to survive/not to harm ourselves. HTM then connects the dots and learns the actions and their +/- consequences.

And if you ask how was that mechanism learned, I think that’s already (neuro)evolution.

This is a very interesting topic. My very first thesis failed (for lack of this support) because I wanted an “(HTM) agent that learns to operate in an environment”. Basically, an agent-driven HTM that learns to explore its environment and learns to perform good actions (eat) and not bad actions (step in fire).


Another interesting point of view is, IMHO, the engineering / case-driven perspective (I don’t want to use “goal” here :stuck_out_tongue: ).

Reinforcement learning (RL) has been very popular in many applications recently, and a whole interesting area is RL for sequences. I have done some applied research in that area, and the algorithms used there, Soft Actor-Critic (SAC) models, are not very good in my experience.
What would be the ways to incorporate supervised learning and goal-driven learning into HTM? Emotions are one way…


Exploratory behavior to improve prediction?

Assume the “world of places and possibilities” is divided into “known territory” and “the big frightening void”.

The agent should be able to recognize its current state as being “known”, “unknown” or “in between”.

  • “known” is boring - try to move towards unknown,
  • “unknown” is scary - find a path back into known (roll back your past steps? use a compass?)
  • “in between” is interesting (rewarding) keep exploring

The generic goal is to transform interesting territory into boring territory by exploring new paths, and to stitch formerly unknown states into known territory.

Some explanations:
The current state is within “known territory” if the agent can predict the consequences of its own various actions.
That means, e.g., “if I make 3 steps to the left I know what that future state will look like”.

A path is simply a repeated sequence of states followed by actions.

Different states should be somehow differentiable.

A boundary state is one in which only a limited set of (action, consequence) tuples are predictable.
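The definitions above could be sketched roughly as follows. This is only one possible reading: the names `classify_state` and `next_move` are made up for illustration, and treating “known” as “every available action is predictable” and “unknown” as “none is” is an assumption, not something the definitions pin down.

```python
def classify_state(actions, predictable_actions):
    """Label a state by how many of its actions have predictable consequences.

    actions             -- all actions available in the current state
    predictable_actions -- those whose consequence the agent can predict
    """
    actions = set(actions)
    if not actions:
        raise ValueError("need at least one available action")
    fraction = len(actions & set(predictable_actions)) / len(actions)
    if fraction == 1.0:
        return "known"        # boring: everything here is predictable
    if fraction == 0.0:
        return "unknown"      # scary: nothing here is predictable
    return "in between"       # boundary state: interesting, keep exploring

# Hypothetical policy table mirroring the three bullets above.
POLICY = {
    "known": "move towards unknown",
    "unknown": "find a path back into known",
    "in between": "keep exploring",
}

def next_move(actions, predictable_actions):
    return POLICY[classify_state(actions, predictable_actions)]

moves = ["left", "right", "forward"]
label = classify_state(moves, ["left"])
```

Here the agent can predict the result of only one of its three moves, so the state is classified as a boundary (“in between”) state and the policy says to keep exploring.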


I totally agree and do think that the hippocampus is the structure that combines episodic memory and the subcortical good/bad judgement/flavor to be pushed back onto the cortex. It is necessary to combine the hardwired learning bestowed on the subcortical structures by evolution with the flexible and adaptable cortex.

This does about the same thing as reinforcement learning but is distributed over a much larger system than the usual simplistic local RL methods.


One approach is to have goals as part of the feedback going to every level. A paper by GoodAI tries this out (see ). They don’t use HTM specifically, but they talk about spatial poolers and temporal poolers. Each level clusters the inputs coming into it, then passes a vector representing the winning cluster to a temporal pooler in the same level, which recognizes Markov sequences. Each level gets feedback from a higher level, which includes goals.

One problem I have with HTM theory and this theory is that the brain seems to have a specific place for location data and for goals. Location data is in the parietal lobe, and goals are in the dopamine reward circuit. So saying that every cortical column has location data, or has goal inputs, seems to contradict that. But the article is interesting, and probably could be combined with HTM theory.

There are other areas of the brain (underneath the cortex) which perform reinforcement learning. See basal ganglia and globus pallidus. These RL organs can control your actions indirectly by inhibiting or disinhibiting the motor areas. In effect they filter your actions by turning areas of your motor cortex on and off.

For example: My motor cortex is an expert at the task of pushing buttons. Every time I see a fire alarm, my motor cortex tells my hand to push the big red button. However a different part of my brain recognizes that setting off the alarm (in the absence of a fire) would be a bad thing and turns off my hands whenever they try to set off the fire alarm.
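The fire-alarm example can be sketched as a simple veto gate. This is a cartoon, not a model of the basal ganglia: the `gate` function, the `value` table, and its numbers are all invented for illustration.

```python
def gate(context, proposed_actions, value, threshold=0.0):
    """Let through only the proposed actions whose learned value is acceptable.

    value maps (context, action) pairs to a learned score; pairs that were
    never evaluated default to 0.0 and are therefore allowed.
    """
    return [a for a in proposed_actions
            if value.get((context, a), 0.0) >= threshold]

# Hypothetical learned values: the alarm is good in a fire, bad otherwise.
value = {
    ("fire", "push_alarm"): 1.0,
    ("no_fire", "push_alarm"): -1.0,
}

# The "motor cortex" always proposes pushing the button when it sees one.
proposals = ["push_alarm", "walk_on"]

allowed_without_fire = gate("no_fire", proposals, value)
allowed_with_fire = gate("fire", proposals, value)
```

Without a fire the gate suppresses `push_alarm` and only `walk_on` gets through; with a fire both proposals pass. The motor side never changes what it proposes, only the gate decides what is released.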

how do you get the goals at the intermediate levels? (I’ll read the paper but would be nice to have the idea distilled here for discussion)

Nice example. To incorporate it with the above: is it actually a different part? I’d see this as long-term planning. 1st layer: “Push the button, YOLO!” … 5th layer: “we need to pay the bills” (pushing it could get us kicked off work).

So this would indeed be modelling env+actions (consequences) and evaluating the state’s value → feedback.
Now, how is the evaluation done? Could there be a region where state+actions are correlated with “feelings” = hormone levels = personal wellbeing?
I.e. pushing the red button: joy+1; being homeless: joy-20.

I’m going to try messing around with an overly simplified model to get a handle on how this might work.
Maybe build a model with a pooler or TM that reads in a simple ‘location’ and ‘badness level’, and an interpreter that decides if the pooler wants to go ‘left’ or ‘right’, then provides feedback on whether that was a good choice. See if the pooler can figure out what to avoid doing.
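A stripped-down stand-in for that experiment might look like the sketch below. Note the big simplification: a plain lookup table takes the place of the pooler/TM, so this only illustrates the feedback loop, not HTM itself. The track layout in `BADNESS`, the learning `rate`, and all function names are assumptions.

```python
# Hypothetical 1-D track: the 'badness level' of each location (both ends bad).
BADNESS = [1.0, 0.0, 0.0, 0.0, 1.0]
ACTIONS = {"left": -1, "right": +1}

def step(position, action):
    """Move one cell (clamped to the track) and report the new badness."""
    nxt = min(max(position + ACTIONS[action], 0), len(BADNESS) - 1)
    return nxt, BADNESS[nxt]

def train(epochs=20, rate=0.5):
    """Try every (location, action) pair repeatedly and learn a preference."""
    pref = {(p, a): 0.0 for p in range(len(BADNESS)) for a in ACTIONS}
    for _ in range(epochs):
        for (p, a) in pref:
            _, bad = step(p, a)
            # Feedback from the 'interpreter': move the preference toward
            # minus the badness that the choice led to.
            pref[(p, a)] += rate * (-bad - pref[(p, a)])
    return pref

def choose(pref, position):
    """Pick the action with the highest learned preference."""
    return max(ACTIONS, key=lambda a: pref[(position, a)])

pref = train()
choice_near_left_end = choose(pref, 1)
choice_near_right_end = choose(pref, 3)
```

After training, the learner steers away from both bad ends: at location 1 it goes right, at location 3 it goes left. Swapping the table for an actual spatial pooler or TM is the interesting (and open) part.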

@Andrew_Stephan you are probably thinking about BDI (belief-desire-intention).

There are a couple of Python-based examples:

For Java-based HTM, the option for the AgentSpeak layer would be the ASTRA language. In fact, I suspect that an HTM-based perception -> belief layer alone, added to that Java-based approach to BDI, might be a clever way to reach a wider audience.

Note: when I asked on another thread how agent-based approaches might be woven into HTM as a means to tackle distributed HTM, many responses jumped straight to Erlang (or, for some reason, FPGA) rather than entertain Python-based agency such as Thespian, despite the maturity of the Python-based HTM and Thespian frameworks. No matter: for nutters who want to code up HTM+Agents in Erlang, there was a hack of BDI-based agency in Erlang as well, though it is not mature. I say nutters because I have played a little with eJason myself :sunglasses: :peanuts:

If starting off on the incorporation of goal-driven AI using HTM, would a hybrid be a start? My guess is that the Belief and Intention layers might come from HTM, and the Desire level would be coded in an AgentSpeak-like language (in Java with ASTRA, or in Python with PROFETA or SPADE BDI) until a means of allowing spontaneously defined goals is discovered. As a starting point at least.

Note that you start with goals and may then incorporate emotions or planning. There is much research in and around BDI-based AI that might provide leverage for a hybrid approach, supporting research ahead of a purely cortical model.

In answer to your question about goals, I think (not sure) the paper is saying that at the most abstract level there is a reward signal.
The paper then says this: “As the parents [my note: higher levels] are not connected directly to the actuators [my note, actuator could be a robot arm, for instance], they have to express their desired high-level (abstract) actions as goals to their children which then incorporate these goals into their own goals and propagate them lower. Experts on the lowest levels of the hierarchy are connected directly to actuators and can influence the environment”
From a level to the level below it, a vector is passed. This vector has “the expected value of any rewards that the architecture will receive if in the next step, the input falls into a particular cluster (interpreted as goals).” So if an upper level has 5 clusters, I think the vector would have 5 numbers, an expected reward for each cluster, and it passes that vector down to the level below it.
It’s an interesting way of modifying the sequences that each layer follows, based on reward. There’s other info that gets passed down as well, which helps the lower level ‘know’ what sequence it is in, and what transition to make.
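As a rough illustration of that 5-number vector (this is my reading, not code from the paper), the upper level could keep a running mean reward per cluster and hand that down as the goal. The names `goal_vector` and `preferred_cluster` and the sample numbers are made up.

```python
import numpy as np

def goal_vector(reward_sums, visit_counts):
    """Expected reward per cluster, estimated as the mean reward seen so far.

    Clusters that were never visited get an expected reward of 0.
    """
    sums = np.asarray(reward_sums, dtype=float)
    counts = np.asarray(visit_counts, dtype=float)
    return np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)

def preferred_cluster(goal):
    """Index of the cluster the lower level should try to steer the input into."""
    return int(np.argmax(goal))

# Hypothetical statistics for an upper level with 5 clusters.
reward_sums = [2.0, 0.0, -3.0, 1.0, 0.0]
visit_counts = [4, 0, 3, 1, 2]

goal = goal_vector(reward_sums, visit_counts)
target = preferred_cluster(goal)
```

For these numbers the vector passed down is `[0.5, 0.0, -1.0, 1.0, 0.0]`, so the lower level would favor transitions that lead toward cluster 3.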

Yes it is. What I described is called the NO-GO pathway and is not part of the cortex:

@Bitking pain and hunger aside, cognitive science seems to partition goals by action verbs of attainment, maintenance, cessation and avoidance (also leveraged in the goal-oriented requirements engineering world). Note that goals can also include attain wealth, cease smoking, maintain a healthy relationship, or avoid public speaking, so there is still a strata of “goals” from the raw visceral up to the abstract.

I would point out that those are macro goals.
I think that the interaction with the subcortical structures is at the micro level.
The subcortical structures interact at the level of directing ongoing alpha activity much the same way that the sensory streams drive the sensory hierarchy.

I think it is reinforcement learning via the basal ganglia.
Exactly how? I don’t know.

Then on top of that, do we have to have a planner?