Temporal unfolding of sequences

I’m currently thinking through how an inhibition signal could be used to solve this problem (need to go on a long walk to focus on it for a while I think). If cells naturally “want” to fire, and given a chance will do so randomly, and if the sequence I learn is not only the motor commands I want to activate, but also the motor commands I do not want to activate, then when I reach the end of what I have learned, both the prediction and inhibition signals will end, and something random will occur (leading to bursting minicolumns, and new information learned to model the results of my actions).


Are you guys (and anyone else) up for trying to achieve this goal as a project? It could be quite exciting to have a general-purpose unsupervised model that learns something like a zebra finch song within the framework of HTM.

If so we could probably get audio samples of some ‘dad’ zebra finch songs and use an encoder to feed into the model and go from there.


Sure, I seem to have accidentally split the discussion into two threads that have started bleeding into each other. Probably the more aptly named other thread would be the best place to consolidate the information for a project like that (with other threads focused on tangential details of specific sub-systems required along the way).

I wonder how we could do that without cluttering the forum?


True. Maybe we could set up a slack channel for the more verbose interactive brainstormy stuff, and use the forum to summarize things as they become more solidified and have discussions with the broader community along the way? This is something I’ve wondered about in the past (you either get these uber long threads that nobody wants to read through, or you have lots of little threads that spread pieces of the larger conversations around and are difficult to organize).


Exactly. Well, I guess we should get slack on the go!

Who’ll set it up, and what should we call the channel?


I created it. It is called dadssonghtm. Join here.


I’m not sure I have followed that.

If one feels it is too brittle overall to initiate the sequence, one’s not far from dropping the overall idea as too brittle also :x

I see all distally-accessible-info as available at similar level. And except when specifically apical, all potential info as distally accessible. That’s maybe why I don’t see the same problem here. If “standing still” is info as much as t-1 is, there’s no difference between start of the sequence and any other part of it.

Entering deeper into brainstorming realm, this triggers several things:

  • There is probably some truth to that somewhere. At least at the very interface with motor, where there seems to be lots of primarily inhibitory systems if I’m not mistaken.
  • Anyway. It is believable that we could envision the system as always doing ‘something’. “Not walking” should maybe not be seen as “zero”. “Not walking” could be “actively processing on any other plan than walking”. Even being idle could be a motor plan. Dunno. In which case it really lowers the “too brittle initiation” concern you raised above. If I understood correctly.
  • Waves from thalamus seem to provide clocked pushes to sensory pathways. My bet would be that they are part of the proximal signal, required in most cases to go beyond threshold? Why wouldn’t they also push an incentive to motor, in order to ensure it’s always doing something?

In a simpler TM example, let’s say you have learned the following sequence:

A -> B -> C -> D

Then I am playing it back:

A -> B -> C -> D

At this point nothing is in predictive state, because I haven’t yet learned a state after “D after C after B after A”. Since nothing is receiving distal input, this means that you cannot combine distal + apical input to activate anything, and thus the concept of a continuous unfolding stream of activity has failed… Something else needs to start the activity back up (such as a random activation, etc.).
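A toy sketch of this failure mode (a hypothetical stand-in, not Numenta’s actual Temporal Memory implementation: here predictions are simply keyed on the full learned context rather than per-cell distal segments):

```python
# Toy stand-in for a temporal memory: predictions are looked up by the
# full learned context (a deliberate simplification of HTM's TM).
class ToySequenceMemory:
    def __init__(self):
        self.transitions = {}  # context tuple -> predicted next element

    def learn(self, sequence):
        for i in range(1, len(sequence)):
            context = tuple(sequence[:i])
            self.transitions[context] = sequence[i]

    def predict(self, history):
        # Returns the predicted next element, or None if nothing
        # is in the predictive state for this context.
        return self.transitions.get(tuple(history))

tm = ToySequenceMemory()
tm.learn(["A", "B", "C", "D"])

print(tm.predict(["A"]))                 # "B"
print(tm.predict(["A", "B", "C"]))       # "D"
# After "D after C after B after A", nothing is predicted:
print(tm.predict(["A", "B", "C", "D"]))  # None -> unfolding stalls
```

A real TM reaches the same dead end by a different mechanism: no distal segment matches, so no cell enters the predictive state, and the stream of activity has nothing to continue from.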

It’s not that it is too brittle to initiate the sequence, but rather that it is too brittle to rely on solely for unfolding a sequence (my experience with HTM tells me that you frequently find yourself in a context that has not previously been encountered). You cannot rely on there always being distally predicted cells for a “next element in the sequence” (especially in a new system that hasn’t learned much yet). I expect there is some redundancy or randomness injected into the process to handle unknown states (and state switching).

That does sound like a candidate for the source of “something additional injected into the system”. Still need to figure out what information that signal is transmitting, how it is learned, and its overall strategy for integrating it into the system. And is it just random when the current state is not recognized?

I have used this exact strategy in the past in my RL experiments (two mutually inhibiting motor actions, one for “press A button” and one for “don’t press A button”). I guess what I am proposing here is that instead of having two motor actions, there are two possible synaptic connections to the single motor action – one which puts it into a predictive state (or activates it when combined distal + apical) and one which inhibits it.
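A minimal sketch of that proposal, assuming a boolean simplification of depolarization (function name and states are illustrative, not from any HTM library):

```python
# One motor cell with two connection types: a depolarizing distal
# connection (which fires the cell when combined with apical input)
# and an inhibitory connection that vetoes it. Illustrative only.
def motor_cell_state(distal_excite, apical, inhibit):
    if inhibit:
        return "inhibited"   # explicitly suppressed motor command
    if distal_excite and apical:
        return "active"      # prediction + feedback -> fire
    if distal_excite:
        return "predictive"  # depolarized, but not firing
    return "inactive"

assert motor_cell_state(True, True, False) == "active"
assert motor_cell_state(True, False, False) == "predictive"
assert motor_cell_state(True, True, True) == "inhibited"
```

The point of the single-cell variant is that “do” and “don’t” compete at the synapse level rather than between two separate motor populations.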


I think these two are connected. It cannot be assumed that the system knows a sequence without going through forming the necessary connections. What I mean is, the way a sequence is packed together holds the answer to how to unpack it. The agent I worked on never reliably encountered a sequence A->B->C->D without first learning to do C->D and then B->C->D (expanding back from the reward). Random actions would not cause predictable sequences for the HTM to capture. Naturally, the higher level did not start from a stable A->B->C->D. The system needs to learn a single transition, do that reliably, and expand from there. This goes both for top-down and lateral learning.
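The backward-expansion order described above can be sketched schematically (a hypothetical helper, not code from the agent described):

```python
# Sketch of learning a sequence by expanding back from the reward:
# the agent first reliably learns C->D (D is rewarded), then B->C->D,
# and only then the full A->B->C->D. Purely illustrative.
def backchain(sequence):
    """Yield sub-sequences in the order they become learnable."""
    for start in range(len(sequence) - 2, -1, -1):
        yield sequence[start:]

for chunk in backchain(["A", "B", "C", "D"]):
    print(" -> ".join(chunk))
# C -> D
# B -> C -> D
# A -> B -> C -> D
```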

In this context, the first element in the sequence is whenever the higher level apical depolarization can influence the lower level. This can even be mid sequence from the perspective of the lower layer.

I believe it is better to brainstorm starting with an empty layer rather than known sequences as the folding shapes how the unfolding happens.

With the approach above in mind, I imagine the higher level does not capture discrete subsequences. First, it captures some transitions here and there. Then expands these to cover the larger portions of behavior. The actual transition of higher level would be driven by the patterns that the lower level encounters (assuming it is the highest level). I picture the higher level in my mind as a very elderly person whose memories of experiences are fuzzy. They come and go. The reason a memory comes in the first place is something she encounters that she could associate with. Then things make sense… At least for some time. Then it is a blur again.

That came out more vague than I hoped.


This sounds a lot like the Go/No-Go circuitry of the basal ganglia.


It looks to me like you reached the development stage where it becomes necessary to add a cerebellum to your model. This is what I have for information:


This of course implies a TP strategy which relies on RL during initial learning (folding). I hadn’t thought of TP in that way (I’ve always thought of it as taking only temporal information from the inputs and forming higher level abstractions from that alone). Do you see this as specific to pooling motor behaviors, or do you imagine something similar likely to be going on in say a pooling layer which learns object representations?


As far as what starts a motor action - I see the need to look at this from a system point of view.
In the beginning the lizard brain explores the body through babbling in all the motor circuits.
As this process continues the cortex learns the most direct connections to the somatic cortex.
As time goes on maps between the motor and the sensory cortex starts to fill in with learned patterns.

Later, when we have explored and learned the world, we sense more of the world and this activation diffuses through the related maps.

Meanwhile - the frontal lobe has been learning body sensors and the motor programs that have resulted in rewarding activity.
This learning also spreads to related maps.

Fast forward to a critter that has explored the environment and learned the rituals that keep it alive.

Now we have the senses signaling what is around the critter. Likewise - we have the body sensors signalling internal needs. This might be thirst or hunger or sexual or shelter or boredom (need to explore some more).

As these activations spread from map to map they may meet somewhere in the middle.
One direction is feedforward and one is feedback. Where the two meet in the middle that map now has two activation signals and is strongly excited. This enhanced activation will ripple both up & down the maps, one direction to enhance attention and the other to add voting on the selected motor program and start it running.

This is the essence of global workspace theory.


I certainly agree with that analysis, but the difficulty of course is in breaking down that macro-level goal of what we want the system to do into the repeatable circuits of a hierarchy. From my perspective the implementation doesn’t have to mirror the biology, but I keep finding that when I start working out the details for a system I’m working on, there are little gotchas which nature has undoubtedly already solved :smile:

I definitely expect that temporal unfolding must leverage functions that exist outside of the cortex… just trying to work out some of the specific circuitry required to accomplish this behavior.


I kind of have a goal oriented approach to AGI; there is nothing to learn if some states/outcomes are not more favorable than the rest. If there is an opportunity for a decision, options need to differ in some sense to make the decision meaningful from the perspective of the intelligent system. Abstractions are the same for me because they are learned too. Without some sort of goal, surprise or reward they would not make sense or have any value. How would the agent or intelligent system know what to abstract and with respect to what without any of those? Of course we can come up with a folding mechanism without any goal orientation for the sake of stability and abstraction (would be very helpful no doubt) but I do not believe it would be the universal and modular packing that the cortex does. However, this is just a speculation on my part.

So yes I imagine a goal oriented object representation too. Actually, the pooling layers in my architecture (if it works as intended) do not represent objects, they represent goals/subgoals. The system only represents objects with respect to goals (in my case negative or positive surprises) because there is an inevitable abstraction to objects too, which should be valued with respect to something.




Is it safe to say that the brain stem including amygdala and other local circuits like cerebellum are good enough to get us walking then running like an athlete but at that stage of development we of course only have the mind of a salamander?

I had good results simply by changing the (wave starting) attractor locations in response to signal pulses that increase in frequency as a need increases. After it learns how to feed itself and is no longer hungry the virtual critter is free to do anything it wants, and will then spin in circles or do nothing at all for a while. An occasional hunger signal might distract and over time lure it to play nearer to where the food is located, but that alone is not enough to completely switch behaviors. There is no cortical multistep planning system that may require mapping out of moving machine parts for something we are building, but without a prefrontal cortex like we have that can be expected.

Although I like to use the word awareness instead of consciousness this is ironically very similar to what I have been modeling. Each “place” in the (also old and only 3 layer) hippocampal based map is being acted out by neural subunits that use waves to mimic the properties of what they are pretending to be.

It’s as though neural subpopulations have a mind of their own and enjoy doing what we (as a whole) end up wanting to do as a result like singing, dancing and theatrics.


This post is a bit long so it’s going in a spoiler. It’s mostly about framing the problem, trying to apply the object pooler to behavior, and timing.


This doesn’t really change things, but a lot of movements like walking are generated by central pattern generators, which let the brain generate common sequential or rhythmic movements with a constant signal, I think. Most mammals are born able to walk. Humans just come out a bit early.

Maybe the plan signal activates the first element in the lower level sequence. Maybe later and later steps have weaker predictive input which gradually increases as it gets closer to the time of its behavioral execution, and the prediction for the first element is strong enough to cause firing (see the rant below if you want to read some disorganized thoughts.)

I think of behavioral plans as branching sequences, like some sensory sequences in sequence memory, because of uncertainty. It would be handy to prepare for possible outcomes of behavior and take the best path down the planned tree of branching behavioral sequences. It might be useful to also require proximal input related to the sensory result of recent behavior for some steps but not all steps.

A rant about using the object pooling layer to unfold sequences

Predicting each action in the sequence more strongly if it is sooner in the sequence is useful. It depends on the details, but it allows flexible overall execution speed.

Maybe use the object pooling layer mechanisms to narrow down possible plans as results of ongoing behavior occur, rather than a completely stable plan representation.

This allows the feedback signal to predict the next action most strongly, planned possible actions two steps in the future second most strongly, and so on. Each possible sequence is a particular path on the branching sequence plan. Since there are more branches for possible actions further in the future, more possible sequences include actions planned for sooner in the future. This means more neurons provide feedback for the next action, less for the possible actions after that, and so on.

Even if there is just one planned action sequence without branches, this still works. In the object pooler, each new feature makes the representation more sparse even if only one possible object remains*. The same applies here. If it learns feedback connections by Hebbian rules, the neurons which inactivate earlier do not synapse on the neurons for later behaviors. Therefore, neurons for later and later behaviors receive less synaptic input and are predicted less strongly. Also, neurons for earlier behaviors receive weaker synaptic input after their behavior executes because their synaptic inputs reduce in number as the plan representation sparsifies.

*That ignores the macrocolumn voting mechanism which narrows down the representation fully, I assume even with just one macrocolumn. Maybe that’s not how voting works. Besides that, this is just the object pooling layer with different input/output connections.
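The claim that sooner actions receive stronger feedback can be sketched by counting paths through a branching plan (the tree encoding and function are hypothetical, just to make the counting argument concrete):

```python
# Sketch: in a branching plan, count how many complete paths pass
# through each action. Actions nearer the root (sooner in time) lie
# on more paths, so under a Hebbian feedback rule they would receive
# proportionally more synaptic input. Tree structure is illustrative.
plan = ("A", [("B", [("C", []), ("D", [])]),
              ("E", [("F", [])])])

def path_counts(node, counts=None):
    counts = {} if counts is None else counts
    action, branches = node
    if not branches:                      # leaf: one path ends here
        counts[action] = counts.get(action, 0) + 1
        return 1, counts
    total = 0
    for branch in branches:
        n, _ = path_counts(branch, counts)
        total += n
    counts[action] = counts.get(action, 0) + total
    return total, counts

total, counts = path_counts(plan)
print(counts)  # "A" lies on all 3 paths; "C", "D", "F" on one each
```

The next action (“A”) is supported by every possible sequence, so it gets the strongest feedback; deeper branches split that support among alternatives.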

A fading inhibitory signal, such as generated by the previous action, causes the next action to execute first because the cells which trigger it reach threshold first, inhibiting the other cells once they fire. Adjusting the baseline depolarizations of all cells adjusts plan execution speed.
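A race-to-threshold sketch of that idea, under the assumption that the most strongly predicted cell wins as inhibition decays (constants are illustrative, not biological values):

```python
# A fading inhibitory signal holds all candidate cells below
# threshold; as it decays, the most strongly depolarized cell (the
# next planned action) crosses first and would inhibit the rest.
def first_to_fire(depolarizations, inhibition, decay=0.9, threshold=1.0):
    t = 0
    while True:
        for cell, depol in depolarizations.items():
            if depol - inhibition >= threshold:
                return cell, t   # winner fires, suppressing the others
        inhibition *= decay      # inhibition fades each time step
        t += 1

cell, t = first_to_fire({"next": 1.5, "later": 1.2, "latest": 1.05},
                        inhibition=2.0)
print(cell)  # "next" reaches threshold first
```

Raising or lowering the baseline depolarizations shifts when each cell crosses threshold, which is the execution-speed knob mentioned above.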

One question which may or may not be useful is how behavior/hierarchy relates to the newer ideas about perception/hierarchy, where the only difference between hierarchical levels is the size of the features/objects they handle and/or the size of their receptive fields, as I understand it. I don’t think that is directly applicable to behavior, at least not elegantly.

One way to perhaps reconcile the two views of hierarchy is, higher levels of the hierarchy know more about what options are available so they are usually in charge. The fact that the basal ganglia probably select between behavioral options suggests figuring out those options is important. The cortex could suggest every behavior to the basal ganglia, but that would be impractical. Because the body can’t phase through or see through objects, there are a lot of impossible behaviors at any moment, in a sense.

Since each level sees the world at a different scale, but movements operate near the scale of the whole body, different hierarchical levels need to communicate about which movements are possible. High levels need to ask the low levels if a movement is possible based on details of the object, and low levels need to ask high levels if it could execute a behavior without the sensor crashing into something on the way to the target.

Let’s say a low level region wanted to move the sensor from one low level feature to another, whether to generate behavior (press a button) or just as part of a sensory-oriented process akin to attention (e.g. to test if it is actually there and resolve the object’s ambiguous identity). That low level region might have a representation of the whole object, but that representation isn’t very useful for hopping the sensor from one point on the object to another point because it is represented in terms of a lot of small details, making it hard to infer possible movements. It also needs information from higher regions to do that well.

You could think of this in terms of the options each region has. The higher level’s planned next element is an impossible behavior until the lower level finishes (depending on how it defines a motor element), so it doesn’t try to execute the next element. While the lower level is executing the sequence, it has other options, like telling the lower level to end the sequence early and do something else.

Jeff Hawkins has talked about precise timing using thalamic signals to layer 1, and he wrote in a post when the forum was email that layer 5 might represent input onset. There was probably more to that, but I don’t remember.

Neuroscience stuff about precise timing

My guess is he was talking about layer 5 bursting. L5 cells burst* when they receive distal apical input alongside proximal input, but they can’t burst afterwards for a bit. The burst is mostly triggered by a calcium event in the distal apical dendrite, which lasts longer than most neuronal events besides metabotropic ones. This suggests a means of retaining representation of stimulus onset details after the burst.

*I think bursting is usually just an increased firing rate for the first few spikes, e.g. 40 Hz, and the typical bursts shown in studies seem to not reflect normal neuronal activity.

The same layer 5 cells synapse on subcortical sensory/motor structures and the thalamus. Via the thalamus, they send a signal to layer 1, including in the same region. Since bursting requires apical input and since a single layer 5 cell cannot drive a thalamic cell to fire after the first spike but probably can with a burst after a spike or two, this is a mechanism for detecting timing and what I call sensory onset dynamics for lack of a better word. I’m not sure about some of the neuroscience I just mentioned, though.

The first input causes L5 cells to fire single spikes, which cause thalamic cells to fire with their first spikes. This sends a signal to some apical dendrites, causing them to respond to the initial sensory input as it progresses (like the sequence of sensory input when you poke a surface and your skin sequentially contacts the surface rapidly). Only cells responsive to the instantaneous sensory input tens of milliseconds after sensory input started can burst because proximal input is required, and only those with apical input. There also aren’t many cells firing single spikes now because inhibition caused by sensory input comes pretty soon after the layer 5 sensory response compared to in other layers.

As a result of the newly bursting cells, some thalamic cells fire, and cause new layer 5 cells to burst. The process continues. After the initial sensory response, the set of cells which bursted depends on the precise sequence and timing of the new sensory input. These cells have enhanced firing rates for the rest of the sensory input. This enhancement might continue for much longer because there are metabotropic responses evoked by higher order thalamus.

My guess is that precise timing for behavior works similarly since L5 cells are the motor output (and output to subcortical sensory structures, don’t forget).

For precise timing, my take away from what I’ve read/watched is that propagating signals of some sort are used.

For longer duration timing, different intervals might just be different numbers of repeated steps. That would require a way to track the place in the sequence besides the feedback signal, but that is probably necessary anyway for repeated actions.

Flexible control of execution speed is needed, not just timing.

If it’s just copying motor sequences generated by subcortical structures or central pattern generators, then I think sequence length would match the length of those behavioral sequences. Cortex-original behavioral sequences could be chunked like you describe for sensory sequences if the sequences are sensorimotor or maybe at least reactions to sensory events.

Main source:


Is it safe to say that the brain stem including amygdala and other local circuits like cerebellum are good enough to get us walking then running like an athlete but at that stage of development we of course only have the mind of a salamander?

It is safe to say that the addition of the limbic system to the lizard brain already gives us more built-in judgment and function than the basic lizard has.

But in general principle - yes

I imagine the RL system tuning behavior at every level of the hierarchy. A particular high-level concept for “walk forward” has a lot of varying elements to it, but RL at this higher level can say “Best action to do here is to walk forward”. In the next lower level, when “walk forward” breaks down to the lower level elements, the RL system tunes this layer as well, saying “Lift the left leg is the best action to do here”. And so on, cascading down the hierarchy. Tuning the behavior at all levels finds the best course of action to achieve the global goal, and can quickly adapt to randomness.

I think timing can be abstracted a bit, rather than relying on cell bursting. At the end of the day, what is being encoded is something like “Element A for 0.2 seconds → Element B for 0.1 seconds → etc.”. This can be coded a bit more efficiently in a computer model than can be done in a biological system, I think (as long as the abstraction is functionally equivalent at the end of the day).
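A sketch of that abstraction, assuming a simple (element, duration) encoding and a fixed simulation tick (names and values are illustrative):

```python
# Encode a behavior as (element, duration) pairs and unfold it on a
# simulated clock, rather than modeling cell bursting directly.
sequence = [("A", 0.2), ("B", 0.1), ("C", 0.3)]

def unfold(sequence, dt=0.1):
    """Yield the active element at each clock tick of size dt."""
    for element, duration in sequence:
        ticks = round(duration / dt)
        for _ in range(ticks):
            yield element

print(list(unfold(sequence)))
# ['A', 'A', 'B', 'C', 'C', 'C']
```

Scaling `dt` (or the durations) gives the flexible execution speed discussed earlier, without touching the sequence itself.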


This seems to directly address the question.