Temporal unfolding of sequences


This of course implies a TP strategy which relies on RL during initial learning (folding). I hadn’t thought of TP in that way (I’ve always thought of it as taking only temporal information from the inputs and forming higher level abstractions from that alone). Do you see this as specific to pooling motor behaviors, or do you imagine something similar likely to be going on in say a pooling layer which learns object representations?



As far as what starts a motor action - I see the need to look at this from a system point of view.
In the beginning the lizard brain explores the body through babbling in all the motor circuits.
As this process continues the cortex learns the most direct connections to the somatic cortex.
As time goes on, maps between the motor and the sensory cortex start to fill in with learned patterns.

Later, when we have explored and learned the world, we sense more of the world and this activation diffuses through the related maps.

Meanwhile - the frontal lobe has been learning body sensors and the motor programs that have resulted in rewarding activity.
This learning also spreads to related maps.

Fast forward to a critter that has explored the environment and learned the rituals that keep it alive.

Now we have the senses signaling what is around the critter. Likewise, we have the body sensors signaling internal needs. This might be thirst or hunger or sexual desire or shelter or boredom (the need to explore some more).

As these activations spread from map to map, they may meet somewhere in the middle.
One direction is feedforward and the other is feedback. Where the two meet, that map receives two activation signals and is strongly excited. This enhanced activation ripples both up and down the maps - in one direction to enhance attention, and in the other to add votes for the selected motor program and start it running.
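This meet-in-the-middle dynamic can be sketched as a toy simulation. Everything here is hypothetical - a linear chain of maps with one-hop spreading per time step - purely to illustrate where the feedforward and feedback waves would first coincide:

```python
# Toy sketch of two activation waves meeting in a chain of cortical "maps".
# The chain layout and spreading rule are assumptions for illustration only.

N_MAPS = 7  # maps 0..6: sensory at one end, frontal/need signals at the other

def spread(start, steps, n_maps=N_MAPS):
    """Return the set of maps reached after `steps` of one-hop spreading."""
    return {m for m in range(n_maps) if abs(m - start) <= steps}

for t in range(N_MAPS):
    feedforward = spread(0, t)           # sensory activation moving up
    feedback = spread(N_MAPS - 1, t)     # need/goal activation moving down
    doubly_active = feedforward & feedback
    if doubly_active:
        # this map now has two activation signals and is strongly excited
        print(f"step {t}: maps {sorted(doubly_active)} receive both signals")
        break
```

In a real system the "maps" would be a graph rather than a chain, but the doubly-activated set is the part that would ripple enhanced activity back out in both directions.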

This is the essence of global workspace theory.



I certainly agree with that analysis, but the difficulty of course is in breaking down that macro-level goal of what we want the system to do into the repeatable circuits of a hierarchy. From my perspective the implementation doesn’t have to mirror the biology, but I keep finding that when I start working out the details for a system I’m working on, there are little gotchas which nature has undoubtedly already solved :smile:

I definitely expect that temporal unfolding must leverage functions that exist outside of the cortex… just trying to work out some of the specific circuitry required to accomplish this behavior.



I kind of have a goal-oriented approach to AGI; there is nothing to learn if some states/outcomes are not more favorable than the rest. If there is an opportunity for a decision, the options need to differ in some sense to make the decision meaningful from the perspective of the intelligent system. Abstractions are the same for me, because they are learned too. Without some sort of goal, surprise, or reward they would not make sense or have any value. How would the agent or intelligent system know what to abstract, and with respect to what, without any of those? Of course we can come up with a folding mechanism without any goal orientation for the sake of stability and abstraction (it would no doubt be very helpful), but I do not believe it would be the universal and modular packing that the cortex does. However, this is just speculation on my part.

So yes, I imagine a goal-oriented object representation too. Actually, the pooling layers in my architecture (if it works as intended) do not represent objects; they represent goals/subgoals. The system only represents objects with respect to goals (in my case negative or positive surprises), because there is an inevitable abstraction to objects too, which should be valued with respect to something.





Is it safe to say that the brain stem including amygdala and other local circuits like cerebellum are good enough to get us walking then running like an athlete but at that stage of development we of course only have the mind of a salamander?

I had good results simply by changing the (wave-starting) attractor locations in response to signal pulses that increase in frequency as a need increases. After it learns how to feed itself and is no longer hungry, the virtual critter is free to do anything it wants, and will then spin in circles or do nothing at all for a while. An occasional hunger signal might distract it and over time lure it to play nearer to where the food is located, but that alone is not enough to completely switch behaviors. There is no cortical multistep planning system that could map out, say, the moving machine parts for something we are building, but without a prefrontal cortex like ours that is to be expected.

Although I like to use the word awareness instead of consciousness this is ironically very similar to what I have been modeling. Each “place” in the (also old and only 3 layer) hippocampal based map is being acted out by neural subunits that use waves to mimic the properties of what they are pretending to be.

It’s as though neural subpopulations have a mind of their own and enjoy doing what we (as a whole) end up wanting to do as a result like singing, dancing and theatrics.



This post is a bit long so it’s going in a spoiler. It’s mostly about framing the problem, trying to apply the object pooler to behavior, and timing.


This doesn’t really change things, but a lot of movements like walking are generated by central pattern generators, which let the brain generate common sequential or rhythmic movements with a constant signal, I think. Most mammals are born able to walk. Humans just come out a bit early.

Maybe the plan signal activates the first element in the lower level sequence. Maybe later and later steps have weaker predictive input which gradually increases as it gets closer to the time of its behavioral execution, and the prediction for the first element is strong enough to cause firing (see the rant below if you want to read some disorganized thoughts.)

I think of behavioral plans as branching sequences, like some sensory sequences in sequence memory, because of uncertainty. It would be handy to prepare for possible outcomes of behavior and take the best path down the planned tree of branching behavioral sequences. It might be useful to also require proximal input related to the sensory result of recent behavior for some steps but not all steps.

A rant about using the object pooling layer to unfold sequences

Predicting each action in the sequence more strongly if it is sooner in the sequence is useful. It depends on the details, but it allows flexible overall execution speed.

Maybe use the object pooling layer mechanisms to narrow down possible plans as the results of ongoing behavior occur, rather than maintaining a completely stable plan representation.

This allows the feedback signal to predict the next action most strongly, planned possible actions two steps in the future second most strongly, and so on. Each possible sequence is a particular path on the branching sequence plan. Since there are more branches for possible actions further in the future, more possible sequences include actions planned for sooner in the future. This means more neurons provide feedback for the next action, less for the possible actions after that, and so on.

Even if there is just one planned action sequence without branches, this still works. In the object pooler, each new feature makes the representation more sparse even if only one possible object remains*. The same applies here. If it learns feedback connections by Hebbian rules, the neurons which inactivate earlier do not synapse on the neurons for later behaviors. Therefore, neurons for later and later behaviors receive less synaptic input and are predicted less strongly. Also, neurons for earlier behaviors receive weaker synaptic input after their behavior executes, because their synaptic inputs reduce in number as the plan representation sparsifies.
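The path-counting argument above can be made concrete with a small sketch. The plan tree, action names, and counting scheme here are all made up for illustration: the idea is that under a Hebbian feedback rule, the number of leaf-paths passing through an action is roughly proportional to how many pooled-plan neurons would synapse on that action's cells.

```python
# Hypothetical branching behavioral plan: each node is (action, [children]).
# Sooner actions lie on more of the remaining paths, so they would receive
# feedback from more pooled-plan neurons and be predicted more strongly.

plan = ("reach", [
    ("grasp", [("lift", []), ("slide", [])]),
    ("poke",  [("withdraw", [])]),
])

def count_paths_through(node):
    """Return {action: number of leaf-paths that include it}."""
    action, children = node
    counts = {}
    if not children:
        n_paths = 1
    else:
        n_paths = 0
        for child in children:
            child_counts = count_paths_through(child)
            for a, c in child_counts.items():
                counts[a] = counts.get(a, 0) + c
            n_paths += child_counts[child[0]]  # paths through this child
    counts[action] = n_paths
    return counts

support = count_paths_through(plan)
print(support)  # "reach" (the next action) lies on all 3 paths
```

The gradient falls out automatically: the immediate next action gets the most feedback support, the two-steps-out options less, and so on, exactly as described above.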

*That ignores the macrocolumn voting mechanism which narrows down the representation fully, I assume even with just one macrocolumn. Maybe that’s not how voting works. Besides that, this is just the object pooling layer with different input/output connections.

A fading inhibitory signal, such as one generated by the previous action, causes the next action to execute first: the cells which trigger it reach threshold first and inhibit the other cells once they fire. Adjusting the baseline depolarization of all cells adjusts plan execution speed.
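A toy race-to-threshold model shows both effects. The dynamics here (geometric inhibition decay, a fixed threshold, the specific drive values) are assumptions, not taken from the post:

```python
# Race-to-threshold sketch: a fading inhibitory signal holds all candidate
# cells below threshold; the most strongly predicted cell crosses first,
# and a higher baseline depolarization makes it cross sooner.

def first_to_fire(predictive_drive, baseline, inhibition0=1.0, decay=0.8,
                  threshold=1.0, max_steps=100):
    """Return (winner_index, time_step) of the first cell over threshold."""
    inhibition = inhibition0
    for t in range(max_steps):
        for i, drive in enumerate(predictive_drive):
            if drive + baseline - inhibition >= threshold:
                return i, t       # winner fires and would inhibit the rest
        inhibition *= decay       # the previous action's inhibition fades
    return None, max_steps

drive = [0.9, 0.6, 0.3]          # next action is predicted most strongly
slow = first_to_fire(drive, baseline=0.2)
fast = first_to_fire(drive, baseline=0.4)
print(slow, fast)                # same winner either way, but it fires earlier
```

Both runs pick the same action (the most strongly predicted one); raising the shared baseline only changes *when* it fires, which is the execution-speed knob described above.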

One question which may or may not be useful is how behavior/hierarchy relates to the newer ideas about perception/hierarchy, where the only difference between hierarchical levels is the size of the features/objects they handle and/or the size of their receptive fields, as I understand it. I don’t think that is directly applicable to behavior, at least not elegantly.

One way to perhaps reconcile the two views of hierarchy is, higher levels of the hierarchy know more about what options are available so they are usually in charge. The fact that the basal ganglia probably select between behavioral options suggests figuring out those options is important. The cortex could suggest every behavior to the basal ganglia, but that would be impractical. Because the body can’t phase through or see through objects, there are a lot of impossible behaviors at any moment, in a sense.

Since each level sees the world at a different scale, but movements operate near the scale of the whole body, different hierarchical levels need to communicate about which movements are possible. High levels need to ask the low levels if a movement is possible based on details of the object, and low levels need to ask high levels if it could execute a behavior without the sensor crashing into something on the way to the target.

Let’s say a low level region wanted to move the sensor from one low level feature to another, whether to generate behavior (press a button) or just as part of a sensory-oriented process akin to attention (e.g. to test if it is actually there and resolve the object’s ambiguous identity). That low level region might have a representation of the whole object, but that representation isn’t very useful for hopping the sensor from one point on the object to another point because it is represented in terms of a lot of small details, making it hard to infer possible movements. It also needs information from higher regions to do that well.

You could think of this in terms of the options each region has. The higher level’s planned next element is an impossible behavior until the lower level finishes (depending on how it defines a motor element), so it doesn’t try to execute the next element. While the lower level is executing the sequence, the higher level has other options, like telling the lower level to end the sequence early and do something else.

Jeff Hawkins has talked about precise timing using thalamic signals to layer 1, and he wrote in a post, back when the forum was a mailing list, that layer 5 might represent input onset. There was probably more to that, but I don’t remember.

Neuroscience stuff about precise timing

My guess is he was talking about layer 5 bursting. L5 cells burst* when they receive distal apical input alongside proximal input, but they can’t burst afterwards for a bit. The burst is mostly triggered by a calcium event in the distal apical dendrite, which lasts longer than most neuronal events besides metabotropic ones. This suggests a means of retaining representation of stimulus onset details after the burst.

*I think bursting is usually just an increased firing rate for the first few spikes, e.g. 40 hz, and the typical bursts shown in studies seem to not reflect normal neuronal activity.

The same layer 5 cells synapse on subcortical sensory/motor structures and the thalamus. Via the thalamus, they send a signal to layer 1, including in the same region. Since bursting requires apical input and since a single layer 5 cell cannot drive a thalamic cell to fire after the first spike but probably can with a burst after a spike or two, this is a mechanism for detecting timing and what I call sensory onset dynamics for lack of a better word. I’m not sure about some of the neuroscience I just mentioned, though.

The first input causes L5 cells to fire single spikes, which cause thalamic cells to fire with their first spikes. This sends a signal to some apical dendrites, causing them to respond to the initial sensory input as it progresses (like the sequence of sensory input when you poke a surface and your skin sequentially contacts the surface rapidly). Only cells responsive to the instantaneous sensory input tens of milliseconds after sensory input started can burst, because proximal input is required, and only those with apical input. There also aren’t many cells firing single spikes now, because inhibition caused by sensory input comes pretty soon after the layer 5 sensory response compared to other layers. As a result of the newly bursting cells, some thalamic cells fire and cause new layer 5 cells to burst. The process continues. After the initial sensory response, the set of cells which burst depends on the precise sequence and timing of the new sensory input. These cells have enhanced firing rates for the rest of the sensory input. This enhancement might continue for much longer, because there are metabotropic responses evoked by higher order thalamus.
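A heavily simplified discrete-time sketch of that loop might look like the following. This is an assumed toy model, not established neuroscience: a cell "bursts" only with proximal (sensory) plus apical (thalamic) input, a burst enters a refractory period, and any activity crudely opens the thalamic loop for the next step.

```python
# Toy L5 -> thalamus -> L1 loop: which cells burst depends on the precise
# sequence and timing of sensory input, because bursting needs apical input
# from the loop AND proximal input AND a non-refractory cell.

def run_loop(sensory_drive, n_cells=4, refractory_steps=3):
    """sensory_drive[t] = set of cells with proximal input at step t."""
    apical = False                 # thalamic feedback arriving in layer 1
    refractory = [0] * n_cells
    burst_history = []
    for active in sensory_drive:
        bursting = {c for c in active if apical and refractory[c] == 0}
        burst_history.append(bursting)
        for c in bursting:
            refractory[c] = refractory_steps   # can't burst again for a bit
        refractory = [max(0, r - 1) for r in refractory]
        apical = bool(active)      # any spiking drives the relay next step
    return burst_history

# Cell 0 responds throughout; cells 1 and 2 respond to later input details.
history = run_loop([{0}, {0, 1}, {0, 2}, {0}])
print(history)
```

Even in this crude version, the burst pattern ends up encoding onset dynamics: the constantly-driven cell 0 can only burst once early on, while cells arriving later in the input sequence produce bursts tied to their timing.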

My guess is that precise timing for behavior works similarly since L5 cells are the motor output (and output to subcortical sensory structures, don’t forget).

For precise timing, my takeaway from what I’ve read/watched is that propagating signals of some sort are used.

For longer duration timing, different intervals might just be different numbers of repeated steps. That would require a way to track the place in the sequence besides the feedback signal, but that is probably necessary anyway for repeated actions.

Flexible control of execution speed is needed, not just timing.

If it’s just copying motor sequences generated by subcortical structures or central pattern generators, then I think sequence length would match the length of those behavioral sequences. Cortex-original behavioral sequences could be chunked like you describe for sensory sequences if the sequences are sensorimotor or maybe at least reactions to sensory events.

Main source:



Is it safe to say that the brain stem including amygdala and other local circuits like cerebellum are good enough to get us walking then running like an athlete but at that stage of development we of course only have the mind of a salamander?

It is safe to say that the addition of the limbic system to the lizard brain already gives us more built-in judgment and function than the basic lizard has.

But in general principle - yes



I imagine the RL system tuning behavior at every level of the hierarchy. A particular high-level concept for “walk forward” has a lot of varying elements to it, but RL at this higher level can say “The best action to do here is to walk forward”. At the next lower level, where “walk forward” breaks down into lower-level elements, the RL system tunes that layer as well, saying “Lifting the left leg is the best action to do here”. And so on, cascading down the hierarchy. Tuning the behavior at all levels finds the best course of action to achieve the global goal, and can quickly adapt to randomness.
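The cascading selection can be sketched in a few lines. The state names, actions, and values below are invented for illustration; the point is just that each level keeps action values conditioned on the choice made one level above:

```python
# Hypothetical per-level action values, keyed by the context built up by
# the choices of the levels above (a crude stand-in for learned RL values).
q_values = {
    ("start",): {"walk forward": 1.0, "turn": 0.2},
    ("start", "walk forward"): {"lift left leg": 0.9, "lift right leg": 0.4},
    ("start", "walk forward", "lift left leg"): {"shift weight": 0.8,
                                                 "flex knee": 0.5},
}

def cascade(state=("start",)):
    """Greedily pick the best action at each level, top-down."""
    plan = []
    context = state
    while context in q_values:
        best = max(q_values[context], key=q_values[context].get)
        plan.append(best)
        context = context + (best,)    # lower level is conditioned on it
    return plan

print(cascade())   # each level refines the choice made above it
```

Because each level's values are tuned independently, a disturbance at one level (say, a stumble) can re-rank only that level's options without replanning the whole hierarchy.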

I think timing can be abstracted a bit, rather than relying on cell bursting. At the end of the day, what is being encoded is something like “Element A for 0.2 seconds -> Element B for 0.1 seconds -> etc.”. This can be coded a bit more efficiently in a computer model than can be done in a biological system, I think (as long as the abstraction is functionally equivalent at the end of the day).
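That abstraction is nearly a data structure already. A minimal sketch (the element names and durations are placeholders, and `replay`/`speed` are hypothetical names):

```python
import time

# A plan is just a list of (element, duration_seconds) pairs, replayed with
# an adjustable speed factor - the abstracted stand-in for cell bursting.
plan = [("Element A", 0.2), ("Element B", 0.1), ("Element C", 0.3)]

def replay(plan, speed=1.0, execute=print):
    """Run each element for duration/speed seconds (speed > 1 is faster)."""
    for element, duration in plan:
        execute(element)
        time.sleep(duration / speed)

total = sum(d for _, d in plan)   # 0.6 s at normal speed
replay(plan, speed=2.0)           # same sequence in roughly half the time
```

The `speed` factor gives the flexible execution-speed control discussed earlier in the thread essentially for free, which is hard to get so cleanly from a purely biological timing mechanism.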



This seems to directly address the question.