Temporal unfolding of sequences

Starting a new thread to avoid clutter on the David Schneider Interview discussions.

I haven’t actually read On Intelligence yet myself, but I have some thoughts on the idea of temporal unfolding. As the system is performing motor actions, online learning will be happening through the hierarchical mechanisms which we have been talking about extensively on other threads. Those are rolling up through the hierarchy and forming higher-level concepts (“walking”, “jumping”, etc.)

But the system isn’t just passively observing and learning motor sequences (“chicken or the egg” situation – motor sequences can’t be learned without also generating them). This must mean that another signal is injected into the system to trigger the unfolding of specific actions (something in addition to the FF, distal, and apical connections that are being used for learning). This is the “plan of action” I mentioned in another thread (cell grid mechanisms seem to be a likely candidate for this signal).

When a higher level concept (“walk forward”) activates from this signal, this would generate an apical signal to the next lower layer, predicting all of the lower-level elements in the sequence (“lift left leg”, “extend left leg forward”, “set left leg down”, etc). The first element in the lower-level sequence would need to be activated (I still haven’t worked out a good mechanism for how, but set that problem aside for a moment). This activation would generate a distal signal to the next element in the sequence (which is already predicted apically). The combination of apical and distal would trigger the next activation, which would trigger the next action, and so-on, unfolding the sequence.

An RL signal would also be involved at each level to tune the actions along the way and adjust to randomness. This process would cascade down the hierarchy until the lowest-level motor commands are activated in sequence, moving the body.
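The unfolding mechanism described above can be sketched in a few lines. This is a minimal illustration, not an HTM implementation: the transition table stands in for learned distal connections, the `apical` set stands in for the top-down prediction from the active higher-level concept, and all names are assumptions for the example.

```python
# Sketch: each lower-level element fires only when it is both distally
# predicted (by the previous element) and apically predicted (by the
# active higher-level concept). Illustrative only.

def unfold(transitions, apical_predicted, first):
    """Unfold a learned motor sequence element by element."""
    fired = [first]
    current = first
    while True:
        nxt = transitions.get(current)          # distal prediction from current element
        if nxt is None or nxt not in apical_predicted:
            break                               # no distal + apical coincidence
        fired.append(nxt)
        current = nxt
    return fired

# "walk forward" apically predicts all elements of its sub-sequence
walk = {"lift left leg": "extend left leg forward",
        "extend left leg forward": "set left leg down"}
apical = {"lift left leg", "extend left leg forward", "set left leg down"}
print(unfold(walk, apical, "lift left leg"))
```

Note that if the apical bias is withdrawn mid-sequence, the unfolding halts, which matches the intuition that the higher-level concept must stay active for the whole sub-sequence.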

There are obviously still some major fuzzy areas left to resolve. Some of the obvious questions: What causes the first element in an unfolding sequence to activate? What ensures the lower level sequence for the current higher-level element has completed before the higher-level moves to the next element? How is timing accomplished (perhaps this answers the previous question)?


This is a tricky one. My first instinct is that the last element of the previous sequence on the same level distally predicts a union of all the possible next sequences. With top-down bias, only the first element of one next sequence will be activated, leading to the progression of that sequence.

Suppose there are three sequences on level 1 (ABC, DEF, XYZ) and two representations on level 2 (N and K) that provide top-down biasing for level 1. N biases for ABC then DEF, while K biases for ABC then XYZ. If neither N nor K is active, then C in ABC will depolarize both D and X, as they are the first elements that tend to follow C. However, if N is active, then DEF will have a bias, allowing D alone to receive both top-down and lateral excitation.

This may not be the best approach but something to consider.
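The biasing scheme above can be sketched concretely. All the tables here (the distal union after C, the top-down bias sets for N and K) are made-up illustrations of the idea, not any existing mechanism:

```python
# Sketch: the last element C distally predicts a union of possible next
# first elements ({D, X}); the active level-2 representation narrows it.

distal_union_after = {"C": {"D", "X"}}        # C predicts starts of next sequences
top_down_bias = {"N": set("ABCDEF"),          # N biases ABC then DEF
                 "K": set("ABCXYZ")}          # K biases ABC then XYZ

def next_first_element(last, active_level2):
    """Intersect the distal union with the active top-down bias."""
    candidates = distal_union_after.get(last, set())
    if active_level2 is None:
        return candidates                     # ambiguous: both stay depolarized
    return candidates & top_down_bias[active_level2]

print(next_first_element("C", "N"))           # only D activates
print(next_first_element("C", None))          # D and X both merely predicted
```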

Possibly the same sort of mechanism as explained above? I guess associating the beginning and end points of sequences using unions will mean bursting at either end. That might be tricky.

No idea, haha.

I agree these questions have to be answered before moving any further. It needs trial and error. Kinda why we need an open model to play around with.


I’d advise you to do this asap. I did it the other way around, so I don’t know if some of it would sound outdated to you now. I may still have questions reading current HTM, but reading Jeff’s book was a thoroughly enjoyable experience.

There is the idea, possibly developed in ‘On Intelligence’, btw (can’t recall precisely where), that the more familiar you are with a given topic, the closer to the lower levels the whole thing gets encoded (something that harks back to parts of their SMI hypothesis, imho). So… the decomposition shouldn’t be too fixed. Walking would certainly involve very high/frontal areas while painstakingly learning it, because we have to more consciously activate the “left leg” program… then “right leg”… and “whoa… careful there”… But arguably, once you’ve mastered it, your higher levels are more concerned with truly higher abstractions like ‘walk to this’, while the lower levels have themselves integrated that it decomposes down to “left leg / right leg / balance like this”.

I’m quite fond of the idea of an ‘index in sequence’. Maybe acting like their proposed ‘position on object’.


Actually. Scrap this.
You’re never operating in an empty aether. Raising that right leg to begin the sequence has to be done in the current context, and the current context is always full of information: whether you’re currently standing still, etc. To unroll the rest of the sequence you’re still sampling from such relevant proprioception signals.

Here, “standing still” needs to be as much ‘t-1’ info as ‘previous motor’ in current TM-like versions. And then you can initiate the sequence perfectly.

I don’t know if this can be linked back easily to ‘expectations’ listening to a song. Clearly you have a clue what the first note should be and expecting it to come… but you don’t know precisely when. To know when, you need to have started the sequence with a ref point. (counting “two, three, four” can be such a clue).

Another fundamental question: what might be the number of elements per subsequence?

Initial thought: As we discussed here we kind of agreed that the number of elements in a sequence at any level of the hierarchy is arbitrary but shaped statistically by the noisy inputs. That seems reasonable enough for sensory input, but I can’t see this working for motor output. Unless, of course, the exploration motor outputs are noisy during learning (which we are going for anyway). Hopefully with noisy motor output combined with reinforcement feedback the representations will self-organize from smaller to larger parts (going up the hierarchy) over time because the motor outputs will only get small parts correct at any one moment. These small correct fragments will combine together naturally and the size of the sequences may settle into arbitrary but reasonable sizes.


To me it is becoming more and more striking that there must be something of a ‘confidence’ in a higher layer that it holds a pertinent representation of what’s happening (below and overall), for lower levels to know that they need to learn “in the context of that higher-layer-held representation”.

Then, your subsequences can be arbitrarily long indeed… as long as the higher representation has good reasons to think that you need to learn them as components of itself.

(Note that I find this view consistent with the possibility that those “split-where?” concerns are also somewhat plastic through learning, and that knowledge can be compressed down to lower levels themselves after a while)


Yeah that’s a point. My fuzzy thinking at the moment indicates lateral connections between regions. I’ll try to pull these thoughts out in some kind of sensible description using the song-bird example:

The dad’s song is held in memory as a distributed representation in the auditory hierarchy. If dad were to sing his song, there would be a very high level of overlap with the auditory hierarchy’s representation of dad’s song (obviously). At the top there will be a representation very actively responding to the whole song (stability). When dad leaves, our bird starts producing random motor output. Most of it sounds like babbling, but some small snippets are similar to dad’s song. Statistically, only the smallest features will get overlap, as there is much more probability of overlapping with small features than with large features. In order to actually get an overlap score, the dad-song representation has to be constantly active at the top of the auditory hierarchy, biasing all the cells down the hierarchy. So when there is top-down bias AND bottom-up stimulus, it has found an overlap. The cells in that overlap send outputs to some other region that then deals with the ‘teaching’ signals in the motor hierarchy. Or maybe there is no middle region… possibly it goes directly to the motor hierarchy.

Let’s imagine the baby bird gets two notes correct: there are two elements in the sequence that have overlap with dad’s song. The cells in these elements will need to project, directly or indirectly, to the cells in the motor hierarchy that were just active. If the bird then tends to repeat what it has gotten correct, with variations (noise) in the parts it did not get correct, then eventually it will fill in the missing elements.

So what I’m trying to express is that there might only need to be lateral connections between the corresponding levels of the auditory and motor cortex. When one level of the hierarchy learns the sequences correctly, it will move up to the next level in correspondence with the features/sequences in the auditory cortex. In other words, the motor cortex will match the level of organisation in the auditory cortex.

I hope that makes sense…
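The core overlap idea above can be shown in a toy sketch. SDRs are reduced to plain sets of cell indices, and all the values are invented for illustration: cells that receive both top-down bias (from the stored “dad song” representation) and bottom-up stimulus (from the bird’s own babbling) mark a correct snippet and could drive a teaching signal toward motor.

```python
# Toy sketch: an "overlap" is the set of cells with both top-down bias
# AND bottom-up activity. Cell indices are arbitrary for the example.

def matched_cells(top_down_bias, bottom_up_active):
    """Cells with both top-down and bottom-up input signal a match."""
    return top_down_bias & bottom_up_active

dad_song_bias = {3, 7, 12, 20, 31}    # cells biased by the stored song
babble        = {5, 7, 9, 20, 40}     # cells driven by the bird's own output
print(matched_cells(dad_song_bias, babble))   # cells 7 and 20 matched
```

The set intersection also captures the statistical point made above: small features (few cells) are far more likely to land inside the biased set by chance than large ones.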


Yes, I have thought of this as well. However, the problem is that you eventually reach the end of what you have learned so far, which breaks the chain of unfolding. Relying solely on distal + apical input seems too fragile. It seems to me that there must be something additional injected into the system which triggers a specific element of the sequence to begin the unfolding action. Also note that when I say “first” I don’t necessarily mean “the first note of Beethoven’s 5th”, but rather the first element to be activated following a state where nothing in context was activated and thus nothing in context was distally predicted.


Perhaps the start/end motor elements are more tied to sensory input/feedback than to the top-down biasing.

For walking: you unfold a sequence to move your leg out. When you get sensory information that the foot has hit the floor correctly you then move onto the next motor sequence, etc. Full sensory-motor loop.


I’m currently thinking through how an inhibition signal could be used to solve this problem (I need to go on a long walk to focus on it for a while, I think). If cells naturally “want” to fire, and given a chance will do so randomly, and if the sequence I learn is not only the motor commands I want to activate but also the motor commands I do not want to activate, then when I reach the end of what I have learned, both the prediction and inhibition signals will end, and something random will occur (leading to bursting minicolumns, and new information being learned to model the results of my actions).


Are you guys (and anyone else) up for trying to achieve this goal as a project? It could be quite exciting to have a general-purpose unsupervised model that learns something like a zebra finch song within the framework of HTM.

If so, we could probably get audio samples of some ‘dad’ zebra finch songs and use an encoder to feed them into the model and go from there.


Sure, I seem to have accidentally split the discussion into two threads that have started bleeding into each other. Probably the more aptly named other thread would be the best place to consolidate the information for a project like that (with other threads focused on tangential details of specific sub-systems required along the way).

I wonder how we could do that without cluttering the forum?


True. Maybe we could set up a Slack channel for the more verbose, interactive, brainstormy stuff, and use the forum to summarize things as they become more solidified and have discussions with the broader community along the way? This is something I’ve wondered about in the past (you either get these uber-long threads that nobody wants to read through, or you have lots of little threads that spread pieces of the larger conversation around and are difficult to organize).


Exactly. Well, I guess we should get Slack on the go!

Who’ll set it up, and what should we call the channel?


I created it. It is called dadssonghtm. Join here.


I’m not sure I have followed that.

If one feels it is too brittle to initiate the sequence, one is not far from dropping the overall idea as too brittle as well :x

I see all distally-accessible info as available at a similar level, and, except when specifically apical, all potential info as distally accessible. That’s maybe why I don’t see the same problem here. If “standing still” is info as much as t-1 is, there’s no difference between the start of the sequence and any other part of it.

Entering deeper into brainstorming realm, this triggers several things:

  • There is probably some truth to that somewhere. At least at the very interface with motor, where there seem to be lots of primarily inhibitory systems, if I’m not mistaken.
  • Anyway, it is believable that we could envision the system as always doing ‘something’. “Not walking” should maybe not be seen as “zero”; “not walking” could be “actively processing any plan other than walking”. Even being idle could be a motor plan. Dunno. In which case it really lowers the “too brittle initiation” concern you raised above, if I understood correctly.
  • Waves from the thalamus seem to provide clocked pushes to sensory pathways. My bet would be that they are part of the proximal signal, required in most cases to go beyond threshold? Why wouldn’t they also push an incentive to motor, in order to ensure it’s always doing something?

In a simpler TM example, let’s say you have learned the following sequence:

A -> B -> C -> D

Then I play it back:

A -> B -> C -> D

At this point nothing is in predictive state, because I haven’t yet learned a state after “D after C after B after A”. Since nothing is receiving distal input, this means that you cannot combine distal + apical input to activate anything, and thus the concept of a continuous unfolding stream of activity has failed… Something else needs to start the activity back up (such as a random activation, etc.)
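This failure mode can be sketched directly. The transition table and the random fallback are illustrative assumptions: `learned` stands in for distal connections, and the random choice stands in for bursting/random activation restarting the stream after the chain breaks.

```python
# Sketch: after the learned sequence ends, nothing is in predictive state,
# so distal + apical coincidence cannot fire anything and unfolding stalls;
# something extra must restart activity. Illustrative only.

import random

learned = {"A": "B", "B": "C", "C": "D"}       # D has no learned successor

def step(current, apical_predicted, rng):
    """Advance one element; returns (next_element, chain_broken)."""
    predicted = learned.get(current)           # distal prediction from context
    if predicted is not None and predicted in apical_predicted:
        return predicted, False                # normal distal + apical unfolding
    # Nothing distally predicted: the chain has broken. A random activation
    # (standing in for bursting) restarts the stream.
    return rng.choice(sorted(apical_predicted)), True

rng = random.Random(0)
print(step("C", {"A", "B", "C", "D"}, rng))    # still unfolding normally
print(step("D", {"A", "B", "C", "D"}, rng))    # chain broken, random restart
```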

It’s not that it is too brittle to initiate the sequence, but rather that it is too brittle to rely on solely for unfolding a sequence (my experience with HTM tells me that you frequently find yourself in a context that has not previously been encountered). You cannot rely on there always being distally predicted cells for a “next element in the sequence” (especially in a new system that hasn’t learned much yet). I expect there is some redundancy or randomness injected into the process to handle unknown states (and state switching).

That does sound like a candidate for the source of the “something additional injected into the system”. I still need to figure out what information that signal transmits, how it is learned, and the overall strategy for integrating it into the system. And is it just random when the current state is not recognized?

I have used this exact strategy in the past in my RL experiments (two mutually inhibiting motor actions, one for “press A button” and one for “don’t press A button”). I guess what I am proposing here is instead of having two motor actions, instead there are two possible synaptic connections to the single motor action – one which puts it into a predictive state (or activates it when combined distal + apical) and one which inhibits it.
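The two-connection idea can be sketched as a single motor cell with two input types. The class, names, and thresholds are assumptions for illustration, not a model of real circuitry:

```python
# Sketch: one motor cell reachable by a connection type that depolarizes it
# (predictive) and one that inhibits it. Inhibition vetoes activation.

class MotorCell:
    def __init__(self):
        self.predictive_input = 0.0   # distal "do activate" support
        self.inhibitory_input = 0.0   # "do not activate" support

    def fires(self, apical_input):
        if self.inhibitory_input > 0:
            return False              # inhibition vetoes the action outright
        # otherwise the usual distal + apical coincidence triggers it
        return self.predictive_input > 0 and apical_input > 0

cell = MotorCell()
cell.predictive_input = 1.0
print(cell.fires(apical_input=1.0))   # predicted and apically biased: fires
cell.inhibitory_input = 1.0
print(cell.fires(apical_input=1.0))   # inhibited despite prediction: silent
```

When both signal types lapse at the end of learned territory, neither branch holds, which is where the random/bursting behavior described above would take over.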


I think these two are connected. It cannot be assumed that the system knows a sequence without it having gone through forming the necessary connections. What I mean is, the way a sequence is packed together holds the answer to how to unpack it. The agent I worked on never reliably encountered a sequence A->B->C->D without first learning to do C->D and then B->C->D (expanding back from the reward). Random actions would not cause predictable sequences for the HTM to capture. Naturally, the higher level did not start from a stable A->B->C->D. The system needs to learn a single transition, do that reliably, and expand from there. This goes both for top-down and lateral learning.

In this context, the first element in the sequence is whenever the higher level apical depolarization can influence the lower level. This can even be mid sequence from the perspective of the lower layer.

I believe it is better to brainstorm starting with an empty layer rather than known sequences as the folding shapes how the unfolding happens.

With the approach above in mind, I imagine the higher level does not capture discrete subsequences. First it captures some transitions here and there, then expands these to cover larger portions of behavior. The actual transitions of the higher level would be driven by the patterns that the lower level encounters (assuming it is the highest level). I picture the higher level in my mind as a very elderly person whose memories of experiences are fuzzy. They come and go. The reason a memory comes in the first place is something she encounters that she can associate with. Then things make sense… at least for some time. Then it is a blur again.

That came out more vague than I hoped.


This sounds a lot like Go, No-Go circuitry of ganglia.


It looks to me like you reached the development stage where it becomes necessary to add a cerebellum to your model. This is what I have for information: