The coding of longer sequences in HTM SDRs

I’m not sure the full pattern I want to capture can be described as a ring. For a word, perhaps yes: for a word you want a pattern in which the same path repeats many times, so a ring makes sense. But for the more interesting extension to phrases, it may be a “ring” only in the sense that the activity would be recursive; the structure itself would be defined by shared external connections rather than by a repeated internal connection. So more of a diamond, perhaps, if you think of the network being “pinched” at each end of a “diamond” of alternative paths sharing the same beginning and end point.

I don’t know, is that a ring attractor too?

It should cluster in the sense of multiple internal paths with constrained entries and exits. But the multiple internal paths of interest at the phrase level should not be repetitions of the same sequence. The point about these clusters is that they consist of many different sequences, grouped only because they share the same beginning and end points. (Though that could transition to repetitions of the same sequence, which could be a mechanism for lexicalization: the way words in any language start as new phrases but, with repetition, retain only a habitual meaning and eventually become words in their own right, like the French “aujourd’hui”, originally the phrase “au jour d’hui”. But in the beginning, the essence of syntax is not repeated sequences. Novelty is what distinguishes syntax from lexicon.)
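To make the shape concrete, here is a toy sketch in Python (my own illustration only, not a proposed neural mechanism): treat each phrase as a path of tokens and group paths by their shared entry and exit tokens. A “diamond” is any group with more than one distinct internal path between the same endpoints. The function name and the tiny phrase list are invented purely for the example.

```python
from collections import defaultdict

def diamond_clusters(sequences):
    """Group token sequences by their shared (first, last) tokens."""
    clusters = defaultdict(set)
    for seq in sequences:
        if len(seq) >= 2:
            # key on the shared beginning and end point
            clusters[(seq[0], seq[-1])].add(tuple(seq[1:-1]))
    # keep only groups with more than one distinct internal path
    return {ends: paths for ends, paths in clusters.items() if len(paths) > 1}

phrases = [
    ("the", "big", "dog", "barked"),
    ("the", "small", "black", "dog", "barked"),
    ("the", "dog", "barked"),
    ("a", "cat", "slept"),
]

for (start, end), paths in diamond_clusters(phrases).items():
    print(start, "...", end, "->", len(paths), "alternative internal paths")
```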

I gave some examples of these “diamonds” in my other thread:

I’m with you up until the point where you mention the system has been trained to recognize a specific feature pattern.

I’m trying to get away from the whole “training” way of thinking about the AI problem. For justification of this see the entire “Chaos/reservoir computing…” thread above. I think what we have been missing in AI is that meaningful patterns vary, potentially chaotically, and so can’t be learned.

That’s OK. We can still have meaningful patterns; it is just that now they must be generated. And that’s OK too, because you can generate meaningful patterns if you have a meaningful way of relating elements to do so.

And I think natural language is telling us that the meaningful way of relating elements is by grouping them according to shared context, which equates to shared cause and effect.
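As a rough sketch of what “grouping by shared context” could look like in code (again a toy only, with an arbitrary overlap threshold and an invented mini-corpus): collect the (previous word, next word) contexts each word occurs in, and group words whose context sets overlap.

```python
from collections import defaultdict

def shared_context_groups(sentences, min_overlap=1):
    """Pair up words that occur in at least min_overlap identical contexts."""
    contexts = defaultdict(set)
    for s in sentences:
        padded = ["<s>"] + list(s) + ["</s>"]
        for i in range(1, len(padded) - 1):
            # a context is the word before and the word after
            contexts[padded[i]].add((padded[i - 1], padded[i + 1]))
    words = list(contexts)
    groups = []
    for i, w1 in enumerate(words):
        for w2 in words[i + 1:]:
            shared = contexts[w1] & contexts[w2]
            if len(shared) >= min_overlap:
                groups.append((w1, w2, shared))
    return groups

corpus = [
    ("the", "dog", "barked"),
    ("the", "cat", "barked"),
    ("a", "dog", "slept"),
]
for w1, w2, shared in shared_context_groups(corpus):
    print(w1, "~", w2, "share contexts", shared)
```

Even on this tiny corpus the grouping pulls out the determiners, the nouns and the verbs, simply because they share what precedes and follows them.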

Hence the “diamond-shaped” clusters I describe above.

So these “diamond-shaped” clusters (actually “small world” networks?) will be a bit different. I’m sure the system you describe would work. But it would work for repeated structure. By looking for sequential structure with shared beginning and end points, I think the solution I’m suggesting can both capture this constant, chaotic change, which has been missing, and actually be easier to implement than the repeated-structure algorithm you’re describing.
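A toy contrast, purely to illustrate the point (not an implementation of either algorithm): a repeated-structure detector counts exact repeats of whole sequences, so a corpus of entirely novel phrases gives it nothing to latch onto, while grouping by shared beginning and end points still finds a cluster in the same corpus.

```python
from collections import Counter, defaultdict

novel_phrases = [
    ("the", "old", "dog", "barked"),
    ("the", "excitable", "young", "dog", "barked"),
    ("the", "neighbour's", "dog", "barked"),
]

# repeated-structure view: every phrase occurs exactly once, so nothing repeats
repeats = Counter(novel_phrases)
print("repeated sequences:", [p for p, n in repeats.items() if n > 1])  # -> []

# shared-endpoint view: all three novel phrases fall into one cluster
clusters = defaultdict(list)
for seq in novel_phrases:
    clusters[(seq[0], seq[-1])].append(seq)
for ends, members in clusters.items():
    print(ends, "->", len(members), "distinct phrases")
```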

I also think the “shared beginning and end point” structure is the one being captured by transformers. HTM’s failure to capture this structure, I would suggest, is the main difference between transformers and HTM, and the reason transformers now dominate. But with the twist that transformers also try to capture this structure by training. So they are also assuming intelligence is a kind of repeated structure. And their form of training, by gradient descent, is just the type that HTM has always rejected. With reason. HTM is right about that. It has been right to reject “learning” by gradient descent.

So HTM actually has a slight advantage here. Transformers are stuck with gradient descent. Trapped by success! Nobody is going to stop doing something that is working. They are stuck with it because, for them, it is inextricably entangled with the extremely effective shared cause-and-effect structure which gradient descent has accidentally been finding for them. It is an accident for them: they don’t know what they are learning; they only have their learning methods. That is exactly the perspective on the intelligence problem that HTM rejected. That the learning methods are (partially) successful traps them into thinking “learning” itself is the key. It’s not. HTM was on to that early. Intelligence is not “learning”. Or at least HTM rejected that particular, back-prop form of learning, which it was clear was not happening in the brain. (HTM has become trapped by its own “learning” paradigms… but at least not back-prop!) Having rejected gradient descent makes HTM more open to capturing the same structure transformers capture, but by the more flexible method I’m suggesting.

For a contrast between the gradient-descent method of finding this “shared beginning and end point” structure and the network resonance method I’m suggesting, a good summary might be either the head post of the “Chaos” thread:

Or this post contrasting the “algorithm” with that of LLMs (“The ‘algorithm’ is to find prediction energy… minima. In that sense it is the same as transformers/LLMs”):
