Chaos/reservoir computing and sequential cognitive models like HTM

I would understand “randomness” here as the lack of a definite first time derivative for every aspect of the matters we care about. Such derivatives mathematically manifest the “causal relationships” as time drives the world forward. Randomness is intuitively “change (potential) with no cause”; do we still have reasoning power once causal semantics are banned?

Is such a “choice” then about picking/fitting an approximating time derivative function against the observed randomness? Yes, the candidates can be infinite in number. Some definite derivative functions would predicate that the lines never meet; others not necessarily.
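As an illustration of “picking/fitting an approximating time derivative function against the observed randomness”, here is a minimal sketch: score a few candidate derivative models against the finite-difference derivative of a noisy series and keep the one with least residual. The candidate functions and constants are invented for the example.

```python
import numpy as np

def fit_derivative(t, x, candidates):
    """Score candidate derivative functions f(t, x) against the
    finite-difference derivative of a noisy observed series."""
    dxdt = np.gradient(x, t)                 # observed derivative estimate
    scores = {}
    for name, f in candidates.items():
        residual = dxdt - f(t, x)
        scores[name] = float(np.mean(residual ** 2))
    return min(scores, key=scores.get), scores

rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 200)
x = 2.0 * t + rng.normal(0.0, 0.05, t.size)  # roughly dx/dt = 2, plus noise

best, scores = fit_derivative(t, x, {
    "constant-2": lambda t, x: np.full_like(t, 2.0),
    "constant-0": lambda t, x: np.zeros_like(t),
    "proportional": lambda t, x: 0.5 * x,
})
```

The “choice” here is exactly the selection among infinitely many possible candidate functions, constrained only by observed data.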

I can assure you that in the software industry, FP usually imposes far stronger and finer-grained limitations / structural constraints, by leveraging stronger type systems, toward mathematical soundness; OOP languages, on the contrary, usually limit the expressiveness of their type systems (C++/Java/C#), if not go rather loose in typing altogether (see duck typing in Python/Lua/Smalltalk).

That’s FP’s secret sauce for easier complexity management in large software projects: it forces participants to be more clear and explicit about the premises and assumptions of a piece of software, as it’s handed over or deployed. That is to say, you usually get far fewer “choices” with a typical FP language (Haskell/OCaml/Scala) than with OOP.
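A toy contrast of the two styles, in Python since it supports both; all the names (`Order`, `total_typed`, `Guess`) are invented for illustration:

```python
from dataclasses import dataclass

# Duck-typed version: every premise about `order` is implicit.
def total_loose(order):
    return order.price * order.qty          # hopes the attributes exist

# Explicit version: the premises are spelled out in the type.
@dataclass(frozen=True)
class Order:
    price: float
    qty: int

def total_typed(order: Order) -> float:
    return order.price * order.qty

class Guess:                                 # a caller's wrong guess
    cost = 3.0                               # no .price, no .qty

try:
    total_loose(Guess())                     # fails only at runtime
except AttributeError:
    pass                                     # the implicit premise was violated

assert total_typed(Order(price=3.0, qty=2)) == 6.0
```

In a statically typed FP language the wrong-shaped argument would not even compile; Python only surfaces the violated premise when the code runs.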

The reality is that LISPs (Scheme/Clojure) have a different story, stripping semantics off the S-expressions themselves. OOP has no way to do this, and non-LISP FP languages don’t do it either.

And the reason is not hard to guess: such limitations are needed for principled software engineering as a profession. Because a von Neumann computer (even a universal Turing machine) has fixed semantics, all the software industry can do is map human daily business processes onto the fixed ISAs (essentially the lowest-level programming language) of the computers set to run them. You don’t have any more interpretive power than the ISA provides. Unlimited freedom can only exist in a programmer’s dreams.

But my pessimism above only covers programming qua “computer programming”; I’m in active search of a paradigm shift toward “business programming”, and remain hopeful so far. I think the key is to maintain an adaptive “meta execution system” at runtime. I’ve never seen exact literature on this topic, but once the execution/interpretation of one piece of surface code can lead to different interpretations of the same code in later runs, we can open new doors for such “choices” as you describe. Currently, when designing a “computer programming language”, unstable interpretation of the same code, semantics-wise, is a BIG taboo.
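A toy sketch of what such an adaptive “meta execution system” might look like (class and operator names are invented): the same surface code yields different results on later runs, because the system rewrote its own semantics table in between.

```python
# A toy "meta execution system": surface code is a list of (op, arg)
# pairs; the meaning of each op lives in a mutable table that the
# system itself may rewrite between runs.
class AdaptiveInterpreter:
    def __init__(self):
        self.semantics = {"step": lambda acc, a: acc + a}

    def run(self, surface_code, acc=0):
        for op, arg in surface_code:
            acc = self.semantics[op](acc, arg)
        return acc

    def adapt(self, op, new_meaning):
        self.semantics[op] = new_meaning     # later runs reinterpret op

code = [("step", 2), ("step", 3)]
interp = AdaptiveInterpreter()
first = interp.run(code)                     # "step" means add: 0+2+3 = 5
interp.adapt("step", lambda acc, a: acc * a if acc else a)
second = interp.run(code)                    # "step" now means multiply: 6
```

Exactly the behavior that is a taboo in conventional language design: identical surface code, drifting interpretation.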

Yes, “totally restructure on the fly” is the idea in my mind. Though I’m set on solving existing business problems at the moment, reaching novel problems remains a dream.


I don’t really mind the philosophical digressions! Especially if people find them motivating. And it’s not like there’s a lot of other discussion to interfere with! I was just a little worried that people coming casually to the thread might think it was all esoteric, and move on.

Actually I really like the challenges @complyue presented, which helped me think through the “completeness” idea in the context of object-oriented and functional programming. And I love the continuation of the Dao De Jing: 无名天地之始；有名万物之母。For which I prefer the translation: “In the beginning of heaven and earth there is no name. Name is mother to the multitude of things.” It says to me that there is no inherent structure in the world, and that an active “naming” process is the core not only of perception, but of cognition. I find that entirely consistent with what you’re pointing out about Buzsáki too.

Yes. Thanks for that. I think you’re right. The focus on generating representation, rather than responding to a single external structure of the world, is similar.

I hadn’t come across György Buzsáki before.

Looking at his ideas now. I found this article in Scientific American:

If that’s a fair representation, the randomness followed by selection reminds me of Gerald Edelman’s “neural Darwinism”. I believe Edelman won a Nobel Prize for showing that the immune system produces random variations and selects among them in this way. It seems he proposed cognition does the same. Teams around him built robots Darwin I-IV(?)

That’s a specific random selection method for generating novelty.

It’s good to have more examples of people looking at cognition as generating novelty. Really there seems to be quite a lot of this idea about. We need to bring it together and make it a focus.

Quite likely! Certainly it seems significant that current ML for language entirely ignores the field of theoretical linguistics. Which I agree is a mess today. But I think if they looked back at some issues around “learning” grammars which shattered linguistics and created the current mess, way back in the 1950s, it might give them pause.

Thanks for that.

Yes, I think it is a theme.

Since you found that you might like some others I’ve come across over the years. For the perception as generation idea, anyway, if not the contradictions. Do you see the common thread in them too?

The main one I think of these days is Karl Friston’s Active Inference. That’s actually very compatible with what I’m proposing. Friston is even open to a chaotic generator. But he doesn’t have a structural “prior” to do that. So he is stuck with learning. And in practice he uses the more generatively focused statistical framework of Bayesianism.

But the reversal in conception from reception to generation is there.

The best exposition of that I’ve found for Active Inference is in this talk by Maxwell Ramstead:

Maxwell Ramstead — A tutorial on active inference
Feb 26, 2020

It’s also a theme of neurobiologist Walter Freeman. He talks a lot about “affordances”, which I understand to be more of an active easing-in to the environment by the organism. He also traced its roots waaay back in philosophy.

Nonlinear Brain Dynamics and Intention According to Aquinas Walter J. Freeman

I also see this in the contrast made between representational vs. structural theories by Romain Brette, here:

Roughly “representational” models are sets, while “structural” models are assembled according to grouping principles.

Brette attributes this to “Gibson’s ecological theory”.

There’s surely others. But you might find those interesting.


No, no. It’s an axiom choice. It’s not something you can predict. One defines Euclidean geometry, and the other non-Euclidean geometry.

I was thinking after our conversation that “incompleteness” might apply both to objects which do vary, and objects which don’t vary, depending on whether you see it as proof completeness, or completeness in the sense of allowing variation. This might be the source of your sense that it is OOP which is “complete”.

There’s something in this randomness of choices as the creator of objects which I haven’t got fully to the bottom of yet.

This is also the core of Category Theory, as you may know.

It seems to be a very deep principle.

I learned it as the root of subatomic structure with gauge theories in physics. You can think of all particles as symmetries in one or other system. This is a bit like ripples on a pond. The ability of the water to move up and down is what “creates” the particles.

So, I’m now stuck wondering how the freedom of maths to have parallel lines meet and not meet, fits this “object creation” sense. Perhaps it creates the entirety of a mathematical system as a single “object”. With resolution of the world into different mathematical axiomatizations as the creation of different mathematical “objects”.

Trivially, you can see this equation of invariance/symmetry/randomness with “objects” in perception: for a ball, say, the invariance of being able to move all over the ball while keeping the same cause-effect motion might be thought to define the ball.

If you are deep in computer language structure, you might appreciate this work based around deriving “meaning” by refactoring computer languages, done by one Sergio Pissanetzky:

“Structural Emergence in Partially Ordered Sets is the Key to Intelligence”

I feel his derivation from different orderings of sets is very elegant.

Though I don’t think he captures the idea of contradictions.


Rob,
I have been posting about this on the forum for years. This may have all been while you “were away” but I think you might be very interested in some of what I have been putting down.

First things first - you really REALLY have to read this online book. Darwinism plays a predominant role in computation in this book, and chaos theory is mentioned. Don’t cheat, read the whole thing:

Based on the proposals there, I added lateral connections to HTM to implement Calvin tiles well before Numenta added TBT to the HTM canon.

See this post for mixing the ideas of Calvin with basic HTM:

In a nutshell, this replaces the HTM spatial pooler with hexagonal coding, or more properly, Calvin tiles.

If you have missed my posts I have a digest of the more important ones on this thread:


Thanks for this too. That’s a lot of reading! But passing over it quickly to get an overview…

So he’s taken Edelman’s idea, and fleshed it out with a detail of proliferating and competing hexagonal neural assemblies?

As I understand it the significance of hexagonal assemblies is that he wants large numbers, and copying, in order to get darwinian selection.

Was there some other motivation for hexagonal assemblies? I think you said in one of the threads you linked that there’s a tie in with grid cells for spatial location (as described in a video by Matt Taylor.) Is there some motivation for thinking (hexagonal?) grid cell patterns might be a mechanism for concept representation more broadly?

How has this idea been integrated into HTM since?

Anyway, in general, lots to like. I have some common points with Edelman’s Neural Darwinism in the first place. And I’m surely not going to object to folding in some of Walter Freeman’s chaotic attractors.

(Although the role of chaos here seems to be limited. There is flipping between competing states, but his attractor states are learned. The example of the “washboard” road is trained into the road by the motion of the car. There is chaos in a flipping between states, but all the states have been trained in first. The chaos is not creating new states. I wonder if there is some iconicity assumed here: the structure comes from the world, rather than cognition creating new structure which says something new about the world. He also talks several times about classifiers being abstractions. Here, chapter 7: “One should not assume — as I did, fresh from set theory — that individuals or episodes are the primitive unit memories, out of which classes are built.” This question of what we mean by abstraction, I don’t feel is adequately addressed.)

In general I’m going to hazard a criticism, which is my criticism of Edelman’s theory. That the sources of variation discussed don’t satisfy my sense for the “meaning” of how new structure is created. At best I think he is vague about how new representations might form: “cluster”, “concatenation”, “composite”, “superposition”?

But I don’t need to criticize his mechanisms for finding structure. He does it himself! In the book’s last chapter he admits the possibility of other structural mechanisms. Notably for language:

“It may be that some shortcuts, cortical or subcortical, are absolutely essential in order for the darwinian process to operate quickly enough to produce useful results within the time span of short-term memory.”

So, “shortcuts”? A separate structuring mechanism?

Nice that he sees syntactic “contradictions” (“special cases”):

“special cases abound. The intransitive verb sleep cannot tolerate an object-patient-theme (He sleeps it is a clanger). The verb give insists on both a recipient and an item given.”

But his ideas on structuring language too are very general.

To make a positive point, the general framework of dynamic associations that compete and reinforce each other is very compatible with my ideas:

“Phrase structure is presumably a matter of the coherent corticocortical links to contributing territories, having their own competitions and tendencies to die out if not reinforced by backprojecting codes.”

“Competitions and tendencies” which “die out if not reinforced by backprojecting codes” would fit fine with my more dynamical sense of cognitive structure.

So perhaps I could insert my ideas firstly at the level of language structure.

Beyond that, there’s a lot to like! The general focus on constantly reforming structure as chaotic attractors is an exact fit with me.

So, to be positive. You might just insert my mechanism for finding structure in language, under whatever mechanism he posits for finding new, meaningful, structure. And beyond that, the idea that structure can be constantly new, and selected between as chaotic attractors, would fit fine with me.

To be even more positive, what problems have you met with trying to implement the competitive structuring mechanisms described in the book? How might my structuring mechanism help?


This is the result of physical packing. If you draw circles around a given cell, the intersections with other cells at a fixed distance form hex grids. It’s in the reading material.

I’m not aware that HTM applies any hexagonal topologies in TM or SP.
It is used in the grid cell modelling naturally.

Then again, neither TM or SP really have any topology - so it’s just set theory really on SDRs. (I’m ignoring the ‘topology’ spatial inhibition model, which seems rarely used).


Exactly - which is why what I am doing is an innovation.

Based on the connection scheme as described in the Calvin book, this replaces the spatial pooler mechanism with the hex-grid tile structure. The tiles are dynamically formed; there are no predetermined tile boundaries. This dynamism includes angle, spacing, and phasing.

Each formed hex-grid tile represents a discrete recognized state. They form a continuous recognition surface where local recognition is smoothly integrated with surrounding recognition, forming large-area recognition with only local operations.

The stored states are intimately intermingled spatially where many objects are stored without interference.

One of the more important properties is that a recalled tile should correspond to a learned feature or object. This is distinctly different from nominal HTM, where it is very difficult to match an activation state to a prior stimulus.


HTM-scheme has experimented with hexagonal topology in TM. Code for distance calculations in a hexagonal lattice.
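For reference, distance in a hexagonal lattice is commonly computed via axial coordinates; here is a minimal Python sketch (assuming axial (q, r) coordinates, which may differ from HTM-scheme’s own convention):

```python
def hex_distance(a, b):
    """Distance between two cells of a hexagonal lattice in axial
    coordinates (q, r). Converting to cube coordinates (x, y, z) with
    x + y + z = 0, the hex distance is half the L1 distance."""
    aq, ar = a
    bq, br = b
    dq, dr = aq - bq, ar - br
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2

assert hex_distance((0, 0), (0, 0)) == 0
assert hex_distance((0, 0), (1, 0)) == 1    # immediate neighbour
assert hex_distance((0, 0), (2, -1)) == 2
```

Axial coordinates make the six-neighbour topology exact, instead of approximating it on a square grid.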


This papers list mentioned on Reddit seems related to this thread:

I’ll avoid linking my own, but [0] below is a paper by another group that discusses analog memory conditions, chaos, and fractal basins in coupled oscillators, and [5] discusses reservoir computing with a single node having time-delayed self-feedback. [1]-[4] are papers that provide more context on analog computation, parameters, and chaotic dynamics; below I summarize them and the way I think about the human brain.

https://www.reddit.com/r/MachineLearning/comments/11t421q/r_memory_augmented_large_language_models_are/jchpn37/
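As a toy illustration of the reservoir-computing idea mentioned above (a classical echo-state sketch, not the single-node time-delay variant from the linked papers; all sizes and scalings are arbitrary):

```python
import numpy as np

# Minimal echo-state reservoir sketch: a fixed random recurrent
# network, with only the linear readout trained.
rng = np.random.default_rng(1)
N = 100
W = rng.normal(0, 1, (N, N))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))    # spectral radius < 1
W_in = rng.normal(0, 0.5, N)

def run_reservoir(u):
    """Drive the reservoir with input series u; collect states."""
    x = np.zeros(N)
    states = []
    for u_t in u:
        x = np.tanh(W @ x + W_in * u_t)
        states.append(x.copy())
    return np.array(states)

# Train a linear readout to reproduce the input delayed by one step.
u = rng.uniform(-1, 1, 500)
X = run_reservoir(u)
target = np.roll(u, 1)
target[0] = 0.0
W_out, *_ = np.linalg.lstsq(X[50:], target[50:], rcond=None)
pred = X[50:] @ W_out
err = float(np.mean((pred - target[50:]) ** 2))
```

The recurrent weights are never trained; the rich transient dynamics of the reservoir do the work, and only the readout is fit.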


I saw comments that it’s natural to adjacency, but didn’t know if there is structure like this observed in the cortex. Matt Taylor had an aside in a sub-window at some point in his grid cell talk about how he didn’t want to say hexagons were everything, but they were everything, so I didn’t know the significance of that.

I’m sure I’ve read somewhere that the V1 visual cortex is structured as hexagonal arrays… Or perhaps hexagons of hexagons… And that the hexagonal(lly oriented??) elements are “oriented line detectors”. I have a vague memory the different orientations of moving lines were ordered in some way.


Very interesting links. Though the poster seems to concentrate on efficiency advantages rather than the flexibility and constant change of a dynamical system. The claims for efficiency advantages meet some push back in the reddit comments.

I think there is a separate benefit in the flexibility and constant change. For instance, the poster points out that an “analog computer” aka network is “especially suited toward Constraint Satisfaction Problems”, which we all know from neural networks. But neural networks do not currently use the ability of dynamical systems to re-order themselves chaotically. Particularly weighted networks don’t use this advantage, with weights only changing gradually.
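For concreteness, the “analog network as constraint satisfier” point can be sketched with a minimal Hopfield-style network: constraints are encoded as symmetric weights, and the state settles until no unit wants to flip (a textbook toy, not a model of the papers above):

```python
import numpy as np

def settle(W, s, max_sweeps=50):
    """Sequentially update units until a fixed point: each unit aligns
    with the net input from its constraints (energy descent)."""
    n = len(s)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            new = 1 if W[i] @ s >= 0 else -1
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break
    return s

# Store the pattern (+1, +1, -1, -1) as a mutual-consistency constraint.
p = np.array([1, 1, -1, -1])
W = np.outer(p, p).astype(float)
np.fill_diagonal(W, 0.0)

noisy = np.array([1, -1, -1, -1])            # one unit violates the pattern
recovered = settle(W, noisy.copy())
```

Note the weights here are fixed once and only the state moves; the point in the post is precisely that a chaotic system could also re-order the weights/structure itself, which this classical settling scheme does not do.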


Remote imaging has detected 6 way (hexagonal) patterns in the hub regions of the brain. Not just the “grid cell” areas. This has been mentioned in the recorded Numenta meetings.

Hexadirectional Modulation of High-Frequency Electrophysiological Activity in the Human Anterior Medial Temporal Lobe Maps Visual Space

The hub regions paper eludes me at the moment but I will edit it in when I locate it.

So this has moved beyond conjecture to observed behavior. Now the task is to resolve how that arises in the cortex.

It may just be an artifact of packing.

My elaboration of the mechanism posited in Calvin’s work certainly could be an explanation and it has the charm of fitting known wiring. It also provides very interesting computational properties.


The key question in both, to me, is how they form structure. The Calvin tile idea is by (darwinian) selection over variation. What the current structural parameter of HTM is, I don’t know. The only idea on offer in 2016 was repeated sequence, as I recall.

Meanwhile transformers have eaten everyone’s lunch, using attention back along a sequence, and cause-effect prediction.

(And the zero-theory assumption of leaving theory in a black box. Particularly the zero-theory assumption of not excluding contradictions. Who would know? Nobody knows exactly the structure of the network generated. And they happily ignore that it seems to grow without bound. Growing without bound is good. It means you can get paid forever with continual improvements just throwing more data at it, and it means only the largest corporations can afford to play.

That growth without bound should be transformers’ Achilles heel. It implies there’s something wrong with the idea of “learning”. It suggests the reality is dynamic.)

HTM should have been the one to excel with cause-effect prediction. But it never found the right parameter. Maybe language would have driven it to the right parameter as it did for transformers. HTM could have combined the clear structural parameter of language, which is shared context, with the dynamic structure which is clear from the biology, and is the key quality of Calvin’s book, any amount of work by Walter Freeman, and apparent to anyone who glances casually at the dynamism of the cortex. But it didn’t. Language processing in HTM remained limited to cortical.io. And cortical.io have the same historical ANN focus on, “learned”, structure. Without the benefit of “attention” back along the sequence.

Calvin’s book emphasizes dynamic structure. That’s its real quality. That’s the biological content. Neural Darwinism is just a theory for how that dynamism might arise. HTM too should be open to dynamic structure. Dynamic structure is what the biology tells us.

The right parameter is latent in the history of language structure. It is just shared context.

So we have the ingredients. We have cause-effect prediction, we have dynamic structure, we have shared context as the clear cause-effect structural parameter from the history of linguistics. We should be able to surge past transformers.


I tried but failed to figure out the parameter space (thus the representation / data structures within a computer’s memory) for such a simulation. Maybe the design space (the variety of different sets of data structures) is itself a solution space to be explored by a darwinian process?
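A minimal sketch of what “exploring the design space by a darwinian process” could look like: mutate configurations, score them, keep the fittest. The fitness function and the configuration keys here are invented stand-ins for “how well this data-structure design fits the simulation”.

```python
import random

random.seed(0)

def fitness(cfg):
    # Toy target design, purely illustrative.
    target = {"fanout": 6, "depth": 3, "sparsity": 0.02}
    return -sum(abs(cfg[k] - target[k]) for k in target)

def mutate(cfg):
    child = dict(cfg)
    k = random.choice(list(child))
    child[k] += random.choice([-1, 1]) * (0.01 if k == "sparsity" else 1)
    return child

# Start from a population of identical naive designs, then evolve.
pop = [{"fanout": 1, "depth": 1, "sparsity": 0.10} for _ in range(20)]
for _ in range(200):
    pop.sort(key=fitness, reverse=True)
    survivors = pop[:5]                      # elitist selection
    pop = survivors + [mutate(random.choice(survivors)) for _ in range(15)]

best = max(pop, key=fitness)
```

The same loop works whether the “genome” encodes parameters of one simulation or the shape of the data structures themselves; only `mutate` and `fitness` change.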

I have also been thinking about computer simulation along your (@robf) line of thought, with no clear output so far.

P.s. I do industrial software engineering, so neuronal simulation is alien technology to me; matrix-based ANN (including back-propagation etc.) is on my radar, but far from my expertise.

I eagerly anticipate a computer simulation of “dynamic structure”, but computers have limited memory capacity, a linear memory address space, and natively support only “static” structures.


Ha. There’s a guy who believes that is the case for cognition. Richard Loosemore believes the only way to solve cognition, because it is chaotic, is to build a framework for trying lots of things randomly. He’s been trying to get funding for years.

There are a few people trying evolutionary approaches within an Alife (artificial life) framework, which must come down to this too. There’s another guy who moved back to the Czech Republic from Google to follow this approach as a pure research angle. I noticed him because he also gave a talk at AGI-21.

Tomas Mikolov - “We can design systems where complexity seems to be growing” (he says it somewhere in there.)

I liked especially Mikolov’s statement about “systems where complexity seems to be growing”, because it matched the title of my own talk “Cognition a compression or expansion of the world?”

Of course, I don’t think we need to do evolutionary selection, either within the system or to evolve the system. I think language, in the simplicity of its structure, points directly at grouping on shared context as (at least one) parameter. So the simplest path forward is just to try that parameter (basically synchronization of spike times in a sequence network.)
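One toy reading of “grouping on shared context” in code (my own simplification, not a spiking implementation): group tokens of a sequence that occur in the same (previous, next) contexts.

```python
from collections import defaultdict

def context_groups(tokens):
    """Group tokens that share at least one (previous, next) context
    with a group's seed token -- a crude stand-in for synchrony."""
    contexts = defaultdict(set)
    for i in range(1, len(tokens) - 1):
        contexts[tokens[i]].add((tokens[i - 1], tokens[i + 1]))
    groups = []
    for tok, ctx in contexts.items():
        for group in groups:
            if ctx & contexts[group[0]]:     # shares a context with the seed
                group.append(tok)
                break
        else:
            groups.append([tok])
    return groups

seq = "the cat sat . the dog sat . a cat ran".split()
groups = context_groups(seq)                 # "cat" and "dog" end up grouped
```

In a spiking formulation the same grouping would presumably fall out of synchronized spike times, rather than explicit set intersection; this just shows the parameter being proposed.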

I think too much experience with current techniques could be more of a hindrance than a help.

Trying to learn different neurosimulator platforms can be a barrier though, it’s true. I’ve tried a few. I hacked around a bit on the Human Brain Project’s Python SpiNNaker API (SpiNNaker is one of the few massively parallel spiking hardware implementations.)

The easiest I found was Charles Simon’s Brain Simulator II. He has a nice GUI which lets you connect up a network using drag and drop (as I recall he says some nice hardware acceleration in some Windows classes allows you to do this). The only downside is that it trapped me back into using Windows!

If you want to play with something, I’d be happy to send you the XML for the trial network I got oscillating. You could probably just download Charles Simon’s latest executable, and load the XML.

Almost all of that is just standard Windows programming.

Charlie even has GUI functionality to generate raster plots in his Brain Simulator II code. But they are not designed to handle 10s of 1000s of nodes. So it needs some more hacking to tweak the displays.

Charlie is not interested because he has ideas about the best way to use his neurosimulator. So it’s down to me to do the display modifications I want.

So my most immediate need is not at the neuro-simulation level. It is hacking those Windows GUI display classes to try and project out some kind of hierarchy from the raster plot.

Probably boilerplate for someone experienced in C#. As I say, if I can get some money together I may pay someone to do some mods.

Did you see the submission I made to SingularityNET’s funding round last year? I sketched the kinds of displays on raster plots I think may reveal hierarchy in that:


A nice piece, it is open source!!

I’ll spend some time to see if I can port the core engine to Julia, and have some HTML5 GUI (rendering at least, if not interactive enough). Months maybe, as I’m not on this full time.

(I hate Windows™ too, btw ;-))


Great. You should feel free to reach out to Charles Simon. He was very helpful to me. It’s just GUI programming is not my thing.

His project as such is quite interesting. Quite a massive thing for him to have done by himself. I forget how he motivates a need for neurosimulation.

They have a Facebook group:

Unfortunately in practice he seems to be seduced by algorithmic shortcuts immediately. A classic reversal of not seeing the woods for the trees, where you see only the woods, and fail to see the important properties of the trees. Or could we say, with a more properly dynamic analogy, not seeing the fish for the school. Or a swallows swarming analogy, not seeing the swallows for the swarm. It’s such a temptation, easier to draw just the nice simple outlines, and miss the essential quality of the system.



Yes, this is indeed quite a piece of work. I hadn’t seen it before.

Can you explain in detail what aspects you need of the Sim II - are you creating your own modules? Using the GUI or in code (C#)?

The neuron engine claims to be distributed across LANs… but that may be at a higher (module) level only. Is it highly optimized for speed?

I guess the question is: do you need the whole thing or just the neuron model with a better renderer?


@DanML Could you please give us a short comparison between the neuron models in Brain Simulator II and HTM neurons?
Thanks
