I haven’t kept up with either. But my impression is that Charles has gone for much more of a “cognitive architecture” type approach, actually moving away from the neural level, and plugging in “modules” which simulate function rather than form. HTM is much more focused on trying to learn from the biology.
On an implementation level, Brain Sim II strikes me as much more general, whereas HTM is more of an attempt at a theory of the biology.
Charles may originally have thought that getting down to the neural level would be necessary. But I feel he has moved away from that. The main remnant of it might be some ballpark estimates of the computational requirements for a simulation.
That’s what I took away from his AGI-21 talk.
In particular, I don’t think he limits the neural model. As I remember, his neurons were quite simple, nothing like complex dendrites and such. Maybe I’m wrong; maybe he addresses that somewhere. I wasn’t using them for what I was attempting, so I didn’t look.
Correct. That’s my impression too. He may have some different spiking models. But as I recall he went in the direction of implementing high-level algorithms in a computational-automata kind of way, with basic “neuron”-like elements. And as time has gone on he has moved even further from the biology, abstracting things that could be done with such a model into plain serial code wherever that seems convenient.
I think this is a mistake. That’s the “doesn’t see the swallows for the swarm” comment I made above.
Mind you, I don’t think HTM is “seeing the swallows rather than the swarm” either, in the sense that HTM also ignores the dynamics possible with neural interaction. In that sense HTM is also trapped in the static ANN conception of “learning”.
What I wanted was just a particular connectivity. I am interested in modeling cognitive structure (starting with language, a language model…) as a dynamical system rather than a static learned structure. For that I just need a certain connectivity.
Initially I played with his “drag and drop” GUI connectivity feature. But for realistic texts that’s impossibly time-intensive. So, as I recall, I sketched the logic of the connectivity (which is very simple: it just traces sequences from a text directly into sequential connections in a network) and Charles was kind enough to knock together a module to do that. I don’t know if that module is in his current download. I could check if people are interested in trying it.
That was about it. I used the automated language connectivity module Charles added to wire up a network from the word sequences of a moderate-sized text. Then I used some GUI features to add global inhibition of varying strengths.
As I recall, the raster plot module was standard in his download too.
However, the standard raster module is not designed for several thousand nodes, so they all get illegibly compressed together. And there was no functionality to look for patterns, such as hierarchy, in the different spike-time synchronizations of the raster plot. Display modifications of that kind would be the next step in exploring this approach on Brain Sim II.
After that, probably, modifying the basic language-sequence connectivity module to create more advanced connectivity. At the moment it just connects words as discrete nodes. Something closer to an SDR representation would have multiple neurons for each word “node”. That will probably be necessary so that distance relations (“attention” in the transformer jargon??) can be coded.
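One way to read “multiple neurons for a word node”, sketched in Python under our own assumptions (this is not Brain Sim II’s or HTM’s actual code): give each word a random sparse set of `k` neurons out of `n_neurons`, SDR-style, and expand each word-to-word transition into neuron-to-neuron synapses. The helpers `word_sdrs`, `overlap`, `neuron_synapses` and the parameters `k`, `n_neurons`, `fan_out` are hypothetical. It does not yet code distance relations; it only shows the node-to-population expansion.

```python
import random

def word_sdrs(vocab, n_neurons=2048, k=40, seed=0):
    """Hypothetical SDR assignment: each word gets k of n_neurons, chosen at random."""
    rng = random.Random(seed)
    return {w: frozenset(rng.sample(range(n_neurons), k)) for w in vocab}

def overlap(a, b):
    """Number of neurons two SDRs share (HTM-style overlap score)."""
    return len(a & b)

def neuron_synapses(sequence, sdrs, fan_out=10, seed=0):
    """Expand each word -> next-word transition into synapses: every neuron of the
    current word projects to fan_out randomly chosen neurons of the next word."""
    rng = random.Random(seed)
    synapses = set()
    for word, nxt in zip(sequence, sequence[1:]):
        targets = sorted(sdrs[nxt])
        for pre in sorted(sdrs[word]):
            for post in rng.sample(targets, fan_out):
                synapses.add((pre, post))
    return synapses

words = "place wood blocks under the wheel".split()
sdrs = word_sdrs(sorted(set(words)))  # sorted so the assignment is reproducible
syn = neuron_synapses(words, sdrs)
print(len(syn), overlap(sdrs["wood"], sdrs["blocks"]))
```

With `k = 40` and `n_neurons = 2048`, two random SDRs are expected to share only k²/N ≈ 0.8 neurons, so different words stay nearly disjoint unless overlap is built in deliberately.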
I haven’t looked into the speed angle deeply. Charles claims it should be able to handle large numbers of nodes. I think I may need parallel hardware. He thinks his code can handle it.
I’m sure you could ask him about that, perhaps on the FB group. I believe he did some clever speed optimizations, possibly dependent on Windows hardware drivers (from memory). But that’s not my area, so I haven’t looked into it.
I just need any kind of spiking neuron model and a renderer, yes, and a way to code the connectivity I want into it. The Brain Sim II project as such is something different. It has only been a convenient and accessible way to code up the dynamical network I want to experiment with, to add inhibition, and, as a first step, to verify that such a network oscillates.
To emphasize: to explore the dynamic structuring I have in mind, I only need a neuro-simulator that will let me code up sequences of words in language as sequences of neurons. And, moving on, something to play with structuring the raster plots such a network produces, to seek hierarchical structure appropriate to language.
So “just the neuron model with a better renderer”. Yes.
It’s very appropriate to HTM as a sequence network. I’m sure it could be done in an HTM-like way. And HTM conceptions like SDRs will probably be necessary to get the right distance connectivity (“attention”) functionality. But the HTM platform was not accessible enough for me to try it there.
If anyone wants to try it on whatever the current HTM platform is, all the better.
The vision would be to move on from this “structured” language network to make better predictions and implement a DYNAMIC language model, in contrast to the current state of the art, the LLM, or LARGE language model.
This is how HTM can leapfrog transformers. LARGE is ugly. It’s impossibly compute-intensive, which excludes small companies. And LARGE, while large, can never be large enough if the actual structure is dynamical, and so infinitely reconfigurable, like the swallows swarming. Most importantly, a model which is merely LARGE can never generate truly novel structures. That failure to produce novelty is related to the current transformer problem of not producing interpretable structure at all.
Those are the shortcomings of the current “transformer” model, with its ANN “learning” pedigree and all that implies (the actual mechanisms of that “learning” having been rejected by HTM at an early stage).
So the goal at the very least can be seen as to leapfrog past transformers.
That wouldn’t be where it stopped. I don’t think this is only appropriate to language. Language is just a simple system where the right way of structuring sequential data becomes obvious. But language would be a good start. We now know that modeling language can achieve impressive results. That is a favour transformers have done for us. And transformers are doing this with zero theory. Just “scale”. That’s the limit of their theory. Scale! LARGE. They don’t even realize that the scale is likely needed because they are attempting to enumerate a sizeable proportion of chaos! If we replace this zero-theory, black-box, brute-force chaos enumeration of transformers with a simpler generative mechanism, then at the very least it would be a lot cheaper to implement, and it would open the whole “language modeling” AI paradigm, which is currently sweeping the world, to a wider range of companies.
Python and Julia come more from the data science tradition, where visualization is essential and made easier.
That said, for rendering and exploring millions of data points, https://bokeh.org/ has extraordinary capability, far beyond traditional Python/Julia plotting libraries.
Can we consider the number of neurons/connections in the human brain “large enough”? How does that number compare to the successful LLMs so far?
Brian? Brian might be fine. Yes, I came across it in my searches for a platform. It just wasn’t obvious enough for me how to code it to get the connectivity I wanted.
Looking back at my notes, I see I attempted first to replicate a basic oscillating network on Brian using code examples from this book:
And I also hacked SpiNNaker code, as I said. I see this old link in my notes:
That was running on the SpiNNaker machine in Manchester. This is a very special machine. The biggest version recently reached 1 million cores:
There was some kind of public access. I think it was only through the European “Human Brain Project”.
The European Human Brain Project is now defunct, I understand. But I think some kind of public access is being revived. Unfortunately I missed the change, and they zapped the code I had put together when they rejigged their access framework.
(When I explored recovering that lost account, I actually talked to some of the guys who provide tech support for the project about paying them to code an implementation. It would probably be easy for them. But they said they couldn’t do outside work because it would violate the terms of their employment. So I couldn’t pay public employees to code public facilities for a project of public benefit, because they can’t do anything that isn’t mandated by a government organization! That’s Europe!)
To be fair, I believe there is still limited public access. Though likely, as with the original access, it has to be run in batch mode. And, clearly, you have to do all the coding yourself.
Bokeh looks great.
Finding hierarchical structure in the raster plots may not require much sophisticated visualization. The main issue will be to find it at all! So it may not matter if we don’t have absolutely the best visualization tools.
Always good to have the most powerful tool available just in case, of course.
Well, the human brain is not going to be large enough in the sense of infinite either. But the number of different potential orderings over the connectivity is always going to be much, much larger than the number you can realistically ever enumerate. And the idea is not to enumerate them all, anyway. The idea is to be able to generate any one you need at any time. So in that way, by being able to generate any one you want at any time, a generative system does capture infinity.
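To put a rough number on “much, much larger”: if we take an “order” over n nodes to mean one way of grouping them into clusters (our assumption for illustration; other readings give other counts), the number of possibilities is the Bell number B_n. A short Python sketch computing it with the Bell triangle:

```python
def bell_numbers(n_max):
    """Return [B_0, ..., B_n_max], the number of ways to partition n items into groups,
    built row by row with the Bell triangle."""
    bells = [1]
    row = [1]
    for _ in range(n_max):
        new_row = [row[-1]]          # each row starts with the last entry of the previous row
        for x in row:
            new_row.append(new_row[-1] + x)
        bells.append(new_row[0])     # the first entry of row n is B_n
        row = new_row
    return bells

for n in (5, 10, 20):
    print(n, bell_numbers(n)[n])
```

B_10 is already 115,975 and the sequence grows faster than exponentially, so even a few hundred nodes admit far more groupings than could ever be enumerated.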
Nice. Am I right that the main focus is this talk:
I need to look at it more closely, but at first glance I suspect their focus may be on a different way of using spiking networks. They talk about “training” them. I get that impression from most work on spiking networks, including projects I’ve seen centered around Intel Loihi. They seem focused on ways to get spiking networks to emulate the training/“learning” regimes of current ANN techniques. Understandable, I guess.
There is also a fringe spiking-network use case which focuses on using spike timing, rather than spike frequency, to code images. Different again, though they claim some biological motivation. I think this is the basis of the Dutch Australian/US startup BrainChip.com.
(If it’s of interest: from my notes, I concluded that Thorpe’s algorithms work on the basis that greater signal intensity causes neurons to spike earlier (in contrast to other computational neuroscience work, which correlates image intensity with spike frequency). Spiking earlier in response to greater intensity gives each image a time signature. They then have a separate recognizer (justified biologically as a “Grandmother cell”?) which learns that time signature and associates it with the object to be recognized. So it is really just translating a spatial signal into the time domain, and learning the pattern in the time domain with a separate “Grandmother cell” neural circuit.
As I recall, Simon Thorpe was working at a biological research institute in France, then started a company, SpikeNet, to commercialize what he concluded was the biological basis of vision in spike timing. And the IP of SpikeNet was sold to BrainChip.)
That Julia project might be good though, even if they think they need to use the spiking net to implement “learning” and so just emulate back-prop! If they’ve implemented a spiking net, there is nothing to stop us ignoring their “learning” algorithms and simply coding the sequences from some text straight into that implementation (in principle easy to do: you just add a network connection for each word in sequence), then applying inhibition to get oscillation, and working from there. It might be very easy if you’re familiar with Julia code.
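A toy version of the latency idea described above, in Python, in the spirit of rank-order coding. The linear intensity-to-latency mapping, the modulation factor `mod`, and the class name `GrandmotherCell` are our assumptions, not Thorpe’s or BrainChip’s actual algorithms: brighter inputs spike earlier, the recognizer stores the resulting spike order as weights, and it responds most strongly when a stimulus reproduces that order.

```python
def latencies(intensities, t_max=10.0):
    """Assumed linear latency code: the brightest input spikes at 0, dimmer ones later."""
    peak = max(intensities)
    if peak <= 0:
        return [t_max] * len(intensities)
    return [t_max * (1.0 - x / peak) for x in intensities]

def spike_order(intensities):
    """Input indices sorted by first-spike time (earliest first)."""
    lat = latencies(intensities)
    return sorted(range(len(lat)), key=lambda i: lat[i])

class GrandmotherCell:
    """Toy recognizer: weights decay with each input's rank in the learned order, and
    each incoming spike is further attenuated by its own rank (rank-order style)."""
    def __init__(self, pattern, mod=0.8):
        self.mod = mod
        self.weights = [0.0] * len(pattern)
        for rank, i in enumerate(spike_order(pattern)):
            self.weights[i] = mod ** rank

    def respond(self, stimulus):
        return sum(self.weights[i] * self.mod ** rank
                   for rank, i in enumerate(spike_order(stimulus)))

cell = GrandmotherCell([0.9, 0.1, 0.5, 0.3])
print(cell.respond([1.8, 0.2, 1.0, 0.6]))  # same order, brighter overall
print(cell.respond([0.3, 0.5, 0.1, 0.9]))  # reversed order: weaker response
```

Because only the order of first spikes matters, making the whole image brighter leaves the response unchanged, which is the “time signature” point made above.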
Don’t be confused by the Darwinian tangent stemming from Edelman’s theories. I like that those theories embrace novelty. But selection over variation is just one theory for how novel structure might arise. I don’t think it is what is happening. I don’t think Edelman was right.
I believe the generation of new structure is deterministic.
But it is deterministic in the sense that chaos is deterministic. The minimal representation for its determinism is the system itself. (I think this is also the source of our sense of “free will”. Even the creator of the system does not know exactly what it will do in every context. Even the system itself does not “know” what it will do until it does it. Also related to the Turing Halting “problem”.)
Like the weather. It will do what it will do. And it will do it deterministically. But it doesn’t “know” what it will do in advance. It just blindly, deterministically, relates elements using the laws of physics.
The same goes for cognition. The system does not know what it will generate. It only knows what it wants. And what it wants is to maximize prediction. It does that by grouping elements which share contexts (to minimize the “energy” of the prediction task.)
If you want to think of cognition “knowing” something that it wants in advance, that thing is to maximize prediction. And I am sure that bias towards maximizing prediction did evolve by random variation and selection, yes.
The parameters of such a system are actually just like those of unsupervised learning. I am sure this is just like what is already happening in transformers. The learning algorithm of the transformer finds an energy minimum for the network of observed sequences of language, in order to maximize the prediction of the next element.
The only difference between the system I am proposing and transformers is that I propose we now accept that these energy minima are chaotic attractors, not classically stable states. So we can’t find them by following nice stable gradients using gradient descent, as in the classic ANN paradigm. We need to find the energy minima of the network connectivity by some other means. And seeking the resonances of the network is a perfectly natural way to find those energy minima.
What selects a particular attractor at a particular moment will be the context. Firstly, the context of a… “prompt”, to put it in transformer terms. The first context can be the prompt, a sentence which is submitted to the system.
I think a transformer is working in the same way here too. I think they are using the prompt to select between numerous enumerated contradictory structures. It is just that this selection is hidden in the transformer paradigm. All the different contradictory structures are intricately entangled and opaque (we don’t even suspect they are there!)
So the machine is not “deciding” what it wants. It knows what it wants. It wants to maximize prediction. And it knows how to do that (this bit no doubt was evolved by Darwinian selection.) It knows to do it, it has evolved to do it, by grouping elements which share prediction.
But the groups of elements which share prediction have not evolved. They were just latent in the data. We seek an energy minimum in the data, in response to a “prompt” (plus any other relevant context, the environment, the general goals of the organism at that moment, if it’s hungry, bored, etc.)
This selection of different attractors in response to the state of the organism, resolving the latent ambiguity and choice in the world, will be what Walter Freeman meant when he talked about “intentionality” being such an important determinant of “meaning”.
Look particularly at Section 3, “The intentional dynamic model for the action-perception cycle has its closure through the environment”.
This can also be Bob Coecke’s “Togetherness”, relating to the way a quantum state depends on context and observation.
The answer is simple. The machine can “decide what it needs/wants, at any time”, in exactly the same way a transformer does. By responding to a “prompt”.
The only difference is, I don’t believe the prediction maximizing states of the network are stable. So they can’t be “learned” by gradient descent like in a transformer.
We need another way to find the prediction maximizing states of the network.
And oscillations are a perfectly natural way to do that.
Do you believe there is (at least) one exact maximum predictability of the world system, that we can approach infinitely closer and closer?
My take so far is that all “prediction maximizing” attempts are approximations without guaranteed monotonic convergence, in the sense of entropy lowering (i.e. meaningful gradients to descend). Darwinian evolution does have a direction, toward “overall” rather than “continuous” improvement, but it hardly has a “maximum” to head toward.
How can “oscillations” get such directions? Toward some maximum or not?
Ah yes! The survivors must have done things that secured their survival, and higher species have passed on and improved the knowledge of how to survive, generation by generation.
But in this context I mean a concrete set of “connectivity” that “it needs/wants”, and in this sense I think the older brain simply already “has” “what it wants”, rather than “knowing” “what it wants”, by being capable of “generating” the connectivity.
There is “kick-starter” hardwiring in the older brain that bridges from the drives to action.
We call these instincts. All higher-level learned activities start from these.
No, I don’t believe there is any “exact maximum predictability of the world”.
Cognition has to guess.
I believe the way cognition guesses, is it generalizes. Very rude of it, I know. But things that work are like that. They don’t care about your feelings. So I believe cognition generalizes unforgivably, which prejudices it about things it does not yet know.
So, for instance, if a cognitive entity has observed a leopard and a lion eat a gazelle, and a leopard eat its friend, it might generalize that a lion is likely to be a meal threat too, and prejudice it against the gentle lion, even without certain knowledge.
In language this means what I observed earlier in this thread:
*pay an effort
PAY attention/a call
MAKE a call/an effort
        attention
       /
    pay
   /   \
(?)     a call
   \   /
    make
       \
        an effort
Now, in the case of language this generalizing function is here seen to be making an error. So not an “exact maximum predictability” at all. (If it’s not clear, to say *“pay an effort” is “odd” in English. It is the kind of disfluency typical of those learning the language, or any language. Just the kind where it is difficult to explain why, and you end up saying that’s just the way it is. Unless it’s said often enough… And then it becomes the language!)
But the error seems to reveal how the system attempts to make predictions. And in speakers with deeper experience of the expressions, those predictions become more accurate. That is because the deeper the learner’s experience of a language, the more the learner’s generalizations will be made on the basis of the same set of observations as the general run of speakers of that language: the full set, and not just the impoverished set of the early learner. And that full set basically defines what it is acceptable to say in a language.
Look at that trivially simple network above, the one making the prediction error, and imagine you hit it with a hammer. I assert it is reasonable to imagine that the most tightly connected nodes may synchronize their oscillations. In this case those most tightly connected nodes are “make” and “pay”.
So it seems reasonable to say the network may be making generalizations on the basis of shared contexts (shared contexts in this case are “a call”: “pay a call”, “make a call”.) And it might be identifying elements which share contexts on the basis of synchrony under oscillation.
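The shared-context idea can be made concrete without any oscillation at all. In this Python sketch, plain set overlap stands in for synchrony (an assumption: the proposal is that synchrony under oscillation would find these groupings; the code only shows what grouping on shared contexts implies). Words that share a context lend each other their other continuations, which reproduces the learner’s error. The helpers `contexts` and `generalize` are hypothetical.

```python
def contexts(phrases):
    """Map each head word to the set of continuations observed after it."""
    ctx = {}
    for phrase in phrases:
        head, rest = phrase.split(" ", 1)
        ctx.setdefault(head, set()).add(rest)
    return ctx

def generalize(ctx, word):
    """Continuations borrowed from any word sharing at least one context with `word`
    (set overlap standing in for synchrony), minus those `word` already has."""
    own = ctx.get(word, set())
    borrowed = set()
    for other, theirs in ctx.items():
        if other != word and own & theirs:
            borrowed |= theirs
    return borrowed - own

ctx = contexts(["pay attention", "pay a call", "make a call", "make an effort"])
print(generalize(ctx, "pay"))   # {'an effort'}
print(generalize(ctx, "make"))  # {'attention'}
```

By symmetry the same rule also proposes *“make attention”.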
Yeah. I agree. You can guess that most of us are going to be motivated by the usual suspects, hunger, thirst, the no. 1 most popular subject on the Internet… etc.
But even below that, I think it is fair to say that prediction is adaptive even without explicitly evolving to desire it.
It’s just a fact of life that if you are able to make a good guess about what is going to happen, you are likely to be able to prepare better for it. And that preparation is usually going to help you survive.
I confess I’ve skipped a lot of the details in your previous posts (I may have felt unable to grok them). Could you describe the algorithm to encode/decode words and sequences into and out of “synchrony under oscillation”? Something to help the software engineer (which is me here) write the simulation code?
I don’t expect the core engine (an LIF model?) to be complex (it should be simple, as you said), but I’m puzzled about how to feed the simulator with input data, and how to observe the output, as data and possibly with visualization, to figure out what is going on there.
The subcortical modules are a hotbed of oscillation. Oscillation seems to be an important computation and communications method in the subcortex - I would not be surprised if that was the primary means of coupling to the cortex. The discussion is circling around motivations and drives, with mentions of how this motivation is communicated from older brain regions.
Since this thread contains significant content regarding oscillation in cortical computation I am taking the unusual step of posting things that I have not read yet. Understand that this may be completely off base to the discussion but my casual preliminary scans suggest that these papers have some relevance.
These are taken from papers that I have been meaning to read but have not gotten to yet.
(so many open tabs on my browser …)
On the chance that you are bored and want to dig in to this area …
How to feed the simulator with input data, at least, I think I can describe with some clarity. Here’s the logic, based on some pseudo-code I sent Charles Simon when I was trying to code language sequences into his project:
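```cpp
#include <fstream>
#include <iostream>
#include <string>

// Stand-in for the simulator's "connect neuron `from` to neuron `to`" call.
// Hypothetical: here it just prints the connection it would make.
void AddSynapse(const std::string& from, const std::string& to)
{
    std::cout << from << " -> " << to << '\n';
}

// In the original sketch this was a Brain Sim II module's Initialize() override.
void Initialize()
{
    std::ifstream corpus("corpus.txt");  // one word per line (any whitespace works); "." ends a sentence
    std::string word, nextword;
    while (corpus >> word) {             // first word of a sentence
        if (word == ".") continue;       // skip empty sentences / doubled full stops
        while (corpus >> nextword && nextword != ".") {
            AddSynapse(word, nextword);  // connect each word to the word that follows it
            word = nextword;
        }
    }
}

int main() { Initialize(); }
```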
My original sketch didn’t work as written, but the logic is simple: connect each word to the word that follows it, sentence by sentence.
I also have the working code which Charlie wrote from this. I guess I could share that here too; I think it is part of the public project. But the underlying logic may be clearer in this short fragment.
Then add inhibition to this network until a random activation of any node causes global oscillation. Just inhibit the whole network, and vary the intensity until a random activation neither immediately activates the whole network nor completely dies out.
For the output, it’s just a raster plot at this stage. To envisage a hierarchy we might start with something like the clustering displayed in the Brazilian paper, Fig. 2:
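A minimal sketch of that procedure in plain Python, to give the software side something to run. Everything here is an assumption for illustration, not Brain Sim II’s model: a discrete-time leaky integrate-and-fire neuron per word, excitatory weight proportional to how often a word pair occurs, and global inhibition that subtracts `g_inh` from every neuron for each spike on the previous step. The “hammer blow” is a forced spike on one randomly chosen node. Parameter values are arbitrary, and `classify` is a crude, hypothetical criterion for “dies out”, “runaway” and “sustained”.

```python
import random
from collections import defaultdict

def build_network(text):
    """One node per word; edge a -> b weighted by how often b follows a.
    Sentences (split on '.') are not linked to each other."""
    index = {}
    edges = defaultdict(lambda: defaultdict(int))
    for sentence in text.split("."):
        words = sentence.split()
        for w in words:
            index.setdefault(w, len(index))
        for a, b in zip(words, words[1:]):
            edges[index[a]][index[b]] += 1
    return index, edges

def simulate(edges, n, start, g_inh, w_exc=1.2, threshold=1.0, leak=0.9, steps=200):
    """Discrete-time leaky integrate-and-fire run. Neuron `start` is forced to spike
    at t=0; each spike then excites its successors and subtracts g_inh from every
    neuron on the next step (global inhibition). Returns (time step, neuron) pairs."""
    v = [0.0] * n
    fired = [start]
    raster = [(0, start)]
    for t in range(1, steps):
        drive = [0.0] * n
        for i in fired:
            for j, count in edges.get(i, {}).items():
                drive[j] += w_exc * count
        inhibition = g_inh * len(fired)
        fired = []
        for i in range(n):
            v[i] = leak * v[i] + drive[i] - inhibition
            if v[i] >= threshold:
                fired.append(i)
                v[i] = 0.0  # reset after a spike
        raster.extend((t, i) for i in fired)
        if not fired:
            break  # activity has died out
    return raster

def classify(raster, n, steps=200, runaway_frac=0.5):
    """'dies out' if spiking stopped early, 'runaway' if at some step more than
    runaway_frac of all neurons fired together, else 'sustained'."""
    per_step = defaultdict(int)
    for t, _ in raster:
        per_step[t] += 1
    if max(per_step) < steps - 1:
        return "dies out"
    if max(per_step.values()) > runaway_frac * n:
        return "runaway"
    return "sustained"

text = "the cat sat on the mat . the dog sat on the cat . a cat saw the dog ."
index, edges = build_network(text)
n = len(index)
start = random.Random(1).randrange(n)  # a random node to "hit with a hammer"
for g in (0.0, 0.2, 0.5, 1.0, 10.0):
    print(g, classify(simulate(edges, n, start, g), n))
```

Sweeping `g_inh` is the “vary the intensity” step; the regime of interest lies between the values where activity explodes and where it dies. The returned `(time, neuron)` pairs are exactly what a raster plot draws.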
In fig. 2 you can see that spike times cluster into four distinct synchrony patterns.
Do you see that the shared context (shared sequence with “a call” in the example in recent messages) might cause “pay” and “make” to synchronize as a “cluster”?
        attention
       /
    pay
   /   \
(?)     a call
   \   /
    make
       \
        an effort
There is no hierarchy in fig. 2. But that’s because there is no hierarchy in the community structure being explored. In the network being explored in that paper there are just four distinct clusters. I imagine if there were sub-clusters within the four main clusters, that would be visible as similar distinct synchrony patterns within each of the main synchrony patterns.
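For the readout side, here is one simple way to pull synchrony clusters out of raster data, as a Python sketch. It is not the method of the paper cited above: it bins each neuron’s spike times (`bin_ms`), scores each pair of neurons by the Jaccard overlap of their occupied bins, and joins any pair at or above `min_similarity` (single-linkage clustering). The spike times in the usage line are made up.

```python
def synchrony_clusters(spikes, bin_ms=5.0, min_similarity=0.5):
    """Single-linkage grouping of neurons that fire in the same time bins.
    spikes: {neuron name: [spike times in ms]} -> sorted list of sorted clusters."""
    bins = {name: {int(t // bin_ms) for t in times} for name, times in spikes.items()}
    parent = {name: name for name in spikes}

    def find(x):  # union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    names = list(spikes)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            union = bins[a] | bins[b]
            # Jaccard similarity of the occupied bins
            if union and len(bins[a] & bins[b]) / len(union) >= min_similarity:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())

spikes = {"A": [10, 30, 50], "B": [11, 31, 51], "C": [20, 40, 60],
          "D": [21, 41, 61], "E": [10, 40]}
print(synchrony_clusters(spikes))  # [['A', 'B'], ['C', 'D'], ['E']]
```

Re-running it inside each cluster with a finer `bin_ms` or a higher `min_similarity` is one crude way to look for the sub-clusters described above.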
Then, to build a parse tree you might look at submitted sentences (prompts) and break them into sub-sequences which form clusters like “pay” and “make” do above.
So, for instance, given a (prompt) sequence “Place wood blocks under the wheel” you might find “wood blocks” participates in a cluster which synchronizes in the context between “Place” and “under the wheel”. Which is to say, the synchronization clusters which form might suggest a structure:
Place (wood blocks) under the wheel.
And so on to build a full parse. Looking for sub-strings which form synchronization clusters like those in fig. 2, above:
((Place (wood blocks)) (under (the wheel)))
This is the sort of meaningful structuring which transformers are surely doing implicitly but do not expose, which makes it difficult to interpret or guide what they say.
Having that structure should help us make predictions about the next word in the sequence. The next element in a sequence is the current transformer output. That’s all the output. Transformers just poke out a continuation of the sequence, and if you don’t like the continuation you get, you have no choice but to try another prompt. Given such a structure, we could influence how the sequence continued. For instance, we might have some specific protocol for what to do after you place wood blocks under the wheel (this example comes from an old machine translation task on auto manufacturers’ manuals).
Prediction would be… whatever is most highly activated in the network as a possible continuation. Or the best continuation might be whatever forms the closest super-cluster with the clustering above. I’m not sure at this point. But the information would be in the network; we would just have to extract it.
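Assuming we could score how strongly each pair of adjacent words synchronizes, turning those scores into brackets can be as simple as greedily merging the most cohesive adjacent pair of spans until one span is left. The scores below are invented for illustration, not measured from any network, and `bracket` is a hypothetical helper; it also simplifies by reusing the original word-pair score at each junction between merged spans.

```python
def bracket(words, cohesion):
    """Greedy bottom-up bracketing: repeatedly merge the adjacent pair of spans whose
    junction has the highest cohesion score. cohesion[i] scores words[i] | words[i+1]."""
    if len(cohesion) != len(words) - 1:
        raise ValueError("need one cohesion score per adjacent word pair")
    spans, scores = list(words), list(cohesion)
    while len(spans) > 1:
        k = max(range(len(scores)), key=scores.__getitem__)  # first highest on ties
        spans[k:k + 2] = [f"({spans[k]} {spans[k + 1]})"]
        del scores[k]  # that junction is now inside a span
    return spans[0]

words = "Place wood blocks under the wheel".split()
# Invented scores for Place|wood, wood|blocks, blocks|under, under|the, the|wheel
cohesion = [0.6, 0.9, 0.1, 0.7, 0.9]
print(bracket(words, cohesion))  # ((Place (wood blocks)) (under (the wheel)))
```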
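The first option, “whatever is most highly activated”, could be sketched like this in Python. It is an assumption-laden stand-in rather than the proposal itself: activation spreads one step from each prompt word along the word-sequence edges, recent words count more (`decay`), and the most activated word wins. It ignores the clustering entirely; `build_edges` and `predict_next` are hypothetical names.

```python
from collections import defaultdict

def build_edges(text):
    """Word-sequence counts: edges[a][b] = times b directly follows a within a sentence."""
    edges = defaultdict(lambda: defaultdict(int))
    for sentence in text.split("."):
        words = sentence.split()
        for a, b in zip(words, words[1:]):
            edges[a][b] += 1
    return edges

def predict_next(edges, prompt, decay=0.5):
    """One-step spreading activation: each prompt word activates its successors,
    scaled by decay ** (distance from the end of the prompt). Returns the most
    activated word, or None if nothing is activated."""
    activation = defaultdict(float)
    for distance, word in enumerate(reversed(prompt)):
        for nxt, count in edges.get(word, {}).items():
            activation[nxt] += (decay ** distance) * count
    return max(activation, key=activation.get) if activation else None

edges = build_edges("the cat sat on the mat . the cat ate . a dog sat on the rug .")
print(predict_next(edges, "a dog sat on the".split()))  # cat
```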
Simplistically, HTM works by overlapping/associating bit space in SDRs, with enforced sparsity. However it has no true spiking and not really any time-dimension - merely synchronous queues.
Taking this into the LIF space changes the game entirely. Now everything is temporal dynamics, complex oscillators and attractors in phase-space. Time is now a first class citizen.
Overlap/association is now based on what? Matching network frequencies, assuming each network ‘holds’ a piece (or sets) of data? Presumably another mechanism notices the match? (To be fair, this is true in HTM too, with WTA.)
Perhaps there is a hybrid: an SDR-keyed resonance? (GPT-3 would be proud of such a word salad.)
One good point is that neither system has any embedded data (as in chars or sounds); you have to translate in and out. It is all context: semantic meaning is a quality of the generator/encoder.
Naively, though, neither of these seems capable of generating new data; they are association models of known tokens. Kind of like deduction, but not abduction.
Thanks for these. I have some, but more never hurt!
I don’t want to obscure the narrow task I’m trying to perform at this point though. Most of what I have seen, and these too, seems to be focused on possible signaling roles. I haven’t seen anything associated with a possible neural code.
Except… that “binding by synchrony” was observed way, way, back, in the '70s? But no one could figure out why synchrony might have a meaningful code significance.
Well, what I’ve arrived at suggests that shared sequential context could be the cause.
That oscillations might do it totally surprised me. It hit me like a ton of bricks. I had been of the view that oscillations were just a detail of brain biology, unnecessary for emulation, like neuron spikes. But, hey, if oscillations performed a practical purpose in the network processing I was being driven towards anyway, I wasn’t going to fight that!
Frankly I took it as a strong hint that I was on the right track.
So I am specifically interested in structuring sequences.
I’m sure they perform other roles too.
But my close focus at the moment is just structuring sequence networks. In the first instance for networks of sequences observed in language.
If you have any refs to other people considering oscillations for structuring sequence networks, now that, I would be interested to see!