I’ll trust your sense in making steps forward, honestly I have little sense in this area (again, neuronal simulation is alien technology to me).
Then too, I’ll rely on you to decide further SDR schema details; please decide one for me. And further, as I’d asked before, once a letter SDR is allocated, should we keep it fixed (i.e. no further adaptation)? What parameters / threshold should we use to interpret columns/neurons firing back into letter firing?
Then the letter code in isolation: For each letter (eliding spaces; no cheating, the spaces are what we should be aiming to insert…) assign… I don’t know… randomly select 5-10 out of 200 columns??
Then the letter code in sequence context: Choose one of the cells in each of the columns chosen for each letter in Step 2, and connect a “synapse” from this cell to any cell in each of the columns chosen for the next letter.
The cell chosen to carry the synapse to a cell of the next letter, need not be the same each time the letter is seen. In fact I think it is necessary for it to vary. That will mean frequently seen sequences will have many cell to cell connections, and it will be those many cell to cell connections which should cause synchronization when we get to the test step.
Take a good number of inhibition cells and connect them widely over all the letter columns, and from the letter columns back to the inhibition cells (I think all.)
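To make the allocation and wiring steps above concrete, here is a minimal sketch, not a definitive design. The 200 columns and the "5-10 columns per letter" range come from the post; `CELLS_PER_COLUMN`, the choice of 7 columns, the function names, and the 0.6 decoding fraction are all assumptions picked for illustration. Inhibition cells are left out.

```python
import random
from collections import defaultdict

N_COLUMNS = 200          # from the post
COLUMNS_PER_LETTER = 7   # assumption: any value in the suggested 5-10 range
CELLS_PER_COLUMN = 8     # assumption: not specified in the post


def allocate_letter_sdrs(text, rng):
    """Give each distinct non-space character a fixed random set of columns."""
    letters = sorted({ch for ch in text if not ch.isspace()})
    return {ch: frozenset(rng.sample(range(N_COLUMNS), COLUMNS_PER_LETTER))
            for ch in letters}


def add_sequence_synapses(text, sdrs, rng):
    """For each adjacent letter pair, pick one random cell in every column of
    the first letter and link it to a random cell in every column of the next.
    The cell choice is redrawn on every occurrence, so repeated pairs
    accumulate many different cell-to-cell links."""
    synapses = defaultdict(int)  # (pre_cell, post_cell) -> times observed
    letters = [ch for ch in text if not ch.isspace()]  # spaces are elided
    for a, b in zip(letters, letters[1:]):
        for src_col in sdrs[a]:
            pre = (src_col, rng.randrange(CELLS_PER_COLUMN))
            for dst_col in sdrs[b]:
                post = (dst_col, rng.randrange(CELLS_PER_COLUMN))
                synapses[(pre, post)] += 1
    return synapses


def decode_letters(active_columns, sdrs, min_fraction=0.6):
    """Report letters with at least min_fraction of their columns active.
    The 0.6 threshold is an arbitrary starting point, not a tuned value."""
    active = set(active_columns)
    return sorted(ch for ch, cols in sdrs.items()
                  if len(cols & active) / len(cols) >= min_fraction)


rng = random.Random(0)
sdrs = allocate_letter_sdrs("the cat", rng)   # SDRs stay fixed after this
synapses = add_sequence_synapses("the cat", sdrs, rng)
print(len(sdrs), "letters,", sum(synapses.values()), "synapse observations")
```

Because the SDR dictionary is built once and never modified, this also fixes each letter's code, which is one possible answer to the "no further adaptation" question.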
Test step: Take a “prompt” sequence, and drive the columns of the letters in this prompt sequence, one after the other, in the sequence presented.
See what the activation pattern looks like. If you have a GUI like Brain Sim II this is maybe easier. But a raster plot may work as well.
I expect initially there will just be a sporadic pattern of firing spreading from the driving “test” sequence. And it will either die out quickly, or quickly spread to the entire network. To tame this we need to vary the strength of inhibition. So that will mean… varying the threshold of the inhibition links?
We might also vary the threshold of the excitation links (sequence synapses from cell to cell) at this point. I don’t know if varying the threshold of the excitation synapses or the inhibition synapses will be most important to balance the activation and give us oscillation. Choose some mid-range value to start with, and if it doesn’t oscillate try increasing or decreasing. It should be obvious which direction to vary when we try it. If the network immediately “lights up” fully activated, then we decrease the inhibition threshold, and increase the excitation threshold, etc.
If we can get the thing oscillating, that will be a good first goal!
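As a rough aid for the tuning loop described above, here is a sketch that looks at the fraction of cells active per time step and suggests which way to nudge the thresholds. The function name and the `low` / `high` cut-offs are assumptions of mine; only the direction of adjustment follows the description above.

```python
def suggest_adjustment(active_fraction_per_step, low=0.01, high=0.5):
    """Classify a run from its per-step active fraction (values in 0..1).

    Only the second half of the run is considered, so the initial transient
    caused by the prompt does not dominate. The cut-offs are arbitrary.
    """
    if not active_fraction_per_step:
        raise ValueError("need at least one time step")
    tail = active_fraction_per_step[len(active_fraction_per_step) // 2:]
    mean_tail = sum(tail) / len(tail)
    if mean_tail >= high:
        return "saturated: decrease inhibition threshold, increase excitation threshold"
    if mean_tail <= low:
        return "died out: increase inhibition threshold, decrease excitation threshold"
    return "in range: inspect the raster plot for oscillation"


print(suggest_adjustment([0.1, 0.9, 1.0, 1.0]))
print(suggest_adjustment([0.2, 0.05, 0.0, 0.0]))
```

This only detects the two failure modes (runaway and die-out); whether the in-between regime actually oscillates still has to be judged from the raster plot.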
My apologies for not replying sooner. I’ve been away from the forums for quite a while. It’s nice to see folks getting back to thinking about new ways to implement complex dynamical systems rather than just endlessly talking about AGI and its potential implications. I’ve always preferred the former over the latter. I still need to go over some of the previous threads for additional context, but I figured I could start here with something constructive.
For early prototyping, you could probably get away with just about any reasonably sized text corpus. Recently I’ve been playing around with “War and Peace”, because I just happened to have a copy of the novel lying around in plain text, downloaded from Project Gutenberg.
If you prefer, you can think of the above as a column-level SDR encoding of an ASCII character. In that case, I would encode a sequence of letters in context as multiple adjacent columns. Each column acts as an encoder for an array of ASCII sensors. This should allow you to saccade along the input sequence with a variety of step sizes.
Example scenario: The agent saccades to the beginning of a word and the N-character sensor array encodes the leading space and the first N-1 characters in the word. It then attempts to form a representation of the current word and form a prediction of the next few characters in the sequence. If the prediction is unambiguous, then the agent can saccade forward in the sequence to verify its prediction, or simply move ahead to a region where it has less confidence in its prediction (i.e. multiple predictions are possible) and sample the text again to disambiguate its predictions.
As @bitking alluded, there are a variety of inhibitory cells present in the cortex and each of them seems to be performing an important role in some aspect of feedback/control of local activations. The result is a level of activation that is typically at or near the critical point for the local network (i.e. approximately the same number of neurons firing at any moment in time). Too many neurons firing, and the network blows up (seizure). Too few firing and the network collapses (comatose). In the code provided above, the sparsity in the representation is explicit and fixed (at 50%) by the encoding scheme. Inhibition should not be necessary at this level, but should instead be applied to the layers downstream of the input encoder to ensure that the homeostatic equilibrium state for the neurons firing in those layers is at or near their critical point.
From this point, a temporal sequence memory would be appropriate to form the predictive states. There have been numerous discussions on this forum and elsewhere as to the practical limits on the capacity of the HTM model, both for representing certain types of sequences (e.g. involving repeating symbols or varying durations or intervals) and also being able to reliably store extended sequences. I am working on a new approach that I hope will be able to resolve one or more of these shortcomings.
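The first half of that scenario (landing on a word start and reading the leading space plus the first N-1 characters) can be sketched as follows. The function name is hypothetical, and the prediction and verification steps are deliberately not implemented here.

```python
def word_start_windows(text, n):
    """Return (word_start_index, window) for every word in text.

    Each window holds the space before the word plus up to n-1 following
    characters. A virtual leading space is added so the first word is
    treated like the others.
    """
    padded = " " + text
    windows = []
    for i in range(len(padded) - 1):
        if padded[i] == " " and padded[i + 1] != " ":
            # padded[i + 1] is text[i], the first character of the word
            windows.append((i, padded[i:i + n]))
    return windows


for start, window in word_start_windows("war and peace", 4):
    print(start, repr(window))
```

Varying the step between successive windows (rather than always jumping to the next word start) would give the different saccade sizes mentioned above.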
Thanks for the suggestion. I haven’t checked the logic, but if it is just allocating columns to letters, it might save @complyue some time.
You’ll probably see this as you read through the back threads, but the expectation here is that by coding sequences using synapses between sub-sets of cells in a column we can use oscillations to identify the beginning and end of a sequence without explicitly moving up and down the sequence, or, technically, checking predictions.
Everything is supposed to happen from the oscillations. So it’s a bit different to what HTM sequence processing has been up to now, training, and then checking, predictions. I don’t want to do any training. Instead of training, I want to get a network of observed language sequences to oscillate.
The main need for inhibition for this case will be to balance excitation to get oscillation.
It may seem like an extreme hypothesis that oscillations will replace all the training and sequence prediction checking logic.
For identifying words from sequences of letters it should reduce to multiple repeated sequences, and equate to what is done in HTM now. So in that sense it’s not so different at all. This sounds similar to the sequence mechanism you are sketching, checking to see if a trained sequence is repeating.
So why use oscillations to identify these repeated sequences? The real advantage should come above the word level, where synchronized oscillations should equate, not just to multiple shared paths now, but to multiple shared contexts. Two measures, one process. That’s where it will differ from prior HTM theory. Synchronized oscillations should capture both multiple shared paths (for words) and multiple shared contexts (phrase structure.) The latter being new.
Comments and suggestions on this proposed oscillation “pooling” process are welcome.
It sounds like what you are looking for is something like a ring attractor. A locally stable attractor which encodes a sequence of a specified length. Once a given input has converged upon an attractor state, subsequent updates will advance to the next token in the sequence. It’s a variation on an autoencoder, so it still needs to be trained on the sequences that you expect it to learn.
I can understand your desire to focus on the cyclic nature of the firing patterns. I’ve been considering a similar model for a couple of years now. I refer to it as nested loops of consciousness. Think of it as multiple ring attractors updating at different frequencies. However, for the purposes of this discussion, let me focus in on just one such loop.
At the lowest level and probably operating at the highest frequency (let’s call it gamma) is the sensory update loop. Let’s assume that the input sensors are detecting a complex set of overlapping feature patterns. Attached to a sensor patch is a collection of minicolumns. (For now, let’s assume that by minicolumn I’m referring to just the neurons in L4.) Each minicolumn is attuned to (has been trained to recognize) a specific feature pattern that has been observed to appear on the sensor.
The proximal dendrites from each minicolumn are sub-sampling the sensor array in a manner that is similar to a convolution. As such they begin accumulating evidence in support of the hypothesis that is the pattern they have been trained to recognize. Recall that nearly all of the (L4) neurons in a minicolumn receive a similar set of proximal inputs. Thus, the first neuron within a minicolumn that achieves its threshold will fire, subsequently stimulating a nearby inhibitory neuron (chandelier cell) which then fires, temporarily blocking the neurons in the surrounding minicolumns from firing.
Once that neuron/minicolumn has fired, it goes into a refractory period during which time it cannot fire again. That gives the next minicolumn the opportunity to reach its threshold and to fire unimpeded. Assuming that some fixed number of minicolumn feature detectors can fire in this manner before the first one recovers and is ready to fire again, then it should be possible to encode some number of feature patterns that are currently being observed on the sensor array - essentially allowing multiplexing of detected features.
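A toy, discrete-time sketch of that fire / inhibit / refractory cycle is below. It is only meant to show the multiplexing effect: the `drives` values stand in for accumulated proximal evidence, lateral inhibition is reduced to "at most one winner per step", and every parameter value is an assumption.

```python
def multiplex(drives, threshold=1.0, refractory_steps=3, n_steps=8):
    """Return, per time step, the index of the minicolumn that fired (or None).

    drives[i] is the evidence minicolumn i gains per step while it is not
    refractory. The strongest ready column wins the step; the others are
    blocked for that step but keep their accumulated potential.
    """
    potentials = [0.0] * len(drives)
    refractory = [0] * len(drives)
    fired = []
    for _ in range(n_steps):
        for i, drive in enumerate(drives):
            if refractory[i] > 0:
                refractory[i] -= 1          # still recovering, no integration
            else:
                potentials[i] += drive      # accumulate evidence
        ready = [i for i in range(len(drives))
                 if refractory[i] == 0 and potentials[i] >= threshold]
        if ready:
            winner = max(ready, key=lambda i: potentials[i])
            potentials[winner] = 0.0
            refractory[winner] = refractory_steps
            fired.append(winner)
        else:
            fired.append(None)
    return fired


print(multiplex([0.5, 0.4, 0.3]))
```

With these drives the three detectors fire one after another before the first has recovered, which is the multiplexing behaviour described above.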
Once this process has been started, then there are two mechanisms that could potentially establish a ring attractor, and potentially provide the oscillatory behavior that you seem to be seeking.
A traditional HTM temporal memory algorithm can be trained to predict that specific sequence of input patterns to repeat over and over again until the sensor moves or the sensed feature changes.
The use of a Calvin tile configuration may also be able to establish an echo chamber of sorts. In this version, when the initial neuron fires in response to a detected pattern, its excitatory output propagates just slightly further than the radius of inhibition of the chandelier cell. At this distance, a handful of cells respond to the combination of their own input and the distal signals coming from the original neuron, causing them to fire. If this firing pattern repeats, then it should set up a hexagonal grid pattern of cells that reinforce each other when presented with a specific input pattern. If there is a temporal delay in these feedback signals, then that could also introduce oscillatory behavior.
I’m not sure the full pattern I want to capture could be described as a ring. For a word, perhaps yes. For a word you want the pattern that the same path repeats many times. So a ring? But for the more interesting extension to phrases it may be a “ring” in the sense activity would be a recursion, but the structure will be defined by shared external connections rather than a repeated internal connection. So more of a diamond, perhaps, if you think of the network being “pinched” at each end of a “diamond” of alternative paths, sharing the same beginning and end point.
I don’t know, is that a ring attractor too?
It should cluster in the sense of multiple internal paths with constrained entries and exits. But the multiple internal paths of interest at the phrase level, should not be repetitions of the same sequence. The thing about these clusters will be that they consist of many different sequences, only grouped because they share the same beginning and end points. (Though that could transition to repetitions of the same sequence. Which could be a mechanism for lexicalization, the way words in any language start as new phrases, but with multiple repetition retain only habitual meaning, and eventually become words in their own right, like the French “au-jour-de-hui”. But in the beginning, the essence of syntax is not repeated sequences. Novelty is what distinguishes syntax from lexicon.)
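To pin down what I mean by paths that share the same beginning and end points, here is a small sketch that simply counts them in a token sequence. To be clear, this is static bookkeeping and not the oscillation mechanism being proposed; the function name, the fixed `span`, and the toy sentence are all assumptions for illustration.

```python
from collections import defaultdict


def diamonds(tokens, span=3, min_paths=2):
    """Group every span-length window by its (first, last) token.

    Returns only those end-point pairs that are bridged by at least
    min_paths different interiors, i.e. a diamond of alternative paths
    pinched at a shared beginning and end.
    """
    groups = defaultdict(set)
    for i in range(len(tokens) - span + 1):
        window = tuple(tokens[i:i + span])
        groups[(window[0], window[-1])].add(window[1:-1])
    return {ends: paths for ends, paths in groups.items()
            if len(paths) >= min_paths}


tokens = "the cat sat on the mat the dog sat on the rug".split()
print(diamonds(tokens))
```

Here "cat" and "dog" end up grouped only because both sit between "the" and "sat", not because any one sequence repeats.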
I gave some examples of these “diamonds” in my other thread:
I’m with you up until the point where you mention the system has been trained to recognize a specific feature pattern.
I’m trying to get away from the whole “training” way of thinking about the AI problem. For justification of this see the entire “Chaos/reservoir computing…” thread above. I think what we have been missing in AI is that meaningful patterns vary, potentially chaotically, and so can’t be learned.
That’s OK. We can still have meaningful patterns. It is just that now they must be generated. And that’s OK too. Because you can generate meaningful patterns, if you have a meaningful way of relating elements to do so.
And I think natural language is telling us that meaningful way of relating elements is by grouping them according to shared context, which equates to shared cause and effect.
Hence the “diamond shaped” clusters I describe above.
So these “diamond shaped” clusters (actually “small world” networks?) will be a bit different. I’m sure the system you describe would work. But it would work for repeated structure. By looking for sequential structure with shared beginning and end points, I think the solution I’m suggesting can both capture this constant, chaotic, change aspect which has been missing, and actually be easier to implement than the repeated structure algorithm you’re describing.
I also think the “shared beginning and end point” structure will be the one which is being captured by transformers. The failure to capture this structure, I would suggest, is the main difference between transformers and HTM. And the reason transformers now dominate. But with the twist, that transformers also try to capture this structure by training. So they are also assuming intelligence is a kind of repeated structure. And their form of training, by gradient descent, is just the type that HTM has always rejected. With reason. HTM is right about that. It has been right to reject “learning” by gradient descent.
So HTM actually has a slight advantage here. Transformers are stuck with gradient descent. Trapped by success! Nobody is going to stop doing something that is working. They are stuck with it because for them it is inextricably entangled with the extremely effective shared cause and effect structure which gradient descent has accidentally been finding for them! It’s an accident for them. They don’t know what they’re learning. They only have their learning methods. Just the perspective on the intelligence problem that HTM rejected. That the learning methods are (partially) successful, traps them into thinking “learning” itself is key. It’s not. HTM was on to that early. Intelligence is not “learning”. Or at least HTM rejected that particular, back-prop, form of learning which it was clear was not happening in the brain. (HTM has become trapped by its own “learning” paradigms… But at least not back-prop!) Having rejected gradient descent makes HTM more open to capture the same structure transformers are, but by the more flexible method I’m suggesting.
For a contrast between the gradient descent method for finding this “shared beginning and end point” structure, and the network resonance method I’m suggesting, a good summary might be either the head post of the “Chaos” thread:
Or this post contrasting the “algorithm” with that of LLMs “The “algorithm” is to find prediction energy… minima. In that sense it is the same as transformers/LLMs”:
A quick code scan shows a synchronous cell-cell inner loop at _simulate_lnet (within sim.py) using 500 inputs/spikes to achieve an optimal firing threshold, along with a refractory period (by negative weight of -0.1v) and a decay factor of 10 (0.1) per timestep. A pretty standard LIF network, so far.
At each timestep you force the prompt nodes ‘on’, by forcing the voltage to threshold, before allowing the natural flow of activation through the grid.
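For readers following along, the kind of update I am describing looks roughly like the sketch below. This is my own paraphrase of a generic synchronous LIF step, not the code in sim.py; the function name, argument layout, and parameter values (including reading the decay as 0.1 per step) are assumptions.

```python
def lif_step(v, weights, spikes, prompted, threshold=1.0, decay=0.1, reset_v=-0.1):
    """One synchronous leaky integrate-and-fire update.

    v        : list of membrane voltages
    weights  : weights[i][j] is the synapse strength from cell i to cell j
    spikes   : list of booleans, which cells fired on the previous step
    prompted : indices of cells forced on during this step
    """
    n = len(v)
    new_v = []
    for j in range(n):
        drive = sum(weights[i][j] for i in range(n) if spikes[i])
        new_v.append(v[j] * (1.0 - decay) + drive)   # leak, then integrate
    for j in prompted:
        new_v[j] = threshold                          # force prompt cells to fire
    new_spikes = [x >= threshold for x in new_v]
    # A negative reset voltage acts as a short refractory dip
    new_v = [reset_v if s else x for x, s in zip(new_v, new_spikes)]
    return new_v, new_spikes


# Three cells in a chain 0 -> 1 -> 2, prompt applied to cell 0 on the first step only
w = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
v, s = [0.0, 0.0, 0.0], [False, False, False]
for prompted in ([0], [], []):
    v, s = lif_step(v, w, s, prompted)
    print(s)
```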
ensure_prompted_spikes also seems to have a random choice element which I can’t quite work out. Can you explain why you use a random choice there, please?
Opening in another window detected a kernel (base Python 3.10.10).
But I immediately strike other problems:
Clicking the first cell gets:
Running cells with ‘base’ requires the ipykernel package.
Run the following command to install ‘ipykernel’ into the Python environment.
Command: ‘conda install -n base ipykernel --update-deps --force-reinstall’
Opening a new terminal
$ source /opt/conda/bin/activate
/bin/sh: 1: source: not found
Tried running the command:
conda install -n base ipykernel --update-deps --force-reinstall
EnvironmentNotWritableError: The current user does not have write permissions to the target environment.
environment location: /opt/conda
That behaviour was on both Chrome with Windows 10, and Firefox with Linux Mint.
Also getting following generic message in default terminal on startup:
EnvironmentNameNotFound: Could not find conda environment: oscb
You can list all discoverable environments with conda info --envs.
bash: jupyter: command not found
bash: jupyter: command not found
I’ll continue to try to get that to work. It may have to wait until I have reliable wifi to update my Linux partition.
But in the meantime. Just from looking at your output…
I’m expecting elements of a “prompt” sequence to have their spike times pushed together when compared to a driving signal.
I tried to sketch that in this post:
The driving signal should be at a different time for each successive letter of the “prompt”. Otherwise there will be no difference in input spike time to be pushed together. @DanML says “you force the prompt nodes ‘on’, by forcing the voltage to threshold” at each timestep?
Are your current prompt letter impulses all at the same time?
What is the prompt for the raster plot you posted?
This means your Gitpod workspace was not initialized as configured. Try stopping the workspace via the menu that pops up from the 3-dash icon in the upper left, then open the Gitpod url again (which will create and initialize a new Gitpod workspace per the configuration). Wait long enough for the notebook url to be printed in the terminal view before following it, and don’t interrupt the process setting up the oscb conda env. The first time a Gitpod workspace is opened, the setup may take a few minutes to complete; opening the workspace again later will be much faster.
No, one letter at a time-step as the notebook currently is. And you can increase
prompt_pace=1, # time step distance between letter spikes
used there, to make it even slower.
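To illustrate what the pace does to the driving signal, here is a tiny stand-alone sketch. Only the `prompt_pace` name comes from the notebook; the function itself and its `start` argument are hypothetical and are not part of the notebook's API.

```python
def prompt_schedule(prompt, prompt_pace=1, start=0):
    """Map each letter of the prompt to the time step at which it is driven."""
    return [(start + k * prompt_pace, ch) for k, ch in enumerate(prompt)]


print(prompt_schedule("quick"))                  # one letter per time step
print(prompt_schedule("quick", prompt_pace=3))   # letters spaced three steps apart
```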
Several notebook cells are pushing the simulation with (or without) prompt, for more time-steps. Read the notebook content, you’ll see there are 'xxx', then 'defcafe', then 'quick', and subsequent cells would push the simulation with no prompt.
The network is capable of a fine time resolution. However, with all parameters identical (path delays, voltage steps, fade rates) and with synchronized inputs (all changes in the same time slot), you get behavior more like a set of logic gates.
Add noise to the variables and you start to get more of the fuzzy matching/synchrony that you might expect, but you also add many more arbitrary parameters without an obvious way to train them.