Yes, please! I put my thoughts here for anyone/everyone to try. I claim no ownership, and give them to the community as a whole. If it does turn out to be the greatest thing ever and somebody tries to patent it, I’ll chase them down and leave a bag of feces on their front porch, though.
They would learn trends and general directions, rather than full-on delayed rewards like you mention… yet if it also has TM, it may eventually pick up successful “paths” through the problem space. This idea is definitely rough.
I’m drawing inspiration from the brain, where no single piece has the full picture of everything. Parts of our thalamus are somewhat like the neocortex, but if we think about what information it receives, it’s getting:
- Input information from various lower systems of sensory input
- Output from the neocortex
- Output from other parts of itself that manage other types of input/output → connects internally with different thalamic nodes managing audio, visual, and somatosensory signals
- IO from other parts of the ancient brain…
To me and my potentially uninformed view, the thalamus looks like HQ of the semi-conscious mind, the observer of most IO, and, when necessary, the director of communications and work projects.
Somehow (Hebbian learning?) it manages to learn how to modulate/coordinate these different signals depending on different needs, such as shutting off background noise in a room so that you can focus on the person talking in front of you, or letting “us” know we’re hungry and need to start planning how to fix that. In essence, the thalamus MUST be learning which IO goes to which area, as well as basic relationships between the input/output passing through it. So setting it up as an SP/TM object, of unknown dimensions, seems like the logical first step, or at least one that’s familiar/known.
It’s also likely, as in our brain, that various different “pyramids” (macro columns, really) would all be connecting through the thalamus, thus another reason to keep pools and connections fewer rather than denser.
What depth we give it in this type of architecture is something we’d probably have to experiment with a bit, but when in doubt, let the constraints of biology be our guide.
Also, just a random thought while writing the above… inputs from the state-change pools into this thalamus might also take the form of a summed surprise/anomaly score encoded as a scalar (or an inverse of it, to show whether the outcome was predicted successfully or not).
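To make that a bit more concrete, here is a rough sketch of what I mean by a summed score and its inverse. Everything here is an assumption for illustration: the function names are made up, and I’m assuming each state-change pool reports an anomaly score between 0 and 1. How the resulting scalar would then be encoded for the thalamus is left open.

```python
def summed_anomaly(anomaly_scores):
    """Sum the per-pool anomaly scores (each assumed to lie in [0, 1]).

    A high value means many pools were surprised on this timestep.
    """
    return sum(anomaly_scores)


def inverse_summed_anomaly(anomaly_scores):
    """Inverse form: high when the pools predicted the outcome well.

    Computed as (number of pools) - (summed anomaly), so it ranges
    from 0 (every pool fully surprised) to len(anomaly_scores).
    """
    return len(anomaly_scores) - sum(anomaly_scores)


# Hypothetical readings from three state-change pools on one timestep.
scores = [0.0, 1.0, 0.5]
print(summed_anomaly(scores), inverse_summed_anomaly(scores))
```

Either scalar could then go through whatever scalar encoder the thalamus pool already uses.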
Yes, in both the initial input pool and the state-change pools above it. Whether or not those predictions are used to produce output would be decided by the modulator (our thalamus) in this architecture. These, in conjunction with our thalamus, learn to guide the whole system as to which output(s) should be activated to correct a given state in our system (hunger, pain, fatigue, etc.)
Longer term (think “WestWorld” or Asimov here), that thalamic area is where we’d insert our “Three Rules”.
I also think some component of spiking NNs could help on the macro level as well, where perhaps potential outputs from the cortical pyramids would be accumulated over multiple timesteps… For each state-change category, +2 for every state-change pool that predicted correctly (as an inverse anomaly score), -1 for every pool that predicted incorrectly.
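As a rough sketch of that accumulation idea (not a real spiking model, just the bookkeeping): each timestep reports, per state-change category, whether each pool predicted correctly, and the totals build up over time. The +2/-1 weights are the ones above; the function names, the data layout, and the firing threshold are all hypothetical and only there to show the shape of it.

```python
from collections import defaultdict

CORRECT_REWARD = 2       # +2 for every pool that predicted correctly
INCORRECT_PENALTY = -1   # -1 for every pool that predicted incorrectly


def accumulate_scores(timesteps):
    """Accumulate per-category scores over multiple timesteps.

    `timesteps` is an iterable of dicts mapping a state-change category
    (e.g. "hunger") to a list of booleans, one per state-change pool,
    where True means that pool predicted correctly on that timestep.
    """
    totals = defaultdict(int)
    for step in timesteps:
        for category, outcomes in step.items():
            for correct in outcomes:
                totals[category] += CORRECT_REWARD if correct else INCORRECT_PENALTY
    return dict(totals)


def select_category(totals, threshold):
    """Return the best-scoring category if it reaches `threshold`, else None.

    The threshold is an assumed stand-in for a spike-like firing condition.
    """
    if not totals:
        return None
    best = max(totals, key=totals.get)
    return best if totals[best] >= threshold else None


# Two hypothetical timesteps with made-up pool outcomes.
steps = [
    {"hunger": [True, True, False], "fatigue": [False]},
    {"hunger": [True], "fatigue": [False, True]},
]
totals = accumulate_scores(steps)
print(totals, select_category(totals, threshold=4))
```

In the full architecture the thalamus would presumably learn when to act on these totals rather than use a fixed threshold; the threshold here just keeps the example small.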
Hopefully I didn’t make things more confusing.