I’d like to hear some perspectives on the following question, which has been bugging me for a while now.
If there is such a thing as a state of mind, then why do we capture only a single state of the mind and expect it to do intelligent things? For example, why do we expect it to generalize its inputs?
It is easy to imagine that an input can be exclusive to other inputs with regard to its place in memory. However, we seem to ignore this and believe we can generalize these inputs by using a massive number of parameters, as in deep learning. Probability theory can help tell the probability of an input or pattern occurring, but it is not representative of a dynamically changing state of mind.
Do you imagine “state” and “context” to be equivalent things, or is “state” something else? I think HTM begins to address this (at a rudimentary level) by introducing the idea of how to learn to represent the same input in many different contexts.
The global workspace is interesting. I was referring to a more abstract one, since I have no idea about the neuroscience of it. I think the GNW is a concrete instance.
In an ANN or HTM, we capture one state of mind, a snapshot, and then tell it to predict things. Can a state of mind become a superset of the previous set of states of mind? I don’t think so, but I can infer that most people indirectly believe the opposite, since in an ANN or HTM we pick a snapshot and use it to achieve our goals in ML.
HTM, though, is a bit different, as it can utilize online learning. There’s a little bit of realism there, IMO.
I imagine state as a snapshot, for example if the brain could be frozen without destroying any of its components (e.g. unfreezing brings it back to the previous state). I can imagine context being a part of the state.
Previous “state of mind”: [A, B]
Current “state of mind”: [A, B, C, …]
If so, then that is how HTM works (assuming a significant part of “state of mind” includes “temporal context” – there are of course many other factors in biology that contribute to state of mind, which are not modeled, like different neurotransmitters, hormones, etc).
For example, if the following sequences were learned:
A, B, C, D
X, B, C, Y
Not only are the representations for B different (“B after A” vs “B after X”), but also the representations for C, D, and Y. In other words, the D is not merely “D after C”, but “D after C after B after A”. So each “state” (input in temporal context) represents a superset of all previous “states”.
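To make the “input in temporal context” idea concrete, here is a minimal, hypothetical Python sketch (not the actual TM algorithm, which uses minicolumns and distal dendrites); it simply keys each element’s representation on the full preceding prefix, so “B after A” and “B after X” get different codes, and every later state carries all earlier context:

```python
# Toy illustration only: the real Temporal Memory does this with cellular
# states, not a prefix tuple. It just shows the same input getting a
# distinct representation in each temporal context.

def contextual_states(sequence):
    """Return one 'state' per input, keyed on the entire preceding prefix."""
    states = []
    for i, symbol in enumerate(sequence):
        context = tuple(sequence[:i])        # everything seen before this input
        states.append((symbol, context))     # representation = input + its full context
    return states

print(contextual_states(["A", "B", "C", "D"]))
# [('A', ()), ('B', ('A',)), ('C', ('A', 'B')), ('D', ('A', 'B', 'C'))]
print(contextual_states(["X", "B", "C", "Y"]))
# [('X', ()), ('B', ('X',)), ('C', ('X', 'B')), ('Y', ('X', 'B', 'C'))]
```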
I think you’ve just given a great example of a state of mind in the HTM context. It is crude, but it is the best analogy/instance so far because it can be simulated. The reason I mentioned online learning above is that I also see HTM continuously learning and updating its current state, hence realistic.
By superset I mean the set of all retained or memorable (sorry, lack of terms) states at a certain brain-state iteration. In the illustration provided, it will only remember one set from the superset. It can forget other states which might be important for the problem at hand.
An analogy would be quantum mechanics’ superposition property, where many versions of a particle can exist at the same time. However, we only consider one version of the particle during measurement, by calculating probabilities. We don’t see multiple versions of ourselves; we only see one.
Why not choose a set (an ensemble of states) from this superset and use it to predict or classify things? Optimization can then be applied to this process to get the best set. My first assessment tells me this can be a very processing-intensive operation.
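A minimal sketch of what “choosing a set of this superset” might look like, assuming each retained state is just a callable predictor and the chosen ensemble votes by majority (all names and predictors here are hypothetical; the brute-force search is what makes it processing-intensive):

```python
from collections import Counter
from itertools import combinations

# Hypothetical setup: each retained "state" is modelled as a simple predictor.
retained_states = {
    "s1": lambda x: "cat" if x > 0.5 else "dog",
    "s2": lambda x: "cat" if x > 0.3 else "dog",
    "s3": lambda x: "cat",                        # a degenerate state
}

def ensemble_predict(states, x):
    """Every state in the chosen set gets to 'tell something about the input'."""
    votes = Counter(predict(x) for predict in states)
    return votes.most_common(1)[0][0]

def best_subset(all_states, labelled_data):
    """Brute-force search over subsets of the superset (the costly part)."""
    best, best_score = None, -1
    for r in range(1, len(all_states) + 1):
        for subset in combinations(all_states.values(), r):
            score = sum(ensemble_predict(subset, x) == y for x, y in labelled_data)
            if score > best_score:
                best, best_score = subset, score
    return best

data = [(0.9, "cat"), (0.1, "dog"), (0.4, "cat")]
print(len(best_subset(retained_states, data)))   # size of the winning ensemble
```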
Do you imagine these different contexts being active simultaneously? Or would it be that the multiple contexts are biased but only one active at a time?
Also, over what time scale are you thinking for these multiple possibilities coexisting? Milliseconds, seconds, minutes, hours, etc?
These are really great questions, and I think of them as well.
Yes, and they all get the chance to tell something about the input.
I think multiple contexts will be biased if there is such a thing as a state of mind. By biased I mean they can exist at the same time. By these contexts I mean something like the set of memorable states I’ve described above.
This is probably the most critical and interesting question. I do not have an answer, but I imagine the quicker it is, the higher the chance it might “accidentally” recognize inputs. I think the time-scale is a parameter that can be used in the optimization/search process.
Another way I can think of this state of mind is that states can co-exist with other states; the co-existence is the set of “memorable” states. These co-existing states then oscillate between each other in a way that is highly dependent on the input dynamics (e.g. sequence, time-scale, set properties).
In the case of HTM (e.g. the SP), allow me to model it as a deterministic automaton. It oscillates over a set of states (let’s call this the state of mind, or SoM), where the SoM only contains one state. We then use this final SoM to solve the problem at hand. The time-scale is ignored, and we use probability theory to approximate a function that may tell which SoM will activate when an input is ingested.
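If I read the automaton framing correctly, a sketch might look like this (a plain deterministic transition table, not actual SP code; the single state inside the SoM stands in for the synapse configuration at iteration I):

```python
# Hypothetical deterministic-automaton view: at every iteration the
# "state of mind" (SoM) holds exactly one state, and each input moves
# it to exactly one next state.

transitions = {
    ("S0", "A"): "S1",
    ("S1", "B"): "S2",
    ("S2", "C"): "S0",
}

def step(som, input_symbol):
    """SoM contains a single state; the input deterministically replaces it."""
    (state,) = som                                  # exactly one state inside
    return {transitions[(state, input_symbol)]}

som = {"S0"}
for symbol in ["A", "B", "C"]:
    som = step(som, symbol)
print(som)   # {'S0'} -- only this final single-state SoM is used downstream
```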
My first question was essentially: why don’t we use an ensemble of these SoMs, with multiple states within it, to achieve our goals? Why can most ML/AI practitioners live with just probabilities? If there is free will, these SoMs would have their own decisions; yet we just rely heavily on probabilities. Maybe we lack the facilities, or maybe completely simulating these SoMs and allowing them to decide by themselves is a fool’s errand?
How would you define “state” with respect to the SP algorithm? That algorithm seems to be all about learning associations between populations of bits which are frequently active at the same time. It seems a lot easier to apply this concept to the TM algorithm instead, which is all about learning transitions and establishing context. Of course TM works on the output of SP, so perhaps that is the relationship you are imagining?
The state is the set of current synapse values at iteration I. Each state is composed of N bits B.
Yes, if you look at it at the bit level, they are all active at the same time. However, if you look at it at the state level (described above), eventually only one state is actually tested/utilised.
Maybe. The reason I used the SP is that I intuit the TM is really just an SP that “learns” the sequence of states, while the SP is learning the input patterns. I’m not quite comfortable with using the term “learning”, as it is really just an emergent event due to HTM’s structural constraints and algorithmic rules.
I don’t think the TM captures the state-of-mind that I’m describing here.
Ok, and so when you talk about “one state” eventually being utilized, are you referring to the k-winners part of the SP algorithm? If so, then in the case of the SP algorithm, presumably multiple simultaneous states would somehow incorporate the minicolumns which were not winners?
Almost. Multiple simultaneous states in terms of minicolumns would be a set of sets of winning minicolumns. Say, for example:
I=0, 1100 (this is representative of a state)
I=1, 1000
I=2, 1111
I=3, 0001
I=n, xxxx
Simultaneous means that sets such as (1100, 1000, 1111), (1111, 0001, 1100), etc., will participate in the final ML job. This is just a simplification, of course, where the 1’s and 0’s are the columns. The selection of these sets is what needs to be optimized.
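A rough sketch of that “set of sets of winning minicolumns” idea, assuming each iteration’s winners are stored as a bit string and a chosen ensemble of them feeds the final ML job (everything here is hypothetical and not SP internals):

```python
from itertools import combinations

# Winning-minicolumn pattern at each iteration I (1 = winning column).
winning_sets = {0: "1100", 1: "1000", 2: "1111", 3: "0001"}

def ensemble_features(chosen):
    """Union (bitwise OR) of the chosen patterns, to hand to a downstream classifier."""
    width = len(next(iter(chosen)))
    return "".join("1" if any(p[i] == "1" for p in chosen) else "0" for i in range(width))

# Which sets participate is the part to be optimized; here we just
# enumerate a few candidate ensembles of size 3.
for combo in combinations(winning_sets.values(), 3):
    print(combo, "->", ensemble_features(combo))
```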
You may find this relevant to your question: Orbitofrontal signals for two-component choice options comply with indifference curves of Revealed Preference Theory
Alexandre Pastor-Bernier, Arkadiusz Stasiak, & Wolfram Schultz https://www.nature.com/articles/s41467-019-12792-4
Thanks. May I ask why you think the idea is not compelling? Why does ML of today utilize a static model instead of a live one, when in reality inputs are changing and, most importantly, the brain’s synapses are changing? If we take a snapshot S1 of the brain at time t when we perceive an object A, and at time t + k (k > 0) we perceive exactly the same object A and take a snapshot of the brain S2, why does ML of today imply that S2 = S1, or at least close enough? Why is the idea of taking all functional S’s not so compelling? I know there are ensembles out there, but most research focus today is on a single model. There are swarm or evolutionary algorithms that could take advantage of optimizing a combination of these S’s, but it seems this isn’t of interest to many.
I think this is because most of today’s ML algorithms do not implement online learning (in general – some do, but the majority do not). What most algorithms are good at is labeling things, where you want the labels to generalize.
With online learning, the objects learned are continuously updated. S2 may not be the same as S1 (especially if additional features or properties were learned between time t and t+k).
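A toy contrast between a frozen snapshot and an online learner, just to make the S1 vs. S2 point concrete (the “model” here is only a running mean, purely illustrative):

```python
class OnlineMean:
    """Tiny online learner: its internal state changes with every input it sees."""
    def __init__(self):
        self.n, self.mean = 0, 0.0
    def update(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n
    def snapshot(self):
        return self.mean

model = OnlineMean()
model.update(1.0)
s1 = model.snapshot()        # state at time t, right after perceiving object A (= 1.0)

model.update(2.0)            # something else is learned between t and t+k
model.update(1.0)            # the very same object A is perceived again
s2 = model.snapshot()        # state at time t + k

print(s1, s2, s1 == s2)      # 1.0 1.333... False -- S2 != S1 under online learning
```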
The brain is not a static machine.
Start with the general process described in this post:
This describes a continuous process that senses, processes, plans, and acts; what happens in the sensed stream closes this loop. The interaction between the feed-forward and feedback paths enhances maps so that certain ones are strongly activated as anchors for the streams of information. Some of these contents are perception, some are need states or unfolding plans. As hinted at in the last picture, there are multiple loops combining the sensing and planning stages, and feedback from the execution units to cancel self-motion and filter the senses: see the cocktail party effect in hearing for an example.
In these general paths, there are “side chains” of connections that are formed dynamically to shape and engage maps as this signal path is used. Please reflect on this diagram from Dehaene as a meditation focus.
I said this before but it bears repeating: this all happens in a continuous stream and could be considered as a purely parallel process.
In my mind, I think of these links as snapping from node to node like the sparks in a lightning ball, guided by the contents of the maps and the degree of matching in the maps along the path. I sometimes think of animating the Dehaene maps to look something like this …
Thanks for the content. This was actually part of my question. Why, then, is the pursuit of scientists/engineers in ML/AI focused on static/single-state models?