That’s an interesting observation - I would rather have said that DL doesn’t approach a cortical column at all, but it still has discrete modules, and increasing their number leads directly to an increase in performance - very reminiscent of how TBT describes cortical columns.
I am not claiming that, say, attention heads are cortical columns, but rather that they seem to be a very weak approximation of a cortical column (while vanilla NNs are even worse). I would point out, though, that research is slowly improving this approximation through various methods - who knows what comes next?
In the meantime, I think you’d be hard-pressed to prove that NNs are not cortical columns anyway, since both are doing prediction, just learning by different methods.
I’m thinking functionally, as a unit concept (not in code/data implementation terms), and GPT-3’s deep, long window is a single type of chain/path. Single columns can do the same type of task (sequence pattern recognition and prediction), but the magic happens when different columns are wired together so that different “types” of pattern prediction combine (i.e. pattern specialisation encased in sub-networks/columns: not networks specialising in types of dog or car, but a more abstract grouping of pattern specialisations). This type of temporal hierarchical structure can’t be taught in a backpropagation-type manner, basically because of the temporal variance in the hierarchy that backpropagation can’t reverse into. Columns or groups can learn (through sleep), but the learning is a more localised type of specialisation. That’s my conjecture with zero proof, lol. That’s sort of what I’m trying to code.
Think of math tricks, whereby the 11 times table can be done by following a particular pattern. That does not do “math” as such, just basic symbolic pattern manipulation. That abstract pattern does not have to reside with the math group of learning if it correlates more closely with an existing type of pattern and fits more easily elsewhere. This initial pattern is embedded/learnt in a feed-forward manner. Not everything we learn fits together as we would like for “retrospective conscious simplicity”; it’s the patterns that fit together, however abstract they may be, and in the brain I think they are “very” abstract and mixed in ways we can’t split out. Think of the effects and outcomes of various stroke cases and what is then missing: the gaps left by removing a specialisation.
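To make the 11 times table point concrete, here is a minimal sketch of the trick as pure digit-pattern manipulation: each output digit is a digit plus its left neighbour, with a carry. The function name `times_eleven` is just an illustrative choice, and this is only one way of writing the pattern down.

```python
def times_eleven(digits: str) -> str:
    """Multiply a non-negative decimal digit string by 11 using only the
    'add each digit to its left neighbour' pattern (never computing n * 11)."""
    padded = "0" + digits + "0"  # virtual zero on both ends of the number
    out = []
    carry = 0
    # Walk right to left, pairing every digit with the digit to its left.
    for i in range(len(padded) - 1, 0, -1):
        total = int(padded[i]) + int(padded[i - 1]) + carry
        out.append(str(total % 10))
        carry = total // 10
    if carry:
        out.append(str(carry))
    result = "".join(reversed(out)).lstrip("0")
    return result or "0"


print(times_eleven("23"))  # 2, (2+3), 3 -> 253
print(times_eleven("99"))  # carries ripple: 1089
```

Nothing in the loop knows what multiplication is; it only shuffles neighbouring symbols, which is the sense in which the pattern could live outside any “math” grouping.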
A lot of what I am saying is an attempt to create different thought patterns and ideas, to add variance and steer away from the traps of us-and-them arguments or of following one route too far.
Do you have a video or some figures depicting what you’re working on? I’m having trouble parsing exactly how you intend the system to function, based on what’s written.
It sounds like you have some kind of sparse model (directed graph?) running on a CPU that uses some kind of algorithm (Hebbian learning?) to learn how to chain together subsequences of text strings into larger sequences (sequence learning?), and that incorporates working memory constraints in some way (not sure about this part). Does it write new coherent phrases, or parse text into trees, or label the nodes, or something else?
My outside view of neuroscience is that the standard of evidence there is typically experimental, but I thought Numenta was not engaged directly with experimental work? More like, Numenta reads the neuroscientific literature looking for constraints that let them better craft theories of cortical functioning.
If their goal is to discover the algorithms behind sensorimotor model-building in the cortex, then the litmus test to evaluate “Are we on the right track?” is “When we re-implement this algorithm ourselves and give it the right inputs/environments, does it in fact learn sensorimotor models (or capture whatever sub-property the algorithm is intended to)?”
No video, etc. I’m not an academic type (no offence meant to anyone here) and just trying my own thing so a cross between the local village idiot and caveman, lol.
I went down the POS rabbit hole a while back and realised that the furious green sheep actually had a real message, lol.
I don’t have anything working at a level I think is any sort of proof, just conjecture research at this stage, which is this post. The example is learnt feed-forward, in a single pass.
The resulting structure (using the split in a particular way) is what I think is critical, as the resulting hierarchical structure appears to merge episodic-type memory learnt feed-forward with predictive pattern memory capability. This then allows continuous learning to occur, which in my mind is an absolute must for any system. Still a looooong way to go though, and I still know nothing.
What is amazing is that fMRI maps have shown words are mapped across the neocortex, with some localization of similar words. There are also multiple instances of the same word for different meanings. Essentially the entire neocortex is a word map! This means columns can take on many different functions. The question is: where does decoding for words occur, and how is it routed to each region of the neocortex responding to each word?
More information on the cortical localization of words and the grounding to the various sensory parts of the brain:
This is part of what drives my question on “symbols” above.
If you think of phonemes & words as basic learned tokens that are distributed across the cortex, then it leads to the question of whether there is some underlying data structure that is more basic to cortical map/area function.
I think Numenta’s SDR is a reasonable assumption for the data structure for symbols (and more). And many different things can be stored at each location. The question I have is: is only one word stored at each location? Why only one word? Why spread out over the neocortex? What mechanism leads to spreading over the neocortex? And there are so many more questions. It would help if the fMRI study could look at not just word maps but also maps of sequences of 2 words, 3 words, etc., to see whether sequences have a location. What about smells? Colors? Faces? Names? Face/name combinations? So many studies could be performed to gain insights.
Which leads to the re-examination of what exactly is an SDR?
According to the Numenta papers, in a single pyramidal cell, an SDR is a relatively small pattern along a single dendrite. Suggestions have been made that there may be voting to combine the recognition in multiple dendrites in a single cell. Depending on which part of the cell the dendrite belongs to (proximal, distal, apical), it serves different purposes. This is the equivalent of a grandma cell, as the pattern being recognized is very localized. If that cell dies then that pattern recognition dies with it; given what we know about memory, this does not seem likely.
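As a rough illustration of the single-dendrite picture, the sketch below treats one dendritic segment as a set of synapses onto a subsample of a sparse activity pattern, and says the segment “recognizes” the pattern when enough of those synapses see active cells. All of the sizes (2048 cells, 40 active, 20 synapses, threshold 12) and the name `segment_matches` are illustrative assumptions, not values or APIs taken from the Numenta papers.

```python
import random


def segment_matches(segment: set[int], active: set[int], threshold: int) -> bool:
    """A segment recognises its pattern when enough of its synapses see active cells."""
    return len(segment & active) >= threshold


# Illustrative sizes only (assumptions, not figures from the papers).
N_CELLS, N_ACTIVE, N_SYNAPSES, THRESHOLD = 2048, 40, 20, 12

rng = random.Random(0)
pattern = set(rng.sample(range(N_CELLS), N_ACTIVE))     # one sparse activity pattern
segment = set(rng.sample(sorted(pattern), N_SYNAPSES))  # synapses subsample that pattern

# Degrade the input: silence 5 of the cells this segment listens to.
dropped = set(sorted(segment)[:5])
degraded = pattern - dropped

print(segment_matches(segment, pattern, THRESHOLD))   # all 20 synapses active
print(segment_matches(segment, degraded, THRESHOLD))  # 15 of 20 still active
print(segment_matches(segment, set(), THRESHOLD))     # no input at all
```

The segment tolerates a noisy input, but the detector itself is still a single point of failure, which is the “grandma cell” worry: delete `segment` and nothing else in this sketch knows the pattern.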
The Thousand Brain Theory adds voting to support the joining of this very localized recognition into a larger, distributed pattern. TBT is agnostic on the organization of these lateral connections. This is still very localized pattern recognition. Various lines of research place the reach of the lateral connections to about seven mini-columns. This is very far away from a pattern that engages an entire map/region area.
The hex-grid coding that I have described elsewhere extends this voting into a larger regular structure that has a remarkable similarity to the 6-way symmetry that has been observed in actual brain imaging techniques. This agreement may be a fluke but any other explanation will have to fit these same observations in the lab.
These region/area codes are connected to other maps/regions via fiber bundles (the axons of the pyramidal cells of certain parts of the 6-layer structure of mini-columns) in a roughly topological organization. This is not strictly a series of 1:1 relays from sensory maps to the hub regions; the bundles form numerous skips and lateral connections. A given map/region may send out more than one bundle, each of which is some fraction of the total map output. This allows partially parsed sensory elements such as motion and color to be (re)combined and further processed at different levels of the hierarchy.
As I see it, this works somewhat like DL in that there is micro-parsing of the sensory stream where learning and recognition occur at different levels of feature extraction. This matches well with the fMRI data where various words are both distributed across maps/regions and are grounded at one end.
As to where any given word is located? It is distributed and reuses some local part of recognition in different maps - somewhat like letters in a word. The same spatial sub-pattern on a map/region can be part of many different words. The maps/areas can form many different patterns, so they can be part of a large number of words.
While this topology could be imposed on standard DL models, HTM offers an integrated combination of spatial and temporal patterns. The brain processes spatial-temporal patterns so HTM fits the task pretty well. DL as currently implemented is mostly stuck with spatial patterns.
Given that we typically learn what a word actually (supposedly) represents only after we already have an abundance of sensory associative memories, it is no real surprise that, once we formally recognise (associate) what a word represents to us, the associations “appear” dispersed across the cortex, or that the word appears to be stored over a broad area. Words typically mean a multitude of things to everyone and inherently (uncontrollably) trigger all sorts of memory associations, especially based on their temporal proximity (priming effect) to other words/sensory events, consciously or not.
I’m therefore not convinced that inputs are stored as dispersed as fMRI correlation scans suggest, especially when you watch the sequential pattern of activity before and after word triggers.
What I’m still puzzled about (beyond zebra finches) is how we would store a sequence like a sentence from a book. If “words” are stored in a manner that is dispersed beyond the 7-minicolumn distance, how are they sequenced/joined AND how do they sustain connectivity coherence at the memory performance of a savant, given that this is biological reality?
I’m still trying to wrap my head around the hex-grid resonance. I read the grids as parallel active/sense patterns and not as sequential chains, or was fig X in one of your posts, with 32-512 nodes, the pattern/sequence of a grid activation? I sort of get the grids, but I’m more likely 100% wrong.
The whole point is: for a rat it just doesn’t matter: it’s not about ‘specific’ it’s all about G for general.
The best game-playing machines, like MuZero, require a game with a very precisely specified set of rules, and after a million trials the machine is getting good at it, but only for that game. Change the rules a little and it’s another million trials.
A quick web search will turn up plenty of things rats can be trained to do with just a few trials, requiring sophisticated cross-domain skills like number, colour, category and so on. And knowledge of one test (say pressing a lever) translates readily to others.
No, the rat is way out in front, at least for now.
I didn’t mean they’re neuroscientists, just that their theories are neuroscientific I guess. They give back to neuroscience, e.g. their papers have been cited by neuroscience papers a lot. Neuroscientists do modelling and create theories too.
Neuroscience experiments are tricky to interpret, so mimicking the brain requires dealing with that. Even if mechanisms work, the brain might not use them. It’s hard to get things right at a more conceptual level, too.
The main hex-grid post on this is already a wall of text. I did mention the limits of scope at the start of the post but as was mentioned, it assumed that you already were well versed in HTM to start with.
The hex-grid concept only applies to a limited part of the HTM/TBT theory - it extends the TBT part of HTM to describe the organization of the lateral connections that are described in TBT, and builds to describing a higher-level organization that emerges. This only applies to the pattern recognition in the L2/3 layers.
It does not offer anything to the temporal portion of HTM theory which is more properly located in L5/6 layers. The hex-grid theory can be thought of as adding sequence tags - a unique pattern that is associated with a sequence. There are some important implications that are related to temporal pooling but this is much too involved to bring in here; it deserves its own wall of text.
On to the temporal sequences that you are mentioning.
Diving a little into sequence and HTM - HTM as described by Numenta only predicts ONE state change. It can use the context of the prior sequence to improve the accuracy of that prediction. It can predict several possible outcomes at the same time. If any of those predicted states does occur, the column does not burst. Bursting occurs when a transition (a step of a sequence) was not predicted. This new, unexpected transition is learned. This points to one of the most valuable features of HTM - surprise generating a signal that can drive attention.
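The predict-one-step, burst-on-surprise behavior can be sketched with a toy transition table. To be clear, this is not Numenta’s temporal memory algorithm: there are no cells, columns or SDRs here, the context is assumed to be just the previous two symbols, and the names `ToySequenceMemory` and `run` are made up for illustration.

```python
from collections import defaultdict


class ToySequenceMemory:
    """Toy stand-in for one-step prediction with 'bursting' on surprise."""

    def __init__(self):
        # (previous symbol, current symbol) -> set of symbols seen next
        self.transitions = defaultdict(set)
        self.reset()

    def reset(self):
        """Forget the current context (start of a new sequence)."""
        self.context = None
        self.predicted = set()

    def step(self, symbol):
        """Feed one symbol; return True if it was NOT predicted (a burst)."""
        burst = symbol not in self.predicted
        if burst and self.context is not None:
            # Learn the unexpected transition so it is predicted next time.
            self.transitions[self.context].add(symbol)
        prev_symbol = self.context[1] if self.context else None
        self.context = (prev_symbol, symbol)
        # Several outcomes can be predicted at once for the same context.
        self.predicted = set(self.transitions.get(self.context, ()))
        return burst


def run(memory, sequence):
    """Present a whole sequence from a fresh context; list the bursts."""
    memory.reset()
    return [memory.step(s) for s in sequence]


memory = ToySequenceMemory()
print(run(memory, "ABCD"))  # first exposure: every step is a surprise
print(run(memory, "ABCD"))  # second exposure: only the unpredicted start bursts
print(run(memory, "ABCE"))  # the novel last step bursts and is learned
```

After the third call, both `D` and `E` are predicted following `A B C`, so neither ending bursts again; the burst flag is the “surprise” signal that something downstream could use to drive attention.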
So where does sequence generation come from?
I will make what is to many a very bold statement: cortex is passive! It responds to input signals but does not generate commands by itself. Cortex is driven by subcortical structures and responds to these drive signals. Some drives are by the senses as relayed through the thalamus, hypothalamic nuclei, limbic system, and interaction with the cerebellum. The response to these drive signals can be very expansive and propagate through most of the hierarchy once initiated.
Patterns, such as speech, are elaborated from the low-level drives from the subcortex through a sort of hierarchy leading to the motor drivers along the Lateral sulcus.
Speech, a learned motor behavior, has semantic content located at the top of the hierarchy, with syntax/sequence located primarily in the frontal lobe. So in this case, the sequence generator interacts with the object store to put semantic content into the sequence. Defects in either area have highly stereotyped speech patterns.
My dad had fluent aphasia and could make all kinds of fluent but meaningless speech sounds. Real words - strung together in what sounded like real sentences, but totally devoid of any real meaning. So the sequence generator was working but it was not being fed useful objects. The worst part was that he knew what he wanted to say, and could hear and understand that what he was saying was not what he wanted to express. I can only imagine how horrible that must have felt.
The named areas are Broca’s and Wernicke’s areas.
The patterns that are learned in the frontal lobe are transition/sequence fragments. Interaction with the cerebellum strings these patterns together into proper sequences. Projections from the frontal lobe through large fiber tracts introduce sequences into the rest of the brain.
Why do members here think DL-based methods can’t achieve AGI?
To put the HTM/TBT theory into context, you have to see where it fits into a much larger structure that is the entire brain. This is the most complicated structure that I know of, with hundreds of parts whose interactions scale from microseconds to hundreds of milliseconds temporally, and from single-cell sub-components to vast system interactions spatially.
It is my personal belief that it will be necessary to understand most of this to build a functioning AI. HTM and DL are only parts and while they are very useful and powerful parts, they are not sufficient to lead to AGI.
Things like the elusive ‘G’ factor are emergent properties that will not be easily abstracted. To put it as simply as possible: what part of the drum makes the noise?
Nope, I think I know what I am talking about. The hex appears in the real world during a prediction of being somewhere wrt some anchor.
But those cells are not ordered like that in the brain; all they do is predict a pattern of nearby neurons (probably listening to L4) given a change coming from outside (probably motor neurons entering L5 or something, I dunno). They are all predictions of nearby neurons.
There is no symbol there. Just a couple of neurons decided it would be in their best interest to encode a metric space in which a certain amount of change comes back to a previous state (like the Nokia Snake game, where going off the left side of the screen brings you back in from the right side), for the purpose of keeping an equal amount of activation (around 2%).
That happens if you can find a symmetry - translational, rotational, color, mirror, a tour around your arm (manifold), etc. - just something to integrate over.
Then those grid cells are used as a sketchpad: some vector-like, path-like feature uses their predictions to hold a whole bunch of them, like an abstract trajectory.
Again, no symbols.
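The wrap-around metric space from the Snake analogy above can be sketched in one dimension as a ring of cells with a small active “bump” that shifts with movement. This is only a toy under stated assumptions: a 1-D ring, 100 cells and a bump of 2 (chosen to give the 2% figure), and the hypothetical helper names `move` and `active_cells`.

```python
def move(position: int, delta: int, n_cells: int) -> int:
    """Path-integrate on a ring: positions wrap like the Snake screen edge."""
    return (position + delta) % n_cells


def active_cells(position: int, n_cells: int, bump: int) -> frozenset[int]:
    """The small block of cells that is active at a given ring position."""
    return frozenset((position + k) % n_cells for k in range(bump))


N_CELLS, BUMP = 100, 2  # 2 of 100 cells active -> 2% (toy numbers)

pos = 97
for _ in range(5):
    print(pos, sorted(active_cells(pos, N_CELLS, BUMP)))
    pos = move(pos, 1, N_CELLS)  # walks 97, 98, 99, then wraps to 0, 1
```

Moving by a full period (`move(p, N_CELLS, N_CELLS)`) returns to the same active set, and the number of active cells never changes, so the code only ever carries a position modulo the ring size and nothing symbol-like.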
Apologies for that, I can go a little round the houses and may get across way more in some ways and way less in others == dislocated confusion.
TBT to me is the concept of columns structured in a hierarchical form, with those columns recognising different dislocated temporal sequence fragments of sensory input streams dispersed among many columns. TBT does not have any coherent explanation of how one column can work with another column on the other side of the cortex, and seems to be struggling to get out of a single column.
I got that the grids were local(ish), and they do make a lot of sense. I was wondering how any resonant-type behaviour fits in with ART, and whether a resonant search/path (attention) via the thalamic loop is plausible, allowing chain activation by inhibitory decay rather than just a whole parallel pattern resonance. The zebra finch research and probe data is where and why I’m thinking this (circa 10 ms decay between fragments), but I do lack a few decades of prior research experience (tip of the hat and bow in respect). I have not seen the in vivo patterning data you previously mentioned which correlates with the grids, and I wondered about the measurement timescale/resolution (is chaining plausible if they are not measuring propagation timing?).
I understand SDR from a programming/data/pattern perspective, and I agree 100% with TBT in principle. The HTM “general concept” is OK, as it’s vague beyond being defined as hierarchical, but it seems to lack allowances for hierarchy path recursion, which I think may be critical. I’m uncomfortable with the bit patterning being used in the way it is, as it feels like DL-type over-application in some parts. I still lack an understanding of a massive section of the brain, no matter how many papers, scans and brain dissections I look at.
I’m still trying to settle on an interpretation of the layering, as the relevance of connection source/destination combinations leaves aspects of synchronicity untouched. I might be looking at the layers from a different perspective/theory, so I don’t really want to mind-meld into the existing way of thought just yet; I’m keeping a bit of an open mind at the moment.
Also, when looking at that really good video that was done on grids and the pattern behaviour, to me it looked like a phased array antenna in reverse (resonant focal point == source of signal). This reverse focal point (different timing delays from different sensory paths) would then sort of self-select a random focal point to become the grid center that correlates with all of the activated sensory input, but wouldn’t this be singular per mix of sensory inputs and compete with some of the SDR principles?
I had not really considered it as passive in the way you are thinking, but I don’t think it can be fully passive, rather just very heavily inhibitory for signal propagation. I think “conscious” (whatever that means) thought consists of recursive activation loops/patterns in the cortex.
I have spoken with one stroke victim whom I knew before the stroke, and tried to backtrack the areas and parts that were missing to guide the conversation, but you start to realise that it’s not a coherent set of gaps when parts of the speech path are out. To me it was really interesting to talk with him for a couple of hours, and he reciprocated just because someone sort of understood him and was trying. I guess no amount of reading papers can convey some aspects and lessons of life.
Nope, you don’t.
You are confusing Moser Grid Cells that describe a particular stimulus-response property of single cells with my proposed hex-grid coding that describes the ensemble behavior of a certain sub-population of cortical columns. What makes this sub-population special is the configuration of the lateral connections that join these cells. Hex-grid coding is agnostic to the input stimulus.
Alright, so a major point has been the lack of hierarchical structure in DL models. As I interpret the “hierarchical” part that HTM presents, the cortical columns are arranged something like this:
[#] [#] [#]
| | | |
[#] [#] [#] [#]
to draw something roughly - each column’s predictions being fed into a higher-up column that somehow takes this information as a prior and generates more complicated predictions…?
Is this the correct picture at least of the basic structural arrangement of cortical columns? Apologies as I don’t understand the complicated papers at all - so a simple explanation of HTM distilled down to a few core concepts would be very enlightening!