I am trying to understand how HTM works and whether it is biologically consistent. What I understood is that yes, it is biologically consistent. As for how it works: we have a given area, such as the auditory area, and input comes into the area through cortical columns, each of which contains minicolumns. The goal of HTM is to predict the future, so if I give the auditory area, for example, the beginning of a word, the output of the area gives the whole word. What I understood about how this structure works is that a single neuron in a given minicolumn receives input from, and sends output to, every neuron in all minicolumns of the given cortical column. So, for example, say a combination of minicolumns represents the first letter of my word; then one neuron in each of those minicolumns will turn on, and this activates another group of minicolumns, again by activating one neuron in each, to give the second letter of the word. What I don’t understand is what cortical columns stand for. If all of them share the whole input from the auditory receptors and temporal prediction can only occur within a cortical column, how can a pattern spread across different cortical columns be predicted when cortical columns do not send information to each other? I mean, maybe my first letter is encoded in one cortical column and the second letter in another cortical column.
There are clusters of cells in the mini-column but each has its own dendrites that radiate from the cell body. As each dendrite snakes through the surrounding area it passes a small subset of the 220 or so mini-columns in the neighborhood of that mini-column.
The collection of mini-columns also contains the rising axons that are projected to this area and are part of what is sampled.
I have posted on the biological details here:
I hope this gives you some insight into what HTM is modeling.
This is an excellent and perceptive question. Because humans like to break problems down and combine them back together in disciplines such as engineering or physics, we are used to seeing hierarchy as something that combines and condenses information as you ascend the levels. This leads to a central command and control node somewhere towards the top of the logical structure. Much of the classical literature assumes that this is what is going on in the brain and presents it as fact without any actual support from the known wiring in the brain.
When we try to apply this model to the brain we can see that there are layers of processing, but the wiring just does not support the concept of the information merging into some central node - it seems to stay mostly in a “parallel” format as it courses from area to area in the brain. I struggled with this for the longest time.
It is hard to grasp, but it seems that recognition is distributed as a cooperative effort using short-range lateral connections through each area of the brain. This allows each individual column to recognize its possible bit of the overall picture and vote with its neighbors on which of many possible larger-scale things it may be part of. All the computations are local but they build to a global picture.
With sequential recognition the bit that is being voted on is also the transition between this current pattern and the next pattern. This adds temporal recognition to the spatial recognition.
In the example you provide - the eyes move around and keep placing a small group of letters in the center of the visual field. There, this macro-column recognizes a letter and the next macro-column recognizes a different letter. They are forming a guess that this is part of a pattern (in this case - a word) that they have learned, so these two columns are voting on a two-letter digraph. The second and third macro-columns are likewise voting on a different two-letter digraph. This process is happening over the entire visual field at the same time. None of the macro-columns know they are part of a particular word or phrase, just the little bit they can see. Jeff Hawkins describes this as looking at the world through a straw. The larger local group of macro-columns rapidly settles on some pattern that we might consider a representation of a word or phrase.
I am saying two letter digraph for this explanation but of course - it is all the surrounding mini-columns at the same time.
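To make the voting idea a bit more concrete, here is a toy sketch in Python. It is only an illustration under my own assumptions, not Numenta's implementation: the function name `vote` and the candidate word sets are made up for the example, and real columns vote with sparse activity patterns rather than word labels. Each column contributes the set of larger patterns its local view is consistent with, and the candidates with the most support across columns win.

```python
from collections import Counter

def vote(column_candidates):
    """Toy lateral voting (hypothetical helper, not an HTM API).

    column_candidates: list of sets, one per column, holding the larger
    patterns that column's local input is consistent with.
    Returns the set of candidates supported by the most columns.
    """
    tally = Counter()
    for candidates in column_candidates:
        tally.update(candidates)  # one vote per candidate per column
    if not tally:
        return set()
    best = max(tally.values())
    return {name for name, count in tally.items() if count == best}

# Three columns, each seeing only a small piece of the input.
columns = [
    {"apple", "ape", "maple"},  # column 1 could be part of any of these
    {"apple", "maple"},         # column 2
    {"apple", "ape"},           # column 3
]
print(vote(columns))  # {'apple'}
```

No single column knows the answer here; "apple" only emerges because it is the one candidate all three local views agree on.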
The transition/time element would chain a sequence of changing input patterns into a stable output constellation that stands for an object or word or phrase. This stable constellation pattern could persist for several eye or hand/finger movements building up to longer word groupings as you learn more patterns.
Keep in mind that each macro-column in maps after the primary sensing areas can center on a different mini-column that was the winner in recognizing the local pattern; in mammalian cortex this macro-column neighborhood is about 220 to 250 mini-columns. This means that the center of a macro-column is not in a fixed place - it depends on the pattern that is sensed and on which mini-column was most certain that it recognized the pattern and won the vote with its neighbors. This is true for all macro-columns, so the output pattern is not fixed to a rigid location or grid. The output is the constellation of macro-columns that won in this competitive/cooperative process. Each global input state would result in a collection of local output patterns of bits. Matt Taylor has described this as a constellation of stars; I think that is an apt description. Every learned input pattern or sequence results in a different stable constellation.
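A toy sketch of how a winner-plus-local-inhibition rule produces a different "constellation" for each input might look like this. Everything here is an assumption made for illustration: the cortex is flattened to a 1-D list, the match scores are invented, and `winning_columns` and `radius` are hypothetical names, not anything from Numenta's code.

```python
def winning_columns(scores, radius):
    """Toy 1-D local inhibition (hypothetical helper).

    scores: how strongly each mini-column matched the current input.
    radius: how far a winner's inhibition reaches on each side.
    A mini-column wins only if it matched something and strictly beats
    every neighbour within `radius`; ties produce no winner.
    """
    winners = []
    for i, score in enumerate(scores):
        lo = max(0, i - radius)
        hi = min(len(scores), i + radius + 1)
        neighbourhood = scores[lo:hi]
        if score > 0 and score == max(neighbourhood) and neighbourhood.count(score) == 1:
            winners.append(i)
    return winners

# Two different inputs give two different sets of winners.
pattern_a = [0, 3, 1, 0, 0, 2, 5, 1, 0, 4]
pattern_b = [2, 0, 0, 4, 1, 0, 0, 3, 0, 0]
print(winning_columns(pattern_a, radius=2))  # [1, 6, 9]
print(winning_columns(pattern_b, radius=2))  # [0, 3, 7]
```

The winners are not on a fixed grid: where they land depends on which mini-column matched best in each neighbourhood, which is the point of the constellation picture.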
Note: there are variations on this plan: the primary sensing areas and output drivers are anchored to the body structures that they are attached to so they have fixed processing structures like the cortical columns documented in V1. Function is defined by connectivity.
This general process is happening at all levels so higher levels are perceiving and voting on the groupings formed by lower levels. Keep in mind that this is not strictly a pipeline as there are huge numbers of fiber tracts crossing up and down these hierarchies and between processing streams. This gives additional things for the local columns to perceive and vote on. Since you are perceiving your entire environment at the same time these perceptions are likely to be different aspects of the thing you are perceiving. Some aspects could be sensor-somatic positions of the body or eyes, and sensations from the skin or retina.
As you go up the hierarchy you learn space/time sequences, then sequences of sequences with mixing from other areas, and so on, until you reach the association areas. The representations in the association areas are the fusion of all the sensory processing streams. Sequences of sequences for the most stable object representation in time. Still in a distributed form.
Numenta is working on explaining how this might work using the Thousand Brains model. There are many posts on this topic in this forum. Here are some examples:
I am personally pursuing a slightly different take on this problem. It is almost the same as the TBT but differs in that I think the information organizes into a unique internal structure I have been calling hex-grids.
As I said, there is considerable overlap in the basic concepts of hex-grids and the Thousand Brains model.
This will probably blow your mind but - how does this information come together to make decisions and initiate actions?
I maintain that the cortex contents are shared with the subcortical structures (the lizard brain inside us all) and that these older parts (in an evolutionary sense) decide on and direct actions through projections to the forebrain. I suppose that if you could say that there was an executive spot in the brain it would turn out to be the thalamic nucleus. I think that modeling this area will end up being the part that finally makes AI react in a way that we consider having “intelligence.”
But where is the temporal dimension? When I was talking about letters I meant spoken letters: I hear someone say the first letter of a word, for example “a…”, and then my brain automatically infers a word like “apple”. What I understood is that in such a case the first letter activates only one macrocolumn, for example. This same macrocolumn gets signals from different areas, which form the context, and the context puts the macrocolumn into a predictive state. Then the signal “a” arrives and activates some neurons in minicolumns, then some neurons in a digraph. OK, but I am still not able to recognize a word.
By the way, I noticed that the macrocolumn is not really necessary, is it? Instead of minicolumns having inputs and outputs to every other minicolumn in the macrocolumn, we can remove the macrocolumn and give each minicolumn input and output fields that do the same thing but are continuous.
Agreed - I like to think of the macro-column as a declaration of the winner of the competition and the spatial domain of the local inhibitory action it triggers. It seems that you are seeing the mechanism clearly.
I think you could say that Numenta feels differently about this. This is their house and I don’t go out of my way to annoy my host.
BTW: I am still working out how to explain the answer to your previous post. I know the answer but am trying to work out how to get it across in a way that makes sense. The answer involves the workings of part of a much larger system and I am trying very hard to avoid a 10 page answer.
It has to do with a small local temporal/spatial match being a puzzle piece that fits into a much larger pattern and how that pattern forms. It does not just snap into being - there is a process.
Thank you very much for all this information. This algorithm is really powerful and is exactly what is needed to make an AI act like a human, so I can’t understand why there are no robots running this algorithm walking in the street yet. What is holding that back?
A mini-column recognizes a little part of a pattern in space, and transitions from one pattern to another in time. It is not just a two-letter digraph side by side but a two-letter digraph one after the other. This is happening in all sensory modalities at the same time. How does this fit into the overall structure of a vast sea of mini-columns to do useful things?
The brain consumes vast amounts of energy compared to the rest of the body. Evolution has built in some optimizations so that the population of cells firing is kept as sparse as possible. In nature there is a lot of the same things going on in your environment most of the time. You need to notice changes (Surprise!) as these are most likely to be something good or bad. And mostly recognize and ignore the familiar things.
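The "one after the other" part can be sketched as a memory of transitions. This is a deliberately simplified, first-order toy (each pattern only remembers what followed it directly); HTM's temporal memory is high-order and uses the cells within each mini-column to carry context, which this sketch does not attempt. The class name `TransitionMemory` and its methods are made up for illustration.

```python
from collections import defaultdict

class TransitionMemory:
    """Toy first-order sequence memory (hypothetical, not the HTM algorithm):
    remembers which pattern has been seen directly after which."""

    def __init__(self):
        self.next_patterns = defaultdict(set)

    def learn(self, sequence):
        # Store every adjacent pair: current pattern -> following pattern.
        for current, following in zip(sequence, sequence[1:]):
            self.next_patterns[current].add(following)

    def predict(self, current):
        # Everything that has ever followed `current`; empty if unseen.
        return set(self.next_patterns.get(current, set()))

memory = TransitionMemory()
memory.learn("apple")
memory.learn("ape")
print(sorted(memory.predict("a")))  # ['p']
print(sorted(memory.predict("p")))  # ['e', 'l', 'p']
print(sorted(memory.predict("z")))  # []
```

Note how "p" predicts three different successors. A first-order memory cannot tell the first "p" of "apple" from the second, and that ambiguity is exactly what the larger context has to resolve.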
So - with this in mind what tasks are being supported by this mini-column? You have to recognize that something has been seen before and somehow group it with a decision of naughty or nice. This should trigger some action - eat it or drink it or run from it or have sex with it. Some action. You have to know when you have NOT seen this before and learn it as rapidly as possible. It would be handy if while you are learning it you learned if it was good or bad - how you felt about the encounter.
In all this our little mini-column is sampling the stream of sensations. All by itself, all it can do is learn a very limited size pattern (about 500 µm of cortex space) and burst if it is not able to match any of the dendrites in any of its cells to the time/space pattern being sensed. Also - if any of the mini-columns within its neighborhood do recognize it, bursting should be suppressed.
Some parts of the column (L2/3, L4) are selective to the stream of information coming from the senses, some parts (L5, L6) are responding to the stream coming from the body's command and control centers. I should point out that the concept of a mini-column extends from the cortex down to the layers of the thalamus. There are important command and control functions there, including gating of sensation and routing of activation.
Focusing on the part related to your question, the incoming stream part - it is like a tuning fork that resonates if the pitch it is tuned to is present. If a collection of mini-columns resonates to the stream, it has seen this time/space pattern before and the axon activation signals coming out of some extent of this area/map are very sparse. This is what goes down to the thalamus and it enters tonic mode. This is picked up by the lower brain structures as “I have seen this before and recognize it locally.” This also acts to spread activation to the other maps/areas connected to this section of thalamus. Part of this activation is the thalamo-cortical resonance which binds this recognition together like a beating heart. This starts the processing in the hierarchy, which goes on to do its own processing on these sensations. This stream of recognition is spread by axon projections from the L2/3 layer.
Also - a winning mini-column activates inhibitory inter-neurons that shut down local mini-columns that don’t recognize the pattern in any dendrite - no bursting here! There is no need to learn this pattern because some nearby mini-column knows this.
But we are still in this little bit of one map. I have gone back to my previous post and added text that describes how maps going up the hierarchy learn progressively more complex patterns. It is still little patches of recognition - but of a larger pattern. They are also ringing like a tuning fork - trying to fit some larger patterns like words or phrases. In my mind I think it looks like the “suggest a word” function when you are typing something on a computer or smartphone. In speech production you are snapping (intents of meanings) to (word sounds) to (templates of sentences) as you go along. Each area is trying to fit its local version of a pattern into the stream in a cooperative process. In the sensing direction the stream of the world provides this guiding template to form the sensed stream. In the production direction the prior produced stream is sensed and provides part of this template.
What if NO dendrite matches the sensed time/space pattern? Then the mini-column(s) signal its surprise by bursting. This loud signal goes to the thalamus and it does NOT do tonic mode - it has a burst mode response all of its own. This acts to gate more of the confusing signal to the cortex for processing. It takes energy to learn something new. This process has been referred to as the Spotlight Of Attention.
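Putting the three cases together (recognized, suppressed by a neighbor, bursting), a toy version of the rule for a single mini-column that is receiving input could look like the sketch below. This is my own simplification for illustration: `activate_column` and its arguments are hypothetical names, and the suppression case follows the description in this post rather than any particular HTM code base.

```python
def activate_column(cells_predicted, neighbour_recognized=False):
    """Toy activation rule for one mini-column that receives input.

    cells_predicted: list of bools, one per cell; True means a dendrite on
        that cell matched the preceding time/space context.
    neighbour_recognized: True if a nearby mini-column already matched.
    Returns (active_cell_indices, state).
    """
    predicted = [i for i, hit in enumerate(cells_predicted) if hit]
    if predicted:
        # Seen before: only the predicted cells fire (sparse output).
        return predicted, "recognized"
    if neighbour_recognized:
        # A neighbour knows this pattern: inhibited, no need to learn it here.
        return [], "suppressed"
    # Nothing matched anywhere nearby: every cell fires (surprise).
    return list(range(len(cells_predicted))), "burst"

print(activate_column([False, True, False, False]))
# ([1], 'recognized')
print(activate_column([False] * 4, neighbour_recognized=True))
# ([], 'suppressed')
print(activate_column([False] * 4))
# ([0, 1, 2, 3], 'burst')
```

The burst is the loud signal: four active cells instead of one, which is the kind of difference the thalamus could pick up on.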
The last little bit of function that I want to point out is memory. There is no one spot for memory - it is everywhere in little spots on dendrites in the cells in the mini-columns. Each area works to remember what it sees - what passes by it in whatever form it takes.
I have posted about all these aspects and how they fit together before.
This is a more detailed bit on memory:
This is a more detailed bit on the thalamus and how to make HTM retrieve stored memories. The memories stay distributed and local but they are retrieved as a whole:
OK, I am back with a new account (I am Predictor).
I finally understood all of that completely, but one thing is still troubling me. Imagine my auditory cortex, which is organized tonotopically, and imagine a sequence where the lower frequency is activated first and the higher frequency is activated second. The minicolumns responding to these frequencies sit at opposite ends of the tonotopic map, in different cortical columns, so the first pattern cannot predict the second because of the distance between the low and high frequencies. The only way around this would be for the hierarchy to continue until a relatively small range of minicolumns takes in the entire initial tonotopic surface (after some spatial pooling). But is there such a thing in a real brain?
As you may know - even though the right and left visual fields are completely separated and travel to the cortex in separate streams - there are fiber connections that join them in highly regular patterns in the early cortex to allow stereopsis.
In the ear much of the processing is done in older structures in the brain stem and presented in the cortex in a more digested form. I do understand the tonotopic topographic map (look here for more on topographic maps in general) but I think that more information is being presented at the same time throughout the auditory cortex. Note the “crazy quilting” in these maps (figure 2) that places different frequencies close to each other throughout the auditory cortex. I think that this is again an example of bringing samples of signals to the small local areas to be sensed and registered for different properties.
This sensorimotor integration part of HTM that includes the common cortical circuit and object recognition is so new, you’re basically seeing it being researched in real time. There is nothing using it yet. I want people to take these concepts and try to build things. I think there’s a huge potential in robotics for HTM, but it is a hard problem. We are trying to build things out as we understand them, and we’re right behind the latest neuroscience. It’s a hard task, but important. That’s why we are so transparent.