Does the HTM model have a solution to the binding problem?



Thank you for checking into this.
Now I will have to find versions without paywalls.
It still gives me some starting points to dig further.


For a newbie: what is the binding problem?



The biggest problem with “the binding problem” is that it is hard to define. Definitions are more philosophical than scientific.

When I said that HTM was moving in the right direction, it is because I can see how objects can be learned in completely unique spaces while being composed of sub-objects, which I believe is related.


As Matt says - there are multiple aspects to the binding problem.

Speaking simply: how do you combine what this cell over here is seeing, hearing, or feeling with other cells to form a larger object? And how does object recognition in different sensory modalities combine to make a “beeping red round thing” and a “quiet green square thing” at two different places at the same time? Each has a different shape, color, and sound, and over the space of a second or two they may occupy the same space in the visual cortex in sequence.

In the real world you are surrounded by huge numbers of things as you move around, and your eyes dart around at the same time. The cortex is bombarded with a flood of bits and pieces of an ever-changing kaleidoscope of sensation. Somehow you both segment these bits into individual objects and combine partial information into discrete objects. All at the same time.

“We” do know that the cells in the EC (entorhinal cortex) signal our location in the environment. It is suspected that we also signal the place of things in the environment in the EC. How does all this “multidimensional thing recognition” get combined and make its way to the EC to be signaled? How does the brain keep track of all these discrete objects at the same time without arrays, lists, or any of the usual tools we use in computer programming?

The only tool that I have worked out to date is the combination of columns into a hexagonal grid pattern to signal the extents of a recognized object. The advantage of this over nearest-neighbor voting is that it extends the reach of column binding, with a spacing of 8 or so columns apart, and suppresses the surrounding cells so that you get a very sparse representation very quickly. This space-covering hexagonal grid system has much the same conceptual advantage as “nodes of Ranvier”, where the geometrical properties enhance the functional speed and span.
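To make the "spacing plus surround suppression" idea concrete, here is a minimal sketch of my own, not the actual model. It assumes a flat 2D sheet of column activations and a hypothetical helper, `select_sparse_winners`, that greedily picks the strongest columns and suppresses everything within a given radius of each winner. It only enforces the minimum spacing between active columns. It does not by itself produce a true hexagonal lattice.

```python
import math
import random


def select_sparse_winners(activation, radius):
    """Greedily pick the strongest columns; each winner suppresses
    every other column closer than `radius` (hypothetical helper)."""
    rows, cols = len(activation), len(activation[0])
    # Visit columns from strongest to weakest activation.
    candidates = sorted(
        ((activation[r][c], r, c) for r in range(rows) for c in range(cols)),
        reverse=True,
    )
    winners = []
    for value, r, c in candidates:
        if value <= 0:
            break  # remaining columns carry no activity at all
        # Keep this column only if no earlier winner suppresses it.
        if all(math.hypot(r - wr, c - wc) >= radius for wr, wc in winners):
            winners.append((r, c))
    return winners


# Toy input: a 32 x 32 sheet of columns with random activation levels.
rng = random.Random(0)
grid = [[rng.random() for _ in range(32)] for _ in range(32)]
winners = select_sparse_winners(grid, radius=8)
print(len(winners), "active columns out of", 32 * 32)
```

With a radius of 8 columns, only a small handful of the 1024 columns stay active, which is the "very sparse representation very quickly" effect described above. The radius value and the greedy ordering are assumptions chosen for illustration.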

I have a fuzzy picture in my mind of the various modality projections into the association areas all overlaying each other, being sampled by the local SDRs in that area, and the hexagonal grid system combining the output from those SDRs sampling different modalities into extended objects.

I am slowly working to model this to see if it really is practical. This is a very large project and is taking a good long time to get the various parts up and running. Since I don’t have anything to show at this time, I am continuing to look at what other people have, to avoid wasting time re-inventing the wheel.


My notes.
The Binding Problem:

I see a similarity at 3.09 minutes into this video.
Semantic Folding:

Thinking out loud.

In both videos, an encoder or the activation of a detector NN is routed to an SDR bit.

There would be a detector NN for the circle, the white field, the blue field, and the various straight
lines. Each detector is mapped to a certain location.

In the second video there would be a detector NN for the words organ, Bach, and others.

In machine learning, regenerating data from the activation of SDR bits is done
with a GAN (generative adversarial network). The GAN can generate a compressed
representation used for memory, and/or also drive a motor movement.

The regenerated compressed data could be in a format something like a
descriptor table.
There would also be sub-descriptors within the main descriptor, for the letters of a word or the sub-features in a face.

The mind can not see the SDR, it is in the dark, but it can see the regenerated data.

Just as we can factor variables out of an equation in algebra, features can be factored out of a descriptor table.

So now everything is a target, from large objects all the way down to small sub
What can be done with all these features?
Add or remove them to make a pattern come into existence?

Or use a linked-list algorithm. NNs work great for a linked-list algorithm, but they must use
source data to make target data. They take input data and generate output data, and
they must be auto-trained. Just select source and target, done.
I think there is a function in the brain that does this.
For example, I think of my car and out comes an image of, or directions to, the nightstand by my bed where the keys are. Or, as in algebra, I think of “x” and get out a six.



I have a hypothesis of how the binding problem is solved:

This hypothesis is based on the two papers:

  • Hawkins, Ahmad, Cui, 2017, “Why Does the Neocortex Have Layers and Columns, A Theory of Learning the 3D Structure of the World”
  • Emilio Kropff and Alessandro Treves, 2008, “The emergence of grid cells: intelligent design or just adaptation?”

The binding problem is solved by the output layer’s (L2/3) spatial pooler. It accomplishes this by accumulating excitatory input over a significant period of time. This slow response to changes in excitement has a stabilizing effect on the mini-columns. It allows a layer 2/3 mini-column to remain active over several sensations of the same object, and it favors mini-columns which remain active for several consecutive sensations over sporadically active mini-columns. These effects are then learned, and the mini-columns then represent a large contiguous area of the input. This solves the binding problem because an area at the top of a cortical hierarchy should receive sensory input from all of the sensors, and hypothetically be able to represent objects as they move between different sensory areas.

The paper [Kropff and Treves, 2008] is the source of the mechanism which slows down the mini-column’s response. The response exponentially approaches the input overlap. The equation for it is:
r(t) = r(t-1) + alpha * (InputOverlap - r(t-1))
Where t is time,
Where InputOverlap is the amount of excitatory input,
Where r(t) is the response, which is used in the competition to activate,
Where alpha controls how fast mini-columns respond to changes in their input overlap.
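
As a rough illustration of the equation above (a sketch, not the code I am testing), the snippet below applies the update step by step. The function names, the alpha value of 0.5, and the toy input sequences are all assumptions made up for the example. It compares a mini-column that gets steady input with one that gets the same peak input only sporadically.

```python
def update_response(r_prev, input_overlap, alpha):
    """One step of r(t) = r(t-1) + alpha * (InputOverlap - r(t-1))."""
    return r_prev + alpha * (input_overlap - r_prev)


def run(overlaps, alpha, r0=0.0):
    """Return the response after each input overlap in the sequence."""
    trace = []
    r = r0
    for overlap in overlaps:
        r = update_response(r, overlap, alpha)
        trace.append(r)
    return trace


# Steady input: the same overlap on five consecutive sensations.
steady = run([10, 10, 10, 10, 10], alpha=0.5)
# Sporadic input: the same peak overlap, but only on every other sensation.
sporadic = run([10, 0, 10, 0, 10], alpha=0.5)

print(steady)    # [5.0, 7.5, 8.75, 9.375, 9.6875]
print(sporadic)  # [5.0, 2.5, 6.25, 3.125, 6.5625]
```

Both columns see an overlap of 10 on the last step, but the steadily driven one ends with the higher response, so it would win a competition based on r(t). A smaller alpha makes the response slower and strengthens this preference for consistently active mini-columns.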

I’m working on testing this hypothesis. I’ll tell you all when it’s a theory.