HTM Hackers' Hangout - Jul 6, 2018


It’s slightly easier, makes it Saturday lunchtime instead of Saturday morning, but Saturdays are generally tied up with parenting duties and sport. Evenings are the perfect time for me, which is the very early hours of the morning in PDT so not realistic. Both of the current timeslots will work a lot better for me in a couple of months when the soccer season ends.



Why is it so important to HTM theory that a cup is recognized everywhere in the brain at once?

What is the driving force to diverge from virtually every other model of perception and push object recognition all the way to columns in V1?
And to virtually every sensory column in the brain. How do the columns in the small of my back learn all the coffee cups and all the other things I can recognize?

If I am strongly right-handed and use that hand for everything, does that mean the columns that are connected to my left hand won’t recognize all the things my right hand has learned?


I think I know what he will say, but I will ask Jeff this tomorrow morning and get back to you.


High-speed data buses in modern computers, for accessing memory, work at gigahertz,
or at many megahertz, and only access a few locations in one read/write cycle.
The brain does not do multiplexing that well.
I would think a brain with a slow data bus would be highly parallelized. Copies of sensory data would be sent to many running programs in the brain.

Then there would have to be a program that selects which of these programs has the highest chance of generating the right

Like a caching / boosting algorithm.


On the plus side: every connection is permanent and direct. The transfer time for any given transfer is as fast as one neuron can signal another.
There is a well-known observation that the brain goes from perception to output in as little as one hundred cell-to-cell communication steps.


A couple of observations which might help with understanding this view of hierarchy ( or it might help me learn where there are still gaps in my own understanding :wink: ).

The first observation is that a single HTM region is by itself a 2-level hierarchy. The representations in the output layer are more abstract than the representations in the input layer. This is the first piece of architecture which allows the lowest level to form abstractions.

Imagine a hierarchy of three regions separated by Spatial Poolers. For the sake of simplicity, I’ll depict each region as two layers, input and output (obviously HTM theory currently has more layers involved in SMI, but these are the important two conceptually for communicating my point).


You can see that the transition from the input layer to the output layer within the same region is actually the logical boundary between hierarchical levels (not the transition from the output layer of one region to the input layer of another region). The SP algorithm is designed to fix sparsity while preserving semantics, not to increase abstraction.
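To make the "fix sparsity while preserving semantics" point concrete, here is a minimal sketch in plain Python. It is not the NuPIC Spatial Pooler: there is no learning, no permanences, and no boosting, and the names `make_pooler` and `pool` as well as all the sizes are my own assumptions. Each column just overlaps a fixed random subset of the input and the top-k columns win.

```python
import random

def make_pooler(n_inputs, n_columns, potential_fraction=0.5, seed=0):
    """Give each column a fixed random subset of input bits (hypothetical setup)."""
    rng = random.Random(seed)
    pool_size = int(n_inputs * potential_fraction)
    return [frozenset(rng.sample(range(n_inputs), pool_size))
            for _ in range(n_columns)]

def pool(pooler, active_inputs, n_winners):
    """Return the n_winners columns with the highest overlap (k-winners-take-all)."""
    active = set(active_inputs)
    # Rank by overlap, descending; ties go to the lower column index.
    ranked = sorted(range(len(pooler)),
                    key=lambda i: (-len(pooler[i] & active), i))
    return set(ranked[:n_winners])

# Build test inputs from one shuffled list so their overlaps are known exactly.
rng = random.Random(1)
bits = list(range(400))
rng.shuffle(bits)
a = set(bits[:40])          # a sparse input
b = set(bits[4:44])         # shares 36 of its 40 bits with a
c = set(bits[100:140])      # shares no bits with a
dense = set(bits[:120])     # three times as many active bits as a

sp = make_pooler(n_inputs=400, n_columns=512)
out = {name: pool(sp, x, n_winners=20)
       for name, x in [("a", a), ("b", b), ("c", c), ("dense", dense)]}

print({name: len(cols) for name, cols in out.items()})   # output size per input
print("a vs b:", len(out["a"] & out["b"]), "a vs c:", len(out["a"] & out["c"]))
```

The output always has exactly 20 active columns no matter how dense the input is, and similar inputs share many more winning columns than unrelated ones. Nothing here makes the representation more abstract, which is the point being made about where the hierarchical boundary sits.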

The second observation is that the representations in each output layer (depending on the temporal pooling algorithm chosen) should be able to integrate more details into fewer consolidated representations the more frequently a particular pattern is encountered. This means, when a new complex concept is encountered, it might initially require hooking into more levels of hierarchy to reach a single stable representation for the overall concept. As the object is more frequently encountered, the abstraction should be able to push further down the hierarchy.

If you apply this to sequence memory and pooling (the same concept would apply to objects, but I’m lazy and it is easier to depict sequences), initially a particular sequence might require three hierarchical levels to form a single abstract representation. As it is encountered more frequently, some of the lower abstractions would start to merge, and may only require two hierarchical levels. Even more frequent encounters might push the abstraction down to the lowest level.


Combining these two points, one can theorize that the lowest levels should be capable of recognizing complex abstractions for familiar things which are encountered frequently.

Numenta turns attention to The Thalamus!

If no solution is found, then the system will fall into a fetal state and position, or
a more experienced system will flee to a safe place to think it over.
Chaos management.
This will take a lot more than a hundred steps to train into the brain.


Even in On Intelligence, Jeff thought the classic view of hierarchy was correct, but after learning about grid cells and realizing how they might be used in the neocortex to model objects in allocentric spaces, we had to rethink the hierarchy to make it work. HTM theory posits that a location signal (via grid cells) exists for each cortical column representing some unique point in space. This assumption broke our classic idea of hierarchy if we continue to apply the Mountcastle model of a cortical column.

If we assume every cortical column is representing some feature at some location in an object’s space, this just doesn’t make sense in the classical view of hierarchy, because each region of the classical hierarchy is identifying different objects and composing them. We don’t think it works like that anymore. This new model of hierarchy explains a lot of things, including object composition by association. Stay tuned for the podcast.

If there was a black box in front of you and I asked you to reach in with your hand and tell me what it was quickly, it would take you about 0.5 seconds to grab the coffee cup, recognize across your hand hierarchy at all levels that it matches coffee cup, and tell me right away.

However, if I asked you to reach in with your foot and tell me what object was inside, it would take you much longer, and you would probably find yourself touching and thinking “what was that feature, a rim? oh, that’s a cup then”. That’s because your foot hierarchy has never learned coffee cups. It knows only the world of feet, like socks and shoes, lint, sand, coins, and the occasional unfortunate wad of gum. That part of your brain has no models of coffee cups unless you teach it (like every other part of your brain).

So how did you recognize it with effort? Think about the hand and foot hierarchies as different structures that eventually converge together at the high levels. Your foot has no idea what a coffee cup feels like, so it passes simple features like rims and curved planes and smoothness up the hierarchy until it can match it across columns that link downward in their hierarchies to different sensory areas!

That is the 2nd essential point here, that cortical columns train each other as they learn. That’s how we think this learning transfer happens at all levels of the hierarchy. Just like this, at lower levels of the hand hierarchy your fingers train each other as they learn things. That is how you can touch something with one finger to learn it and recognize it with another finger, even on another hand. This lateral learning must be happening across cortical columns at all layers of the hierarchy.

To be clear, we don’t know exactly how this works yet. But we have thought about it a lot and are working on solving it.
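As a purely illustrative toy of the foot-and-cup story (and not a claim about the actual mechanism, which as stated is still unknown), the sketch below treats each "column" as a lookup from sensed features to the objects it has learned. The `Column` class, the `recognize` helper, and the feature names are all hypothetical. A column with no matching model passes its features to a shared higher column, and once that column settles on a single object, the lower column learns it.

```python
from collections import defaultdict

class Column:
    """Toy column: remembers which objects each feature has been sensed on."""
    def __init__(self, name):
        self.name = name
        self.objects_by_feature = defaultdict(set)

    def learn(self, obj, features):
        for feature in features:
            self.objects_by_feature[feature].add(obj)

    def candidates(self, features):
        """Objects consistent with every sensed feature (empty set if none)."""
        sets = [self.objects_by_feature.get(f, set()) for f in features]
        return set.intersection(*sets) if sets else set()

def recognize(local, shared, features):
    """Try the local column first; otherwise ask the shared column and learn from it."""
    found = local.candidates(features)
    if not found:
        found = shared.candidates(features)
        if len(found) == 1:
            # The higher column "trains" the lower one (assumed mechanism).
            local.learn(next(iter(found)), features)
    return found

hand, foot, shared = Column("hand"), Column("foot"), Column("shared")
for col in (hand, shared):
    col.learn("coffee cup", ["rim", "curved surface", "handle"])
    col.learn("bowl", ["rim", "curved surface"])
foot.learn("shoe", ["laces", "sole"])

sensed = ["rim", "curved surface", "handle"]
ambiguous = hand.candidates(["rim", "curved surface"])  # cup or bowl
before = foot.candidates(sensed)            # foot has no model of cups yet
result = recognize(foot, shared, sensed)    # resolved higher up
after = foot.candidates(sensed)             # foot now recognizes it directly
print(ambiguous, before, result, after)
```

The second lookup by the foot column succeeds without going up the hierarchy, which is the learning-transfer effect described above. How real columns would do this laterally is exactly the open question.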


Spatial pooling, I think, is happening within each column where sensory input is processed, but I am not sure it is happening to the feedforward input from lower regions.


True, it might not be necessary if representations in the output layer are sufficiently sparse. I was just not making any assumptions about the pooling algorithm used (in HTM research code, for example, there is a “Union Pooler” algorithm which uses 10% sparsity IIRC versus the typical 2%). Also keep in mind that the number of minicolumns in the input layer is likely less than the number of cells in the lower output layer (depending on the configuration of course), so that could be another reason to use a SP.
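A quick back-of-the-envelope check on why a union-style representation ends up denser than the usual 2%: if you OR together k independent random SDRs of sparsity s, the expected sparsity of the union is 1 - (1 - s)^k. The sketch below is only that arithmetic plus a small simulation. It is not the Union Pooler from the research code, and the sizes (2048 bits, 40 active) are assumed typical values rather than anything taken from that code.

```python
import random

def expected_union_sparsity(sparsity, k):
    """Expected fraction of active bits after OR-ing k independent random SDRs."""
    return 1.0 - (1.0 - sparsity) ** k

def random_sdr(n, w, rng):
    """A random SDR as a set of w active indices out of n (assumed sizes)."""
    return set(rng.sample(range(n), w))

rng = random.Random(0)
n, w = 2048, 40            # roughly 2% sparsity
union = set()
for k in range(1, 6):
    union |= random_sdr(n, w, rng)
    print(k, round(len(union) / n, 4), round(expected_union_sparsity(w / n, k), 4))
```

With s = 0.02 and k = 5 the formula gives a bit under 10%, so a union of about five typical SDRs lands in the density range mentioned above. A denser output like that is one situation where re-sparsifying with an SP between regions could make sense.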


So to summarize what I am getting from the various answers:

  1. Columns and hierarchy start out as is usually proposed in most takes on brain theory.
  2. As learning progresses the internal model is proposed to be refined and pushed out toward the sensors. This has the possibility to reduce processing time and reduce the processing load in the association areas.
  3. Some of the process to push the learning down may include sampling relative position. This part is not backed by any proposed mechanism but a firmly held intuition that this could explain many observations.
  4. The position grid mechanism communicated by hex-grid signalling has been identified as a possible lead to working out this unknown mechanism as it has been observed to signal spatial position in the EC.
  5. This same hex-grid signaling has been observed in hubs of the various lobes, putting it closer to the sensory input streams, therefore the spatial signaling may also be available at these points.
  6. This position signal available at the lobe hubs may be pushed out via some non-hex-grid signaling format that is yet unidentified.
    Q: Why this last point?
    A: Because full-on hex-grid signaling has not been reported from the primary sensing areas. It may be there but I have not seen it documented anywhere; I have been looking for this for a long time.

Am I getting this right?



Why is it important to change the size of the sensor array relative to the column size?

Why not change the size of the sensory array and leave columns at the same size?


I don’t want to put words in other people’s mouths but I believe that this is an attempt to move beyond the +/- 8 column reach of the dendrites of a single cell and allow binding of the features of an object over a larger area.

Binding is a central problem in neural network representations. How do we combine all these individual features sensed by individual columns into a larger object?

This is one of the main driving forces that led me to Calvin’s hex-grid coding method as outlined in my “HTM to hex_grid” model. This model allows a unified object representation to be bound over a large area of a map using all local operations without resorting to the magic of the inter-map connections to do representation spreading. If tract bundles diverged, as is sometimes proposed, that would be a plausible method, but study of real brains shows that they maintain loose topography as the tracts connect one map to the next.


What @Bitking said above, but this also matches what we see in biology. Higher regions in the hierarchy are still getting feedforward sensory input. Why? The classic hierarchy model had no good explanation for this. In this model it makes sense that larger columns at higher levels of the hierarchy will have larger receptive fields within the entire sensory array, which gives them the ability to see “the whole picture” while also sampling the details of the picture to confirm that the feedforward input from lower regions is in sync with the object under scrutiny.

And it gives us a way to continue to apply Mountcastle’s big idea, which was that every column is performing essentially the same task. This model of hierarchy in some ways generalizes out the idea of scale from the problem of object representation. I’m not sure if I’m expressing that correctly.


I had you until #4 above. Maybe we are not thinking of grid cells in the same fashion here. At least we never use terms like “hex-grid signaling” at Numenta. We say “grid codes”, which are usually unique SDR-like representations of space (originating from grid cell modules), which can be used as a starting point for object definition given continuing sensory movement within the object space.


I invite you to compare the hex-grid post to observed EC behavior without preconceptions and see if you see the tight correspondence that I do.

I see the information signaled and the coding method as separable.


I’m not sure we have as many preconceptions as you. To accept this hierarchy model, you only need to assume two things:

  1. grid-cell mechanisms can produce virtually limitless unique representations of space as SDRs
  2. transitions between these representations can also be represented in SDR

If you assume this, you pick a unique location in “the universe in your brain” to start defining an object. Then, through movement and sensory perception, build the model of the object in space using only movement vectors, or transitions, from one point to another.

Every cortical column produces unique representations of space. When you choose a random location to start defining a new object, you are doing that in tens or hundreds (thousands?) of cortical columns. Each one chooses a different unique location to start defining the object. They are all different. The only thing that is compared is the end result, the object representation, which is detached from the sensory input/location.

To make this idea work, we needed to scrap the old hierarchy model. When we started rethinking it, this made much more sense and opened up a lot more doors for us.
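Here is a toy sketch of the two assumptions above, using modular arithmetic as a stand-in for grid cell modules. This is a simplification I am assuming for illustration: real grid codes are SDRs, and the periods, the `move`/`learn_object`/`replay` helpers, and the cup features are all made up. Each module tracks a 2D phase with its own period, a location code is the tuple of phases, and a movement just shifts every phase.

```python
import random
from functools import reduce
from math import gcd

PERIODS = (5, 7, 9)   # hypothetical module periods, pairwise coprime

def lcm(a, b):
    return a * b // gcd(a, b)

# Distinct codes per axis is the lcm of the periods; squared for 2D.
unique_locations = reduce(lcm, PERIODS) ** 2

def random_start(rng):
    """A column picks an arbitrary phase in every module as the object's origin."""
    return tuple((rng.randrange(p), rng.randrange(p)) for p in PERIODS)

def move(code, dx, dy):
    """Path integration: shift each module's phase by the movement vector."""
    return tuple(((x + dx) % p, (y + dy) % p) for (x, y), p in zip(code, PERIODS))

def learn_object(start, touches):
    """touches: list of ((dx, dy), feature). Returns {location_code: feature}."""
    model, here = {}, start
    for (dx, dy), feature in touches:
        here = move(here, dx, dy)
        model[here] = feature
    return model

def replay(model, start, movements):
    """Repeat the movements from start and report the feature predicted at each stop."""
    here, seen = start, []
    for dx, dy in movements:
        here = move(here, dx, dy)
        seen.append(model.get(here))
    return seen

cup = [((0, 0), "base"), ((0, 3), "rim"), ((2, -1), "handle")]
movements = [step for step, _ in cup]

rng = random.Random(0)
start_a, start_b = random_start(rng), random_start(rng)   # two columns
model_a, model_b = learn_object(start_a, cup), learn_object(start_b, cup)

seen_a = replay(model_a, start_a, movements)
seen_b = replay(model_b, start_b, movements)
print(unique_locations, seen_a, seen_b)
```

The two columns will almost always store the cup under completely different location codes, yet both predict the same features for the same movements. Only that end result needs to agree between columns, and three small modules already give tens of thousands of distinct locations.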


You are responding to the data being coded. We are in agreement here.

I am chasing after the coding method to represent this information. As I just said - I see these two things as separable.


Yes, I agree with that. But I also think that problem does not involve the hierarchy. I don’t think learning “concepts of space” necessarily requires hierarchy to occur. I haven’t talked to anyone about this, but why would you need hierarchy for grid cell behavior to emerge?


I don’t have a dog in the “grid cell behavior” fight.

I have been reading much of the literature on this as I run into it through Google searches and by checking the references in these papers, and to the best of my knowledge, nobody really knows how the underlying spatial codes are generated. It could be local or it could be hierarchy, but as I said, nobody knows yet.

There are many theories at this time and nobody has conclusively demonstrated that their proposed model is verified by animal studies. I have not committed on this yet.

I do have to point out that there is a vast body of research that documents the connection patterns of the various maps, and there do seem to be regular pathways, if not hierarchies. Likewise, I have not seen any research that refutes the pioneering work of H&W (Hubel and Wiesel) on the simple and complex cells as you move from V1 onwards. The same goes for the auditory pathway.

Many studies have shown that when a spot of activity is generated at some area in the EC it tends to be a patch of hexagonally connected cells. These are thought to be attractor basins but even that is not fully confirmed.

A few studies have remarked on observing this same hexagonal coding in the various hubs of lobes so I am making the bold leap that the same coding method is at work in both places.