The claim is that the topology in HTM might be more about biological constraints and less about functionality. I’ll explain why I increasingly think so, especially after digging deep into SMI.
Let’s first examine a case where topology works: convolutional neural networks. Without going into too much detail, CNNs work because of three key properties (a quick sketch follows the list):
Local features and feature representations are consistent globally (a line is a line no matter where it is)
Local feature groups are meaningful (elements close to each other in the input vector are related)
After each operation, the two previous properties still hold (so you can stack convolutions)
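To make those three properties concrete, here is a minimal NumPy sketch (the edge filter and inputs are made-up toy values, not anything from a real network): the same filter responds to the same feature wherever it appears, its output shifts along with the input, and that output is itself a spatially ordered map that a second convolution can consume.

```python
import numpy as np

def conv1d_valid(x, k):
    """Plain 'valid' 1-D cross-correlation: slide filter k across signal x."""
    n = len(x) - len(k) + 1
    return np.array([np.dot(x[i:i + len(k)], k) for i in range(n)])

edge = np.array([-1.0, 1.0])                          # toy "rising edge" filter

a = np.array([0, 0, 1, 1, 0, 0, 0, 0], dtype=float)   # edge near the start
b = np.array([0, 0, 0, 0, 0, 1, 1, 0], dtype=float)   # same edge, shifted right

ra, rb = conv1d_valid(a, edge), conv1d_valid(b, edge)
print(ra)   # peak at index 1: the filter responds to the feature (property 1)
print(rb)   # same peak value, just shifted with the input (properties 1 and 2)

# Property 3: the output is still a spatially ordered feature map,
# so stacking a second convolution on top of it is still meaningful.
print(conv1d_valid(ra, edge))
```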
Like a CNN, topology in HTM tries to capture the localized information in an SDR, then stacks/connects cortical columns together to form representations of objects. However, in HTM these properties do not hold. Assume a visual pathway where a signal from the retina encodes a picture of a horse. The first problem arises as soon as we reach V1. Since every neuron has its own synapses and permanences, any nicely encoded SDR breaks into different representations: neuron A will represent a tail differently than neuron B, breaking properties 1 and 2 for V2. Thus no information sharing can happen within V2, since every neuron is dealing with a different representation. This happens in every layer of every column, rendering topology useless; there is no localness in the generated SDR.
Note: the Thousand Brains Theory can solve this via movement, but topology still plays no functional role in the processing.
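A toy sketch of the point above, with random projections standing in for independently learned synapses/permanences and a k-winners-take-all standing in for inhibition (this is not real HTM code, just an illustration): two areas see the exact same patch and still emit nearly disjoint SDRs.

```python
import numpy as np

def sp_like(input_bits, weights, k=5):
    """Very rough stand-in for a spatial pooler: overlap + k-winners-take-all."""
    overlap = weights @ input_bits
    winners = np.argsort(overlap)[-k:]          # indices of the k most active cells
    sdr = np.zeros(weights.shape[0], dtype=int)
    sdr[winners] = 1
    return sdr

rng = np.random.default_rng(0)
patch = (rng.random(64) < 0.1).astype(float)    # the same "tail" patch shown to both areas

# Two areas (or two columns) with independently initialized connections.
area_A = rng.random((32, 64))
area_B = rng.random((32, 64))

sdr_A = sp_like(patch, area_A)
sdr_B = sp_like(patch, area_B)

# Same input, but the two output SDRs share almost nothing, so a downstream
# region cannot treat them as "the same feature" based on position alone.
print("overlap between A and B:", int(np.sum(sdr_A & sdr_B)), "of", int(sdr_A.sum()))
```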
If you restrict all processing to one level, then I would agree with you.
I don’t see that this is a necessary restriction.
If the goal is not object recognition but instead feature extraction, then this can be distributed over many levels. The SDRs that form later in the hierarchy can be based on a rich mix of these extracted features.
The difference between “stacking” and “connecting” cortical columns is the difference between hierarchical connections and lateral voting, which informs object pooling. If CCs are all emitting a representation to their neighbors, they can use each other’s reps as contextual input to the internal computation on feedforward input. It helps to think of attractor dynamics. In TBT, this lateral voting is not necessary to build an object model, but it helps a lot in practice.
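This is not the actual TBT voting algorithm, just a minimal illustration of the idea: each column keeps a set of candidate objects consistent with its own feedforward input, and repeatedly intersecting with its neighbors’ candidates plays the role of lateral votes settling toward a shared representation (the object names and evidence below are hypothetical).

```python
# Minimal illustration of lateral voting between cortical columns (CCs).
# Each CC starts with the objects consistent with its own sensed feature;
# repeated intersection with the neighbors' candidate sets acts like lateral
# "votes" pulling every column toward one shared object representation.

def vote(candidates_per_cc, rounds=3):
    ccs = [set(c) for c in candidates_per_cc]
    for _ in range(rounds):
        consensus = set.intersection(*ccs)            # what all columns still agree on
        ccs = [c & consensus if c & consensus else c for c in ccs]
    return ccs

# Hypothetical feature-to-object evidence from three columns touching one object.
cc1 = {"mug", "can", "bowl"}    # sensed a curved surface
cc2 = {"mug", "can"}            # sensed a vertical edge
cc3 = {"mug", "bowl"}           # sensed a rim
print(vote([cc1, cc2, cc3]))    # all three columns converge on {"mug"}
```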
Stacking a hierarchy is another story altogether. I’m still trying to figure it out. I think it involves thalamus and object abstraction, but that’s just me spit-balling on a Monday.
In any case, where does topology matter? It matters a lot in the lower levels of the hierarchy, but as you ascend, the topology gets mangled and mashed so that abstractions can be made (right @Bitking?)
Here’s my explanation for why topology works in HTM
I’ll consider topology to mean either of two things:
there are multiple local inhibition areas in a cortical column that are smaller than the entire column (see the sketch after this list)
there are multiple columns (with or without multiple inhibition areas) that are arranged topologically (more relevant to the Thousand Brains Theory with vision)
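A rough sketch of the first meaning (toy overlap scores, not real SP code): with one global competition the winners can cluster anywhere, while per-area competitions keep a winner in every local neighborhood, so the active bits retain the input’s spatial layout.

```python
import numpy as np

def global_inhibition(overlaps, k):
    """One competition over the whole layer: winners can cluster anywhere."""
    active = np.zeros_like(overlaps, dtype=int)
    active[np.argsort(overlaps)[-k:]] = 1
    return active

def local_inhibition(overlaps, area_size, k_per_area):
    """Separate competitions inside each local inhibition area."""
    active = np.zeros_like(overlaps, dtype=int)
    for start in range(0, len(overlaps), area_size):
        block = slice(start, start + area_size)
        winners = np.argsort(overlaps[block])[-k_per_area:] + start
        active[winners] = 1
    return active

rng = np.random.default_rng(1)
overlaps = rng.random(16)

print(global_inhibition(overlaps, k=4))                        # 4 winners, anywhere
print(local_inhibition(overlaps, area_size=4, k_per_area=1))   # 1 winner per area: topology preserved
```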
Since HTM does not have sliding filters, this property will never be relevant for an HTM system that only looks at static images. It becomes relevant again when considering saccades, which indirectly implement something analogous to sliding filters in CNNs.
This continues to be true.
Since each area has a different set of synapses, global consistency is immediately broken on the first pass through the spatial pooler(s). This would be a problem if you intended to stack a conv layer on top of the HTM layer, because its sliding filters would no longer be meaningful. It is not a problem if you want to stack more HTM layers, however, because the next levels in the hierarchy deal with topology the same way (not with sliding filters, but with multiple independent areas).
CNNs are pretty clever and arguably take advantage of topology to a much greater extent than HTM (at least for static images) by sliding filters across the image (not really biological). That ensures features can be reused anywhere, no matter where they were learned.
HTM systems should have an advantage over CNNs only when movement/saccades are enabled and multiple topologically arranged columns are allowed to vote on each other’s representations (sketched after the points below).
With saccades, a single column’s synapses can be applied to any area of the image (like a sliding filter), and its neighboring columns play the role of parallel voters.
Neighboring columns will always have the same spatial relationship to each other, allowing them to learn the spatial relationships of features (like a successive CNN layer does with the previous layer).
Each column will have unique failure points, having learned independently on slightly different inputs. This means that together they implement a robust model ensemble for more accurate voting.
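A hedged sketch of those three points together (toy weights and a made-up agreement rule, not the actual voting mechanism): a single column’s fixed connections get applied to whichever patch a saccade lands on, while independently initialized neighboring columns act as parallel voters.

```python
import numpy as np

def apply_column(patch, weights, k=3):
    """One column's learned connections applied to whatever patch a saccade lands on."""
    overlap = weights @ patch.ravel()
    sdr = np.zeros(weights.shape[0], dtype=int)
    sdr[np.argsort(overlap)[-k:]] = 1
    return sdr

rng = np.random.default_rng(2)
image = (rng.random((12, 12)) < 0.15).astype(float)

# Three neighboring columns, each with its own independently learned weights.
columns = [rng.random((16, 16)) for _ in range(3)]

# A saccade sequence: the SAME column weights get applied to different 4x4 patches,
# which is the "sliding filter" role that movement plays here.
fixations = [(0, 0), (4, 4), (8, 8)]
for (r, c) in fixations:
    patch = image[r:r + 4, c:c + 4]
    sdrs = [apply_column(patch, w) for w in columns]   # parallel voters on the same patch
    votes = np.sum(sdrs, axis=0) >= 2                  # crude vote: bits 2 of 3 columns agree on
    print(f"fixation {(r, c)}: {int(votes.sum())} bits agreed on by the ensemble")
```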
Hi @marty1885, interesting thoughts. Does this really matter in the HTM context as a coincidence detector at the bit level (input space)? I think the C in CNN is doing more of an extraction/filtering of “features”, which is different from, for example, an SP.
UPDATE - While I agree that consistent representations are important from an ML standpoint, the following is not always true:
Local features and feature representations are consistent globally (a line is a line no matter where it is)
Local feature groups are meaningful (elements close to each other in the input vector are related)
A CNN may not always be translation invariant at every stage. For example, the convolution’s output may not be consistent (e.g. when a line moves in space), but it can be “corrected” by the next operation (max pooling).
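A tiny numeric sketch of that correction (toy 1-D example, not a full CNN): the convolution’s output shifts with the input, but max pooling produces the same pooled map for shifts that stay within a pooling window.

```python
import numpy as np

def conv1d_valid(x, k):
    n = len(x) - len(k) + 1
    return np.array([np.dot(x[i:i + len(k)], k) for i in range(n)])

def max_pool(x, size):
    usable = len(x) - len(x) % size                     # drop the ragged tail
    return np.array([x[i:i + size].max() for i in range(0, usable, size)])

edge = np.array([-1.0, 1.0])
a = np.array([0, 0, 0, 1, 1, 0, 0, 0], dtype=float)     # feature at one position
b = np.array([0, 0, 0, 0, 1, 1, 0, 0], dtype=float)     # same feature, shifted by one

ca, cb = conv1d_valid(a, edge), conv1d_valid(b, edge)
print(ca)                   # [ 0.  0.  1.  0. -1.  0.  0.]  -> raw conv outputs differ
print(cb)                   # [ 0.  0.  0.  1.  0. -1.  0.]
print(max_pool(ca, 2))      # [0. 1. 0.]  -> identical after pooling, as long as
print(max_pool(cb, 2))      # [0. 1. 0.]     the shift stays within one pool window
```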
I don’t think CNN vs HTM is a good comparison, as they are doing different operations, unless of course someone can show that the convolution equation is equivalent to an SP operation. Otherwise it’s not an apples-to-apples comparison.