How do we describe a topology where two or more lower topologies merge? To humans, reality seems seamless, so there must be a clean slate where all the spatial rules of the merging topologies can average out. But perhaps these abstractions don’t need to adhere to any previous rules of topology.
(some context for this discussion can be found in a previous post)
I see this in a certain way that does not seem to be the Numenta official view - objects are formed/recognized late in the hierarchy out of clouds of features extracted in the lower levels.
There - I have come out and said it here in the lion's den of object recognition in every part of the cortex. Now I expect that I will be beaten and banished to the hinterlands.
As the eyes dart around gathering snapshots of small details of the landscape, a rich mix of perceptions trickles up to the temporal lobe to be assembled into something we commonly call an object. If you consider the topology of these snapshots, there are no consistent features that form an object at the lower levels. The eye plays a game of 20 questions as the features are scanned until object recognition is accomplished. All of these perceptions lie one on top of the next, so at every level you have both straight pattern recognition and the transition from one pattern to the next. Some of this is parsed out with lateral voting, some with hierarchical voting.
I agree that the brain is doing the same thing in all sensory areas - every area has a temporally related stream of features that are snippets of some perceived object. At every level there is local recall of whatever has been learned at that level - a filter, if you will - and it is these analyzed features that are processed up the hierarchy, while the higher levels send back context to be used as part of the recognition task. In some cases this is part of the parsed location signal.
At the highest levels we have object and experience registration, but that is also the key to accessing the lower-level perception - like a key in SQL databases - to call up the redness of an object. Yes, the redness resides in the part of the brain that perceives red.
I wouldn’t say we were so far apart. When observing the environment, an object must emerge from patterns in some part of the system first and spread like a fire through the rest. It makes sense to me that this spark could occur in the higher levels where parts are combined, and spread from there. The TBT could still apply.
I try to apply the term “object” in an extremely general way. Perhaps a better word is “interaction”, I’m not sure. The common CC algorithm we’re looking for needs to accommodate both what and where areas of cognition. How “object” merges with “affordance” could be an “interaction” in associative areas, but it all boils down to the same computation in different contexts.
I’d also concede that we already know there are differences in cortical areas, but they are modifications on a common theme. We need the theme first, then when we can understand the local modifications we’ll really start to understand how everything fits together.
OR - the subcortical structures are following some pre-programming that takes the low resolution feed at that level and forces the eye to parse out the features of the visual primitives.
There are very few brain functions that can ignore the interactions with subcortical structures.
This seems to trip up AI researchers at all levels. Just think of the progress that DL could make if they realized how much functionality this adds to the cortex!
Several optical illusions mess with this low level parsing - the recent “two circles” comes to mind.
There is more processing going on in the eye than just brightness detection.
In addition to rods and cones there does seem to be basic motion detection.
I know that there is some lateral voting but I have not read much beyond that.
Does HTM have any ideas about how the different channels from the retina (parvo, magno, konio, etc) feed into the cortex? We know that they’re segregated, and it struck me as possibly useful for object recognition to have it this way. For instance, imagine there are two TM-like layers that each receive input from one of these channels. The one that receives the magnocellular signal (which carries movement and luminance but no color) would receive inputs that are more consistent w.r.t. objects under different lighting conditions, and similar objects of different colors, making the generalization task easier.
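To make the intuition concrete, here is a toy sketch (my own illustration, not an HTM implementation - the vector split and all feature names are assumptions): the input is divided into a color-carrying parvo-like part and a color-blind magno-like part, and the magno-fed layer sees identical input for two differently colored versions of the same shape.

```python
import numpy as np

# Toy "retinal" encoding: the first N slots carry shape/luminance features,
# the next N carry color features (purely illustrative sizes and names).
N = 16

def encode(shape_bits, color_bits):
    v = np.zeros(2 * N)
    v[list(shape_bits)] = 1                # shape / luminance features
    v[[N + b for b in color_bits]] = 1     # color features
    return v

def magno(v):
    return v[:N]   # movement/luminance channel: color stripped out

def parvo(v):
    return v       # color-carrying channel sees everything

red_cup = encode({1, 4, 7}, {0, 2})
blue_cup = encode({1, 4, 7}, {5, 6})  # same shape, different color

# A layer fed only by the magno channel receives the same input for both
# cups, so generalizing over color there is trivial; a parvo-fed layer
# receives different inputs and must learn the equivalence.
same_in_magno = np.array_equal(magno(red_cup), magno(blue_cup))
same_in_parvo = np.array_equal(parvo(red_cup), parvo(blue_cup))
```

The point is only that a channel which discards a nuisance dimension hands its downstream layer an easier invariance problem.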
One of the interesting properties of hex grids is that it is possible to distill topology from semantics. I’ve been working on a demonstration for this (slow progress due to competing interests), and will post a thread about it when it is ready.
This will be easier to explain when I finish the visualizations, but imagine each bit of a source map representing a random grid over a target map. Representations in this source map which share overlapping bits will naturally gravitate toward similar areas on the target map where those random grids have a lot of overlap. Thus you could take a series of encodings which have no topology (or in this case, multiple sources which have separate topologies to be merged), and generate a new topology from the semantics they encode.
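Since the visualizations aren't posted yet, here is a minimal numerical sketch of the idea as I read it (my construction, with one labeled simplification: each source bit's random grid is collapsed to a single random point on the target map, and a representation is placed at the mean position of its active bits). Representations that share many source bits then land near each other on the target map, while unrelated ones land farther apart on average:

```python
import numpy as np

rng = np.random.default_rng(42)

# Each source-map bit gets a random location on a 2D target map.
# (Simplification: one point per bit instead of a whole random grid.)
n_bits = 256
bit_pos = rng.random((n_bits, 2))

def place(active_bits):
    """Map a set of active source bits to a point on the target map."""
    return bit_pos[list(active_bits)].mean(axis=0)

# Representation a; representation b shares 15 of a's 20 bits; c is unrelated.
a = set(rng.choice(n_bits, 20, replace=False))
b = set(list(a)[:15]) | set(rng.choice(n_bits, 5, replace=False))
c = set(rng.choice(n_bits, 20, replace=False))

# Because a and b share most of their bits, their placements share most of
# the terms in the mean, so d_ab is typically much smaller than d_ac.
d_ab = np.linalg.norm(place(a) - place(b))
d_ac = np.linalg.norm(place(a) - place(c))
```

So a stream of encodings with no native topology (or with several incompatible topologies) induces a new topology on the target map purely from which bits the encodings share.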
There’s a lot of neuroscience literature on it. There are blobs of cytochrome oxidase (high metabolic activity) in L2/3 of V1. The spaces above/below the blobs are blob columns, and between them are the interblob columns. The P / M / K streams can connect to each of these in each layer/sub-layer differently. It’s super complicated and confusing, and not fully described yet.
There are similar cytochrome oxidase patterns elsewhere, like the barrels/septa in L4 of the whisker cortex, or the thick / thin stripes / inter-stripes in V2. They all seem related to separate processing streams.
I believe they are at separate levels of the cortico-thalamo-cortical feedforward pathway (hierarchy) in barrel cortex and possibly mouse V1, so I think of them as separate regions like A1 and S1. The nice thing is there’s more information available about their interactions than there is for conventionally separate regions, because V1 is studied a lot and it’s easier to study just one region in detail.
I see this as similar to multi-sensory integration. It’s just combining different types of info from the same sensory organ rather than info from two different sensors. The koniocellular channel has some auditory and somatosensory responses, so it is multiple sensors I guess.
It might not be exactly the same as multi-sensory integration. E.g. for all I know they might share maps and attentional targets.
Maybe this involves temporal mechanisms and learning.
Whiskers don’t sense the world smoothly in the first place. The inputs are mapped the same way as the array of whiskers, but that’s different from a map of the space sensed by the whiskers. They move back and forth like 10 times a second, so without a smooth map, I think the sensory input would change too quickly to comprehend.
There are a couple studies related to making a smooth map of the space swept by the whiskers. They’re confusing and I think they’re too tangential to whatever the actual mechanisms are.
I’m glad you asked this question. Although I don’t have the answer, I have a similar question about error-driven learning in the 3VS paper. How on earth would an error be chemically and electrophysiologically calculated (the difference between the minus and plus phases) and propagated in a consistent and stable manner? It seems to me that this operation, if it is biologically plausible, would be very unstable and chaotic.
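For readers unfamiliar with the plus/minus-phase idea, here is a minimal numerical sketch of a contrastive-Hebbian-style update - the textbook form of that rule, not the 3VS paper's exact formulation, and all sizes and values here are made up. The network runs a "minus" phase (free prediction) and a "plus" phase (output clamped to the target), and the weight change is driven by the difference between the two phases:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-layer network: input -> output (illustrative sizes).
n_in, n_out = 4, 3
W = rng.normal(scale=0.1, size=(n_out, n_in))
lr = 0.05

def activity(x, W):
    return np.tanh(W @ x)

x = rng.normal(size=n_in)
target = np.array([0.5, -0.5, 0.2])

# Minus phase: the network settles on its own prediction.
y_minus = activity(x, W)
# Plus phase: the output is clamped to the desired value.
y_plus = target

# The update uses only locally available pre/post activity from the two
# phases: dW = lr * (y_plus - y_minus) (outer) x
W += lr * np.outer(y_plus - y_minus, x)
```

The locality is what makes it attractive biologically, but it also shows where your question bites: the neuron would have to hold, subtract, and time-separate two activity states, and any jitter in that phase bookkeeping corrupts the error signal.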