Prototype of Stability Mechanism for Viewpoint Invariance


The cortex is able to recognize distinct objects from a continuous stream of sensory input. Existing theory describes how sensory features are processed and associated with objects, but does not explain how the brain determines where one object ends and the next begins. This paper proposes an unsupervised method of learning to recognize objects and their boundaries. The method is implemented, analyzed and tested on artificially generated text, where it appears to work.


As long as we are on the topic:
Spatial relationships between contours impact rapid scene classification


Content-specific activity in frontoparietal and default-mode networks during prior-guided visual perception


The paper “Content-specific activity in frontoparietal and default-mode networks during prior-guided visual perception” is interesting and agrees with my findings. To quote the abstract: “We observed that prior knowledge significantly impacted neural representations in the FPN and DMN, rendering responses to individual visual images more distinct from each other, and more similar to the image-specific prior.” I’m pretty sure that my model can explain this evidence. My explanation is that the brain assumes that any two consecutive sensations could be part of the same object, so it biases their representations towards having more overlap. If the two sensations really are part of the same object, then those overlaps will be reinforced; otherwise they will be slowly unlearned.
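To make the reinforce-or-unlearn idea concrete, here is a toy sketch. It is not the model from the paper: instead of overlapping cell representations it tracks a single "overlap strength" per pair of sensations, and the function name `learn_overlap` and the `increment` and `decay` values are made up for illustration.

```python
from collections import defaultdict


def learn_overlap(stream, increment=0.1, decay=0.02):
    """Toy stand-in for overlap learning between consecutive sensations.

    Returns a dict mapping each unordered pair of sensations to a
    strength in [0, 1]. All names and constants here are assumptions.
    """
    strength = defaultdict(float)
    for prev, curr in zip(stream, stream[1:]):
        if prev == curr:
            continue
        seen = frozenset((prev, curr))
        # Slowly unlearn every other pairing that involves either sensation.
        for pair in list(strength):
            if pair != seen and (prev in pair or curr in pair):
                strength[pair] = max(0.0, strength[pair] - decay)
        # Assume the two consecutive sensations belong to the same object.
        strength[seen] = min(1.0, strength[seen] + increment)
    return dict(strength)


# Two hypothetical objects: one produces sensations a/b, the other x/y.
stream = list("ababab" + "xyxyxy" + "ababab" + "xyxyxy")
strength = learn_overlap(stream)
for pair, value in sorted(strength.items(), key=lambda item: -item[1]):
    print("".join(sorted(pair)), round(value, 2))
```

Pairs that keep occurring together (a/b and x/y) end up with a high strength, while pairs that only meet at the moment one object replaces the other (b/x and a/y) decay back towards zero.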

I don’t really see how the paper “Spatial relationships between contours impact rapid scene classification” relates to this.


I saw the part about scene edges as a feature that naturally emerges from the scene digestion process.

What I see added is the comparison of the translation of a feature boundary, independent of the positions of the spatial junctions.

It looks like you are focused on the boundaries and not the junctions.
This raises the question: do junctions add anything to this work?


The way I see it is that both the lines and the line intersections are sensory features, so they are processed by the input layer spatial pooler in the same manner. Then the output layer spatial pooler sees all of these features and builds something akin to a bag-of-features model of the object. So it makes sense that the junctions aid in recognition. After looking at the example scenes in that paper, it looks like an exercise in seeing how abstract a scene can be, and how many sensory features can be removed, before recognition fails. This is an interesting topic, for which I don’t have any concrete answers.
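A rough sketch of the bag-of-features view, in plain Python rather than an actual spatial pooler. The object library, the feature names, and the function `overlap_scores` are all hypothetical; the only point is that lines and junctions go into the same unordered set and are scored the same way.

```python
def overlap_scores(observed, library):
    """Count how many observed features each stored object shares."""
    return {name: len(observed & features) for name, features in library.items()}


# Hypothetical objects, each stored as an unordered bag of features.
library = {
    "cube": {"line:horizontal", "line:vertical", "line:diagonal",
             "junction:Y", "junction:L", "junction:arrow"},
    "grid": {"line:horizontal", "line:vertical",
             "junction:X", "junction:T"},
}

lines_only = {"line:horizontal", "line:vertical"}
with_junction = lines_only | {"junction:Y"}

print(overlap_scores(lines_only, library))
print(overlap_scores(with_junction, library))
```

With only the two lines, both objects match equally well; adding a single junction breaks the tie in favour of the cube. Removing features from the observed set is then a crude way to ask how sparse a scene can get before the scores stop separating.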

Also, in that paper the participants were shown the images for 53 ms, which isn’t long enough to move an eye around an image. This means that no cortical areas actually crossed a boundary in the image, other than to and from the blank screen shown before and after the image. The boundaries which I speak of are boundaries in time, when an object enters or exits the field of view of a cortical area.


If you know a place on a thing then you also know what the thing is. You could take a grid cell locating system and specialize it to knowing where on an object a selected point is (exactly where you are looking). And that would kind of entrain information about the object.
Location: Map of apple, upper left-hand part.
Neat idea, I didn’t think of that before.
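
The location idea above can be sketched as a reverse lookup, assuming every object has its own map with location codes that no other object uses. The integer codes, the object maps, and the function `build_location_index` are invented for this example.

```python
def build_location_index(object_maps):
    """Invert per-object location maps into: location code -> (object, part)."""
    index = {}
    for obj, locations in object_maps.items():
        for code, part in locations.items():
            if code in index:
                # The idea only works if location codes are object-specific.
                raise ValueError(f"location code {code} is shared between objects")
            index[code] = (obj, part)
    return index


# Hypothetical object-specific location codes.
object_maps = {
    "apple": {17: "upper left-hand part", 42: "lower right-hand part"},
    "mug": {5: "rim", 88: "handle"},
}

location_index = build_location_index(object_maps)
print(location_index[17])
```

Knowing the location code alone is enough to recover both the object and the part being looked at, which is the sense in which the location carries the object's identity with it.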