Can HTM explain this optical illusion?

What strikes me as interesting about this optical illusion is (1) that the interpretation shifts back and forth between circles and rectangles about every 3-6 seconds and (2) that the interpretation tends to be spatially homogeneous; at least, I have a hard time perceiving rectangles in some regions and circles in other regions at the same time.

Observation (1) can likely be attributed to a prior for temporal change in the human brain, which has perhaps evolved to keep brains from getting stuck in addictive, simple states (the Dark Room problem) in which predictive rewards are large because the sensory input is so simple. At the same time, this prior encourages exploration: it allows the brain to escape local optima during inference (and perhaps during thinking in general).
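To make observation (1) a bit more concrete, here is a toy sketch. It is not HTM and not a claim about how the brain implements this; it swaps in a simple fatigue-plus-hysteresis model, since adaptation of the dominant interpretation is a common ingredient in models of bistable perception such as binocular rivalry. The function names and all parameter values (`simulate_switching`, `fatigue_rate`, `recovery_rate`, `margin`) are hypothetical and are not fitted to the 3-6 second alternation.

```python
def simulate_switching(steps, evidence=(1.0, 1.0), fatigue_rate=0.05,
                       recovery_rate=0.02, margin=0.1):
    """Hypothetical toy model of a bistable percept (not an HTM algorithm).

    Interpretation 0 ('circles') and 1 ('rectangles') each receive bottom-up
    support evidence[i]. The currently perceived interpretation accumulates
    fatigue, the suppressed one recovers, and the percept flips once the
    rival's net support exceeds the current one's by `margin` (hysteresis).
    Returns the perceived interpretation at every time step.
    """
    fatigue = [0.0, 0.0]
    current = 0
    history = []
    for _ in range(steps):
        other = 1 - current
        # The dominant interpretation's fatigue saturates toward 1.0 ...
        fatigue[current] += fatigue_rate * (1.0 - fatigue[current])
        # ... while the suppressed interpretation's fatigue decays toward 0.0.
        fatigue[other] -= recovery_rate * fatigue[other]
        support = [evidence[i] - fatigue[i] for i in range(2)]
        if support[other] > support[current] + margin:
            current = other
        history.append(current)
    return history


def dwell_times(history):
    """Lengths of consecutive runs of the same interpretation."""
    if not history:
        return []
    runs = []
    count = 1
    for prev, cur in zip(history, history[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return runs


if __name__ == "__main__":
    # Equally good evidence for both interpretations: the percept alternates.
    print(dwell_times(simulate_switching(500)))
    # No fatigue: the percept stays on its first interpretation forever.
    print(dwell_times(simulate_switching(500, fatigue_rate=0.0)))
```

In this toy, setting `fatigue_rate=0.0` leaves the percept stuck on whichever interpretation came first, which is roughly the "getting stuck" case described above; the fatigue term plays the role of the prior for temporal change.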

Observation (2) seems to suggest that the brain infers two features separately: one feature for a repeating structure (in this case a regular grid), and one for the content of the repeating structure (circle vs. rectangle). Top-down feedback then seems to sharpen the features for the current interpretation across the whole field of view.

I was wondering whether HTM might be able to explain these phenomena or, alternatively, whether HTM could learn them.

A possible explanation for (2) is that the density of cones is lower in peripheral regions than in central ones; it seems plausible that peripheral regions therefore have weaker bottom-up visual signals and rely more on top-down predictions/expectations than on visual input to infer what lies in front of us.

A possible explanation for (1) is that first, the problem solving priority is “what does this aggregate of white, grey and black lines represent?”.
Once a solution is found (say, “circles”) the next question is: “what does the area between the circles represent?”.
The answer to the second question is: rectangles. However, this answer increases the expectation of rectangles and decreases the expectation of circles, and thus we now see only rectangles.
And so on.

You make a very good point that the periphery is more constructive; since it makes up most of the field of view, it is likely that the vast majority of what we see is strongly determined by top-down feedback. But that does not explain why the top-down feedback tends to be the same everywhere.

As for your explanation of (1): neither interpretation leaves any part of the image uninterpreted, so this seems unlikely.

I find it unlikely that “I’m seeing rectangles lying on a sea of weird lines of different widths and colors” can be considered a full interpretation. Unless something completely matches a pattern we have already seen, our brain is always on the lookout for additional meaning.

Let me give you an example of why top-down feedback tends to be the same everywhere.

You are standing in your house. Your eyes are closed. Something brushes your left foot. Your foot alone is unable to tell what touched it. With your eyes still closed, you touch this something with your left hand: it’s hairy and it moves. It’s your dog. Now something brushes your right foot.
You expect it to be your dog, don’t you? Your foot will perceive “my dog is standing against my right foot”.

Why did this happen? Because the “sensory neurons” belonging to your left foot, those belonging to your right foot, and those belonging to your hands all converge on the same areas at some point higher in the hierarchy. This is necessary for extrasensory expectations to modulate sensory perception (if I am in a jungle, I need to be able to know that the sound of cracking wood might be made by a lion, even if I don’t see any fur).
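A toy sketch of the shared-higher-area argument, under loudly stated assumptions: each local patch has weak, mixed bottom-up evidence (+1 = "rectangle", -1 = "circle"), and a single shared higher-level unit holds the average interpretation and feeds it back to every patch, with all patches updated synchronously. This is not HTM and not a model of real cortex; `settle`, `coupling`, and the evidence values are hypothetical.

```python
def settle(local_evidence, coupling, iterations=20):
    """Hypothetical toy: patches share one top-down unit.

    Each patch i has bottom-up evidence local_evidence[i] (positive favours
    'rectangle' = +1, negative favours 'circle' = -1). The shared top-down
    unit is the mean of all patch states; it is added, scaled by `coupling`,
    to every patch's evidence on each synchronous update.
    """
    states = [1 if e >= 0 else -1 for e in local_evidence]
    for _ in range(iterations):
        top_down = sum(states) / len(states)
        states = [1 if e + coupling * top_down >= 0 else -1
                  for e in local_evidence]
    return states


if __name__ == "__main__":
    weak_mixed = [0.3, -0.1, 0.2, -0.2, 0.1]
    print(settle(weak_mixed, coupling=0.0))  # no feedback: patches disagree
    print(settle(weak_mixed, coupling=1.5))  # shared feedback: patches agree
```

With `coupling=0.0` each patch follows its own weak evidence and the field stays mixed; with the shared feedback the whole field settles on one interpretation. A patch whose local evidence is strong enough (e.g. `-2.0` instead of `-0.1`) still keeps its own interpretation, so in this toy the feedback acts as a bias rather than a hard constraint.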

That matches my intuition as well. All low-level feature detectors share the same top of the hierarchy. But this feedback is not very strict. With some meditation I can force myself to briefly see a rectangle at the top left and a circle at the bottom right, so ultimately this aspect is probably not easy to explain because it can be subject to complex emergent behavior and attention.

I’m not sure whether the recognition of spatial repetitions has something to do with object tracking, or whether it simply emerges as a pattern detector from a solution to the universal prediction task.

When I see circles, my mind comes up with a figure/ground interpretation where the foreground consists of circles with a continuous texture of vertical lines and the background consists of a continuous texture of horizontal lines. Interestingly, the figure/ground interpretation is itself bistable: the circles can either be holes or elevations. This set of interpretations explains all of the image, but it is not stable (at least for me and everyone who I have presented this optical illusion to).

You make a good point with object tracking and attention.

Let’s take the example in the image below.

Seeing the faces completely explains the image; and yet, full stability isn’t achieved.
I was therefore wrong in my previous hypothesis.

You are probably onto something with attention: when we focus on the faces we see the faces; when we focus on the chalice we see the chalice.
What makes us shift attention, though? Why can’t we focus indefinitely on an interpretation?

I would go with an evolutionary hypothesis: being able to entertain bistability, to shift to a second interpretation when a first one is available, is one of the hallmarks of creativity. An individual who is creative clearly has an advantage over the one who isn’t.

What is the underlying neurological mechanism?
It would be interesting to set up an experiment in which, somehow, eye gaze is fixed on a point (the faces, for example). Would it be necessary to wait for an involuntary eye movement that lands the focus on the chalice in order to switch interpretation? (If so: are saccades predictors of creativity?)

Moved from #htm-theory:neuroscience into #other-topics:community-lounge.