Sensorimotor Importance to Vision with Precise Timing

This is my first post, so I’d first just like to say thank you very much to Jeff Hawkins and all the researchers in HTM theory for everything you are doing, and especially for being so open with your work - it’s really quite amazing.

I’m interested in vision, how it works, and how we might best create basic artificial vision systems. Specifically, I’ve lately been looking at how the eye’s motion supports visual perception and creating artificial systems based on this - optomechatronics.

I have watched many of the short videos on the sensorimotor hypothesis and have been thinking about how vision relates to it. While I’ve seen vision mentioned in your discussions, you seem to prefer illustrating the ideas with touch, and I can understand why. But it also gets me thinking about the many analogies between vision and touch (and the differences).

Skin has an uneven distribution of sensor density – fingertips have a lot, other areas less so, and we have to explore with our fingertips to get a high-acuity measurement. The retina’s foveal region is like your fingertips, with much higher densities of photoreceptors than the periphery. We explore visually through movement of the eye. The eye has rods and cones, so that’s a difference, but one that doesn’t seem that significant.

What follows is my understanding of how it works.

Saccades are mostly conscious rapid movements (e.g., to words in a sentence or faces in a scene) to re-position your foveal region prior to a fixation.

During fixation, your eye performs a subconscious, relatively slow drift movement so that your photoreceptors can generate output from the change in light flux as the fixation point in the scene is scanned across the fovea. Microsaccades occur to reposition the fovea back onto the fixation point so that another drift can be performed – perhaps with a different velocity vector to transduce a different aspect of the scene. Horizontal drifts transduce mostly vertical edge information, and vertical drifts transduce mostly horizontal edge information, etc. Question: if this is true, is there a feedback mechanism that influences the drift direction (do the neurons ask the eye for a particular type of information)?
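As a toy illustration of why drift direction selects edge orientation, here is a small sketch (my own construction, not from any HTM code) in which photoreceptors respond only to changes in flux. A vertical edge then produces output only under horizontal drift:

```python
# Toy sketch: photoreceptors respond to CHANGE in light flux, so the drift
# direction determines which edge orientations generate output.
# All names and numbers here are illustrative assumptions.
import numpy as np

def drift_responses(image, velocity, steps=5):
    """Shift a 2D luminance image by `velocity` (dy, dx) per step and
    return the total flux change seen by the photoreceptor array at each step."""
    dy, dx = velocity
    responses = []
    prev = image
    for t in range(1, steps + 1):
        shifted = np.roll(np.roll(image, t * dy, axis=0), t * dx, axis=1)
        responses.append(np.abs(shifted - prev).sum())  # change = response
        prev = shifted
    return responses

# A vertical edge: left half dark, right half bright.
img = np.zeros((8, 8))
img[:, 4:] = 1.0

horizontal = sum(drift_responses(img, (0, 1)))  # drift across the edge
vertical = sum(drift_responses(img, (1, 0)))    # drift along the edge
# Horizontal drift generates strong responses; vertical drift produces
# none at all for this idealized edge.
```

In other words, a drift that scans across an edge modulates flux at every photoreceptor it crosses, while a drift parallel to the edge changes nothing, which is why a feedback path choosing the drift direction would effectively choose which orientation channel gets sampled.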

I’ve read that V1 neurons are inhibited during the fast eye motions (saccades and microsaccades), which to me implies feedback. This suppression during those movements also helps separate eye-induced motion from scene-induced motion. This and other functions like recognition (supporting ocular-motion feedback and anticipatory excitation) must all happen at a very low level in the layer hierarchy, which is why I was excited to hear the same type of hypothesis in the videos. There is also the ocular microtremor (OMT) that occurs during motion, somewhere around 80 Hz I think – and I’ve read that this is likely just an artifact of the mechanical control of the eye’s motion and not important to vision itself.

So, I am very interested in learning about HTM, the available implemented models, and how I can apply them with my setup. I can simulate most eye motions precisely and generate simulated spikes (responses) from photoreceptors – I can perhaps discuss this setup at a later date, but it’s not that important to the topic.

Each response from each photoreceptor carries precise timing information. This timing is fairly important, especially when considered in relation to the movement velocity (speed and direction), and even more so when aggregated with precisely timed information from neighboring photoreceptors. Higher-contrast transitions generate responses earlier and more often than lower-contrast transitions, so the relative timing of the responses encodes important information about the scene. I think that with just one short drift motion, a very accurate yet very sparse representation of the scene is captured, due in large part to the precise timing and sensorimotor boosting, aided by the anticipatory response that occurs due to scene familiarity, etc.

I’ll be very interested if/when timing is added to HTM, but until then I have a lot to learn anyway. If anyone has come across or knows of good resources for these topics, I’d really appreciate hearing about them.

If anyone has made it this far – thank you for reading, and I would love to hear your thoughts. Even if they are thoughts like: that’s completely crazy, or that’s completely obvious! I’m not well educated on these matters but have just become very interested. My education is in Systems/Cybernetics; I started at a machine learning company back in the ’80s and have been a systems and software engineer mostly in electro-optics – hence my interest in vision.

Hoping to retire in the next few years and make this my hobby….


@jhawkins Could you explain what precise timing should look like? I also have a few questions.

My first question is how matrix cells encode timing (ramping activity, more/fewer cells activating, or maybe different cells active at each moment?). My second question is whether the goal is to track the time since an input or until an expected input.

I’m also wondering what are the differences in precise timing for sensory input versus precise timing for behavior, if you don’t mind commenting about that.

If you haven’t watched it, Jeff Hawkins talks about precise timing in this video:

I have notes on some of the sources I’ll list in this (messy) google doc:

I don’t know of any sources specifically about precise timing, so you might have to incorporate information from multiple sources and do some hypothesizing. A possible starting point is researching the thalamus, especially matrix cells and higher order thalamus.

For precise timing related to sensory input, I don’t know of any good sources.

If you are interested in precise timing related to behavior, these articles could help:
“Anticipatory activity in the human thalamus is predictive of reaction times” (Nikulin et al.)

“Simultaneous Top-down Modulation of the Primary Somatosensory Cortex and Thalamic Nuclei during Active Tactile Discrimination” (Pais-Vieira et al.) This might seem irrelevant to precise timing, but precise timing for behavior might be a predictive signal because behavior is usually planned tens or hundreds of milliseconds in advance, sometimes much longer.

Presaccadic predictive activity:
“Division of labor in frontal eye field neurons during presaccadic remapping of visual receptive fields” (Shin and Sommer, 2012) (Also mentions saccadic suppression.)

“The time course of perisaccadic receptive field shifts in the lateral intraparietal area of the monkey” (Kusunoki and Goldberg, 2002)

“What the brain stem tells the frontal cortex. I. Oculomotor signals sent from superior colliculus to frontal eye field via mediodorsal thalamus” (Sommer and Wurtz, 2004)

“Neurons in the monkey superior colliculus predict the visual result of impending saccadic eye movements” (Walker, Fitzgibbon, and Goldberg, 1995)

Saccadic suppression:
“Thalamic pathways for active vision” (Wurtz et al., 2011, a review)

Separating behavior-induced visual movement from actual object movement:
Unfortunately, I only know of sources about touch, mostly about the rodent whisking system. I can give you a quick overview of the whisking system, if you want.

“Feedforward motor information enhances somatosensory responses and sharpens angular tuning of rat S1 barrel cortex neurons” (Khateb, Jackie Schiller, and Yitzhak Schiller, 2017)

“Vibrissa Self-Motion and Touch Are Reliably Encoded along the Same Somatosensory Pathway from Brainstem through Thalamus” (Moore et al., 2014)

“A disinhibitory circuit mediates motor integration in the somatosensory cortex” (Lee et al., 2013)

“Active sensation: insights from the rodent vibrissa sensorimotor system” (Kleinfeld, Ahissar, and Diamond, 2006, an opinion article)

“Reducing the Uncertainty: Gating of Peripheral Inputs by Zona Incerta” (Trageser and Keller, 2004)

Some random things I’ve read which might be relevant:
Saccadic suppression occurs partly because saccades produce blurred images. Presaccadic predictive remapping (which I think is the anticipatory excitation you mention) occurs in V1, but weakly, perhaps just because of its small receptive fields.


@Casey - Thanks for all the references! Lots to unpack.

I think the best single paper that I have read that sums up how I think about the sensorimotor aspects of the eye is

“The unsteady eye: an information-processing stage, not a bug” by Michele Rucci and Jonathan D Victor

Summed up in one sentence from the paper
"Thus, ocular drift can be regarded as an operator that transforms space into time."

In my field of electro-optics, people want to hold the pixel steady on the subject, integrate (measure), then read out. The eye’s way of only detecting change is much more efficient – it doesn’t send redundant information (a kind of compressive sensing) – and the scanning aspect supports superresolution, which seems to get us at least 30% more acuity than the geometry would suggest (200 µrad vs 300 µrad).

I think precise timing is important to this whole process, though. I would surmise that without it, it would be more complicated to determine gradient slopes.

From the paper, a drift movement typically spans 5-10 photoreceptors at about 35 Hz. The speed seems like it would be related to the responsiveness of the cones (although is it a bit on the fast side?), but a 5-10 pixel movement seems like a lot unless the photoreceptors are communicating locally to identify features right there in the retina, or doing something equivalent to time-delay-integration (as we do in the scanning-sensor world to enhance SNR). I would need to do a lot of research since I’m not well versed in the neuroscience – but maybe some of you can help?
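For readers outside electro-optics, the time-delay-integration (TDI) idea mentioned above is just that summing N registered exposures of the same scene point grows the signal linearly while uncorrelated noise grows only as √N, so SNR improves by roughly √N. A quick Monte Carlo sketch (illustrative numbers, no claim about the retina itself):

```python
# Sketch of the TDI principle: sum N noisy, registered samples of the same
# scene point; signal adds linearly, uncorrelated Gaussian noise adds in
# quadrature, so SNR improves by roughly sqrt(N). Numbers are illustrative.
import random

def snr_after_tdi(signal, noise_sigma, n_stages, trials=2000):
    """Estimate the SNR (mean / std) of the summed output over many trials."""
    random.seed(0)  # deterministic for the sake of the example
    sums = []
    for _ in range(trials):
        total = 0.0
        for _ in range(n_stages):
            total += signal + random.gauss(0.0, noise_sigma)
        sums.append(total)
    mean = sum(sums) / trials
    var = sum((s - mean) ** 2 for s in sums) / trials
    return mean / (var ** 0.5)

snr1 = snr_after_tdi(signal=1.0, noise_sigma=1.0, n_stages=1)
snr8 = snr_after_tdi(signal=1.0, noise_sigma=1.0, n_stages=8)
# snr8 comes out around sqrt(8) ~ 2.8x snr1.
```

Whether retinal circuitry actually does anything analogous during a 5-10 photoreceptor drift is exactly the open question; the sketch only shows why such a scheme would be worth having.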


In the retina, there are neurons besides rods and cones, so I think they do some sort of feature identification. It seems pretty complex, so you might want to find more theoretical articles, but the Wikipedia article on the retina could be a starting point.

If you have any questions, I’d like to try to answer them. If you haven’t yet, I would learn about sparse distributed representations, because for me that can be a very useful context for understanding neuroscience details.

One of the key discoveries in our sensorimotor inference model is that each patch of sensory data is paired with an allocentric location (relative to the object). This only became obvious when thinking about touch, because with touch each finger literally rests on a different location on the object, and therefore, for a patch of S1 to predict the sensory input, it must know the location on the object. Also, fingers move independently, which tells us that the location signal has to be determined locally for each part of S1. We believe the same principles are at work in vision, say in V1, but these properties are less obvious there. In vision, all patches of the retina move in unison and no patch of the retina actually "rests" on the object. As far as I know, our proposal is novel in the vision world.
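To make the pairing idea concrete, here is a deliberately tiny toy model (my own sketch, not Numenta's implementation): a patch learns (object, location) → feature associations, updates its location locally from its own movement, and predicts by lookup. Every name here is invented for illustration:

```python
# Toy illustration of pairing sensory features with an allocentric location.
# The patch tracks its own location on the object (path integration from its
# own movements), so the same object yields different predictions at
# different locations. Not an HTM implementation - just the core idea.
class PatchModel:
    def __init__(self):
        self.memory = {}        # (object, location) -> sensed feature
        self.location = (0, 0)  # location ON THE OBJECT, tracked locally

    def move(self, delta):
        # Each patch updates its own location from its own motor command,
        # like a finger that moves independently of the others.
        self.location = (self.location[0] + delta[0],
                         self.location[1] + delta[1])

    def learn(self, obj, feature):
        self.memory[(obj, self.location)] = feature

    def predict(self, obj):
        return self.memory.get((obj, self.location))

patch = PatchModel()
patch.learn("cup", "smooth")       # feature sensed at location (0, 0)
patch.move((0, 1))
patch.learn("cup", "handle-edge")  # different feature at location (0, 1)
patch.move((0, -1))                # move back to the first location
patch.predict("cup")               # -> "smooth": location disambiguates
```

The same lookup without the location key could not tell "smooth" from "handle-edge" on the same cup, which is the whole point: prediction of sensory input requires knowing where on the object the patch currently is.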

“Conscious” might not be the best word to use here. I think what you mean is that saccades are not random, whereas drifts and microsaccades might be in random directions. Most scientists would say you are not conscious of your saccades. When you look at something, your eyes saccade several times a second and you are not aware (i.e., conscious) of these movements. The world seems stable and you can’t perceive your eye movements. Our new SMI theory proposes an explanation of how this stability is created. There is a debate as to whether drifts and microsaccades perform some function or are just noise and correction. I don’t know the current status of this debate, but our theory is consistent with either view.

I recommend a recent book titled “Your Brain Is a Time Machine”. In it, the author discusses various mechanisms that have been proposed for timing, and several of them are consistent with, or could be used by, my matrix-cell-to-L1 proposal. My best guess is that some events start a timing cascade, and when the next event occurs, the SDR for the second event learns how much time has expired since the last event. I believe the same timing mechanism is used for both inference and motor behavior.
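One way to read the "timing cascade" idea is as a chain of sequentially active timing cells: the first event launches the chain, and the cell that happens to be active when the second event arrives tags that event's representation with the elapsed interval. The following is a very speculative toy sketch of that reading; the cell count, tick size, and all names are invented for illustration:

```python
# Speculative toy model of a timing cascade: event A starts a chain of
# timing cells that advance one per tick; event B associates with whichever
# cell is active, implicitly storing the A->B interval. Purely illustrative.
class TimingCascade:
    def __init__(self, n_cells=20, tick_ms=25):
        self.n_cells = n_cells
        self.tick_ms = tick_ms
        self.active = None   # index of the currently active timing cell
        self.learned = {}    # (event_a, event_b) -> timing cell index

    def start(self):
        self.active = 0      # the first event launches the cascade

    def tick(self):
        if self.active is not None and self.active < self.n_cells - 1:
            self.active += 1  # cascade advances one cell per tick

    def learn_interval(self, event_a, event_b):
        # The second event's representation binds to the active timing
        # cell, which implicitly encodes the elapsed time.
        self.learned[(event_a, event_b)] = self.active

    def recall_interval_ms(self, event_a, event_b):
        return self.learned[(event_a, event_b)] * self.tick_ms

cascade = TimingCascade()
cascade.start()
for _ in range(4):           # 4 ticks of 25 ms = 100 ms elapse
    cascade.tick()
cascade.learn_interval("tone", "flash")
cascade.recall_interval_ms("tone", "flash")   # -> 100
```

A centrally located cascade also fits the observation above about speeding up or slowing down whole sequences: changing the tick rate rescales every learned interval at once.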

We currently don’t see a need for precise timing for saccades. If you do, please share.


Hi Jeff, where can I find information about this?


Hmmm, sorry about that. I have talked about this idea (matrix cells encoding timing) for years. I even briefly mentioned it in On Intelligence, but I don’t recall ever documenting it elsewhere. I was looking for a mechanism for precise timing of sequences. I figured it had to be centrally located (not local to each patch of cortex) because we are able to speed up and slow down entire sequences, not just individual elements. The matrix cells in the thalamus have the correct anatomy for this, and the function of the matrix cells was/is totally unknown. Other than that I have no further evidence.

Don’t worry, Jeff – thanks a lot for all the information you and the people at Numenta put at our disposal!
Following your approach has been really rewarding.

I certainly can’t say anything definitive, but I think precise timing, or precise relative timing, would be useful in interpreting spatial frequencies and contrast. Not with saccades, but with the slower fixational eye movements (e.g., drift), which seem to me to be the likely important scanning mechanism for stimulating the photoreceptors.

Miniature Eye Movements Enhance Fine Spatial Detail

Contrast sensitivity reveals an oculomotor strategy for temporally encoding space

Here is a very nice lecture that I think expounds a theory of vision (also included in the paper links) that is very compatible with Sensorimotor ideas about perception (applied specifically to vision) - Actual lecture starts about 2.5 minutes into the video.

Bodian Seminar: Michele Rucci


Some top-down musing: if we think of the human body as a single cell, then the skin is our cellular membrane, and all the external senses formed on the surface of the skin. In other words, all the external sensors are variations of touch—they are all different kinds of pressure sensors. Perhaps the cortex is processing pressure differentials—and that is why the cortex is everywhere the same (?).
