Visual receptive fields are characterised by their centre-surround organisation and are typically modelled by Difference-of-Gaussians (DoGs)1. The DoG captures the effect of surround modulation, where the central receptive field can be modulated by simultaneous stimulation of a surrounding area2–5. Although it is well-established that this centre-surround organisation is crucial for extracting spatial information from visual scenes, the underlying law binding the organisation has remained hidden. Indeed, previous studies have reported a wide range of size and gain ratios of the DoG used to model the receptive fields6–9. Here, we present an equation that describes a principle for receptive field organisation, and we demonstrate that functional Magnetic Resonance Imaging (fMRI) population Receptive Field (pRF) maps of human V1 adhere to this principle. We formulate and understand the equation through consideration of the concept of Direct-Current-free (DC-free) filtering from electrical engineering, and we show how this particular type of filtering effectively makes the DoG process frequencies of interest without misallocation of bandwidth to redundant frequencies. Taken together, our results reveal how this organisational principle enables the visual system to adapt its sampling strategy to optimally cover the stimulus-space relevant to the organism, restricted only by Heisenberg’s uncertainty principle that imposes a lower bound on the simultaneous precision in spatial position and frequency. Since surround modulation has been observed in all sensory modalities10, we expect these results will become a corner stone in our understanding of how biological systems in general achieve their high information processing capacity.
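To make the DC-free idea concrete, here is a minimal Python sketch (my own illustration, not code from the paper) of a 1-D Difference-of-Gaussians whose centre and surround enclose equal area, so the kernel integrates to zero and passes no constant (zero-frequency) signal:

```python
import numpy as np

# 1-D DoG built from two unit-area Gaussians; with gain = 1 the surround
# cancels the centre's area, so the filter is DC-free (kernel sums to ~0).
def dog_kernel(x, sigma_c, sigma_s, gain):
    centre   = np.exp(-x**2 / (2 * sigma_c**2)) / (np.sqrt(2 * np.pi) * sigma_c)
    surround = np.exp(-x**2 / (2 * sigma_s**2)) / (np.sqrt(2 * np.pi) * sigma_s)
    return centre - gain * surround

x = np.linspace(-15, 15, 3001)
dx = x[1] - x[0]
k = dog_kernel(x, sigma_c=1.0, sigma_s=3.0, gain=1.0)
print("kernel integral (≈ 0 means DC-free):", k.sum() * dx)
# A constant input therefore produces ~zero output, while structure near the
# centre's spatial scale still gets through the band-pass.
```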
HTM models receptive fields as simply as possible. It would be interesting to see how applying more complex models of SP minicolumn proximal RFs or TM distal RFs might change how the HTM learns. I don’t think Numenta has done any recent research in this area. It is wide open for experimentation.
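One place such an experiment might start (my own sketch, not anything from Numenta's code) would be to give each Spatial Pooler column a centre-surround-weighted proximal pool over a 1-D input instead of a uniform random one:

```python
import numpy as np

# Sample a column's proximal synapses with probability shaped like |DoG|
# around `centre`, instead of drawing them uniformly from the input space.
def dog_potential_pool(centre, n_inputs, sigma_c=2.0, sigma_s=6.0, n_synapses=16,
                       rng=np.random.default_rng(0)):
    x = np.arange(n_inputs)
    weight = np.abs(np.exp(-(x - centre)**2 / (2 * sigma_c**2)) / sigma_c
                    - np.exp(-(x - centre)**2 / (2 * sigma_s**2)) / sigma_s)
    p = weight / weight.sum()
    return rng.choice(n_inputs, size=n_synapses, replace=False, p=p)

print(dog_potential_pool(centre=50, n_inputs=100))
```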
Lateral connections and local inhibition.
I think that these are key to bringing the behavior described in this paper into the models.
I have been saying that these are critical but have not spent much time explaining why. k-winners-take-all covers what Numenta needs for its model but misses these important biological properties.
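To illustrate the difference I mean (a rough toy sketch of my own, not the Spatial Pooler implementation): global k-winners-take-all picks the top k columns across the whole layer, while local inhibition makes each column compete only with its neighbours, which is where the centre-surround-like behaviour could come from:

```python
import numpy as np

def global_kwta(overlaps, k):
    """Keep the k most active columns across the whole layer."""
    winners = np.argsort(overlaps)[-k:]
    active = np.zeros_like(overlaps, dtype=bool)
    active[winners] = True
    return active

def local_inhibition(overlaps, radius, k_local):
    """A column stays active only if it is among the top k within its neighbourhood."""
    n = len(overlaps)
    active = np.zeros(n, dtype=bool)
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        threshold = np.sort(overlaps[lo:hi])[-k_local]
        active[i] = overlaps[i] >= threshold
    return active

overlaps = np.random.rand(64)
print(global_kwta(overlaps, k=4).sum(), local_inhibition(overlaps, radius=8, k_local=2).sum())
```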
It seems like they are saying (in my words) that V1 is set up to extract speed information by bandpass-filtering optical-flow-related retinal motion sensors that produce a frequency-modulated spike train, while at the same time being sensitive to DC change caused by a change in brightness from, say, getting closer or further away, so it can sense a slowly changing gradient. The video should begin where the retinal direction sensor is shown:
At 18:43 into the video, Peter Schiller says something that, in my opinion, the Thousand Brains Theory predicted would be necessary: for column(s) to sense motion and help steer the eyes/straw(s) around in a way that lets them all get a good look at the wider scene each one is seeing:
So this [retina directional circuit] is a fairly simple model of creating direction selectivity by virtue of inhibitory interneurons. So that then was also examined in the cortex and was found that there too inhibitory interneurons play a central role when direction specificity is created in a cortical cell.
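A toy version of the circuit Schiller describes (my own sketch along Barlow–Levick lines; the delay and numbers are made up) shows how an inhibitory interneuron with a lag produces direction selectivity:

```python
import numpy as np

def pulse(t_on, n=10):
    s = np.zeros(n)
    s[t_on] = 1.0
    return s

# Excitation from receptor A is vetoed by inhibition from neighbour B that
# arrives one time step late (the inhibitory interneuron's delay).
def ds_unit(stim_a, stim_b, delay=1):
    inhibition = np.roll(stim_b, delay)
    inhibition[:delay] = 0
    return np.clip(stim_a - inhibition, 0, None).sum()

# Preferred direction: edge crosses A at t=3, then B at t=4 -> veto arrives too late.
print("preferred:", ds_unit(pulse(3), pulse(4)))   # responds
# Null direction: edge crosses B at t=3, then A at t=4 -> delayed veto cancels A.
print("null:     ", ds_unit(pulse(4), pulse(3)))   # silent
```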
From my perspective, Numenta went beyond typical “research in this area” by supplying the visionary thinking that the emerging branch of science, cognitive biology, needed right here, right now to help make sense of all the already existing research.
Speed/Optical flow. Texture. Stereo/disparity. Movement. Edges & Orientation. Color.
There is a lot going on in the early visual processing.
This is not a new area of study.
This book from 1982 is full of analysis of the items I just listed: https://link.springer.com/book/10.1007/978-3-642-68888-1
As far as motion and position go, this paper has a lot to say about that:
I can now see that! And from what I read some animals essentially only have a V1 area.
The wonderful part is that, as with the David Heiserman beta-class machine intelligence I experimented with the most, it does not matter what kind of sensor is included in the RAM address; the system adapts to whatever sensors it has to work with. Then anything works: just connect everything you have straight in and you’re done.
With the number of RAM address bits being limited, it was best to compress/encode sparse data, like edge detection from one eye facet/field to the next, into the fewest bits possible. Normally, beyond 16 or so address bits, most of the RAM goes unused because even that input is sparse. But a memory system that loves the sparsity would not need any of that. Even one cortical column per brain hemisphere should work for something on the scale of a simple virtual critter like mine.
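For what it’s worth, the compression step I mean could be as simple as this hypothetical sketch (the folding scheme is my own placeholder, not Heiserman’s actual encoding): take the indices of the few active edge bits and fold them into a fixed-width RAM address:

```python
# Fold the indices of the active bits of a sparse sensor vector into a small
# fixed-width address.  Collisions are possible; this is only meant to show
# the idea of spending address bits on the sparse content rather than on
# mostly-empty raw sensor lines.
def sparse_to_address(active_bits, address_width=16):
    addr = 0
    for i in sorted(active_bits):
        addr = (addr * 31 + i + 1) & ((1 << address_width) - 1)
    return addr

print(hex(sparse_to_address({3, 97, 512})))   # e.g. 3 active bits out of ~1000 inputs
```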