VISION, by David Marr
Published posthumously, this tome helped define the field of computational neuroscience.
I’ve read through it and I’d like to summarize and implement some of its ideas.
Although it was intimidating before I started, I found it quite enjoyable to read.
The book begins by defining the purpose of the visual system, and Marr relates everything to this purpose: “Vision is the process of discovering from images what is present in the world, and where it is.”
Each section of the book states a problem to be solved and then analyses it at three levels. First Marr looks at the computational theory: what is the goal of the computation, why is it appropriate, and what is the logic of the strategy for achieving it. Then he looks at representation and algorithm: what information needs to be represented and how the inputs are transformed into the outputs. Finally he looks at the implementation: how the process can be realized physically, either by a computer or by the brain.
Inside of the Retina
The first stages of visual processing happen inside the retina. The retina detects light and immediately applies Mexican-hat-shaped filters to it. What is transmitted to the brain is the filtered image, not the raw light intensities. The choice of the Mexican-hat filter is well justified by both theory and biology. This filter responds to variations in the input, but not to areas of constant intensity or to linear gradients. It is also sensitive only to input features of roughly the same size as the filter. There are at least four different sizes of Mexican-hat filter, with sizes spaced by powers of two, so the retina can detect features over a broad range of scales.
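As a rough illustration of these properties, here is a minimal NumPy sketch. It is my own construction, not code from the book: the names mexican_hat_kernel and filter_image, the truncation at 4 sigma, and the base size sigma = 1 pixel are all assumptions. It builds the filter at four sizes, each twice the previous one, and compares the response to a linear gradient with the response to a step edge.

```python
import numpy as np

def mexican_hat_kernel(sigma):
    """Sampled Mexican-hat (negated Laplacian-of-Gaussian) kernel."""
    radius = int(np.ceil(4 * sigma))          # assumption: truncate at 4 sigma
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    s = (x**2 + y**2) / (2.0 * sigma**2)
    kernel = (1.0 - s) * np.exp(-s)
    # Truncation and sampling leave a small nonzero mean; remove it so that
    # regions of constant intensity produce no response.
    return kernel - kernel.mean()

def filter_image(image, kernel):
    """Apply the kernel at every position where it fits entirely inside the image."""
    windows = np.lib.stride_tricks.sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

# Four filter sizes, each twice the previous one (the base size is hypothetical).
sigmas = [1.0, 2.0, 4.0, 8.0]

rows, cols = np.mgrid[0:80, 0:80]
ramp = 10.0 + 0.5 * cols + 0.25 * rows        # constant offset plus linear gradient
edge = (cols >= 40).astype(float)             # vertical step edge

for sigma in sigmas:
    kernel = mexican_hat_kernel(sigma)
    ramp_response = np.abs(filter_image(ramp, kernel)).max()
    edge_response = np.abs(filter_image(edge, kernel)).max()
    print(f"sigma={sigma}: ramp {ramp_response:.2e}, edge {edge_response:.2f}")
```

The ramp response is zero up to floating-point rounding because the kernel sums to zero and is symmetric, while the step edge produces a clear response at every size.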
These transformations preserve almost all of the incoming visual information and it is possible to mostly reconstruct the original image from the filtered outputs. The notable exception is that while the relative differences between pixels are preserved, the absolute magnitude of the image’s light intensity is lost.
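To make the reconstruction claim concrete, here is a small sketch under strong simplifying assumptions of my own: a single filter size, no noise, and wrap-around image borders so that the filter can be inverted with an FFT. The function names are hypothetical. In this idealized setting everything except the mean intensity comes back.

```python
import numpy as np

def circular_mexican_hat(n, sigma):
    """Zero-mean Mexican-hat kernel on an n x n grid with wrap-around distances."""
    d = np.minimum(np.arange(n), n - np.arange(n))   # circular distance from index 0
    s = (d[:, None]**2 + d[None, :]**2) / (2.0 * sigma**2)
    kernel = (1.0 - s) * np.exp(-s)
    return kernel - kernel.mean()

def apply_filter(image, kernel):
    """Circular convolution via the FFT."""
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(kernel)))

def reconstruct(filtered, kernel):
    """Invert the filter at every spatial frequency it did not remove."""
    transfer = np.fft.fft2(kernel)
    usable = np.abs(transfer) > 1e-9          # the zero-frequency (mean) term is gone
    spectrum = np.zeros_like(transfer)
    spectrum[usable] = np.fft.fft2(filtered)[usable] / transfer[usable]
    return np.real(np.fft.ifft2(spectrum))

rng = np.random.default_rng(0)
image = rng.uniform(0.0, 1.0, size=(32, 32))  # stand-in for a real image
kernel = circular_mexican_hat(32, sigma=1.0)

filtered = apply_filter(image, kernel)
restored = reconstruct(filtered, kernel)

# Relative differences survive; the absolute level does not.
print(np.allclose(restored, image - image.mean(), atol=1e-8))
print(np.allclose(apply_filter(image + 5.0, kernel), filtered, atol=1e-8))
```

A real retina is noisy and has finite borders, so this exact inversion is only an idealization, which is why the reconstruction is "mostly" possible in practice.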
To demonstrate these transforms, I will apply them to this test image:
Converted to greyscale, with the Mexican-hat filters applied:
Colors are processed by taking the difference between the color channels before applying the Mexican-hat filters. The retina computes two differences: (red - green) and (blue - yellow). Here is a false-color representation of the result:
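The color differences themselves are simple to sketch. The function name opponent_channels is hypothetical, and treating yellow as the average of red and green is my assumption, since an RGB image has no yellow channel of its own. Each difference channel would then be passed through the Mexican-hat filters like the greyscale image.

```python
import numpy as np

def opponent_channels(rgb):
    """Split an RGB image (..., 3) into red-green and blue-yellow difference channels."""
    rgb = np.asarray(rgb, dtype=float)
    red, green, blue = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    yellow = (red + green) / 2.0      # assumption: yellow as the mean of red and green
    return red - green, blue - yellow

pixels = np.array([
    [1.0, 0.0, 0.0],   # red
    [0.0, 0.0, 1.0],   # blue
    [0.5, 0.5, 0.5],   # grey
])
red_green, blue_yellow = opponent_channels(pixels)
print(red_green)     # positive for red, zero for blue and grey
print(blue_yellow)   # negative for red, positive for blue, zero for grey
```

Grey pixels give zero in both channels, so these channels carry only color information and leave brightness to the greyscale pathway.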
Finally, the retina takes the time derivative of the filtered greyscale image. This will be useful later for detecting motion.
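Reading the derivative here as a derivative with respect to time, which is what motion detection needs, a minimal sketch is a finite difference between consecutive frames. The function name temporal_derivative and the frame layout are my assumptions. For brevity the frames below are unfiltered; because the spatial filter is linear, filtering and then differencing gives the same result as differencing and then filtering.

```python
import numpy as np

def temporal_derivative(frames, dt=1.0):
    """Finite-difference estimate of the change between consecutive frames.

    frames: array of shape (time, height, width).
    """
    frames = np.asarray(frames, dtype=float)
    return np.diff(frames, axis=0) / dt

# A bright vertical bar, 3 pixels wide, moving one pixel to the right per frame.
frames = np.zeros((4, 5, 12))
for t in range(4):
    frames[t, :, 2 + t:5 + t] = 1.0

change = temporal_derivative(frames)
print(change[0, 0])   # one row of the first difference image
```

The result is zero everywhere except at the two edges of the bar: negative at the trailing edge, where the image gets darker, and positive at the leading edge, where it gets brighter. The sign of the change at an edge therefore hints at the direction of motion.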
For a more recent and in-depth review of the retina’s biology see: http://www.pnas.org/cgi/doi/10.1073/pnas.1011782107 . However, it’s worth reading VISION first because that review does not attempt to explain the computations of the retina.