I don't think converting images to SDRs is straightforward.
First, an important property of SDRs is that "semantically similar data should result in SDRs with overlapping active bits" (Scott Purdy, Encoding Data for HTM Systems).
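To make that property concrete, here is a minimal sketch (plain Python; the function and variable names are my own, not from the paper) of how SDR similarity is usually measured: the number of active bit indices two SDRs share.

```python
def sdr_overlap(a: set[int], b: set[int]) -> int:
    """Number of active bits shared by two SDRs (sets of active indices)."""
    return len(a & b)

# SDRs encoding semantically similar inputs should share many active bits:
cat = {3, 17, 42, 101, 256, 512}
kitten = {3, 17, 42, 101, 300, 600}
print(sdr_overlap(cat, kitten))  # 4 shared bits -> high similarity
```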
However, semantic similarity is ambiguous for 2D image data: two images can be nearly identical pixel-by-pixel yet mean different things, while two images of the same object can differ in almost every pixel.
Second, the high-dimensional space of images is enormous, and most of it is occupied by random images that are meaningless to humans, which makes generating SDRs far less trivial than numbers->SDRs or dates->SDRs.
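For contrast, here is a minimal sketch of why a scalar encoder is easy, in the spirit of the encoders described in Encoding Data for HTM Systems (the parameter names and defaults are my own illustration): numeric closeness maps directly to bit overlap.

```python
def encode_scalar(value: float, min_val: float = 0.0, max_val: float = 100.0,
                  n_bits: int = 400, n_active: int = 21) -> set[int]:
    """Map a scalar to a contiguous block of active bits; nearby values
    produce overlapping blocks, so bit overlap mirrors numeric similarity."""
    span = n_bits - n_active
    start = round((value - min_val) / (max_val - min_val) * span)
    return set(range(start, start + n_active))

a = encode_scalar(50.0)
b = encode_scalar(52.0)
print(len(a & b))  # large overlap: 50 and 52 are semantically close
```

No comparably simple rule exists for raw pixels, which is the whole difficulty.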
There are some approaches that can do what you want: the convolutional neural network family, and embedding methods such as t-SNE. Each of them handles MNIST well on its own.
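One way to bridge such methods to SDRs, and this is my own sketch rather than an established recipe, is to take the dense feature vector the model produces and binarize it with k-winners-take-all, so similar images end up with overlapping active bits:

```python
import numpy as np

def to_sdr(features: np.ndarray, n_active: int = 40) -> set[int]:
    """Sparsify a dense feature vector (e.g. a CNN layer activation or an
    embedding) by keeping only the indices of the k strongest units."""
    top_k = np.argpartition(features, -n_active)[-n_active:]
    return set(int(i) for i in top_k)

# Hypothetical usage: features_a and features_b would come from the same
# network applied to two MNIST digits of the same class. Random stand-ins
# with a shared component are used here so the example is self-contained.
rng = np.random.default_rng(0)
base = rng.random(1024)
features_a = base + 0.05 * rng.random(1024)
features_b = base + 0.05 * rng.random(1024)
print(len(to_sdr(features_a) & to_sdr(features_b)))  # high overlap expected
```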
Based on my limited knowledge of neuroscience, the generation of SDRs from retinal input may not happen in the visual cortex or the neocortex at all. There's a region called the LGN (lateral geniculate nucleus) on the pathway from the retina to the lower layers of the visual cortex. The LGN has a hierarchical structure reminiscent of deep neural networks, and may be responsible for generating some (sparse?) visual representations.