How does ImageSensor work?

I had a general question on how ImageSensor (in nupic.vision) works. I would appreciate a somewhat high-level explanation (going into the algorithm is fine too) of what it does with the images it takes in before feeding them to the SP.


I really don’t know, but I think @scott does. But he’s out of town at Cosyne, so I would give him a few days to respond if you don’t mind.

The ImageSensor has documentation here:

There are two main concepts: explorers and filters. Explorers control how an image, or a section of an image, is selected as input at each step. Filters preprocess the image (e.g., apply a Gabor filter). The actual loading of images is done through commands to the region that specify a directory in which the images are located. See the MNIST example doing that here:

So the ImageSensor doesn’t really enforce any particular algorithm or processing. Instead, it delegates image loading and cropping to “explorers” and preprocessing to “filters.” Let me know if you have any other questions or if the code documentation isn’t clear.
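A minimal sketch of that delegation, with made-up class and method names (this is not the real nupic.vision API, just an illustration of the pattern): filters transform the image array in sequence, and the explorer decides which window(s) of the filtered image get emitted.

```python
import numpy as np

class ThresholdFilter:
    """Toy stand-in for a preprocessing filter (e.g., a Gabor filter)."""
    def process(self, image):
        # Binarize: pixels above the threshold become 1, the rest 0.
        return (image > 128).astype(np.uint8)

class FlashExplorerSketch:
    """Toy stand-in for an explorer: emit the whole image once."""
    def windows(self, image):
        yield image  # a sweep-style explorer would yield several crops instead

def sensor_outputs(image, filters, explorer):
    # Run the filter chain first, then let the explorer pick the windows.
    for f in filters:
        image = f.process(image)
    return list(explorer.windows(image))

image = np.random.randint(0, 256, size=(8, 8))
outputs = sensor_outputs(image, [ThresholdFilter()], FlashExplorerSketch())
```

The point is just the separation of concerns: the sensor itself only orchestrates; swapping the explorer or the filter list changes what the SP sees without touching the sensor loop.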

Thanks for responding.

I guess I understand the exploring part that views the images in sections and applies filters to those (similar to how CNNs work). I’m more curious now what the output of this preprocessing will look like when it gets fed into the SP.
I think I can intuitively guess that, for a given feature (like an edge), there will be a filter to capture it and turn ON a specific bit at index i. But will it turn on multiple bits at different locations if it sees the same feature over the course of exploration? If so, how will the SP ‘know’ these belong to that specific feature unless the number of cells is very large and able to capture all the combinations of feature + position + etc.?

The output of the ImageSensor region will depend on the filters, so it will have more or fewer active bits depending on what combination you use. In the MNIST example I sent, I’m not sure off the top of my head what the output will look like, but you could run it with a breakpoint or print statement to see. I think you essentially get a black/white image where the black pixels are 1s. If you use a Gabor filter then you’d presumably have 1s just at edges.
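Assuming the black-pixels-become-1s guess above is right, a small sketch of what the SP input would look like for a tiny binary digit, just to illustrate the idea (the 1-to-1 pixel-to-bit mapping here is an assumption, not confirmed behavior):

```python
import numpy as np

# A toy 4x4 black/white "digit": 1 = black pixel, 0 = white pixel.
digit = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 1, 1, 1],
], dtype=np.uint8)

# The SP consumes a flat binary vector; each black pixel would be one on-bit.
sdr = digit.flatten()
active = np.flatnonzero(sdr)  # indices of the on-bits fed to the SP
```

Under this reading, the same edge appearing at two positions in the image would indeed activate bits at two different indices, which is what the question above is getting at.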

That’s interesting. So for the MNIST example, will it be correct to say it’s almost a direct 1-to-1 translation, as in, each black pixel results in a different active bit at the output? Which code do I have to look at to see this translation take place? So far I looked through some Explorers and Filter codes as well as ImageSensor.

I realize the MNIST example is just using Flash exploration, but for a case more like a sweep, how will each section get fed into the SP? Will the sections be separate iterations that get fed into the SP (so 5 feeds if numIterations = 5), or will it be a single feed that concatenates the five together? Would you mind guiding me to the code that does this as well?

Yes, I believe so but I’d have to run it to validate.

It will be separate iterations, as you describe. There is a holdFor option in the explorer, for instance, that will result in each image being output for multiple iterations.
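A toy sketch of that iteration behavior, with illustrative names (`sweep_iterations` and its parameters are invented for this example, not the real explorer API): each window position is a separate feed to the SP, and a holdFor-style setting repeats each position for several iterations rather than concatenating anything.

```python
def sweep_iterations(image_width, window_width, step, hold_for=1):
    """Yield one window position per SP iteration for a 1-D sweep."""
    positions = range(0, image_width - window_width + 1, step)
    for x in positions:
        for _ in range(hold_for):
            yield x  # each yield is one separate feed to the SP

# Sweeping a width-4 window across a width-10 image in steps of 2:
feeds = list(sweep_iterations(image_width=10, window_width=4, step=2))
# With hold_for=2, every position is presented twice in a row:
held = list(sweep_iterations(image_width=10, window_width=4, step=2, hold_for=2))
```

So with four window positions you get four SP iterations (eight with `hold_for=2`), not one concatenated input.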

The FlashExplorer code is here and you can see the specification for explorers in the base class which has decent documentation. Basically, the ImageSensor loads the images and tells the explorer about the data set and then relies on the explorer to determine the position in the image. You can look at the whole ImageSensor.compute function which is the top level region entry point to see how it uses the explorer.
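As a rough, hypothetical sketch of that control flow (class and method names here are guesses for illustration, not the actual `ImageSensor.compute` implementation): each compute call asks the explorer for the current position, crops the image there, emits that as the region output, and advances the explorer for the next iteration.

```python
class ToyExplorer:
    """Toy explorer holding a fixed list of window positions."""
    def __init__(self, positions):
        self.positions = positions
        self.i = 0

    def current_position(self):
        return self.positions[self.i]

    def advance(self):
        # Move to the next position, wrapping around the data set.
        self.i = (self.i + 1) % len(self.positions)

def compute_step(image_row, window, explorer):
    """One sketched 'compute' call: crop at the explorer's position, then advance."""
    x = explorer.current_position()
    out = image_row[x:x + window]
    explorer.advance()
    return out
```

The real compute function also runs the filter chain and handles commands and multiple images, but the sensor-asks-explorer-for-position loop is the core idea.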

It’s a pretty complex region and I think the top level documentation for the ImageSensor is the best place to start and you can look at the individual explorer and filter classes to see what each of them do.
