Encoder for converting an image to a binary matrix

This is a very good point. The encoder we have so far is fairly low-level and imho only models the nerve signal for a single saccade.
It is based on cv2.bioinspired.Retina which models peripheral and foveal vision, sending spatio-temporal signals.

What is still missing, as you mention (and we have it in our TODO too), is saccadic movement and actually “seeing the bigger picture”, where we need to describe an object as a (relative) set of features, i.e. “2 eyes, a nose, a mouth → face”.

It is interesting to note that modern “deep learning vision models” have converged on this from a practical point of view: the networks train a feature-extraction module and a localization module (which draws a bounding box where the feature is) at the same time.

My understanding, or rather a simplified model of how saccades could be implemented (in HTM), is that it is a “cropping problem”. Say we train on MNIST, but then present digits placed on a larger background.
A simple model would:

  • crop a portion of the image
  • ask SP “do you recognize this?” (classify SP’s SDR output)
  • (randomly) repeat until found.
    That is the “significant points” recognition part.
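
A minimal sketch of that crop / ask / repeat loop, just to make the idea concrete. Everything here is an assumption for illustration: `random_crop_search` and `toy_recognize` are hypothetical names, and `recognize` is only a stand-in for “run the SP on the patch and classify its SDR output”, not actual HTM code.

```python
import numpy as np

def random_crop_search(image, recognize, crop=28, max_tries=500, seed=0):
    """Randomly crop `crop` x `crop` windows until `recognize` accepts one.

    `recognize` is a hypothetical stand-in for "SP + classifier": it takes a
    2-D patch and returns a label, or None if nothing is recognized.
    Returns (label, (row, col)) for the first hit, or (None, None).
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape
    for _ in range(max_tries):
        # Pick the top-left corner of a window that fits inside the image.
        r = int(rng.integers(0, h - crop + 1))
        c = int(rng.integers(0, w - crop + 1))
        label = recognize(image[r:r + crop, c:c + crop])
        if label is not None:
            return label, (r, c)
    return None, None

# Toy demo: a 28x28 bright square stands in for an MNIST digit
# placed on a larger, empty 64x64 background.
canvas = np.zeros((64, 64), dtype=np.uint8)
canvas[10:38, 20:48] = 1

def toy_recognize(patch):
    # Hypothetical recognizer: "recognized" if at least half the patch is lit.
    return "digit" if patch.sum() >= patch.size // 2 else None

label, pos = random_crop_search(canvas, toy_recognize)
print(label, pos)
```

Random cropping is the simplest possible “saccade policy”; a real implementation would presumably want to steer the next crop from what was seen so far instead of sampling blindly.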

The other part would be

  • describing an object as a set of these features (simply a union of SDRs?)
  • encoding the relative positions between the features (can the GridCell encoder do that?); this is what CapsuleNets do.
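
To illustrate those two points, here is a rough sketch under heavy assumptions: SDRs are plain boolean NumPy arrays, the “object” is the OR of its feature SDRs, and relative position is just each feature's offset from the first feature. This is not the GridCell encoder and not what CapsuleNets actually compute, only the simplest thing with the same flavour; `random_sdr` and `describe_object` are made-up helper names.

```python
import numpy as np

def random_sdr(rng, size=2048, active=40):
    """Random sparse binary vector with exactly `active` bits on."""
    sdr = np.zeros(size, dtype=bool)
    sdr[rng.choice(size, size=active, replace=False)] = True
    return sdr

def describe_object(features):
    """features: list of (sdr, (row, col)) pairs, one per recognized feature.

    Returns (union_sdr, offsets), where offsets[i] is the position of
    feature i relative to feature 0, so the description does not change
    when the whole object is shifted.
    """
    sdrs = np.array([s for s, _ in features])
    positions = np.array([p for _, p in features])
    union = sdrs.any(axis=0)              # bitwise OR of all feature SDRs
    offsets = positions - positions[0]    # translation-invariant layout
    return union, offsets

rng = np.random.default_rng(42)
eye, nose, mouth = (random_sdr(rng) for _ in range(3))

# "2 eyes, a nose, a mouth" at one location, then the same face shifted.
face_a = [(eye, (10, 10)), (eye, (10, 30)), (nose, (20, 20)), (mouth, (30, 20))]
face_b = [(s, (r + 7, c + 5)) for s, (r, c) in face_a]

union_a, offsets_a = describe_object(face_a)
union_b, offsets_b = describe_object(face_b)
print(np.array_equal(union_a, union_b), np.array_equal(offsets_a, offsets_b))
```

Note that the union alone loses the layout (and even the count, since both eyes share one SDR here), which is exactly why the relative positions are needed as a second part of the description.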