Encoding vision for HTM

I will write the steps, first the set up steps which are performed at program start up:

  1. Assume the encoder will accept a grey-scale image with (M x N) pixels to encode into an SDR with dimensions (M x N x C) bits, where C is the number of bits in the output SDR for each each input pixel.
  2. Let BIN_CENTERS = Array[M x N x C]
    This array stores a (floating-point) number for each bit in the output SDR. Fill BIN_CENTERS with uniform random numbers in the same range as the pixel intensity (typically 0 to 255, but depends on the input data format).
  3. Let BIN_WIDTH = data-range * (1 - target-sparsity)
    Where data-range is the theoretical range of pixel intensities. For an 8 bit image this is 256, but it depends on the input data format.
    Where the target-sparsity is the desired fraction of zeros in the output SDR.
  4. Let BIN_LOWER_BOUNDS = BIN_CENTERS - BIN_WIDTH/2
    Let BIN_UPPER_BOUNDS = BIN_CENTERS + BIN_WIDTH/2
    The shapes of both of these arrays are (M x N x C). Together these arrays describe the ranges of input values which each output bit will be responsive to.

Steps to encode an image:

  1. Let IMAGE = Array[M x N], this is the input, it is real valued (aka floating point).
  2. Let SDR = Array[M x N x C], this is the output, it is boolean.
  3. Iterate through the SDR using the indexes (x, y, z), and set every bit of the SDR according to step 4.
  4. Let SDR[x, y, z] = BIN_LOWER_BOUNDS[x, y, z] <= IMAGE[x, y] and IMAGE[x, y] <= BIN_UPPER_BOUNDS[x, y, z].

To encode color images create separate encoders for each color channel. Then recombine the output SDRs into a single monolithic SDR by multiplying them together. Multiplication is equivalent to logical “and” in this situation. Notice that the combined SDR’s sparsity is the different; the fraction of bits which are active in the combined SDR is the product of the fraction of the bits which are active in all input SDRs. For example, to recreate the original posters example with 8 bits per pixel and a density of 1/8: create three encoders with 8 bits per pixel and a density of 1/2.

What follows is a discussion of this encoders semantic similarity properties. Semantic similarity happens when two inputs which are similar have similar SDR representations. This encoder design does two things to cause semantic similarity: (1) SDR bits are responsive to a range of input values, and (2) topology allows near by bits to represent similar things.

  1. Effects of thresholds:
    Many encoders apply thresholds to real valued input data which converts it into boolean output. In this encoder, the thresholds are ranges which are referred to as ‘bins’. A small change in the input value might cause some of the output bits to change and a large change in input value will change all of the bits in the output. How sensitive the output bits are to changes in the input value -the semantic similarity- is determined by the sizes of the bins. The sizes of the bins are in turn determined by the sparsity, as follows:
    Assume that the inputs are distributed in a uniform random way throughout the input range. The size of the bins then determines the probability that an input value will fall inside of a bin. This means that the sparsity is related to the size of the bins, which in turn means that the sparsity is related to the amount of semantic similarity. This may seem counter-intuitive but this same property holds true for all encoders which use bins to convert real numbers to discrete bits.

  2. Effects of topology:
    This encoder relies on topology in the input image, the idea that adjacent pixels in the input image are likely to show the same thing.
    2A) If an area of the output SDR can not represent a color because it did not generate the bins needed to, then it may still be near to a pixel which does represent the color. In the case where there are less than one active output bits per input pixel, multiple close together outputs can work together to represent the input.
    2B) If an output changes in response to a small change in input, then some semantic similarity is lost. Topology allows nearby outputs to represent the same thing, and since each output uses random bins they will not all change when the input reaches a single threshold.

1 Like