Encoder for converting an image to a binary matrix

Hi everyone. I read the paper “The HTM Spatial Pooler - A Neocortical Algorithm for Online Sparse Distributed Coding”. In this paper the authors use data sets such as MNIST and …
I would like to feed one of the MNIST images to HTM and the SP algorithm. My problem is that the image has real values, but the input of the SP in HTM should be binary. What should I do? Can you recommend a good encoder to convert an image matrix to a binary matrix?

Thresholding is one way to do it. It’s a pretty common practice for MNIST, actually.
For example, if a pixel has a value below 0.5, make it 0. If not, make it 1.
But if you want to go all the way, you could represent each pixel by N bits.
If N = 4, you could represent a pixel by 0011 (< 0.33), 0110 (< 0.66), or 1100 (otherwise).
This is equivalent to having a scalar encoder for each pixel.
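To make the two options above concrete, here is a minimal sketch in plain Python. It assumes the pixel values are already scaled to the range [0, 1] (raw MNIST pixels are usually 0 to 255, so divide by 255 first). The function names and the equal-width buckets are my own choices for illustration, not part of any HTM library.

```python
def threshold_encode(image, threshold=0.5):
    """Map each pixel in [0, 1] to 0 if it is below the threshold, else 1."""
    return [[0 if pixel < threshold else 1 for pixel in row] for row in image]


def pixel_to_bits(pixel, n_bits=4, n_active=2):
    """Encode one pixel in [0, 1] as n_bits bits containing a block of n_active ones.

    With n_bits=4 and n_active=2 there are three buckets:
    0011 (dark), 0110 (medium), 1100 (bright).
    """
    n_buckets = n_bits - n_active + 1
    bucket = min(int(pixel * n_buckets), n_buckets - 1)  # clamp pixel == 1.0
    start = n_bits - n_active - bucket  # brighter pixels shift the block left
    return [1 if start <= i < start + n_active else 0 for i in range(n_bits)]


def scalar_encode(image, n_bits=4, n_active=2):
    """Replace every pixel by its bit block, so each row becomes n_bits times wider."""
    return [
        [bit for pixel in row for bit in pixel_to_bits(pixel, n_bits, n_active)]
        for row in image
    ]


# Tiny 2x2 "image" used only for illustration.
image = [[0.1, 0.5], [0.9, 0.4]]
print(threshold_encode(image))  # [[0, 1], [1, 0]]
print(scalar_encode(image))     # [[0, 0, 1, 1, 0, 1, 1, 0], [1, 1, 0, 0, 0, 1, 1, 0]]
```

Neighbouring buckets share one active bit, so similar brightness values get overlapping codes, which is the property a scalar encoder is meant to give you.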

Here is some old example code that should help you.

Hi @shiva, you may want to look at the htm.core project, as it has a working MNIST example ready to go. After you build, you can run the example at build/Release/bin/mnist_sp.

Thanks a lot.
But can you explain how I can decode the image after thresholding? (I mean, is there any way to recreate the image from the binary matrix?)

Please help me decode this again.

Can you tell me which function I should use to decode the image again?

I’m not sure you can decode the image without using a standard ML classifier.

Can you explain more?
I want to feed an image to the HTM Spatial Pooler, so I need to encode the image and create an SDR. Then I want to decode the SDR back into an image. Is there any way to do that?
Please help me, please.

The explanation overlaps with something I just posted here.

Theoretically, encoding is a one-way process. Think about your brain. Encodings come in, but we don’t share our neural codes outside of our own brains to communicate. We developed language to do this.

This might not be as effective, but regression could be a simple solution.
I think linear regression would be sufficient for a simple problem like MNIST
(i.e. each pixel in the decoded image is a weighted sum of the bits of the SP’s output SDR).
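A rough sketch of the regression idea, assuming NumPy is available. The SDRs and images below are tiny made-up stand-ins; in practice each row of `sdrs` would be the SP output for one training image and each row of `images` the flattened pixels of that image. The function names are hypothetical, not part of htm.core or NuPIC.

```python
import numpy as np


def fit_linear_decoder(sdrs, images):
    """Least-squares fit of W so that [sdr, 1] @ W approximates the flattened image."""
    sdrs = np.asarray(sdrs, dtype=float)
    images = np.asarray(images, dtype=float)
    # Append a constant column so the decoder can also learn a per-pixel bias.
    X = np.hstack([sdrs, np.ones((sdrs.shape[0], 1))])
    W, *_ = np.linalg.lstsq(X, images, rcond=None)
    return W


def decode(sdr, W):
    """Each decoded pixel is a weighted sum of the SDR bits plus the bias."""
    x = np.append(np.asarray(sdr, dtype=float), 1.0)
    return x @ W


# Toy stand-ins: three 6-bit "SDRs" and three flattened 2x2 "images".
toy_sdrs = np.array([
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
])
toy_images = np.array([
    [0.0, 0.2, 0.9, 1.0],
    [1.0, 0.8, 0.1, 0.0],
    [0.5, 0.5, 0.5, 0.5],
])

W = fit_linear_decoder(toy_sdrs, toy_images)
reconstruction = decode(toy_sdrs[0], W)
print(np.round(reconstruction, 3))
```

On this toy data the fit is exact because there are fewer samples than weights. On real SP output the reconstruction will only be approximate, and you would want to check it on held-out images.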

:pray: :pray: :pray:

thanks :+1:

There is a way to take images to and from sparse representations. If the encoding process uses a filter bank to obtain the k best matches to a specific subset of the image, then that portion of the image can be reconstructed (approximately) as a linear combination of the activated filters. All that would be needed is a way to store the filter response coefficients (e.g. the scalar coefficients).
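Here is a small sketch of the k-best filter idea, assuming NumPy and assuming the filters are orthonormal (one filter per row), so that the plain filter responses can be reused directly as reconstruction coefficients. For a non-orthogonal bank you would have to solve a small least-squares problem for the coefficients instead. The function names and the toy filter bank are made up for illustration.

```python
import numpy as np


def encode_patch(patch, filters, k):
    """Return the indices of the k strongest filter responses and their coefficients."""
    responses = filters @ patch                   # one scalar response per filter
    active = np.argsort(-np.abs(responses))[:k]   # k best matches
    return active, responses[active]


def reconstruct_patch(active, coeffs, filters):
    """Approximate the patch as a linear combination of the activated filters."""
    return coeffs @ filters[active]


# Toy filter bank: the identity matrix is trivially orthonormal.
filters = np.eye(4)
patch = np.array([0.9, 0.1, 0.0, 0.5])

active, coeffs = encode_patch(patch, filters, k=2)

# The sparse binary code is simply "which filters fired".
sdr = np.zeros(len(filters), dtype=int)
sdr[active] = 1

approx = reconstruct_patch(active, coeffs, filters)
print(sdr, approx)
```

Note that the binary code alone is not enough to rebuild the patch: the scalar coefficients have to be kept somewhere as well, which is the storage question raised above.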

One possibility for storing the scalars would be to use a grid-cell-like module to store the scalar response. This has the advantage of allowing the possibility of each of the extracted features (filters) to be manipulated independently to obtain new combinations not previously observed, or to recognize previously observed features (objects) under slightly different observing conditions.


FYI, since the MNIST dataset is fairly trivial (essentially binary, simple images), it is possible to input it directly to the SP, i.e. activate synapses on pixels with, say, "1"s.

For encoding realistic visual stimuli, humans have “pre-processing” in the retina. A similar algorithm is implemented for HTM. It should be able to process images, or you could encode video.

I’m sort of curious what it might mean to convert a picture to a binary matrix.

It is already a collection of pixel values, with the meaning coded in relative position of brightness values.

Are you converting it to some sort of thumbnail?

Are you making any attempt at all to extract and preserve the spatial relationship(s) encoded in the picture?

When your eye saccades over an image you extract relative locations and primitives as a basket of features that stand for objects and relationships between objects. I have some trouble seeing how that would be encoded into a universal binary matrix.

This is a very good point. The encoder we have so far is more low-level and imho only models the nerve signal at a single saccade.
It is based on cv2.bioinspired.Retina, which models peripheral and foveal vision and sends spatio-temporal signals.

As you mention, and as we have in our TODO too, what is missing is the saccadic movement and actually “seeing the bigger picture”, where we need to describe an object as a (relative) set of features, i.e. “2 eyes, a nose, mouth → face”.

It is interesting to note that modern deep learning vision models have converged to this from the practical point of view: the networks train the feature extraction and the localization module (which draws a bounding box where the feature is) at the same time.

My understanding, or simplified model, of how saccades could be implemented (in HTM) is as a “cropping problem”. Say we train on MNIST, but then use digits on a larger background.
A simple model would

  • crop a portion of the image
  • ask the SP “do you recognize this?” (classify the SP’s SDR output)
  • (randomly) repeat until found.

That is the “significant points” recognition part.

The other part would be

  • describing an object as a set of these features (simply a union of SDRs?)
  • encoding relative positions between the features (can the GridCell encoder do that?); this is what CapsuleNets do.
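The first part (crop, ask, repeat) could be sketched roughly like this in plain Python. The `recognize` callback is a hypothetical stand-in for "run the crop through the SP and classify the resulting SDR"; here it is faked with a trivial check so the loop can run on its own. The second part (feature sets and relative positions) is not covered.

```python
import random


def crop(image, top, left, size):
    """Cut a size x size window out of a 2D list of pixels."""
    return [row[left:left + size] for row in image[top:top + size]]


def saccade_search(image, size, recognize, max_tries=1000, seed=0):
    """Randomly crop windows until `recognize` returns a label, or give up."""
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    for _ in range(max_tries):
        top = rng.randint(0, height - size)
        left = rng.randint(0, width - size)
        label = recognize(crop(image, top, left, size))
        if label is not None:
            return top, left, label
    return None


def fake_recognizer(patch):
    """Hypothetical stand-in for SP + classifier: 'recognizes' an all-ones patch."""
    return "block" if all(all(p == 1 for p in row) for row in patch) else None


# A 4x4 background with a 2x2 "object" at row 1, column 2.
scene = [
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
found = saccade_search(scene, size=2, recognize=fake_recognizer)
print(found)
```

Purely random cropping is wasteful on large images; a real system would probably want some notion of saliency to decide where to look next, but that is outside this sketch.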

@breznak, as Jeff mentioned recently, we can put magno spikes into grid cells and parvo spikes into L4. For both parvo and magno, the spikes are the positions of the active pixels in the images. According to Numenta research, a grid cell module has 2 inputs: one from the sensor (like parvo spikes) and one from the motor (a single 2D displacement, such as the translation from the last hand position to the current position).

It would be interesting to understand how this grid cell module can process the magno spikes, because the number and the positions of the spikes differ from frame to frame.

Can other HTM theorists help us?
