Encoder to convert an image to a binary matrix

shiva · December 29, 2019, 2:52pm

Hi everyone. I read the paper “The HTM Spatial Pooler: A Neocortical Algorithm for Online Sparse Distributed Coding”, in which the authors use data sets such as MNIST and …
I would like to feed one of the MNIST images to HTM and the SP algorithm. My problem is that the image has real-valued pixels, but the input to the SP in HTM should be binary. What should I do? Can you recommend a good encoder to convert an image matrix to a binary matrix?

hsgo · December 29, 2019, 3:41pm

Thresholding is one way to do it. It’s a pretty common practice for MNIST, actually.
For example, if a pixel has a value below 0.5, make it 0. If not, make it 1.
But if you want to go all the way, you could represent each pixel with N bits.
If N = 4, you could represent a pixel as 0011 (< 0.33), 0110 (< 0.66), or 1100 (otherwise).
This is equivalent to having a scalar encoder for each pixel.
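A minimal sketch of both options in plain Python, assuming pixel values are already scaled to [0, 1]. The function names (`threshold_encode`, `pixel_to_bits`, `scalar_encode`) are made up for illustration and do not come from any HTM library; the bucket edges fall at 1/3 and 2/3, which approximates the 0.33 / 0.66 split above.

```python
def threshold_encode(image, threshold=0.5):
    """Binarize an image: pixels below `threshold` become 0, all others 1."""
    return [[0 if p < threshold else 1 for p in row] for row in image]


def pixel_to_bits(p, n_bits=4, n_active=2):
    """Encode one pixel in [0, 1] as n_bits bits with a sliding run of ones."""
    n_buckets = n_bits - n_active + 1            # 3 buckets when N = 4
    bucket = min(int(p * n_buckets), n_buckets - 1)
    bits = [0] * n_bits
    for i in range(n_active):
        bits[n_bits - 1 - bucket - i] = 1        # low values use the rightmost bits
    return bits


def scalar_encode(image, n_bits=4, n_active=2):
    """Concatenate the per-pixel codes into one flat binary vector."""
    return [b for row in image for p in row
            for b in pixel_to_bits(p, n_bits, n_active)]


image = [[0.1, 0.5],
         [0.9, 0.49]]
binary = threshold_encode(image)   # one bit per pixel
flat = scalar_encode(image)        # four bits per pixel
```

The thresholded version keeps one bit per pixel, while the scalar version gives neighbouring intensity buckets overlapping codes (0011 and 0110 share a bit), at the cost of an input that is N times larger.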

rhyolight · December 29, 2019, 4:03pm

Here is some old example code that should help you.

brev · December 30, 2019, 1:32am

Hi @shiva, you may want to look at the htm.core project, as it has a working MNIST example ready to go. After you build, you can run the example at build/Release/bin/mnist_sp.

shiva · December 30, 2019, 4:07pm

Thanks a lot.
But can you explain how I can decode the image after thresholding? (I mean, is there any way to recreate the image from the binary matrix?)

shiva · December 30, 2019, 4:08pm

Thanks.
Please help me decode this again.

shiva · December 30, 2019, 4:09pm

Thanks.
Can you tell me which function I should use to decode the image again?

rhyolight · December 30, 2019, 4:14pm

I’m not sure you can decode the image without using a standard ML classifier.

shiva · December 30, 2019, 4:55pm

Can you explain more?
I want to feed an image to the HTM Spatial Pooler, so I need to encode the image and create an SDR. Then I want to decode the SDR back into an image. Is there any way to do that?
Please help me.

rhyolight · December 30, 2019, 5:25pm

The explanation overlaps with something I just posted here.

Theoretically, encoding is a one-way process. Think about your brain. Encodings come in, but we don’t share our neural codes outside of our own brains to communicate. We developed language to do this.

hsgo · December 31, 2019, 6:20am

This might not be as effective, but regression could be a simple solution.
I think linear regression would be sufficient for a simple problem like MNIST.
(i.e., each pixel in the decoded image is a weighted sum of the bits of the SP’s output SDR.)
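As a rough illustration of that idea, the sketch below fits one weight per (pixel, SDR bit) pair with a simple least-squares gradient update, then decodes an SDR as a weighted sum. Everything here is an assumption for the example: the names `fit_linear_decoder` and `decode`, the learning rate, and the tiny hand-made SDRs and 4-pixel "images", which stand in for real SP output and MNIST digits.

```python
def fit_linear_decoder(sdrs, images, epochs=500, lr=0.2):
    """Learn W so that pixel p is approximately sum_i W[p][i] * sdr[i]."""
    n_bits, n_pixels = len(sdrs[0]), len(images[0])
    weights = [[0.0] * n_bits for _ in range(n_pixels)]
    for _ in range(epochs):
        for sdr, image in zip(sdrs, images):
            active = [i for i, bit in enumerate(sdr) if bit]
            for p in range(n_pixels):
                predicted = sum(weights[p][i] for i in active)
                step = lr * (image[p] - predicted)
                for i in active:                 # only active bits carry signal
                    weights[p][i] += step
    return weights


def decode(weights, sdr):
    """Reconstruct every pixel as a weighted sum over the active SDR bits."""
    active = [i for i, bit in enumerate(sdr) if bit]
    return [sum(row[i] for i in active) for row in weights]


# Toy training pairs: overlapping 4-bit SDRs and flattened 2x2 images.
sdrs = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]]
images = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
weights = fit_linear_decoder(sdrs, images)
reconstruction = decode(weights, sdrs[0])
```

On real data the reconstruction is only approximate, and a closed-form least-squares solver would normally replace the hand-written loop.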

shiva · December 31, 2019, 11:29am

thanks

CollinsEM · January 1, 2020, 2:34pm

There is a way to take images to and from sparse representations. If the encoding process utilizes a filter bank to obtain the k-best matches to a specific subset of the image, then that portion of the image can be reconstructed (approximately) by reconstructing the patch from a linear combination of the activated filters. All that would be needed is to find a way to store the filter response coefficients (e.g. the scalar coefficients).

One possibility for storing the scalars would be to use a grid-cell-like module to store the scalar response. This has the advantage of allowing the possibility of each of the extracted features (filters) to be manipulated independently to obtain new combinations not previously observed, or to recognize previously observed features (objects) under slightly different observing conditions.
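The filter-bank idea above can be sketched as follows. This is a toy under stated assumptions, not a description of any existing HTM encoder: the filters are a small orthonormal set chosen by hand so that the inner products serve directly as reconstruction coefficients, and the names `encode_patch`, `active_bits`, and `reconstruct_patch` are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def encode_patch(patch, filters, k):
    """Keep the k filters with the largest absolute response to the patch."""
    responses = [dot(patch, f) for f in filters]
    best = sorted(range(len(filters)), key=lambda i: -abs(responses[i]))[:k]
    return {i: responses[i] for i in best}       # filter index -> coefficient


def active_bits(code, n_filters):
    """Sparse binary view of the code: 1 where a filter was selected."""
    return [1 if i in code else 0 for i in range(n_filters)]


def reconstruct_patch(code, filters):
    """Rebuild the patch as a linear combination of the selected filters."""
    n = len(filters[0])
    return [sum(c * filters[i][j] for i, c in code.items()) for j in range(n)]


# Assumed filter bank: four orthonormal Walsh-Hadamard-style filters.
filters = [[0.5, 0.5, 0.5, 0.5],
           [0.5, 0.5, -0.5, -0.5],
           [0.5, -0.5, 0.5, -0.5],
           [0.5, -0.5, -0.5, 0.5]]
patch = [1.0, 1.0, 0.0, 0.0]
code = encode_patch(patch, filters, k=2)
bits = active_bits(code, len(filters))
restored = reconstruct_patch(code, filters)
```

The binary vector says which filters fired, while the stored coefficients are what make the reconstruction possible. With a non-orthogonal or learned filter bank, the coefficients would have to come from a least-squares fit instead of plain inner products.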

breznak · August 6, 2020, 9:07pm

FYI, since the MNIST dataset is fairly trivial (binary, simple images), it is possible to input it directly to the SP, i.e. activate synapses on pixels with, say, "1"s.

For encoding realistic visual stimuli, humans have “pre-processing” in the retina. A similar algorithm is implemented for HTM. It should be able to process images, or you could encode video.

Bitking · August 6, 2020, 11:26pm

I’m sort of curious what it might mean to convert a picture to a binary matrix.

It is already a collection of pixel values, with the meaning coded in relative position of brightness values.

Are you converting it to some sort of thumbnail?

Are you making any attempt at all to extract and preserve the spatial relationship(s) that is encoded in the picture?

When your eye saccades over an image you extract relative locations and primitives as a basket of features that stand for objects and relationships between objects. I have some trouble seeing how that would be encoded into a universal binary matrix.

breznak · August 7, 2020, 7:16am

This is a very good point. The encoder we have so far is more low-level and imho only models the nerve signal at a single saccade.
It is based on cv2.bioinspired.Retina which models peripheral and foveal vision, sending spatio-temporal signals.

As you mention, and as we have on our TODO list too, what is missing is the saccadic movements and actually “seeing the bigger picture”, where we need to describe an object as a (relative) set of features, i.e. “2 eyes, a nose, mouth → face”.

It is interesting to note that modern “deep learning vision models” have converged on this from a practical point of view: the networks train a feature extraction module and a localization module (which draws a bounding box where the feature is) at the same time.

My understanding, or simplified model, of how saccades could be implemented (in HTM) is as a “cropping problem”. Say we train on MNIST, but then use digits on a larger background.
A simple model would:

1. crop a portion of the image
2. ask the SP “do you recognize this?” (classify the SP’s SDR output)
3. (randomly) repeat until found.

That is the “significant points” recognition part.

The other part would be:

1. describing an object as a set of these features (simply a union of SDRs?)
2. encoding relative positions between the features (can the GridCell encoder do that?); this is what CapsuleNets do.
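A rough sketch of the cropping loop and the SDR union, under loud assumptions: `recognize` is a hypothetical callback standing in for "run the SP on the crop and classify its SDR output", and `crop`, `saccade_search`, and `sdr_union` are illustrative names, not part of htm.core. Encoding the relative positions of the features is not attempted here.

```python
import random


def crop(image, top, left, size):
    """Cut a size x size window out of a 2D list-of-lists image."""
    return [row[left:left + size] for row in image[top:top + size]]


def saccade_search(image, size, recognize, max_tries=1000, seed=0):
    """Try random crops until `recognize` accepts one; return its (top, left)."""
    rng = random.Random(seed)                    # fixed seed for repeatability
    height, width = len(image), len(image[0])
    for _ in range(max_tries):
        top = rng.randrange(height - size + 1)
        left = rng.randrange(width - size + 1)
        if recognize(crop(image, top, left, size)):
            return top, left
    return None                                  # nothing recognized


def sdr_union(sdrs):
    """Describe an object as the bitwise OR of its feature SDRs."""
    return [int(any(bits)) for bits in zip(*sdrs)]


# Toy scene: a 2x2 "digit" of ones placed on a larger empty background.
scene = [[0] * 6 for _ in range(6)]
for r in (3, 4):
    for c in (2, 3):
        scene[r][c] = 1
found_at = saccade_search(scene, 2, lambda patch: patch == [[1, 1], [1, 1]])
object_sdr = sdr_union([[1, 0, 0, 1, 0], [0, 0, 1, 1, 0]])
```

Purely random fixation is the simplest policy; a real system would presumably bias the next crop using what has already been seen.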

thanh-binh.to · August 7, 2020, 10:30am

@breznak as Jeff mentioned recently, we can feed magno spikes into grid cells and parvo spikes into L4. For both parvo and magno, the spikes are the positions of the active pixels in the images. According to Numenta research, a grid cell module has two inputs: one from the sensor (like parvo spikes) and one from the motor system (a single 2D displacement, such as the translation from the last hand position to the current position).

It would be interesting to understand how such a grid cell module can process the magno spikes, because the number and positions of the spikes differ from frame to frame.

Can other HTM theorists help us?
Thanks
