Hi everyone, I have been a lurker for a while now, reading many of the questions, but I am now signing up to post my own question, as I have been working towards my own implementation of sparse vectors. I am trying to encode images to enable an SDR to recognize, say, cats and dogs, a canonical problem in machine vision.
I followed the guidelines in [Encoding Data for HTM Systems](Encoding Data for HTM Systems), encoding the pixel levels in each of the blue, green, and red channels with w=50 and n=1000. Here is the Python code for generating the feature vector: `cv2.resize(image, (32, 32)).flatten()`
I trained it on samples of cat and dog images. As suggested in the paper, each pixel value maps to a start bit, v = int(pixel / 256 * 1000), and I then set bits v through v + w, which gives the recommended sparsity (< 5%).
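For reference, here is a minimal numpy sketch of my encoder as described above (the scaling by n - w is my own addition, so the w-bit window never runs past the end of the array for bright pixels):

```python
import numpy as np

def encode_pixel(pixel, w=50, n=1000):
    """Encode a 0-255 pixel value as an n-bit SDR with w contiguous active bits."""
    sdr = np.zeros(n, dtype=np.uint8)
    # Scale into [0, n - w] so the w-bit window always fits inside the array.
    start = int(pixel / 256 * (n - w))
    sdr[start:start + w] = 1
    return sdr

# Nearby pixel values produce heavily overlapping SDRs:
a, b = encode_pixel(100), encode_pixel(105)
print((a & b).sum())  # 32 overlapping bits out of 50
```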
To make predictions I compare a new image against the stored SDRs by AND-ing and OR-ing the bit arrays and taking the Jaccard similarity, i.e. `and_bits.count(1) / or_bits.count(1)`, where `and_bits = query_sdr & ref_sdr` and `or_bits = query_sdr | ref_sdr`.
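In case it helps, a small self-contained version of that comparison using numpy boolean arrays (the toy 4-bit vectors are just for illustration):

```python
import numpy as np

def jaccard(query_sdr, ref_sdr):
    """Jaccard similarity of two binary SDRs: |A AND B| / |A OR B|."""
    and_bits = np.logical_and(query_sdr, ref_sdr)
    or_bits = np.logical_or(query_sdr, ref_sdr)
    return and_bits.sum() / or_bits.sum()

a = np.array([1, 1, 0, 0], dtype=bool)
b = np.array([1, 0, 1, 0], dtype=bool)
print(jaccard(a, b))  # 1 shared bit / 3 total bits ≈ 0.333
```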
However, the prediction results are less than thrilling.
Am I doing this correctly? I seem to have followed the instructions, but no dice.
Can anyone help?
Would it be fair to say your data is not temporal in nature? If so, are you using only the spatial pooler SDR as opposed to the sequence memory SDR? Sequence memory representations are unlikely to be useful for computer-vision-style single-image classification where there is no meaningful time dimension.
If the data is not temporal in nature, HTM may not be quite the right fit for this problem, because HTM is about modelling temporal sequences.
Assuming a sequential version of this problem such as video analysis however, the encoding also strikes me as an issue. I’m assuming you’re doing this according to the following section in the paper:
"8. Encoding Multiple Values
Some applications require multiple values to be
encoded for a single HTM model. The separate values
can be encoded on their own and then concatenated to
form the combined encoding."
If you have 32x32x3 feature vectors where each feature is encoded by 1000 bits, that's a 3,072,000-bit input vector? If so, that is probably far too large an input space to learn to classify high-level categories like dogs and cats, unless you have millions of training samples.
My work involves images, and I've found the encoding to be the most important step. The problem with images is that they're so high-dimensional that the system would need an enormous amount of training data to learn anything useful. So I usually encode images by preprocessing with a standard sparse coding mechanism, like a bank of Gabor filters.
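For concreteness, here is a rough numpy-only sketch of that kind of Gabor-bank encoding (the filter parameters and the naive convolution loop are just illustrative; a real pipeline would use an optimized library):

```python
import numpy as np

def gabor_kernel(ksize=9, sigma=2.0, theta=0.0, lambd=4.0, gamma=0.5):
    """Build one Gabor kernel: a cosine carrier under a Gaussian envelope."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / lambd)
    return envelope * carrier

def gabor_bank_encode(image, n_orientations=4, sparsity=0.02):
    """Filter the image at several orientations, then keep only the
    top `sparsity` fraction of responses as active bits."""
    responses = []
    for k in range(n_orientations):
        kern = gabor_kernel(theta=k * np.pi / n_orientations)
        kh, kw = kern.shape
        h, w = image.shape
        # Naive 'valid' convolution; fine for a small demo image.
        out = np.zeros((h - kh + 1, w - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kern)
        responses.append(out.ravel())
    responses = np.abs(np.concatenate(responses))
    n_active = max(1, int(sparsity * responses.size))
    threshold = np.partition(responses, -n_active)[-n_active]
    return (responses >= threshold).astype(np.uint8)

img = np.random.rand(32, 32)  # stand-in for a grayscale image
sdr = gabor_bank_encode(img)
print(sdr.sum(), "of", sdr.size, "bits active")  # ~2% sparsity
```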
Thanks @jakebruce, I was concerned about the lack of temporality, but I noticed that NuPIC Vision had static image encoders. However, it might be possible to add temporality by rotating the image and saving the successive encodings.
Yes indeed, I am following the above approach of "extending" bitarrays to form a combined encoding. Thanks for the suggestion of using a Gabor bank; have you tried any of the convolutional approaches that are in vogue now?
Are you able to share any of your approaches (code/papers) so us newbies can learn?
Thanks again for responding!
I agree with Matt; temporality is the single relevant question.
But just to answer your question about convolution: yes. The right way to do image classification right now is with convolutional neural networks. You can use a pre-trained network like VGG-net and get very good feature vectors from the intermediate layer representations, and you can binarize these and/or take the top 2% of activations, which makes a very good sparse encoding. Gabor filters are just the simplest version of this (the lowest layer of a CNN usually learns Gabor-like filters).
This is not limited to HTM, but these features do work very well as an image encoder for HTM.
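As a sketch of that top-2% binarization step (the random 4096-dim vector here is just a stand-in for a real fc-layer activation from something like VGG):

```python
import numpy as np

def binarize_topk(features, sparsity=0.02):
    """Keep only the top `sparsity` fraction of activations as active bits."""
    k = max(1, int(len(features) * sparsity))
    # The k-th largest activation becomes the threshold.
    threshold = np.partition(features, -k)[-k]
    return (features >= threshold).astype(np.uint8)

features = np.random.rand(4096)  # stand-in for a CNN feature vector
sdr = binarize_topk(features)
print(sdr.sum())  # 81 active bits, i.e. ~2% of 4096
```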