Does NuPIC have a wavelet encoder?

A friend recommended that I look at the work of Douglas Greer and his theory of the cortex (http://gmanif.com/). One of his ideas is that in the cortex, an image is represented as wavelets.
(Wavelets are like a Fourier transform, but instead of a cosine wave that stretches out to infinity, the basis function looks more like a finite squiggle. You fit it at various points in an image, stretch it to a wider shape and fit it again, and repeat this several times, collecting the coefficients as you go.)
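To make the "fit, stretch, fit again" idea concrete, here is a minimal sketch of a multi-level wavelet decomposition on a 1-D signal. It uses the Haar wavelet, which is an assumption chosen because it is the simplest wavelet, not something Greer's theory or NuPIC prescribes. The function names `haar_step` and `haar_decompose` are made up for illustration.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar transform.

    Returns (approximation, detail): scaled pairwise sums and differences.
    """
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_decompose(signal):
    """Apply haar_step repeatedly to the approximation until one value is left.

    Each pass analyses the signal with a wavelet twice as wide as the last.
    Returns (final_approximation, details) with details ordered finest first.
    Assumes len(signal) is a power of two.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx = list(signal)
    details = []
    while len(approx) > 1:
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx[0], details

# A flat signal with one sharp bump near the end.
signal = [4.0, 4.0, 4.0, 4.0, 0.0, 0.0, 8.0, 0.0]
approx, details = haar_decompose(signal)
for level, coeffs in enumerate(details, start=1):
    print("level", level, [round(c, 3) for c in coeffs])
```

The finest level is zero everywhere except at the bump, which is the localization property that a plain Fourier transform lacks. For images, the same step is applied along rows and then columns.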
So I wondered if anyone has created a wavelet encoder that feeds into the spatial pooler.
One possible advantage of this is that people have found ways to instantly recognize a rotated image using wavelets (see: Hilbert-wavelet transform for recognition of image rotation - https://www.researchgate.net/profile/Khan_Iftekharuddin/publication/268359931_Hilbert-wavelet_transform_for_recognition_of_image_rotation/links/54fb1e6c0cf2040df21d95c5.pdf)
So some operations are easier to do in wavelet space than in regular space. The idea that the brain uses wavelets is not that odd. In the retina, you have off-center surround arrangements where a neuron fires if the center cell is off and a ring of cells around it is on. That’s like a 2-dimensional wavelet. Anyway, the basic question again: has someone written a wavelet encoder?
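As a rough illustration of the off-center surround analogy, the sketch below builds a difference-of-Gaussians kernel (wide surround minus narrow center) and applies it to three small patches. The difference-of-Gaussians model, the kernel size, and the sigma values are assumptions picked for the example, and the helper names are hypothetical.

```python
import math

def gaussian_kernel(size, sigma):
    """2-D Gaussian of shape size x size, normalised to sum to 1."""
    half = size // 2
    kernel = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
               for x in range(-half, half + 1)]
              for y in range(-half, half + 1)]
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]

def off_center_kernel(size=7, sigma_center=1.0, sigma_surround=2.5):
    """Surround Gaussian minus center Gaussian: excited by a dark center
    inside a bright ring. The weights sum to zero, so uniform light gives
    no response."""
    center = gaussian_kernel(size, sigma_center)
    surround = gaussian_kernel(size, sigma_surround)
    return [[s - c for s, c in zip(s_row, c_row)]
            for s_row, c_row in zip(surround, center)]

def response(patch, kernel):
    """Weighted sum of the patch under the kernel (one 'neuron' output)."""
    return sum(k * p for k_row, p_row in zip(kernel, patch)
               for k, p in zip(k_row, p_row))

def spot_patch(size, center_value, surround_value):
    """Patch with a 3x3 central spot of one brightness on a background of another."""
    mid = size // 2
    return [[center_value if abs(x - mid) <= 1 and abs(y - mid) <= 1
             else surround_value
             for x in range(size)] for y in range(size)]

kernel = off_center_kernel()
dark_center = response(spot_patch(7, 0.0, 1.0), kernel)    # center off, ring on
bright_center = response(spot_patch(7, 1.0, 0.0), kernel)  # center on, ring off
uniform = response(spot_patch(7, 1.0, 1.0), kernel)        # everything on
print(dark_center, bright_center, uniform)
```

Thresholding the response (firing only when it is clearly positive) gives the "center off, surround on" cell described above.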

I have not heard of anyone doing this, but it sounds like an interesting idea.

It’s one of the vision encoding methods I mentioned here. Usually referred to as “Gabor filters” in computer vision.

I’ve tried it, by doing a winner-take-all over Gabor filters for every local patch in a sliding window over the image using standard OpenCV tools (and it’s trivial to roll your own).

It works okay. Doing the same thing with winner-take-all over ConvNet features works better in my experience (on place recognition from video). ConvNet features are similar to wavelets/Gabors, except they’re learned end-to-end on challenging natural image tasks, and are inherently hierarchical if you want a richer multi-scale image encoding than a set of local descriptors would allow.

I didn’t interface any of this with NuPIC I’m afraid, so I can’t help you there. But it would be a good thing to have in the pool of community encoders if someone wanted to write one. It was pretty easy, you just convolve the image with N filters, then report the filter ID with the highest activation at every pixel, and there’s your SDR.
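The convolve-then-argmax recipe above could be sketched roughly as follows. This is a pure-Python illustration rather than the OpenCV version I actually used, and several details are assumptions: four orientations, a 7x7 cosine-phase Gabor, taking the winner by absolute response, and only encoding positions where the filter fits entirely inside the image. The names `gabor_kernel` and `encode` are made up, and nothing here is wired into NuPIC.

```python
import math

def gabor_kernel(size, theta, sigma=2.0, wavelength=4.0):
    """Cosine-phase Gabor kernel (size x size, size odd) at orientation theta.

    The mean is subtracted so that uniform image regions give zero response.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + yr * yr) / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    mean = sum(map(sum, kernel)) / (size * size)
    return [[v - mean for v in row] for row in kernel]

def encode(image, kernels):
    """Winner-take-all over filters at every position where they fit.

    Returns (winners, sdr): winners[r][c] is the index of the filter with the
    largest absolute response, and sdr is a flat bit list with one block of
    len(kernels) bits per position, exactly one bit set in each block.
    """
    size = len(kernels[0])
    winners, sdr = [], []
    for top in range(len(image) - size + 1):
        winner_row = []
        for left in range(len(image[0]) - size + 1):
            responses = [
                abs(sum(k[j][i] * image[top + j][left + i]
                        for j in range(size) for i in range(size)))
                for k in kernels
            ]
            best = max(range(len(kernels)), key=responses.__getitem__)
            winner_row.append(best)
            sdr.extend(1 if n == best else 0 for n in range(len(kernels)))
        winners.append(winner_row)
    return winners, sdr

# Four orientations: 0, 45, 90 and 135 degrees.
kernels = [gabor_kernel(7, n * math.pi / 4) for n in range(4)]
# Toy 10x10 image of vertical stripes with a period of 4 pixels.
vertical_stripes = [[1.0 if x % 4 < 2 else 0.0 for x in range(10)]
                    for _ in range(10)]
winners, sdr = encode(vertical_stripes, kernels)
print(winners[0], sum(sdr), len(sdr))
```

The resulting bit vector has a fixed density of 1/N, so more filters (orientations, scales, phases) give a sparser code. On real images you would want a vectorised convolution, for example from OpenCV or NumPy, since the nested loops here are only practical for toy inputs.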

When you say ConvNet, are you referring to a specific implementation of convolutional networks?

I was using a convolutional network called ResNet (trained on ImageNet and fine-tuned on a place recognition dataset called Places365) but I don’t think the specific architecture matters too much.

I really do not like to mix NuPIC with ConvNets, because HTM and DL are totally different. As an alternative, see my comment in "encoding vision in HTM", or another interesting implementation named Sparsey. What do you think, @jakebruce?

Sure. But DL is a very powerful technique for learning useful feedforward representations, so why not use it? I’m less interested in purity and biological plausibility than I am in building a system that actually works.

Also, we’re talking about encoders here. Biological encoders were sculpted by evolution over millions of years, they’re not learned from scratch from the data. Might as well shortcut that with backpropagation if you can, imo.

I also don’t think deep learning and HTM are really that different except that HTM adopts (understandably) some very restrictive biological constraints, but you can find my arguments about that elsewhere on the forum.

@jakebruce understood. Could you please tell me more about the CNN layers you want to use (number of layers, kernels, etc.)?

I did most of my experiments with ResNet-50 and ResNet-152, trained on ImageNet and fine-tuned on Places365. You can find the pretrained weights online. You can decide which features you want to use, but I found (and this is anecdotal) that the middle conv layers were most discriminative for my problem.