Nupic/HTM for shipping target recognition

If I want to use the HTM algorithm for ship target recognition and classification, what literature can I refer to, and what would a concrete approach look like? Thank you for your answers.

Hi @xxxxxc, could you describe your data more or provide an example possibly?

My data comes from images taken by drones. My task is to use the HTM algorithm to accurately identify and even classify targets such as ships or cars in remote sensing images. Do you have any good ideas?

Why do you consider using HTM for that?

Because I am interested in the HTM algorithm and would like to give it a try. Is it not suitable?


Unless something escapes me, it is probably not the best choice for image recognition.
“TM” in HTM stands for temporal memory, which means its main purpose is time series prediction.
I am aware of only one example of using HTM’s Spatial Pooler (as an intermediate image encoder) for MNIST digit recognition, with results significantly less impressive than what a relatively lightweight CNN would achieve.


That’s awesome, I can definitely relate and agree that the HTM alg is very worth being interested in. But I have to agree with @cezar_t that HTM probably isn’t the best choice for this purpose.

HTM really excels at learning sequential patterns, where the signal is contained in transitions over time. I think this generally isn’t true with image data, where there usually isn’t an inherent notion of time. That’s not to say there’s no way an HTM-based approach could do well, just that it isn’t the kind of scenario that HTM is well tested on.


It is possible to turn an image recognition problem into a temporal sequence that might be better suited for HTM by restricting your receptive field to one or more small patches and then saccading over the image. Use the detected features at each location to activate specific minicolumns. Then saccade in either a fixed pattern or in a manner that efficiently acquires samples needed to disambiguate the classification.
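To make the saccading idea concrete, here is a minimal numpy-only sketch, assuming a fixed raster scan path and a toy hash-based SDR encoder (all function names and parameters here are illustrative, not part of NuPIC):

```python
import numpy as np

def extract_patch(image, center, size=8):
    """Crop a square receptive-field patch around a fixation point."""
    r, c = center
    h = size // 2
    return image[r - h:r + h, c - h:c + h]

def encode_patch(patch, n_bits=256, sparsity=0.02):
    """Toy SDR encoder: hash the patch onto a sparse binary vector.
    A real system would use learned proximal synapses instead."""
    rng = np.random.default_rng(abs(hash(patch.tobytes())) % (2**32))
    sdr = np.zeros(n_bits, dtype=np.int8)
    sdr[rng.choice(n_bits, int(n_bits * sparsity), replace=False)] = 1
    return sdr

def saccade_sequence(image, step=8, size=8):
    """Turn a static image into a temporal sequence of SDRs by
    scanning the receptive field over fixed fixation points."""
    h, w = image.shape
    seq = []
    for r in range(size // 2, h - size // 2 + 1, step):
        for c in range(size // 2, w - size // 2 + 1, step):
            seq.append(encode_patch(extract_patch(image, (r, c)), ))
    return np.stack(seq)

image = np.random.default_rng(0).random((32, 32))
seq = saccade_sequence(image)  # a 16-step SDR sequence for a 32x32 image
```

The resulting sequence of SDRs is what you would feed into temporal memory; the encoder would be replaced by whatever feature detector (e.g. Gabor filters) drives your minicolumn activations.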

There’s another option that I’ve been wanting to try out but haven’t found the time for yet. There’s a technique in computational topology known as persistent homology, where you start with a tight focus on a small region of the input and then slowly expand the region while keeping track of how some statistical properties of the region are changing. This tends to produce a somewhat unique fingerprint for each sampled location that looks very much like a time series dataset.
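As a rough illustration of the expanding-region idea, here is a sketch that tracks only simple statistics (mean and standard deviation) over a growing window; this is the fingerprinting mechanism described above, not a full persistent-homology computation, and the function name is illustrative:

```python
import numpy as np

def expanding_fingerprint(image, center, max_radius=8):
    """Track region statistics as a square window expands around a
    sampled location. The resulting curves form a time-series-like
    fingerprint for that location."""
    r, c = center
    fingerprint = []
    for radius in range(1, max_radius + 1):
        region = image[max(0, r - radius):r + radius + 1,
                       max(0, c - radius):c + radius + 1]
        fingerprint.append((region.mean(), region.std()))
    return np.array(fingerprint)  # shape: (max_radius, 2)

img = np.random.default_rng(1).random((32, 32))
fp = expanding_fingerprint(img, (16, 16))
```

Each row of the fingerprint corresponds to one expansion step, so the whole array can be treated as a short multivariate time series and handed to a sequence learner like HTM.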


So did you try the saccade option described above?

So I’ve spent a pretty good deal of time thinking about how to practically apply HTM to create a vision system, and I think the biggest hurdle currently is finding an effective way to preprocess the image data such that the HTM modules can actually converge on learning to predict specific features at locations, given dynamic scale, rotation, and translation.

I’m currently working to replicate the Numenta experiment in which stable object representations in L2 derive from feature-at-location representations in L4. I don’t think it’s a very large step from there to having it learn unsupervised. Once we can get the system recognizing stable representations at different levels of the hierarchy, repurposing it for vision is likely just a modality change, requiring a novel preprocessing algorithm to extract the relevant features for the L4 sections to use.

Unfortunately, I think this task is likely to be pretty difficult, as the neurons that process the information coming into your retina are extremely complex, and there seem to be a lot of different neurons doing a lot of different things. I don’t think we’ll need ALL of those functions to replicate a vision stream that can at least be learned from, but the salience of our agent will be limited by the dimensionality of their input streams. Finding a good balance between performance and good granularity of information will take some considerable time.


I have a partial implementation that does the saccading over a region populated with different colored MNIST digits. At the moment, the minicolumn encoder (i.e. the proximal synapses) is simply using Gabor filters. I got distracted by other projects before I could finish the temporal memory and motor feedback. I had the demo posted online a while back; let me see if I can find somewhere to repost it.
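For anyone curious what a Gabor-filter front end for the minicolumn encoder might look like, here is a hedged numpy-only sketch; the kernel parameters, function names, and the top-k winner selection are all illustrative choices, not taken from the actual demo:

```python
import numpy as np

def gabor_kernel(size=9, theta=0.0, wavelength=4.0, sigma=2.0):
    """Build a single even-phase Gabor kernel at orientation theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / wavelength)
    return envelope * carrier

def active_minicolumns(patch, n_orientations=8, k=2):
    """Respond to a receptive-field patch with the k orientations whose
    Gabor filters match it best -- a stand-in for proximal synapses."""
    thetas = np.linspace(0, np.pi, n_orientations, endpoint=False)
    responses = np.array([np.abs(np.sum(patch * gabor_kernel(theta=t)))
                          for t in thetas])
    return np.argsort(responses)[-k:]  # indices of winning minicolumns

# Example: a synthetic edge patch activates a small set of minicolumns.
patch = np.zeros((9, 9))
patch[:, 5:] = 1.0
winners = active_minicolumns(patch)
```

Each fixation would produce a small set of active orientation columns like this, and the sequence of those activations across saccades is what the temporal memory would learn.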