Applying HTM to the Omniglot dataset

So I recently applied HTM to the one-shot learning problem of the Omniglot challenge. Let's just say it did not go too well.
Perhaps HTM is not well suited to image data? Or maybe my implementation had some silly mistakes? Anyway, I thought the attempt would be quite interesting to share.
I recently read a paper on one-shot natural language processing using Semantic Hierarchical Temporal Memory (SHTM) and decided to give it a shot on this hard-to-solve dataset.
I may try to use a better implementation of HTM for my next attempt.


It isn’t suited for images in its current form (as far as I know). HTM has shown success at learning sequential transitions, so it’s really picking up temporal features, usually from numeric data of some kind. I’m not familiar with this Omniglot challenge, but I think applications with image data generally don’t involve the time dimension, relying instead on the rich spatial features contained in each individual image.


Have you looked into the htm.core MNIST example? If I recall correctly, it gets ~95% when you add the spatial pooler. But then, Omniglot looks like a much more diverse challenge with its multiple alphabets.

And yeah, the Image->SDR conversion is far from perfect:

import numpy as np

def encode(data, out):
    '''Encode the (image) data into an SDR.
    @param data - raw data
    @param out  - return SDR with encoded data
    '''
    out.dense = data >= np.mean(data)  # convert greyscale image to binary B/W
    # TODO improve. have a look in etc. For MNIST this is ok, for fashionMNIST it already loses too much information

Thresholding the image at np.mean(image_vector) and assigning the result to SDR.dense likely isn’t the best solution, as they mention in the TODO, but it’s the one we have right now. It looks like you do something similar with your load_data() bitwise method:

import cv2

def load_data(file_path):
    image = cv2.imread(file_path, 0)    # load as greyscale
    image = cv2.resize(image, (64, 64))
    image = cv2.bitwise_not(image)      # invert so pen strokes become high values
    return image.reshape(-1)            # flatten to a 1-D vector
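Putting the two ideas together, here is a minimal numpy-only sketch of the image→binary-bits step (thresholding at the mean, as in the htm.core example; the toy 4-pixel "image" is just for illustration):

```python
import numpy as np

def encode_binary(image_vector):
    # Threshold the greyscale vector at its mean: pixels brighter than
    # average become active bits, everything else stays off.
    return image_vector >= np.mean(image_vector)

# Toy 4-pixel "image": only the bright pixels survive the threshold.
img = np.array([10, 200, 30, 250], dtype=np.uint8)
bits = encode_binary(img)
print(bits)        # [False  True False  True]
print(bits.sum())  # 2 active bits out of 4
```

Note how much information this throws away: every pixel above the mean looks identical afterwards, which is exactly the loss the TODO comment above is complaining about.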

def learn(self, dataset):
    '''Learning algorithm for one timestep'''
    for input_data in dataset:

You call reset_memory() with each new dataset; that makes sense, since there’s no temporal element to the Omniglot challenge, I believe.
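The reset-between-datasets pattern can be sketched like this. The reset_memory()/learn() names come from the code above; everything else (the OneShotModel class, the toy datasets) is hypothetical, just to show why the reset matters when there is no temporal structure:

```python
class OneShotModel:
    def __init__(self):
        self.seen = []

    def reset_memory(self):
        # Clear any carried-over state: since Omniglot has no temporal
        # structure, nothing learned from one dataset should leak into
        # the next as if it were a continuing sequence.
        self.seen = []

    def learn(self, dataset):
        '''Learning algorithm for one timestep'''
        for input_data in dataset:
            self.seen.append(input_data)

model = OneShotModel()
for dataset in [["a1", "a2"], ["b1"]]:
    model.reset_memory()   # fresh state per dataset
    model.learn(dataset)
print(model.seen)  # ['b1'] - only the last dataset remains
```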

And yeah, we might be barking up the wrong tree entirely. For tasks that involve ‘slow’ image understanding, a superdense convnet can afford to take ~5 seconds on each image.

I’d imagine HTM would come in super handy for object recognition in video feeds, like YOLOnet does with convolutional tricks, because each potential bounding box/object classifier can be ‘informed’ by geometrical features of previous frames in the video.
A bird flaps its wings occasionally and ‘breaks’ the constant geometrical form, giving it a distinct pattern from, say, a drone (although angles might be interesting). Cars and bikes tend to move at different speeds. These are all things humans take into account when identifying objects in our live video-feed.

Have you looked into topology at all? I’d imagine images are a big use case.

I haven’t looked too much into topology yet, but from the looks of it, it may improve the performance drastically. From what I can infer, topology is to HTM what convolutional layers are to dense feed-forward networks: local connectivity instead of global.
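To make that analogy concrete, here is a minimal numpy sketch (not the htm.core API, just an illustration) of what topology changes: with topology, each column draws its potential synapses from a local patch of the input, the way a convolutional kernel has a local receptive field, instead of sampling from the whole image:

```python
import numpy as np

def local_potential_pool(input_width, column_pos, radius):
    # With topology: a column's potential synapses come only from a
    # neighborhood around its position in the input space, clipped to
    # the input bounds.
    lo = max(0, column_pos - radius)
    hi = min(input_width, column_pos + radius + 1)
    return np.arange(lo, hi)

def global_potential_pool(input_width):
    # Without topology: every column can connect anywhere in the input,
    # like a dense feed-forward layer.
    return np.arange(input_width)

print(local_potential_pool(64, 10, 2))   # [ 8  9 10 11 12]
print(len(global_potential_pool(64)))    # 64
```

With topology, nearby columns end up representing nearby image regions, so spatial structure in the input is preserved in the learned representation rather than being scrambled by global connections.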