Toward An Unsupervised, Incremental, Streaming and One-Shot Visual Class-Based Learning and Recognition System with Hierarchical Temporal Memory Theory

Hello community,

I wanted to share with you some HTM experiments I did for a class project this semester. In all reality, I whipped this entire thing up in less than 2 weeks, so it's not super pretty or groundbreaking, but the results are interesting.

Those who are already comfortably familiar with HTM can skip the background section.

This is the beginning of ongoing personal research I plan to continue: specifically, the effort to design a biologically plausible encoder that encapsulates the functional properties of the human visual pathway, i.e., everything that happens to visual information from the retina before it ever reaches cortex, including the function of the lateral geniculate nucleus in the thalamus and related structures.

You can find the PDF here: http://brodykutt.com/Brody_Kutt_CSCI-731_Project

EDIT: The report is no longer at the above link. You can now find the report along with the code here.


Very interesting.
I am interested in your results. Do you have a demo video? Thanks!

Hello Brody, I have been studying your well-written PDF. In it I noticed this:

We could go a step further as shown in the next possible encoding below.

  1 = 1 1 1     2 = 0 1 1     3 = 0 0 1     4 = 0 0 0
      0 0 0         1 0 0         1 1 0         1 1 1

For a hexagonally arranged cortical sheet why is this not instead true?

  0 = 1 1 0     1 = 1 1 1     2 = 0 1 1     3 = 0 0 1     4 = 0 0 0     5 = 1 0 0
      0 0 1         0 0 0         1 0 0         1 1 0         1 1 1         0 1 1

Including wrap-around bits maintains 6 unique states. That is how I have been modeling possible network behavior from hexagonally connected neighbors.

The often-square, two-axis arrays of HTM are still a mystery to me. It seems more biologically accurate to always maintain the three-axis network geometry. In that case there are six possible states in your examples. The neighbor-to-neighbor signal bits are then circularly arranged, and they wrap around as they would when a 6-bit binary word is "rotated" left or right.

I am curious to see how this deals with distributed representation.

It sounds like that question might be for me. I thought about the kind of representation each column would have, and ironically it's the ideal thing for extracting angular-motion-based information. With a through-a-straw-sized view, a bit pattern can be seen "rotating" around itself, as in the example I gave, where bits periodically wrap around:

110001
111000
011100
001110
000111
100011
110001
111000
011100
001110

When there is a small amount of rotational "jitter" as it comes to rest:

011100
001110
011100
001110
011100
011100
001110
001110
001110
001110
001110

There are still predictions being made. The difference is that each column is able to predict when it's at the center of a rotating pattern, or off to one side where the pattern widens out and then goes from 000000 to 111111 at the same frequency and bit pattern as before. From what is moving in the picture and their relative locations in it, column cells gain an overall view indicating size, spin rate, and everything else there are motion-related words for, including "bounce".

Certainly, there's nothing stopping you from including wrap-around in this situation to bring the number of represented states to 5. That example was meant solely to illustrate a concept, not as a suggested encoding scheme. I should actually add a note that no wrap-around is assumed.

I found meaningful results only after setting up a topological organization of the columns' potential synapses. When each column's potential pool of connections spanned the entire input space, the results were junk. In the input to the spatial pooler (the binary pixel data), the bits have relative spatial semantics. Contrast that with something like a random distributed scalar encoder, where input states are randomly distributed across the whole space. The encoder of course does not have to be designed this way; one could even imagine deterministically scrambling bit locations to distribute each state's pattern across the whole space. In short, though, when the encoder output embeds this kind of relative spatial semantics in the bits and their locations in the input space, topologically organized columns appear to do a much better job of making sense of it.
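To make the contrast concrete, here is a minimal sketch of a topological potential pool, where a column may only form synapses with input bits near its own location. The function and parameter names are my own illustration, not the report's code or any particular HTM library's API:

```python
import random

def topological_potential_pool(col_xy, input_shape, radius, potential_pct, rng):
    """Sample a column's potential synapses only from input bits that
    lie within `radius` of the column's centre in the 2-D input space."""
    cx, cy = col_xy
    height, width = input_shape
    # Candidate input positions: a clipped square neighborhood around the column.
    neighborhood = [(x, y)
                    for x in range(max(0, cx - radius), min(height, cx + radius + 1))
                    for y in range(max(0, cy - radius), min(width, cy + radius + 1))]
    # Keep only a fraction of the candidates, sampled without replacement.
    n_sampled = max(1, int(potential_pct * len(neighborhood)))
    return rng.sample(neighborhood, n_sampled)

# A column centred at (4, 4) over an 8x8 binary input, radius 2:
pool = topological_potential_pool((4, 4), (8, 8), radius=2,
                                  potential_pct=0.5, rng=random.Random(42))
```

A global pool would instead sample from all `height * width` positions; with spatially meaningful input like binary pixel data, restricting the candidates to a local neighborhood is what lets the topology help.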

No demo video (although it’d be cool to make one) but I can share the code with you if you like.

@bkutt Great, please share your code with me. I want to test it with the virtual retina. Thx

Thanks for the detail. Your explanation is so close to what I use hexagonally and circularly that I was not sure whether your example extends to all six possible rotational states for a pattern of any bit width. In the process I'm used to, finding a center angle sparses the information down to normally one central bit, or a position in between, from the original signal pattern of two or more bits. The exception is that occasionally more than one direction is possible, depending on the current heading. When I saw three bits in a row, the two side bits were immediately redundant; it's then in a form that reduces to six possible one-bit states anyway, which is what you started with.

You'll maybe be thankful to know that my having to ask the question is related to a serious limitation of the models being simulated on the Emergent platform, where the two-axis squareness and oversimplification show most. In a Reddit thread, I presented a new paper that might be worth trying to model, as a "put this into layman's terms" challenge for all who are willing. This exactly:

https://www.sciencedirect.com/science/article/pii/S1074742717301971

To explain how my model works, there is a link back to my Numenta forum topic and wherever that has since led. Otherwise it could seem like the only path to a detailed neurological model is through Emergent, when Numenta is so clearly on a mission to put the whole cortical sheet together as one fully connected model:

https://www.reddit.com/r/neuro/comments/8dxdrf/would_it_be_possible_for_someone_to_put_this_into/

I'm not sure whether I should start a new topic for this, but any help making sense of the confusing-looking connection diagram below is appreciated. It seems to be an HTM-friendly next step toward a model of the entire cortical sheet; it may all look puzzling right now, but it will make sense once we better understand how the whole system works together. Or so I hope.


Very nice work @bkutt! Would be great to see larger scale experiments now that you have the framework in place. Also, I’m curious if you can keep one-shot learning and still get good performance on bigger datasets.


Definitely… once I develop a more sophisticated and biologically plausible visual encoding scheme, I want to bump up the breadth of testing. I'll certainly keep the community updated as my research progresses.

Hi Brody,

I was trying to download the pdf from the link: http://brodykutt.com/Brody_Kutt_CSCI-731_Project
Unfortunately, that link is not working properly. Is there any way I could get a copy of the document?

Thanks
Ilia

Right! Sorry for not updating this. That old link is dead. You can now find the report along with the code here.


Hi Brody,

I appreciate your help. I am looking forward to reading the paper(s) that come out of your thesis: http://scholarworks.rit.edu/theses/9797/

Best Regards,
Ilia
