2D Object Recognition Project

I have been working on this project on Twitch today with @codeallthethingz and @Tachion.

We have an object schema defined in YAML and some object visualization code in JavaScript (which might change soon, but anyway). We also have some Python code with an example text environment and object loading (including tests!).

Please have a look if you are interested. We are keeping the Trello board updated as we go. See video here. Sorry it is 3 hours.

I made an environment to work with using pygame

the easy part is over, gonna work on movements and on adding some HTMs to this


I talked about this spontaneously this morning. Here are the videos:


I’ll talk more about this Thursday on Twitch, but I am going to pause work on this project. Instead, I’m going to work on Building HTM Systems.

I’m doing this because we are getting a lot of new folks here wanting to know how Spatial Pooling works, how the TM works, how Encoding works, etc. I would love to work with you more experienced people on this 2D project, but I can see that the newer crowd needs this more. Having https://buildinghtm.systems fleshed out with a complete reference implementation could be huge for HTM adoption.

I promise I’ll be back, but I am going to put some effort into basic HTM reference documentation first.

You are all still welcome to work on this project in the meantime! In fact, if anyone were willing to live-stream work on this project on Twitch, I would support them by hosting their work on my channel.


I love the idea, but I also think we are ready for what Jeff described last week for a cortical column.

Bitking found evidence of the required Speed variable, and we now have place, body/head-direction, border/boundary, conjunctive, speed and grid cells to work with, which greatly simplifies the challenge of programming a self-exploring agent.

To show why the Objects=Places thinking is true while also simplifying the recognition problem, a 2D cup would be drawn into a flatland world as a solid sphere with a short projection out the side, which the agent senses by bumping into it. Optionally, when the cup is tilted, the top slice at the rim creates an opening where the agent can get trapped inside for a while and feel the shape of the inside too.

Letters of the alphabet can be used to place cups in a room, but in the virtual world all solid objects are most simply places whose surface can be physically touched, so only one or more touch sensors are needed. After that, a flatland view of the Numenta logo (optionally presented sequentially) can be painted onto the outer surface of a cup and onto other objects, including various sized ones on walls.

And direction/distance to objects.
Some people collect trading cards.
I think that I may have to collect papers on HC/EC cells specializations.


I just watched this video, cool vision!

One question about it:

I thought layer one was mostly informed by the hierarchy - that is regions above this region would communicate their expectation of future states (their prediction) for this region down to this region as a union of active cells in layer one. Are you suggesting that…

  1. that isn’t the case: the union is exclusively produced by the horizontal voting of regions? or …
  2. that is the case but the union is influenced heavily by horizontal voting? or…
  3. that is the case, and we’re simply not going to model that hierarchical aspect yet?



There is no hierarchy in this proposed model, which includes the Columns and Columns+ papers. These present an idea of just one level of the hierarchy. We are not saying hierarchy doesn’t exist, we are just not attempting to explain it.

So we are suggesting #3 above. :nerd_face:


Please like this post if you would watch someone else in the community doing a Twitch stream on this project while I’m busy elsewhere!


I listed what I expect to be the essential requirements for feeling the shape of a cup or other object. It looks like we more specifically need to start with a primary motor cortex column for each hemisphere. To add eyes and other complex sensors, a primary somatosensory cortex column can later be added to derive these needed essentials. For now, vestibular and other signals can be taken from the precise, already calculated program variables used to draw the environment.

Sensory In
  Vestibular system
    Linear displacement, speed, most simply distance from previous location to current.
    Rotational displacement, most simply positive or negative change in angle since previous timestep.
    Bit that changes state when bumps or applies force against a solid. 
  Motor, main drive
    1 bit Forward and 1 bit Reverse interoceptive feedback, typically motor stall, must reverse out.   
    1 bit Left and 1 bit Right interoceptive feedback, typically motor stall, must turn other way.
    Optionally 4 motor bits (see below) and/or speed, or sequence of readings to recall unique routines.
Motor Out
  Motor, main drive
    1 bit Forward and 1 bit Reverse thrust through speed range. Subtract bits for +1,0,-1 shift direction.   
    1 bit Left and 1 bit Right thrust through (optional) speed range. Subtract bits for +1,0,-1 direction.
      Note: Bilateral columns both have only one possible motor direction, already (-1) oppose each other.
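
A minimal sketch of how the signal list above might be carried around in Python. The names (`SensoryIn`, `MotorOut`, `vestibular`) are made up for illustration and are not from the project code, and the sign convention for turning (right minus left) is an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class SensoryIn:
    linear_displacement: float      # distance from previous location to current
    rotational_displacement: float  # signed change in angle since previous timestep
    bump: bool                      # True while pressing against a solid
    stall_forward: bool = False     # interoceptive feedback: must reverse out
    stall_reverse: bool = False
    stall_left: bool = False        # interoceptive feedback: must turn other way
    stall_right: bool = False


@dataclass
class MotorOut:
    forward: bool = False
    reverse: bool = False
    left: bool = False
    right: bool = False

    def drive(self) -> int:
        """Subtract the bits: +1 forward, 0 neutral, -1 reverse."""
        return int(self.forward) - int(self.reverse)

    def turn(self) -> int:
        """Assumed convention: +1 right, 0 straight, -1 left."""
        return int(self.right) - int(self.left)


def vestibular(prev_xy, prev_angle, xy, angle, bump=False) -> SensoryIn:
    """Derive vestibular signals from the variables used to draw the agent."""
    dist = math.hypot(xy[0] - prev_xy[0], xy[1] - prev_xy[1])
    return SensoryIn(dist, angle - prev_angle, bump)
```

Setting both bits of a pair cancels out to 0, which is one way to read the "subtract bits" note in the list.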

this is a very cool research project, I’ve been keeping an eye on this for a long time, but need to catch up with what you have here!

Just FYI, we have a grid cell encoder for both C++ and Python. It also comes with nice visualizations.
Would be great if you could validate it for us and use it.
Cheers, breznak


I have enhanced the agent with four sensors, the environment with boundary checks, etc.
Started PR7

First test result: the agent starts at some position and moves 5 times to the right. The UP sensor is encoded into an SDR with a category encoder and then fed into the Sensory layer as proximal input. Here is the result:
Note: SP has learning switched off
So it seems good :slight_smile:
Using htm.core, I want to take this a little bit further and see which parts we are missing.
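
For anyone following along, here is a rough idea of what a category encoder does, written in plain Python rather than with the htm.core classes used in the PR. The function name, the category labels and the bit width are all hypothetical; the point is only that each category gets its own block of active bits, so different categories never overlap.

```python
def encode_category(value, categories, bits_per_category=3):
    """Return a dense 0/1 list with one block of active bits per category."""
    index = categories.index(value)  # raises ValueError for unknown values
    sdr = [0] * (len(categories) * bits_per_category)
    start = index * bits_per_category
    for i in range(start, start + bits_per_category):
        sdr[i] = 1
    return sdr


def overlap(a, b):
    """Number of bits that are active in both encodings."""
    return sum(x & y for x, y in zip(a, b))


# Hypothetical sensor readings, not the labels used in the project.
CATEGORIES = ["empty", "wall", "object"]
wall = encode_category("wall", CATEGORIES)
empty = encode_category("empty", CATEGORIES)
```

With these settings `overlap(wall, wall)` is 3 and `overlap(wall, empty)` is 0, which is why a category encoder suits sensor readings that have no meaningful ordering.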


I have used the grid cell encoder output as the direct representation of the LL (Location layer) and wired it up to the secondary distal input of the SL (Sensor layer).

I got this result:

And there is always an anomaly spike at [7,4] when the agent goes to the RIGHT
and at [4,4] when it goes to the LEFT.

My first thought was that it is because of the repeating inputs problem, so I call tm.reset() when the agent comes back to the place where it started ( [3,4] ).

So then I got an anomaly of 1.0 always on the next step after the reset(). Is that expected?
When I ignore the 1.0 after each reset, the anomaly looks like this:
The anomaly is always at [7,4] when going to the RIGHT.

The dimensions of the SL, LL and other parameters are set up really roughly… any recommendations for the dimensions of the layers?

Also about the LL: right now it's just the agent position encoded by the GCE, but as I understand it, it should be an SP with the GCE on its proximal input, right? And I shouldn't encode the agent's position but rather its movement (incremental), and keep the actual position de facto inside the LL.
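
If the incremental approach is the right reading, the idea might look something like this toy sketch, where a single grid-cell-like module only ever receives movement deltas and keeps its own wrapped position. `GridModule` and its `scale` parameter are invented for illustration; this is not the htm.core grid cell encoder and not a full location layer.

```python
class GridModule:
    """Toy path integrator: tracks a 2D phase that wraps at `scale`."""

    def __init__(self, scale, start=(0, 0)):
        self.scale = scale
        self.phase = (start[0] % scale, start[1] % scale)

    def move(self, dx, dy):
        """Update the stored phase from a movement delta, not a position."""
        x, y = self.phase
        self.phase = ((x + dx) % self.scale, (y + dy) % self.scale)
        return self.phase


module = GridModule(scale=5, start=(3, 4))
for _ in range(4):
    module.move(1, 0)   # four steps to the right
after_right = module.phase
for _ in range(4):
    module.move(-1, 0)  # four steps back to the left
after_return = module.phase
```

Because the phase wraps, one module on its own is ambiguous about absolute position; several modules with different scales would be needed to tell locations apart.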

code is on this branch
Thanks for any help or recommendations


Yes, an anomaly of 1.0 on the step right after reset() is expected.
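
A small sketch of why, using plain Python sets instead of the real TM. The raw anomaly score is the fraction of active columns that were not predicted on the previous step, and right after a reset nothing is predicted. The function name and the column numbers below are made up for the example.

```python
def raw_anomaly(active_columns, predicted_columns):
    """Fraction of active columns that were not predicted."""
    active = set(active_columns)
    if not active:
        return 0.0
    predicted = set(predicted_columns)
    return 1.0 - len(active & predicted) / len(active)


active = {1, 2, 3, 4}
after_reset = raw_anomaly(active, set())          # no predictions survive a reset
fully_predicted = raw_anomaly(active, {1, 2, 3, 4})
half_predicted = raw_anomaly(active, {1, 2, 9})
```

Here `after_reset` is 1.0, `fully_predicted` is 0.0 and `half_predicted` is 0.5, so ignoring the first score after each reset is a reasonable way to read the plot.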

Regarding your other questions, I am sorry to put them off, but they deserve more attention than I can give them today. I will be less busy tomorrow and will get your PR running properly and respond to your questions. I like what you are doing, and I can see some experiments we might start running this way.


Hey @Zbysekz nice work! I ran your PR and reviewed test changes. Looks great! Thanks for hooking up htm.core. I merged your PR after updating spaces to tabs. Please give us another PR with your more recent work and I will review and run it as well.


Ok thanks Matt, I know that you are busy with BHTM so I really appreciate it :slight_smile: Ok, I will start the PR for the latest work, but about the formatting… I will use tabs instead of spaces, but I am not sure about flake8 and black… I am using Spyder3 as my Python IDE and it seems that I can't set this up. What are you using?


I added a requirements file and used pip inside anaconda instead of pipenv and that worked fine.


Ok fine. But what about “black”? When I run it from the command line, it formats the file with a 4-space indent instead of tabs, and it seems that this can't be configured. I just want to be aligned on formatting to prevent any unnecessary future work. I personally don't care if we use black, tabs or spaces.

I wish! This is low on my list right now. I’m still trying to keep up with Numenta research on Deep Learning applications. One of these days I’ll be back at building up docs. But I can’t ignore a dedicated community member helping me by doing exactly what I was asking for in my live streams and hooking up htm.core to an agent. You are the best!

I don’t know what “black” is… I thought there were just tabs and spaces? I don’t think I added that dependency… it was @codeallthethingz! :innocent: Will, do you remember why you added this package?

I don’t know what “black” is

“black” is “the uncompromising Python code formatter”. It formats Python code in a sensible way, without configuration options. All ‘black’-formatted code looks the same.

We use it at work to remove any questions about python format and style – we accept whatever ‘black’ returns. In our experience at my job, no one loves the format black uses, but no one hates any one part of it enough to complain.