2D Object Recognition Project

I see a tangential conversation emerging, so I’m going to pull this back a little.

In this experiment, there will be no real agency (at least not initially). By agency I mean that the agent is causal to the next action taken. For the agent to be able to influence its next movement, HTM theory says we must introduce displacement cells.

So I’m putting this off as much as possible. But we should be able to prove out some simple form of object classification via random or scripted movements through the environment space, identifying collections of features as objects, without agency and without displacements.
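A minimal sketch of that idea, with an entirely hypothetical object library and feature labels (nothing here reflects the actual project code): an agent makes random movements over an object's surface and classifies it by matching the accumulated (location, feature) pairs against a library.

```python
import random

# Hypothetical sketch: each object is a dict mapping (x, y) offsets to feature labels.
OBJECT_LIBRARY = {
    "cup":  {(0, 0): "A", (1, 0): "B", (1, 1): "C"},
    "fork": {(0, 0): "A", (0, 1): "D", (0, 2): "D"},
}

def classify_by_random_walk(obj, steps=50, seed=0):
    """Wander over an object and match the sensed (location, feature)
    pairs against the library -- no agency, no displacements."""
    rng = random.Random(seed)
    sensed = set()
    locations = list(obj)
    for _ in range(steps):
        pos = rng.choice(locations)          # random/scripted movement
        sensed.add((pos, obj[pos]))
    # An object is recognized when every sensed pair is consistent with it.
    return [name for name, feats in OBJECT_LIBRARY.items()
            if sensed <= set(feats.items())]

print(classify_by_random_walk(OBJECT_LIBRARY["cup"]))
```

With enough movements the sensed pairs uniquely pick out the right object; with too few, several candidates can remain, which is where voting across columns would come in later.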

BTW, I’m going to talk in detail about this project spec tomorrow at the HTM Hackers' Hangout (Mar 1, 2019). I’m hoping to clear up some confusion (my own included). You are all free to join, but I know the timing is bad for those of you in Asia (sorry!). The video will be available afterwards for anyone to watch, so I will do a complete review of the project as I understand it, and you can all post questions / comments here on this thread.

EDIT: I should also note that I updated drawings in my posts above.


OpenAI has released a pretty fun 2D environment. It might be worth trying after we finish the first two phases of this project.


Thanks @Falco for joining. Here’s my code.

I’ll take a swing at this too.


If you keep the same hours, I can make it every day except Tuesdays. But I can always watch later of course.

Thanks for doing this.


As suggested, I spent the majority of my work day today working on defining the object schema for experiments in this 2D object space.

(or YouTube if you prefer)

The relevant material starts at about 1:30. Here is the Trello board and here is the code.

Also: the result.


I watched your stream last night and this morning. Best episode so far I think. :+1:

Here are some suggestions (just for consideration):

  • This app you’re writing is really an editor, so you could name it 2D Object Editor. And a 2D object is flat, so maybe… FlatEditor? Or you could call it Flatland, in homage to Edwin A. Abbott. (People will definitely want to find out what you’re doing in Flatland.)

  • In the header, you might consider adding the number of different features in the object after the dimensions. For some tests you might want to select only objects from your library with a specific number or range of features.

  • If you want to make your objects form-agnostic, why not just number your feature types (0 -> 255) in the CSV file and have your editor display whatever symbols (A, B, :heavy_check_mark:, :negative_squared_cross_mark:, …) you like? Personally I would use colored boxes for clarity, but whatever…

  • Since you can ignore certain lines in the CSV file, maybe an idea would be to add a light explanation of the header info and the data structure. This is very useful in code, so why not use it directly in data? One or two lines would be enough.

  • Eating raw onions is good for your teeth. Onions are a natural bactericide.
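The CSV suggestions above (numbered feature types plus self-describing comment lines) could look something like this; the file format, header layout, and symbol table are purely hypothetical illustrations, not the project's actual schema.

```python
import csv
import io

# Hypothetical file format: '#' lines are ignored, the header row carries
# width, height and the number of distinct features, then one row per cell.
SAMPLE = """\
# 2D object file: header is width,height,num_features
# body rows are x,y,feature_type (0-255)
3,3,2
0,0,0
1,0,1
2,2,0
"""

SYMBOLS = {0: "A", 1: "B"}   # the editor can render any glyphs it likes

def load_object(text):
    rows = [r for r in csv.reader(io.StringIO(text))
            if r and not r[0].lstrip().startswith("#")]
    width, height, num_features = map(int, rows[0])
    cells = {(int(x), int(y)): int(f) for x, y, f in rows[1:]}
    return (width, height, num_features), cells

header, cells = load_object(SAMPLE)
print(header)                                     # (3, 3, 2)
print({p: SYMBOLS[f] for p, f in cells.items()})
```

Keeping the on-disk representation numeric and the display glyphs in the editor is what makes the objects form-agnostic.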



I have been working on this project on Twitch today with @codeallthethingz and @Tachion.

We have an object schema defined in YAML, we have some object visualization code in JavaScript (which might change soon, but anyway). And we have some Python code with an example text environment and object loading (including tests!).
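To illustrate the shape of that pipeline (not the actual repository code): once a YAML document is parsed, e.g. with `yaml.safe_load`, it is just a plain dict, and the Python side can turn it into typed objects. All field names here are invented for the sketch.

```python
from dataclasses import dataclass

# Hypothetical result of parsing a YAML object schema (e.g. via yaml.safe_load).
doc = {
    "name": "cup",
    "size": [4, 4],
    "features": [
        {"pos": [0, 0], "type": "rim"},
        {"pos": [1, 3], "type": "handle"},
    ],
}

@dataclass(frozen=True)
class Feature:
    x: int
    y: int
    type: str

def parse_object(d):
    """Convert the parsed-YAML dict into a (name, size, features) triple."""
    return d["name"], tuple(d["size"]), [
        Feature(f["pos"][0], f["pos"][1], f["type"]) for f in d["features"]
    ]

name, size, feats = parse_object(doc)
print(name, size, len(feats))
```

Frozen dataclasses make the loaded features hashable, which is handy later when accumulating sensed features into sets.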

Please have a look if you are interested. We are keeping the Trello board updated as we go. See video here. Sorry it is 3 hours.

I made an environment to work with using Pygame.

The easy part is over; next I'm going to work on movements and on wiring some HTM components into this.
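Movement logic like this can be prototyped headlessly before it is wired to a Pygame renderer. A minimal sketch, with illustrative names only: a bounded grid, four movement commands, and the feature (if any) sensed at the agent's new location.

```python
# Minimal headless sketch of the environment loop; all names are illustrative.
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

class GridEnv:
    def __init__(self, width, height, features):
        self.width, self.height = width, height
        self.features = features          # {(x, y): feature_label}
        self.agent = (0, 0)

    def step(self, move):
        """Move the agent, clamped to the grid, and return the sensed feature."""
        dx, dy = MOVES[move]
        x = min(max(self.agent[0] + dx, 0), self.width - 1)
        y = min(max(self.agent[1] + dy, 0), self.height - 1)
        self.agent = (x, y)
        return self.features.get(self.agent)   # feature label, or None

env = GridEnv(5, 5, {(1, 0): "A", (1, 1): "B"})
print(env.step("right"))   # 'A'
print(env.step("down"))    # 'B'
```

A Pygame loop would then just translate key events into `step()` calls and redraw the grid.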


I talked about this spontaneously this morning. Here are the videos:


I’ll talk more about this Thursday on Twitch, but I am going to pause work on this project. Instead, I’m going to work on Building HTM Systems.

I’m doing this because we are getting a lot of new folks here wanting to know how Spatial Pooling works, how the TM works, how Encoding works, etc. I would love to work with you more experienced people on this 2D project, but I can see that the newer crowd needs this more. Having https://buildinghtm.systems fleshed out with a complete reference implementation could be huge for HTM adoption.

I promise I’ll be back, but I am going to put some effort into basic HTM reference documentation first.

You are all still welcome to work on this project in the meantime! In fact, if anyone were willing to live-stream work on this project on Twitch, I would support them by hosting their work on my channel.


I love the idea, but also think we are ready for what Jeff last week described for a cortical column.

Bitking found evidence of the required speed variable, so we now have place, body/head-direction, border/boundary, conjunctive, speed, and grid cells to work with, which greatly simplifies the challenge of programming a self-exploring agent.

To show why the Objects=Places thinking is true, while at the same time simplifying the recognition problem, a 2D cup could be drawn into a flatland world as a solid circle with a short projection out the side, which the agent senses by bumping into it. Optionally, when the cup is tilted, the top slice at the rim creates an opening for the agent to get trapped inside for a while, where it then feels the shape of the inside too.

Letters of the alphabet can be used to place cups in a room, but in the virtual world all solid objects are, most simply, surfaces to physically touch, so the agent only needs one or more touch sensors. After that, a (optionally sequentially presented) flatland view of the Numenta logo can be painted onto the outer side of a cup's surface, and onto other objects, including various sized ones on walls.
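The touch-sensor idea above can be sketched as a single "bump" bit: a hypothetical 2D cup modeled as a circle plus a short handle segment, and a predicate that flips when the agent's position is on the surface. All parameters here are made up for illustration.

```python
import math

# Hypothetical 2D cup: a solid circle plus a short handle projecting out the
# side; the agent's only sensor is a touch bit that flips on contact.
def touching_cup(px, py, cx=0.0, cy=0.0, r=1.0, handle_len=0.5, eps=0.05):
    d = math.hypot(px - cx, py - cy)
    on_body = abs(d - r) <= eps                         # rim of the circle
    on_handle = (abs(py - cy) <= eps and
                 cx + r <= px <= cx + r + handle_len)   # projection out the side
    return on_body or on_handle

print(touching_cup(1.0, 0.0))    # on the rim
print(touching_cup(1.3, 0.0))    # on the handle
print(touching_cup(2.0, 0.0))    # free space
```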

And direction/distance to objects.
Some people collect trading cards. I think I may have to collect papers on HC/EC cell specializations.


I just watched this video, cool vision!

One question about it:

I thought layer one was mostly informed by the hierarchy: regions above this region would communicate their expectation of future states (their prediction) for this region down to it, as a union of active cells in layer one. Are you suggesting…

  1. that isn’t the case: the union is exclusively produced by the horizontal voting of regions? or …
  2. that is the case but the union is influenced heavily by horizontal voting? or…
  3. that is the case, and we’re simply not going to model that hierarchical aspect yet?
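For concreteness, the horizontal voting mentioned above can be pictured as set intersection across columns: each column maintains the set of objects consistent with its own input, and lateral connections narrow that set. This is purely an illustrative sketch, not how any particular HTM codebase implements it.

```python
# Illustrative lateral voting: each column holds the candidate objects
# consistent with its own sensed features; voting keeps only objects
# every column agrees on -- no hierarchy involved.
def vote(candidates_per_column):
    return set.intersection(*candidates_per_column)

cols = [{"cup", "bowl", "can"}, {"cup", "can"}, {"cup", "plate"}]
print(vote(cols))   # {'cup'}
```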



There is no hierarchy in this proposed model, which follows the Columns and Columns+ papers. These present an idea of just one level of the hierarchy. We are not saying hierarchy doesn’t exist, we are just not attempting to explain it.

So we are suggesting #3 above. :nerd_face:


Please like this post if you would watch someone else in the community doing a Twitch stream on this project while I’m busy elsewhere!


I listed what I expect to be the essential requirements for feeling the shape of a cup or other object. It looks like we more specifically need to start with one primary motor cortex column per hemisphere. To add eyes and other complex sensors, a primary somatosensory cortex column can later be added to derive these needed essentials. For now, vestibular and other signals can be taken from the precise, already-calculated program variables used to draw into the environment.

Sensory In
  Vestibular system
    Linear displacement (speed): most simply, the distance from the previous location to the current one.
    Rotational displacement: most simply, the positive or negative change in angle since the previous timestep.
    A bit that changes state when the agent bumps into or applies force against a solid.
  Motor, main drive
    1 bit Forward and 1 bit Reverse interoceptive feedback; typically a motor stall, must reverse out.
    1 bit Left and 1 bit Right interoceptive feedback; typically a motor stall, must turn the other way.
    Optionally, the 4 motor bits (see below) and/or speed, or a sequence of readings to recall unique routines.
Motor Out
  Motor, main drive
    1 bit Forward and 1 bit Reverse thrust through a speed range. Subtract the bits for a +1, 0, -1 drive direction.
    1 bit Left and 1 bit Right thrust through an (optional) speed range. Subtract the bits for a +1, 0, -1 turn direction.
      Note: bilateral columns each have only one possible motor direction; they already (-1) oppose each other.
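The motor-out bits above, including the subtract-the-opposing-bits rule, transcribe directly into a small structure. This is just a sketch of the listed layout; the field names are mine, not from any agreed spec.

```python
from dataclasses import dataclass

# Direct transcription of the Motor Out bit layout into a hypothetical structure.
@dataclass
class MotorOut:
    forward: int = 0   # 1-bit forward thrust
    reverse: int = 0   # 1-bit reverse thrust
    left: int = 0      # 1-bit left thrust
    right: int = 0     # 1-bit right thrust

    def drive_direction(self):
        # Subtracting the opposing bits yields +1, 0 or -1, as described above.
        return self.forward - self.reverse

    def turn_direction(self):
        return self.right - self.left

m = MotorOut(forward=1, left=1)
print(m.drive_direction(), m.turn_direction())   # 1 -1
```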

This is a very cool research project. I’ve been keeping an eye on it for a long time, but I need to catch up with what you have here!

Just FYI, we have a grid cell encoder for both C++ and Python. It also comes with nice visualizations.
It would be great if you could validate it for us and use it.
Cheers, breznak
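To illustrate the underlying idea (this is not the library's API, just a sketch of the principle): a grid-cell-style encoder represents a 2D position with several modules, each tiling space periodically at its own scale, so nearby positions share activations while distant ones rarely do.

```python
# Sketch of the grid-cell encoding principle: each module reports the phase
# of the position within its own periodic tiling. Scales and bin count are
# arbitrary illustrative choices.
def grid_cell_code(x, y, scales=(0.3, 0.5, 0.7), bins=5):
    active = []
    for m, s in enumerate(scales):
        # Phase of the position within this module's period, quantized to bins.
        i = int((x / s) % 1.0 * bins)
        j = int((y / s) % 1.0 * bins)
        active.append((m, i, j))
    return active

print(grid_cell_code(0.0, 0.0))   # origin activates bin (0, 0) in every module
```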