2D Object Recognition Project

I pretty much followed this kind of approach on a previous Unity project I had going on 3D maze navigation.

I was compiling to WebGL so it would run in the browser, and using HTM.js as the HTM implementation. I was passing vision bitmaps and navigation commands back and forth over a fairly simple bridge; the C# side was just handling screenshot encoding and firing movement vectors.

If there’s any interest in this approach I can probably dig up the code.


Unity works fine with NuPIC. But I don’t want to use any virtual environment early in this process. It is more important to get a scenario that tests the theory out and sets up the simplest environment where it works. A simple 2D environment like a grid is the best place to start IMO. Any environment API we add at this point is overhead we don’t need.


I was wrong about needing displacement cells for this task. As long as we are not generating movement or composing objects, we don’t need displacements. We just need consistent location representation and the right layers and connectivity.


Also @lscheinkman pointed me to where these networks are described in htmresearch:

Looks pretty close to my diagram, doesn’t it? :slight_smile:


Sounds like an interesting project. I’d like to help if I can.

I’ve been working on a proof of concept app for doing stereo vision saccades for a while now. My objective has been to see if there is perhaps some natural architecture choice which would lead to the eyes saccading together. Another theory that I wanted to test was that the system would learn to saccade towards areas in the input that were producing unexpected behavior. In other words, they would tend to ignore stationary and simple movements in favor of focusing on input areas that were behaving unpredictably - motor output driven (or at least influenced) by bursting columns.

The input consists of a pair of 2D Cartesian grids of cells acting as retinas. I’ve also experimented with a radial density function to get more of a foveated effect. These inputs are then tied to hidden layers before being output to a pair of motor control layers. The motor control layers are also 2D Cartesian grids. I am currently interpreting the output layer as a weighted sum of the nodal positions to find the geometric center of activation for each layer. I then use the offsets from the origin to update the orientation for each eye - sort of like a joystick push. Perhaps you could use a similar mechanic to drive the movement of your sensor.
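
For anyone who wants to try the "joystick push" readout described above, here is a minimal sketch in plain Python. It assumes the motor layer is a 2D list of non-negative activations; the names `activation_center`, `update_orientation`, and the `gain` parameter are made up for illustration and are not from the original app.

```python
def activation_center(layer):
    """Activation-weighted mean (x, y) of a 2D layer, as an offset from the layer's center."""
    rows, cols = len(layer), len(layer[0])
    total = sum(sum(row) for row in layer)
    if total == 0:
        # No activity: no push in either direction.
        return (0.0, 0.0)
    mean_x = sum(x * a for row in layer for x, a in enumerate(row)) / total
    mean_y = sum(y * a for y, row in enumerate(layer) for a in row) / total
    # Shift so that the middle of the grid is the origin.
    return (mean_x - (cols - 1) / 2, mean_y - (rows - 1) / 2)


def update_orientation(yaw, pitch, layer, gain=0.01):
    """Nudge an eye's orientation by the offset of the activation center (hypothetical gain)."""
    dx, dy = activation_center(layer)
    return (yaw + gain * dx, pitch + gain * dy)


# A 5x5 motor layer with a single active cell on the right edge.
motor = [[0.0] * 5 for _ in range(5)]
motor[2][4] = 1.0
offset = activation_center(motor)
```

With the single active cell above, `offset` is `(2.0, 0.0)`, i.e. a push to the right and no vertical push. The same offset could drive a sensor position on a grid instead of an eye orientation.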

Here’s a screen grab from an earlier incarnation written in JavaScript using ThreeJS for visualization purposes. Top left is the scene (blue orb is the head, inset spheres are the eyes), bottom half is the rendered view from each eye, and the top right is the projection of these views onto the retina layers (separate red, green, and blue layers for both eyes).


I suspect that some (most?) of the visual planning is done with sub-cortical structures.

One of the way-stations on the way to V1 is the brainstem, and this tap feeds the amygdala. Considerable evidence points to early visual primitive recognition there for things like faces, secondary sexual characteristics, and basic animal shapes. I am sure that there are pathways from that area that are elaborated through the prefrontal cortex to drive the FEF to focus on these features.

I see a tangential conversation emerging, so I’m going to pull this back a little.

In this experiment, there will be no real agency (at least not initially). And what I mean by agency is that the agent is causal to the next action taken. For the agent to have an ability to influence the next movement, HTM theory says we must introduce displacement cells.

So I’m putting this off as much as possible. But we should be able to prove out some simple form of object classification via random or scripted movements through the environment space and identify collections of features as objects without agency and without displacements.
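
To make the "scripted movements, no agency" idea concrete, here is a rough sketch in plain Python. It is not HTM: a simple candidate-elimination loop stands in for the layers, and the object library, the `SCRIPT` path, and the function names are all hypothetical. It also assumes locations are given in each object's own reference frame.

```python
# Hypothetical objects: each maps (x, y) grid locations to a feature label.
OBJECTS = {
    "square": {(0, 0): "A", (1, 0): "B", (0, 1): "B", (1, 1): "A"},
    "stripe": {(0, 0): "A", (1, 0): "A", (0, 1): "B", (1, 1): "B"},
}

# Scripted movement: a fixed list of (dx, dy) steps, so the agent never chooses.
SCRIPT = [(1, 0), (0, 1), (-1, 0)]


def sense(obj, start=(0, 0), script=SCRIPT):
    """Walk the scripted path and yield (location, feature) pairs."""
    x, y = start
    yield (x, y), obj.get((x, y))
    for dx, dy in script:
        x, y = x + dx, y + dy
        yield (x, y), obj.get((x, y))


def classify(sensations, library=OBJECTS):
    """Drop every candidate object that disagrees with a sensed feature-at-location."""
    candidates = set(library)
    for location, feature in sensations:
        candidates = {name for name in candidates
                      if library[name].get(location) == feature}
    return candidates


result = classify(sense(OBJECTS["stripe"]))
```

Both objects share the feature at `(0, 0)`, so the first sensation is ambiguous; the second step at `(1, 0)` is enough to leave only `"stripe"` in `result`.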

BTW I’m going to talk in detail about this project spec tomorrow at the HTM Hackers' Hangout - Mar 1, 2019. I’m hoping to clear up some confusion (my own included). You are all free to join, but I know the timing is bad for those of you in Asia (sorry!). The video will be available afterwards for anyone to watch, so I will do a complete review of the project as it stands in my head, and you can all post questions / comments here on this thread.

EDIT: I should also note that I updated drawings in my posts above.


OpenAI has released a pretty fun 2D environment. Might be worth trying after we finish the first two phases of this project.


Thanks @Falco for joining. Here’s my code.

I’ll take a swing at this too.


If you keep the same hours, I can make it every day except Tuesdays. But I can always watch later of course.

Thanks for doing this.


As suggested, I spent the majority of my work day today working on defining the object schema for experiments in this 2D object space.

The relevant material starts at about 1:30. Here is the Trello board and here is the code.

Also: the result.


I watched your stream last night and this morning. Best episode so far I think. :+1:

Here are some suggestions (just for consideration):

  • This app you’re writing is really an editor, so you could name it 2D Object Editor. And a 2D object is flat, so maybe… FlatEditor? Or you could call it Flatland, in homage to Edwin A. Abbott. (People will definitely want to find out what you’re doing in Flatland).

  • In the header, you might consider adding the number of different features in the object after the dimensions. For some tests you might want to select only objects from your library with a specific number or range of features.

  • If you want to make your objects form-agnostic, why not just number your feature types (0 -> 255) in the csv file and have your editor display whatever symbols (A, B, :heavy_check_mark:, :negative_squared_cross_mark:, …) you like. Personally I would use colored boxes for clarity, but whatever…

  • Since you can ignore certain lines in the csv file, maybe an idea would be to add a light explanation of the header info and the data structure. This is very useful in code, so why not use it directly in data? One or two lines would be enough.

  • Eating raw onions is good for your teeth. Onions are a natural bactericide.
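
To illustrate the csv suggestions above (numbered feature types, a feature count in the header, and explanatory comment lines), here is a rough sketch in Python. The file layout, the `#` comment convention, the meaning of `0` as "empty", and the symbol table are all assumptions made up for this example, not the project's actual schema.

```python
import csv

# Hypothetical display mapping: the file only stores numbers, the editor picks symbols.
SYMBOLS = {0: ".", 1: "A", 2: "B"}

SAMPLE = """\
# header: width,height,feature_count
# data: one line per grid row, each cell a feature type id (0-255, 0 = empty)
3,2,2
1,0,2
0,1,0
"""


def load_object(text):
    """Parse the sample layout, skipping blank lines and '#' comment lines."""
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    rows = list(csv.reader(lines))
    width, height, feature_count = (int(v) for v in rows[0])
    grid = [[int(v) for v in row] for row in rows[1:]]
    if len(grid) != height or any(len(row) != width for row in grid):
        raise ValueError("grid does not match the header dimensions")
    distinct = {v for row in grid for v in row if v != 0}
    if len(distinct) != feature_count:
        raise ValueError("feature count in header does not match the data")
    return grid


def render(grid, symbols=SYMBOLS):
    """Turn feature ids into whatever symbols the editor wants to show."""
    return "\n".join("".join(symbols.get(v, "?") for v in row) for row in grid)


grid = load_object(SAMPLE)
picture = render(grid)
```

Because the data is just numbers, swapping `SYMBOLS` for colored boxes or emoji would not touch the files at all, and the header count makes it cheap to filter a library by number of features.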



I have been working on this project on Twitch today with @codeallthethingz and @Tachion.

We have an object schema defined in YAML and some object visualization code in JavaScript (which might change soon, but anyway). We also have some Python code with an example text environment and object loading (including tests!).

Please have a look if you are interested. We are keeping the Trello board updated as we go. See video here. Sorry it is 3 hours.

I made an environment to work with using pygame.

The easy part is over; next I’m going to work on movements and hooking some HTMs up to this.


I talked about this spontaneously this morning. Here are the videos:


I’ll talk more about this Thursday on Twitch, but I am going to pause work on this project. Instead, I’m going to work on Building HTM Systems.

I’m doing this because we are getting a lot of new folks here wanting to know how Spatial Pooling works, how the TM works, how Encoding works, etc. I would love to work with you more experienced people on this 2D project, but I can see that the newer crowd needs this more. Having https://buildinghtm.systems fleshed out with a complete reference implementation could be huge for HTM adoption.

I promise I’ll be back, but I am going to put some effort into basic HTM reference documentation first.

You are all still welcome to work on this project in the meantime! In fact, if anyone were willing to live-stream work on this project on Twitch, I would support them by hosting their work on my channel.


I love the idea, but I also think we are ready for what Jeff described last week for a cortical column.

Bitking found evidence of the required Speed variable, so we now have place, body/head-direction, border/boundary, conjunctive, speed and grid cells to work with, which greatly simplifies the challenge of programming a self-exploring agent.
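
As a rough illustration of how speed and head-direction signals could feed a location estimate, here is a plain dead-reckoning sketch in Python. It is only a stand-in under loud assumptions: the function names are hypothetical, and the "grid phase" is simply position modulo a scale on a square lattice, which is far simpler than real grid cell modules.

```python
import math


def integrate_path(start, steps, dt=1.0):
    """Dead-reckon a 2D position from (speed, heading_in_radians) samples."""
    x, y = start
    for speed, heading in steps:
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
    return (x, y)


def grid_phase(position, scale):
    """Position modulo a module's scale: a crude square-lattice stand-in for grid cell activity."""
    x, y = position
    return (x % scale, y % scale)


# One step east at speed 1, then one step north at speed 2.
position = integrate_path((0.0, 0.0), [(1.0, 0.0), (2.0, math.pi / 2)])
phase = grid_phase(position, scale=5.0)
```

The point of the sketch is only that a location representation can be updated from movement alone, without sensing the location directly.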

To show why the Objects=Places thinking holds while also simplifying the recognition problem, a 2D cup would be drawn into a flatland world as a solid sphere with a short projection out the side that the agent senses by bumping into it. Optionally, when the cup is tilted, the top slice at the rim creates an opening, so the agent can get trapped inside for a while and feel the shape of the inside too.

Letters of the alphabet can be used to place cups in a room, but in the virtual world all solid objects are most simply places whose surface can be physically touched, so the agent only needs one or more touch sensors. After that, a flatland view of the Numenta logo (optionally presented sequentially) can be painted onto the outer surface of a cup and onto other objects, including ones of various sizes on walls.