2D Object Recognition Project


Unity works fine with NuPIC. But I don’t want to use any virtual environment early in this process. It is more important to get a scenario that tests the theory out and sets up the simplest environment where it works. A simple 2D environment like a grid is the best place to start IMO. Any environment API we add at this point is overhead we don’t need.
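To make "a simple 2D environment like a grid" concrete, here is a minimal sketch of what such an environment could look like with no external API at all. Everything in it (the `GridWorld` class, the `sense` and `move` methods, the `random_walk` helper, and the feature labels) is a hypothetical name chosen for illustration, not something from NuPIC or htmresearch.

```python
import random


class GridWorld:
    """Hypothetical 2D grid: each cell holds a feature label or nothing."""

    def __init__(self, width, height, features):
        self.width = width
        self.height = height
        # features maps (x, y) -> feature label; unlisted cells are empty.
        self.features = dict(features)

    def sense(self, position):
        """Return the feature at a position, or None for an empty cell."""
        return self.features.get(position)

    def move(self, position, delta):
        """Apply a movement, clamping the result to the grid edges."""
        x = min(max(position[0] + delta[0], 0), self.width - 1)
        y = min(max(position[1] + delta[1], 0), self.height - 1)
        return (x, y)


def random_walk(world, start, steps, seed=0):
    """Yield (position, feature) pairs along a seeded random walk."""
    rng = random.Random(seed)
    moves = [(0, 1), (0, -1), (1, 0), (-1, 0)]
    position = start
    for _ in range(steps):
        position = world.move(position, rng.choice(moves))
        yield position, world.sense(position)
```

A scripted walk would simply replace `rng.choice(moves)` with a fixed list of movements; either way the sensor is moved by the script, not by the agent.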


I was wrong about needing displacement cells for this task. As long as we are not generating movement or composing objects, we don’t need displacements. We just need consistent location representation and the right layers and connectivity.


Also @lscheinkman pointed me to where these networks are described in htmresearch:

Looks pretty close to my diagram, doesn’t it? :)


Sounds like an interesting project. I’d like to help if I can.

I’ve been working on a proof of concept app for doing stereo vision saccades for a while now. My objective has been to see if there is perhaps some natural architecture choice which would lead to the eyes saccading together. Another theory that I wanted to test was that the system would learn to saccade towards areas in the input that were producing unexpected behavior. In other words, they would tend to ignore stationary and simple movements in favor of focusing on input areas that were behaving unpredictably - motor output driven (or at least influenced) by bursting columns.

The input consists of a pair of 2D Cartesian grids of cells acting as retinas. I’ve also experimented with a radial density function to get more of a foveated effect. These inputs are then tied to hidden layers before being output to a pair of motor control layers. The motor control layers are also 2D Cartesian grids. I am currently interpreting the output layer as a weighted sum of the nodal positions to find the geometric center of activation for each layer. I then use the offsets from the origin to update the orientation for each eye - sort of like a joystick push. Perhaps you could use a similar mechanic to drive the movement of your sensor.
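For anyone who wants to try the joystick-style mechanic, here is a rough sketch of the weighted-sum step. It assumes the origin is the center of the motor grid and that the x and y offsets map to yaw and pitch through a made-up `gain` factor; the function names are hypothetical and this is not the code from my app.

```python
def activation_centroid(activations):
    """Activation-weighted mean of cell positions, relative to the grid center.

    activations is a list of rows of non-negative numbers.
    Returns (dx, dy), or (0.0, 0.0) when no cell is active.
    """
    height = len(activations)
    width = len(activations[0])
    # Assumption: the origin sits at the geometric center of the grid.
    center_x = (width - 1) / 2.0
    center_y = (height - 1) / 2.0
    total = sum_x = sum_y = 0.0
    for y, row in enumerate(activations):
        for x, activation in enumerate(row):
            total += activation
            sum_x += activation * (x - center_x)
            sum_y += activation * (y - center_y)
    if total == 0:
        return (0.0, 0.0)
    return (sum_x / total, sum_y / total)


def update_orientation(yaw, pitch, offset, gain=0.1):
    """Nudge an eye's orientation by the centroid offset, like a joystick push."""
    dx, dy = offset
    return (yaw + gain * dx, pitch + gain * dy)
```

A centered or empty activation pattern leaves the eye where it is, while activity toward one edge pushes the orientation in that direction in proportion to the offset.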

Here’s a screen grab from an earlier incarnation written in JavaScript using ThreeJS for visualization purposes. Top left is the scene (blue orb is the head, inset spheres are the eyes), bottom half is the rendered view from each eye, and the top right is the projection of these views onto the retina layers (separate red, green, and blue layers for both eyes).


I suspect that some (most?) of the visual planning is done with sub-cortical structures.

One of the way-stations on the way to V1 is the brain-stem, and this tap feeds the amygdala. Considerable evidence points to early recognition of visual primitives there for things like faces, secondary sexual characteristics, and basic animal shapes. I am sure that there are pathways from that area that are elaborated through the prefrontal cortex to drive the FEF to focus on these features.


I see a tangential conversation emerging, so I’m going to pull this back a little.

In this experiment, there will be no real agency (at least not initially). By agency I mean that the agent causes the next action taken. For the agent to have an ability to influence the next movement, HTM theory says we must introduce displacement cells.

So I’m putting this off as much as possible. But we should be able to prove out some simple form of object classification via random or scripted movements through the environment space and identify collections of features as objects without agency and without displacements.
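As a rough illustration of classifying objects from sensed features without agency, the sketch below treats each object as a set of features at object-relative locations and narrows the candidates with each sensation. This is plain set filtering standing in for what the layers would do with real representations; it is not the htmresearch network, and `classify` and the toy objects are hypothetical.

```python
def classify(objects, sensations):
    """Return the names of objects consistent with every sensation so far.

    objects maps a name to a dict of {location: feature}.
    sensations is an iterable of (location, feature) pairs gathered by
    random or scripted movement; the order does not matter.
    """
    candidates = set(objects)
    for location, feature in sensations:
        candidates = {
            name for name in candidates
            if objects[name].get(location) == feature
        }
    return candidates


# Two toy objects that share a feature at (0, 0) but differ at (0, 1).
toy_objects = {
    "L": {(0, 0): "A", (0, 1): "A", (1, 0): "B"},
    "T": {(0, 0): "A", (0, 1): "B", (1, 0): "B"},
}
```

A single sensation at `(0, 0)` leaves both toy objects as candidates; adding a second sensation at `(0, 1)` is what separates them, which is the reason a consistent location representation matters.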


BTW I’m going to talk in detail about this project spec tomorrow at the HTM Hackers' Hangout - Mar 1, 2019. I’m hoping to clear up some confusion (my own included). You are all free to join, but I know the timing is bad for those of you in Asia (sorry!). The video will be available afterwards for anyone to watch, so I will do a complete review of the project as it stands in my head, and you all can post questions / comments here on this thread.

EDIT: I should also note that I updated drawings in my posts above.


OpenAI has released a pretty fun 2D environment. Might be worth trying after we finish the first two phases of this project.


Thanks @Falco for joining. Here’s my code.


I’ll take a swing at this too.


If you keep the same hours, I can make it every day except Tuesdays. But I can always watch later of course.

Thanks for doing this.