2D Object Recognition Project


I use GitHub for this. We could create a new repo with an issue tracker?


It will be a while. I have partial storyboards for the next two episodes (the 2nd covers the Columns+ paper), but I really want to keep up with Subutai and Luiz’s work with CNNs (which you will all see soon enough).


Good question… We haven’t decided which framework we are going to use yet. NuPIC/htmresearch is Python 2, and the community fork is not ready yet. I’m tempted to use tiny-htm as it is fast and is my own framework… but it’s in C++ and I’m the only user for now. (Adding Python 3 bindings to tiny-htm should be easy, though.)

Is there a common ground we can all agree on?


So far you are the only person who has committed to working on this, so I think it’s up to you.

Keep in mind that I will be your new user. :wink:


I took some liberties to get this kicked off. Since @marty1885 and @SimLeek expressed interest, I’ve invited them to collaborate.

We’ll be using the issue tracker there unless someone has objections. Seems like a good easy place to start. I’ll fill out the README with the high level stuff.


What kind of sensors are we talking about here when we say object recognition? (tactile, audio, visual)

Do we want something universal that’s similar to the examples given in htmresearch/papers?


This is entirely simulated, so my idea of a sensor is extremely simple.

  • it exists at one location in the 2D space
  • it receives the feature(s) in that location (an SDR)

That’s it. It is like a little window into the 2D space that gets fed upward into the CC. The agent might move the sensor to a new location, in which case the CC gets the movement command and then predicts what feature(s) the sensor will “see” there.
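As a concrete sketch, the sensor described above fits in a few lines of Python. Everything here (the `Sensor` name, the boolean-vector SDR encoding, the wrap-around movement) is an illustrative assumption, not an agreed design:

```python
import numpy as np

class Sensor:
    """A simulated sensor: a point in a 2D grid of feature SDRs."""

    def __init__(self, world, x=0, y=0):
        # world: array of shape (height, width, sdr_size), one SDR per location
        self.world = world
        self.x, self.y = x, y

    def sense(self):
        # Return the feature SDR at the sensor's current location.
        return self.world[self.y, self.x]

    def move(self, dx, dy):
        # Movement command; the CC would receive (dx, dy) as its motor input
        # and predict the next sensed SDR before sense() is called again.
        h, w, _ = self.world.shape
        self.x = (self.x + dx) % w
        self.y = (self.y + dy) % h

# Tiny 4x4 world with pseudo-random 8-bit feature SDRs
rng = np.random.default_rng(0)
world = rng.random((4, 4, 8)) > 0.7
s = Sensor(world)
feat = s.sense()   # SDR at (0, 0)
s.move(1, 0)       # sensor steps right; the next sense() reads (1, 0)
```

The sensor itself has no notion of orientation, matching the simplification above.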


I saw this example a long time ago and I think it would be a good place to start.

Given that the outputs should reflect the inputs in the form of movement, HTM should be able to correctly predict its next move (the rock-paper-scissors example Marty made a while ago showed improvements over the base HTM algorithm). Later we could implement some form of grid cells to represent LOCATION and learn a map of said territory.

Putting together a project out of other people’s projects, with esoteric design features only the original coder understands, has always been a major pain for me.

I had the idea of implementing something similar, but decided to wait until Subutai & Jeff finished their paper tackling place cells, hoping it would give me better insight into how to practically design something like “environment recognition”.

@rhyolight, do you believe a location could be classified as “a bunch of signature features (like Aunt Laura’s carpet in the intelligence framework / HTM School video) of a location space clicking together”?

What I’m trying to figure out is how this can be encoded in a way where I can draw out, in grid-cell form, a map of WHERE I currently am. The video I linked previously should be a good playground for this question, since a neural network is more or less deterministic with respect to the environment it was set to learn, including the little quirks of how to complete a maze. From there we can play with its inputs and outputs as we like. Maybe we can teach it how to complete a similar but different maze, in a semantic sense.


Thanks for the example, but there is a good reason why HTM won’t work well on that problem yet: you must model orientation in the car racing game. The example I set up above specifically removes orientation from the equation, because no one really knows how the brain does it yet. We know that head direction cells are the right direction to think about this, and that there are probably similar orientation cells or modules in the cortex.

Also, I have a good reason not to think about movement in a controlled fashion at this point. The car race example requires control immediately. In fact it tests control. That’s not what this experiment is doing. This is about object modeling, not control.

You’re describing place cells, which are probably learned by putting together lots of sensory features with GCM space locations.


A Hard Testable Goal

If we have a 2D environment, we can load different sets of features into it. To test that our object modeling system is working, we can do the following:

  • create a set of several “objects” which are really just environments filled with pseudo-random features
  • train an agent on each object

We should be able to classify objects based on the Object Layer representation.
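A minimal sketch of this test, assuming (purely for illustration) that the Object Layer representation behaves like a union of the sparse feature SDRs encountered on an object; a real HTM output layer forms its representation differently, but overlap-based classification would look roughly like this:

```python
import numpy as np

rng = np.random.default_rng(42)
N_OBJECTS, GRID, SDR = 5, 3, 512

# Each "object" is a GRIDxGRID environment filled with pseudo-random
# sparse feature SDRs (~2% active bits).
objects = rng.random((N_OBJECTS, GRID, GRID, SDR)) < 0.02

# Stand-in for the learned Object Layer representation: the union of all
# feature bits the agent encountered while exploring that object.
reps = objects.reshape(N_OBJECTS, -1, SDR).any(axis=1)

def classify(sensations):
    """Pick the object whose representation best overlaps the sensed bits."""
    union = np.any(sensations, axis=0)
    return int(np.argmax((reps & union).sum(axis=1)))

# Sense three locations on object 2; its identity should be recoverable
# from the Object Layer representation alone.
sensed = objects[2, [0, 1, 2], [0, 1, 2]]
```

With sparse SDRs, a handful of sensed features is usually enough to separate the objects, which is what makes this a hard but testable goal.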


Are you planning to have training as a separate step from prediction/classification? I’d be interested in helping implement online learning for activity in the Output Layer (perhaps as a “phase 2” goal). I’ve been working on a similar use case (forming stable representations for sequences in a TM layer, which could be applied to forming object representations from activity in multiple SMI Input Layers).


If you think of a dendrite festooned with synapses, it is an input device that samples the local environment. A given dendrite can only reach about 250 µm away, giving a circle about 500 µm in diameter for a given mini-column. Remember that these mini-columns are pitched about 30 µm apart, so a dendrite can reach about 8 mini-columns away in any direction, or about 225 mini-columns total.
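A quick back-of-the-envelope check of those figures (250 µm reach and 30 µm pitch, both taken from the description above, on an idealized square grid):

```python
reach_um, pitch_um = 250, 30
reach_cols = reach_um / pitch_um   # ~8.3 mini-columns in any direction

# Count mini-columns whose centers fall within dendrite reach of the
# center column, assuming an idealized square grid at 30 um pitch.
within = sum(1
             for i in range(-9, 10)
             for j in range(-9, 10)
             if (i * i + j * j) * pitch_um ** 2 <= reach_um ** 2)
```

This lands around 220 columns, consistent with the “about 225” figure.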

The lateral connections allow for a distant winning cell (beyond the reach of the dendrites) to add input/excitement. These are sideways branching projections of the output axons from a cell. I tend to focus on the topology of L2/3 as this is the pattern matching & intermap signalling layer; deeper layer lateral connections have a somewhat longer reach.

Do keep in mind that EVERY mini-column has these lateral connections shooting out in random directions and they are all working at the same time. They are all about the same length and they influence a population of distant cells in a rough circle around the cell body. I am not sure of the count but let’s start with 10 or so as a working number.

What I see as the most important feature is that this allows coverage of an area larger than any one column could cover by itself, via voting: each column sees a pattern, but the two cells working together signal that they are seeing parts of a larger pattern.

See this picture:

Each little circle is an individual mini-column with about 100 cell bodies. The larger black circle is the dendrite reach of the center mini-column. The black beam in this picture is the long-distance lateral connection between the two center mini-columns, so that the two mini-column “receptive fields” connected by this link cover the space with very little overlap and very little area missed.

I made this diagram to show the correspondence between the biology and this idealized diagram.

There are other features/advantages:

Important point: in HTM we have binary signals that either fire or don’t, and the temporal memory is a rigid sequential operation. Real nerve firing is binary AND rate-oriented, adding an analog dimension; there are weak and strong signals, and they can build over time.

Ideally - this lateral signal should push a cell that is on the edge of firing into firing and learning its inputs. In this way, a mini-column will help other mini-columns to learn a new pattern.

These connections should also allow three or more cells that are sensing a weak and noisy signal to “egg each other on” and agree that they do know this pattern and fire.

One other important bit: These lateral connections are the input to fire the inhibitory interneurons. The inhibitory interneurons should act to suppress other mini-columns that are not as sure of themselves because they have a weaker match to the input; this acts as a filter to pick weak signals out of the noise.

The signal weighting is very important - it should not force firing nor should it be so weak it is irrelevant. The balance between these lateral connections and the inhibitory interneurons is an important parameter and I suspect that models will have to tune this to get the best performance.
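A toy numerical sketch of the dynamics described in the last few paragraphs. All weights, thresholds, and the one-dimensional neighbor layout are made-up illustrative values, not biological parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10
match = rng.random(n)      # feedforward match strength per mini-column
threshold = 0.8
lateral_w = 0.15           # excitation received from a firing neighbor
inhib_w = 0.3              # inhibition driven by the firing population

# Cells that fire on feedforward input alone.
firing = match > threshold

# One settling step: lateral input pushes near-threshold columns over the
# edge, while interneuron-style inhibition suppresses clearly weaker matches.
lateral = lateral_w * np.convolve(firing, [1, 0, 1], mode='same')
inhib = inhib_w * firing.sum() * (match < 0.5 * match.max())
firing = (match + lateral - inhib) > threshold
```

The single settling step stands in for what would be an iterative dynamic in a real model; tuning `lateral_w` against `inhib_w` is exactly the balance described above.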

I hope this helps.


I don’t see any reason to separate into training / testing phases.

This is the last layer we’ll be working on, but I’m sure we will welcome your help.


Excellent. I’ll work out the algorithm in parallel and post a demo with details in another thread. It should at least fuel some ideas for discussion when it comes time to implement the output layer. I’m not sure about biological plausibility, so that will be an important aspect to discuss.


I found a nice implementation of an object-space environment: https://github.com/vicariousinc/pixelworld



Would it not be best to use something like Unity Machine Learning to create the 2D world and then control the agent with an HTM-based model?

They make building virtual worlds with actions and rules easy; they expose the outputs and take in actions.

I have been studying up on HTM since I wanted to start a Unity AI agent-based project built on HTM. Very glad that the community is also starting down the agent route now.


Unity seems to be all about C#/.NET.
If you want to stay with that, there are some HTM tools written that way.
Please see:


The proposal is not to code HTM in Unity / C#.

The only contribution from the Unity side is to easily create game environments and control agents in such an environment.

They have a Python API, so the “brain” can be external. Any bits of C# script would only be needed for setting up the 2D environment.

They output “sensor” data to the Python API and accept actions; the API also allows some control of the Unity game environment.


I’m not sure how important it is to choose a game environment here. In fact, we may be adding bloat by doing so. I mean, this took me nine lines of code:


import numpy as np
from opensimplex import OpenSimplex

tmp = OpenSimplex()
img = np.zeros((300, 300, 3))

# Threshold 2D simplex noise into a binary feature image.
it = np.nditer(img, flags=['multi_index'])
while not it.finished:
    x, y, c = it.multi_index
    img[x, y, :] = 1.0 if tmp.noise2d(x / 100.0, y / 100.0) > 0.0 else 0.0
    it.iternext()  # advance to the next element


There’s also a cv2.setMouseCallback function if you want to add interaction. And you can just load an image in.

Conway’s Game of Life took 16 lines to code. https://github.com/SimLeek/cv_pubsubs/blob/master/tests/test_sub_win.py#L111


I pretty much followed this kind of approach on a previous Unity project I had going on 3D maze navigation.

I was compiling to WebGL so it would run in the browser, and using HTM.js as the HTM implementation. I was passing vision bitmaps and navigation commands back and forth over a fairly simple bridge; the C# side was just handling screenshot encoding and firing movement vectors.

If there’s any interest in this approach I can probably dig up the code.