2D Object Recognition Project

Think of a dendrite festooned with synapses as an input device that samples the local environment. A given dendrite can only reach about 250 µm, so the dendrites of a given mini-column cover a circle roughly 500 µm in diameter. Remember that these mini-columns are pitched about 30 µm apart, so a dendrite can reach about 8 mini-columns away in any direction, or about 225 mini-columns in total.
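As a quick sanity check on that arithmetic, here is a minimal sketch that counts the mini-columns inside the dendrite reach. It assumes an idealized square grid (real packing is closer to hexagonal, which would give a somewhat higher count), and the function name is purely illustrative.

```python
import math

def minicolumns_in_reach(reach_um=250, pitch_um=30):
    """Count grid positions whose centre lies within the dendrite reach.

    Assumes an idealized square grid of mini-columns (an assumption,
    not a claim about real cortical packing).
    """
    steps = reach_um // pitch_um  # whole mini-columns reachable along one axis
    count = 0
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            # Compare squared distances so everything stays in integers.
            if (i * pitch_um) ** 2 + (j * pitch_um) ** 2 <= reach_um ** 2:
                count += 1
    return steps, count

steps, count = minicolumns_in_reach()
area_estimate = math.pi * (250 / 30) ** 2  # continuous (area-based) estimate
print(steps, count, round(area_estimate))
```

Both the grid count and the area-based estimate land a little above 200, so the "about 225" working figure is in the right range.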

The lateral connections allow for a distant winning cell (beyond the reach of the dendrites) to add input/excitement. These are sideways branching projections of the output axons from a cell. I tend to focus on the topology of L2/3 as this is the pattern matching & intermap signalling layer; deeper layer lateral connections have a somewhat longer reach.

Do keep in mind that EVERY mini-column has these lateral connections shooting out in random directions and they are all working at the same time. They are all about the same length and they influence a population of distant cells in a rough circle around the cell body. I am not sure of the count but let’s start with 10 or so as a working number.

What I see as the most important feature is that this allows coverage of an area larger than any one column could cover by itself, with voting: each cell sees a pattern, but the two cells working together signal that they are seeing part of a larger pattern.

See this picture:

Each little circle is an individual mini-column with about 100 cell bodies. The larger black circle is the dendrite reach of the center mini-column. The black beam in this picture is the long-distance lateral connection between the two center mini-columns, so that the two mini-column “receptive fields” connected by this link cover the space with very little overlap and very little area missed.

I made this diagram to show the correspondence between the biology and this idealized diagram.

There are other features/advantages:

Important point: In HTM we have binary signals that either fire or do not, and the temporal memory is a rigid sequential operation. Real nerve firing is binary AND rate-coded, which adds an analog dimension; there are weak and strong signals, and they can build over time.

Ideally - this lateral signal should push a cell that is on the edge of firing into firing and learning its inputs. In this way, a mini-column will help other mini-columns to learn a new pattern.

These connections should also allow three or more cells that are sensing a weak and noisy signal to “egg each other on” and agree that they do know this pattern and fire.
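To make the “egg each other on” idea concrete, here is a toy sketch. It is not HTM or NuPIC code; the function name, the threshold, and the lateral weight are all made-up values chosen for illustration. Each mini-column adds a small, capped amount of excitation to the cells it links to, so cells that are close to threshold get pushed over it while cells with almost no match do not.

```python
import numpy as np

def apply_lateral_support(drive, links, threshold=1.0, lateral_weight=0.2):
    """Add lateral excitation to feed-forward drive and report who fires.

    drive : per-mini-column feed-forward match strength (hypothetical units)
    links : list of (source, target) index pairs for lateral connections
    """
    drive = np.asarray(drive, dtype=float)
    support = np.zeros_like(drive)
    for src, dst in links:
        # A source contributes in proportion to its own match, capped at the
        # threshold so a strong cell cannot force a distant cell to fire.
        support[dst] += lateral_weight * min(drive[src], threshold) / threshold
    total = drive + support
    return total >= threshold, total

# Three mini-columns each see a weak, noisy match (below threshold alone).
drive = [0.8, 0.75, 0.8]
links = [(0, 1), (1, 0), (1, 2), (2, 1), (0, 2), (2, 0)]  # all-to-all

alone, _ = apply_lateral_support(drive, links=[])
together, total = apply_lateral_support(drive, links)
print("firing alone:   ", alone)
print("firing together:", together)
```

With these numbers none of the three cells fires on its own, but all three fire once the lateral links are in place. A cell with a very weak match (say 0.1) still stays silent, because the lateral support is too small to carry it over the threshold.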

One other important bit: These lateral connections are the input to fire the inhibitory interneurons. The inhibitory interneurons should act to suppress other mini-columns that are not as sure of themselves because they have a weaker match to the input; this acts as a filter to pick weak signals out of the noise.

The signal weighting is very important - it should not force firing nor should it be so weak it is irrelevant. The balance between these lateral connections and the inhibitory interneurons is an important parameter and I suspect that models will have to tune this to get the best performance.
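Here is a rough sketch of why that balance needs tuning. It is a deliberately crude stand-in for the interneuron circuit: each mini-column is suppressed by its strongest neighbour, scaled by a hypothetical `inhibition_weight`. The neighbourhood layout and all of the numbers are assumptions for illustration only.

```python
import numpy as np

def inhibit(match, neighbours, inhibition_weight=0.5):
    """Suppress mini-columns whose match is weaker than their neighbours'.

    match      : per-mini-column match strength (after any lateral excitation)
    neighbours : dict mapping a column index to the indices that inhibit it
    Returns a boolean array marking the columns that stay active.
    """
    match = np.asarray(match, dtype=float)
    active = np.zeros(match.shape, dtype=bool)
    for col, others in neighbours.items():
        strongest_rival = max((match[o] for o in others), default=0.0)
        # A column survives only if its own match outweighs the scaled
        # inhibition coming from its best-matching rival.
        active[col] = match[col] > inhibition_weight * strongest_rival
    return active

match = [0.9, 0.3, 0.5, 0.1]
ring = {i: [(i - 1) % 4, (i + 1) % 4] for i in range(4)}  # 4 columns in a ring

for w in (0.0, 0.5, 2.0):
    print(w, inhibit(match, ring, inhibition_weight=w))
```

With no inhibition every column stays active and nothing is filtered. A moderate weight keeps the two better matches and drops the weak ones, and a heavy weight leaves only the single strongest column. Where the useful middle ground sits will depend on the model.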

I hope this helps.


I don’t see any reason to separate into training / testing phases.

This is the last layer we’ll be working on, but I’m sure we will welcome your help.


Excellent. I’ll work out the algorithm in parallel and post a demo with details in another thread. Should at least fuel some ideas for discussion when it comes time to implement the output layer. I’m not sure on biological plausibility, so that will be an important aspect to discuss.


I have found a nice implementation of an object-space environment: https://github.com/vicariousinc/pixelworld


Would it not be best to use something like Unity Machine Learning to create the 2D world and then control the agent with an HTM based model?

They make it easy to build virtual worlds with actions and rules; they expose the outputs and take in actions.

I have been studying up on HTM since I wanted to start a UnityAI agent based project based on HTM. Very glad that the community is also starting down the agent route now.


Unity seems to be all about C#/.NET.
If you want to stay with that, there are some HTM tools that are written that way.
Please see:

The proposal is not to code HTM in Unity / C#.

The only contribution from the Unity side is to make it easy to create game environments and to control agents in such an environment.

They have a Python API - so the “brain” can be external. Any bits of C# script would only be needed for setting up the 2D environment.

They output “sensor” data to the Python API and accept actions - the API also allows some control of the Unity game environment.

I’m not sure how important it is to choose a game environment here. In fact, we may be adding bloat by doing so. I mean, this took me nine lines of code:


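import numpy as np
from opensimplex import OpenSimplex
tmp = OpenSimplex()

img = np.zeros((300, 300, 3))

it = np.nditer(img, flags=['multi_index'])
while not it.finished:
    x, y, c = it.multi_index
    img[x, y, :] = 1.0 if tmp.noise2d(x/100.0, y/100.0) > 0.0 else 0.0  # threshold the noise to black/white
    it.iternext()  # advance the iterator; without this the loop never terminates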


There’s also a cv2.setMouseCallback function if you want to add interaction. And you can just load an image in.

Conway’s game of life took 16 lines to code. https://github.com/SimLeek/cv_pubsubs/blob/master/tests/test_sub_win.py#L111


I pretty much followed this kind of approach on a previous Unity project I had going on 3D maze navigation.

I was compiling to WebGL so it would run in the browser, and using HTM.js as the HTM implementation. I was passing vision bitmaps and navigation commands back and forth over a fairly simple bridge; the C# side was just handling screenshot encoding and firing movement vectors.

If there’s any interest in this approach I can probably dig up the code.


Unity works fine with NuPIC. But I don’t want to use any virtual environment early in this process. It is more important to get a scenario that tests the theory out and sets up the simplest environment where it works. A simple 2D environment like a grid is the best place to start IMO. Any environment API we add at this point is overhead we don’t need.


I was wrong about needing displacement cells for this task. As long as we are not generating movement or composing objects, we don’t need displacements. We just need consistent location representation and the right layers and connectivity.


Also @lscheinkman pointed me to where these networks are described in htmresearch:

Looks pretty close to my diagram, doesn’t it? :)


Sounds like an interesting project. I’d like to help if I can.

I’ve been working on a proof of concept app for doing stereo vision saccades for a while now. My objective has been to see if there is perhaps some natural architecture choice which would lead to the eyes saccading together. Another theory that I wanted to test was that the system would learn to saccade towards areas in the input that were producing unexpected behavior. In other words, they would tend to ignore stationary and simple movements in favor of focusing on input areas that were behaving unpredictably - motor output driven (or at least influenced) by bursting columns.

The input consists of a pair of 2D Cartesian grids of cells acting as retinas. I’ve also experimented with a radial density function to get more of a foveated effect. These inputs are then tied to hidden layers before being output to a pair of motor control layers. The motor control layers are also 2D Cartesian grids. I am currently interpreting the output layer as a weighted sum of the nodal positions to find the geometric center of activation for each layer. I then use the offsets from the origin to update the orientation for each eye - sort of like a joystick push. Perhaps you could use a similar mechanic to drive the movement of your sensor.
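For anyone who wants to try the same mechanic, here is a minimal sketch of the weighted-sum step as I read the description above. It is not the original app's code; the function names, the normalisation to roughly [-1, 1], the `gain` value, and the sign conventions for yaw and pitch are all assumptions.

```python
import numpy as np

def activation_offset(activation):
    """Return the (dx, dy) offset of the activation centroid from the grid centre.

    activation : 2D array of non-negative motor-layer activity.
    Offsets are normalised to roughly [-1, 1]; (0, 0) means no push.
    """
    activation = np.asarray(activation, dtype=float)
    total = activation.sum()
    if total == 0:
        return 0.0, 0.0  # silent layer: leave the eye where it is
    rows, cols = activation.shape
    # Node coordinates measured from the grid centre (the origin).
    ys = np.arange(rows) - (rows - 1) / 2.0
    xs = np.arange(cols) - (cols - 1) / 2.0
    cy = (activation.sum(axis=1) * ys).sum() / total
    cx = (activation.sum(axis=0) * xs).sum() / total
    # Guard against a one-cell-wide grid, where the half-width would be zero.
    half_w = max((cols - 1) / 2.0, 1.0)
    half_h = max((rows - 1) / 2.0, 1.0)
    return cx / half_w, cy / half_h

def update_orientation(yaw, pitch, offset, gain=5.0):
    """Nudge the eye orientation (in degrees) like a joystick push."""
    dx, dy = offset
    return yaw + gain * dx, pitch + gain * dy

layer = np.zeros((5, 5))
layer[2, 4] = 1.0  # all activity on the right-hand edge, middle row
offset = activation_offset(layer)
print(offset, update_orientation(0.0, 0.0, offset))
```

Activity concentrated on one edge of the motor layer produces a full push in that direction, while activity spread evenly around the centre cancels out and leaves the eye where it is.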

Here’s a screen grab from an earlier incarnation written in JavaScript using ThreeJS for visualization purposes. Top left is the scene (blue orb is the head, inset spheres are the eyes), bottom half is the rendered view from each eye, and the top right is the projection of these views onto the retina layers (separate red, green, and blue layers for both eyes).


I suspect that some (most?) of the visual planning is done with sub-cortical structures.

One of the way-stations on the way to V1 is the brain-stem, and this tap feeds the amygdala. Considerable evidence points to early visual primitive recognition there for things like faces, secondary sexual characteristics, and basic animal shapes. I am sure that there are pathways from that area that are elaborated through the prefrontal cortex to drive the FEF to focus on these features.

I see a tangential conversation emerging, so I’m going to pull this back a little.

In this experiment, there will be no real agency (at least not initially). And what I mean by agency is that the agent is causal to the next action taken. For the agent to have an ability to influence the next movement, HTM theory says we must introduce displacement cells.

So I’m putting this off as much as possible. But we should be able to prove out some simple form of object classification via random or scripted movements through the environment space and identify collections of features as objects without agency and without displacements.
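To give a feel for the setup, here is a toy sketch of sensing an object through random movements and narrowing down what it could be. This is plain set intersection standing in for the output layer, not HTM code; the object definitions, feature labels, and function names are all hypothetical.

```python
import random

# Hypothetical objects: each maps a grid location (x, y) to a feature label.
OBJECTS = {
    "L-shape": {(0, 0): "A", (0, 1): "A", (0, 2): "B", (1, 2): "C"},
    "bar":     {(0, 0): "A", (0, 1): "A", (0, 2): "A", (0, 3): "B"},
}

def sense(obj, steps, rng):
    """Yield (location, feature) pairs from random jumps over an object."""
    locations = sorted(obj)
    for _ in range(steps):
        loc = rng.choice(locations)  # random movement: the agent has no say
        yield loc, obj[loc]

def classify(observations, objects=OBJECTS):
    """Keep only the objects consistent with every observation so far."""
    candidates = set(objects)
    for loc, feature in observations:
        candidates = {name for name in candidates
                      if objects[name].get(loc) == feature}
    return candidates

rng = random.Random(0)
observations = list(sense(OBJECTS["L-shape"], steps=20, rng=rng))
print(classify(observations))
```

A single ambiguous observation such as feature "A" at (0, 0) leaves both objects as candidates; it takes a distinguishing feature-at-location pair to settle on one. Nothing here needs displacements, only a consistent location for each sensed feature.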

BTW I’m going to talk in detail about this project spec tomorrow at the HTM Hackers' Hangout - Mar 1, 2019. I’m hoping to clear up some confusion (my own included). You are all free to join, but I know the timing is bad for those of you in Asia (sorry!). The video will be available afterwards for anyone to watch, so I will do a complete review of the project as it stands in my head, and you all can post questions / comments here on this thread.

EDIT: I should also note that I updated drawings in my posts above.


OpenAI has released a pretty fun 2D environment. Might be worth trying after we finish the first two phases of this project.


Thanks @Falco for joining. Here’s my code.

I’ll take a swing at this too.


If you keep the same hours, I can make it every day except Tuesdays. But I can always watch later of course.

Thanks for doing this.
