2D Object Recognition Project

This is entirely simulated, so my idea of a sensor is extremely simple.

  • it exists at one location in the 2D space
  • it receives the feature(s) in that location (an SDR)

That’s it. It is like a little window onto the 2D space that feeds upward into the CC. The agent might move the sensor to a new location, in which case the CC gets the movement command and then predicts what feature(s) the sensor will “see” there.
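A minimal sketch of that sensor model (all names here are hypothetical, not from any NuPIC/htmresearch code; features are stood in for by Python sets of active SDR bit indices):

```python
class Sensor:
    """A window onto a 2D feature grid: it sits at one (x, y)
    location and reports the feature SDR stored there."""

    def __init__(self, features, location=(0, 0)):
        self.features = features      # dict: (x, y) -> SDR (set of active bits)
        self.location = location

    def sense(self):
        # Return the feature SDR at the current location (empty if none).
        return self.features.get(self.location, set())

    def move(self, dx, dy):
        # The movement command, which would also be copied up to the CC
        # so it can predict what the sensor will see next.
        x, y = self.location
        self.location = (x + dx, y + dy)
        return (dx, dy)

# Example: two locations, each holding a distinct feature SDR.
features = {(0, 0): {3, 17, 42}, (1, 0): {5, 17, 99}}
s = Sensor(features)
print(s.sense())   # SDR at (0, 0)
s.move(1, 0)       # CC receives this command and predicts...
print(s.sense())   # ...the SDR the sensor now sees at (1, 0)
```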


I saw this example a long time ago, and I think it would be a good place to start.

Since the outputs should reflect the inputs in the form of movement, HTM should be able to correctly predict the next move it would make (given that the rock-paper-scissors example Marty made a while ago showed improvements over the plain HTM algorithm). Later we could implement some form of grid cells to represent LOCATION and learn a map of that territory.

Putting together a project out of other people’s projects, with esoteric design features where only the original coder knows what the bottom line was supposed to mean, has always been a major pain to work with, at least for me.

I had the idea of implementing something similar, but decided to wait until Subutai & Jeff finished their paper that tackles place cells, hoping it would give me better insight into how to practically design something like “environment recognition”.

@rhyolight, do you believe a location could be classified as “a bunch of signature features (like Aunt Laura’s carpet in the intelligence framework/HTM School video) of a location space clicking together”?

What I’m trying to figure out is how this can be encoded in a way where I can draw out a map, in grid cell form, of WHERE I currently am. The video I linked previously should be a good playground for this question, since a neural network is more or less deterministic with respect to the environment it was set to learn (the little quirks of how to complete a maze), and from there we can play with its inputs and outputs as we like. Maybe we can teach it how to complete a similar but different maze, in a semantic sense.


Thanks for the example, but there is a good reason why HTM won’t work well on that problem yet: you must model orientation in the car racing game. The example I set up above specifically removes orientation from the equation, because no one really knows how the brain does it yet. We know that head direction cells are the right direction to think in, and that there are probably similar orientation cells or modules in the cortex.

Also, I have a good reason not to think about movement in a controlled fashion at this point. The car race example requires control immediately. In fact it tests control. That’s not what this experiment is doing. This is about object modeling, not control.

You’re describing place cells, which are probably learned by putting together lots of sensory features with GCM space locations.


A Hard Testable Goal

If we have a 2D environment, we can load different sets of features into it. To test that our object modeling system is working, we can do the following:

  • create a set of several “objects” which are really just environments filled with pseudo-random features
  • train an agent on each object

We should be able to classify objects based on the Object Layer representation.
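That test could be scripted roughly like this (a sketch, not real NuPIC/htmresearch calls; the Object Layer is stood in for by a union of an object's feature SDRs, and classification is nearest-neighbor by bit overlap):

```python
import random

def make_object(n_features=10, sdr_bits=2048, on_bits=40, grid=5, seed=None):
    """An 'object' is just an environment: a grid of locations,
    some of which hold a pseudo-random feature SDR."""
    rng = random.Random(seed)
    locs = rng.sample([(x, y) for x in range(grid) for y in range(grid)],
                      n_features)
    return {loc: frozenset(rng.sample(range(sdr_bits), on_bits)) for loc in locs}

def object_layer_repr(obj):
    # Stand-in for the Object Layer: a stable union of the object's features.
    rep = set()
    for sdr in obj.values():
        rep |= sdr
    return rep

def classify(sensed_features, learned_reps):
    # Pick the learned object whose representation overlaps the sensed bits most.
    sensed = set().union(*sensed_features)
    return max(learned_reps, key=lambda name: len(sensed & learned_reps[name]))

objects = {f"obj{i}": make_object(seed=i) for i in range(4)}
learned = {name: object_layer_repr(obj) for name, obj in objects.items()}

# "Sense" a few features from obj2 and classify from the learned reps.
sample = list(objects["obj2"].values())[:3]
print(classify(sample, learned))   # prints "obj2"
```

The agent's movement is implicit here (we just pick which features get sensed); a real run would gather features by moving the sensor over the object.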


Are you planning to have training as a separate step from prediction/classification? I’d be interested in helping implement online learning for activity in the Output Layer (perhaps as a “phase 2” goal). I’ve been working on a similar use case (forming stable representations for sequences in a TM layer, which could also be applied to forming object representations from activity in multiple SMI Input Layers).


If you think of the dendrite festooned with synapses, it is an input device that samples the local environment. A given dendrite can only reach about 250 µm away, giving a mini-column a reach circle about 500 µm in diameter. Remember that these mini-columns are pitched about 30 µm apart, so a dendrite can reach about 8 mini-columns away in any direction, or about 225 mini-columns total.
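The arithmetic behind those counts, as a quick sanity check (using the 250 µm reach and 30 µm pitch quoted above):

```python
import math

reach_um = 250.0   # dendritic reach from the mini-column center
pitch_um = 30.0    # center-to-center mini-column spacing

# Reach expressed in mini-column pitches.
reach_in_columns = reach_um / pitch_um            # about 8.3 columns
# Columns inside the reach circle (area in pitch units).
area_in_columns = math.pi * reach_in_columns ** 2  # about 218 columns

print(round(reach_in_columns, 1))
print(round(area_in_columns))   # close to the ~225 figure quoted above
```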

The lateral connections allow a distant winning cell (beyond the reach of the dendrites) to add input/excitement. These are sideways-branching projections of the output axons from a cell. I tend to focus on the topology of L2/3, as this is the pattern matching & inter-map signalling layer; deeper-layer lateral connections have a somewhat longer reach.

Do keep in mind that EVERY mini-column has these lateral connections shooting out in random directions, and they are all working at the same time. They are all about the same length, and they influence a population of distant cells in a rough circle around the cell body. I am not sure of the count, but let’s start with 10 or so as a working number.

What I see as the most important feature is that, with voting, this allows coverage of an area larger than any one column could cover by itself - each cell sees a pattern, but the two cells working together signal that they are seeing parts of a larger pattern.

See this picture:

Each little circle is an individual mini-column with about 100 cell bodies. The larger black circle is the dendrite reach of the center mini-column. The black beam in this picture is the long-distance lateral connection between the two center mini-columns, so that the two mini-column “receptive fields” connected by this link cover the space with very little overlap and very little area missed.

I made this diagram to show the correspondence between the biology and this idealized diagram.

There are other features/advantages:

Important point: in HTM we have binary signals that either fire or do not, and the temporal memory is a rigid sequential operation. Real nerve firing is binary AND rate-oriented, adding an analog dimension; there are weak and strong signals, and they can build over time.

Ideally, this lateral signal should push a cell that is on the edge of firing into firing and learning its inputs. In this way, a mini-column will help other mini-columns learn a new pattern.

These connections should also allow three or more cells that are sensing a weak and noisy signal to “egg each other on” and agree that they do know this pattern and fire.

One other important bit: these lateral connections are also the input that fires the inhibitory interneurons. The inhibitory interneurons should act to suppress other mini-columns that are not as sure of themselves because they have a weaker match to the input; this acts as a filter to pick weak signals out of the noise.

The signal weighting is very important - it should not force firing, nor should it be so weak that it is irrelevant. The balance between these lateral connections and the inhibitory interneurons is an important parameter, and I suspect that models will have to tune it to get the best performance.
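One way to picture that balance (a toy sketch, not from any HTM codebase; the threshold and weights are invented parameters): each column has a feedforward match score, lateral input from agreeing neighbors adds a sub-threshold boost, and inhibition suppresses columns far weaker than the strongest one.

```python
def active_columns(ff_score, lateral, threshold=0.8,
                   lateral_weight=0.3, inhibit_ratio=0.6):
    """ff_score: dict col -> feedforward match in [0, 1].
    lateral: dict col -> list of cols it shares lateral links with."""
    boosted = {}
    for col, score in ff_score.items():
        # Lateral support from neighbors that also see a decent match.
        support = sum(ff_score[n] for n in lateral.get(col, [])
                      if ff_score[n] > 0.5)
        boosted[col] = score + lateral_weight * support
    best = max(boosted.values())
    # Inhibitory interneurons: columns far below the strongest are suppressed.
    return {col for col, v in boosted.items()
            if v >= threshold and v >= inhibit_ratio * best}

ff = {"A": 0.9, "B": 0.85, "C": 0.6}           # C is on the edge of firing
links = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}

print(active_columns(ff, links))   # lateral support pulls C over: A, B and C fire
print(active_columns(ff, {}))      # without lateral links, C stays silent
```

Tuning `lateral_weight` up forces firing; tuning it down makes the lateral input irrelevant, which is exactly the balance described above.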

I hope this helps.


I don’t see any reason to separate into training / testing phases.

This is the last layer we’ll be working on, but I’m sure we will welcome your help.


Excellent. I’ll work out the algorithm in parallel and post a demo with details in another thread. Should at least fuel some ideas for discussion when it comes time to implement the output layer. I’m not sure on biological plausibility, so that will be an important aspect to discuss.


I have found a nice implementation of an object space environment: https://github.com/vicariousinc/pixelworld


Would it not be best to use something like Unity Machine Learning to create the 2D world and then control the agent with an HTM-based model?

They make building virtual worlds with actions and rules easy; they expose the outputs and take in actions.

I have been studying up on HTM since I wanted to start a UnityAI agent based project based on HTM. Very glad that the community is also starting down the agent route now.


Unity seems to be all about C#/.NET.
If you are wanting to stay with that, there are some HTM tools written that way.
Please see:

The proposal is not to code HTM in Unity / C#.

The only contribution from the Unity side is to easily create game environments and control agents in such an environment.

They have a Python API, so the “brain” can be external. Any bits of C# script would only be needed for setting up the 2D environment.

They output “sensor” data to the Python API and accept actions - the API also allows some control of the Unity game environment.

I’m not sure how important it is to choose a game environment here. In fact, we may be adding bloat by doing so. I mean, this took me nine lines of code:



import numpy as np
from opensimplex import OpenSimplex

tmp = OpenSimplex()

img = np.zeros((300, 300, 3))

# Visit every (x, y, channel) index and threshold the noise value.
it = np.nditer(img, flags=['multi_index'])
while not it.finished:
    x, y, c = it.multi_index
    img[x, y, c] = 1.0 if tmp.noise2d(x/100.0, y/100.0) > 0.0 else 0.0
    it.iternext()  # without this the loop never advances


There’s also a cv2.setMouseCallback function if you want to add interaction. And you can just load an image in.

Conway’s game of life took 16 lines to code. https://github.com/SimLeek/cv_pubsubs/blob/master/tests/test_sub_win.py#L111


I pretty much followed this kind of approach on a previous Unity project I had going, on 3D maze navigation.

I was compiling to WebGL so it would run in the browser, and using HTM.js as the HTM implementation. I was passing vision bitmaps and navigation commands back and forth over a fairly simple bridge; the C# side was just handling screenshot encoding and firing movement vectors.

If there’s any interest in this approach I can probably dig up the code.


Unity works fine with NuPIC. But I don’t want to use any virtual environment early in this process. It is more important to get a scenario that tests the theory out and sets up the simplest environment where it works. A simple 2D environment like a grid is the best place to start IMO. Any environment API we add at this point is overhead we don’t need.


I was wrong about needing displacement cells for this task. As long as we are not generating movement or composing objects, we don’t need displacements. We just need consistent location representation and the right layers and connectivity.


Also @lscheinkman pointed me to where these networks are described in htmresearch:

Looks pretty close to my diagram, doesn’t it? :slight_smile:


Sounds like an interesting project. I’d like to help if I can.

I’ve been working on a proof of concept app for doing stereo vision saccades for a while now. My objective has been to see if there is perhaps some natural architecture choice that would lead to the eyes saccading together. Another theory I wanted to test was that the system would learn to saccade towards areas in the input that were producing unexpected behavior. In other words, it would tend to ignore stationary and simple movements in favor of focusing on input areas that were behaving unpredictably - motor output driven (or at least influenced) by bursting columns.

The input consists of a pair of 2D Cartesian grids of cells acting as retinas. I’ve also experimented with a radial density function to get more of a foveated effect. These inputs are then tied to hidden layers before being output to a pair of motor control layers. The motor control layers are also 2D Cartesian grids. I am currently interpreting the output layer as a weighted sum of the nodal positions to find the geometric center of activation for each layer. I then use the offsets from the origin to update the orientation for each eye - sort of like a joystick push. Perhaps you could use a similar mechanic to drive the movement of your sensor.
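The weighted-sum readout described above can be sketched like this (my own sketch of the idea; the 5x5 grid and the gain constant are invented for illustration):

```python
import numpy as np

def activation_center(layer):
    """Geometric center of activation of a 2D motor layer,
    returned as an (x, y) offset from the grid's center (the origin)."""
    h, w = layer.shape
    ys, xs = np.mgrid[0:h, 0:w]
    total = layer.sum()
    if total == 0:
        return np.array([0.0, 0.0])     # no activity: no push
    cx = (xs * layer).sum() / total - (w - 1) / 2.0
    cy = (ys * layer).sum() / total - (h - 1) / 2.0
    return np.array([cx, cy])

# All activity in the rightmost column of a 5x5 layer -> push to the right.
layer = np.zeros((5, 5))
layer[:, 4] = 1.0
gain = 0.1                              # invented "joystick" gain
print(gain * activation_center(layer))  # small push along +x, none along y
```

The offset acts like a joystick push: each step, the eye's orientation is nudged by the gain-scaled offset rather than snapped to an absolute position.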

Here’s a screen grab from an earlier incarnation written in JavaScript using ThreeJS for visualization purposes. Top left is the scene (blue orb is the head, inset spheres are the eyes), bottom half is the rendered view from each eye, and the top right is the projection of these views onto the retina layers (separate red, green, and blue layers for both eyes).


I suspect that some (most?) of the visual planning is done with sub-cortical structures.

One of the way-stations on the way to V1 is the brain stem, and this tap feeds the amygdala. Considerable evidence points to early visual primitive recognition there for things like faces, secondary sexual characteristics, and basic animal shapes. I am sure there are pathways from that area, elaborated through the prefrontal cortex, that drive the FEF to focus on these features.

I see a tangential conversation emerging, so I’m going to pull this back a little.

In this experiment, there will be no real agency (at least not initially). And what I mean by agency is that the agent is causal to the next action taken. For the agent to have an ability to influence the next movement, HTM theory says we must introduce displacement cells.

So I’m putting this off as much as possible. But we should be able to prove out some simple form of object classification via random or scripted movements through the environment space and identify collections of features as objects without agency and without displacements.