2D Object Recognition Project

I want to start thinking about building robotics applications where HTM and the Thousand Brains architecture are applied. One way to do that is to try to build something simple. So here are some ideas about building something that does object recognition and is simple enough to explain and demo.

Some up-front assumptions:

  • you don’t need a robot, a simulated world will do
  • there must be sensors in an environment
  • either the sensors must move, or the environment must change
  • if the sensor moves, the movement must be predictable, or it must be encoded as input
  • sensory input is topologically mapped into a topological arrangement of cortical columns (CCs)
  • each sensor will have multiple CCs
  • in each CC, a pooling layer will attempt to resolve objects above a sensory layer (SP/TM)
  • CCs will have lateral distal connections to share between object layers

So now, the hard questions we have to answer:

  • do we require directed movement? Can random movements suffice?
    • if not, the HTM must generate the movement command, and I’m not sure anyone has even thought about how to do this
  • how do we know when the sensor has moved from one object to another object (or a sub-object)?
  • How can we take a movement command and encode it into something meaningful in object space?
  • Can we classify objects based on a collection of object layer representations?
  • Can we create a hierarchy across the sensors?

What are the first steps? I am honestly tempted to start thinking in 2D spaces. I’ve seen some of @mrcslws’s experiments where there is a grid, and each spot in the grid can have a feature. Movements are simple because you can restrict them to :arrow_up::arrow_down::arrow_left::arrow_right:. We could start with one “sensor” that can sense one spot in the grid and get the features there. It can move, and when it does, the movement is encoded into distal input to the sensory layer of the CC, which should predict what features will be there (if it has seen any). We could hard-code some movement directives, such as an affinity for exploring and finding new features, and see if it starts predicting the right features as it explores the space.
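To make that concrete, here is a minimal sketch of such a grid world and a single moving sensor. Everything in it (GridWorld, Sensor, the feature labels) is invented for illustration and is not taken from any Numenta code:

```python
# Minimal sketch of the 2D grid world idea, assuming made-up names and features.
import random

FEATURES = ["A", "B", "C", None]  # None = empty location
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

class GridWorld:
    def __init__(self, width, height, seed=42):
        rng = random.Random(seed)
        self.width, self.height = width, height
        # Each (x, y) position holds one feature (or nothing).
        self.grid = {(x, y): rng.choice(FEATURES)
                     for x in range(width) for y in range(height)}

class Sensor:
    """One sensor that sees the feature at its current grid position."""
    def __init__(self, world, x=0, y=0):
        self.world, self.x, self.y = world, x, y

    def move(self, direction):
        dx, dy = MOVES[direction]
        self.x = max(0, min(self.world.width - 1, self.x + dx))
        self.y = max(0, min(self.world.height - 1, self.y + dy))
        # The movement itself would become distal input to the sensory layer;
        # here we just return it alongside the sensed feature.
        return direction, self.world.grid[(self.x, self.y)]

world = GridWorld(12, 12)
sensor = Sensor(world)
for _ in range(5):
    print(sensor.move(random.choice(list(MOVES))))
```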



9 Likes

I was thinking that if anyone is interested in trying to put together a Network model that does this, I will help. I know where the research code is to build 2-layer networks as I described above. I can work on a spec that includes the experiment space, setup, feature description, network design, etc. But I don’t know if it is worth working on alone. I would need someone who can write Python without much supervision. I would be willing to build this with the community fork if it makes sense (are we Python 3 yet :pleading_face:?)

2 Likes

Ooh! I’ve been trying to do some robotics-like applications for a while.

I actually just finished a very simple robot for testing:
(photo of the robot)

It moves the camera gyroscope when I type ‘w’, ‘a’, ‘s’, or ‘d’. I’ll expand that to finer control for a neural network, but it works.

The reason I prefer this over a robotics simulator is mainly that I can admire it in real life and physically interact with it, even when my computer is turned off. A Gazebo server (I have one that just needs its web sockets fixed) or V-REP is probably better, though, even for simple projects.

 

Alright, I’ll try answering some of the hard questions:

I thought about generating movement commands. HTM itself doesn’t generate commands, but it does recognize and predict the world, and default commands can be sent in based on what the network is recognizing or predicting. I believe predictive output could be better, because then the robot could react before something happens. It’d be something like: see pedestrian crossing street, predict driver is happy when car slows down before pedestrian, so slow down to make driver happy. Here “happy” would need to be something innately detectable, and may need to have its accuracy improved separately.

Perhaps find the part of the image or sensory input that results in the most output. Localization could help a lot here. If one region of cells has the highest average firing rate, then that will generate the most output for other networks, and is what those other networks would be “paying attention to” the most.
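As a toy illustration of that idea (invented numbers and region layout, not HTM code), picking the region whose cells have the highest average firing rate could look like this:

```python
# Pick the region other networks would be "paying attention to": the one
# whose cells have the highest average firing rate. Values are made up.
import numpy as np

# activity[r, c] = 1.0 if cell c in region r fired during the last timestep
activity = (np.random.rand(4, 1024) > 0.98).astype(float)

mean_rate = activity.mean(axis=1)     # average firing rate per region
attended = int(np.argmax(mean_rate))  # region producing the most output
print(attended, mean_rate.round(3))
```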

Do you mean for the simulation? Or for the HTM/neural network? For the neural net, the entorhinal cortex seems pretty good at that. And for the simulation, V-REP or Gazebo should take care of that.

If we’re talking about temporal memory or spatial pooling firings, this would be the classification, but it’d be unique to each robot. However, you could do something like language formation, translating input in one region to input in another that can be transmitted.

What exactly do you mean?

4 Likes

@rhyolight I would like to put this together. I have experience in 3D programming and developing games in Python. (And I’m working on SMI anyway.)
Where could we, as the HTM community, start? I don’t think the Thousand Brains code is available in standard NuPIC, NuPIC.core, or the community fork.

1 Like

Long periods of moving through empty space might be harmful for tracking sequences and possible objects. Also, in real life, trying to move to a point inside an object would be harmful.

I think directed movement would require attention, because I think it needs to move to wherever is expected to best disambiguate the object.

My understanding is that location signals are initially produced differently for each sense. L4 is pretty specialized in primary cortical regions for each sense. The same might be true for movements, so the cortex might not have a universal solution for this. The specializations might be in the where stream, before the conversion to allocentric representation.

1 Like


I’ll write some guidance up Monday. I have some more ideas.

3 Likes

I’ve been working on a ROS course, including simulation using Gazebo. In a couple weeks I’ll have a bit of time to try stuff out.

1 Like

I have a ROS+Gazebo server you can use if you want.

http://simleak.com/world/

It’s not fully set up, but it should be better than starting from scratch. I can also send relevant code if that helps.

2 Likes

You all are welcome to hook up real-world robots if you like, but I’m more interested in the software. As long as we agree on a clean common interface, I’m sure we can have it both ways.

Here is roughly what I am thinking. It may be simpler than what you expected, but starting simple is a good thing. Because we are restricting this project to a 2D space, we don’t have to think about orientation.

This is all just a brainstorm; I had to pick hard numbers for a lot of things just to show them visually, but these spaces could be defined and named differently.

Object Space

The experiment space is a simple 2D grid. Each position on the grid has a feature. I’m representing the different features below with a black :black_circle:, a blue :heavy_multiplication_x:, and a green :heavy_check_mark:.

The features above are arbitrarily placed by my brain for no reason. We could probably use random features but this looks more interesting. I just wanted to have some continuous features. You can think about this space as an object and the positions as locations on the object, each having a feature.

Agent & Sensors

There is an agent that moves through this object space with sensors. In the example above, the agent is at X10 Y9:

Movement

I want to restrict movement (initially) so our agent can only move one unit, either :arrow_up::arrow_down::arrow_left::arrow_right: (no diagonals). I also want to initially use random movements (we’ll talk about control when we need to).

With this setup, for any movement, we’ll get 4 new sensory inputs (one for each sensor at NSEW). As the agent moves, sensors build up their models of the object.
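Here is a rough sketch of that step loop. The names, the dict-based grid, and the sensor offsets are placeholders for illustration, not the eventual Network API:

```python
# One random unit move per step, then each of the four sensors (N, S, E, W of
# the agent) reads the feature at its position. All names are hypothetical.
import random

MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
SENSOR_OFFSETS = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}

def step(agent_xy, grid, rng):
    move = rng.choice(list(MOVES))
    dx, dy = MOVES[move]
    agent_xy = (agent_xy[0] + dx, agent_xy[1] + dy)
    # One sensation per sensor; each would feed that sensor's cortical column.
    sensations = {name: grid.get((agent_xy[0] + ox, agent_xy[1] + oy))
                  for name, (ox, oy) in SENSOR_OFFSETS.items()}
    return agent_xy, move, sensations

rng = random.Random(0)
grid = {(x, y): rng.choice(["circle", "cross", "check", None])
        for x in range(20) for y in range(20)}
pos = (10, 9)
for _ in range(3):
    pos, move, sensed = step(pos, grid, rng)
    print(move, pos, sensed)
```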

Cortical Columns

Here’s the hard part. We need to build a 3-layer Network for each sensor, which has an object pooling layer (as described in the Columns paper) above a 2-layer location/sensor circuit (as described in Columns+):

The different phases described in Columns+ are in this model as well, and they are really important to get right from a network model standpoint. Not everything can compute at once; there must be an order. Details are in the research code.
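To give a feel for that ordering, here is a placeholder sketch of one timestep for a single CC, roughly in the spirit of the Columns+ phases. The layer classes and method names (shift, compute, pool, apply_lateral) are stand-ins, not the real htmresearch API:

```python
# Rough per-timestep compute order for one cortical column (hypothetical API).
def compute_timestep(column, movement, feature_sdr):
    # Phase 1: the location layer path-integrates the movement command,
    # shifting its grid-cell-module bumps to the new location.
    location_sdr = column.location_layer.shift(movement)

    # Phase 2: the sensory layer (SP/TM) gets the feature proximally and the
    # location representation distally, so it can predict feature-at-location.
    active_cells = column.sensory_layer.compute(
        proximal=feature_sdr, distal=location_sdr)

    # Phase 3: the object layer pools over the sensory layer's activity,
    # converging on a stable object representation over many movements.
    object_sdr = column.object_layer.pool(active_cells)

    # Phase 4: lateral distal input from other columns' object layers is
    # applied last, biasing this column toward the consensus object.
    column.object_layer.apply_lateral(column.lateral_inputs())
    return object_sdr
```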

Lateral Connectivity

Object layers must share representations between cortical columns via lateral connections:

For code examples, see the supporting code for the Columns paper.
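As a rough picture of the wiring (definitely not the actual supporting code), each column’s object-layer output could be handed to every other column as distal lateral input after each timestep:

```python
# Illustrative-only lateral exchange between cortical columns (hypothetical API).
def exchange_lateral(columns):
    object_sdrs = {i: col.object_layer.active_cells()
                   for i, col in enumerate(columns)}
    for i, col in enumerate(columns):
        # Each column receives the object representations of all *other* columns.
        col.set_lateral_inputs([sdr for j, sdr in object_sdrs.items() if j != i])
```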

Grid Cells

We will need to write the mechanism that creates SDR encodings for location based on the position in the object space. I have a feeling @marty1885 has most of this written. :wink:
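Here is one way such an encoder might look, as a hedged sketch: a handful of modules, each tiling the space at its own scale and offset, with one active cell per module. Real grid cell modules use hexagonal lattices and orientations; the square lattice and all parameters below are simplifying assumptions:

```python
# A toy 2D location encoder built from a few grid-cell-like modules.
from math import floor

class GridCellModule:
    def __init__(self, scale, offset, cells_per_axis=5):
        self.scale, self.offset, self.n = scale, offset, cells_per_axis

    def active_cell(self, x, y):
        # Which cell of this module's n x n sheet fires at (x, y)?
        i = floor((x + self.offset[0]) / self.scale) % self.n
        j = floor((y + self.offset[1]) / self.scale) % self.n
        return i * self.n + j

modules = [GridCellModule(scale=s, offset=(s / 3.0, s / 7.0))
           for s in (1.0, 1.5, 2.3, 3.1)]

def encode_location(x, y):
    """Location SDR: one active cell per module, offset into a shared index space."""
    return {m_idx * m.n * m.n + m.active_cell(x, y)
            for m_idx, m in enumerate(modules)}

print(sorted(encode_location(10, 9)))
```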

8 Likes

This fits my model as well. Grid cells in different sensory regions could represent space in different ways, but because they keep this representation internally, it does not affect other CCs.

Here is where I would start looking at this:

See example usage in the paper’s supporting code:

https://github.com/numenta/htmpapers/blob/master/biorxiv/location_in_the_neocortex_a_theory_of_sensorimotor_object_recognition_using_cortical_grid_cells/convergence_simulation.py#L170-L175

1 Like

The draft sounds interesting! What kind of behaviour can we expect from this agent? Or rather, as there seems to be no reward involved, what drives the agent to learn/explore?

2 Likes

Before we discuss control, I want to ensure random or hard-coded movements are being properly predicted. For example, we should be able to move in a sequence between three locations with features, and given each proximal movement command to a sensor’s CC, the location layer should update its GCM bumps to the new location’s SDR (we can test this), then the sensory layer should have predicted cells when given distal location SDR input (a predicted feature at the predicted location, which we can test too).
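In code, that test might look roughly like the following, reusing the placeholder column API from the Cortical Columns sketch above (compute_timestep, predicted_cells, and decode_feature are all hypothetical, not real htmresearch calls):

```python
# Sketch of the prediction test described above (hypothetical column API).
def test_predicts_features_after_learning(column, path, grid):
    """path: list of (movement, (x, y)) steps between a few feature locations."""
    # Learning pass: let the column associate features with locations.
    for movement, xy in path:
        compute_timestep(column, movement, grid[xy])

    # Inference pass: replay the movements and check the sensory layer's
    # predictions *before* the feature is presented.
    for movement, xy in path:
        location_sdr = column.location_layer.shift(movement)
        predicted = column.sensory_layer.predicted_cells(distal=location_sdr)
        assert column.decode_feature(predicted) == grid[xy], \
            "expected the learned feature to be predicted at this location"
```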

Let’s make sure this is working before we get into movement control. Once we see the agent building a model of the object, the control problem gets more interesting because we can use the CC’s state to affect movement. For example, should I move differently if I get to a location and the feature is not what I expected? Maybe revisit the location I was just at and confirm the expected features there?

This is also where we might try to hook up an RNN as a control system.

1 Like

First Step: Learn the Environment

We are going to use Grid Cell Module type logic to encode locations in a 2D environment. We are not going to learn how to learn space itself. We are going to assume all sensors have basically the same mechanism to represent space, and while we can vary the parameters of the GCMs for each sensor, there won’t be logical differences in how they represent space.

This means we just need a consistent mechanism to encode the 2D space (much like I created when I did the visuals for the Grid Cell HTM School episode). If we create these encodings in the right way and retain GCM properties, we should be able to operate on them to create displacement vectors, which will help us with movement.

I propose the first step is to create this encoding mechanism and show how different 2D locations are encoded by the GCMs, testing how different locations can be compared and contrasted, and thinking about how to create proper displacements. (EDIT: I was wrong about displacements; see the tickets for current work.)
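For the compare-and-contrast part, reusing the hypothetical encode_location from the Grid Cells sketch above, a first sanity check could be that nearby locations share more active cells than distant ones:

```python
# Sanity check on the toy encoder above: nearby locations should overlap more
# than distant ones. Nothing here is the paper's actual mechanism.
def overlap(a, b):
    return len(set(a) & set(b))

home = encode_location(10, 9)
print(overlap(home, encode_location(10, 10)))  # near: expect higher overlap
print(overlap(home, encode_location(2, 3)))    # far:  expect lower overlap
```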

Background reading material: Columns+ Paper (PDF)

3 Likes

Have I given enough guidance? (happy to continue)

3 Likes

I’d prefer a small list of tasks honestly.

That way, I could pick out something like the 2D vision task, if there is one, and someone else could work on another task, like how I have Jira at work and Trello at school.

I’m going through the Columns+ paper. It’s still a mystery to me how lateral connections work. The GCMs in the paper also seem to be different from the ones described in HTM School (which my implementation is based on), and it seems the entire architecture of the network is very different from the standard SP/TM.

BTW, when will the next HTM School episode come out? Having someone knowledgeable explain the theory is a lot easier and faster than crunching through the papers.

1 Like

@rhyolight @SimLeek Maybe we should have a community dashboard or something (maybe a chatroom?) to record what everyone is doing, so people won’t be redoing the same tasks? Or maybe the community is still small enough. Just a thought.

1 Like