Hi, HTM community,
I want to share with you a different approach to developing an AI model. I hope you find it interesting.
All criticism, questions, and opinions are welcome.
First, a little about my work. I developed a new type of artificial neural network (this type is closest to auto-associative networks, I suppose). I am well aware that the HTM community’s position is that “the model has to reflect the real structure of the brain”, but “we don’t need to create an exact replica, just a system that demonstrates the important properties” (Subutai Ahmad).
Instead of modeling the brain (as you do) or modeling a specific task (as many AI researchers do), I was trying to make a network that could work on any task without adjusting the architecture to the specific one.
I started with the model I got from the “On Intelligence” book and some other scientists’ work. That model did work on classification tasks. However, when I tried to apply the network to controlling “an animal in a maze”, I discovered that the model has to be able to process temporal sequences of patterns to do that task.
Therefore, I modified the model, made it work on the current task, and moved on to another. Unlike most researchers, I wasn’t trying to get state-of-the-art results on just one task. I was curious: “Could I make the network do this? Would it be able to do that?” Thus, I kept finding tasks my model would fail on and improving the architecture.
I noticed that the more complex the model, the more “narrow” it is. So, in the process of testing on different tasks, my model has become very simple. Currently, it has a very general architecture.
When I started comparing my network with the HTM, I was surprised how similar they are. Not in structure or algorithms, but they are based on the same principles and share a lot of common properties.
- Temporal sequence processing. My network works with sequences of patterns and uses prediction to do that. Like your model.
- Continuous learning. I completely agree that the model has to work in “real time” on a continuous stream of data. The network simultaneously learns new data and processes input.
- The model processes any type of input (visual, audio, etc.) the same way. Like the HTM with its SDRs, my network operates on sequences of binary patterns. For the network, there is no difference between a pattern from vision and one from hearing. That property also allows combining different kinds of input (as you combined a sensory pattern with the location in the recent paper).
- Predictions are the key. Actually, for me, that was the most doubtful part of your theory. Nevertheless, it turns out that my model works quite similarly. I just call it differently: “associative activation”. Because, for example, if we are recalling what we had for lunch, that’s not really a “prediction”. But we use the same “prediction” mechanism to do it.
- Inhibition. The initial model did not include any inhibition, but it turns out that inhibitory connections are absolutely necessary for some tasks.
- Creating new connections between neurons.
- Only a small part of the neurons activates on each timestep.
- Hierarchy. There is enough evidence that the brain processes information hierarchically and uses generalizations.
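To make the “binary patterns from any modality” point concrete, here is a toy sketch in Python. All the names and encoders here are made up for illustration; this is not my actual implementation, just the idea that any input reduces to a set of active bits, so different modalities can be combined by a simple union:

```python
# Toy sketch (hypothetical names): every modality is reduced to a set of
# active bit indices, so vision and hearing are handled identically.

def encode_visual(pixels):
    # toy encoder: active bits are indices of "bright" pixels (assumption)
    return {i for i, p in enumerate(pixels) if p > 0.5}

def encode_audio(samples):
    # toy encoder: bits are offset so they don't collide with visual bits
    return {1000 + i for i, s in enumerate(samples) if s > 0.5}

def combine(*patterns):
    # combining modalities is just a union of binary patterns
    return set().union(*patterns)

visual = encode_visual([0.9, 0.1, 0.7])   # -> bits {0, 2}
audio = encode_audio([0.2, 0.8])          # -> bit {1001}
joint = combine(visual, audio)
print(sorted(joint))                      # prints [0, 2, 1001]
```

Once everything is a set of bits, the downstream machinery never needs to know which sense a bit came from.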
But now, let me point out some differences between my network and the HTM.
- My network doesn’t have layers. Yes, the network is hierarchical, but it has no separate layers. I started with a fixed hierarchical structure (where the receptive field of a neuron in layer N can only include neurons from layers N and N−1), but I discovered that it just doesn’t work for some tasks. Now, the receptive area of a neuron can consist of neurons from all over the network and combine representations from different levels. I would be glad to provide more details and examples. What do you think: could a biological neuron have a receptive field in multiple layers regardless of their location?
- Actually, my network doesn’t have parameters at all (*). The network is universal for every task. Its architecture is dynamic. From the beginning, the network has zero neurons. I know that is against biology, but you do something similar in your model. You make connections between existing neurons, but for me it is easier to just make a new neuron with the necessary connections. And this approach actually does work.
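Here is a toy sketch of the “start from zero neurons and grow” idea. The class and method names are my invention for this post, and the recognition rule (exact pattern match) is deliberately oversimplified; the point is only that structure is created on demand instead of being pre-allocated:

```python
# Hypothetical sketch: the network starts empty and creates a new neuron
# (with its connections to the input bits) whenever it meets an unknown pattern.

class GrowingNetwork:
    def __init__(self):
        self.neurons = []  # zero neurons at the start

    def observe(self, pattern):
        """Return the id of the neuron representing the pattern,
        growing a new neuron if the pattern is unknown."""
        for i, known in enumerate(self.neurons):
            if known == pattern:
                return i  # already represented, nothing to create
        # unknown pattern: new neuron connected to exactly these input bits
        self.neurons.append(frozenset(pattern))
        return len(self.neurons) - 1

net = GrowingNetwork()
a = net.observe({1, 2, 3})           # grows neuron 0
b = net.observe({4, 5})              # grows neuron 1
assert net.observe({1, 2, 3}) == a   # recognized; no new neuron is created
assert len(net.neurons) == 2
```

In a sketch like this there is nothing to tune: no layer count, no weight initialization, only the data-driven creation of neurons.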
- Different patterns don’t mix up. Similar patterns connect associatively, but new data does not overwrite existing data. That property gives the network the ability to learn new tasks without spoiling old knowledge.
By the way, the previous two properties lead to another interesting one.
- Theoretically infinite capacity. The network can receive new inputs, learn new things, and grow. But because on each timestep only a small (associatively connected) part of the network becomes active, even a huge network can work with the same performance as a small one. And we can create neurons and connections until we run out of space on the hard drive.
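One way to see why growth doesn’t slow such a system down is an inverted index from input bits to neurons. This is my own illustrative guess at a mechanism, not a description of the real model: an input only ever touches the neurons associatively connected to its bits, however large the whole network becomes.

```python
# Hypothetical sketch: neurons are indexed by their input bits, so lookup cost
# depends on the associatively connected neighborhood, not on network size.

from collections import defaultdict

class SparseIndex:
    def __init__(self):
        self.by_bit = defaultdict(set)  # input bit -> ids of connected neurons
        self.count = 0

    def add_neuron(self, input_bits):
        nid = self.count
        self.count += 1
        for bit in input_bits:
            self.by_bit[bit].add(nid)
        return nid

    def active(self, pattern):
        # only neurons sharing at least one bit with the input are visited;
        # the rest of the network, however huge, is never touched
        hit = set()
        for bit in pattern:
            hit |= self.by_bit[bit]
        return hit

idx = SparseIndex()
square = idx.add_neuron({1, 2})
circle = idx.add_neuron({8, 9})
assert idx.active({1}) == {square}  # the 'circle' neuron is never visited
```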
- Hierarchical learning. That one is better explained by an example:
First, we train the network to recognize squares and circles.
Then we train it to recognize a “button”, which consists of this circle and this square:
Next, we feed the network this picture:
This picture has almost no overlapping input with the previous one, but it will still be recognized as a “button” (because it still consists of a circle and a square). We don’t need to train the network on all combinations of squares and circles for robust recognition. This property allows it to learn a new thing much faster when that thing consists of features the network already knows.
This example is very simplified. If you are interested, I’ll give you the real results and a detailed description of how the network does this task.
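The essence of the “button” example can be sketched in a few lines. Again, the names and the all-features-required rule are my simplifications for illustration: a higher representation connects to feature representations, not to raw pixels, so any circle plus any square activates it even with zero pixel overlap with the training image.

```python
# Hypothetical sketch of compositional recognition: concepts are defined over
# already-learned features, so novel pixel arrangements still match.

def recognize(features, concepts):
    """Return every concept whose required features are all active."""
    return {name for name, required in concepts.items()
            if required <= features}  # subset test: all features present

# one training example defines "button" in terms of two known features
concepts = {"button": {"circle", "square"}}

# a completely different picture still yields the same features
assert recognize({"circle", "square"}, concepts) == {"button"}
assert recognize({"circle"}, concepts) == set()  # half a button is no button
```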
- For my network, there is no difference between input and output. That property gives it the ability to produce really flexible output, not predefined in any way. It also allows the network to do such cool things as “imagined” input. Let me show you:
First, we train the network to recognize a square:
Then we show it a partial square:
It will still be recognized as a square. You may notice that in case B2 the receptive field doesn’t contain anything. So the network will activate the “predicted” input (red dotted line) and continue to move until it encounters the bottom corner.
By the way, the network controls the movements of the receptive area. That’s like saccades. Such a move is an output, but simultaneously it’s an input (because the next prediction clearly depends on the direction of the move). This approach actually makes visual recognition very robust.
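The “imagined input” step above can be reduced to a tiny rule. This is only my toy restatement of it (the function and pattern names are invented): when the receptive field is empty, the network substitutes its own prediction and keeps going instead of failing.

```python
# Hypothetical sketch: real input wins when present; an empty receptive field
# is filled with the predicted pattern ("imagined" input).

def perceive(receptive_field, predicted):
    """Return (pattern to process, whether it was imagined)."""
    if receptive_field:           # real input present: use it as-is
        return receptive_field, False
    return predicted, True        # empty field: activate the prediction

# case B2: the field is empty, so the predicted edge is activated instead
seen, imagined = perceive(set(), predicted={"edge_going_down"})
assert imagined and seen == {"edge_going_down"}

# later the real bottom corner appears and takes over
seen, imagined = perceive({"corner"}, predicted={"edge_going_down"})
assert not imagined and seen == {"corner"}
```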
- Generalization and inhibition. It turns out that generalization cannot be “automatic”. It has to be dynamic. This one is very contrary to the HTM, so let’s look at the picture:
In case A, you would probably recognize figure 1 as some kind of fruit. In case B, figure 3 is exactly the same as A1, but I suppose you’ll take it for a ball.
This task can’t be done just by relying on a self-organizing topology. That’s why the model needs inhibitory connections, which can deactivate a representation depending on the input and the context.
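A minimal sketch of that context-dependent inhibition, with invented names and a deliberately crude inhibition table: the ambiguous round figure activates both candidate representations, and the active context suppresses the one that doesn’t fit.

```python
# Hypothetical sketch of dynamic generalization: context elements carry
# inhibitory connections that deactivate competing representations.

def interpret(candidates, context, inhibits):
    """Drop every candidate representation that the active context inhibits."""
    suppressed = set()
    for ctx in context:
        suppressed |= inhibits.get(ctx, set())
    return candidates - suppressed

# toy inhibition table: a fruit bowl suppresses "ball", a toy box suppresses "fruit"
inhibits = {"fruit_bowl": {"ball"}, "toy_box": {"fruit"}}
shape = {"fruit", "ball"}  # the ambiguous round figure activates both

assert interpret(shape, {"fruit_bowl"}, inhibits) == {"fruit"}  # case A
assert interpret(shape, {"toy_box"}, inhibits) == {"ball"}      # case B
```

The same input pattern yields different representations purely because of which inhibitory connections the context activates.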
- “One-shot learning”. Some tasks require learning from just a few examples. Hierarchical learning allows doing that.
- Self-learning. The network can evaluate its own actions. That’s done the same way: through associative activations. It turns out that this property is absolutely necessary for dynamic generalization.
Everything described above can be done just by creating and using associative connections.
The network clearly goes against biology. But I didn’t intend to model the brain. I was simply trying to make a universal architecture. Over the past 3 years, I have tested the network on such tasks as classification, visual recognition (with saccades), text dialog, dialog + visual recognition, “an animal in a maze”, “Tic-tac-toe” (including a quite interesting “blindfold” version), the “Pac-Man” game, and some logical tasks (including the “Winograd Schema Challenge”).
My network doesn’t wipe out the HTM. I think it’s more like a look from a different point of view. And I would love to hear your “neuroscientific” opinion about my model.
I greatly appreciate any questions and criticism. I would be glad to provide more information if something specific interests you.
(*) The network doesn’t have the usual parameters, like learning rate, number of layers, activation function, etc. It has only one parameter: the number of representations (neurons) that can be recognized (activated) simultaneously. That one has a huge effect on the complexity of the tasks the network is able to perform. And it actually shows a pretty interesting correlation with some anthropology experiments.
P.S. In addition, in the topic “How the allocentric locations are encoded for SMI?”, @rhyolight mentioned that “we’re trying to figure out how and where this location signal is generated”.
My system doesn’t include a location signal. It uses a different technique to achieve the same goal. May I propose my method for your consideration?