So I made this post on Reddit, but I thought I’d get everyone’s take here on the principles behind it.
Numenta’s philosophy of intelligence is the foundation of my understanding of the topic. And what is that philosophy? I’d say it’s the principles that underlie HTM and the Thousand Brains Theory: sparse representations, semantic meaning, and hierarchy.
So I asked myself: if I were going to apply these principles from scratch, in the simplest way I could, what would I design? The post above shows my initial answer to that question.
If you read the post, you’ll notice I’m not using the cortical learning algorithm, let alone anything more complicated like the spatial movement work Numenta has been figuring out lately. Instead, I wanted something ultra-simplified, and perhaps modular, so that by combining many of these units I might be able to make a fractal image of the whole.
I want to see how far you can go with the simplest possible implementation of the principles underlying intelligence.
I would love to hear your thoughts on my design, and how it can be improved, or even how it can be implemented. Thanks!
If I understand the basic concept correctly, the current sensory input is predicting motor output n timesteps into the future (with decaying influence), and current motor output is predicting sensory input n timesteps into the future (with decaying influence). Predictions are based on weights (which are used to score confidence), and weights are adjusted after evidence comes in on whether or not the predictions were correct.
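A minimal sketch of that reading, under assumptions not stated in the post: one weight matrix per look-back distance, an exponential decay factor `decay`, and a plain delta-rule update. `predict_scores`, `update`, `decay`, and `lr` are hypothetical names.

```python
import numpy as np


def predict_scores(history, weights, decay=0.5):
    """Score each output unit for the current step.

    history: binary input vectors, oldest first; history[-1] is the latest.
    weights: weights[d] maps the input seen d steps before the latest one
             to a score for every output unit.
    Older inputs count for less: their contribution is scaled by decay ** d.
    """
    scores = np.zeros(weights[0].shape[1])
    for d, w in enumerate(weights):
        if d >= len(history):
            break
        scores += (decay ** d) * (history[-1 - d] @ w)
    return scores


def update(history, weights, actual, scores, lr=0.1, decay=0.5):
    """Delta-rule nudge: move every horizon's weights toward the observed
    outcome, scaled by the same decay so older inputs are corrected less."""
    error = actual - scores  # > 0 where a unit was under-predicted
    for d, w in enumerate(weights):
        if d >= len(history):
            break
        w += lr * (decay ** d) * np.outer(history[-1 - d], error)  # in place


# Toy usage: 6 input units, 4 output units, predictions from the last 3 steps.
history = [np.array([1, 0, 0, 1, 0, 1.0]),
           np.array([0, 1, 1, 0, 0, 0.0]),
           np.array([1, 1, 0, 0, 1, 0.0])]
weights = [np.zeros((6, 4)) for _ in range(3)]
target = np.array([1.0, 0.0, 1.0, 0.0])
for _ in range(200):
    update(history, weights, target, predict_scores(history, weights))
print(np.round(predict_scores(history, weights), 3))  # close to target
```

The weights here are the "confidence" scores: repeated evidence pulls the combined score toward what actually happened, with the most recent input doing most of the work.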
I may well be misreading it, but my first thought is that this would result in a low-order temporal memory. Or is there a mechanism that captures higher-order context?
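To make the low-order concern concrete, here is a lookup-table toy (not HTM temporal memory and not the proposed network): with only the current element as context, the "C" in A-B-C-D and X-B-C-Y predicts both D and Y; with three elements of context the ambiguity disappears. `train` and `order` are names made up for this illustration.

```python
from collections import defaultdict


def train(sequences, order):
    """Record which elements follow each context of length `order`."""
    table = defaultdict(set)
    for seq in sequences:
        for i in range(order, len(seq)):
            context = tuple(seq[i - order:i])
            table[context].add(seq[i])
    return table


sequences = ["ABCD", "XBCY"]
first_order = train(sequences, order=1)
third_order = train(sequences, order=3)

print(sorted(first_order[("C",)]))           # ['D', 'Y'] : ambiguous
print(sorted(third_order[("A", "B", "C")]))  # ['D']
print(sorted(third_order[("X", "B", "C")]))  # ['Y']
```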
One piece is not clear to me (and it would probably answer my previous question). In the post you mentioned that there are also connections going backwards from the other side of the network. But the labels suggest that the layers activate chronologically from left to right (they are labeled “T=1”, “T=2”, etc.), so I’m having trouble figuring out how connections going in the other direction would work. I’m guessing it isn’t predicting events in the past.
OK, that is how I interpreted it, but then the labels on the arrows (“T=1, T=2, T=3”) confuse me. You have one set on top going from left to right, and, rotated, another set of “T=1 T=2 T=3” on the bottom going from right to left. Since time can’t flow in both directions, those can’t be the timesteps at which a particular layer is activated. Could you clarify that part? (Sorry if this is a dumb question.)
Or wait, sorry: it is probably “T=4 T=5 T=6” on the bottom, then “T=7 T=8 T=9” on the top, etc., correct? If so, I assume “T=4” would be the longest arrow (the last layer looping back to the first layer), or would it be the shortest arrow (the last layer just reversing direction)?
Are the same neurons used for both sensory representations and motor representations, or are they equally divided between the two?
After thinking about it some more, I think I understand (sorry, took me a bit). Here is another drawing of the same circuit highlighting the parts that I wasn’t sure about. Let me know if I got parts of this wrong:
Basically, it is a series of layers that form a loop: the last layer connects back to the first one (another way to arrange them would be as spokes around a wheel). Each timestep you learn in the current layer and predict in the next layer. Prediction is a k-winners algorithm (something like the SP, but with weighted rather than binary input). The learning algorithm compares the prediction to reality and nudges the weights. I didn’t draw it above, but another detail is that it predicts multiple timesteps into the future rather than just one, with decaying influence (the most recent timestep holds more sway than the one before it, etc.).
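A rough sketch of that summary, assuming a loop of equally sized layers, random initial weights in [0, 1), k-winners over the weighted input, and an additive nudge clipped to [0, 1]; the multi-step decay is left out to keep it short. `RingOfLayers`, `k_winners`, and all parameters are hypothetical names, not the original design's.

```python
import numpy as np


def k_winners(scores, k):
    """Binary vector with the k highest-scoring units active."""
    active = np.zeros(scores.shape, dtype=bool)
    active[np.argsort(scores)[-k:]] = True
    return active


class RingOfLayers:
    """Layers arranged in a loop; layer i only predicts layer (i + 1) % n_layers."""

    def __init__(self, n_layers, size, k, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.k, self.lr = k, lr
        # weights[i][a, b]: how strongly cell a in layer i votes for cell b
        # in the next layer around the loop
        self.weights = [rng.random((size, size)) for _ in range(n_layers)]

    def next_layer(self, layer):
        return (layer + 1) % len(self.weights)

    def predict(self, layer, active):
        """Predict the next layer's active cells: weighted input, then k-winners."""
        scores = active.astype(float) @ self.weights[layer]
        return k_winners(scores, self.k)

    def learn(self, layer, active, predicted, actual):
        """Compare prediction with reality: strengthen weights from active cells
        to cells that fired but were not predicted, weaken those to false
        positives, and leave correct predictions unchanged."""
        w = self.weights[layer]
        w += self.lr * np.outer(active.astype(float),
                                actual.astype(float) - predicted.astype(float))
        np.clip(w, 0.0, 1.0, out=w)


# Toy usage: teach layer 0 that input cells {0, 3, 5} are followed by {1, 6}.
ring = RingOfLayers(n_layers=4, size=8, k=2)
x = np.zeros(8, dtype=bool)
x[[0, 3, 5]] = True
target = np.zeros(8, dtype=bool)
target[[1, 6]] = True
for _ in range(200):
    ring.learn(0, x, ring.predict(0, x), target)
print(np.flatnonzero(ring.predict(0, x)))  # [1 6] once learning has settled
```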
While HTM is a binary beast, real neurons spike: the leading edge of a pulse train establishes phase information, and the repetition rate establishes an analog value.
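A toy encoder, purely illustrative and not a model of any particular neuron: it turns an analog value in [0, 1] into spike times so that stronger input fires earlier (the leading edge, i.e. phase) and more often (the repetition rate). The function name `encode` and the `window`/`max_rate` parameters are hypothetical.

```python
def encode(value, window=100.0, max_rate=0.2):
    """Hypothetical encoder: value in [0, 1] -> spike times (ms) in a window.

    - first-spike latency shrinks as value grows (phase information)
    - inter-spike interval shrinks as value grows (rate carries the analog value)
    """
    if value <= 0:
        return []
    latency = (1.0 - value) * window / 2    # leading edge: earlier for stronger input
    interval = 1.0 / (max_rate * value)     # ms between spikes
    times = []
    t = latency
    while t < window:
        times.append(t)
        t += interval
    return times


print(len(encode(1.0)), len(encode(0.5)), encode(0.5)[0])  # 20 8 25.0
```

A binary HTM cell keeps only "active or not" from all of this.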
HTM temporal memory captures one aspect of phase-related information with its depolarize/fire-first/k-winners mechanism.
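For reference, a much-simplified sketch of the mechanism being described, as it is usually presented for HTM temporal memory: within each active column, cells that were depolarized (predictive) fire first and suppress the rest; if no cell was predicted, the whole column bursts. Distal segments, thresholds, and learning are omitted, and representing cells as `(column, index)` tuples is an assumption of this sketch.

```python
def activate_columns(active_columns, predictive_cells, cells_per_column):
    """Simplified HTM-style cell activation.

    active_columns:   set of column indices chosen by the spatial pooler
    predictive_cells: set of (column, index) cells depolarized last step
    Returns the set of (column, index) cells that fire this step.
    """
    active_cells = set()
    for col in active_columns:
        column_cells = {(col, i) for i in range(cells_per_column)}
        predicted = column_cells & predictive_cells
        # depolarized cells win the race; otherwise the column bursts
        active_cells |= predicted if predicted else column_cells
    return active_cells


# Column 0 had a predicted cell, column 1 did not; (5, 1) is ignored
# because column 5 is not active.
print(sorted(activate_columns({0, 1}, {(0, 2), (5, 1)}, 4)))
# [(0, 2), (1, 0), (1, 1), (1, 2), (1, 3)]
```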
I feel a bit iffy about variable weights and weight adjustments; a synapse is pretty much an all-or-nothing thing. Substituting graded synapse weights and weight adjustments for adding and removing synapses makes me nervous. I suspect it would make the network very brittle.
If I understand the circuit correctly, I think it could be adjusted a bit to address the potential brittleness of weight adjustments: use a lot of cells in each layer so you can make use of sparsity, and use synaptogenesis rather than weight adjustments. It might be more resource-hungry that way, though.
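A sketch of the synaptogenesis alternative in the style HTM uses: each synapse carries a scalar permanence, but it only contributes as a binary connected/unconnected link once the permanence crosses a threshold, and learning grows new synapses to active cells and prunes dead ones. The threshold and increments (`CONNECTED`, `inc`, `dec`, `init`, `new_synapses`) are hypothetical values, not NuPIC defaults.

```python
import random

CONNECTED = 0.5  # hypothetical permanence threshold for a synapse to count


class Segment:
    """Synapses are binary for inference; a hidden permanence decides which exist."""

    def __init__(self):
        self.permanence = {}  # presynaptic cell id -> permanence in [0, 1]

    def overlap(self, active_cells):
        """Number of connected synapses onto currently active cells."""
        return sum(1 for c, p in self.permanence.items()
                   if p >= CONNECTED and c in active_cells)

    def learn(self, prev_active, inc=0.1, dec=0.05, new_synapses=2,
              init=0.3, rng=random):
        # reinforce synapses from cells that were active, decay the rest
        for c in list(self.permanence):
            if c in prev_active:
                self.permanence[c] = min(1.0, self.permanence[c] + inc)
            else:
                self.permanence[c] = max(0.0, self.permanence[c] - dec)
                if self.permanence[c] == 0.0:
                    del self.permanence[c]  # prune a synapse that has died off
        # synaptogenesis: grow synapses to active cells not yet sampled
        candidates = sorted(set(prev_active) - set(self.permanence))
        for c in rng.sample(candidates, min(new_synapses, len(candidates))):
            self.permanence[c] = init


seg = Segment()
rng = random.Random(0)
for step in range(4):
    seg.learn({1, 2, 3}, rng=rng)
    print(step, seg.overlap({1, 2, 3}), sorted(seg.permanence))
```

The graded permanence only affects *when* a synapse appears or disappears; what downstream cells see is still all-or-nothing.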
No, I’m not sure your conception matches my own. I’d draw it more like this:
A very simple entity with 5 binary muscles and 5 binary sensors. Notice that information flows in both directions, as if the activated motor output were a sensory input travelling back the other way. Here I’ve drawn the connections of 2 nodes, but let me make this explicit by drawing every connection that one node has:
Notice that when this node gets activated, it makes predictions to the left and right. Since it’s more on the motor side of things, it makes predictions to every node on the motor output, and because it’s right next to that layer it has a fair amount of pull over which nodes actually get activated. It also predicts the sensor input 3 timesteps into the future (waiting a short time so that the motor output has time to affect the sensory input), and it predicts the middle, most condensed layer in the very next timestep. Forwards and backwards, it’s the same pattern: it’s really 2 feed-forward networks layered on top of one another, going in opposite directions.
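A sketch of just the wiring described here, assuming hypothetical layer sizes [5, 4, 3, 4, 5] (5 sensors, a 3-node bottleneck, 5 muscles): one weight stack carries activity from the sensor side toward the motor side, the other carries it back, and each layer sums the drive from both neighbours. There is no k-winners step or learning, and all names are made up for illustration.

```python
import numpy as np

LAYER_SIZES = [5, 4, 3, 4, 5]  # hypothetical: sensors ... bottleneck ... muscles

rng = np.random.default_rng(0)
# forward[i]: layer i -> layer i + 1 (sensor side toward motor side)
forward = [rng.random((a, b)) for a, b in zip(LAYER_SIZES, LAYER_SIZES[1:])]
# backward[i]: layer i + 1 -> layer i (motor side back toward sensor side)
backward = [rng.random((b, a)) for a, b in zip(LAYER_SIZES, LAYER_SIZES[1:])]


def step(activity):
    """Advance one timestep: each layer is driven by its left neighbour through
    the forward stack and by its right neighbour through the backward stack."""
    new = []
    for i, size in enumerate(LAYER_SIZES):
        drive = np.zeros(size)
        if i > 0:
            drive += activity[i - 1] @ forward[i - 1]
        if i < len(LAYER_SIZES) - 1:
            drive += activity[i + 1] @ backward[i]
        new.append(drive)
    return new


# Start with only the sensors active and watch which layers carry signal.
act = [np.ones(5)] + [np.zeros(s) for s in LAYER_SIZES[1:]]
for t in range(1, 5):
    act = step(act)
    print(t, [int(a.any()) for a in act])
# 1 [0, 1, 0, 0, 0]
# 2 [1, 0, 1, 0, 0]
# 3 [0, 1, 0, 1, 0]
# 4 [1, 0, 1, 0, 1]
```

The signal reaches the muscles after 4 steps, and because of the backward stack it also echoes back toward the sensors every other step.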
OK, so actual sensory input predicts motor output 4 timesteps in the future, and actual motor output predicts sensory input 4 timesteps in the future? Is there any reasoning behind 4, or is it just an arbitrary number (could be anything)?
Also, due to the bidirectional predictions, motor output at time T would partially be predicting its own output at T + 2 (the signal goes left at T + 1, then back right at T + 2), at T + 4, etc. The same goes for the sensory input.
It’s simply the number of layers. The cell gets activated and predicts its adjacent layers on both sides for the immediate next timestep, then the layers one beyond those for the timestep after that, and so on. It’s as if it’s saying, “OK, next-door neighbor, I think you’ll become active, and when you do, I think you’ll make this neighbor of yours active.” If that helps…
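The rule in this reply can be written as a one-line schedule, assuming the layers form a single chain indexed from 0 (sensors) to n_layers - 1 (muscles): a cell predicts each other layer as many timesteps ahead as it is layers away. `prediction_schedule` is a hypothetical helper.

```python
def prediction_schedule(layer, n_layers):
    """Map every other layer to how many timesteps ahead a cell in `layer`
    predicts it: neighbours next step, the layers beyond them after that."""
    return {other: abs(other - layer) for other in range(n_layers) if other != layer}


print(prediction_schedule(0, 5))  # {1: 1, 2: 2, 3: 3, 4: 4}
print(prediction_schedule(3, 5))  # {0: 3, 1: 2, 2: 1, 4: 1}
```

With five layers, for instance, a sensor cell's farthest prediction (the motor layer) lands 4 steps out, and a cell next to the motor layer reaches the sensors 3 steps out.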
Correct, I’m exporting all temporal memory to the hierarchy, which means it’s extremely short-term. Maybe there’s a way to push some longer-term sequence memory down into the nodes or layers, like CLA or HTM does, but I don’t know; I’m going to see how far this design can go before making it more complex.
I’ve come to interpret what the brain does as finding tons of efficiencies. As another example, I think here I’m proposing a nearly fully connected RNN, but really, why? Isn’t that overkill? The brain would say so: you can get nearly the same results by dropping 90% of the connections, as long as you keep the same basic proportions of connections.
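One way to read "drop 90% of the connections but keep the same basic proportions" as code: keep the same random fraction of every projection, so each projection's share of the total is unchanged. The projection names, sizes, and the uniform `keep` rate are assumptions for illustration.

```python
import numpy as np


def sparsify(masks, keep=0.1, seed=0):
    """Keep a random fraction `keep` of the existing connections in every
    projection, so each projection's share of the total stays about the same."""
    rng = np.random.default_rng(seed)
    pruned = {}
    for name, mask in masks.items():
        idx = np.flatnonzero(mask)  # flat indices of existing connections
        if idx.size == 0:
            pruned[name] = mask.copy()
            continue
        n_keep = max(1, round(keep * idx.size))
        kept = rng.choice(idx, size=n_keep, replace=False)
        new = np.zeros(mask.shape, dtype=bool)
        new.flat[kept] = True
        pruned[name] = new
    return pruned


# Toy usage: a nearly fully connected network with hypothetical layer sizes.
masks = {
    "sensor->hidden": np.ones((5, 40), dtype=bool),
    "hidden->hidden": np.ones((40, 40), dtype=bool),
    "hidden->motor": np.ones((40, 5), dtype=bool),
}
pruned = sparsify(masks, keep=0.1)
for name in masks:
    print(name, int(masks[name].sum()), "->", int(pruned[name].sum()))
```

Sampling per projection rather than over the whole network is what keeps the proportions fixed; a single global draw would only preserve them on average.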