Hello, I am new to this forum, so I’m not sure whether this is the right section to post this in.
I really like HTM and its attempt to explain the brain in a mechanistic way. I have found many really good ideas in it. However, I have some concerns, which may come from my incomplete understanding or from limitations of HTM.
Does it have limited memory capacity?
A sparsely active network can have combination(N, a) states, where N is the total number of cells and ‘a’ is the number of active cells. An HTM neuron has many dendritic branches (say k), each of which should learn some pattern. In that case, the total number of learned sequences/transitions is simply kN/a, which is much lower than combination(N, a), because once kN/a transitions are stored, all dendritic branches are used up and no new pattern can be stored. For k=100, N=1000, a=25, capacity = 4000.
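To make the gap between the two numbers concrete, here is a minimal sketch of the arithmetic above. The function names are made up for illustration, and the bound assumes what the question assumes: every stored transition uses up one branch on each of the ‘a’ active cells, and branches are never shared or added.

```python
import math

def state_count(n_cells: int, n_active: int) -> int:
    """Number of distinct sparse activity patterns, combination(N, a)."""
    return math.comb(n_cells, n_active)

def branch_bound(n_cells: int, n_active: int, branches_per_cell: int) -> float:
    """Upper bound k*N/a on stored transitions.

    Assumption (from the question, not from any HTM implementation):
    each transition consumes one branch on each of the `n_active` cells.
    """
    return branches_per_cell * n_cells / n_active

N, a, k = 1000, 25, 100
print("branch-limited capacity:", branch_bound(N, a, k))
# The state count is far too large to read comfortably, so show its size instead.
print("decimal digits in combination(N, a):", len(str(state_count(N, a))))
```

The point of printing the digit count is only to show that the representational space is astronomically larger than the branch-limited bound.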
How to choose between multiple predictions?
You stored “synapse” and “strength”. After presenting ‘s’, predictions for both ‘y’ and ‘t’ will be active. How do I choose between the two if there is no further input and I want to retrieve the whole sequence? It seems that in the brain the strength of a prediction is encoded either in firing rate or in the size of a sparse population. The more probable prediction has a higher rate or a larger population, which makes it more likely to be sent further up the hierarchy or to be selected by attention. If prediction strength is encoded by the size of the SDR, then there is a limit on how many predictions you can have, because more predictions decrease sparseness, so some attention mechanism that alternates between different SDRs should be present.
How is silence encoded? It is crucial for timing, for learning delays and durations. One possibility is that duration in the brain is encoded in continuously changing active populations (called state-dependent computation), where inhibitory neurons work like pacemakers by constantly sculpting new SDRs. However, HTM lacks such autonomous dynamics.
How does hierarchy work? It cannot be just a collection of layers connected via the spatial pooler. How are top-down connections used? It is believed that top-down connections serve as a way to send predictions, because higher levels track sequences on a larger time scale and could send the lower level a prediction of what will come next. This is related to the second question about selecting a prediction.
Does one layer have to learn the whole sequence? Currently, when you store ‘synapse’, ‘y’ is encoded in a column pattern for the context ‘s’, and for ‘p’ the context is ‘syna’. This means that if you want to store every word, each column has to learn the contexts from all words (in light of the first question, this will quickly saturate memory). Wouldn’t it be more reasonable to reuse patterns, so that the whole sequence is a collection of simple elements spread across a hierarchy? In other words, the first layer learns transitions without context, the second layer learns the transition from ‘sy’ to ‘na’, and so on further up the hierarchy, so that the context is spread across many layers.
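As a rough illustration of why simultaneous predictions erode sparseness, the sketch below computes the expected density of a union of predicted SDRs. It assumes the predicted SDRs are independent random patterns with the same sparsity, which is a simplification and not a property of any particular HTM implementation; `union_density` is a name invented for this example.

```python
def union_density(sparsity: float, n_predictions: int) -> float:
    """Expected fraction of active cells in a union of random SDRs.

    Assumes each SDR activates every cell independently with
    probability `sparsity`, so a cell stays silent only if it is
    silent in all `n_predictions` patterns.
    """
    return 1.0 - (1.0 - sparsity) ** n_predictions

# Sparsity a/N = 25/1000, taken from the capacity example earlier in the post.
for m in (1, 2, 5, 10, 20):
    print(m, "predictions ->", round(union_density(0.025, m), 3))
```

The density grows almost linearly at first, so a handful of simultaneous predictions is cheap, but dozens of them would no longer look sparse.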
I’ll try to answer some of your questions, hopefully correctly.
I’m not sure if NuPIC does this, but an implementation can add segments as needed, so the cap is set by computing power or by how long you’re willing to wait.
The temporal memory is very sparse because only one cell is on per column (mini or microcolumn, not cortical column, which is bigger). If you have 2000 minicolumns, 30 cells per column, 100 dendritic segments per cell, and 40 columns active, then there are 60,000 cells, 40 of which are on. More will be on if a column bursts, but columns burst when there’s no segment for the transition, so that’s not a factor. k = 100, n = 60000, a = 40, so capacity = 150000. I think I read that the capacity is millions of sequences, so there might be some re-usability.
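Here is the same arithmetic as a small sketch, in case anyone wants to plug in other parameters. The class and field names are invented for this post and are not NuPIC parameter names; the bound assumes no bursting and one dendritic segment per active cell per stored transition.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TMParams:
    """Hypothetical container for the numbers used in the post above."""
    columns: int = 2000
    cells_per_column: int = 30
    segments_per_cell: int = 100
    active_columns: int = 40

    @property
    def total_cells(self) -> int:
        return self.columns * self.cells_per_column

    @property
    def active_cells(self) -> int:
        # One active cell per active column, i.e. no bursting columns.
        return self.active_columns

    @property
    def transition_bound(self) -> int:
        # k * n / a, assuming each transition uses one segment per active cell.
        return self.segments_per_cell * self.total_cells // self.active_cells

p = TMParams()
print(p.total_cells, p.active_cells, p.transition_bound)
```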
The confusing thing about temporal memory is that it’s not about making predictions. It predicts the next state, but only as part of a process to track the sequence of inputs which led to the current input. The brain can predict entire sequences, but as far as I know, HTM doesn’t have a biological solution yet. It’s not required for understanding the world, at least perception, which is HTM’s current focus. The prediction also needs to be human readable.
Instead, HTM uses an artificial classifier. There’s a new algorithm which I don’t know, so I’ll describe the old one broadly. There is one classifier block for each distance into the future you want to predict. If you want to predict 10 steps into the future and only 10, you need one block, but if you want to predict 1-3 steps into the future, you need 3 classifier blocks, and they all do the same thing. Let’s say you have a classifier block that predicts 5 steps into the future. Each cell in the temporal memory has a corresponding row of numbers, one number for each classification. The block keeps track of how often each cell is active 5 steps before the classification is the input. To predict the likelihood of one possibility, it adds up the corresponding likelihood value for each active cell in the temporal memory. The point of temporal memory is to track the sequence, since the current input alone isn’t very useful for predictions.
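A rough sketch of one such classifier block, to make the bookkeeping concrete. This is my simplified reading of the description above (plain counts, summed and normalized), not NuPIC’s actual classifier code, and the class and method names are made up.

```python
from collections import defaultdict, deque

class StepClassifier:
    """One block that predicts the label `steps` time steps ahead."""

    def __init__(self, steps: int):
        self.steps = steps
        # Active-cell sets from the last `steps` time steps, oldest first.
        self.history = deque(maxlen=steps)
        # counts[cell][label]: how often `cell` was active `steps` before `label`.
        self.counts = defaultdict(lambda: defaultdict(int))

    def learn(self, active_cells, label):
        # Credit the cells that were active `steps` ago with the label seen now.
        if len(self.history) == self.steps:
            for cell in self.history[0]:
                self.counts[cell][label] += 1
        self.history.append(frozenset(active_cells))

    def predict(self, active_cells):
        # Sum each active cell's row of counts, then normalize to likelihoods.
        votes = defaultdict(int)
        for cell in active_cells:
            for label, count in self.counts.get(cell, {}).items():
                votes[label] += count
        total = sum(votes.values())
        return {label: v / total for label, v in votes.items()} if total else {}

clf = StepClassifier(steps=1)
for _ in range(3):
    clf.learn({1, 2}, "a")  # cells 1 and 2 stand in for the TM state for "a"
    clf.learn({3, 4}, "b")
print(clf.predict({1, 2}))
```

To predict 1-3 steps ahead you would create three of these with steps=1, 2, and 3 and feed them the same active cells and labels.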
Generally, it’s not worth predicting it if there are too many possibilities, because it’s essentially random and not really a sequence. Instead, other approaches are required, like object recognition. Another possible solution is to use a denser SDR but use higher sensitivity to the combination, so you can tell apart each item in the union. Since I’m speculating, I think the brain uses some other solutions, but I also think it alternates between SDRs like you mention. Attention is definitely required, but there are other ways to swap between SDRs. Neurons in the brain can be very sensitive to timing, specifically, some synapses only do much if a bunch of them activate at the same time. This is true for thalamic input to the cortex. Extremely sparsely active layer 6 cells project to the thalamus in a way which synchronizes cells corresponding to a feature, such as a line. That synchrony-based code wouldn’t exactly switch between SDRs, but it would change which cells are firing in synchrony. Even if the SDR is very dense, since only a few patterns are synchronous at a time, it is effectively sparse. That’s just a hypothesis.
If I understand it correctly, salience means how much something should attract attention. Attention is really complicated, since it involves motivation and emotion, as well as novelty detection. It might be a while before HTM is capable of attention. There are other functions to solve before attention can help much.
Jeff Hawkins has a theory or hypothesis about timing. Someone asked about those ideas recently (the post is called Timing Circuits). There’s also a video on the HTM source channel where he talks about it, among other things. It involves a timing signal from the thalamus to layer 1.
I'm going to rant a bit about some research I've been doing here.
In various structures, including parts of the thalamus and the cortex, individual firing rates and the number of active cells increase as an action approaches, until it reaches a threshold (basically). This mechanism could also track timing of sensory input, but I haven't researched it much. By adjusting the rate at which activity increases (ramps), actions can be timed. This code is really powerful, I think. For example, in part of the frontal cortex, the order of each planned action is represented by activity rank, so the next action has the strongest activity, the action after that has weaker representation, and so on. Then, you can apply a ramping signal to all of them, as a single action sequence representation, and they will surpass the execution threshold in the proper order. I also think this isn't a timing signal, but a gradual switch from representing the current sensorimotor input to the desired sensorimotor input. I'm pretty confident this is the case.
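A toy sketch of the rank-plus-ramp idea, just to show that a shared ramp turns activity rank into execution order. It assumes a linear ramp added equally to every planned action and a single fixed threshold; those are my simplifications, and the action names and numbers are invented.

```python
def execution_order(initial_activity: dict, ramp_rate: float, threshold: float = 1.0) -> list:
    """Return (action, crossing_time) pairs, earliest first.

    Each action starts at its rank-coded activity and rises by
    `ramp_rate` per unit time until it reaches `threshold`.
    """
    if ramp_rate <= 0:
        raise ValueError("ramp_rate must be positive")
    times = {
        action: max(0.0, (threshold - activity) / ramp_rate)
        for action, activity in initial_activity.items()
    }
    return sorted(times.items(), key=lambda item: item[1])

plan = {"reach": 0.8, "grasp": 0.5, "lift": 0.2}  # strongest activity = next action
for rate in (0.1, 0.2):
    print(rate, [name for name, _ in execution_order(plan, rate)])
```

Changing the ramp rate rescales every crossing time but leaves the order alone, which is what lets one signal control the speed of the whole sequence.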
Somewhat recently, HTM made a big advancement, and hierarchy might need to be reformulated. For example, higher levels in the hierarchy might recognize objects which cover larger sensory areas, such as 20 degrees of your field of view rather than 1 degree. Right now, HTM is trying to do as much as possible with a single region.
Hierarchy is probably required to learn every word, but each column needs just 1 dendritic segment for each letter it responds to, or 40 dendritic segments in the whole thing with normal parameters. It’s also possible that object recognition would just recognize the whole word without a capacity issue.
Thanks for your thorough answer. Sorry for the misspelling: salience should be silence (how to encode the absence of input between two sequences, or between two sounds). I like your ideas about using activity rank to select a planned action. To realize this with binary neurons (like in HTM, plus the predictive state), I assume it is necessary to include explicit inhibitory neurons that would not just inhibit their surroundings but would be selective, like some types of inhibitory neurons in the brain.