Hello, and thanks for this long-awaited new episode!
I fail to understand what’s being said in 6:25-7:15. Don’t all the cells have the same number of distal dendritic segments? How can you choose the one with fewer segments?
For me, a segment is basically a set of potential synapses, each connected either to a bit in the input (proximal) or to another cell (distal), and each having a permanence value that determines whether the synapse is connected (permanence above a threshold) or not. In the examples, each cell is usually shown with a single distal segment; I know this is for simplicity. But even when cells have more than one segment, I'm pretty sure all of the cells have the same number of segments. It seems to me you are somehow mixing up “segments” and “(connected) synapses” (but I'm quite sure you are not).
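The data model described in this question can be sketched in a few lines of Python. This is a minimal illustration, not any particular library's API: the class names and the `CONNECTED_PERMANENCE` value are hypothetical. Note that nothing in the structure forces every cell to hold the same number of segments.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical value; in practice the connected threshold is a tunable parameter.
CONNECTED_PERMANENCE = 0.5


@dataclass
class Synapse:
    presynaptic: int    # index of an input bit (proximal) or of another cell (distal)
    permanence: float   # scalar in [0, 1]

    def is_connected(self) -> bool:
        # A potential synapse only counts as connected once its permanence reaches the threshold.
        return self.permanence >= CONNECTED_PERMANENCE


@dataclass
class Segment:
    synapses: List[Synapse] = field(default_factory=list)

    def connected_count(self) -> int:
        return sum(s.is_connected() for s in self.synapses)


@dataclass
class Cell:
    # Each cell owns its own list of distal segments; the lengths can differ between cells.
    segments: List[Segment] = field(default_factory=list)


seg = Segment([Synapse(3, 0.6), Synapse(7, 0.2)])
cell_a, cell_b = Cell([seg]), Cell()
print(seg.connected_count(), len(cell_a.segments), len(cell_b.segments))  # 1 1 0
```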
What am I missing?
Thanks a lot!
Each cell may have a different number of segments. Segments are created when bursting occurs, if necessary. If, for example, a mini-column bursts and there are no existing segments from those cells to any previous winner cells in t-1, we randomly select one cell to create a new segment that reaches toward previous winner cells. Now you have this one cell with an additional segment, while the other cells’ segments are left alone. So hopefully you can see how some cells will have more distal segments than others.
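A toy version of the rule described above, assuming a segment is simplified to just the set of presynaptic cell ids it reaches toward (permanences are ignored) and that an overlap of at least `threshold` counts as an existing segment. `burst_and_learn` is a hypothetical helper for illustration, not NuPIC's code.

```python
import random
from typing import List, Optional, Set

# Simplification: a segment is just the set of presynaptic cell ids it reaches toward.
Column = List[List[Set[int]]]  # column -> cells -> segments


def burst_and_learn(column: Column, prev_winners: Set[int],
                    rng: random.Random, threshold: int = 1) -> Optional[int]:
    """Grow a segment on one randomly chosen cell, but only if no cell in the
    bursting column already has a segment reaching toward prev_winners."""
    for cell in column:
        for segment in cell:
            if len(segment & prev_winners) >= threshold:
                return None  # an existing segment covers this context; reuse it
    chosen = rng.randrange(len(column))
    column[chosen].append(set(prev_winners))  # only this cell gains a segment
    return chosen


rng = random.Random(0)
column: Column = [[] for _ in range(4)]  # 4 cells, no segments yet
for context in ({10, 11}, {10, 11}, {20, 21}, {30, 31}):
    burst_and_learn(column, context, rng)
print([len(cell) for cell in column])  # segment count per cell
```

The repeated context `{10, 11}` grows nothing the second time, so three segments end up spread over the cells, and their per-cell counts generally differ.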
I’m not sure where you got this idea (that all cells have the same number of segments), but I’m fairly certain it is not true. Also, each segment can have a different number of synapses.
Thanks for your answer, Matt.
Alright, I didn’t know that new segments could be created. However, I’m still a bit confused:
Does that text mean “those cells don’t have any existing segments with synapses to any previous winner cells in t-1”? I can’t tell whether I’m confusing synapses with segments, or whether you are using a “relaxed style” (not being strictly precise when the intended meaning is clear from context).
Thanks a lot!
Yes that’s what I mean.
It might have been good pedagogy to contrast a single-cell (micro?)column with a two-cell (micro?)column when discussing the distinction between first-order and second-order “sequence memory” (aka “temporal memory”, which is the result of temporal learning).
I was confused about the same thing at first and found a good discussion of it here: Cell segments vs synapses
What confused me was that I did not clearly distinguish between the initialization of:
1.) The proximal connections made from the Spatial Pooler to the input space. These are randomly initialized, and the pool of potential connections is the same for every cell within a mini-column (it is shared by the cells in that mini-column, since they all have the same receptive field).
2.) In contrast, the distal connections made from each cell of a mini-column initially have no potential connections at all. However, when a column bursts as shown in the video, a winner cell is chosen (the one with the fewest segments; ties are broken randomly), and it then grows a new segment.
A segment is an array of synapses, or in HTM terms an array of potential connections to all cells that were previously active. We can now distinguish the cases from the video:
2.1) In the initial step, this segment is created (containing potential connections to all previously active cells), because none of the cells in the mini-column has a segment with potential connections to the previous winner cells yet.
2.2) It can also happen that a segment with potential connections to the previous winner cells already exists, but its permanence values are not high enough to establish sufficiently many actual (connected) synapses, i.e. it stays below the threshold needed to activate the segment. This is the “closest to a winner” case: the cell owning the segment with the most connections to the previous winner cells automatically becomes the winner. We then increase the permanence values of the segment’s potential connections to active cells, decrease the permanence values of those to inactive cells, and add new potential connections to the segment for all active cells it did not yet contain.
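The two bursting cases (2.1 and 2.2) can be combined into one sketch of how a bursting column picks its winner cell and updates its distal segments. This is a minimal illustration under stated assumptions, not NuPIC's implementation: all parameter values are hypothetical, the match score counts potential synapses (connected or not) to previously active cells, and new segments connect to every previously active cell, whereas real implementations commonly cap the number of new synapses.

```python
import random
from typing import Dict, List, Set

Segment = Dict[int, float]  # presynaptic cell id -> permanence
Cell = List[Segment]        # a cell is just its list of distal segments

# Hypothetical parameter values for illustration only.
INITIAL_PERMANENCE = 0.21
PERMANENCE_INC = 0.1
PERMANENCE_DEC = 0.02
MIN_OVERLAP = 1


def best_matching_segment(column: List[Cell], prev_active: Set[int]):
    """(cell index, segment index) of the segment with the most potential synapses
    to prev_active, or None if no segment reaches MIN_OVERLAP."""
    best, best_score = None, MIN_OVERLAP - 1
    for ci, cell in enumerate(column):
        for si, segment in enumerate(cell):
            score = sum(1 for cell_id in segment if cell_id in prev_active)
            if score > best_score:
                best, best_score = (ci, si), score
    return best


def adapt_segment(segment: Segment, prev_active: Set[int]) -> None:
    """Case 2.2: reinforce synapses to active cells, weaken the rest, grow missing ones."""
    for cell_id in list(segment):
        if cell_id in prev_active:
            segment[cell_id] = min(1.0, segment[cell_id] + PERMANENCE_INC)
        else:
            segment[cell_id] = max(0.0, segment[cell_id] - PERMANENCE_DEC)
    for cell_id in prev_active:
        segment.setdefault(cell_id, INITIAL_PERMANENCE)


def learn_on_burst(column: List[Cell], prev_active: Set[int], rng: random.Random) -> int:
    """Pick the winner cell of a bursting column and update its distal segments."""
    match = best_matching_segment(column, prev_active)
    if match is not None:  # case 2.2: "closest to a winner"
        ci, si = match
        adapt_segment(column[ci][si], prev_active)
        return ci
    # Case 2.1: no matching segment, so the least-used cell (ties random) grows one.
    fewest = min(len(cell) for cell in column)
    ci = rng.choice([i for i, cell in enumerate(column) if len(cell) == fewest])
    column[ci].append({cell_id: INITIAL_PERMANENCE for cell_id in prev_active})
    return ci


rng = random.Random(0)
column: List[Cell] = [[] for _ in range(4)]
first = learn_on_burst(column, {4, 9}, rng)      # grows a new segment (case 2.1)
again = learn_on_burst(column, {4, 9, 17}, rng)  # reinforces that segment (case 2.2)
print(first == again, column[first])
```

Choosing the least-used cell for a new context spreads contexts across the cells of a mini-column, which is why segment counts per cell end up different.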
After some iterations of inputs, each winner cell will have a set of segments, one for each context in which it “won”. Each segment contains potential connections, some of which may have become actual connections.
Once enough synapses on a segment are connected, and enough of the cells they connect to are active at the same time to cross the activation threshold, the segment becomes active. A cell with one or more active segments is depolarized (predictive state), so if its mini-column becomes active on the next input, it becomes the winner cell directly, without the column bursting.
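The activation rule above can be sketched as follows. The threshold values are hypothetical, and segments are again simplified to dicts mapping a presynaptic cell id to its permanence; this illustrates the idea rather than any specific implementation.

```python
from typing import Dict, List, Set

Segment = Dict[int, float]  # presynaptic cell id -> permanence

# Hypothetical parameter values for illustration.
CONNECTED_PERMANENCE = 0.5
ACTIVATION_THRESHOLD = 2


def segment_is_active(segment: Segment, active_cells: Set[int]) -> bool:
    """Active when enough connected synapses point at currently active cells."""
    hits = sum(1 for cell_id, perm in segment.items()
               if perm >= CONNECTED_PERMANENCE and cell_id in active_cells)
    return hits >= ACTIVATION_THRESHOLD


def predictive_cells(cells: Dict[int, List[Segment]], active_cells: Set[int]) -> Set[int]:
    """A cell is predictive (depolarized) if at least one of its segments is active."""
    return {cid for cid, segs in cells.items()
            if any(segment_is_active(s, active_cells) for s in segs)}


def winners_for_active_column(column_cells: List[int], predictive: Set[int]) -> List[int]:
    """Predicted cells win directly; an empty result means the column bursts."""
    return [c for c in column_cells if c in predictive]


cells = {0: [{10: 0.6, 11: 0.7, 12: 0.1}], 1: [{10: 0.6, 12: 0.9}], 2: []}
predicted = predictive_cells(cells, active_cells={10, 11})
print(predicted)                                        # {0}
print(winners_for_active_column([0, 1, 2], predicted))  # [0]
print(winners_for_active_column([3, 4, 5], predicted))  # [] -> burst
```

Cell 1 has a connected synapse to cell 12, but 12 is not active, so only one connected synapse counts and its segment stays below the threshold.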
I think the initialization might not have been 100% clear in the episode. Still, these are amazing explanations of a topic that is quite complicated when it comes to the details.
PS: We could also initialize a set of segments with random connections up front and still have the same cases, just with faster learning, since ties are less likely and not every segment has to be built up from scratch. This is the way the paper describes the initialization:
If I got anything wrong please correct me.
Regarding proximal synapses:
About randomly assigning permanence values to the random potential synapses during initialization: how are these random permanence values helpful?
Wouldn’t giving all those synapses some value just above the threshold be more beneficial? The potential synapses wouldn’t be lost very quickly due to decrements, they would be picked up (become active) easily if new local patterns came in the near future, and the neuron would pick up all the on bits “connected” to it from the start and work on those. Also, would reducing the proximal synaptic permanence increment and decrement values (relative to the distal increment and decrement values) be beneficial over time?
We want a relatively large pool of potential synapses so that the input can be represented well. Thus each mini-column has a potential pool covering, e.g., about 85% of the input bits.
Now, we do not want all of them to be connections (permanence above the threshold), as we want a sparse representation and we want to impose as few constraints on the input distribution (which bits represent features) as possible. So we randomly assign permanence values from a normal distribution centered on the threshold, such that about half of them are connected. A normal distribution with low variance also gives you the property you wished for: initially, only a few increments or decrements are needed to cross the threshold.
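The initialization described above can be sketched like this, assuming permanences are drawn from a normal distribution centered on the connected threshold and clipped to [0, 1]. The threshold and `sigma` values are hypothetical, and specific Spatial Pooler implementations may initialize differently.

```python
import random
from typing import Dict, Iterable


def init_permanences(pool: Iterable[int], connected_perm: float, sigma: float,
                     rng: random.Random) -> Dict[int, float]:
    """Draw each potential synapse's permanence from a normal distribution centered
    on the connected threshold, clipped to [0, 1]; about half end up connected."""
    return {bit: min(1.0, max(0.0, rng.gauss(connected_perm, sigma))) for bit in pool}


rng = random.Random(42)
perms = init_permanences(range(10_000), connected_perm=0.5, sigma=0.05, rng=rng)
connected = sum(p >= 0.5 for p in perms.values()) / len(perms)
near = sum(abs(p - 0.5) <= 0.1 for p in perms.values()) / len(perms)
print(f"connected: {connected:.2f}, within 0.1 of threshold: {near:.2f}")
```

With a small `sigma`, roughly 95% of synapses start within two standard deviations of the threshold, so a couple of increments or decrements can flip them either way.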
We could make more connections, but this would then have to be balanced out in the inhibition phase, when we rank all mini-columns and choose the winner columns (since all mini-columns would have more connections).
The values are tunable, but we should keep in mind that we want the Spatial Pooler’s sparse representation to capture the input distribution and its features as accurately as possible.
In this sense, it is useful for each mini-column to have a different potential pool, with random initial connections within that pool. This helps each mini-column represent different features, which is later further supported by boosting.
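The ranking step mentioned above can be sketched as a simple global inhibition: score each mini-column by how many active input bits fall on its connected synapses, then keep the top `k`. This is a simplified variant (no boosting, no local inhibition), and `k` is a hypothetical parameter.

```python
from typing import List, Set


def overlaps(input_bits: Set[int], connected_by_column: List[Set[int]]) -> List[int]:
    """Overlap score = number of active input bits on each column's connected synapses."""
    return [len(input_bits & connected) for connected in connected_by_column]


def global_inhibition(scores: List[int], k: int) -> List[int]:
    """Keep only the k highest-scoring columns (ties go to the lower index)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


connected = [{1, 2}, {3}, {1, 2, 3}, {9}]
scores = overlaps({1, 2, 3}, connected)
print(scores, global_inhibition(scores, k=2))  # [2, 1, 3, 0] [0, 2]
```

Because only `k` columns survive, connecting more synapses in every column raises all overlap scores together; the inhibition still fixes how many columns win, but overlaps become less discriminative.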
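Giving each mini-column its own potential pool, shared by all cells in that mini-column as described in point 1.), could look like the sketch below. The 50% pool fraction and the sizes are hypothetical values chosen for the example.

```python
import random
from typing import List


def init_potential_pools(num_columns: int, input_size: int,
                         potential_pct: float, rng: random.Random) -> List[List[int]]:
    """Give each mini-column its own random subset of input bits (its potential pool).
    The pool is stored per mini-column because all its cells share the same receptive field."""
    pool_size = round(potential_pct * input_size)
    return [sorted(rng.sample(range(input_size), pool_size)) for _ in range(num_columns)]


pools = init_potential_pools(num_columns=8, input_size=100, potential_pct=0.5,
                             rng=random.Random(1))
print(len(pools), len(pools[0]))  # 8 50
```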
Increasing the synaptic permanence increment will lead to faster learning (building connections specific to the input).
Increasing the synaptic permanence decrement will lead to faster forgetting.
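A quick way to see both effects is to count how many consistent updates it takes for a synapse to cross the connected threshold. The start values, threshold and step sizes below are hypothetical; they are chosen as exact binary fractions so the counts are not affected by floating-point rounding.

```python
def steps_to_connect(permanence: float, threshold: float, increment: float) -> int:
    """Number of increments until a disconnected synapse becomes connected."""
    if not (0.0 < threshold <= 1.0 and increment > 0.0):
        raise ValueError("need 0 < threshold <= 1 and a positive increment")
    steps = 0
    while permanence < threshold:
        permanence = min(1.0, permanence + increment)
        steps += 1
    return steps


def steps_to_disconnect(permanence: float, threshold: float, decrement: float) -> int:
    """Number of decrements until a connected synapse drops below the threshold."""
    if not (0.0 < threshold <= 1.0 and decrement > 0.0):
        raise ValueError("need 0 < threshold <= 1 and a positive decrement")
    steps = 0
    while permanence >= threshold:
        permanence = max(0.0, permanence - decrement)
        steps += 1
    return steps


print(steps_to_connect(0.25, 0.5, 0.0625), steps_to_connect(0.25, 0.5, 0.125))       # 4 2
print(steps_to_disconnect(0.75, 0.5, 0.125), steps_to_disconnect(0.75, 0.5, 0.25))  # 3 2
```

Doubling the increment halves the number of presentations needed to learn a connection, and a larger decrement removes connections after fewer presentations without the input, i.e. faster forgetting.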