Hello and thanks for this long-awaited new episode!
I fail to understand what’s being said in 6:25-7:15. Don’t all the cells have the same number of distal dendritic segments? How can you choose the one with fewer segments?
For me, a segment is basically a set of potential synapses (each connected to either a bit in the input -proximal- or to another cell -distal-, and having a permanence value that makes the synapse be connected -permanence bigger than a threshold- or not). In the examples, each cell is usually shown as having a single distal segment. I know this is for simplicity. But considering cells having more than one segment, I’m pretty sure all of the cells have the same quantity of segments. It seems to me you are somehow mixing “segments” and “(connected) synapses” (but I’m quite sure you are not).
What am I missing?
Thanks a lot!
Each cell may have a different number of segments. Segments are created when bursting occurs, if necessary. If, for example, a mini-column bursts and there are no existing segments from those cells to any previous winner cells in t-1, we randomly select one cell to create a new segment that reaches toward previous winner cells. Now you have this one cell with an additional segment, while the other cells’ segments are left alone. So hopefully you can see how some cells will have more distal segments than others.
I’m not sure where you get this idea, but I’m fairly certain it is not true. Also each segment can have different numbers of synapses along it.
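To make the point about unequal segment counts concrete, here is a minimal sketch of what happens when a mini-column bursts and none of its cells has a segment reaching the previous winner cells. It is not NuPIC code: the `Cell` and `Segment` classes, the `burst_without_match` helper, and the initial permanence of 0.21 are assumptions made up for illustration. The cell is picked as in the video (fewest segments, random tie-break).

```python
import random
from dataclasses import dataclass, field


@dataclass
class Segment:
    # Maps presynaptic cell id -> permanence (hypothetical representation).
    synapses: dict = field(default_factory=dict)


@dataclass
class Cell:
    segments: list = field(default_factory=list)


def burst_without_match(column_cells, prev_winner_cells,
                        initial_permanence=0.21, rng=random):
    """Grow one new segment on the cell with the fewest segments."""
    fewest = min(len(cell.segments) for cell in column_cells)
    candidates = [c for c in column_cells if len(c.segments) == fewest]
    winner = rng.choice(candidates)  # random tie-break
    new_segment = Segment({p: initial_permanence for p in prev_winner_cells})
    winner.segments.append(new_segment)
    return winner


# A fresh mini-column of four cells, none of which has a distal segment yet.
column = [Cell() for _ in range(4)]
winner = burst_without_match(column, prev_winner_cells=[10, 11, 12])

# Exactly one cell now has a segment; the other three are left alone.
segment_counts = sorted(len(cell.segments) for cell in column)
```

After one burst the cells in the same mini-column already differ in their number of segments, and each later burst in a new context can widen that difference.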
Thanks for your answer, Matt.
Alright, I didn’t know that new segments could be created. However, I’m still a bit confused:
Does that text mean “those cells don’t have any existing segments with synapses to any previous winner cells in t-1”? I don’t know whether I’m confusing synapses and segments, or the reason is that you are using a “relaxed style” (not being exactly precise when the “true” meaning can be understood thanks to the context).
Thanks a lot!
Yes that’s what I mean.
It may have been good pedagogy to contrast a single cell (micro?)column with a two cell (micro?)column in discussing the distinction between first order and second order “sequence memory” (aka “temporal memory” which is the result of temporal learning).
I was confused about the same at first and found a good discussion about it here: Cell segments vs synapses
What caused the confusion for me was that I did not clearly distinguish the initialization between:
1.) The proximal connections made from the Spatial Pooler to the input space. These are randomly initialized, and the pool of potential connections should be equivalent for each mini-column (shared by the cells in each, since they have the same receptive field).
2.) In contrast, the distal connections made from each cell of a mini-column initially have no potential connections at all. However, when a column bursts as shown in the video, a winning cell is chosen (the one with the fewest segments; ties are broken randomly), and it then builds up a segment.
The segment is an array of synapses - or in HTM terms an array of potential connections to all cells that were previously active. We can separate between the cases from the video now:
2.1) In the initial step this segment would be created (containing potential connections to all prev. active cells), as none of the cells in the mini-column has a segment containing potential connections to the previous winner cells yet.
2.2) It can also happen that there is already a segment containing potential connections to the prev. winner cells, but the permanence values are not high enough for sufficient actual synapses to be established, i.e. the segment stays under the connection threshold needed to activate it. (“Closest to a winner”) In this case the cell whose segment has the most connections is automatically the winner. This means we increase the permanence values of the segment’s potential connections to active cells, decrease the permanence values of those to inactive cells, and add new potential connections to the segment for all active cells that were not yet contained.
After some iterations of inputs, the winning cells will have a set of segments, one for each of the different contexts in which they “won”. Each segment contains potential connections (some of which may be actual connections).
Once enough connections in a segment have formed, there can be enough connected winning cells to cross the connection threshold. This “activates the segment”, and when a cell has one or more active segments it becomes depolarized -> predictive state, making it the winner cell directly.
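The activation rule described above can be sketched roughly as follows. This is only an illustration: the dict-based segment representation, the function names, and both threshold values are assumptions, not NuPIC's actual API or defaults.

```python
CONNECTED_PERMANENCE = 0.5  # assumed permanence threshold for a "connected" synapse
ACTIVATION_THRESHOLD = 3    # assumed number of active connected synapses needed


def count_active_connected(segment, active_cells):
    """Count synapses that are connected AND whose presynaptic cell is active."""
    return sum(
        1
        for cell, permanence in segment.items()
        if permanence >= CONNECTED_PERMANENCE and cell in active_cells
    )


def is_predictive(cell_segments, active_cells):
    """A cell becomes predictive if at least one of its segments is active."""
    return any(
        count_active_connected(segment, active_cells) >= ACTIVATION_THRESHOLD
        for segment in cell_segments
    )


# One cell with two distal segments; each maps presynaptic cell id -> permanence.
segments = [
    {1: 0.6, 2: 0.7, 3: 0.55, 4: 0.2},  # synapse to cell 4 is only potential
    {7: 0.9, 8: 0.1},
]

predictive_now = is_predictive(segments, {1, 2, 3, 4})  # 3 connected synapses active
predictive_later = is_predictive(segments, {1, 2, 4})   # only 2 connected synapses active
```

Note that the synapse to cell 4 never contributes, since its permanence is below the connection threshold: it is a potential connection, not an actual one.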
I think the initialization might not have been 100% clear in the episode. Still, amazing explanations of a topic that is, when it comes to the details, quite complicated.
PS: We can also initialize a set of segments with random connections and then have the same cases, just with faster learning, since ties are less likely and not all segments need to be built up from scratch. This is the way the paper describes the initialization:
If I got anything wrong please correct me.
Regarding proximal synapses:
About randomly assigning permanence values to the random potential synapses during initialization: how are these random permanence values helpful?
Giving all those synapses some value just above the threshold would be more beneficial, right? The potential synapses wouldn’t be lost very quickly due to decrements and would be picked up (become active) easily if new local patterns come in the near future, and the neuron would also pick up all the on bits “connected” to it from the start and work on those. Also, would reducing the proximal synaptic permanence increment and decrement values (relative to the distal increment and decrement values) be beneficial over time?
We want a relatively large pool of potential synapses to be able to represent the input well. Thus we have a pool of e.g. about 85% potential connections.
Now we do not want to make all of them connected (above-threshold permanence), as we want a sparse representation and want to impose as few constraints as possible on the input distribution (which bits represent features). So we randomly assign permanence values with a normal distribution around the threshold such that about half of them are connected. A normal distribution also ensures the property you wished for: initially it is easy (thanks to the low variance) for increments/decrements to cross the threshold.
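As a rough illustration of that initialization, the sketch below draws permanences from a normal distribution centered on the connection threshold. The function name, the threshold of 0.5, the spread of 0.05, and the fixed seed are all assumptions chosen for the example, not values taken from NuPIC.

```python
import random


def init_permanences(potential_inputs, threshold=0.5, spread=0.05, seed=0):
    """Assign each potential synapse a permanence drawn around the threshold."""
    rng = random.Random(seed)
    permanences = {}
    for input_index in potential_inputs:
        value = rng.gauss(threshold, spread)
        # Clip to the valid permanence range [0, 1].
        permanences[input_index] = min(1.0, max(0.0, value))
    return permanences


perms = init_permanences(range(1000))

# Roughly half of the potential synapses start out connected.
connected_fraction = sum(p >= 0.5 for p in perms.values()) / len(perms)
```

Because the distribution is symmetric around the threshold, about half of the potential synapses start connected, and with a small spread most of them sit only a few increments or decrements away from switching state.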
We could make more connections, but this would then need to be balanced in the inhibition phase, when we rank all mini-columns and choose the winner columns (as all mini-columns would have more connections).
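The ranking step in the inhibition phase could look something like this under global inhibition. It is a simplified sketch: the column names, the `select_winner_columns` helper, and the deterministic tie-break by column name are assumptions for illustration, and boosting is left out.

```python
def overlap(connected_inputs, active_bits):
    """Number of a column's connected inputs that are currently on."""
    return len(connected_inputs & active_bits)


def select_winner_columns(columns, active_bits, num_winners):
    """Rank mini-columns by overlap and keep the top num_winners."""
    scores = {name: overlap(inputs, active_bits) for name, inputs in columns.items()}
    # Highest overlap first; ties broken by column name (an arbitrary choice).
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return ranked[:num_winners]


# Each mini-column is described by the set of input bits it is connected to.
columns = {
    "a": {0, 1, 2},
    "b": {2, 3},
    "c": {4, 5},
    "d": {0, 1, 2, 3},
}
active_bits = {0, 1, 2, 3}

winners = select_winner_columns(columns, active_bits, num_winners=2)
```

If every mini-column started with many more connected synapses, all overlap scores would rise together, so the ranking would have to do more of the work of keeping the output sparse.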
The values are tune-able but we should have in mind that we want to represent the input distribution with its features as accurately as possible with a sparse representation in the Spatial Pooler.
In this sense it is useful for each mini-column to have a different potential pool and, on top of that, initially random connections to its input field. This helps each mini-column represent different features, which is later additionally supported by boosting.
Increasing the synaptic permanence increment will lead to faster learning (building connections specific to input).
Increasing the synaptic permanence decrement will lead to faster forgetting.
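A small sketch of how the increment and decrement act on the proximal synapses of a winning mini-column. The function name and the values 0.05 and 0.01 are assumptions for illustration, not NuPIC defaults.

```python
def adapt_permanences(permanences, active_inputs, increment=0.05, decrement=0.01):
    """Strengthen synapses to active input bits, weaken all the others."""
    updated = {}
    for input_index, value in permanences.items():
        if input_index in active_inputs:
            value += increment
        else:
            value -= decrement
        # Keep permanences within the valid range [0, 1].
        updated[input_index] = min(1.0, max(0.0, value))
    return updated


# Three potential synapses sitting close to an assumed threshold of 0.5.
before = {0: 0.48, 1: 0.52, 2: 0.50}
after = adapt_permanences(before, active_inputs={0})
```

Here synapse 0 crosses the assumed threshold of 0.5 and becomes connected after a single step, while synapse 2 drops just below it. A larger increment makes such crossings happen in fewer steps (faster learning), and a larger decrement disconnects unused synapses sooner (faster forgetting).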
Thank you @kaikun
I see. This makes sense. If most of the minicolumns are highly activated, then the winners will have to be selected using fixed rules that give consistent SDRs for the same input spaces over time.
Hi all. Another HTM enthusiast here seeking clarification. In the 6:25 - 7:15 section, @rhyolight says we choose the winner cell to be the cell with the fewest segments. He then says:
Now that the winner cell selection is done, we need to either create segments or grow synapses from all the active cells in this bursting column.
Looking at the pseudocode for burstColumn in BAMI (page 9), it looks like the above statement isn’t true. Instead, only a new segment is grown from the winner cell (which makes sense in my wet HTM).
The winner cells are only used for deciding which synapses to grow.
If that’s true–and in every other case it seems to be–then it seems unnecessary in this case to choose a winner cell if we’re not going to…afford it the privilege?..of being the only one who learns this context.
Can anyone please clarify? Am I reading the code wrong? Is this a special case?
Are you saying that we might select one or more winner cells to learn the context? If so, seems like that might work, but I don’t know what other repercussions would come out of it.
No no, I’m not at a level to make any coherent suggestion (yet). I’m just wondering what NuPIC does. If I understand the pseudocode and the video, the code says one thing, the video says another.
Given the specific case presented in the video, does NuPIC create segments and grow synapses from all the active cells, OR like the pseudocode states, does NuPIC only create a segment and grow synapses from only the winner cell?
Whenever a minicolumn is activated, a winner cell is chosen. The winner is either the cell with the segment that best matches the activity in the previous time step (winners or not), or a new segment on a cell with the fewest number of existing segments (with a random tie breaker).
That chosen segment is adapted to better align with the activity from the previous time-step. Existing synapses on that segment which received activity are strengthened, and synapses which did not are degraded.
The above process counts how many of the segment’s synapses received activity (for a newly created segment, this count will of course be zero). If the number is below the desired count, then more synapses are added up to the count. When synapses are added, they are connected only to winner cells from the previous time step.
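Put together, the three steps above might be sketched like this. It is a simplified reading of the description, not the NuPIC implementation: the dict-based data layout, the function names, the `min_match` parameter, and all numeric values are assumptions for illustration.

```python
import random


def choose_winner(column, prev_active, min_match=1, rng=random):
    """column maps cell id -> list of segments; a segment maps
    presynaptic cell id -> permanence. Returns (cell id, segment)."""
    best, best_overlap = None, 0
    for cell_id, segments in column.items():
        for segment in segments:
            matched = sum(1 for cell in segment if cell in prev_active)
            if matched >= min_match and matched > best_overlap:
                best, best_overlap = (cell_id, segment), matched
    if best is not None:
        return best
    # No matching segment: new segment on a cell with the fewest segments.
    fewest = min(len(segments) for segments in column.values())
    candidates = [c for c, segments in column.items() if len(segments) == fewest]
    cell_id = rng.choice(candidates)  # random tie-break
    new_segment = {}
    column[cell_id].append(new_segment)
    return cell_id, new_segment


def learn(segment, prev_active, prev_winners, increment=0.1, decrement=0.1,
          desired_count=3, initial_permanence=0.21, rng=random):
    """Reinforce the chosen segment, then grow synapses to previous winners."""
    active_count = 0
    for cell in list(segment):
        if cell in prev_active:
            segment[cell] = min(1.0, segment[cell] + increment)
            active_count += 1
        else:
            segment[cell] = max(0.0, segment[cell] - decrement)
    # New synapses connect only to winner cells from the previous time step.
    candidates = [cell for cell in prev_winners if cell not in segment]
    rng.shuffle(candidates)
    for cell in candidates[:max(0, desired_count - active_count)]:
        segment[cell] = initial_permanence


# Case 1: cell 1 already has a segment that partially matches previous activity.
column = {0: [], 1: [{5: 0.3, 9: 0.3}], 2: []}
winner_id, segment = choose_winner(column, prev_active={5, 6, 7})
learn(segment, prev_active={5, 6, 7}, prev_winners={5, 6})

# Case 2: nothing matches, so a new segment grows on a cell with no segments.
column2 = {0: [{1: 0.5}], 1: [], 2: []}
new_id, new_segment = choose_winner(column2, prev_active={20, 21})
learn(new_segment, prev_active={20, 21}, prev_winners={20, 21})
```

In both cases only the chosen segment on the winner cell is modified. The other cells in the mini-column keep their segments exactly as they were.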
Ok, so then the pseudocode is correct. The cell with the least number of segments is chosen as the winner. A segment is grown from that winner cell to some subsampling of active cells from the previous time step. The other cells in the burst column, while active, do not grow new segments, nor have their synapses on their existing segments reinforced.