Thanks so much for the great visualization @Paul_Lamb!
I would say it is conceptually accurate to the pooling layer I described, except that it seems to show the features coming in as a sequence. According to this pooling layer mechanism, that is not strictly necessary, although the mechanism would still work if they did.
On a related note, much of the current discussion has been about how the pooling layer mechanism might not be able to simply use the proximal input at the current timestep; it might need something more, as @jacobeverist puts it, some “extra sauce”, to make it work effectively. Oddly, from my understanding of the PL mechanism, I would say this is not the case: I think the PL mechanism can learn from merely a single proximal FF Feature-Location / Feature-Context pair per timestep, i.e., a single set of active cells per timestep.
I think some slight confusion occurs when we think about where the “intersection between the previous features and the current feature” is happening. Paradoxically, it is not anything intuitive like an overlap score or a logical AND of the features; it is more subtle, and less restrictive. I hypothesize the intersection is actually happening, although quite slowly, in the inhibition of cells, using the distal dendritic connections among the union of all cells activated by all the features known about the object. This is somewhat paradoxical (and possibly confusing), because it says you need to build the union (by growing distal connections) and then find the intersection within it (using CIUI). Keep in mind this won't create a perfect intersection, but rather one that simply favors the cells with the highest overlap with the other features in the set of all features on the object. But regardless of its initial un-intuitiveness, I think this is, at least conceptually, the process that is actually happening in the pooling layer.
But let's make sense of it. To break it down a little further, let's imagine an object, say a coffee cup, which has three features: the handle, the rim, and the bottom. Now let's also imagine a newly created layer 2/3a, which is known to be a PL (according to myself, and Numenta's work). Also, for the sake of example, let's say the distal dendritic learning rate of the layer is extremely high/aggressive, so that it can grow a full set of distal connections (or several) in a single timestep.
{t = 1}: First, let's imagine that a finger on the robot feels the rim of the coffee cup. The pattern representing the rim of the coffee cup (a feature-location pair, whose details are irrelevant) excites cells named cell_A, cell_B, cell_C, cell_D, cell_E, and cell_F in the PL.
(The exact configuration of these cells is irrelevant, but the cell names are relevant, as they uniquely identify each cell in 2/3a.)
(Also notice there seem to be quite a few cells active. This is because the cells in the inference layer that gives 2/3a its input, let's say layer 4, don't just synapse onto one cell in 2/3a and that's it. Just like any thalamic axon synapses onto many cells in layer 4, the axons from cells in layer 4 synapse onto many cells in layer 2/3a. That's why cells A through F are all active from a single (possibly quite sparse) FF input from layer 4. This will be relevant later on.)
Now that we have a set of proximally active cells in layer 2/3a, let's learn. Imagine cell_A immediately growing a set of dendrites that makes connections with cells B, C, D, E, and F. Now imagine that this same process repeats for all other cells, so cell_B does the same and grows distal connections with cells A, C, D, E, and F, cell_C connects with A, B, D, E, and F, and so on.
If you imagine all of these connections at once (assuming an extremely high learning rate and very little decay of distal dendrites over time), you get essentially a fully connected “self-associative” graph of cells that all have distal connections amongst themselves.
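To make that growth step concrete, here is a minimal sketch in Python. This is not my actual code, just an illustration of the idea, using the cell names from the example:

```python
from collections import defaultdict

# Distal connections: cell name -> set of cells it has grown distal connections to.
distal_connections = defaultdict(set)

def grow_distal(active_cells, connections):
    """Every active cell grows distal connections to all other co-active cells."""
    for cell in active_cells:
        connections[cell] |= (active_cells - {cell})

# {t = 1}: the rim feature activates cells A through F.
rim_cells = {"A", "B", "C", "D", "E", "F"}
grow_distal(rim_cells, distal_connections)

# Every rim cell is now connected to the other five rim cells: the fully
# connected "self-associative" graph described above.
print(sorted(distal_connections["A"]))  # ['B', 'C', 'D', 'E', 'F']
```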
{t = 2}: Everything described until now has been about {t = 1}, or very shortly after {t = 1} but before {t = 2}. Now, at {t = 2}, imagine a new FF sensory input about the cup coming from layer 4 into layer 2/3a; let's say we are sensing the handle now. A new, largely unrelated set of cells becomes active; let's call them cell_C, cell_F, cell_G, cell_H, cell_J, and cell_K.
We can see now that, compared to our previous feature, there are actually very few cells in common between them. This is OK, as you will see later.
Now let's do the same distal dendritic growth step we did last timestep. Imagine that cell_C grows connections with cell_F, cell_G, … and so on, for every single cell. If we skip ahead to after these new growths have been made, the resultant connections of the layer look like this:
cell_A: 5 connections: (B, C, D, E, F)
cell_B: 5 connections: (A, C, D, E, F)
cell_C: 9 connections: (A, B, D, E, F, G, H, J, K)
cell_D: 5 connections: (A, B, C, E, F)
cell_E: 5 connections: (A, B, C, D, F)
cell_F: 9 connections: (A, B, C, D, E, G, H, J, K)
cell_G: 5 connections: (C, F, H, J, K)
cell_H: 5 connections: (C, F, G, J, K)
cell_J: 5 connections: (C, F, G, H, K)
cell_K: 5 connections: (C, F, G, H, J)
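If you want to check those numbers, the same sketch from above, repeated for both features, reproduces them (again, just an illustration, not my real code):

```python
from collections import defaultdict

def grow_distal(active_cells, connections):
    """Every active cell grows distal connections to all other co-active cells."""
    for cell in active_cells:
        connections[cell] |= (active_cells - {cell})

distal_connections = defaultdict(set)
rim_cells    = {"A", "B", "C", "D", "E", "F"}   # feature sensed at {t = 1}
handle_cells = {"C", "F", "G", "H", "J", "K"}   # feature sensed at {t = 2}

grow_distal(rim_cells, distal_connections)
grow_distal(handle_cells, distal_connections)

for cell in sorted(distal_connections):
    targets = sorted(distal_connections[cell])
    print(f"cell_{cell}: {len(targets)} connections: {targets}")
# cell_C and cell_F end up with 9 connections each; every other cell has 5.
```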
So now, finally, using CIUI (“competitive ion uptake inhibition”), we can imagine that cell_C and cell_F would definitely be in the current representation of the object. But, something probably less noticeable: these might not be the only cells in the representation on the next timestep, because cells C and F don't have that much of an edge over all the other cells. This is why it is important to have a decent number of different feature-locations on an object that is being modeled (I don't know what that number is, though).
The important takeaway from this, however, is that the whole act of “taking the intersection of all the feature-locations known about an object, and using that as your representation of the object” leaves out a critical idea: for a low number of features, this intersection will not happen so cleanly, because with CIUI a cell needs to have a large edge over its neighbors in order to start inhibiting them from appearing in the final representation of the object. This is done by creating a web, or union, of distal connections between all cells in every feature, and using that union to determine which cells will form the final representation.
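Here is a toy version of that selection step, under my own assumptions: inhibition favors cells whose distal connection counts sit well above the layer average. The 1.25 margin and the cells in the “bottom” feature are purely hypothetical, and this is only meant to show how the edge of the intersection cells grows as features are added:

```python
from collections import defaultdict

def grow_distal(active_cells, connections):
    """Every active cell grows distal connections to all other co-active cells."""
    for cell in active_cells:
        connections[cell] |= (active_cells - {cell})

# Three features of the cup. The cells in the "bottom" feature (L, M, N, P) are
# made up for illustration; what matters is that it shares C and F with the others.
features = [
    {"A", "B", "C", "D", "E", "F"},   # rim
    {"C", "F", "G", "H", "J", "K"},   # handle
    {"C", "F", "L", "M", "N", "P"},   # bottom (hypothetical cells)
]

connections = defaultdict(set)
for feature in features:
    grow_distal(feature, connections)

# Toy CIUI: a cell only survives inhibition if its distal connection count is
# comfortably above the layer average. The 1.25 margin is an arbitrary guess.
counts = {cell: len(targets) for cell, targets in connections.items()}
mean_count = sum(counts.values()) / len(counts)
winners = sorted(cell for cell, n in counts.items() if n > 1.25 * mean_count)

print(winners)  # ['C', 'F'] -- their edge over the rest grows with every added feature
```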
I hope that all made sense! I know it's a lot to take in.
MAJOR EDIT:
After writing that, I came to the realization that the method I described earlier might not work as effectively as it should. The one little thing we need to add, which I have thought about for a while, and which is actually what @Paul_Lamb noted in his experimentation with the mechanism, is that we need to connect to the cells from the previous timestep as well. …I think. I haven't thought through this possibility too much yet, but I suspect it might work.
I will be determining whether a PL needs to connect to:
- both {t} currently active cells, AND {t-1} previously active cells,
or
- just {t} currently active cells.
To determine which is the correct mechanism for a PL, I will be coding both options (the change is super simple in my current code) and seeing which one produces a more static result.
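Roughly, the comparison will look something like this sketch. This is not my actual code: the stability metric (Jaccard overlap of consecutive winner sets), the 1.25 threshold, and the particular reading of “also connect to the {t-1} cells” (current cells grow distal connections toward the previous cells, not the other way around) are all just assumptions for illustration:

```python
from collections import defaultdict

def grow_distal(source_cells, target_cells, connections):
    """Each source cell grows distal connections to every target cell except itself."""
    for cell in source_cells:
        connections[cell] |= (target_cells - {cell})

def run(features, include_previous):
    """Learn over a sequence of features, returning the winner set after each timestep."""
    connections = defaultdict(set)
    previous = set()
    history = []
    for active in features:
        targets = (active | previous) if include_previous else active
        grow_distal(active, targets, connections)
        counts = {cell: len(t) for cell, t in connections.items()}
        mean_count = sum(counts.values()) / len(counts)
        history.append({cell for cell, n in counts.items() if n > 1.25 * mean_count})
        previous = active
    return history

def stability(history):
    """Average Jaccard overlap of consecutive winner sets (1.0 = perfectly static)."""
    pairs = list(zip(history, history[1:]))
    overlaps = [len(a & b) / max(len(a | b), 1) for a, b in pairs]
    return sum(overlaps) / max(len(overlaps), 1)

features = [
    {"A", "B", "C", "D", "E", "F"},   # rim
    {"C", "F", "G", "H", "J", "K"},   # handle
    {"C", "F", "L", "M", "N", "P"},   # bottom (hypothetical cells)
]

# Higher score = more static object representation over the sensation sequence.
for include_previous in (False, True):
    print(include_previous, stability(run(features, include_previous)))
```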