An Apical Depolarization for Numenta: How to Generate the Allocentric Location Signal

Yes, sorry, I should have explained that a little better. You could imagine three fingers sensing the coffee cup, with that input represented by the three groups of cells on the bottom. Those are then pooled in the layer above them.

This is intended to depict an overly-simplified representation of the current SMI algorithm, with @dwrrehman’s pooling strategy in place of Numenta’s reset + random SDR strategy for the output layer. I just noticed that I forgot to depict the predictive states in the input layers, so that might have led to part of the confusion.

Note that this pooling strategy (at least in my current understanding of it) requires two or more features to be input at once. This allows it to make associations between them. For example, you can imagine that in steps 7 - 12, two fingers are stationary on the coffee cup while the other finger is exploring it. The learning algorithm applied to the proximal synapses results in associations between the features being formed, eventually resulting in a pooled representation.

Another way to accomplish the required association between features (if you want to have a one-for-one relationship between inference and pooling layers) is for the pooling layer to take proximal input from the inference layer over two timesteps.
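To make that a bit more concrete, here is a rough sketch of the association step under some assumed parameters (the layer size, sparsity, permanence values, and helper names are all made up for illustration; this is not the actual SMI or pooling code):

```python
import random

POOL_SIZE = 512      # pooling layer cells (assumed)
INPUT_BITS = 1024    # size of the proximal input space (assumed)
K = 20               # winners kept active in the pooling layer (assumed)
THRESHOLD = 0.5      # permanence needed for a connected synapse (assumed)

# each pooling cell gets a random potential pool of proximal synapses
rng = random.Random(0)
permanences = {
    c: {b: rng.random() * 0.6 for b in rng.sample(range(INPUT_BITS), 128)}
    for c in range(POOL_SIZE)
}

def pooled_cells(active_bits):
    """Activate the K pooling cells with the most connected synapses onto
    the currently active proximal input bits."""
    score = {c: sum(1 for b in active_bits if permanences[c].get(b, 0.0) >= THRESHOLD)
             for c in range(POOL_SIZE)}
    return sorted(score, key=score.get, reverse=True)[:K]

def learn(winners, active_bits, inc=0.1):
    """Reinforce proximal synapses from the winners toward every active bit,
    so bits belonging to both features become associated with the same cells."""
    for c in winners:
        for b in active_bits:
            if b in permanences[c]:
                permanences[c][b] = min(1.0, permanences[c][b] + inc)

# two features presented together -- either two fingers at once, or the same
# finger's input held over a two-timestep window
feature_1 = set(rng.sample(range(INPUT_BITS), 40))
feature_2 = set(rng.sample(range(INPUT_BITS), 40))
winners = pooled_cells(feature_1 | feature_2)
learn(winners, feature_1 | feature_2)
```

The only important property is that the winners receive proximal input from more than one feature at a time; whether that comes from multiple fingers or from a two-timestep window is just an implementation choice.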

Also note that the purpose of showing horizontal rows of many active cells in the first few timesteps was to highlight the fact that it is not the traditional minicolumn bursting. In practice, these cells would be randomized over the layer, but aligning them this way makes it easier to visualize what is going on.


Paul,

Did you ever publish your code for the pooling implementation?

I am modifying it, since my initial implementation was defective (due to conflicting logic with SP and minicolumn concepts). I’ll publish the code once I have made the necessary corrections.

I actually think seeing how it fails is very interesting and useful. Building these cognitive architectures requires a robust understanding of how the individual components work and don’t work.

I also did my own drawing of @dwrrehman's SMI architecture, but broke it down by functionality instead of neurogeography. For those of us who aren't neuroscientists, it helps to see how the pieces work together on a functional level. I'll post it here when I clean it up.


I may go back and finish the demo at some point (I have a hard time making myself finish working on obvious dead-ends, lol).

At a high level, the reason it failed is that I was assuming traditional SP was handling the proximal stream to the pooling layer. This meant that in order for pooling to occur, all the feature/location representations of the object needed significant overlapping bits, so that they activated a significant number of the same columns after SP.

In more “realistic” (presumably) scenarios without a lot of overlap between feature/location representations, you would end up with just a couple of active cells after inhibition, and usually with more than one representation for the object (each consisting of only a couple of cells). The demo I started was intended to highlight this deficiency.
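As a rough illustration of that deficiency (the numbers are assumed, and the random sets are just a stand-in for SP output, not my actual demo code): with independent feature/location SDRs, the set of columns shared by all of an object's features is essentially empty, so there is almost nothing for the pooled representation to settle on.

```python
import random

NUM_COLUMNS = 2048          # columns after SP (assumed)
ACTIVE_COLUMNS = 40         # roughly 2% sparsity (assumed)

def sp_output(seed):
    """Stand-in for SP output for one feature/location pair: a random set
    of active columns with no engineered overlap."""
    return set(random.Random(seed).sample(range(NUM_COLUMNS), ACTIVE_COLUMNS))

features = [sp_output(i) for i in range(5)]   # five feature/location pairs
shared = set.intersection(*features)
print(len(shared))   # almost always 0 -- nothing left to pool across all features
```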

Perhaps that significant overlap is not sufficiently provided by the proximal stream. It may require its associated inference layer to be constantly feeding predictions into the pooling layer, and it may use some of the other data streams.

I don't think feature/location pairs by themselves should have any overlap in general. The relationship comes from their associations through transformations of the object. The inference layer is providing that transformation based on egocentric features (which are motor commands) as well as the built-up model of the object in the pooling layer.

I think this gets into @dwrrehman’s “slow loop” and “fast loop”. It’s a pretty interesting concept and I’d love to see how they differ in practice. But Daniel’s proposed pooling layer implementation probably won’t work in isolation with simple proximal inputs. It needs extra sauce to create that overlap condition.

Hi, thanks, yes that helps, thanks so much… but I'm also greedy for more: do you have any neuroscientific references for these layers? If you have some, I can cross-reference them with my own understanding and model… rgds, finn

@fine2100 Take a look at the last 5 pages of the paper at the top of this thread for a categorized list of references.

thanks:-)

Good point – overlap in the inference layer's active cells could potentially be the result of other data streams going into the inference layer. Still, this particular restriction isn't an issue if you eliminate the SP, handle the proximal learning algorithm a different way, and ensure the representations being pooled contain at least two inputs at a time.

Thanks so much for the great visualization @Paul_Lamb!

I would say it is conceptually accurate to the pooling layer I described, except that it seems to show the features coming in as a sequence, when in reality, according to this pooling layer mechanism, that is not strictly necessary (although the pooling layer mechanism would still work that way).

On a related note, it seems the current discussion has been about how the pooling layer mechanism might not be able to simply use the proximal input at the current timestep; it might need something more, as @jacobeverist puts it, some “extra sauce” to make it work effectively. Oddly, from my understanding of the PL mechanism, I would say this is not the case; I have the idea that the PL mechanism can learn from merely a single proximal FF feature-location / feature-context pair input per timestep, i.e., a single set of active cells per timestep.

I think some slight confusion here occurs when we think about where the “intersection between the previous features and the current feature” is occurring. Paradoxically, it is not anything intuitive like an overlap score or taking the logical AND of the features; rather, it is more subtle, and less restrictive. I hypothesize the intersection is actually happening, although quite slowly, in the inhibition of cells, using the distal dendritic connections between the union of all activated cells on all features known about the object. This is somewhat paradoxical (and possibly confusing), as it says you need to make the union (using the growth of distal connections) and then you need to find the intersection of it (using CIUI; keep in mind that this won't create a perfect intersection, but rather one that simply favors the cells with the highest overlap with the other features in the set of all features on an object). But regardless of its initial un-intuitiveness, I think this is, at least conceptually, the process that is actually happening in the pooling layer.

But let's make sense of it. To break it down a little further, let's imagine an object, say a coffee cup, which has three features: the handle, the rim, and the bottom. Now let's also imagine a newly created layer 2/3a, which is known to be a PL (according to myself, and Numenta's work). Also, for the sake of example, let's say the distal dendritic learning rate of the layer is super high/aggressive, so that it can learn a fully connected distal connection (or multiple) in a single timestep.

{t = 1}: First, let's imagine that a finger on the robot feels the rim of the coffee cup. The pattern representing the rim of the coffee cup (a feature-location pair, whose details are irrelevant) excites cells named cell_A, cell_B, cell_C, cell_D, cell_E, and cell_F in the PL.

(The exact configuration of these cells is irrelevant, but the cell names are relevant, as they uniquely identify each cell in 2/3a.)

(Also notice that there seem to be quite a few cells active. This is because the cells in the inference layer that give 2/3a its input, let's say layer 4, don't just synapse onto one cell in 2/3a and that's it. Just like how any thalamic axon synapses onto many cells in layer 4, the axons from cells in layer 4 synapse onto many cells in layer 2/3a. … That's why cells A through F are all active from a single (possibly quite sparse) FF input from layer 4. This will be relevant later on.)

Now that we have a set of proximally active cells in layer 2/3a, let's learn. Imagine cell_A immediately growing a set of dendrites that makes connections with cells B, C, D, E, and F. Now imagine that this same process repeats for all the other cells, so cell_B does the same and grows distal connections with cells A, C, D, E, and F; cell_C connects with A, B, D, E, and F; …and so on.

If you imagine all of these connections at once (assuming an extremely high learning rate and very little decay of distal dendrites over time), you get essentially a fully connected “self-associative” graph of cells that all have distal connections amongst themselves.

{t = 2}: Everything described until now has been about {t = 1}, or very shortly after {t = 1} but before {t = 2}. Now, at {t = 2}, imagine a new FF sensory input about the cup coming from layer 4 into layer 2/3a; let's say we are sensing the handle now. A new set of largely unrelated cells is active; let's call them cell_C, cell_F, cell_G, cell_H, cell_J, and cell_K.

We can see now that, compared to our previous feature, there are actually very few cells in common between them. This is OK, as you will see later.

Now let's do our distal dendritic growth thingy we did last timestep. Imagine that cell_C grows connections with cell_F, cell_G, … and so on, for every single cell. Now, if we skip ahead to after these new growths have been made, let's look at the resultant connections of the layer:

cell_A: 5 connections: (B, C, D, E, F)
cell_B: 5 connections: (A, C, D, E, F)
cell_C: 9 connections: (A, B, D, E, F, G, H, J, K)
cell_D: 5 connections: (A, B, C, E, F)
cell_E: 5 connections: (A, B, C, D, F)
cell_F: 9 connections: (A, B, C, D, E, G, H, J, K)
cell_G: 5 connections: (C, F, H, J, K)
cell_H: 5 connections: (C, F, G, J, K)
cell_J: 5 connections: (C, F, G, H, K)
cell_K: 5 connections: (C, F, G, H, J)

So now, finally, using CIUI (“competitive ion uptake inhibition”), we can imagine that cell_C and cell_F would definitely be in the current representation of the object. But something probably less noticeable is that these might not be the only cells in the representation on the next timestep, because cells C and F don't have that much of an edge over all the other cells. This is why it is important to have a decent number of different feature-locations on an object that is trying to be modeled (I don't know what that number is, though).

The important takeaway from this, however, is that the whole act of "taking the intersection of all the feature-locations known about an object, and using that as your representation of the object" leaves out a critical idea: for a low number of features, this intersection will not happen so cleanly, because with CIUI a cell needs a large edge over its neighbors before it can start inhibiting them from appearing in the final representation of the object. This is done by creating a web, or union, of distal connections between all cells in every feature, and using that union to determine which cells will form the final representation.
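Here is a minimal sketch of that union-then-intersection process, under the same simplifying assumptions as the walkthrough above (a tiny layer, an instantaneous learning rate, and a "most distal connections wins" rule standing in for CIUI, which is only my rough approximation of it):

```python
from itertools import combinations
from collections import defaultdict

# cells proximally activated by each feature of the coffee cup
rim_cells    = {"A", "B", "C", "D", "E", "F"}
handle_cells = {"C", "F", "G", "H", "J", "K"}

distal = defaultdict(set)   # cell -> cells it has grown distal connections to

def grow_distal(active_cells):
    """Aggressive learning: every active cell connects to every other active
    cell, building the 'self-associative' web described above."""
    for a, b in combinations(active_cells, 2):
        distal[a].add(b)
        distal[b].add(a)

grow_distal(rim_cells)      # t = 1
grow_distal(handle_cells)   # t = 2

# rank cells by how many distal connections they have accumulated across the
# union of all features; a competitive inhibition step would favor the top ones
counts = {cell: len(links) for cell, links in distal.items()}
for cell in sorted(counts, key=counts.get, reverse=True):
    print(cell, counts[cell])
```

Running it reproduces the counts in the list above: cells C and F end up with 9 connections each versus 5 for everyone else, which is exactly the modest edge over their neighbors discussed above.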

I hope that all made sense! I know it's a lot to take in.

MAJOR EDIT:

After writing that, I came to the realization that the method I described earlier might not work as effectively as it should. The one little thing we need to add, which I have thought about for a while, and which is actually what @Paul_Lamb noted in his experimentation with the mechanism, is that we need to connect to the cells in the previous timestep as well. …I think. I haven't thought through this possibility too much yet, but I suspect it might work.

I will be determining whether a PL needs to connect to:

  • both {t} currently active cells, AND {t-1} previously active cells,
    or
  • just {t} currently active cells.

In order to determine which is the correct mechanism for a PL, I will be coding both options (the change is super simple in my current code) and seeing which one produces a more static result.
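For what it's worth, the difference between the two options can be captured by a single flag; this is only a guess at the shape of that change (the names and structure are illustrative, not my actual code):

```python
def grow_distal(distal, active_now, active_prev, include_previous):
    """Grow distal connections from every currently active cell.

    include_previous=False -> option 2: connect only among {t} active cells.
    include_previous=True  -> option 1: also connect to {t-1} active cells.
    """
    targets = set(active_now)
    if include_previous:
        targets |= set(active_prev)
    for cell in active_now:
        distal.setdefault(cell, set()).update(targets - {cell})
```

Comparing how quickly the set of most-connected cells stops changing under each setting would be one way to judge which produces the more static result.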


Yes, that was just intended to make it easier to understand what was happening in the logic. There isn't any requirement for a particular order of feature inputs, or for some of the features not changing while another does (all three fingers in this scenario could be randomly exploring the object simultaneously).

Ahh, yes, that explains why you diagrammed it that way, then. Just making sure. :stuck_out_tongue:

I specifically am considering this in order to form an association between two inputs (in the case where we don’t have multiple “fingers” contributing to the proximal data stream). This should generate an overlapping representation in the pooling layer that contains elements of the two inputs. Then the learning algorithm for the proximal synapses should cause the cells in the pooling layer associated with each of the two inputs to align a little bit better with the opposite input. Over time (in theory) this should converge on a single representation for the object.

Note that I haven’t gotten that far on the implementation yet, so that is just a theory at this point.
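Since I haven't implemented it yet, here is only a toy sketch of what that convergence might look like, with all of the parameters (layer sizes, sparsity, permanence increments) invented for illustration: pool two inputs together, reinforce the winners' proximal synapses toward both, and watch the overlap between each input's individual pooled representation grow.

```python
import random

rng = random.Random(42)
INPUT_BITS, POOL, K = 1024, 256, 20                 # all assumed sizes

# proximal permanences, one value per (pooling cell, input bit) pair;
# roughly 1 in 6 synapses starts out connected (permanence >= 0.5)
perm = [[rng.random() * 0.6 for _ in range(INPUT_BITS)] for _ in range(POOL)]

feature_a = set(rng.sample(range(INPUT_BITS), 40))
feature_b = set(rng.sample(range(INPUT_BITS), 40))

def winners(bits):
    """Top-K pooling cells by number of connected synapses onto active bits."""
    score = {c: sum(perm[c][b] >= 0.5 for b in bits) for c in range(POOL)}
    return set(sorted(score, key=score.get, reverse=True)[:K])

print(len(winners(feature_a) & winners(feature_b)))   # low overlap at first

for step in range(50):
    # present both inputs at once and reinforce the winners toward both
    for c in winners(feature_a | feature_b):
        for b in feature_a | feature_b:
            perm[c][b] = min(1.0, perm[c][b] + 0.05)

# the two inputs' individual pooled representations should now be nearly
# identical, i.e. converging on a single representation for the object
print(len(winners(feature_a) & winners(feature_b)))
```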

That would have a nice symmetry to it if that was the case. The inference layer projects to the future and the pooling layer connects to the past.


I think the “extra sauce” in this case is that the input needs to satisfy certain properties for it to work correctly. Namely:

  1. The proximal synapses should activate a sufficient number of neurons such that nearly all possible inputs have some neurons in common.

  2. A sufficient number of inputs should be received before the pooling representation stabilizes.

For (1), if two features don't have any active neurons in common, they can never be associated together. To ensure intersections, you would probably need to increase the activation rate to well above 2%. Perhaps a more likely mechanism would be a kind of boosting that is constantly trying to make connections to previously active neurons that are not related to the current input: an inverse boosting, where you try to make new synaptic connections to highly active neurons instead of to the inactive ones.
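A very speculative sketch of what that inverse boosting could look like (the function name, parameters, and the idea of tracking recently active input bits are all invented for illustration, not an existing SP feature):

```python
def inverse_boost(permanences, current_winners, recently_active_bits,
                  new_perm=0.3, threshold=0.5):
    """Grow new proximal synapses from the current winners toward input bits
    that have been highly active recently, even if those bits are unrelated
    to the current input, so that distinct features come to share neurons."""
    for cell in current_winners:
        for bit in recently_active_bits:
            if permanences[cell].get(bit, 0.0) < threshold:
                # form a new (near-connected) synapse rather than boosting
                # starved columns, as normal SP boosting would
                permanences[cell][bit] = max(permanences[cell].get(bit, 0.0), new_perm)
```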

But maybe condition 1 is not required? Perhaps this loosely associated, distally connected network can elect a few leaders to take over the representation even when the component features don't all have common neurons. You just select the most intersected neurons.

Maybe that’s what @dwrrehman has been talking about, when he refers to “union of distal connections”?


I’m not sure about that… as long as in a single time step you can activate cells in the pooling layer which are connected with two separate features (either by multiple parallel inference layers or by taking more than one time step as input) you should be able to apply the learning algorithm to their proximal synapses and thus increase the likelihood of overlap between those two features in the pooling layer. As all of the features begin to have more and more overlap through multiple iterations of this process, eventually a single representation should emerge.

This is one element of the logic that I don’t quite grasp, I think. I may be contemplating a somewhat different pooling strategy than what he is imagining.

Will you also be publishing pseudo-code with it? I’m really interested in the temporal pooling theory behind it and I find pseudo-code helps best. Really cool stuff and eager to get my mind fully around it.

I'll give it a shot. I'm not sure of the proper way to write pseudo-code, though, so it will probably be somewhat JavaScript-y.


It looks like many people here (including me) have difficulty forming a holistic understanding of your theory.
Could you provide a short, simplified example of its application to a reduced case: visual recognition of simple 2D shapes, with a focus on invariance?
I believe it would help the community a lot.
