In the recent live stream, Jeff introduced a big idea: the initial function performed on input coming into a cortical column is to pool that input with an orientation signal. In the discussion, he concluded that the output of this function would be equivalent to place cells.
This theory makes a lot of sense to me, and it fills some gaps in the earlier models, such as sensor orientation and scaling. Where I got a little stuck, however, is what happens after that initial function. In the whiteboard diagram, Jeff depicts this place-cell output being used as the input to a layer that gets its context from grid cells representing location. This seems redundant to me.
Grid cells represent a specific point on a specific object, but isn’t that the same information as the output of the first function (which is also a representation of a specific point on a specific object)? It seems like the output of the first function is the location, and thus another layer of grid cells to depict location should not be necessary. Additionally, I really wasn’t able to identify on the whiteboard which of the layers would hold a stable representation of the object.
Just from the lines on the whiteboard (not being a neuroscientist, this could be a stupid conclusion), it would make sense to me if L5 were performing a pooling function, and its output depicted stable object representations. This would explain the lateral connections with L5 in other columns (for voting) and the projections through the thalamus to another region (supporting composite objects and hierarchy). From this perspective, the output of L2/3 could be interpreted as a stable representation of the object from one position, and the output of L5 as a stable representation of that object from all positions.
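To make that guess a bit more concrete, here is a toy Python sketch of what I mean. It is purely my own illustration, not anything from the whiteboard or from Numenta's code: the function names `l23_representation` and `l5_pool`, the cell counts, and the idea of pooling by a simple union are all assumptions I made up for the example.

```python
import random


def l23_representation(obj, location, n_cells=1024, n_active=20):
    """Hypothetical L2/3 output: a sparse set of active cells that is
    specific to one location on one object (assumption, toy model)."""
    # Seeding with the (object, location) pair makes the code deterministic.
    rng = random.Random(f"{obj}|{location}")
    return frozenset(rng.sample(range(n_cells), n_active))


def l5_pool(obj, locations):
    """Hypothetical L5 output: the union of the L2/3 codes over every
    location on the object, so it does not depend on the current location."""
    pooled = set()
    for loc in locations:
        pooled |= l23_representation(obj, loc)
    return frozenset(pooled)


locations = ["handle", "rim", "base"]
cup = l5_pool("cup", locations)

# Each location-specific code is contained in the pooled object code.
for loc in locations:
    assert l23_representation("cup", loc) <= cup
```

In this sketch the L2/3 code changes as the sensor moves from the handle to the rim, while the pooled L5 code stays the same, which is the kind of stability I would expect from a layer that votes with other columns.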
But this still doesn’t explain what is happening in L6B, the layer labeled “location” on the whiteboard. Why would L5 need input from L2/3 representing a specific location on a specific object, and also from L6B representing a specific location on a specific object? Why is this second layer needed, when it seems redundant? Is this related to displacement and composite objects?
Anyway, I’m still mulling over the ideas. Figured I would post my initial thoughts and have some discussions with the smart folks in the community here to try and smooth out the rough areas.