Suggestion for why the output layers (probably) have minicolumns

Each object exhibits particular sequences, so perhaps those possible sequences should be included in the object representation somehow.

Since L5 TT cells receive direct sensory input and form minicolumns, perhaps a subset of cells in each minicolumn belongs to each object, and those cells respond to all sequences which make sense for the object. That way, the layer can use sequences to help recognize the object, and potentially generate behavioral sequences based on the object.

Each object could be assigned a random set of cells in each minicolumn, or the object context could be formed some other way, but the layer must track the sequence independently for each possible object. How the sequence progresses depends on the object, even if the same opening part of the sequence happens to occur on two different objects, which is likely because many objects share features like flat surfaces. As the layer narrows down the possible objects, it also eliminates possible sequences.

To track sequences independently for each possible object, predictive connections can be limited to cells which belong to the same object.
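Here is a rough sketch of that idea in Python, just to make it concrete. Everything in it is my own simplification, not a claim about the biology: a "minicolumn" is just a feature label, each object gets exactly one cell per minicolumn (instead of a random subset), transitions are first-order only, and the names ObjectSequenceLayer, learn, and infer are made up for illustration.

```python
from collections import defaultdict


class ObjectSequenceLayer:
    """Toy model: a cell is a (feature, object) pair, i.e. the cell that
    object owns in the minicolumn for that feature (hypothetical names)."""

    def __init__(self):
        # connections[cell] = cells that `cell` predicts next.
        # Predictive connections only ever link cells of the same object.
        self.connections = defaultdict(set)
        # members[feature] = objects that own a cell in that minicolumn.
        self.members = defaultdict(set)

    def learn(self, obj, sequence):
        """Store one sequence of features for one object."""
        for feature in sequence:
            self.members[feature].add(obj)
        for prev, nxt in zip(sequence, sequence[1:]):
            self.connections[(prev, obj)].add((nxt, obj))

    def infer(self, sequence):
        """Return the objects still consistent with the whole sequence."""
        active = None
        for feature in sequence:
            # Every cell in the minicolumn that receives this input.
            column = {(feature, obj) for obj in self.members.get(feature, ())}
            if active is None:
                # First input: all objects sharing the feature stay possible.
                active = column
            else:
                # Keep only cells predicted by an active cell of the same object.
                predicted = set()
                for cell in active:
                    predicted |= self.connections.get(cell, set())
                active = column & predicted
        if active is None:
            return set()
        return {obj for _, obj in active}


layer = ObjectSequenceLayer()
layer.learn("cup", ["flat", "edge", "curve"])
layer.learn("box", ["flat", "edge", "corner"])

print(sorted(layer.infer(["flat", "edge"])))           # ['box', 'cup']
print(sorted(layer.infer(["flat", "edge", "curve"])))  # ['cup']
```

The shared start of the sequence (flat, then edge) keeps both objects alive, and the sequence is tracked separately for each because a cup cell can never predict a box cell. Once an input arrives that only one object's cells predicted, the other object and its sequences drop out together.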

When you say sequences, what do you mean? Do you mean behaviors? Or do you mean sequences of sensations at locations?

I guess I mean both. Whether the fingertip changes an object, the object changes on its own, or the fingertip touches an object without changing it, sequences of sensations and self-movements (or related things like locations) can help narrow down the possible objects.

I’m not sure there’s another way to recognize most objects, even static ones. The skin distorts differently depending on how the fingertip moves across or onto a surface, that is, on the sequence of movements, so orientation and location are not enough to recognize features independently of how they are contacted. The distortion depends on the surface texture and shape, too. If the layer can at first recognize an object in only certain cases, it can start to pool sequences into objects and then recognize the object in more and more situations.
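A minimal sketch of the pooling step, under assumptions I'm adding myself: a sequence is stored whole as a tuple, a new sequence gets pooled only if it is observed right after a sequence that already identifies exactly one object, and SequencePooler with its methods is a made-up name for illustration.

```python
class SequencePooler:
    """Toy pooling of sequences into objects (hypothetical, not a real API)."""

    def __init__(self):
        # known[obj] = set of sequences (as tuples) pooled into that object.
        self.known = {}

    def learn(self, obj, sequence):
        """Pool one sequence into an object."""
        self.known.setdefault(obj, set()).add(tuple(sequence))

    def recognize(self, sequence):
        """Objects that already have this exact sequence pooled."""
        seq = tuple(sequence)
        return {obj for obj, seqs in self.known.items() if seq in seqs}

    def observe(self, familiar, novel):
        """If `familiar` identifies exactly one object, pool `novel` into it.

        Assumes the novel sequence came from the same object as the
        familiar one, e.g. because it followed immediately afterwards.
        """
        candidates = self.recognize(familiar)
        if len(candidates) != 1:
            return None  # unknown or ambiguous: nothing safe to pool into
        obj = next(iter(candidates))
        self.learn(obj, novel)
        return obj


pooler = SequencePooler()
pooler.learn("river", ["trickle", "splash"])
pooler.observe(["trickle", "splash"], ["gurgle", "rush"])
print(pooler.recognize(["gurgle", "rush"]))  # {'river'}
```

The object starts out recognizable from one sequence only, and each sequence pooled while it is recognized becomes another way to recognize it later.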

Layer 5 is both an output layer and the motor output, so I assume it represents objects, sequences of sensations, and sensorimotor sequences in some combined way. By doing so, it can produce sequences of motor outputs which fit the object and, for example, pool the various sequences of sounds a river makes into an object representation.