From the papers and other materials I have read or listened to, the voting mechanism across cortical columns seems to presuppose that different columns may be detecting different features of the sensory stream. Is there some idea of how localized these feature detectors are? Are some columns dedicated mostly to edges and others to colours? Or are these questions unanswerable at the current stage of research?
A priori I would expect a strong degree of localization, to avoid too much redundancy (although for sure a little is good).
In a similar vein, I also wondered whether we might suppose that certain columns are much better at recognising certain whole objects than others.
Definitely, though I would argue any sane learning system is equipped with some kind of locality prior. Without good locality, the computational and memory cost, as well as the amount of data required, would grow exponentially (and with no locality at all, learning becomes impossible).
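To put rough numbers on the exponential growth, here is a toy counting sketch. It assumes binary inputs and a detector that has to tell apart every possible input pattern, which is just one illustrative way to read the claim; the function names are made up for this example.

```python
def patterns_global(n_inputs: int) -> int:
    """Distinct binary patterns a single detector sees if it looks at all inputs at once."""
    return 2 ** n_inputs


def patterns_local(n_inputs: int, patch_size: int) -> int:
    """Total patterns if the inputs are split into independent local patches.

    Assumes n_inputs is a multiple of patch_size (any remainder is ignored).
    """
    n_patches = n_inputs // patch_size
    return n_patches * 2 ** patch_size


# 64 binary inputs: one global detector vs. eight local patches of 8 inputs each.
GLOBAL_COUNT = patterns_global(64)
LOCAL_COUNT = patterns_local(64, 8)
```

With 64 inputs the global count is 2^64, while eight patches of 8 inputs need only 8 × 256 = 2048 patterns in total.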
But for TBT, it’s more nuanced than that. Cortical columns are not just feature detectors: they treat a feature as an object (or a part of an object) and know how to manipulate it. Conversely, they can recognise particular configurations (e.g. rotations) of an object. The machine learning and math folks call this property equivariance.
Plus, they are holistic and thus deal with compositional hierarchies. One key difference from naive locality is that a column does not have to directly learn each particular configuration of a composite object. The configurations of the component objects translate directly to the configuration of the composite object and vice versa, i.e. it doesn’t take a lot to learn a novel variation of a known object. It just has to learn how the variation differs from the known object. As an example, if you know how a coffee cup looks in every possible orientation, it is trivial to imagine how a cup with a logo on it rotates. You just have to learn the relative pose of the logo in relation to the cup.
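The cup-and-logo example can be sketched as pose composition. This is a minimal 2D illustration, not how columns actually implement it: `Pose2D`, `compose`, and the logo offset are hypothetical names and numbers chosen for the example.

```python
import math
from typing import NamedTuple


class Pose2D(NamedTuple):
    theta: float  # rotation in radians
    x: float      # translation
    y: float


def compose(parent: Pose2D, child: Pose2D) -> Pose2D:
    """Express the child's pose in the frame the parent's pose is given in."""
    c, s = math.cos(parent.theta), math.sin(parent.theta)
    return Pose2D(
        parent.theta + child.theta,
        parent.x + c * child.x - s * child.y,
        parent.y + s * child.x + c * child.y,
    )


# Learned once: where the logo sits relative to the cup (made-up numbers).
LOGO_IN_CUP = Pose2D(theta=0.0, x=1.0, y=0.0)


def logo_in_world(cup_in_world: Pose2D) -> Pose2D:
    """The logo's pose follows from the cup's pose; no new learning per orientation."""
    return compose(cup_in_world, LOGO_IN_CUP)


# Rotate the cup by a quarter turn: the logo moves from (1, 0) to roughly (0, 1).
rotated_logo = logo_in_world(Pose2D(theta=math.pi / 2, x=0.0, y=0.0))
```

Only `LOGO_IN_CUP` is specific to the new object. Everything already known about how the cup moves carries over to the cup with a logo.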
Very likely. Though as there’s so much ambiguity at large scales, the columns at the lower levels would have to help them pin it down to a specific object, which in turn enables fast voting between the lower-level columns (like a shortcut).
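As a rough picture of what voting and the shortcut could look like, here is a toy sketch. It assumes each column simply holds a set of candidate labels; `vote` and `narrow` are hypothetical helpers for illustration, not the mechanism described in the TBT papers.

```python
from collections import Counter


def vote(column_hypotheses):
    """Return the candidates supported by the largest number of columns."""
    tally = Counter()
    for hyps in column_hypotheses:
        tally.update(set(hyps))  # each column counts once per candidate
    if not tally:
        return set()
    top = max(tally.values())
    return {obj for obj, n in tally.items() if n == top}


def narrow(column_hypotheses, allowed):
    """Top-down shortcut: drop candidates that a higher-level guess rules out."""
    return [set(hyps) & set(allowed) for hyps in column_hypotheses]


# Three columns, each ambiguous on its own, agree once their candidates are pooled.
columns = [{"cup", "bowl"}, {"cup", "can"}, {"cup", "bowl", "can"}]
winner = vote(columns)  # {"cup"}

# A higher-level guess prunes what the lower-level columns still need to consider.
pruned = narrow([{"handle", "rim", "wheel"}, {"rim", "wheel"}], {"handle", "rim"})
```

Pruning first means each lower-level column has fewer candidates left to vote on, which is one way to read the "shortcut".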