Clustering operator as a model of a cortical column

“6) Cortical columns have the capability to incrementally learn from their inputs, storing internal
representations that can generalize from few exemplars to useful invariants. For instance, each
column of the Inferior Temporal (IT) visual region responds to different orientations or partial views
of specific animate or inanimate objects (e.g., faces) Tanaka (1996). Each column is highly selective
(e.g., it responds only to faces) but it also has strong generalization abilities (e.g., it responds to
the same face regardless of rotation, lighting, or occlusion). In other words, it appears that a cortical
column stores related “prototypes”, and exemplars that are similar to a prototype are recognized
by that column Kiani et al. (2007); Kriegeskorte et al. (2008). From the computational perspective,
this is essentially an online clustering operation: an input vector is mapped to its nearest cluster
centroid (according to some distance metric). Additionally, the centroid of the chosen cluster is
adjusted incrementally with every new exemplar so that it moves a bit closer to that input vector;
this is how an online clustering module gradually learns the structure of the input data.
7) An online clustering algorithm that is similar to k-means (and asymptotically equivalent to
k-means) can be implemented with a rather simple recurrent neural network of excitatory and
inhibitory spiking neurons, as shown recently Pehlevan et al. (2017). That circuit models the olfactory
system in Drosophila, but similar recurrent E/I circuits are also present in cortical columns.
In summary, a STAM module integrates the following computational functions: intrinsic dimensionality reduction through a multi-layer feedforward hierarchy, online clustering and associative
memory formation (i.e., detecting new patterns, learning and updating centroids based on those patterns, and forgetting outlier patterns), and generating top-down predictions. The STAM architecture
is described in more detail next.”
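The online clustering operation described in point 6 of the quoted passage reduces to a few lines of code. The sketch below is my own minimal illustration, not taken from the paper; the function name and the choice to seed centroids from the first k inputs are assumptions:

```python
import numpy as np

def online_cluster(stream, k, lr=0.1):
    """Map each input to its nearest centroid, then nudge that centroid
    a little toward the input (the incremental update from point 6)."""
    centroids = []
    for x in stream:
        x = np.asarray(x, dtype=float)
        if len(centroids) < k:
            centroids.append(x.copy())   # seed centroids with the first k inputs
            continue
        dists = np.linalg.norm(np.stack(centroids) - x, axis=1)
        j = int(np.argmin(dists))        # nearest cluster centroid
        centroids[j] += lr * (x - centroids[j])  # move it a bit closer
    return np.stack(centroids)
```

With a small learning rate, each centroid becomes a slowly moving prototype of the exemplars assigned to it, which is the sense in which the column "stores prototypes".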


If you have two vector-to-vector associative memory systems that have been randomly initialized in different ways, then on some trigger you can get them both to learn the same thing. When recalling, you can look at the difference (or angular distance) between the two outputs. If it is zero, you can be very sure the output is correct and is recalling something seen before. However, in higher dimensions even an angular distance of, say, 45 degrees would be very close. So you can be sure you have seen something before even if you cannot exactly recall it - if you know the feeling.
If you don’t use a trigger and just use a low learning rate with the two associative memories, you can probably learn clusters just by storing the data online in the memory systems.
Again, if you use a low learning rate with randomly initialized associative memories, you can get them to learn each other’s output, given an online stream of input data. I guess that would cluster too, giving unsupervised feature learning.
I don’t know if there is any evidence of pairwise memory in biological systems.
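A minimal sketch of the two-memory familiarity check described above. The class, the error-driven outer-product store, and all parameter choices are my own assumptions, just to make the idea concrete:

```python
import numpy as np

class LinearAssocMem:
    """Toy vector-to-vector associative memory: recall is a linear readout,
    store() is an error-driven outer-product update on the weights."""
    def __init__(self, d_in, d_out, seed):
        self.W = np.random.default_rng(seed).normal(size=(d_out, d_in))

    def store(self, key, value):
        k = key / np.linalg.norm(key)
        self.W += np.outer(value - self.W @ k, k)  # after this, recall(key) == value

    def recall(self, key):
        return self.W @ (key / np.linalg.norm(key))

def angular_distance_deg(a, b):
    """Angle in degrees between two recall vectors."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
```

Two memories with different seeds, both triggered to store the same pair, agree (near-zero angle) when recalling a stored key but disagree strongly on a novel key, so the angle acts as a familiarity signal.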

Not sure if I got you right, but why does it have to be pairwise? Clustering is many-to-one, and this model, as well as the cortex, performs hierarchical clustering.

Pairwise is the simplest case.

A quite subtle case occurs when you have a single randomly initialized associative memory whose under-capacity behavior gives some error correction.
Okay, you supply an input and get a random output. What can you do with that if you are only given an input and no target? You can add some (a little) noise to the output and train the associative memory on that. Over time, that will hammer that response into the associative memory, giving error correction for the response.
Without any noise, the associative memory would remain unchanged and there would be no pull/error correction toward the response. You end up quantizing to certain output responses, which I suppose you may regard as clustering.
You could imagine ways of training deep networks with that idea. I have no clue how well or badly that would work out for you.
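A toy version of that noise-driven self-training, using a binarized (sign) readout so the response actually gets locked in rather than drifting. The update rule and every parameter here are my guesses at one way to realize the idea, not anything established:

```python
import numpy as np

def response(W, x):
    """Binarized recall: the sign of the memory's linear readout."""
    return np.sign(W @ x)

def self_train(W, x, steps=100, lr=0.2, flip_p=0.05, rng=None):
    """Recall the memory's own response to x, flip a few output bits as
    'a little noise', and train (error-driven) toward that noisy target.
    Over time this hammers the response in: pre-activations saturate
    toward +/-1 and the output stops drifting."""
    rng = rng if rng is not None else np.random.default_rng(0)
    xh = x / np.linalg.norm(x)
    for _ in range(steps):
        y = response(W, xh)
        flips = rng.random(y.shape[0]) < flip_p
        target = np.where(flips, -y, y)          # noisy copy of the response
        W += lr * np.outer(target - W @ xh, xh)  # pull the readout toward it
    return W
```

After training, a slightly corrupted input quantizes to the same hammered-in response as the original, which is the clustering-like behavior described above.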

Thanks, but I personally have no use for noise and neural nets. My model is self-contained hierarchical clustering, based on a likewise hierarchically extended set of encapsulated parameters. Any variation is driven by feedback from higher levels, which represent generalized older inputs: