Gradient Clusteron Visualization

I’ve implemented a visualization of the Gradient Clusteron algorithm described by Toviah Moldwin in this video and this paper. Please let me know if you have questions, comments, or suggestions on how to improve the demo.

In this example, we establish a 1D domain with four possible input
features at each location (e.g. one of four base pairs). The top line
is the input domain. On this line, you will see three sensor patches
moving randomly over the domain. Each sensor patch has five sensors
and each sensor has four detectors that distinguish whether a feature
is present or not under each sensor.

The three lines below show a target pattern and current input to each
of the three sensor patches. For each sensor patch, there are 20
pre-synaptic neurons (not shown), and of these, five will be active
during each cycle (corresponding to the currently active detector on
each sensor).
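
As a rough illustration of that encoding, here is a minimal sketch of how one sensor patch could be turned into the 20 pre-synaptic inputs. The feature labels `ACGT`, the function name `encode_patch`, and the ordering of detectors within the vector are my assumptions for illustration, not necessarily what the demo does.

```python
FEATURES = "ACGT"  # assumed labels for the four possible input features

def encode_patch(window):
    """Map a 5-symbol window to a 20-element binary vector.

    Each sensor contributes four detectors, exactly one of which is active.
    """
    x = []
    for symbol in window:
        one_hot = [0] * len(FEATURES)
        one_hot[FEATURES.index(symbol)] = 1
        x.extend(one_hot)
    return x

x = encode_patch("GATTA")
print(len(x), sum(x))  # 20 inputs, 5 of them active
```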

The graphs shown below are visualizations of dendrites for three
post-synaptic neurons. Each dendrite has 20 synapses attached to the
20 pre-synaptic neurons corresponding to the detectors within each
sensor patch. Every synapse has an associated weight and location on
the dendrite. The synaptic weights are indicated by the vertical bars
along the dendrite, and location is represented by its position along
the x-axis.

Activated synapses will generate a localized effect along the dendrite
that decays with distance from the synapse location. This effect is
indicated by the Gaussian bump centered on each active synapse.
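
A minimal sketch of how such bumps could be computed, assuming a dendrite normalized to the interval [0, 1] and a kernel of the form exp(-d^2/sigma). The function names, the default `sigma`, and the sampling resolution are illustrative assumptions rather than the demo's actual settings.

```python
import math

def bump(position, synapse_location, weight, sigma=0.01):
    """Effect of one active synapse at a given position on the dendrite."""
    return weight * math.exp(-(position - synapse_location) ** 2 / sigma)

def dendrite_profile(active_synapses, n_points=101, sigma=0.01):
    """Sum the bumps of all active synapses at evenly spaced positions in [0, 1].

    active_synapses is a list of (location, weight) pairs.
    """
    positions = [i / (n_points - 1) for i in range(n_points)]
    return [
        sum(bump(p, loc, w, sigma) for loc, w in active_synapses)
        for p in positions
    ]

profile = dendrite_profile([(0.30, 0.8), (0.35, 0.6), (0.90, 0.5)])
print(max(profile))
```

With these values, the two synapses near 0.3 overlap and add up, while the one at 0.9 stands alone.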

The dendrite activation is depicted by the thicker plot line, and the
total activation is integrated into the bar on the far right. The
white horizontal line on this bar is the post-synaptic neuron firing
threshold. These plots take on different colors depending on the
current state: green for successful detection of the target pattern
(true-positive), gray for successful non-detection of the pattern
(true-negative), red for detection of the pattern when not present
(false-positive), and cyan for failure to detect the target pattern
when present (false-negative).
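
The color scheme amounts to a small lookup on two booleans. Here is a sketch, assuming the neuron counts as firing when the total activation reaches the threshold (whether the comparison is `>=` or `>` in the demo is an assumption); `classify` is a hypothetical helper name.

```python
def classify(total_activation, threshold, target_present):
    """Return (state, color) for one post-synaptic neuron on one cycle."""
    fired = total_activation >= threshold  # assumed firing condition
    if fired and target_present:
        return "true-positive", "green"
    if not fired and not target_present:
        return "true-negative", "gray"
    if fired and not target_present:
        return "false-positive", "red"
    return "false-negative", "cyan"

print(classify(3.2, 2.5, True))   # ('true-positive', 'green')
print(classify(1.0, 2.5, True))   # ('false-negative', 'cyan')
```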


Can you summarize what a Gradient Clusteron is, what’s interesting about it and its significance?

Here is a shorter presentation by Toviah Moldwin, the author of the paper.


I become highly suspicious of peculiar terminology, e.g., “clusteron”, but that is just me.

Without looking at any of it, the first thing that came to mind was Edelman’s Brain-Based Design (BBD) and Neural Darwinism, things that really need attention.

It seems to be a neuron model that does clustering on its synapses.
But yeah, I get turboencabulator vibes from most terms used in machine learning/computer science.

From my perspective, the gradient clusteron algorithm is an interesting variation on the classic Hebbian learning rule. Rather than having only a single scalar weight controlling the strength of the signal transmitted from a pre-synaptic neuron to a post-synaptic neuron, the clusteron also includes a distance-weighting term. Synapses that are close together produce a non-linear response in the dendrite: the response gets stronger as they get closer together, and can fall to nearly zero if they are too far apart.

h(x) = sum(a[i], i=1…N) - b
a[i] = w[i]*x[i]*sum(w[j]*x[j]*F[i,j], j=1…N)
F[i,j](r[i], r[j]) = exp(-(r[i]-r[j])^2/sigma)

Here h(x) is the dendrite activation, a[i] are the non-linear contributions of each synapse to the dendrite activation, and F[i,j] are the distance weighting kernels.
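
For anyone who wants to play with these formulas, here is a direct, unoptimized transcription into plain Python. The function names and the default values for `sigma` and `b` are placeholders I picked for illustration.

```python
import math

def kernel(r_i, r_j, sigma):
    """Distance weighting F[i,j] between two synapse locations."""
    return math.exp(-(r_i - r_j) ** 2 / sigma)

def synapse_contributions(w, x, r, sigma):
    """a[i] = w[i]*x[i] * sum_j w[j]*x[j]*F[i,j] for every synapse i."""
    n = len(w)
    return [
        w[i] * x[i] * sum(w[j] * x[j] * kernel(r[i], r[j], sigma) for j in range(n))
        for i in range(n)
    ]

def dendrite_activation(w, x, r, sigma=0.01, b=0.0):
    """h(x) = sum_i a[i] - b."""
    return sum(synapse_contributions(w, x, r, sigma)) - b

# Two active unit-weight synapses: co-located versus far apart.
print(dendrite_activation([1.0, 1.0], [1, 1], [0.5, 0.5]))  # 4.0
print(dendrite_activation([1.0, 1.0], [1, 1], [0.0, 1.0]))  # just above 2.0
```

The co-located pair gives 4 because each synapse sees both itself and its neighbor at full strength. The separated pair gives roughly 2 because only the self-terms (F[i,i] = 1) survive.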

On my initial reading of the paper, I thought the distance weighting would allow a pair of synapses to produce a stronger response than either one could generate on its own. That intuition turned out to be both right and wrong. If the input activations are restricted to -1 and 1 (or 0 and 1) and the weights are initialized between -1 and 1 (or 0 and 1), each pairwise term is a product of two weights, two activations, and a distance kernel that is also bounded between 0 and 1, so its magnitude can never exceed that of an individual weighted activation. However, because of the sum in the a[i] term, several of these pairwise effects can add up to an activation contribution in excess of the individual weighted activation (w[i]*x[i]).
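
A small worked example of that argument, using made-up numbers: five active synapses at the same location (so every kernel value is 1), each with weight 0.5. The helper names are hypothetical.

```python
def pair_term(w_i, x_i, w_j, x_j, f_ij):
    """One pairwise product inside the sum for a[i]."""
    return w_i * x_i * w_j * x_j * f_ij

def contribution(i, w, x, f):
    """a[i]: the sum of pairwise terms between synapse i and every synapse j."""
    return sum(pair_term(w[i], x[i], w[j], x[j], f[i][j]) for j in range(len(w)))

n = 5
w = [0.5] * n                       # illustrative weights
x = [1] * n                         # all inputs active
f = [[1.0] * n for _ in range(n)]   # co-located synapses, so F[i,j] = 1

individual = w[0] * x[0]                                   # 0.5
single_pair = pair_term(w[0], x[0], w[1], x[1], f[0][1])   # 0.25
a_0 = contribution(0, w, x, f)                             # 1.25

print(individual, single_pair, a_0)
```

Each pairwise term (0.25) is smaller than the individual weighted activation (0.5), but five of them together (1.25) exceed it.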

Learning rules are provided for both the synaptic weights and positions.
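
I won't reproduce the paper's rules here, but to give a feel for what a gradient-based update has to work with, here is a sketch of the partial derivatives of h with respect to each weight and each location. These are derived only from the formulas quoted above, using the symmetry of F, and are not necessarily the exact update rules in the paper. The function names are mine.

```python
import math

def kernel(r_i, r_j, sigma):
    return math.exp(-(r_i - r_j) ** 2 / sigma)

def activation(w, x, r, sigma, b=0.0):
    """h(x) written as a double sum over synapse pairs."""
    n = len(w)
    return sum(
        w[i] * x[i] * w[j] * x[j] * kernel(r[i], r[j], sigma)
        for i in range(n) for j in range(n)
    ) - b

def gradients(w, x, r, sigma):
    """Return (dh/dw, dh/dr) as two lists, one entry per synapse."""
    n = len(w)
    grad_w, grad_r = [], []
    for k in range(n):
        # dh/dw[k] = 2 * x[k] * sum_j w[j]*x[j]*F[k,j]
        grad_w.append(2 * x[k] * sum(
            w[j] * x[j] * kernel(r[k], r[j], sigma) for j in range(n)))
        # dh/dr[k] = -(4/sigma) * w[k]*x[k] * sum_j w[j]*x[j]*(r[k]-r[j])*F[k,j]
        grad_r.append(-(4 / sigma) * w[k] * x[k] * sum(
            w[j] * x[j] * (r[k] - r[j]) * kernel(r[k], r[j], sigma)
            for j in range(n)))
    return grad_w, grad_r

gw, gr = gradients([0.5, 0.8, 0.3], [1, 1, 0], [0.2, 0.3, 0.9], sigma=0.05)
print(gw, gr)
```

In this example the location gradients of the two active synapses point toward each other, so a step that increases h pulls co-active synapses together. The inactive synapse gets zero gradient for both its weight and its location.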

I think it would be interesting to look more closely at how these types of networks compare to more traditional ANNs. In particular, I’d be interested in seeing whether they are more stable or robust to noise, or whether the weights and locations converge more rapidly than in networks operating on weights alone.

All in all, I found this to be an interesting coding exercise. It allowed me to come up with a new cool way of visualizing dendritic activations (which might still be useful even without the distance/position effects). This project has also given me some ideas for other variations that I’d like to try out. I’ll post updates if anything interesting comes up.


It looks like you took synaptic distance to mean the distance between post-synaptic positions on the dendrite, not between pre-synaptic positions in the input space. This was something I missed on my first glance at the paper.

Mechanically, this seems to be a way of magnifying the correlated response of a set of inputs in a nonlinear way without requiring large synaptic weights. This would let one catch the subtle but important co-firing of inputs for a particular pattern, with a sharp drop-off in synaptic response when the inputs for that pattern start to disappear. The result is a nonlinear drop-off in response instead of the linear drop-off seen in traditional ANNs.
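
To make the drop-off concrete, here is a toy comparison under simplifying assumptions: ten unit-weight synapses sitting at the same dendritic location, binary inputs, and no bias term. The clusteron response uses the activation formula quoted earlier in the thread; the "linear" response is a plain weighted sum. All names and numbers are illustrative.

```python
import math

def clusteron_response(active, w, r, sigma=0.01):
    """Sum of pairwise terms over the active synapses (bias omitted)."""
    total = 0.0
    for i in active:
        for j in active:
            total += w[i] * w[j] * math.exp(-(r[i] - r[j]) ** 2 / sigma)
    return total

def linear_response(active, w):
    """Plain weighted sum, as in a conventional artificial neuron."""
    return sum(w[i] for i in active)

N = 10
w = [1.0] * N   # unit weights
r = [0.5] * N   # one tight cluster

def fraction_of_full(k):
    """Response with k of N pattern inputs active, relative to all N active."""
    active, full = range(k), range(N)
    return (clusteron_response(active, w, r) / clusteron_response(full, w, r),
            linear_response(active, w) / linear_response(full, w))

for k in (10, 8, 5, 2):
    print(k, fraction_of_full(k))
```

With half of the pattern present, the clustered response is a quarter of the full response, while the linear response is half of it. In this idealized case the clustered response scales with the square of the number of co-active inputs.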

I was thinking of the clusteron in purely spatial clustering terms (like the visual field) and I couldn’t see the benefits. But if you want to isolate and select out particular firing correlations, then the clusteron can do it in a way I’m not sure can be done with other approaches.

It’s interesting that treating the dendrite as a first-class computational object opens up algorithmic possibilities not available to traditional ANNs. In BrainBlocks, we treat a single dendrite as a hypothesis with its synapses aligned to the expected pattern. A limited number of dendrites, or hypotheses, can be created per neuron, each trained to a particular pattern. These dendrites are created as needed up to a limit.

In the clusteron approach, by contrast, a single dendrite can represent multiple hypotheses by clustering synapses together for each expected pattern. The clusteron approach gives you that nonlinear response to the detected pattern, but I think in BrainBlocks we would only get a linear response to partial patterns. This is usually not a problem because the neurons fire in a winner-take-all (WTA) manner, so a 50% response to a pattern is usually discarded in favor of something with a higher response.