Why don't HTM neuron synapses model long term potentiation and depression?


Distributed codes are necessary and the basis of brain function. This is pretty much a universal belief in neuroscience (no grandmother cell encodings). If you don’t make your code sparse (mostly zeroes), as in many ANNs, then scalar synapses and scalar cell activation are essential. They are the only way to distinguish between codes.

If your codes are sparse (SDRs), as observed in the brain, then scalar synapses and scalar cell activation are not essential and don’t add much. The power of the encoding is its distributed code. From an emulation and mathematical analysis point of view, binary synapses are far easier too.

Another thing to consider is that biological synapses are unreliable and stochastic. The number of transmitter vesicles released upon an incoming action potential varies dramatically from spike to spike. This tells you biological networks cannot rely on even a few bits of precision in a synapse. Many ANNs require great precision in their synapses, including the ability to go negative; neither is possible in biology. With an SDR, some percentage of the synapses could fail at any moment in time and the system would keep working just fine.
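To make the robustness claim concrete, here is a minimal sketch, not NuPIC code. It assumes a single neuron with binary synapses onto the cells of one stored sparse pattern, and lets each synapse independently fail to transmit on every trial. All sizes, the threshold, and the failure rate are illustrative values I picked, not numbers from the literature.

```python
import numpy as np

rng = np.random.default_rng(0)

n_cells = 2048      # size of the SDR (illustrative)
n_active = 40       # active bits, about 2% sparsity (illustrative)
threshold = 20      # transmitting synapses needed to recognize the pattern (illustrative)
failure_rate = 0.3  # chance that any one synapse fails on a given trial (illustrative)

# The neuron has one binary synapse onto each cell of the stored pattern.
pattern = rng.choice(n_cells, size=n_active, replace=False)
synapses = np.zeros(n_cells, dtype=bool)
synapses[pattern] = True

def recognizes(active_cells, synapses, failure_rate, rng):
    """Return True if enough synapses transmit despite random failures."""
    transmitting = synapses & (rng.random(synapses.size) >= failure_rate)
    overlap = np.count_nonzero(transmitting[active_cells])
    return overlap >= threshold

# Compare the stored pattern against an unrelated random pattern.
unrelated = rng.choice(n_cells, size=n_active, replace=False)
hits = sum(recognizes(pattern, synapses, failure_rate, rng) for _ in range(1000))
false_hits = sum(recognizes(unrelated, synapses, failure_rate, rng) for _ in range(1000))
print(f"stored pattern recognized in {hits}/1000 trials")
print(f"unrelated pattern recognized in {false_hits}/1000 trials")
```

With these settings the stored pattern should be recognized on nearly every trial even though almost a third of the synapses are silent each time, while a random pattern essentially never reaches the threshold.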

So we could add scalar synapses, but it would add very little and we would have to assume that they are largely stochastic.


Scalar cell activations certainly don’t add much if codes are large, distributed and sparse. I mentioned this above. As for scalar synapse strengths, though, it’s still not clear or obvious to me that they play no functional role.

The distributed code we’re talking about is the map of activated cells (columns). Scalar cell activation would break that binary map into fine-grained scalars, which is, agreed, unnecessary and has no biological parallel. What is in question here is the function each neuron uses to compute its state from its input, and that has the potential to play a huge role in both learning and representation. The function employed in HTM neurons is rudimentary: a tally of activated synapses. The complexity of biological synapses, especially with the presence of stochasticity, at least suggests the possibility of important functionality being missed. You’d be hard pressed to suggest biological neurons could just be equivalently tallying synapses!

As an example, there are a lot of theories as to what role the stochasticity of cortical neuron synapses could play in learning and neural encoding. Among unsupervised, local-learning ANNs, “Synaptic Sampling Machines” use synaptic stochasticity as a means of Monte Carlo sampling and unsupervised learning in an effort to form efficient sparse representations (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4925698/). Stochasticity in the anteroventral cochlear nucleus in rats has been shown to increase the dynamic range of the synapse (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3761050/). Other work found that neural generative models with stochastic synapses were able to capture richer representations (https://www.cs.cmu.edu/~mshediva/assets/pdf/poster-cosyne15.pdf). More evidence that synaptic stochasticity is not a “bug” or a biophysical constraint but an important information processing tool lies in the fact that some cortical synapses can be reliable under certain circumstances. This was demonstrated in cat visual cortex (https://www.scopus.com/record/display.uri?eid=2-s2.0-0029952698&origin=inward&txGid=a0888970f3c108099be39155e1c50b46), where two strong inputs (those arising within layer 4 and the putative thalamic inputs) show very few neurotransmitter release failures and low response variability. The same was observed in rat visual cortex (https://www.scopus.com/record/display.uri?eid=2-s2.0-0029162395&origin=inward&txGid=b310de34f66cb7e7622c3cbe09f37900). The fact that the putative thalamic input is stronger and possibly more reliable than some of its intracortical counterparts thus suggests a privileged role for feed-forward input from the outside world in ongoing cortical information processing.

At any rate, I personally cannot comfortably dismiss all synaptic details beyond binary transmission without more research and testing.


I would suggest looking at the problem from the following angle: is there an existing issue with your model that can be solved this way? If yes, you should definitely try it. If not, why not leave it until it can help you move forward?


This is such a great phrase.


I have experimented with scalar synapses and scalar inputs with an HTM-like learning algorithm. It doesn’t work well. Since HTM synapses grow and decay based on the input activity, the synaptic weights basically become the running average of the input stream.

This is a visualization of the synaptic weights of real-valued neurons learned using an HTM-like algorithm when I send a video stream into it. (And this is with some level of boosting.)

With that said, I have successfully made an MNIST classifier using a real-valued HTM neuron and an FC layer trained using backprop, with 95% accuracy. (Basically the real-valued HTM layers become a template-and-match system: they compare the input signal to the expected input.)

I think scalar values are a reasonable direction to look into. But a lot of work is needed. And the biological feasibility is also a problem.


Can you define what an “HTM-like” algorithm is, for me? Also, scalar inputs (assuming you mean the activation (action potential) of a neuron) do not have a biological parallel; there is no such thing as a partial action potential. This is not what is questioned in this thread. Modeling biologically plausible synaptic transmissions has nothing to do with changing the binary nature of action potentials even though people seem to group them together as the same thing without blinking.


Some argue that firing rate acts as a scalar input. (I don’t think that’s realistic though since EPSPs are brief.)


Ohh… Let me explain. (The code below might not work. I initially implemented the algorithm in C++, not Python.) What I have done is simple. First, suppose the input signal is a 1D vector of length N:

import numpy as np

x = np.array(some_vector)
assert x.shape[0] == N

And there are M neurons, each of which is connected to all inputs, as a neuron in an FC (fully connected) layer of an ANN is. The connection weights can be represented as a matrix with shape (M, N) (NumPy notation):

w = np.random.normal(size=(M, N))

To compute the output of a neuron, one simply calculates how close the signal is to the weight. The sensitivity parameter is a hyperparameter that controls what range of values a neuron can take in and react to. This process is like how a score is calculated in HTM. However, no inhibition happens, so the output is also a dense vector:

y = np.max(np.abs((x-w)*sensitivity)/sensitivity, 0)

Then, to make the neurons learn, one first finds the top K neurons with the highest scores, then nudges their weights towards the input signal, like how HTM changes the connection strength for the top activated neurons.

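d = x - w  # difference between the input and each neuron's weights, shape (M, N)
learner = topNeurons(y, K)  # assumed to return an (M, 1) mask that is 1 for the K top-scoring neurons
# positive differences grow the weight, negative differences decay it
w += learner * (np.maximum(d, 0)*growth_rate + np.minimum(d, 0)*decay_rate)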
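For anyone who wants to run the steps above end to end, here is one possible self-contained sketch. Several pieces are my assumptions rather than the original C++ implementation: the scoring function (each synapse responds with `max(1 - sensitivity*|x - w|, 0)` and a neuron sums its synapses), the `top_neurons` helper, and all the sizes and rates.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 16, 8, 2                   # input size, neuron count, learners per step (illustrative)
sensitivity = 1.0                    # assumed: how quickly a synapse's response falls off
growth_rate, decay_rate = 0.1, 0.1   # illustrative learning rates

w = rng.normal(size=(M, N))          # one row of weights per neuron

def score(x, w):
    # Assumed closeness measure: per-synapse response clipped at zero,
    # summed over each neuron's synapses. Returns shape (M,).
    return np.maximum(1.0 - sensitivity * np.abs(x - w), 0.0).sum(axis=1)

def top_neurons(y, k):
    # Hypothetical helper: (M, 1) mask with 1.0 for the k highest-scoring neurons.
    mask = np.zeros((y.size, 1))
    mask[np.argsort(y)[-k:]] = 1.0
    return mask

def learn(x, w):
    # Nudge the weights of the top-k neurons towards the input, in place.
    y = score(x, w)
    d = x - w
    learner = top_neurons(y, K)
    w += learner * (np.maximum(d, 0) * growth_rate + np.minimum(d, 0) * decay_rate)
    return y

x = rng.normal(size=N)
before = np.abs(x - w).sum(axis=1)   # distance of each neuron's weights to the input
y = learn(x, w)
after = np.abs(x - w).sum(axis=1)
print("winners:", np.argsort(y)[-K:])
print("distance change:", after - before)
```

Note that when `growth_rate` equals `decay_rate` the update reduces to `w += rate * (x - w)` for the winning neurons, which is an exponential moving average of the inputs they win on. That is consistent with the earlier remark that the weights basically become the running average of the input stream.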

Sorry for the confusion, I’m responding to the discussion of scalar activations in this thread.


Mhh… I’d tend to think a proximal input can still ‘build up’ from previous spikes, if firing fast enough.

Anyway, even if it did not, coincidence detection at the level of distal dendritic segments, as modelled in HTM, could at least be stochastically influenced by those firing rates: the higher the frequency, the better the chance you’d indeed sense the pattern (reaching probability 1 at, I’m taking a guess, roughly 100 Hz).


I’m not qualified to make these claims, but firing rates are higher in experiments where they stimulate the neuron because the stimulation is meant to demonstrate something. If the cell fires at 100 Hz, the neuron is probably broken. Bursting doesn’t occur when awake, at least for the most part, and whether or not L5 TT cells burst varies a lot from study to study by temperature, artificial cerebrospinal fluid, and even the duration/type of anesthetic used before slicing. There’s even a huge hyperpolarization after the burst activated by the apical calcium spike, as if the cell is aware it is freaking out. I suspect real bursts exist but are much more subtle.
Synapses also depress. NMDA spikes could definitely summate over time, though. Let’s say a normal EPSP lasts 25 milliseconds. For the cell to fire again right as that EPSP ends, it would have to fire at 40 Hz. To fire halfway through, it would have to fire at 80 Hz. Both of those are pretty high firing rates even for TT cells, and even firing up to halfway through the EPSP doesn’t give much range.
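The arithmetic here is just the reciprocal of the inter-spike interval. A small sketch, where `min_rate_hz` and its parameters are names made up for illustration and the 25 ms EPSP duration is the assumed figure from the paragraph above:

```python
def min_rate_hz(epsp_ms, overlap_fraction=0.0):
    """Firing rate needed for the next spike to arrive when `overlap_fraction`
    of the previous EPSP is still remaining (0.0 means exactly as it ends)."""
    interval_ms = epsp_ms * (1.0 - overlap_fraction)
    return 1000.0 / interval_ms

print(min_rate_hz(25))        # next spike right as the EPSP ends: 40.0
print(min_rate_hz(25, 0.5))   # next spike halfway through the EPSP: 80.0
```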

That makes sense to me.


I agree, we don’t want to dismiss anything. The approach we have been following in our research is to add biological detail when we have a clear theoretical need for it. For example, some dendrites exhibit a “metabotropic” effect. If two or three spikes arrive very close in time, then metabotropic receptors are invoked and this leads to long lasting (up to a few seconds) depolarization. We needed this theoretically and so we dug into this in the literature. We incorporate this detail in our models although we don’t model the mechanism explicitly. Of course our model of a pyramidal cell is not a simple “add the synapses” approach. We model dendritic spikes and somatic sub-threshold depolarization. We are starting to model the differences between apical and basal dendrites, etc. If we find a need for scalar dendritic weights we will add them, but so far we haven’t found a need for it. That doesn’t mean there isn’t a need for it, just that we have not encountered it yet.

Again, just because we don’t model something doesn’t mean we think it isn’t important. It only means we have not discovered a need for it yet.


I’m working off of the form of the SP and TM that I’m familiar with; that which is outlined in the most recent papers and in NuPIC. I can’t speak for any brand new stuff for obvious reasons. Specifically this equation in “Why Neurons Have Thousands…” which describes how predictive state is calculated:


Which is checking if the overlap is above a threshold i.e. tallying up activated synapses. It’s a similar story for the SP in choosing columns that become activated (before inhibition).
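To spell out what “tallying up activated synapses” means in practice, here is a minimal sketch of that check. It is my paraphrase of the description above, not the paper’s notation or NuPIC’s implementation: the array layout, the function name `predictive_cells`, and the strict `>` comparison against the threshold are all assumptions.

```python
import numpy as np

def predictive_cells(segments, active_cells, theta):
    """Mark a cell predictive if any of its dendritic segments has more than
    `theta` connected synapses onto currently active cells.

    segments:     bool array, shape (n_cells, n_segments, n_inputs), connected synapses
    active_cells: bool array, shape (n_inputs,)
    returns:      bool array, shape (n_cells,)
    """
    overlap = (segments & active_cells).sum(axis=2)   # active synapses per segment
    return (overlap > theta).any(axis=1)

# Tiny example: 2 cells, 2 segments each, 6 presynaptic cells.
active = np.array([1, 1, 1, 0, 0, 0], dtype=bool)
segments = np.zeros((2, 2, 6), dtype=bool)
segments[0, 1, [0, 1, 2]] = True   # cell 0, segment 1: three synapses onto active cells
segments[1, 0, [0, 3, 4]] = True   # cell 1, segment 0: only one synapse onto an active cell
result = predictive_cells(segments, active, theta=2)
print(result)
```

Every synapse contributes either 0 or 1 to the overlap, which is exactly the binary tally under discussion; a scalar-synapse variant would replace the boolean `segments` array with weights.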

Just to clarify, this is not meant to be a criticism of HTM but rather an explorative discussion into modeling decisions which I think you’ve addressed. Personally I think biologically plausible synaptic transmissions are worth looking into, backed up by reasoning and success from other work. The main goal being the use of scalar neuron voltage disturbances and synaptic stochasticity to capture richer neural encodings.


In the context of tonic firing this makes sense to me. Fast and slow firing rates of thalamic relay neurons in tonic mode are believed to convey signal strength information to cortex.

This is as opposed to a bursting firing mode which is believed to initiate a “wake up call” to the cortex in the presence of new information. Bursting hard and fast is believed to both cement in new synapses relatively quickly and overcome any transmission variability to ensure the signal is passed through.


I see the relationship between the cortex & thalamus in regards to bursting a little differently.

I see the bursting as a signal from the cortex that roughly translates into “Hey, I don’t know what this is!”, to which the thalamus responds by gating in more of the real world to be experienced. This thalamic response starts as bursting and drops back to tonic-mode mediation once the single cells firing in the column signal “I got this, I have seen this before.”

I see a possible mechanism of the RAC as the initial gatekeeper here; the alarm clock reading the surprise from the cortex.


Sounds like a similar interpretation. Is there any experimental evidence you can point towards?


I do but it will have to wait until I have access to my cache of papers at home.


Regarding synapses, it may be that, despite the structural variability, those differences signal level of permanence rather than significant discernible differences in connection strength.

Maybe I’m misreading, but the following line seems to suggest that even the maximum level of potentiation only results in minimal increases in some brain areas:

“The final level of LTP after three trains was about 150% above baseline, a value almost three times greater than that typically described for LTP studies using field potentials in area CA1.” (from “Differences Between Synaptic Plasticity Thresholds Result in New Timing Rules for Maximizing Long-Term Potentiation”)