It seems I can't get a clear view of both the HTM algorithms and the neurological perspective, and I think the following problem is confusing me.
Assume there is one synapse S and two neurons: A is presynaptic and B is postsynaptic. S has an "average" permanence, so its fate depends on the (in)activation sequences between A and B.
I want to cover all the possibilities in order to understand what happens in the implementation algorithms and (if possible) in biology.
The encoding of a time sequence is of the form ( ... > ... ), which means that what is to the left of ">" happens at time step t and what is to the right of ">" happens at t+1.
A letter on either side shows which neuron activates; an "_" instead of a letter means none.
Starting with the obvious happy story and ending where everybody learns something about neurons:
1. (A > B) - Light. A activates at t, B at t+1; the synapse is happy since its permanence is increased.
2. ( _ > _ ) - Darkness. The uneventful boredom of 99% of the time. I'm tempted to say nothing of relevance happens to the dormant synapse.
3. (_ > B) - Failure to predict. From what I understand from the TM descriptions, if this happens the learning algorithm decreases S's permanence.
4. (A > _) - Failed prediction. It sounds similar to 3 above, but it is an entirely different beast. From what I get from the TM descriptions, this is treated as if 2 (nothing) happened? Active cells at t check only for activity of their presynaptic cells at t-1?
If this is true, would it make sense to decrease the permanence here too, and is computational cost the reason HTM doesn't? (I've tried to capture my reading of cases 1-4 in the sketch after this list.)
5. (_ > AB > _) - Too late. I bet few people expected that one. If you think algorithmically, this is the worst case, combining failures 3 and 4; if you think Hebbian-istically… well… they very much fire together, so maybe it wouldn't be too bad to wire them together?
6. (B > A) - Backfire. At this point things seem messy enough to consider this a variation of 5: almost together.
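Here is that sketch - plain Python with invented names and constants (not Numenta's API), only encoding my possibly wrong reading of the TM descriptions for cases 1-4:

```python
# Invented names/constants - this just encodes my reading of cases 1-4 above.
PERMANENCE_INC = 0.10   # case 1: A fired at t and B fired at t+1
PERMANENCE_DEC = 0.05   # case 3: B fired at t+1 without A firing at t

def update_synapse(permanence, a_active_prev, b_active_now):
    """New permanence of S, given A's state at t and B's state at t+1."""
    if b_active_now and a_active_prev:       # case 1 (A > B): reward
        return min(1.0, permanence + PERMANENCE_INC)
    if b_active_now and not a_active_prev:   # case 3 (_ > B): penalize
        return max(0.0, permanence - PERMANENCE_DEC)
    # cases 2 (_ > _) and 4 (A > _): B is silent at t+1, so S is never visited -> unchanged
    return permanence
```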
And I haven’t even mentioned dendrites becoming predictive or not.
Please, any feedback/comments are welcome.
I think it might be helpful for you to learn more about the physical mechanisms that control synaptic plasticity.
The calcium level in the spine determines the direction and magnitude of the change. High concentrations increase the weight, prolonged periods of low calcium concentration decrease the weight, and very low concentrations have no effect on the weight.
Calcium enters the spine head in two ways: through NMDA receptors and voltage-controlled calcium channels. Both of these channels are affected by the voltage, so nearby synapses also influence plasticity.
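To put the threshold picture in one place, here is a toy sketch (thresholds, rates, and names are all invented for intuition - not a validated model):

```python
# Toy calcium-threshold plasticity rule - invented thresholds/rates, for intuition only.
THETA_DEPRESSION = 0.3    # above this (but below potentiation threshold) -> LTD
THETA_POTENTIATION = 0.6  # above this -> LTP

def weight_change(calcium, dt):
    """Direction and magnitude of the weight change for one time slice of length dt."""
    if calcium >= THETA_POTENTIATION:   # high concentration: potentiate
        return +0.01 * dt
    if calcium >= THETA_DEPRESSION:     # prolonged moderately low concentration: depress
        return -0.002 * dt
    return 0.0                          # very low concentration: no effect
```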
This is my interpretation of the plasticity effects in relation to event timing:
(A > B)
LTP affects the directional synapse A->B, either forming a new synapse or reinforcing the existing connection
( _ > _ )
Zero change to anything
(_ > B)
Zero change to anything - there is no pre or post synaptic event pair
(A > _)
Zero change to anything - there is no pre or post synaptic event pair
(_>AB>_)
If they are active at exactly the same time (sub-ms synchronicity) then no effect. There is no LTP or LTD occurring, so no directional (temporal) synaptic junction can be formed. I'm not 100% sure on this though…
(B > A)
Double effect - the directional synapse A->B is weakened because it experiences LTD
The directional synapse B->A is created or reinforced.
My understanding of biology is not all that great, having attributed everything and anything to the wrong biology in the past, so it would be good for someone who knows what they are talking about to validate or trash my thoughts.
OK, I asked this because I noticed that in the HTM algorithm the activation signal processing doesn't flow "naturally" from upstream neurons to downstream ones.
Dendrites store addresses of upstream cells.
There are no axons, only dendrites “watching” a bunch of other cells.
In an axonic perspective, dendrites do not know which cells are upstream of them; instead the cells themselves have "axons" which record their synapses as a list of addresses of downstream dendritic segments.
Sure, it is a matter of perspective - the two perspectives should be equivalent because either choice represents the same connectome.
Yet each one has its own advantages and disadvantages.
The current (dendritic synapses) view favors faster learning when we know in advance what follows at t+1, since it only needs to scan the active columns' synapses for predictive upstream cells and figure out which predictions were correct or not.
But for making predictions - when we do not know t+1 - it gets sluggish: we can't pick only the active cells at t and map them directly to predictions; the algorithm needs to scan all synapses to figure out which segments predict an activation at t+1.
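To make the contrast concrete, here is a minimal sketch of the dendritic addressing (invented names and threshold, plain Python) showing why prediction has to visit every segment:

```python
# Dendritic ("watching") addressing: each segment stores the addresses of the
# presynaptic cells it watches. All names and thresholds here are invented.
ACTIVATION_THRESHOLD = 2

segments = {                 # segment id -> presynaptic cells it watches
    "seg0": {1, 5, 9},
    "seg1": {2, 5, 7},
    "seg2": {3, 4, 8},
}

def predictive_segments(active_cells_t):
    """Predicting t+1 means scanning ALL segments, however sparse the activity at t is."""
    predicted = []
    for seg_id, presyn in segments.items():
        if len(presyn & active_cells_t) >= ACTIVATION_THRESHOLD:
            predicted.append(seg_id)
    return predicted

print(predictive_segments({2, 5, 7}))   # -> ['seg1']
```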
If the addressing is "axonic", learning appears to require two steps - prediction and evaluation.
Having a set of known active inputs:
at t: for each axonic synapse from the active inputs, increase the prediction level of its destination segment.
at t+1: when we know which "destinations" activated, walk the same synapses again to find out which ones made correct predictions, and reward or penalize them.
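And a matching sketch of the axonic addressing with its two steps (same toy data as above, names still invented):

```python
from collections import defaultdict

# Axonic addressing: each cell's "axon" lists the downstream segments it synapses onto.
ACTIVATION_THRESHOLD = 2
axon_targets = {
    1: ["seg0"], 5: ["seg0", "seg1"], 9: ["seg0"],
    2: ["seg1"], 7: ["seg1"],
    3: ["seg2"], 4: ["seg2"], 8: ["seg2"],
}

def predict(active_cells_t):
    """Step 1, at t: only the sparse active cells are visited; they raise prediction levels."""
    level = defaultdict(int)
    for cell in active_cells_t:
        for seg in axon_targets.get(cell, []):
            level[seg] += 1
    return {seg for seg, n in level.items() if n >= ACTIVATION_THRESHOLD}

def evaluate(active_cells_t, segments_active_t1, reward, penalize):
    """Step 2, at t+1: walk the same synapses again and reward or penalize them."""
    for cell in active_cells_t:
        for seg in axon_targets.get(cell, []):
            (reward if seg in segments_active_t1 else penalize)(cell, seg)

print(predict({2, 5, 7}))   # -> {'seg1'}
```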
So here are the consequences:
Dendrite-attached synapses:
- faster learning, sluggish prediction
- to keep learning fast, they penalize only case 3 - Failure to predict - because they look back; a forward lookup is very expensive in this case
Axon-attached synapses:
- half the speed of learning, but prediction faster by orders of magnitude due to the sparsity of the input
- to keep learning reasonably fast, they can penalize only case 4 - Failed prediction
That’s why I started this subject.
Besides the latest neuroscience revelations, from an evolutionary/ecosystemic perspective neuron activation is what is expensive - it consumes energy. The presynaptic neuron is the one that takes the toll for crying "wolf!" without a wolf appearing at t+1. As it should.
It is not useful to punish potentially good wolf predictors for not crying "wolf" at t-1 every time there's a wolf, because they might be looking in different directions. Each covers a different area or path by which a wolf can appear. As they should.
Fast prediction is crucial in a reinforcement learning environment. Even in DL RL setups, where backpropagation is much more expensive than forward inference, the highest expense comes from inferring lots of scenarios in simulated play after each learning step.
Another potential benefit of forward activation is that it allows what could be called an event-driven predictor architecture.
Similar to an event camera, its costs (energy/computing) become even smaller by compounding sparsity in space with sparsity in time.
It also allows long(er)-term prediction, since it can preserve its prediction state indefinitely until a new relevant change in any of the presynaptic axons occurs…
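A minimal sketch of that bookkeeping (invented names again; just the idea, not working HTM code): each segment keeps a running overlap counter that is only touched when one of its presynaptic axons changes state, so the prediction state simply persists between events.

```python
from collections import defaultdict

axon_targets = {5: ["seg0", "seg1"], 7: ["seg1"]}   # cell -> downstream segments
overlap = defaultdict(int)                           # segment -> current overlap count

def on_cell_change(cell, became_active):
    """Called only when a presynaptic cell switches on or off - an 'event'."""
    delta = 1 if became_active else -1
    for seg in axon_targets.get(cell, []):
        overlap[seg] += delta

# Between events nothing runs; 'overlap' (and any predictions derived from it) stays
# valid, compounding sparsity in space (few active cells) with sparsity in time (few changes).
```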
It seems to me that algorithm implementations taking the post-synaptic/dendritic “view” (“watching” axons for activity) are easier to compose into larger structures (models of multiple interconnected cortical columns).
This doesn’t have to mean scanning all segments to generate predictions. The BAMI pseudocode shows that, but practical TM implementations can maintain an index mapping active pre-synaptic cells to dendrite segments requiring updating.
The HTM-scheme overview comments describe these data structures. The Temporal Memory implementation uses this approach while replicating the behaviour of the reference Numenta implementations (SparseMatrix/Torch).
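For concreteness, a reverse index of that kind might look like the following sketch (plain Python with invented names - not the actual HTM-scheme or Numenta code):

```python
from collections import defaultdict

# Dendritic view plus a reverse index: segments still own their synapses, but a
# presynaptic-cell -> segments map lets us visit only the segments touched by active cells.
segments = {"seg0": {1, 5, 9}, "seg1": {2, 5, 7}}    # segment -> presynaptic cells

cell_to_segments = defaultdict(set)
for seg, presyn in segments.items():
    for cell in presyn:
        cell_to_segments[cell].add(seg)

def segments_to_update(active_cells):
    """Only segments reachable from active presynaptic cells need any work this step."""
    touched = set()
    for cell in active_cells:
        touched |= cell_to_segments[cell]
    return touched

print(segments_to_update({5, 7}))   # -> {'seg0', 'seg1'}
```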