A few months ago I developed a set of algorithms based on HTM theory principles, Simple Cortex (SC), in an effort to improve the computational capabilities of the theory and better understand the fundamentals of dendritic pattern recognition and learning. Recently I’ve been doing thought experiments in an attempt to optimize SC and parallelize HTM. From my experience, the biggest limiting factor of HTM parallelization is neuron activation (I will make a thread on this), so that’s where I’ve been dedicating a lot of my thinking. A number of questions arose that I attempt to answer below, and I’ve found the answers reinforce the conclusions of HTM theory. Of particular note, I’ve changed my position on the necessity of minicolumns in computing neuron activations: I used to argue they were unnecessary, but now I see a lot of advantages, which I explain below. Anyway, I think for now I will shelve Simple Cortex and get back to HTM. I’m very curious about the object recognition capabilities of a large sensorimotor inference network, and I think chucking GPUs at the problem can help get us there!
What is the advantage of activating multiple minicolumns with their own proximal dendrite segment rather than a single cluster of minicolumns sharing a single proximal dendrite?
Pattern reuse. Each active proximal dendrite segment represents the occurrence of a small pattern, or “feature”, in the feedforward sensory input space. The proximal segments of neurons in separate minicolumns represent different small patterns in the input space. Many active proximal segments occurring at the same moment represent a much larger pattern. Different combinations of active segments lead to different representations of large patterns. Think of a proximal segment like a particular Lego block, where each block can be used to build different assemblies. Commonly occurring Lego blocks are used in many different assemblies.
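The “Lego block” idea above can be sketched in a few lines. This is a toy illustration, not HTM code: the segment names, receptive fields, and threshold are all made up for the example. Each proximal segment watches a small receptive field of input bits, and the set of segments that cross their overlap threshold is the representation of the larger pattern.

```python
# Toy sketch: proximal segments as reusable feature detectors.
# Receptive fields and names are hypothetical, chosen for illustration.
proximal_segments = {
    "edge_left":  {0, 1, 2},
    "edge_right": {7, 8, 9},
    "corner":     {0, 1, 8, 9},
}

THRESHOLD = 2  # minimum connected-synapse overlap to activate a segment

def active_segments(input_bits):
    """Return names of proximal segments whose overlap with the
    feedforward input meets the activation threshold."""
    return {name for name, field in proximal_segments.items()
            if len(field & input_bits) >= THRESHOLD}

# Different inputs reuse the same feature "blocks" in different combinations:
print(sorted(active_segments({0, 1, 2, 8, 9})))  # all three segments fire
print(sorted(active_segments({7, 8, 9})))        # only edge_right and corner
```

The point of the sketch is that a single segment (“edge_right”) participates in both representations, just as a common Lego block appears in many assemblies.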
For activating neurons, what advantage does minicolumn structure/behavior have over a network of neurons divorced from the minicolumn? (EDIT4: rephrased question)
According to HTM theory, a minicolumn is a collection of neurons that share feedforward responses and that are mutually inhibitory. A neuron is a structure that responds to input stimuli via dendrite segments (proximal, distal, apical) and their respective synapses. Neurons have two fundamental activation rules:
- A neuron’s proximal segment receives enough stimuli from its connected synapses to overcome a certain threshold. This activates the proximal segment which activates the neuron. At this point the neuron is active but “uncertain” because it has not received any other dendritic segment activations. This corresponds to neuron “bursting” in HTM.
- A neuron’s distal and/or apical segments receive enough stimuli from their respective connected synapses to overcome a certain threshold. This activates the distal and/or apical segments respectively which depolarizes the neuron. If the neuron is both activated and depolarized it inhibits its neighboring neurons. At this point the neuron is active and “certain” because it has received multiple dendritic segment activations.
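The two rules above can be condensed into a small state function. This is a minimal sketch with assumed threshold values and state names, not an implementation of any particular HTM codebase:

```python
# Minimal sketch of the two activation rules. Thresholds are arbitrary
# example values; real networks learn/configure these per segment.
PROXIMAL_THRESHOLD = 8
DISTAL_THRESHOLD = 10

def neuron_state(proximal_overlap, distal_overlaps, apical_overlaps=()):
    """Classify a neuron given its segment overlaps for one timestep."""
    # Rule 1: enough proximal input activates the neuron.
    active = proximal_overlap >= PROXIMAL_THRESHOLD
    # Rule 2: any sufficiently stimulated distal/apical segment depolarizes it.
    depolarized = any(o >= DISTAL_THRESHOLD
                      for o in (*distal_overlaps, *apical_overlaps))
    if active and depolarized:
        return "active-certain"    # wins inhibition over uncertain neighbors
    if active:
        return "active-uncertain"  # corresponds to bursting in HTM
    if depolarized:
        return "predictive"        # depolarized but not active: a prediction
    return "inactive"
```

Note how “certain” requires both conditions at once, which is exactly what gives the minicolumn its inhibition winner later on.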
From these activation rules it may be possible to compute neuron activation without a hardcoded minicolumn structure/behavior. However, from my experience and thought experiments, it becomes difficult to properly manage the activation process for neurons with multiple distal and/or apical contexts without a minicolumn-like structure. The properties of the minicolumn (a shared proximal dendrite segment and mutual inhibition) applied to a network of neurons may provide computational simplicity to cortical algorithms. The advantage of the shared proximal dendrite segment is that each proximal segment comes preallocated with distal and/or apical segments, as well as neurons, to represent different contexts. The advantage of mutual inhibition is that more confident neurons, which received additional context, beat unconfident neurons that lack this additional information. Because of this it is easier to select the right neurons even though they contain multiple distal dendrite segments.
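The mutual-inhibition advantage can be sketched as a tiny winner-selection rule per minicolumn. This is a hypothetical simplification (boolean “predicted” flags instead of real depolarization), but it captures the behavior described above:

```python
# Toy sketch of minicolumn activation with mutual inhibition.
# `predicted` holds one boolean per neuron in an active minicolumn:
# True if that neuron's distal/apical segments were active (depolarized).

def activate_minicolumn(predicted):
    """Return indices of neurons that become active in this minicolumn."""
    winners = [i for i, p in enumerate(predicted) if p]
    if winners:
        # Confident (predicted) neurons inhibit their unconfident neighbors.
        return winners
    # No neuron was predicted: every neuron activates (the column "bursts").
    return list(range(len(predicted)))

print(activate_minicolumn([False, True, False]))  # predicted neuron wins
print(activate_minicolumn([False, False, False])) # no prediction: burst
```

The shared proximal segment decides *whether* the column competes at all; the inhibition rule then decides *which* of its preallocated context neurons represent the input.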
What is the advantage of having just one proximal dendrite segment on a neuron (or a minicolumn in HTM) rather than multiple proximal segments?
As far as I know a biological pyramidal neuron has multiple proximal segments, but HTM uses only a single proximal dendrite segment. Perhaps multiple proximal segments are a future area of study, or a simplification of biological mechanisms that I missed. EDIT2: It’s most likely a simplification (see pic). EDIT3: See Hawkins 2016, Spruston 2008, and Major 2013 for more details. Basically, the receptive fields of all proximal dendrite segments on all neurons in a minicolumn are similar. Therefore, HTM may condense them into one proximal dendrite segment per minicolumn purely for computational simplification.
My best guess is that a network with multiple proximal segments per neuron makes properly representing and learning pattern-contexts computationally difficult. For example, say a neuron previously learned that proximal segment 1 and distal segment 1 occurred at the same time. Additionally, the same neuron learned that proximal segment 2 and distal segment 2 occurred at the same time. Let’s say eventually the network reaches a point where proximal segment 1 and distal segment 2 are activated. The neuron would activate because a proximal segment and a distal segment were activated, even though the pattern-context p1d2 never occurred simultaneously in the environment. A neuron activating in this way is an error.
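The p1d2 error can be made concrete with a toy comparison. The names and sets here are hypothetical; the point is that a neuron which ORs together multiple proximal and multiple distal segments loses the pairing information:

```python
# Toy sketch of the false-conjunction error described above.
# The neuron learned these segment pairings co-occurring in the world:
learned_pairings = {("p1", "d1"), ("p2", "d2")}

def neuron_fires(active_proximal, active_distal):
    """What a multi-proximal-segment neuron actually computes: it only
    sees that SOME proximal and SOME distal segment fired."""
    return bool(active_proximal) and bool(active_distal)

def pairing_was_learned(active_proximal, active_distal):
    """What would be needed to avoid the error: check specific pairings."""
    return any((p, d) in learned_pairings
               for p in active_proximal for d in active_distal)

# The neuron fires on (p1, d2) even though that pairing never occurred:
print(neuron_fires({"p1"}, {"d2"}))         # True  -> the erroneous activation
print(pairing_was_learned({"p1"}, {"d2"}))  # False -> p1d2 was never learned
```

One proximal segment per neuron sidesteps this: each neuron’s distal segments can only ever be paired with its single proximal pattern.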
What is the advantage of having multiple distal dendrite segment contexts on a neuron instead of just one like the proximal dendrite segment?
Having multiple distal dendrite segments yields greater memory capacity in the network for pattern contexts. In this case the number of distal segments is greater than the number of neurons. This means an active neuron represents the occurrence of a single small proximal pattern in any of multiple different contexts, even though perhaps only one distal segment context was activated at that moment. Because an HTM network activates multiple neurons in each timestep, there’s a chance a set of active neurons represents two completely different contexts, which would lead to mistaken predictions. However, according to HTM theory and SDR math, the likelihood that two sets of distal dendrite segments representing two different contexts share enough neurons to be mistaken for the same pattern-context is extremely low.
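That “extremely low” likelihood can be computed with the standard SDR overlap calculation: the probability that a random sparse vector overlaps a fixed one in at least `theta` bits. The function below is a minimal sketch of that math (parameter values in the example are typical HTM-scale numbers, not taken from this post):

```python
# Minimal sketch of SDR false-match probability: the chance a random SDR
# with w active bits out of n overlaps a fixed w-bit SDR in >= theta bits.
from math import comb

def false_match_probability(n, w, theta):
    """n: vector size, w: active bits per SDR, theta: match threshold."""
    total = comb(n, w)  # all ways to place the second SDR's active bits
    # Count placements overlapping the fixed SDR in b bits, for b >= theta.
    matching = sum(comb(w, b) * comb(n - w, w - b)
                   for b in range(theta, w + 1))
    return matching / total

# With HTM-scale parameters the probability is vanishingly small:
print(false_match_probability(n=2048, w=40, theta=10))
```

Even with a permissive threshold of 10 shared bits out of 40, the false-match probability for these parameters is far below one in a million, which is why unions of active neurons rarely collide across contexts.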
It is also possible to maintain 1 proximal segment with 1 distal segment on a single neuron and just have a network with more neurons. It may also be possible to completely remove neural activation rules, because what really matters is learning synapses and using those synapses to recognize patterns.
Are neural activations even necessary or could the network operate solely on dendritic segment activations?
Neural activations are probably not necessary from a synaptic learning and pattern recognition standpoint, and this may be an area of exploration for parallelization. However, recognizing and learning neural activations has an advantage due to the properties of SDRs: many patterns can be recognized with low error probability. This reduces the data vector size for communicating with other layers or cortical macrocolumns, and requires fewer synapses to learn patterns because proximal, distal, and apical dendritic activation is essentially encoded in a single active neuron.
EDIT1: Grammar check