Can you use just 1 segment per cell in temporal memory?



Hey everyone, I’m a grad student working on HTM, designing my own slightly augmented system, and I have a design question I wanted to pose to all you experts and theorists out there:
Would it be viable to design each cell to have just one distal segment? My idea is to create new cells with single segments, rather than creating additional segments on existing cells. I know that having multiple segments on each cell allows the cells to participate in multiple patterns, and therefore makes maximal use of each cell. It does, however, introduce the possibility (unlikely as it may be) that the cell will become falsely predictive, signalling that a certain sequence it plays a role in is occurring when it actually isn’t, because one of its many segments became active. The practice of creating multiple segments on each cell also introduces the parameter of how many segments can be created per cell.
The concept behind the alternative scheme I’m proposing (creating new cells with single segments rather than building new segments onto a fixed number of cells) also allows the number of cells to adapt to the data. So rather than declaring that there will be 32 (or however many) cells in a column, that number could grow as demanded by the diversity of patterns in the data. This would eliminate the need for the parameters ‘cells per column’ and ‘maximum segments per cell’, while producing a unique number of cells in the layer for each unique data set it sees.
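The scheme described above can be sketched roughly as follows. This is a minimal toy illustration with hypothetical names and a made-up matching threshold, not any existing HTM API: when no existing cell in a column predicted the current context, a brand-new cell with a single distal segment is grown, so the column’s cell count adapts to the data.

```python
# Hypothetical sketch of the proposed single-segment-per-cell scheme.
# Column cell counts grow with the data instead of being fixed.

class Cell:
    def __init__(self, prev_active_cells):
        # Exactly one distal segment: synapses onto previously active cells.
        self.segment = set(prev_active_cells)

class Column:
    def __init__(self):
        self.cells = []          # grows as new temporal contexts are seen

    def predicted_cells(self, prev_active, threshold=2):
        """Cells whose single segment sufficiently overlaps the prior state."""
        return [c for c in self.cells
                if len(c.segment & prev_active) >= threshold]

    def learn(self, prev_active):
        # If no existing cell predicted this context, grow a new cell
        # rather than adding a segment to an existing one.
        winners = self.predicted_cells(prev_active)
        if not winners:
            new_cell = Cell(prev_active)
            self.cells.append(new_cell)
            winners = [new_cell]
        return winners

col = Column()
col.learn({1, 2, 3})         # first context: grows a first cell
col.learn({7, 8, 9})         # unseen context: grows a second cell
assert len(col.cells) == 2
assert col.predicted_cells({1, 2, 3})      # first cell predicts its context
```

The parameter that replaces ‘cells per column’ here is simply how aggressively new cells are grown, which is governed by the matching threshold.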
So to wrap this up, do you guys think this is viable? Is there any core functionality that I’d lose by setting it up this way? I’m very curious for anyone’s thoughts on this! Thanks


Hello @sheiser1, I think this is a very good question. Just to think out loud: whenever I read this type of suggestion, I immediately think about the main goal of HTM, which is to functionally imitate biological intelligence in the most basic sense. From that perspective, the more you deviate from neuroscience, the more hand-crafted it becomes. In my experience, that always leads to problems you cannot foresee in the long run. To make matters worse, these are structures whose exact function we do not yet fully grasp, so adding and subtracting things means making a lot of assumptions with limited experience. I’m not sure about the computational benefits, but biologically it is somewhat counterproductive to grow cells rather than segments.

I still think this is a valuable question. My implementation works as part of an autonomous agent in a 3D environment, and after 5 minutes of learning, the distal segment count per cell does not really seem to increase above 4 in a configuration of 1024 columns and 8 cells per column. Although I reserved space for 64 segments, this shows me that the number of temporal pattern combinations inside the environment is low enough that it barely fills the unused cells. When an unknown sequence is encountered, the Cortical Learning Algorithm picks the cell that is least used. In other words, new segments do not get created unless every cell of the active column already has at least one segment. So if I had 64 cells per column, there wouldn’t be any cells with more than a single distal segment.
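The least-used-cell rule described above can be sketched like this (a simplified illustration, not my actual code): because new segments only go to the cell with the fewest segments, no cell receives a second segment until every cell in the column has at least one.

```python
# Simplified sketch of least-used-cell selection during a column burst.
# Each cell is represented by its list of distal segments.

def least_used_cell(cells):
    """Index of the cell with the fewest segments (ties: lowest index)."""
    return min(range(len(cells)), key=lambda i: len(cells[i]))

# A column with 4 cells; each inner list holds that cell's segments.
column = [["seg"], [], ["seg"], []]
assert least_used_cell(column) == 1    # an empty cell is always chosen first

# Only once every cell has at least one segment can a cell get a second.
column = [["seg"], ["seg"], ["seg"], ["seg"]]
assert least_used_cell(column) == 0
```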

To give a direct answer to your question: yes, it would work, but you would be limiting your prediction capacity once you ran out of cells that do not already have distal segments. On the memory footprint side, I think you would need more memory for the same prediction capacity compared to a multiple-segment implementation, and I would not underestimate this part. In my implementation, cell count/size is the deciding factor after the size of all synapses. If I understand you correctly, your idea would require more cellular information (more cells) for the same amount of distal dendrite information.
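A back-of-the-envelope version of the memory argument above, with purely illustrative byte sizes (the overhead constants are assumptions, not measurements from any real implementation): for the same total number of segments, one-segment-per-cell pays the per-cell bookkeeping cost many more times.

```python
# Rough memory comparison (illustrative sizes, not any real HTM codebase).
CELL_OVERHEAD = 64       # bytes of bookkeeping per cell (assumed)
SEG_OVERHEAD = 32        # bytes per segment (assumed)
SYNAPSE_SIZE = 8         # bytes per synapse (assumed)
SYNAPSES_PER_SEG = 20

def footprint(n_cells, segs_per_cell):
    segs = n_cells * segs_per_cell
    return (n_cells * CELL_OVERHEAD
            + segs * SEG_OVERHEAD
            + segs * SYNAPSES_PER_SEG * SYNAPSE_SIZE)

# Same total of 8192 segments (and synapses) either way:
multi = footprint(1024, 8)     # 1024 cells, 8 segments each
single = footprint(8192, 1)    # 8192 cells, 1 segment each

# The difference is exactly the extra per-cell overhead.
assert single - multi == (8192 - 1024) * CELL_OVERHEAD
```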

This seems like a flaw, but on the contrary, I believe it is something that is needed to guide the learning.

I would definitely welcome more insights on this question.


Multiple segments per cell don’t really cause issues. False predictions aren’t much of a problem because the goal of temporal memory is to put the input in context of transitions, although they could cause problems for the temporal pooler. As long as false predictions aren’t too common, the negative effects will be negligible because of redundancy, thresholds, learning based on multiple repetitions, etc.

One segment per cell would also make it extremely sparse, which could harm the learning rate and generalization.


At a glance, I think this might work.

I don’t think multiple-segments-per-cell adds any fundamental capability. Within a mini-column, the segments aren’t distributed between cells in any meaningful way. So multiple-segments-per-cell isn’t a dependable source of generalization. With the sequences ABCD and XBCY, maybe a few of the cells in the two ‘B’ SDRs will overlap due to multiple-segments-per-cell, but this isn’t really helping in any fundamental way.

And yes, as you know, this would diverge from the biology. It won’t demonstrate neurons having thousands of synapses. But I’d expect it to produce results similar to today’s temporal memory.


It depends on how much the data is temporally structured versus how much it’s spatially structured (and how the two kinds of structure interact). Sometimes one set of previous evidence is useful, other times another might be, and in some cases you need to combine sources of history evidence to do inference. It’s likely that we have mixtures of cells with varying degrees of dendritic branching and locality (trying all settings) early in childhood, and each brain region prunes out the settings which don’t add value. This leads to different answers for different regions.

The other consideration is that NuPIC runs separate SP and TM stages as a pipeline, so predictive inputs are only considered after the columnar SDR has been chosen. paCLA combines predictive and feedforward inputs before the SP inhibition stage, leading to a different character of inference which would be much more sensitive to the number of segments per cell. paCLA is closer to the biology, but it’s an open question whether it improves the performance of HTM.
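My reading of the difference can be sketched like this (the combining weight and k-winners formulation are my own simplifying assumptions, not the paper’s exact equations): predictive depolarisation is added to the feedforward overlap before inhibition, so a strongly predicted column can win even with weak feedforward input.

```python
import numpy as np

# Rough sketch of paCLA-style column selection (one reading, not the
# paper's exact formulation): predictive input biases overlap scores
# *before* inhibition, unlike the standard SP -> TM pipeline.

def select_columns(ff_overlap, predictive_bias, k=3, weight=0.5):
    """k-winners-take-all over combined feedforward + predictive score."""
    combined = ff_overlap + weight * predictive_bias
    return set(np.argsort(combined)[-k:])

ff = np.array([5.0, 4.0, 4.0, 1.0, 0.0])     # feedforward overlaps
pred = np.array([0.0, 0.0, 4.0, 0.0, 9.0])   # distal predictive support

standard = set(np.argsort(ff)[-3:])           # SP alone: feedforward only
pa = select_columns(ff, pred)                 # paCLA-style combined score

assert standard == {0, 1, 2}
# Column 4 has zero feedforward input but strong prediction; with the
# predictive bias it can still win inhibition, as described above.
assert 4 in pa
```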

[edit: now has a partial paCLA class for the Layer, and partial paCLA is also a parameter of layers in Comportex]


Having implemented my own version of HTM, I am having a hard time identifying where exactly paCLA makes a difference according to the paper. From what I understand, section 4.3 is the main one that should establish this, but it is discussed in such an interleaved way with Numenta’s implementation that the actual difference becomes vague. What I understand is that paCLA considers the distal predictions on top of the feedforward overlaps before inhibition, so that there can be a bias towards the predictions among the columns that remain active after inhibition. There is even a possibility of a column becoming active without any actual feedforward input. While that is interesting, I have read in a few places that the cortex tries to decouple real signals from internally biased representations, and that it is important not to create a spiral towards a biased reality that worsens over time (even though some part of me knows this is already true for us :slight_smile: ). I believe I read that this decoupling was done by separating feedforward and feedback pathways in the hierarchy itself. By the way, I would appreciate it if you could point out the exact places in the sources that make you believe this is closer to biology, if there are any obvious ones.

Oh and thank you for the effort on producing this work.


Thanks for your comments @sunguralikaan. I’d rather not have this thread taken over with discussion of paCLA, so let’s continue on the discussion page for the paper.

Just to quickly answer on plausibility: it’s more a case of asking where the sources for the SP-TM separation can be found (nobody in Numenta claims these exist). Pyramidal cells in cortex receive both proximal and distal depolarising inputs every time they become active, so making them active without including distal inputs is biologically incorrect. Jeff and Numenta concede this but expect evidence that it is worth making such a big change to NuPIC. The engineering difference is marginal at best in the experiments we’ve done, but paCLA in my view corrects an important theoretical weakness in standard HTM.


I do the same thing in my implementation:
I use one large segment of, say, 1500–2000 bits and rely on the UNION property of SDRs to store and distinguish between hundreds of patterns, i.e. approximating multiple separate segments.

You should look at the paper to see numerical examples of the reliability of those properties for different values of n, w, b, and the threshold. Suffice it to say that as long as the vectors are large (> 1000 bits), sparse (< 5%), and distributed, the properties hold pretty well.

My current implementation uses exclusively the UNION property of SDRs. The big benefit of the union property is that it allows us to store multiple SDRs in a fixed-length bit string. Normally we would use a structure like an array or a hash to store multiple elements, but the properties of SDRs make it possible to achieve a similar result with a fixed structure. This is possible because as the SDR size grows linearly, the probability of a false positive shrinks exponentially.


1 segment per cell is the right approach to the problem. I believe that because we have been dealing with data as a discrete two-segment quantity since the first ever problem-solving approach using a machine. On and off are the basic set of the two-segment cell in the human brain. But they are forced to form a combination as a single segment due to the very idea implanted in the brain as uncertainty. We will never solve the puzzle if we try to approach the problem from a two-segments-per-cell angle. We have to focus on defining a single segment that possesses relative growth in it, to implant the idea into the basic two-segment cells out there in the temporal memory.


You are misinterpreting the definition of “segment” in the context of HTM theory. I highly recommend taking some time to read through the resources here in order to help you communicate more succinctly on “HTM Hackers” discussions like this one.


Yes, we’ve got to stick to the book!


Sure thing



No biggy… :wink:

A “Segment” is a superset term covering different kinds of Dendrites: both proximal and distal dendrites can be referred to as, and are both types of, “Segments”. Dendrites also connect one Cell to another via Synapses.
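In code terms, the vocabulary above maps to something like the following (a hypothetical minimal data model for illustration, not NuPIC’s actual classes): a Segment is either proximal or distal, holds Synapses, and each Synapse links back to a presynaptic Cell.

```python
# Hypothetical minimal data model for the HTM vocabulary above.

class Synapse:
    def __init__(self, presynaptic_cell, permanence=0.2):
        self.presynaptic_cell = presynaptic_cell
        self.permanence = permanence          # connectedness, 0.0 .. 1.0

class Segment:
    """Superset term: both proximal and distal dendrites are Segments."""
    def __init__(self, kind):
        assert kind in ("proximal", "distal")
        self.kind = kind
        self.synapses = []                    # connections onto this cell

class Cell:
    def __init__(self):
        self.proximal = Segment("proximal")   # feedforward input
        self.distal = []                      # list of distal Segments

# Cells connect to one another via synapses on dendrite segments.
a, b = Cell(), Cell()
seg = Segment("distal")
seg.synapses.append(Synapse(presynaptic_cell=a))
b.distal.append(seg)
assert b.distal[0].synapses[0].presynaptic_cell is a
```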


What I am suggesting is that if we need to implement the connection in a machine, you need to implant the idea of both types of dendrites in a single class that does not use binary codes to communicate with other cells, which have each dendrite forming patterns in opposite order. Correct me if I was wrong in understanding the concept.


Neurons communicate by firing, which is a binary event. There is also a depolarized state that a neuron can be in when enough synapses on one of its distal dendrite segments have received input from other neurons firing. This “predictive” state makes the neuron primed to fire, but this state alone doesn’t communicate any information to the next cell.
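The predictive state described above can be sketched like this (the activation threshold is an illustrative value): a cell becomes predictive when any one of its distal segments has enough active presynaptic cells, and this function returns a state, not a transmitted signal.

```python
# Sketch of the "predictive" (depolarized) state: it primes a cell to
# fire but by itself communicates nothing (threshold is illustrative).

ACTIVATION_THRESHOLD = 3   # active synapses needed on one distal segment

def is_predictive(distal_segments, active_cells):
    """True if any distal segment sees enough presently active cells."""
    return any(len(seg & active_cells) >= ACTIVATION_THRESHOLD
               for seg in distal_segments)

# Each segment is the set of presynaptic cells it synapses onto.
segments = [{1, 2, 3, 4}, {10, 11, 12}]
assert is_predictive(segments, active_cells={1, 2, 3})      # 3 hits: primed
assert not is_predictive(segments, active_cells={1, 10})    # 1 hit each: no
```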


So true, sir. I am trying to focus on that depolarized state here. How does a neuron get primed to fire, i.e. make no communication with other neurons, in such a way that its distal dendrites receive synaptic input to reach that state? I think that’s pure social science; we need to solve the problem.


@Z_t_i As both @cogmission and @Paul_Lamb have been hinting, you are missing some fundamental knowledge of HTM. We have a common vocabulary here when we talk about cells, dendrites, synapses, connections, mini-columns, active vs. predictive, etc. I don’t believe you are understanding our vocabulary. I think it would suit you best to do some independent study before continuing to post. I recommend you read or watch these resources first: