I would be willing to bet that none of the parameters you have listed there are impacted by either of those 2 parameters. Just as the SpatialPooler doesn’t care about cellsPerColumn, the TemporalMemory doesn’t care about columnDimensions other than using them to compute a given cell index. Performance-wise, I also don’t think the cells and columns are looped over, because the active cell and segment lists are built up over previous cycles. I don’t think we ever loop over every cell in every column to assess its permanence and proximity to a given activation threshold; those values are assembled from active columns only (off the top of my head, of course). Anyway, that’s from memory, so there’s an outside possibility that I’m mistaken, but I doubt it…
Now I’m more confused. The following params deal with connections between cells:
The cell count in the system simply must affect these. Just consider making columnDimensions=. None of the default values are valid anymore. Without changing the activationThreshold, nothing would be active.
Sorry buddy, I dozed off… (after a long phone call coordinating logistics for my mom in her rehab facility). It’s crazy, and so am I!
Obviously when collecting segments, the number of cells in a given column matters because they must each be looped over to test for matching ( >= minThreshold), and active ( >= activationThreshold), but the difference is negligible because the algorithm only surveys the 2% activated columns coming in from the SP as the base # of columns it works with.
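To make the two tests concrete, here is a minimal sketch of how a single segment could be classified. It is not the NuPIC or HTM.Java code: the function name and the single `num_active_synapses` count are assumptions for illustration, and a real implementation may count connected and potential synapses separately.

```python
def classify_segment(num_active_synapses, activation_threshold, min_threshold):
    """Return (is_active, is_matching) for one segment.

    num_active_synapses is assumed to be the number of the segment's
    synapses that point at currently active cells (hypothetical input).
    """
    # "active" test: enough synapses to drive a prediction
    is_active = num_active_synapses >= activation_threshold
    # "matching" test: enough synapses to be a learning candidate
    is_matching = num_active_synapses >= min_threshold
    return is_active, is_matching


# Only segments on cells in the active columns would be tested this way.
print(classify_segment(12, activation_threshold=20, min_threshold=10))
```

Nothing in this test depends on the column or cell counts; only the per-segment count and the two thresholds are involved.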
From your original question. You want to determine:
1.) The amount any of those parameters would have to be changed to allow for changes in columnDimensions and cellsPerColumn.
2.) the impact of columnDimensions and cellsPerColumn on the actual quality of TM predictions.
My contention is that the assumption behind #1 & #2 (that the rest of the parameters are affected by, or have some mutual dependence with, col/cell numbers in a way that would affect prediction quality) is not true, imho.
Each of those parameters affects which segments or cells get added or removed from lists whose origins arise from the SP output - thus beginning with the active-column input to the TM. The number of cells or columns in that list of “Active Columns” does not impact which of a given cell’s segments have permanences less than or greater than any of the thresholds. In addition, the amount a given synapse is incremented/decremented is not affected by the number of cells/columns. Also, the “max” variables simply represent limits imposed on memory for the number of segments/synapses a given cell can have created on its behalf - this obviously affects the total number of segments/synapses present in the system - but that has no qualitative impact on the prediction quality of the TM in general.
Keep asking questions… I’ll go all the way into it with you (and re-learn/re-visit it with you) to the extent you want…?
So you’re basically saying that none of these parameters really need to be updated strictly because of changes to columnDimensions and/or cellsPerColumn, except in extreme cases like if I cut the columns down from 2048 to 16 or something.
Honestly, imo these parameters are all disjoint… (with regard to qualitative inference impact). The only consideration I can see as important is if you increase the column/cell dimensions so much that the maxSegmentsPerCell/maxSynapsesPerSegment variables start to cause the number of segments and synapses to exceed the memory capacity - and if that were the case then constraints on the number of segments and synapses would start to also have a qualitative effect on the TM’s ability to predict.
Now… While I’m pretty sure about my conclusions, it still makes me nervous to give definitive answers because I haven’t yet started my neuroscience studies; nor my experimentation with NuPIC so that I can grow my experience with real-world use-cases. I’m still strictly limited to having pure code expertise with regard to the adherence of HTM.Java’s Java code to NuPIC’s Python code. A very very strict and narrow domain of expertise for the time being. So take what I say about qualitative performance with a grain of salt for the time being…
Just to pitch in; these are dependent on how many cells are active at a given time, not directly with the column or cells per column counts.
Each time I play with the column and cell counts, I experiment and update these values accordingly. After some time, I started initializing these parameters with the approximate active cell count at a given state to save some time. So 2048 with 0.02 sparsity would produce 40 active cells in a predicted scenario (one cell active on every column). I kind of set 0.5 times the active cell count as the activationThreshold (20) and 0.5 times the activationThreshold as minThreshold (10). These are of course soft values that seem to work in general cases for me.
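As a rough sketch of that rule of thumb (the function name is made up, and truncating to whole numbers is an assumption on my part), the starting values could be derived like this:

```python
def suggest_thresholds(num_columns, sparsity):
    """Soft starting values derived from the expected active cell count."""
    # In a fully predicted state one cell is active per active column,
    # so the active cell count equals the active column count.
    # Truncation (rather than rounding) is an assumption of this sketch.
    active_cells = int(num_columns * sparsity)
    activation_threshold = active_cells // 2      # 0.5 x active cells
    min_threshold = activation_threshold // 2     # 0.5 x activationThreshold
    return active_cells, activation_threshold, min_threshold


print(suggest_thresholds(2048, 0.02))  # (40, 20, 10)
```

These are only starting points; they still need to be adjusted by experiment for a given problem and encoder.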
maxNewSynapseCount should be lower than the number of bursting cells in any case to be a practical limit. At any time, there cannot be more than “activeColumns.size() * cellsPerColumn” connections for a cell to make on the previous activation. I generally set it to some value below the active cell count in a predicted scenario. The reasoning is that we should subsample, not form connections to the exact previous activation (40 cells). The learning happens as long as maxNewSynapseCount is larger than minThreshold. If it is smaller than that, a newly created segment would not be activated in any case, because its synapse count would be below the activationThreshold, and it would not be a matching segment either, because it has a lower synapse count than even the minThreshold. For a segment to get more synapses, it should at least be the matching segment, which is dictated by the minThreshold. So if you set this value lower than minThreshold, you would create new segments infinitely with no predictions. It happened to me.
maxSegmentsPerCell is something that I set to prevent producing infinite segments because of a bug or inconsistent parameters. Normally you would not need a cap on this because this number should converge in time in an ideal scenario unless you have massively complex/inconsistent data.
maxSynapsesPerSegment should obviously be higher than the activationThreshold for any cell to become active. It should also be higher than maxNewSynapseCount or else it would override maxNewSynapseCount.
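The relationships described in the last few paragraphs can be collected into a small sanity check. This is only a sketch: the function is hypothetical and not part of NuPIC or HTM.Java, and reading "higher than" / "larger than" as strict inequalities is my assumption.

```python
def check_tm_params(active_cells, activation_threshold, min_threshold,
                    max_new_synapse_count, max_synapses_per_segment):
    """Return a list of warnings for inconsistent parameter combinations."""
    problems = []
    if max_new_synapse_count <= min_threshold:
        # New segments could never become matching, so segments
        # would keep being created without producing predictions.
        problems.append("maxNewSynapseCount should be larger than minThreshold")
    if max_new_synapse_count >= active_cells:
        # Guideline from the post: subsample the previous activation.
        problems.append("maxNewSynapseCount should be below the active cell count")
    if max_synapses_per_segment <= activation_threshold:
        problems.append("maxSynapsesPerSegment should exceed activationThreshold")
    if max_synapses_per_segment <= max_new_synapse_count:
        problems.append("maxSynapsesPerSegment should exceed maxNewSynapseCount")
    return problems


# Values for 2048 columns at 0.02 sparsity (40 active cells).
print(check_tm_params(40, 20, 10, max_new_synapse_count=30,
                      max_synapses_per_segment=64))   # no warnings
print(check_tm_params(40, 20, 10, max_new_synapse_count=8,
                      max_synapses_per_segment=64))   # one warning
```

The values 30 and 64 are illustrative picks, not recommended defaults.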
TL;DR Sparsity and column/cell counts dictate the number of cells active at a time, which dictates activationThreshold and minThreshold. Then you set the max values according to these.
@sunguralikaan That is a very clear explanation on how to set those parameters. I pretty much agree with everything - the rules seem reasonable to me. As you said, they don’t directly relate to column or cell counts.
The only thing I would add is that I get nervous when the column counts and thresholds get below certain values. I don’t feel comfortable using column counts less than 1024, or thresholds below 10. That is because it is too easy to get false positive matches and lose noise robustness.
@subutai Thank you. My lower limits are a bit lower than that but you are right. Going below 1024, you get too many false positives if you do not increase thresholds. If you increase the thresholds, you lose noise robustness because they become too close to active cell counts. Still, it depends entirely on the problem and just as importantly, your encoder quality.
I generally work with 512 columns per region, but it involves a hierarchy of 6 layers and running in real time is a priority, so the column count is more of a constraint for me. I initially started with 2048 and, in order to see how low it can go, I observed the learning with column counts 256-512-1024-2048-4096. I could not go with thresholds lower than 4 and still maintain stable learning. It just does not make sense to use HTM lower than that. This lower bound forces you to have at least 10 active cells to maintain a meaningful noise tolerance. As a result, you have to decrease your sparsity (that is, raise the fraction of active columns) when you go lower than 512 columns to make up for the active cells. 256 still seemed doable with 0.04 sparsity when I had 11 regions, but the capacity suffered greatly.
My problem with 512 and 0.02 sparsity is that you have to fine-tune your thresholds to get a good balance between false positives and noise tolerance, as you said. For layers that only have a single distal input layer (10 cells as input), activationThreshold = 7 and minThreshold = 4 seem to work better. Unfortunately, it means that 70% of any learnt activation is required for a prediction, which is bad on the noise tolerance side.
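The arithmetic behind those numbers, as a small sketch (the helper names are made up for illustration, and truncating to whole cells is an assumption):

```python
def active_cells(num_columns, sparsity):
    """Active cells in a predicted state: one per active column."""
    return int(num_columns * sparsity)


def required_fraction(activation_threshold, input_cells):
    """Share of a learnt activation that must be present to predict."""
    return activation_threshold / input_cells


print(active_cells(512, 0.02))   # 10
print(active_cells(256, 0.04))   # 10
print(active_cells(256, 0.02))   # 5, below the 10-cell lower bound
print(required_fraction(7, 10))  # 0.7
```

With only 10 active cells there is very little room between a threshold high enough to avoid false positives and one low enough to tolerate noise.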
I still try higher counts from time to time for debugging purposes, or to see if it gets better with increased columns. However, I find that most of the time improving the encoder trumps the gain from increasing column counts, so they get reverted back to 512 in time for performance.