I’m not sure about #3, but #1 and #2 make sense and I think that’s how it works in NuPIC.
I don’t think NuPIC uses the second optimization, though. I see the following default parameters:
"maxSegmentsPerCell": 128, "maxSynapsesPerSegment": 128,
Optimization #2 is an abstraction of setting the above to this:
"maxSegmentsPerCell": 1, "maxSynapsesPerSegment": 16384,
BTW, my example of “cat”, “dog”, and “fox” is meant to illustrate the basic concept behind Optimization #2, not to depict a realistic scenario. Setting aside the fact that “cat” and “dog” share semantics with each other in reality, a third segment probably would not have been created for “fox” either. Even if this same cell had been chosen for learning, the “dog” segment would likely have been modified instead (since it is the “best matching segment” in this example).
If this were the case, then every time a cell decided whether to become predictive, wouldn’t it need to query every cell in the layer? What if the distal segments are not coming from the same layer? Would it need access to an entirely different layer? Or are you only talking about proximal input here?
I would just transmit this information during the Activate step. I have something similar already in HTM.js. When I perform an activation, at that time I crawl all synapses connected to that cell’s axon and increment a connection score on the segment they are attached to. When the score is above the activation threshold, I activate the segment and add it to a cache. The “Predict” phase then consists merely of cycling the cache of active segments and setting the cells they are connected to as Predictive. This optimization would simplify that process even further by removing the concept of “segment” entirely.
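A minimal sketch of the crawl described above. The names here (`axonSynapses`, `score`, `activeSegmentCache`) are illustrative assumptions, not the actual HTM.js API, and the threshold value is made up:

```javascript
const ACTIVATION_THRESHOLD = 3; // assumed config value

// Activate step: walk the cell's axonal synapses, bump each target
// segment's connection score, and cache segments that cross the threshold.
function activateCell(cell, activeSegmentCache) {
  for (const synapse of cell.axonSynapses) {
    const segment = synapse.segment;
    segment.score++;
    if (segment.score >= ACTIVATION_THRESHOLD && !segment.active) {
      segment.active = true;
      activeSegmentCache.push(segment);
    }
  }
}

// Predict step: only the cached segments are touched, so the cost is
// proportional to the number of active segments, not the size of the layer.
function predict(activeSegmentCache) {
  for (const segment of activeSegmentCache) {
    segment.cell.predictive = true;
  }
}
```

The point of the cache is that the Predict phase never scans cells or segments that received no input this timestep.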
Right now, I’m not aware of a use case for distal input coming from multiple layers. The only multi-layer use cases I am aware of are distal from one layer and apical from another. I would model distal and apical as separate connection types (unless I also implement it in conjunction with Optimization #3, in which case the problem is simplified through abstraction).
No, this would be specifically used for distal and apical input. A single-segment optimization is already built into the normal SP process.
maxSynapsesPerSegment is just the max; the algorithm only queries the synapses that have actually formed. That said, if cells learned many connections it could get slow (and cell predictions could produce many false positives).
Getting rid of separate segments (#2) is a reasonable idea and the algorithm should still work, although I wouldn’t set the max synapses that high. There is some ability to handle unions without confusion – i.e. if there are three sets of presynaptic cells that put a post-synaptic cell into a predictive state, it is unlikely that the cell will have a false positive (going into the predictive state when none of the three patterns is active but some small subset of each is active). Once you get up to 10 or 15 patterns unioned together, you will start to see false positives. You can manage this somewhat with the right learning rates and by limiting maxSynapsesPerSegment, but it is much easier when you have multiple segments.
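The union false-positive risk described here can be sketched with a small Monte Carlo estimate. All parameter values (layer size, SDR size, synapses per pattern, threshold, trial count) are assumptions for illustration, and the union is idealized as a set of distinct random presynaptic cells:

```javascript
// Estimate the chance that a random SDR falsely activates a cell whose
// single "segment" holds synapses to the union of several learned patterns.
function falsePositiveRate(numPatterns, {
  cellsPerLayer = 2048,     // assumed layer size
  sdrSize = 40,             // assumed active cells per SDR
  synapsesPerPattern = 20,  // synapses learned per pattern
  threshold = 13,           // assumed activation threshold
  trials = 10000
} = {}) {
  // Union of synapses contributed by all learned patterns.
  const union = new Set();
  while (union.size < numPatterns * synapsesPerPattern) {
    union.add(Math.floor(Math.random() * cellsPerLayer));
  }
  let falsePositives = 0;
  for (let t = 0; t < trials; t++) {
    // A random, unrelated SDR of sdrSize active cells.
    const active = new Set();
    while (active.size < sdrSize) {
      active.add(Math.floor(Math.random() * cellsPerLayer));
    }
    let overlap = 0;
    for (const c of active) if (union.has(c)) overlap++;
    if (overlap >= threshold) falsePositives++;
  }
  return falsePositives / trials;
}
```

With these assumed numbers, a handful of unioned patterns keeps the false-positive rate near zero, while a few dozen makes false positives common.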
Ah yes, this is the difference between an async platform and Python. How fast does this run for you per cycle?
Think about the SMI case, where distal input is coming from “elsewhere in the cortex”. If you assume the input comes from the same layer, you already know the layer’s dimensions. But if you don’t know where it is coming from, you cannot assume anything about the dimensionality.
By assuming the dimensions of the distal connections are the same as the current layer, you’re limiting yourself.
Excellent point. I had misinterpreted what you were saying – you are talking about distal input from another layer (not distal input from multiple layers).
I’ll have to think on this one a bit. My initial reaction is to have the config parameters specific to each layer, and use the parameters for the transmitting layer you are connecting from (not the layer receiving input).
Moved from #htm-theory:tangential-theories.
FOREACH cell.axon.synapses AS synapse
    synapse.segment.score++
    IF synapse.segment.score >= config.activationThreshold THEN
        IF synapse.segment.active == false THEN
            synapse.segment.active = true
            activeSegments[t].add( synapse.segment )
Regarding Optimization #2, getting rid of segments would cause a pretty large drop in capacity.
Consider if every cell in a minicolumn is part of 10 SDRs. This could definitely happen with a common feature. If that feature appears in a lot of sequences, or at a lot of locations on different objects, the feature’s minicolumns would quickly learn tens or hundreds of contexts. And, on top of this, the minicolumn is part of multiple feature SDRs. If a cell connects to 20 cells of each SDR, then each cell is now connected to 200 cells, with a threshold of ~13. Depending on the parameters, the odds of a random 40-cell SDR matching 13 of these cells are non-negligible, and it will become increasingly likely as the cells learn more contexts. And, as Scott mentioned, it’s worth considering unions. Combining unions with one-segment-per-cell would cause a lot of false positives. Having multiple segments totally avoids this problem, and it mimics biology better.
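The back-of-the-envelope odds in this paragraph (200 connected cells, a threshold of ~13, a random 40-cell SDR) can be computed with a hypergeometric tail. The layer size used here (2048 cells) is an assumed parameter for illustration:

```javascript
// Log-factorial and log-binomial, to keep the large intermediate
// binomial coefficients numerically manageable.
function logFactorial(n) {
  let s = 0;
  for (let i = 2; i <= n; i++) s += Math.log(i);
  return s;
}

function logChoose(n, k) {
  if (k < 0 || k > n) return -Infinity;
  return logFactorial(n) - logFactorial(k) - logFactorial(n - k);
}

// P(overlap >= threshold) when sdrSize cells are drawn at random from a
// layer of layerSize cells, `connected` of which the postsynaptic cell
// has synapses to (hypergeometric upper tail).
function pFalsePositive(layerSize, connected, sdrSize, threshold) {
  let p = 0;
  for (let k = threshold; k <= Math.min(connected, sdrSize); k++) {
    p += Math.exp(
      logChoose(connected, k) +
      logChoose(layerSize - connected, sdrSize - k) -
      logChoose(layerSize, sdrSize)
    );
  }
  return p;
}
```

Comparing `pFalsePositive(2048, 200, 40, 13)` against `pFalsePositive(2048, 400, 40, 13)` shows the probability climbing quickly as the single-segment cell learns more contexts.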
And yes, without segments, the learning is now less capable. You lose the ability to use a large “permanence decrement” – i.e. the punishment of inactive synapses on a correctly active segment. You have to keep this value very small. If it’s too large, any cells appearing in multiple SDRs will be in an unstable state, trying to get back to their “happy place” of representing one thing. Having multiple segments allows cells to stably represent multiple things and be capable of quickly forgetting bad synapses.
Really good points – I hadn’t considered the impact on capacity. This optimization is starting to look pretty bad at this point… glad I brought it up again for discussion.
Regarding Optimization #3:
Would this add any functionality, or is it purely an optimization? Would this improve learning? Would it introduce any new data structures or is this just a logical change? Is this the classic argument against the binary state of HTM?
This is purely a logical change. Basically a single concept (charge) can be used to abstract multiple concepts to simplify logic and enable concurrency.
Not in my opinion. I would probably use a threshold for knowing when something is predictive and another threshold for knowing when something is active. I don’t see a need for knowing “how predictive” or “how active”, other than it could be useful when doing a winner-takes-all process to perform inhibition.
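A hypothetical sketch of this two-threshold reading of “charge”: one value per cell, with separate cutoffs for the predictive and active states. The thresholds and charge increments below are made-up illustrative numbers, not values from any implementation:

```javascript
const PREDICTIVE_THRESHOLD = 10; // distal/apical charge needed to predict
const ACTIVE_THRESHOLD = 20;     // total charge needed to actually fire

class Cell {
  constructor() {
    this.charge = 0;
  }
  // Distal or apical input contributes a small amount of charge.
  receiveDistal(amount = 1) {
    this.charge += amount;
  }
  // Proximal (feedforward) input contributes a much larger amount.
  receiveProximal(amount = 12) {
    this.charge += amount;
  }
  get predictive() {
    return this.charge >= PREDICTIVE_THRESHOLD && this.charge < ACTIVE_THRESHOLD;
  }
  get active() {
    return this.charge >= ACTIVE_THRESHOLD;
  }
  // Reset charge at the end of each timestep.
  reset() {
    this.charge = 0;
  }
}
```

In this sketch a cell that accumulated distal charge in a prior step needs less proximal input to cross the active threshold, which mirrors how predicted cells fire first in the TM.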
Again, interesting. Thanks for bringing these topics up, it could help anyone building their own HTM system in the future.
Thinking about this some more, couldn’t a similar argument be made against the current SP implementation? Would it be worth exploring a multi-segment implementation of the SP process? Something similar to how TM works, only with proximal connections instead of distal.
I’m sure you’d get something interesting out of it, but it would change the functionality. Each column would learn to respond to multiple distinct patterns and be unable to exploit their overlap for noise tolerance and so on, whereas the single proximal segment version can respond to a union of similar patterns, increasing its robustness to variations on each. Could still work, as with all of these changes it would be worth doing a thorough empirical evaluation.
To play devil’s advocate, I would think the same argument could be made for using a single segment in the TM process (responding to similar patterns). It is interesting that one strategy was used for the SP and the other for the TM.
The way I see it, it’s about what a cell or column essentially represents. If a cell represents 1 input that can occur in N contexts, then intuitively you’d want 1 proximal segment and N distal segments (union tolerance notwithstanding).
My feelings so far:
#1 should be functionally identical to my current implementation, and there’s no reason not to use it.
#2 is overall a bad idea given the impact on capacity, stability, and the ability to adapt to change. I’ll ditch this one.
#3 is different enough that it may be difficult to theorize what negative impacts it might have. I’ll need to run some comparative experiments to see how it impacts the behavior of the system.