When implementing the multi-layer interactions needed for the new sensory-motor functions, I found that I had coded myself into a corner with HTM.js. The way I implemented the process makes it very difficult for multiple layers to interact with one another at different steps. As a result, I am refactoring.
Since HTM.js needs to be as lightweight as possible (it is designed to run in a browser), I thought I would take the opportunity to implement some optimizations that I have been thinking about for a while, and to discuss them here first to get some input.
A couple of these optimizations modify core parts of the theory, so I thought Tangential Theories might be a better forum category for this than HTM Hackers, but please move it if it fits better over there.
Optimization #1
SP connect to input space on the fly
In the current implementation, a segment is created for each column when the spatial pooler is instantiated. Each segment in turn creates potential synapses with a configurable percentage (50% by default) of the input cells, a configurable percentage of which (also 50% by default) begin above the connection threshold.
It occurred to me that connecting each column with 50% of the input cells is statistically equivalent to connecting each input cell with 50% of the columns. Thus it is possible to skip creating the segments and synapses when the spatial pooler is instantiated. Instead, the first time an input cell becomes active, its connections to the columns can be established on the fly.
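A minimal sketch of how this could look (the names here are illustrative, not the actual HTM.js API):

```javascript
// Sketch: defer synapse creation until an input cell is first seen.
class Column {
  constructor() {
    this.potentialSynapses = []; // { inputCell, permanence }
  }
  addPotentialSynapse(inputCell, permanence) {
    this.potentialSynapses.push({ inputCell, permanence });
  }
}

class SpatialPooler {
  constructor(columns, potentialFraction = 0.5) {
    this.columns = columns;
    this.potentialFraction = potentialFraction;
    this.seenInputs = new Set(); // input cells encountered so far
  }
  // Called whenever an input cell activates. The first time a cell is
  // seen, connect it to ~potentialFraction of the columns, which is
  // statistically the same as each column connecting to
  // ~potentialFraction of the input cells.
  onInputActive(inputCell) {
    if (this.seenInputs.has(inputCell)) return;
    this.seenInputs.add(inputCell);
    for (const column of this.columns) {
      if (Math.random() < this.potentialFraction) {
        column.addPotentialSynapse(inputCell, Math.random());
      }
    }
  }
}
```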
This wouldn’t have much of an impact on memory usage for a typical single-layer sequence memory system, since in most cases all of the input cells will be used. In multi-layer systems, however, it can have a huge impact: many of the cells in a layer are never utilized, so not creating their unused potential synapses could yield significant memory savings.
Optimization #2
Abstract concepts of “segment” and “synapse” into just “connection”
Implementing separate segment and synapse features adds complexity to the logic. The idea for this optimization is to abstract both into direct cell-to-cell connections. These connections would still have a permanence and a connection threshold, so the impact on memory usage would not be very significant, but various logic steps would be simplified.
The basic idea is to move away from an implementation like this (sketched below; class names are illustrative, not the actual HTM.js code):
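```javascript
// Current structure (sketch): cells own segments, segments own synapses.
class Synapse {
  constructor(presynapticCell, permanence = 0.3) {
    this.presynapticCell = presynapticCell;
    this.permanence = permanence; // compared against a connection threshold
  }
}

class Segment {
  constructor() {
    this.synapses = []; // each cell may grow many segments
  }
}

class Cell {
  constructor() {
    this.segments = [];
  }
}
```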
To an implementation like this (same caveat applies):
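```javascript
// Proposed structure (sketch): direct cell-to-cell connections,
// each still carrying a permanence.
class Connection {
  constructor(presynapticCell, permanence = 0.3) {
    this.presynapticCell = presynapticCell;
    this.permanence = permanence;
  }
}

class Cell {
  constructor() {
    this.connections = []; // no segment layer in between
  }
}
```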
This would be functionally equivalent to each cell having a single segment with more synapses on it. I proposed this some time ago when I was first learning HTM, and ultimately decided against it, since one segment per cell is not functionally equivalent to multiple segments per cell. There are two main functional differences that I am aware of (if there are more, please point them out):
Firstly, partial activations across multiple segments, each under the activation threshold, could sum to an activation. For example, say the activation threshold is three. The following (with illustrative synapse counts) would not put the receiving cell into predictive state:
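```
Receiving cell (activation threshold = 3)
├─ Segment 1: 2 active synapses   → below threshold
└─ Segment 2: 2 active synapses   → below threshold
Result: no single segment reaches the threshold, so no predictive state
```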
But it would in the optimized version:
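```
Receiving cell (activation threshold = 3, same presynaptic activity)
└─ Connections: 4 active (2 + 2, with no segment boundaries to sum within)
Result: 4 >= 3, so the cell enters predictive state
```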
This was my original reason for not using this optimization. However, as my understanding of HTM theory has improved since then, it no longer seems like such a huge deal. What this scenario is basically depicting is a cell connected to two semantically distinct features which now senses a new feature that shares semantics with both of the original features.
This would be something like a cell which has one segment connecting it to “dog” and another connecting it to “cat”. Then it encounters “fox”, which shares semantics with both. It doesn’t seem like a huge problem that the cell is sensitive to this type of scenario, given the semantic similarities between the inputs across the segments. The likelihood of this scenario happening by random chance due to noise is very small: it could only happen when a feature shares semantics with multiple other features that the cell has connected with.
The other functional difference is in the learning step. In the current implementation, training happens in the scope of a single segment, so segments that are not active are unaffected when a different segment on the same cell is trained.
In the example above, if the cell learns “fox” in the original implementation, it would grow a third segment and train it to recognize “fox”. This would have no impact on its existing connections to “dog” and “cat”. In the new implementation, however, the cell would train itself to better connect to “fox”, and its connections to “dog” and “cat” would be degraded.
Ultimately, this means things could be forgotten more quickly. On the flip side, the system could be more efficient in its memory usage: in the example above, I didn’t need to grow a whole new segment for “fox”, but I did degrade my memory of “cat” and “dog”.
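As a rough sketch of the learning step under the single-connection model (parameter names and values here are hypothetical):

```javascript
// Every connection on the cell is touched during learning: connections
// to active presynaptic cells are reinforced, all others are punished.
// Learning "fox" therefore necessarily degrades "dog" and "cat".
function learn(cell, activePresynapticCells, increment = 0.05, decrement = 0.01) {
  for (const conn of cell.connections) {
    if (activePresynapticCells.has(conn.presynapticCell)) {
      conn.permanence = Math.min(1.0, conn.permanence + increment);
    } else {
      conn.permanence = Math.max(0.0, conn.permanence - decrement);
    }
  }
}
```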
I’m curious if anyone has explored the “one segment per cell” setup and has some comparative data they could share.
Optimization #3
Abstract concepts of “proximal”, “distal”, “apical”, “active”, and “predictive” into just “charge”
One of the issues with the current implementation of HTM is leveraging concurrency, due to the rather mechanical three-phase TM process of Activate -> Predict -> Learn, and the classification of inputs as “proximal”, “distal”, and “apical”, each treated differently throughout the process. It also makes it difficult to ever move away from discrete time.
The idea for this optimization would be to have dendrites which transmit a charge to a receiving cell. The receiving cell would accumulate charges from all transmitting cells connected to it. The higher the charge, the better connected a cell is with the current input (equivalent to “predictive state”). A cell’s accumulated charge would degrade over time. When the cell’s charge reaches a certain threshold, it transmits a charge to other receiving cells and its own charge is depleted.
A transmitted charge would attenuate with distance from the transmitting cell, so by giving proximal, distal, and apical dendrites different lengths, we can get the desired behaviors.
A further optimization on top of this would be to assume all proximal dendrites have one length, all distal dendrites another, and all apical dendrites a third. This would eliminate the need to calculate attenuation during the process; we could simply use constants for the transmitted charges.
Cells would no longer have a state. Instead they would have a charge. Based on the charge, we can determine how well they connect with the current input, whether they are predictive, whether they should activate, and so on. With proper settings for charges and thresholds, distal dendrites could be made to only ever put a cell into a predictive state but never activate it, apical and distal inputs could be combined to generate activations, and so on. Concurrency would also be possible: with discrete time removed, every cell could in theory perform its functions concurrently with every other cell, though some type of timing signal would be required to synchronize them.
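A rough sketch of a cell under this scheme (all constants and names are illustrative, and would need tuning):

```javascript
// Constant transmitted charges stand in for per-connection attenuation:
// proximal input delivers a large charge, distal and apical deliver
// smaller ones. With suitable values, distal input alone raises a
// cell's charge (a "predictive" level) without firing it.
const CHARGE = { proximal: 1.0, distal: 0.3, apical: 0.3 };
const FIRE_THRESHOLD = 2.0;
const DECAY = 0.9; // fraction of accumulated charge retained per step

class ChargedCell {
  constructor() {
    this.charge = 0;
    this.targets = []; // { cell, type: 'proximal' | 'distal' | 'apical' }
  }
  receive(amount) {
    this.charge += amount; // accumulate charge from transmitting cells
  }
  step() {
    if (this.charge >= FIRE_THRESHOLD) {
      // Fire: transmit a constant charge per dendrite type, then deplete.
      for (const { cell, type } of this.targets) {
        cell.receive(CHARGE[type]);
      }
      this.charge = 0;
    } else {
      this.charge *= DECAY; // accumulated charge degrades over time
    }
  }
}
```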
This is obviously a big deviation from HTM theory, so I am curious whether anyone has thought of this before, and what potential issues you can see cropping up with this strategy.