The biggest problem I've had so far with HTM theory, from an implementation perspective, is the large amount of RAM required (or, alternatively, the heavy reliance on disk swapping, which drastically impacts performance).
A couple of initial optimization ideas that came to mind (which I'm sure everyone starting out has had):
- SDRs can be greatly compressed by representing them as compact arrays of the indexes of the 1 bits
- If I define a fixed memory size for each cell, I can eliminate the need to index columns: a fixed memory stride can be used to sequentially access each cell in a column (such as when checking for cells in the predictive state, or when triggering bursting)
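The first idea can be sketched in C (a minimal sketch; the function name and the one-byte-per-bit dense input format are my own assumptions, not anything from the HTM reference implementations):

```c
#include <stddef.h>
#include <stdint.h>

/* Compress a dense SDR (stored here as one byte per bit, for simplicity)
 * into an array of 16-bit indexes of the active bits.
 * Returns the number of active bits written to `out`. */
size_t sdr_compress(const uint8_t *dense, size_t n_bits, uint16_t *out)
{
    size_t n_active = 0;
    for (size_t i = 0; i < n_bits; i++)
        if (dense[i])
            out[n_active++] = (uint16_t)i;
    return n_active;
}
```

At 2% sparsity, a 2,048-bit SDR has about 41 active bits, so the index form costs roughly 41 × 16 = 656 bits instead of 2,048.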
With these initial ideas in mind, I've been putting some thought into efficient, fixed memory layout strategies for a single Cell in the system. The memory layout will need to include the cell's state, some historic and predictive information, its dendrites, synapses, and permanence values. I thought I would run these by everyone here to get some feedback on things I might be missing, or other optimizations I haven't considered.
I am using the values from the HTM Cheat Sheet to help in determining the number of bits necessary for representing each component of a cell:
- Num Columns (N): 2048
- Num Cells per Column (M): 32
- Sparsity : 2%
- Dendritic Segment Activation Threshold (θ): 15
- Initial Synaptic Permanence: 0.21
- Connection Threshold for Synaptic Permanence: 0.5
- Synaptic Permanence Increment and Decrement: +/- 0.1
- Synaptic Permanence Decrement for Predicted Inactive Segments: 0.01
- Maximum Number of Segments per Cell: 128
- Maximum Number of Synapses per Segment: 128
- Maximum Number of New Synapses Added at each Step: 32
From this list I can make a few conclusions:
- Total number of cells in the system will be 65,536
- If each cell can connect with up to 2% of the other cells, that makes 1,310 potential connections (synapses) per cell.
- Indexes for 65,536 cells (values 0–65,535) require at least 16 bits per index.
- Permanence can change at a granularity of 1%, so there are 100 different possible permanence values
- Encoding a number up to 100 requires at least 7 bits
- To group synapses by segment, each synapse will need room to store one of up to 128 segment IDs
- Encoding 128 distinct values (0–127) requires at least 7 bits
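All of the bit widths above come from ceil(log2(number of distinct values)); a trivial helper (the name is mine) makes the arithmetic explicit:

```c
#include <stdint.h>

/* Minimum number of bits needed to distinguish `values` distinct values,
 * i.e. ceil(log2(values)). */
static unsigned bits_needed(uint32_t values)
{
    unsigned bits = 0;
    while ((1u << bits) < values && bits < 32)
        bits++;
    return bits;
}
```

For the numbers above: `bits_needed(65536)` is 16 (cell indexes), `bits_needed(100)` is 7 (permanence levels), and `bits_needed(128)` is 7 (segment IDs).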
With this in mind, there are a variety of memory layout strategies that could be used. After a few iterations, I think I've arrived at a pretty good layout (listed in order from the first bit onward):
- 16 bits for active history (will explain below)
- 1 bit for active state
- 1 bit for predictive state
- 14 bits for predictions (will explain below)
- 20,960 bits (16 bits × 1,310) for synapse indexes
- 9,170 bits (7 bits × 1,310) for segment grouping of synapses
- 9,170 bits (7 bits × 1,310) for permanence values

Total: 39,332 bits per cell
This means the total layer (65,536 cells) would come to 2,577,661,952 bits (307 MB)
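The totals above can be checked mechanically (a sketch; the function names are mine, and the constants are the cheat-sheet values listed earlier):

```c
#include <stdint.h>

/* Bits per cell for the layout above: 32 state bits (16 history,
 * 1 active, 1 predictive, 14 predictions) plus per-synapse costs. */
static uint64_t cell_bits(uint64_t n_syn, uint64_t idx_bits,
                          uint64_t seg_bits, uint64_t perm_bits)
{
    return 32 + n_syn * (idx_bits + seg_bits + perm_bits);
}

/* Whole-layer size, rounded to the nearest MB (MiB). */
static uint64_t layer_mb(uint64_t bits_per_cell, uint64_t n_cells)
{
    uint64_t bytes = bits_per_cell * n_cells / 8;
    return (bytes + 512 * 1024) / (1024 * 1024);
}
```

`cell_bits(1310, 16, 7, 7)` gives 39,332 and `layer_mb(39332, 65536)` gives 307, matching the totals above.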
As you can see, the main candidates for optimization are the bits needed to encode the synapses. One obvious abstraction (it would deviate from the biology, though I'm not sure what impact that would have) would be to eliminate the segment grouping of synapses and instead treat each synapse as a direct, independent connection between cells. This would reduce memory in two ways: first, it would eliminate the 9,170 bits per cell needed to store the grouping information; second, paired with reducing the granularity of permanence changes to 10% (10 levels fit in 4 bits), the 9,170 bits for permanence values would shrink to 5,240 bits (4 bits × 1,310). This optimization would reduce the total layer memory usage to 205 MB.
Another possible optimization would be to reduce the number of bits for the synapse indexes. For example, you could use 11-bit relative indexes, with the middle value (1,024) standing for the cell's own position. This would mean a cell could only connect to 2,048 potential other cells (those within ±1,024 of its own position). Used in conjunction with the previous optimization, this would reduce the total layer memory usage to 154 MB.
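Both optimizations drop out of the same arithmetic; a self-contained sketch (the function name is mine; `seg_bits = 0` models dropping segment grouping entirely, and 4 permanence bits encode the 10 coarser permanence levels):

```c
#include <stdint.h>

/* Layer size in MB (MiB, rounded) for 65,536 cells with 1,310 synapses
 * each, as a function of the per-synapse bit costs. */
static uint64_t layer_size_mb(uint64_t idx_bits, uint64_t seg_bits,
                              uint64_t perm_bits)
{
    const uint64_t n_cells = 65536, n_syn = 1310;
    uint64_t cell_bits = 32 + n_syn * (idx_bits + seg_bits + perm_bits);
    uint64_t bytes = cell_bits * n_cells / 8;
    return (bytes + 512 * 1024) / (1024 * 1024);
}
```

`layer_size_mb(16, 7, 7)` reproduces the 307 MB baseline; `layer_size_mb(16, 0, 4)` gives 205 (no segments, 10% permanence steps); `layer_size_mb(11, 0, 4)` gives 154 (adding 11-bit relative indexes).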
Obviously, the more abstractions that deviate from the biology are applied, the less likely the system is to behave as expected, so some testing would be required to determine the cost.
To explain my thoughts on the first 32 bits of the memory layout, I was thinking from the perspective of a 32-bit x86 register. For reference: EAX is the full 32-bit register, AX is its low 16 bits, and AH and AL are the high and low bytes of AX.
If the active-state bit is placed in the most significant bit of AH, and the predictive-state bit in the next most significant bit (so the 16 history bits occupy the upper half of EAX and the 14 prediction bits sit below), some fast operations can maintain a history of the cell's active and predictive states.
For example, with the above arrangement, a couple of shift operations will push previous history back one slot, move the current state into the first history slot, and clear the bits for the next cycle:
```asm
shl eax, 1   ; shift everything left: the active bit enters the first history slot
shr ax, 2    ; shift the low word right, clearing the active and predictive bits
```
Of course, a history of predictions is probably not very useful. Instead, shifting AX right one slot and then walking through the cell's connections, AND'ing the active bit from T-1, T-2, T-3, etc., would be much more useful (for predicting the cell's state 1, 2, 3, etc. steps into the future).
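The same update can be written portably on a `uint32_t` without committing to x86 (a sketch of the scheme above; the macro and function names are mine):

```c
#include <stdint.h>

/* One cell's 32-bit state word, mirroring the register layout above:
 * bits 31-16 = active history, bit 15 = active, bit 14 = predictive,
 * bits 13-0 = predictions. */
#define ACTIVE_BIT     (1u << 15)
#define PREDICTIVE_BIT (1u << 14)

/* Portable equivalent of `shl eax, 1` followed by `shr ax, 2`:
 * the history shifts back one slot, the current active bit enters the
 * first history slot, and the active/predictive bits are cleared. */
static uint32_t advance_cycle(uint32_t state)
{
    state <<= 1;                                              /* shl eax, 1 */
    return (state & 0xFFFF0000u) | ((state & 0xFFFFu) >> 2);  /* shr ax, 2 */
}
```

For example, a cell that was active this cycle (`ACTIVE_BIT` set, nothing else) ends up with history slot 0 set (bit 16) and the low half cleared for the next cycle.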
Anyway, I'm curious to get some ideas from other folks on memory layout strategies.