One of the biggest unspoken challenges I had when exploring HTM theory and trying to build applications was: how do you choose and configure encoders to translate your raw input data into SDRs? This problem has been acknowledged by many, and there’s a paper or two on encoders, but very little analysis has been done on them.
I’ve wanted to understand the problem of representations in HTM-like Binary Population Neural Networks: how information is represented and transformed through the system. But we have to start at the beginning, with how data is encoded in the first place.
What always frustrated me from the beginning is that the outputs of encoders are not SDRs, according to the definition. They are not always sparse, and they are usually localist, not distributed. In fact, by the classical definition of a distributed representation (every element participates in representing the content), it’s questionable whether a binary array can ever be distributed, since each element has only one state in which it expresses information: “1”. The “0” represents “no information”, or rather “no evidence of the thing I represent”.
So what kind of properties can we derive from an encoder’s output? Here I show some visualizations of a scalar encoder on the unit interval. In particular, I call this a Fixed Weight Encoder, since its output is guaranteed to have the same number of active bits.
There’s quite a lot going on in this figure, and I want to break it down piece by piece. This is the simplest possible scalar encoder: the parameter w=1, which means one bit is active at any time. By setting w=1, the scalar encoder becomes a one-hot encoder.
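To make this concrete, here is a minimal sketch of what I mean by a fixed-weight scalar encoder. The function name encode_scalar and the parameters n and w are my own illustrative choices, not any particular library’s API, and n=7 is picked only to roughly match the bins in the figure:

```python
import numpy as np

def encode_scalar(x, n=7, w=1):
    """Fixed Weight scalar encoder on the unit interval [0, 1].

    Returns a binary array of length n with exactly w contiguous
    active bits; the block of 1s slides to the right as x increases.
    """
    assert 0.0 <= x <= 1.0
    n_positions = n - w + 1                        # number of distinct codes
    start = min(int(x * n_positions), n_positions - 1)
    code = np.zeros(n, dtype=np.uint8)
    code[start:start + w] = 1
    return code

# With w=1 this degenerates into a one-hot encoder:
print(encode_scalar(0.0))   # [1 0 0 0 0 0 0]
print(encode_scalar(0.5))   # [0 0 0 1 0 0 0]
print(encode_scalar(1.0))   # [0 0 0 0 0 0 1]
```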
The x-axis indicates every possible input value to the encoder along the unit interval [0,1]. Each subplot is aligned along the x-axis, so you can see how a changing input affects the property shown in each subplot.
Starting from the bottom:
Row 1 shows the encoded binary array for each value of x. The array is oriented sideways so we can see it change as x changes.
Row 2 shows two properties: the weight of the encoding, which is the total number of 1-bits in the output, and the crossover points, the locations along the x-axis where the code changes, and by how many bits. Here the weight is constant, since we are using a Fixed Weight Encoder, and the crossover points are evenly distributed, with only one crossover occurring at a time. If we complicate the types and numbers of encoders, this plot will get more interesting.
Row 3 shows how similarity changes between two randomly chosen values. This tries to evaluate how well the encoder adheres to the principle of “similar inputs produce similar outputs”. Since this is a one-hot encoding, similarity for a value exists only within the discrete region it falls into. With a higher w and additional encoders, more graduated relations of similarity emerge between similar inputs.
Row 4 shows discrete “bin” representations of each of the bits. If the input value falls into a bin, the corresponding bit is activated. This gives a well-defined meaning to each of the encoder’s output bits. In particular, the meaning of the 0th bit, b_0, would be something like:
b_0 = \left\{ \begin{array}{ c l } 1 & \quad \textrm{if } 0 \leq x \leq 0.14 \\ 0 & \quad \textrm{otherwise} \end{array} \right.
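Under the sliding-block sketch above, each bit’s bin can be read off directly: bit i is active whenever the active block covers position i. A hypothetical helper (consistent with the illustrative encode_scalar, not any standard API) makes the per-bit meaning explicit:

```python
def bit_interval(i, n=7, w=1):
    """Approximate input interval [lo, hi] over which bit i is active,
    for the illustrative fixed-weight encoder sketched earlier."""
    n_positions = n - w + 1
    # bit i is covered when the block's start position lies in
    # [i - w + 1, i], clipped to the valid range of start positions
    lo_pos = max(i - w + 1, 0)
    hi_pos = min(i, n_positions - 1)
    return lo_pos / n_positions, (hi_pos + 1) / n_positions

print(bit_interval(0))        # (0.0, ~0.143): matches b_0 above
print(bit_interval(5, w=4))   # with w=4, bit 5 is active for x in [0.5, 1.0]
```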
Here’s a plot with w=4.
We can see a much smoother transition in the similarity plots, since the encoding retains some of the “similar inputs, similar outputs” property. Why don’t we call this property SISO?
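As a quick numerical check of SISO (again using the illustrative encode_scalar from above, here with n=20 and w=4 chosen arbitrarily), nearby inputs share most of their active bits and the overlap decays as they move apart:

```python
# Overlap between x = 0.50 and increasingly distant values.
a = encode_scalar(0.50, n=20, w=4)
for x in (0.50, 0.55, 0.60, 0.70, 0.90):
    b = encode_scalar(x, n=20, w=4)
    print(f"x={x:.2f}  shared bits={int(np.sum(a & b))}")
# shared bits: 4, 3, 2, 1, 0 -- similarity falls off with distance
```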
Another way to view SISO is through a self-similarity heatmap:
Along the diagonal, we see the maximum possible amount of similarity.
To clarify, our metric of similarity is the number of common bits between two representations. This is the dot product of the two binary vectors, and for codes of equal weight it is proportional to their cosine similarity. For binary arrays, it can be computed with an AND() operation between them, followed by a POPCOUNT() operation, which counts the number of set bits.
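In code, with numpy arrays like the ones above, that overlap is just (a & b).sum(); with packed integer bit-fields it would be a popcount of a & b. Here is a sketch of how a self-similarity heatmap like the one above could be computed, sweeping a grid of inputs through the illustrative encode_scalar (the grid size and parameters are arbitrary choices of mine):

```python
def overlap(a, b):
    """Similarity metric: AND the two codes, then popcount the result."""
    return int(np.sum(a & b))

# Self-similarity heatmap: overlap(encode(x), encode(y)) over a grid of inputs.
xs = np.linspace(0.0, 1.0, 101)
codes = [encode_scalar(x, n=20, w=4) for x in xs]
heatmap = np.array([[overlap(a, b) for b in codes] for a in codes])
# The diagonal holds the maximum value (w, here 4); off-diagonal entries
# fall off as |x - y| grows, which is the SISO property made visible.
```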
I hope people find this interesting, since I have a lot more visuals and variations. This extends to a couple of variations of the scalar encoder, as well as place cells, grid cells, and generalized periodic encoders. It also extends to 2D and n-D, but I think focusing on 1D for the moment gives us the opportunity to understand how these representations work.