Trading byte size for noise

In a system where the size in bytes of an SDR is a concern (such as when synchronizing between multiple threads, or transmitting over a network), it occurred to me that the indices of the 1 bits could be encoded relative to each other. If a gap is too large to fit into the data type used, noise could be added.

For example, an SDR encoded as the absolute indices of the one bits, such as:

[2,16,254,267,509,513,752,763,1020,1023 …]

Could be encoded as relative indices:

[2,14,238,13,242,4,239,11,257,3 …]

In this case, let's say we want to encode the indices as uint8, which has a range of 0–255. One of the values (257) is too large. However, if we insert a single bit of noise, then we can use uint8 (thus halving the footprint compared to the next size up, uint16):

[2,14,238,13,242,4,239,11,255,2,3 …]
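The scheme above can be sketched in a few lines (a minimal illustration; `encode_deltas` and `decode_deltas` are my own hypothetical names, not from any HTM library):

```python
def encode_deltas(indices, max_delta=255):
    """Convert absolute 1-bit indices into uint8-sized deltas, inserting
    an extra 'noise' bit whenever a gap exceeds max_delta."""
    out, prev = [], 0
    for i in indices:
        gap = i - prev
        while gap > max_delta:
            out.append(max_delta)  # noise bit at prev + max_delta
            gap -= max_delta
        out.append(gap)
        prev = i
    return out

def decode_deltas(deltas):
    """Recover absolute indices (the inserted noise bits are included,
    which is the trade-off this thread is about)."""
    out, pos = [], 0
    for d in deltas:
        pos += d
        out.append(pos)
    return out

sdr = [2, 16, 254, 267, 509, 513, 752, 763, 1020, 1023]
print(encode_deltas(sdr))  # [2, 14, 238, 13, 242, 4, 239, 11, 255, 2, 3]
```

Decoding that stream yields the original SDR plus one extra active bit at index 1018 — the noise bit.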

My understanding of the principles tells me that this should be fine (HTM being inherently noise tolerant). I'm wondering if anyone has explored this before and has any thoughts.

What occurs to me is that in a stream that has large blocks of zero bits (more likely with topology), there would be something of a pattern in the noise (every 255th bit on, in the case of uint8). Has anyone explored the effects of noise involving certain bits that are always on (versus randomized noise that changes every time step), or even patterns of bits that are on, in a system that relies on topology?

A slightly different take on the same issue …

1 Like

The offset is limited by the number of input bits. It is easy to store the on-bit indices as Int16 or Int32 without practically impacting memory usage, since not every SDR will be stored, just the currently produced one. Using offsets adds processing and is a limited optimisation.

Let’s try some numbers and see if this is important or not.
Start with an image processing application, using a modest 1K x 1K map.

Each cell may have 8 dendrites for proximal and 16 for distal. Each dendrite may have a large number of potential synapse locations, say 1024.

The data structure for each synapse is a connection location and a permanence value. For a large map (more than 65K neurons) that means a permanence byte for each location and a 32-bit address for the connection location.
1,000,000 x 24 x 1024 x 5 bytes = 122,880,000,000

Using paths means a single address for the path and just the list of permanence values, roughly a five-fold reduction in storage space. The path table should be able to fit in the L1 cache for a dramatic speedup in memory access.
1,000,000 x 24 x (1024 + 2) x 1 bytes = 24,624,000,000

24 GB is a large memory footprint but not ridiculous for modern hardware.
You don't see a lot of machines with 123 GB of main memory.
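For concreteness, the two totals can be rechecked with a short script (a sketch under the stated assumptions — a 4-byte connection address plus a 1-byte permanence per synapse in the classic layout, versus a 2-byte path index per dendrite plus one permanence byte per synapse in the path layout; the function names are my own):

```python
# Footprint arithmetic for the two layouts discussed above.
cells, dendrites_per_cell, synapses_per_dendrite = 1_000_000, 24, 1024

def per_synapse_bytes(addr_bytes=4, perm_bytes=1):
    # Classic layout: every synapse stores its own address and permanence.
    return cells * dendrites_per_cell * synapses_per_dendrite * (addr_bytes + perm_bytes)

def per_path_bytes(path_index_bytes=2, perm_bytes=1):
    # Path layout: one path index per dendrite, permanence byte per synapse.
    return cells * dendrites_per_cell * (synapses_per_dendrite * perm_bytes + path_index_bytes)

print(per_synapse_bytes())  # 122,880,000,000 bytes (~123 GB)
print(per_path_bytes())     # 24,624,000,000 bytes (~24.6 GB)
```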

How is the path stored? What is used to define it?

I agree with @Bitking that this could potentially be used for further optimizations within the HTM algorithms. That aside, though, I was actually thinking about cases where size in bytes matters for reasons other than memory reduction (i.e. not a single-threaded implementation of HTM). Consider sharing information between processes over a network (or even between cores on a single PC). You need to keep the window of synchronization as small as possible by reducing the size of the data that must be processed in series. This system could in fact consume more memory, not less (each shard in a cluster architecture would need its own working copy of the activation SDRs of other shards, for example). The benefit in this case is enabling parallel processing to increase performance, not reducing the memory footprint.

I see. I am not sure. It would still make sense to store them in larger data types rather than spend processing converting the relative offsets back into individual absolute indices.

Another way to think about it: imagine you are a node located in Mumbai, part of a large HTM network spanning the globe. You are responsible for doing temporal memory, and you rely on a node located in Redwood City to tell you which cells were active at timestep T in your layer before you can begin processing timestep T + 1. The bottleneck in this scenario is not going to be how fast you can convert relative indices into absolute ones; it will be network latency. Any way you can compress the transmitted data will affect how quickly you can begin working on the next timestep.

I thought this seemed obvious - my bad.
The paths are stored in an array just like any other data structure in the program.

I am using random walks with direction bias to generate the paths now.
I am thinking of adding a branch feature where, after a run reaches a certain size, I start a new run from a previously generated node selected at random. This would change the sampling density vs. distance from the cell body.
You can get fancy with this.
One Rule to Grow Them All: A General Theory of Neuronal Branching and Its Practical Application
http://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1000877&type=printable

Each dendrite has a random selection of the generated paths, fixed at the time of map creation.

The reason for multiple paths is that a single fixed pattern has aliasing issues - like a Moiré pattern.

There is a surprising amount of literature (to me anyway) that discusses dendrite shapes and how they get that way.

The single dendritic branch as a fundamental functional unit in the nervous system

Generation, description, and storage of dendritic morphology data

Assisted morphogenesis: glial control of dendrite shapes

Conserved properties of dendritic trees in four cortical interneuron subtypes
https://www.nature.com/articles/srep00089

Check out the reference list on this link:
Modelling Dendrite Shape from Wiring Principles

There is a lot going on inside the dendrite - it’s not just a passive wire. I continue my studies to see if any of this aids in learning patterns. An example:

Dendritic geometry shapes neuronal cAMP signaling to the nucleus
https://www.nature.com/articles/ncomms7319

1 Like

BTW, the numbers @Bitking broke down could be applied directly to the classic Spatial Pooling algorithm with very little effort. A single index into a list of pre-defined paths (or a seed to define one procedurally) plus a list of permanence values would have a notably smaller memory footprint than the classic implementation.

The application to TM may be a bit less clear-cut, since the classic algorithm doesn't define the "potential pool" in advance as SP does. But the concept could be applied there as well: rather than forming connections over time as the system runs, a potential pool could be given to each new segment when it is grown, along with a corresponding list of permanence values. This is of course a deviation from the classic algorithm, so it would take some testing to see what impact it would have on the behavior of the system, its capacity, etc.
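If that were tried, a seed-derived potential pool might look something like this (a minimal sketch; `potential_pool` is a hypothetical helper of my own, not part of any existing HTM API):

```python
import random

def potential_pool(seed, pool_size, input_size):
    """Regenerate a segment's potential pool deterministically from a
    stored seed, so the index list itself never has to be stored."""
    rng = random.Random(seed)  # deterministic PRNG per seed
    return sorted(rng.sample(range(input_size), pool_size))

# The same seed always yields the same pool, so a segment only needs to
# keep the seed plus its parallel list of permanence values.
pool = potential_pool(seed=42, pool_size=32, input_size=2048)
print(len(pool))  # 32
```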

1 Like

I just thought of a way to handle this without introducing noise (which kind of makes the original point of this thread irrelevant… but still was a great discussion!)

255 could be given a special meaning, "skip 255 indices" (i.e. it doesn't set a 1 bit the way 0–254 do). Whenever the actual relative index 255 needs to be encoded, it can simply be input as [255,0].
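A quick sketch of that escape-code idea (my own hypothetical names again) — note that, unlike the noise variant, this round-trips losslessly:

```python
ESCAPE = 255  # "advance 255 positions without setting a bit"

def encode(indices):
    out, prev = [], 0
    for i in indices:
        gap = i - prev
        while gap >= ESCAPE:
            out.append(ESCAPE)  # pure skip; never represents a 1 bit
            gap -= ESCAPE
        out.append(gap)         # 0-254 sets a bit
        prev = i
    return out

def decode(deltas):
    out, pos = [], 0
    for d in deltas:
        pos += d
        if d != ESCAPE:         # escape codes don't record a bit
            out.append(pos)
    return out

print(encode([255]))  # [255, 0] -- the case described above
```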

1 Like

So these arrays are all stored in cache memory?

I see. So this compression will happen only when the transmission of data takes place and won’t be how the indices are managed by the layers?

Yes that is the immediate application that I am using it for (for my experiments with a concurrent HTM implementation)

1 Like

That path leads to madness: witness extended ASCII!

Once you take the first step you can’t stop. Before you know it you will have a zoo of special code pages.

1 Like

‰µª† Ðö ¥ºµ µæñ¿

1 Like

With any luck, yes.

And of course - the overall reduction in memory footprint.

I am working out the details of a system that will need a dozen or so largish maps, and as it is, it will have to run on several workstation-class machines with good GPUs, so anything that lets me shoe-horn this into less hardware will save me a bundle.

Hmph… What I was asking is whether caching is the integral part of your model that leads to the memory optimisation.

Huffman code your deltas.

Btw:

Another way to think about it, is imagine yourself a node located in Mumbai, part of a large HTM network spanning the globe. You are responsible for doing temporal memory, and you are relying on a node located in Redwood City

Me likey this.