RDSE implementation is wasting buckets?


#1

As I noted in another thread, I’ve been working on a JavaScript implementation of the Random Distributed Scalar Encoder (RDSE). I’m having trouble understanding why it is wasting buckets when the resolution of the encoder is below 1.0. Have a look at this quick video demo. Sorry about the crappy audio, I recorded it with my laptop mic while sitting on my recliner. :wink:

I’m not sure if my implementation is incorrect, or if I don’t understand something about the RDSE. It seems really wasteful to have more buckets than values.


#2

I think it encodes real numbers, not just integers. You are stepping in integers, but if the resolution is 0.25, a step of 1.0 will cover 4 buckets.
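To make that concrete, here's a tiny sketch of the usual value-to-bucket mapping, assuming the common formula `index = round((value - offset) / resolution)` (the function name is mine, not from any particular implementation):

```javascript
// Hypothetical sketch of mapping a value to a bucket index,
// assuming index = round((value - offset) / resolution).
function bucketIndex(value, offset, resolution) {
  return Math.round((value - offset) / resolution);
}

// With resolution 0.25, stepping the input in integer increments
// lands on every 4th bucket, leaving 3 intermediate buckets unused:
const indices = [0, 1, 2, 3].map(v => bucketIndex(v, 0, 0.25));
// indices is [0, 4, 8, 12]
```

So the "wasted" buckets are just the sub-integer buckets your integer-stepped demo never hits.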


#3

I would rather use a 1D coordinate encoder, or better yet @mrcslws’s sampling linear encoder.


#4

Aha, that makes perfect sense. I didn’t think about that. I think I can take advantage of that properly. Thanks, Felix!


#5

:sunglasses:

I think Matt will like to hear my perspective here.

I agree that the sampling linear encoder might be the ideal stateless scalar encoder.

  • By default, it’s a good bounded scalar encoder, i.e. with a min and max.
  • If you hash its output bits, it’s a good unbounded scalar encoder.
    • This is what you’re referring to.

The RDSE doesn’t use a hash. It is stateful, storing a memory of the encodings it has used. So it’s going to use more memory, will potentially be slower, and its output will vary from trial to trial: two RDSEs will likely encode 42 differently, even if their parameters are the same.
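An illustrative contrast of the two approaches (toy code, not NuPIC's actual implementation): a hash-based encoder derives a bucket's bits on the fly, while a stateful encoder draws them from an RNG once and must remember them.

```javascript
// Stateless: bits are a pure function of the bucket index, so no memory
// is kept and any two encoders with the same parameters agree.
// (Toy hash for illustration; a real encoder would use a proper hash.)
function statelessBits(bucketIndex, numBits, totalBits) {
  const bits = [];
  for (let i = 0; i < numBits; i++) {
    bits.push((bucketIndex * 2654435761 + i * 40503) % totalBits);
  }
  return bits;
}

// Stateful: bits are drawn from an RNG the first time a bucket is seen
// and remembered, so memory grows with the number of distinct buckets,
// and two encoders with different RNG histories will disagree.
const bucketMemory = new Map();
function statefulBits(bucketIndex, numBits, totalBits, rng) {
  if (!bucketMemory.has(bucketIndex)) {
    const bits = [];
    for (let i = 0; i < numBits; i++) bits.push(Math.floor(rng() * totalBits));
    bucketMemory.set(bucketIndex, bits);
  }
  return bucketMemory.get(bucketIndex);
}
```

The trade-off in the post above falls out of this: the stateful version can *choose* its bits to guarantee overlap properties, which a hash cannot, at the cost of memory and run-to-run variability.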

On the other hand, the RDSE guarantees the overlap properties of its encodings. You can’t get that anywhere else. So I think the RDSE has a niche, maybe a large one. In experiments it’s comforting to know that hash collisions aren’t causing weirdness, and in real-world applications the number of input scalars might be low enough to avoid memory issues, especially when compared to an HTM’s synapses.

(We could talk about the “middle out” varying precision of the sampling linear encoder, but I’ll save that for another day.)


#6

Hi @rhyolight as @floybix and @mrcslws say, this is because the RDSE is stateful. The way it works is that each bucket’s encoding differs from its neighbours by exactly one bit. If you have a new input value greater than your current biggest bucket, you need to add buckets one by one (changing one bit “at random” each time) until you have a bucket for the new value. The same applies in the negative direction.

This means that a naive implementation would have an encoding which depends on the entire, exact sequence of inputs seen, which is a really bad idea. Even worse, if you use the global random number generator, your encoding will change depending on who else is pulling numbers out of the RNG. The only option is to make the bucket creation process deterministic: use a private RNG and a starting “center”, and build out your buckets in both the positive and negative directions whenever you need to extend the range.

Using your example in the video, let’s say you start off with a center of 500 and a resolution of 0.25. If the first input is 600, you’ll have to grow 400 buckets up from 500 to 600, and 400 down from 500 to 400. So you’ll end up with 800 (or 801) buckets, many of which might never be used (in your example all the non-integer buckets are waste).
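The symmetric growth described above can be sketched like this (class and method names are mine, invented for illustration; a real RDSE would also generate the one-bit-different encoding for each new bucket from the private RNG):

```javascript
// Minimal sketch of deterministic, symmetric bucket growth.
// Names are illustrative, not NuPIC's.
class BucketRange {
  constructor(center, resolution) {
    this.center = center;
    this.resolution = resolution;
    this.radius = 0; // buckets currently cover center ± radius * resolution
  }

  // Grow one bucket at a time, in both directions, until `value` is covered.
  growToCover(value) {
    const needed = Math.ceil(Math.abs(value - this.center) / this.resolution);
    while (this.radius < needed) {
      this.radius += 1;
      // create bucket at center + radius * resolution (one RNG draw)
      // create bucket at center - radius * resolution (one RNG draw)
    }
  }

  bucketCount() {
    return 2 * this.radius + 1; // both directions plus the center bucket
  }
}

const r = new BucketRange(500, 0.25);
r.growToCover(600); // 400 buckets up, 400 down, plus the center: 801 total
```

Because growth only depends on the center, the resolution, and the private RNG's seed, the resulting buckets are the same no matter what order the inputs arrive in.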

NuPIC does this by using its offset parameter to identify the center of the RDSE, and it holds a private stateful RNG in its random field. But NuPIC appears to have a bug because it doesn’t do the growing in both directions (see the createBucket() method). This means that the encoding will differ depending on whether you grow upwards or downwards when you need a new bucket.

For a very early study of how this works, see this page.


#7

I noticed this, but I don’t understand why it matters, as long as encodings are not “shared” between different models. Even though each RDSE might encode data differently depending on the order it sees the data, in isolation every model will get a solid representation from the RDSE.


#8

That’s correct, @rhyolight, but it’s important to have a deterministic encoding of each input value, regardless of the order of presentation. This allows you to swap out the HTM network structure, use different-sized layers with different parameters, turn learning on and off, and still have reproducible results. More importantly, you can aggregate inputs over time, use the median or mean of the aggregates, change the aggregation size, and so on. And you can start your sequence earlier or later, repeat parts of the sequence, etc. If your encoding depends on the presentation sequence, all of this becomes very difficult or impossible.


#9

On a similar note, why is the number of buckets restricted to 1000?
I tried playing around with it (changing the constant in the source code), and already at 1700 there was a noticeable slowdown.
What should I do if I want both decent resolution and a decent range of values at the same time?
10,000 buckets would do. Which encoder should I go for?


#10

A post was split to a new topic: Example usage of Coordinate Encoder