Is this a valid number encoder?

When learning about scalar encoders I found it somewhat surprising that the values 420 and 429 did not have any semantic overlap unless the bucket size was increased. Even if it were increased the values 52420 and 52429 still did not share the 524xx semantic overlap. Does the RDSE encoder resolve this?

As 52420 and 52429 share an 80% semantic similarity, I was wondering if a simple encoder could represent this. There is a row for each base-10 place: 1, 10, 100, 1000, etc., and each digit is placed in a cell within its row. For example, 52429 would be encoded as:

1…0000000001 - 9
10…0010000000 - 2
100…0000100000 - 4
1000…0010000000 - 2
10000…0000010000 - 5

As each row's cells go from 0-9, a digit of 0 is represented as 1000000000, so trailing zeros still get an active bit.
This is just one example, but it shows the semantic similarities between large values.
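A quick sketch of the idea in Python (the row layout and helper names here are just my own illustration, not an existing library):

```python
# Sketch of the digit-row encoder described above: one 10-bit row per
# base-10 place; the bit at index d is set when that place holds digit d.

def encode_digits(value, num_places=5):
    """Return one 10-bit row per place, least-significant place first."""
    rows = []
    for p in range(num_places):
        d = (value // 10 ** p) % 10   # digit at place 10^p
        row = [0] * 10
        row[d] = 1
        rows.append(row)
    return rows

def overlap(a, b):
    """Fraction of rows whose active bit matches."""
    return sum(ra == rb for ra, rb in zip(a, b)) / len(a)

print(overlap(encode_digits(52420), encode_digits(52429)))  # 0.8
print(overlap(encode_digits(52000), encode_digits(52888)))  # 0.4
```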

I can imagine this type of encoding could be very useful for values with decimal places, as the values can be scaled up into integers to have an accurate SDR representation.
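For example (a hypothetical helper; the number of decimal places to keep is an assumption):

```python
# Hypothetical helper: scale a decimal into an integer first, then feed
# the result to the digit encoder described above.

def to_scaled_int(value, decimal_places=2):
    return round(value * 10 ** decimal_places)

print(to_scaled_int(3.14))  # 314
print(to_scaled_int(2.5))   # 250
```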

There is probably a major issue with this type of encoding. I’d like to know whether this type of encoding would work and what issues it may have.


It also depends on the range of the encoder (min/max). Even with a very small bucket size, 420 and 429 will have significant semantic overlap when min=410 and max=440.

Again, the min and max settings of the encoder are very important here. Imagine if max was 1 trillion. Those two values would be very similar no matter the bucket size.
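To make that concrete, here is a minimal textbook-style scalar encoder (a sketch, not the actual NuPIC implementation): the same pair of values goes from partial overlap to total overlap just by widening the range.

```python
# Minimal scalar encoder sketch: w contiguous active bits out of n,
# positioned by where the value falls between vmin and vmax.

def scalar_encode(value, vmin, vmax, n=40, w=21):
    buckets = n - w + 1
    i = int((value - vmin) / (vmax - vmin) * (buckets - 1))
    return set(range(i, i + w))

narrow = scalar_encode(420, 410, 440) & scalar_encode(429, 410, 440)
wide = scalar_encode(420, 0, 10**12) & scalar_encode(429, 0, 10**12)
print(len(narrow), len(wide))  # 15 21 -- identical encodings once the range is huge
```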

That’s an interesting idea, but what makes 10 so special? What about translating the number into straight binary, or hex, or octal? (Just playing devil’s advocate here :japanese_ogre:)


Hi Matt, thanks for the feedback!

Will they have any semantic overlap if min=0 and max=1 trillion? I guess it would be nice to always have an absolute semantic overlap regardless of how large the max is.

Is that with the scalar encoder or the RDSE, or both? In that case it might mean I don’t fully understand how these encoders semantically represent values.

I just used base 10 as an example - I guess any base could be used.

Have you seen this?

I’m really sorry about the man bun

Yup, I’ll probably read the paper you suggested in the video.

I suppose 52420 and 52429 was a bad example when talking about large ranges. As far as I understand, the scalar encoder won’t represent any overlap between 52000 and 52888 without enlarging the bucket size into the hundreds. However, with this encoder those two values will have 40% overlap, because the shared 52xxx digits contribute two matching bits. The same holds at any scale: 52000000 and 52888888 (52xxxxxx) will have a 25% (2-bit) overlap.
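A quick sanity check of those percentages (matching decimal digits stand in for matching one-hot rows; this is just a throwaway helper of mine):

```python
# Each matching digit corresponds to one matching row in the encoding,
# so the fraction of matching digits is the semantic overlap.

def digit_overlap(a, b):
    sa, sb = str(a), str(b)
    return sum(x == y for x, y in zip(sa, sb)) / len(sa)

print(digit_overlap(52000, 52888))        # 0.4
print(digit_overlap(52000000, 52888888))  # 0.25
```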

You’re right. There is a trade-off that needs to be made. You can’t have very high resolution without giving up semantic similarity in some situations where you might want it.

Your idea might work great, you just have to code it up and test it out with HTM and see how it works.

You may also create a logarithmic encoder, which may be more in line with how people perceive large numbers.
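A sketch of why that helps (hypothetical helper, base 10 assumed): a fixed ratio is the same distance apart at any magnitude in log space, so feeding log10(value) into an ordinary scalar encoder gives ratio-based overlap instead of absolute-difference overlap.

```python
import math

# In log space, a +10% change covers the same distance whether the value
# is in the thousands or the millions.

def log_distance(a, b):
    return abs(math.log10(a) - math.log10(b))

print(log_distance(1_000, 1_100))          # ~0.0414
print(log_distance(1_000_000, 1_100_000))  # same distance
```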