When learning about scalar encoders I found it somewhat surprising that the values 420 and 429 did not have any semantic overlap unless the bucket size was increased. Even if it were increased the values 52420 and 52429 still did not share the 524xx semantic overlap. Does the RDSE encoder resolve this?
As 52420 and 52429 share a 80% semantic similarity, I was wondering if a simple encoder could represent this. There is a row for each base 10 increment: 0, 10, 100, 1000, etc. and each digit is placed in a cell within its row.
0…0000000001 - 9
10…0010000000 - 2
100…0000100000 - 4
1000…0010000000 - 2
10000…0000010000 - 5
As the rows go from 0-9 all trailing 0s will be represented as 10000000000.
Just to give another set of examples to show the semantic similarities between large values.
I can imagine this type of encoding could be very useful for values with decimal places, as the values can be multiplied into integers to have an accurate SD representation.
There is probably a major issue with this type of encoding. I’d like to know if this type of encoding wouldn’t work and what issues it may have.