Using SP is expensive, I’m looking for algorithms to generate suitable SDR’s directly from the Encoder.
Below is the code of one I created. Pls comment on what benefits/drawback you can see.
The normal procedure is to Encode the values and then pass it trough Spatial Pooler,
but it is expensive operation. What if we can encode directly and skip SP.
This is a Naive algorithm I came up to do just that, here is how it works.
The algorithm depends on 3 parameters.
- vsize : the number of bits of the SDP
- spaOnbits : sparsity percent OR number of bits
- olap_nbits : how many bits do neighboring items overlap
First we have to calculate capacity i.e. how many items we can represent.
For example if the SDP is 1000 bits, spa-bits is 20 and neighboring items dont overlap,
then the capacity is 1000/20 = 50 items i.e. we can generate randomly 50 items the
51st will necessarily generate item that will overlap with some other item.
With me so far ?
Now if we allow 10 bits overlap, we can generate 1000/(20-10) = 100 items, before we
generate item that will create item with bigger overlap.
Now that we know the capacity, how do we do it :
We keep a set of all free for use bits. In the beginning its all of the 1 … 1000.
We generate spa-number-of-bits randomly and remove them from the free-set.
Next we pick randomly olap-number of bits from the previous item (in this case the first one) and non-olap-number of bits from the free-set, where :
self.non_olap = self.spa_nbits - self.olap_nbits
Then we rinse and repeat until we exhaust all the possible items…
Here is a quick test…
In [112]: ne = NaiveSkipEncoder(minimum=0,maximum=100,vsize=1000,spaOnbits=20,olap_nbits=10)
In [111]: np.array([ne.decode(ne.encode(x)) for x in range(0,100)])
Out[111]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 28, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 56, 57, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81,
82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])
Also the similarity maps seem to show that similarity is ~preserved.
BTW the Encoder I crated depends on some other libs I use, so if you want to use it you may need to implement those. Most notably the iLexicon which is KV-SDR store.
more examples of the item :
In [112]: ne.lex[3]
Out[112]: 16:78:95:155:239:277:294:313:336:344:349:353:378:410:507:580:718:757:887:979
In [113]: ne.lex[4]
Out[113]: 16:123:155:217:239:261:271:283:294:313:353:507:580:589:676:711:872:887:898:979
In [114]: ne.lex[3] // ne.lex[4]
Out[114]: 0.500 <- similarity
In [115]: ne.lex[3] / ne.lex[4]
Out[115]: 10 <-- overlap
In [117]: ne.lex[3] // ne.lex[5]
Out[117]: 0.250
In [116]: ne.lex[3] // ne.lex[6]
Out[116]: 0.050
and encoded values :
In [119]: ne.encode(55) // ne.encode(54)
Out[119]: 0.500
In [120]: ne.encode(55) // ne.encode(56)
Out[120]: 0.500
In [121]: ne.encode(55) // ne.encode(57)
Out[121]: 0.500
In [122]: ne.encode(55) // ne.encode(58)
Out[122]: 0.250