Hello guys! I’m trying to create a semantic encoder using HTM and Word2Vec with 50 dimensions. So I put 50 scalar encoders into a multi-encoder and convert the dimensions into an SDR. Then I get a prediction SDR. The question is: how can I convert this SDR back into a word? Do I need to use a classifier or not?
You have an encoder that takes a word as input and gives you an SDR output.
When you have an output SDR and want to know which word in your domain it represents, I think you should find the word whose encoder SDR has the highest overlap with the output SDR.
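A minimal sketch of that overlap lookup, assuming you cache each vocabulary word’s encoder SDR up front (the toy vocabulary and bit sizes here are made up for illustration):

```python
import numpy as np

# Toy vocabulary SDRs. In practice: run each word through your
# word2vec -> multi-encoder pipeline once and cache the result.
vocab_sdrs = {
    "cat": np.array([1] * 4 + [0] * 16, dtype=bool),
    "dog": np.array([0] * 16 + [1] * 4, dtype=bool),
}

def best_word(prediction_sdr, vocab_sdrs):
    """Return the word whose cached SDR overlaps the prediction most."""
    return max(vocab_sdrs,
               key=lambda w: int(np.count_nonzero(vocab_sdrs[w] & prediction_sdr)))

prediction = np.array([1] * 3 + [0] * 17, dtype=bool)  # mostly matches "cat"
print(best_word(prediction, vocab_sdrs))  # -> cat
```

For a large vocabulary you’d want to vectorize this as one matrix multiply instead of a Python loop, but the idea is the same.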
How large is the final encoding? Seems like too much data IMO. You should really look into Cortical.io’s product, which can convert words into SDRs and back (and much, much more).
Here’s a link to Cortical.io’s web api demo: http://api.cortical.io
You can test out Fingerprint (SDR) production for several text formats…
I tried several different configurations for the input/output size. Encoder input (n) = 625, encoder count = 50, total output = 31250 bits with 2% active columns. I also tried setting the encoder input to 325.
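Just sanity-checking those numbers (assuming the 50 scalar encoder outputs are simply concatenated):

```python
# 50 scalar encoders of 625 bits each, concatenated into one SDR,
# with 2% of the total bits active.
encoders = 50
bits_per_encoder = 625
total_bits = encoders * bits_per_encoder
active_bits = int(total_bits * 0.02)
print(total_bits, active_bits)  # 31250 625
```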
Thanks! I will try to calculate overlaps from my SDR.
With an input this large, your Spatial Pooler is also going to need to be very large. I don’t think this approach is going to work, because compute cost increases quickly as you increase the SP size (that’s a lot of potential connections!).
I suggest you find a way to decrease the size of your input space. I am not sure how you can do this and also represent so many distinct semantic features in the input.
One way to reduce size of the input space would be to leverage topology, where semantically similar bits are physically closer to each other in the encoded representation than dissimilar ones. This would allow you to perform a simple scaling algorithm to reduce the size of the input space. This of course would require changing your encoding strategy.
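A minimal sketch of what that scaling step could look like, assuming a 2D topological encoding where nearby bits are semantically similar (the 2×2 pooling window is an arbitrary choice):

```python
import numpy as np

def downscale_sdr(sdr_2d, factor=2):
    """Shrink a topological 2D SDR by OR-pooling non-overlapping blocks.

    Because semantically similar bits are physically adjacent, OR-ing
    each factor x factor block keeps the coarse semantic layout while
    shrinking the input space by factor**2.
    """
    h, w = sdr_2d.shape
    blocks = sdr_2d.reshape(h // factor, factor, w // factor, factor)
    return blocks.any(axis=(1, 3))

sdr = np.zeros((8, 8), dtype=bool)
sdr[0, 1] = sdr[4, 4] = True
small = downscale_sdr(sdr)  # 8x8 -> 4x4, 64 bits down to 16
print(small.shape, int(small.sum()))  # (4, 4) 2
```

Note this also raises the effective sparsity, so you’d need to retune the SP’s active column count after scaling.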
If you decide to write a new encoder anyway, I would recommend exploring other strategies than stacking scalar encoders. One idea that comes to mind might be modding the SP algorithm to work in high dimensional space, so you can round the vectors and use them directly.
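The crudest version of “rounding the vectors” would be simple top-k binarization, shown below as a starting point (this is not the SP modification suggested above, just a cheap baseline; the vector and k are made up):

```python
import numpy as np

def vector_to_sdr(vec, n_active):
    """Binarize a dense vector by activating its n_active largest components.

    A crude way to 'round' a word2vec vector into a sparse binary code;
    preserving semantics well would need something closer to a modified SP.
    """
    sdr = np.zeros(vec.shape, dtype=bool)
    sdr[np.argsort(vec)[-n_active:]] = True
    return sdr

vec = np.array([0.1, -0.5, 0.9, 0.3, 0.7])
print(vector_to_sdr(vec, 2))  # activates indices 2 and 4 (the two largest)
```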