Hello people,
I’m currently building an HTM application that classifies snippets of text, where each snippet belongs to one category. The snippets vary in length from 4 to ~200 words. My idea is to feed the SDR of each word into the Temporal Memory (TM) as a sequence, one sequence per snippet, i.e.
A->B->C->category1
During training, the last SDR in each sequence will encode the category of the snippet (as shown above). During testing, I was thinking of leaving the category out and letting the TM predict it, i.e.
A->B->C->get prediction of next input from the TM
I would then label the snippet based on the TM's prediction.
So my question is: does anyone know whether the varying sequence lengths will pose a problem for the TM? Should I normalize the snippets to a fixed length of 4 words, as an ad hoc fix, and expect better results?
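If "normalizing" means plain truncation, it could look like the sketch below. Since the shortest snippet already has 4 words, no padding would be needed. The function name `normalize_length` and the `keep` option are hypothetical, and whether to keep the first or the last words is an open choice, not something I have tested.

```python
def normalize_length(words, length=4, keep="first"):
    """Truncate a snippet to a fixed number of words (assumed ad hoc scheme)."""
    if len(words) < length:
        raise ValueError("snippet is shorter than the target length")
    if keep == "first":
        return words[:length]
    if keep == "last":
        return words[-length:]
    raise ValueError("keep must be 'first' or 'last'")
```

The obvious cost is that a 200-word snippet loses almost all of its content, which is part of why I am unsure this would actually help.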