Do sequences of varying length pose a problem for the TM?

Hello people,

I’m currently building an HTM application that classifies snippets of text, where each snippet belongs to a category. The snippets vary in length from 4 to ~200 words. My idea is to feed in the SDR of each word as a sequence, one sequence per snippet, i.e.

A->B->C->category1

During training, the last SDR in the sequence will be the category of the snippet (as shown above). Then during testing I was thinking of excluding the category and letting the TM predict it, i.e.

A->B->C->get prediction of next input from the TM

And then labeling the snippet based on the TM’s prediction.

So my question is, if anyone knows: will the varying length of the sequences pose a problem for the TM? Should I ad hoc normalize the text snippets to a fixed length of 4 words and expect better results?
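
To make the scheme concrete, here is a minimal sketch of what I have in mind, assuming NuPIC’s TemporalMemory. `encode_word()` and `CATEGORY_SDRS` are hypothetical helpers that map a word or a category label to a set of active column indices; this is just a sketch, not a definitive implementation.

```python
# Minimal sketch of the train/classify scheme, assuming NuPIC's TemporalMemory.
# encode_word() and CATEGORY_SDRS are hypothetical encoders that return sets
# of active column indices for a word / category label.
from nupic.algorithms.temporal_memory import TemporalMemory

tm = TemporalMemory(columnDimensions=(2048,), cellsPerColumn=32)

def train_snippet(words, category):
    # Feed the word SDRs in order, then end the sequence with the category SDR.
    for word in words:
        tm.compute(sorted(encode_word(word)), learn=True)
    tm.compute(sorted(CATEGORY_SDRS[category]), learn=True)

def classify_snippet(words):
    # Feed the word SDRs without the category, then read which columns the TM
    # predicts for the next input and pick the category SDR with the largest
    # overlap with that prediction.
    for word in words:
        tm.compute(sorted(encode_word(word)), learn=False)
    predicted = {tm.columnForCell(c) for c in tm.getPredictiveCells()}
    return max(CATEGORY_SDRS,
               key=lambda cat: len(predicted & set(CATEGORY_SDRS[cat])))
```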

No, but you must be sure to call reset() between sequences.
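
For example, continuing the hypothetical sketch above, the training loop would call reset() after each snippet so the TM does not chain the end of one sequence into the start of the next:

```python
# Training loop with reset() between sequences, reusing the sketch above.
for words, category in snippets:  # hypothetical (words, category) pairs
    train_snippet(words, category)
    tm.reset()  # clear active/predictive state so snippets stay independent
```

classify_snippet() would call tm.reset() at the end for the same reason.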

