MemoryError when predicting with NuPIC

wtz5pp · July 19, 2017, 5:51pm

Hi everyone,

I’m relatively new to NuPIC, and I’ve been trying to modify the word prediction code from the NuPIC examples (https://github.com/numenta/nupic/tree/master/examples/prediction/category_prediction) to work with a larger data set, specifically the Penn Treebank data set. However, it only runs successfully when I limit the number of categories by limiting the data set size. When I try to run the model on the whole data set (~10,000 unique words), it raises a MemoryError after 2 calls to Model.run().

The token files are generated successfully; the error only happens when I call Model.run(). I’ve already tried setting the “maxCategoryCount” parameter to 10000, but the same error occurred. I’m not entirely sure what the problem is. The exact error is reproduced below:

Any help would be greatly appreciated!

rhyolight · July 19, 2017, 6:03pm

Hi @wtz5pp and thanks for posting to the forums. Can you try using the SDRCategoryEncoder and see if that helps?

wtz5pp · July 19, 2017, 6:26pm

SDRCategoryEncoder has another parameter, “n”; what would be good values of n and w to use in this case?

rhyolight · July 19, 2017, 6:31pm

This space might still be too large to represent. Think about it like this: if you have over 10K unique categories to encode, and each cell represents a unique value all by itself, you need an input SDR of at least 10K cells. I think you have too many unique values to represent.
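To make the size argument concrete, here is a rough back-of-the-envelope sketch in plain Python. It is not NuPIC's encoder code: it assumes a simplified layout in which every category owns `w` bits that no other category uses, and compares that with how many distinct patterns a fixed-width sparse representation could hold. The function names and the values `w=21` and `n=2048` are illustrative assumptions only.

```python
from math import factorial

def dense_category_width(num_categories, w):
    """Bits needed if each category owns w bits shared with no other category."""
    return num_categories * w

def sparse_code_capacity(n, w):
    """Number of distinct ways to switch on exactly w of n bits (n choose w)."""
    return factorial(n) // (factorial(w) * factorial(n - w))

num_words = 10000  # roughly the vocabulary size mentioned in this thread

# One bit per word already needs 10,000 input bits.
print(dense_category_width(num_words, w=1))   # 10000

# With w bits per word the input width grows linearly with the vocabulary.
print(dense_category_width(num_words, w=21))  # 210000

# A fixed-width sparse code has far more than 10,000 possible patterns,
# because words share bit positions instead of owning them.
print(sparse_code_capacity(2048, 21) > num_words)  # True
```

Under these assumptions the input width scales with the vocabulary in the first layout but stays fixed in the second, which is the trade-off being discussed here.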

We typically try to generalize a bit when creating encoders. Each category probably is not entirely different from others. Some may need to be encoded so there is semantic similarity between them. What do the categories represent? Can you list some typical values?

wtz5pp · July 19, 2017, 6:37pm

Each category is a different word (e.g. the, and, it, although, sheep…); there are around 10,000 unique words in the data set.

rhyolight · July 19, 2017, 6:41pm

Instead of representing words as categories, you should get semantic fingerprints from Cortical.IO. Here are some resources:

wtz5pp · July 19, 2017, 7:01pm

Thank you so much for the help!

The MemoryError doesn’t appear anymore after changing the encoder, so I think I’ll try the SDRCategoryEncoder first before looking into cortical.io.

rhyolight · July 19, 2017, 8:01pm

Ok, but I’m not sure you will be very successful without encoding some semantic similarity between terms.
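As a rough illustration of why that matters, the sketch below assigns each word a random sparse code and counts shared bits. It is a toy model in plain Python, not the SDRCategoryEncoder implementation; `random_sdr`, `overlap`, and the values `n=2048` and `w=41` are hypothetical choices made for this example.

```python
import random

def random_sdr(word, n=2048, w=41):
    """Return w active bit indices out of n, chosen pseudo-randomly per word."""
    rng = random.Random(word)  # seed on the word so each word's code is repeatable
    return frozenset(rng.sample(range(n), w))

def overlap(a, b):
    """Count the active bits two SDRs have in common."""
    return len(a & b)

sheep = random_sdr("sheep")
goat = random_sdr("goat")

print(overlap(sheep, sheep))  # 41: a word always matches itself
print(overlap(sheep, goat))   # typically only a few bits, regardless of meaning
```

With random codes the overlap between two words is down to chance, so related words such as "sheep" and "goat" look no more alike than unrelated ones. An encoding that carries semantic similarity would instead give related words many shared bits.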
