CLA / SDR Classifier Bug in nupic.core

I tried to use both the CLA and the SDR classifiers as implemented in nupic.core and they cause segmentation faults.

SDRClassifier CLA (std::vector<uint>{1}, clAlpha, clAlphaVal, 0);
ClassifierResult CLAresult;
[...]
CLA.compute(nlines, outputSDR_Cells, inputBucket, input, FALSE, learn, infer,
            &CLAresult); //segfault

I think this is a bug in the main compute implementation of the classifiers, but maybe I am not doing it correctly?
What I can say for sure is that the segmentation fault happens if and only if I call CLA.compute(), because at the moment I am not doing anything further with the result.
When trying to debug, I passed constant bucket values such as 0, and this does not cause memory errors, but of course it can't do anything useful...

CLA.compute(nlines, outputSDR_Cells, 0, input, FALSE, learn, infer,
            &CLAresult); //runs fine

I think the bug is in ClassifierResult but I am not sure:
In my case ClassifierResult CLAresult; was declared outside of the compute loop, therefore it was not going out of scope before a new write would occur to it.
By declaring CLAresult just before CLA.compute (in the same loop) the bug does not manifest anymore.

@ycui and @natoromano
I really do not know if Dorin uses your classifier correctly.
Do you have a working C++ sample that demonstrates how to use your classifier?
Or is it really a bug in your code?

It's OK, I got it working, but the bug is still there. I suspect it is in ClassifierResult::createVector(). That function is supposed to not create a new vector if the step number is already present as a key in the map, but something probably goes wrong, because in some cases I was also getting "double free" errors at the end of the program when the result object finally went out of scope.

I might have figured out the bug but I need confirmation.
Think about what happens when the classifier calls createVector() and requests a size of 2048, but the step was previously written to: it just gets v = it->second; back, and that v was allocated with a size of 10 because that was the only bucket value seen so far.

vector<Real64>* ClassifierResult::createVector(Int step, UInt size, Real64 value)
{
    vector<Real64>* v;
    map<Int, vector<Real64>*>::const_iterator it = result_.find(step);
    if (it != result_.end())
    {
        v = it->second;
    } else {
        v = new vector<Real64>(size, value);
        result_.insert(pair<Int, vector<Real64>*>(step, v));
    }
    return v;
}
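
To make the suspected size mismatch concrete, here is a minimal sketch. ResultSketch, createVectorResized() and reusedSize() are hypothetical names of mine, not part of nupic.core; the first method only mirrors the lookup logic quoted above, and the second is one possible fix, not the upstream one.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <vector>

// Hypothetical stand-in for ClassifierResult; not the nupic.core class.
class ResultSketch {
public:
  // Mirrors the quoted logic: an existing entry is returned untouched,
  // so `size` is silently ignored on the second and later calls.
  std::vector<double>* createVector(int step, std::size_t size, double value) {
    auto it = result_.find(step);
    if (it != result_.end()) {
      return it->second.get();
    }
    auto inserted = result_.emplace(
        step, std::unique_ptr<std::vector<double>>(
                  new std::vector<double>(size, value)));
    return inserted.first->second.get();
  }

  // One possible fix (an assumption, not the upstream patch): reset the
  // existing vector to the requested size and fill value before reuse.
  std::vector<double>* createVectorResized(int step, std::size_t size,
                                           double value) {
    std::vector<double>* v = createVector(step, size, value);
    v->assign(size, value);
    return v;
  }

private:
  std::map<int, std::unique_ptr<std::vector<double>>> result_;
};

// Returns the size a caller actually gets when step 1 is first created
// with 10 elements and then requested again with 2048.
inline std::size_t reusedSize(bool resized) {
  ResultSketch r;
  r.createVector(1, 10, 0.0);
  std::vector<double>* v = resized ? r.createVectorResized(1, 2048, 0.0)
                                   : r.createVector(1, 2048, 0.0);
  assert(v != nullptr);
  return v->size();
}
```

With the mirrored logic the second request still returns a 10-element vector, so a caller that assumes 2048 elements would write out of bounds, which would be consistent with the heap corruption I am seeing.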

@dorinclisu Sorry for my late response. I have been traveling in the past few weeks. It looks like you are using the classifier correctly. An example of the classifier usage can be found here

I think the bug might be real. I am not really familiar with the C++ version of the SDRClassifier. Could you tell us under what circumstances you see the segmentation fault and "double free" errors? It would be great if you could share your input/output pairs so that we can reproduce the error. Thanks!

I don't remember what I tried that produced the double free errors, but the segmentation fault happens every time the classifier's compute() function writes more than 2 times to the same ClassifierResult instance.
For example, as in the unit test, adding 2 more computes to the same result will cause not only some assertion errors, but also:

c.fastCompute(0, input1, 4, 34.7, false, true, true, &result1);
c.fastCompute(1, input1, 4, 34.7, false, true, true, &result1);
c.fastCompute(2, input1, 4, 34.7, false, true, true, &result1);
*** Error in `./unit_tests': free(): invalid next size (fast): 0x00007f5e58003290 ***
Aborted (core dumped)

If this is run outside gtest, the error is just a plain "segmentation fault".
The way I got around this is by destroying the result object and creating a new one before every compute(). However, this defeats the whole purpose of having an associative array inside the result.

Thanks. I have created an issue in nupic.core and we will look into it.

Hi! This indeed looks like a bug. Good catch!

The ClassifierResult instance should be cleared in some way when calling compute. Your workaround seems correct until this is fixed, and should not yield errors.
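
A rough sketch of what clearing on compute could look like, under stated assumptions: OwningResultSketch, clear(), steps() and computeSketch() are hypothetical names for illustration and are not the nupic.core API or an actual patch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in that owns raw pointers, like the quoted createVector().
class OwningResultSketch {
public:
  OwningResultSketch() = default;
  // Copying is disabled so two objects never delete the same vectors.
  OwningResultSketch(const OwningResultSketch&) = delete;
  OwningResultSketch& operator=(const OwningResultSketch&) = delete;
  ~OwningResultSketch() { clear(); }

  // Same lookup-or-create behaviour as the quoted function.
  std::vector<double>* createVector(int step, std::size_t size, double value) {
    auto it = result_.find(step);
    if (it != result_.end()) {
      return it->second;
    }
    std::vector<double>* v = new std::vector<double>(size, value);
    result_.emplace(step, v);
    return v;
  }

  // Frees every owned vector and empties the map, so the next
  // createVector() call allocates with the size it was asked for.
  void clear() {
    for (auto& entry : result_) {
      delete entry.second;
    }
    result_.clear();
  }

  std::size_t steps() const { return result_.size(); }

private:
  std::map<int, std::vector<double>*> result_;
};

// Hypothetical compute stand-in: clears the result first, then writes
// numBuckets likelihoods for the given step. Returns the vector size.
inline std::size_t computeSketch(OwningResultSketch& result, int step,
                                 std::size_t numBuckets) {
  result.clear();
  std::vector<double>* v = result.createVector(step, numBuckets, 0.0);
  assert(v->size() == numBuckets);
  for (std::size_t i = 0; i < numBuckets; ++i) {
    (*v)[i] = 1.0 / static_cast<double>(numBuckets);
  }
  return v->size();
}
```

Because the result is emptied at the start of each call, a single instance can be declared outside the loop and reused even when the number of buckets grows between calls.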

Sorry about that!

@Hopding,

Can you confirm whether this problem "leaked" into HTM.Java's SDRClassifier? I doubt it, but I thought I'd ask whether its "Classification" object is being reused, as I assume the core's "ClassifierResult" is.

@cogmission Sure, I'll check as soon as I have a chance. From what I recall, I don't believe the Java classifier reuses the 'Classification' result, though.

Looks like a new Classification object is created and returned on every invocation of compute: https://github.com/numenta/htm.java/blob/master/src/main/java/org/numenta/nupic/algorithms/SDRClassifier.java#L386. That assumes the call to compute requests inference; otherwise it just returns null. :+1:
