Hi, I am playing with the one hotgym OPF anomaly example in NuPIC. Every 100 data points, I save the model, and the next time I load the model to continue the computation. I observed that at some point the prediction results are quite different from running the entire dataset in memory. I also tried saving and loading the model every 600 data points; the results start to diverge at a much later data point. I was wondering whether save/load can cause a loss of precision?
This certainly should not happen. Can you provide some more evidence of this behavior? Like predictions from a normal run against your data vs. a run where you serialize and resurrect your model in the middle of it.
Yes, one can easily reproduce my results. Just put something like the following in the loop to save and load the model every x data points:
if counter % x == 0:
    print "Read %i lines..." % counter
    model.save(my_path)
    # reload the checkpoint (ModelFactory is from nupic.frameworks.opf.modelfactory)
    model = ModelFactory.loadFromCheckpoint(my_path)
You can try different values of x to see how the results change. I am attaching the results for x = 100 and x = 600 (https://drive.google.com/open?id=0B4TNsSMedgSoZHgxVjZsWURIMFE); if x >= 3945, the results are identical to running the entire dataset in memory.
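In case it helps anyone check, here is a minimal Python 2 sketch that compares two OPF output CSVs row by row and prints the first data point where the predictions differ. The file names and the prediction column index are assumptions, so adjust them to your own output files.

```python
import csv

# Hypothetical file names: point these at your own OPF output CSVs.
IN_MEMORY_RUN = "results_in_memory.csv"
CHECKPOINTED_RUN = "results_checkpoint_100.csv"
PREDICTION_COLUMN = 2  # assumed index of the prediction/anomaly column

with open(IN_MEMORY_RUN) as f1, open(CHECKPOINTED_RUN) as f2:
    for i, (row1, row2) in enumerate(zip(csv.reader(f1), csv.reader(f2))):
        if row1[PREDICTION_COLUMN] != row2[PREDICTION_COLUMN]:
            print "First divergence at row %i: %s vs %s" % (
                i, row1[PREDICTION_COLUMN], row2[PREDICTION_COLUMN])
            break
    else:
        print "No divergence found."
```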
Indeed, see the disabled tests https://github.com/numenta/nupic/blob/64400aa71982adb8069ec595ecbd4b9950d23183/tests/integration/nupic/opf/opf_checkpoint_test/opf_checkpoint_test.py#L461-L483.
@rhyolight, both of the above-mentioned disabled tests reference the issue NUP-1864 in Numenta's JIRA. NUP-1864 is closed for some reason, but it should be reopened. Also, check with @subutai; I think he may have an explanation for this discrepancy.
Thanks, @vkruglikov and @rainyyun, for reporting. I've created a nupic.core issue on the open source tracker to cover this problem.
I believe this is due to the fact that the current serialization converts floating point numbers to strings and back again. Converting to a string causes a slight loss of precision, so results can diverge, although qualitatively there should be little effect on accuracy.
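I don't know the exact format string the serialization uses, but here is a minimal Python 2 sketch of the effect: pushing a 32-bit float through a fixed-precision decimal string and back does not always reproduce the same bits, and in a recurrent system those tiny differences can compound into visibly different predictions over time.

```python
import numpy as np

x = np.float32(2.0) / np.float32(3.0)   # e.g. a permanence-like value
s = "%g" % x                            # "%g" keeps only 6 significant digits
y = np.float32(s)                       # parse it back, as deserialization would

print "original:   %r" % float(x)
print "as string:  %s" % s
print "round-trip: %r" % float(y)
print "bit-exact:  %s" % (x == y)       # False: the round trip changed the value
```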
With the new Cap'n Proto based serialization, this issue should go away, but it would be good to verify.
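For whoever picks that up, something along these lines should do. This is only a sketch: MODEL_PARAMS, the "consumption" field, and the "anomalyScore" inference key are taken from the hotgym anomaly example and may need adjusting, the ModelFactory import path differs between nupic versions, and whether the checkpoint goes through the old or the new serialization depends on the build.

```python
import copy
import os
import shutil
import tempfile

# "modelfactory" in older nupic releases, "model_factory" in newer ones.
from nupic.frameworks.opf.modelfactory import ModelFactory

from model_params import MODEL_PARAMS  # hotgym anomaly model params


def run(records, checkpoint_every=None):
    """Feed records to a fresh model, optionally saving and reloading it
    every `checkpoint_every` records, and return the anomaly scores."""
    model = ModelFactory.create(copy.deepcopy(MODEL_PARAMS))
    model.enableInference({"predictedField": "consumption"})
    scores = []
    for i, record in enumerate(records, 1):
        result = model.run(record)
        scores.append(result.inferences["anomalyScore"])
        if checkpoint_every and i % checkpoint_every == 0:
            tmp = tempfile.mkdtemp()
            checkpoint_dir = os.path.join(tmp, "model_checkpoint")
            model.save(checkpoint_dir)
            model = ModelFactory.loadFromCheckpoint(checkpoint_dir)
            shutil.rmtree(tmp)
    return scores


# records = parsed hotgym rows, e.g. [{"timestamp": dt, "consumption": kw}, ...]
# baseline = run(records)
# checkpointed = run(records, checkpoint_every=100)
# assert baseline == checkpointed, "save/load changed the results"
```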