Problem running newly saved / loaded SP & TM

  1. 20,000 iterations before serialization and 5,000 after.
  2. Yes, the model is now fed at the same interval; the events are continuous.
  3. No, the model was trained once and comes back only to make inferences.
  4. The model works just fine before saving, so my question relates only to the issue I have while saving/restoring it.
  5. No, only the sequence of values matters in this case.

What version of NuPIC are you using? Be sure you are using 1.0.5.

Requirement already satisfied: nupic in /home/japanes/calc/venv/lib/python2.7/site-packages (1.0.5)

Please double check that you have nupic.bindings==1.0.6 by running pip list | grep nupic.

nupic 0.8.0
nupic-studio 1.1.3 /usr/local/lib/python2.7/dist-packages
nupic.bindings 0.7.0

Wow, it is very strange.

Ok, please uninstall nupic completely, then reinstall with pip and try again?
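After reinstalling, a quick sanity check (just a sketch) is to print which copy of nupic the interpreter actually imports, so a stale install outside the virtualenv can be ruled out:

import pkg_resources
import nupic

print nupic.__file__                                     # path of the package actually imported
print pkg_resources.get_distribution("nupic").version    # version pip resolves for "nupic"
print pkg_resources.get_distribution("nupic.bindings").version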

I tried again, but nothing changed.

What does pip list | grep nupic say you have installed now?

nupic 1.0.5
nupic.bindings 1.0.6

That looks right. I still don’t quite understand the problem. Can you explain more how the model’s behavior is different after a save? Do the anomaly scores change drastically? Was the model performing well before the save? If so, how do you define “well”?

My script before serialization:

> def runLearning(numRecords):
>  
>   learning_time = time()
>   with open("test3.csv", "r") as fin:
>     reader = csv.reader(fin)
>     headers = reader.next()
>     reader.next()
>     reader.next()
> 
>     for count, record in enumerate(reader):
>       print "Count",count
>       if count >= numRecords: break
> 
>       # Convert data string into Python date object.
>       #dateString = datetime.datetime.strptime(record[0], "%m/%d/%y %H:%M")
>       # Convert data value string into float.
>       event_value = float(record[2]) # device 1
>       event_value_3 = float(record[4]) # device 3
>       event_value_2 = float(record[3]) #device 2
>       # event_value_7 = float(record[8]) # device 7
>       bezline_all = float(record[10])
>       pres_data    = float(record[11])
>       flow_value  = float(record[0])
>       # To encode, we need to provide zero-filled numpy arrays for the encoders
>       # to populate.
>       eventBits = numpy.zeros(eventEncoder.getWidth())
>       eventBits_2 = numpy.zeros(eventEncoder2.getWidth())
>       eventBits_3 = numpy.zeros(eventEncoder1.getWidth())
>       presBits = numpy.zeros(pressEncoder.getWidth())
> 
>       baseline_Bits = numpy.zeros(baselineEncoder.getWidth())
>       flowBits = numpy.zeros(flowEncoder.getWidth())
> 
> 
>       # Now we call the encoders to create bit representations for each value.
>       eventEncoder.encodeIntoArray(event_value, eventBits)
>       eventEncoder1.encodeIntoArray(event_value_3,eventBits_3)
>       eventEncoder2.encodeIntoArray(event_value_2,eventBits_2)
>       pressEncoder.encodeIntoArray(pres_data,presBits)
> 
>       baselineEncoder.encodeIntoArray(bezline_all,baseline_Bits)
>       flowEncoder.encodeIntoArray(flow_value, flowBits)
> 
> 
>       # Concatenate all these encodings into one large encoding for Spatial
>       # Pooling.
>       encoding = numpy.concatenate(
>         [eventBits,flowBits,baseline_Bits,eventBits_2,flowBits,baseline_Bits,eventBits_3,flowBits,baseline_Bits]
>       )
> 
>       # Create an array to represent active columns, all initially zero. This
>       # will be populated by the compute method below. It must have the same
>       # dimensions as the Spatial Pooler.
>       activeColumns = numpy.zeros(spParams["columnCount"])
>       # activeColumns1 = numpy.zeros(spParams["columnCount"])
> 
> 
>       # Execute Spatial Pooling algorithm over input space.
> 
>       sp.compute(encoding,True,activeColumns)
> 
>      # sp.compute(encoding1, True, activeColumns)
> 
>       activeColumnIndices = numpy.nonzero(activeColumns)[0]
> 
>       # Execute Temporal Memory algorithm over active mini-columns.
>       tm.compute(activeColumnIndices, learn=True)
> 
>       activeCells = tm.getActiveCells()
> 
>       # Get the bucket info for this input value for classification.
>       bucketIdx = eventEncoder.getBucketIndices(event_value)[0]
>       bucketIdx_2 = eventEncoder2.getBucketIndices(event_value_2)[0]
>       bucketIdx_3 = eventEncoder1.getBucketIndices(event_value_3)[0]
> 
> 
> 
>       # Run classifier to translate active cells back to scalar value.
>       classifierResult = classifier.compute(
>         recordNum=count,
>         patternNZ=activeCells,
>         classification={
>           "bucketIdx": bucketIdx,
>           "actValue": event_value
>         },
>         learn=True,
>         infer=False
>       )
>       classifierResult1 = classifier1.compute(
>         recordNum=count,
>         patternNZ=activeCells,
>         classification={
>           "bucketIdx": bucketIdx_3,
>           "actValue": event_value_3
>         },
>         learn=True,
>         infer=False
>       )
> 
>       classifierResult2 = classifier2.compute(
>         recordNum=count,
>         patternNZ=activeCells,
>         classification={
>           "bucketIdx": bucketIdx_2,
>           "actValue": event_value_2
>         },
>         learn=True,
>         infer=False
>       )
>       learning_time_end = time()
>       print "Time", (learning_time_end - learning_time)
>   with open("out_sp.tmp", "wb") as f1:
>     sp.writeToFile(f1)
>   with open("out_tm.tmp", "wb") as f2:
>     tm.writeToFile(f2)
> 
> if __name__ == "__main__":
>   runLearning(20000)

My script after serialization:

def runTesting(numRecords):
  testing_time = time()
  global  result_testing,oneStep,result_testing1,oneStep1,result_testing7,oneStep7,result_testing2,oneStep2
  with open("test3.csv", "r") as fin:
    reader = csv.reader(fin)
    headers = reader.next()
    reader.next()
    reader.next()

    for count, record in enumerate(reader):
      print "Testing count",count
      if count >= numRecords: break

      # Convert data string into Python date object.
      #dateString = datetime.datetime.strptime(record[0], "%m/%d/%y %H:%M")
      # Convert data value string into float.
      priv = count
      event_value = result_testing[count]
      event_value_2 = result_testing2[count]
      event_value_3 = result_testing1[count]
      # event_value_7 = result_testing7[count]
      pres_data = float(record[11])
      bezline_all = float(record[10])
      flow_value  = float(record[0])

      # bezline = float(record[10])
      # encoding = float[record[9]]
      # To encode, we need to provide zero-filled numpy arrays for the encoders
      # to populate.
      eventBits = numpy.zeros(eventEncoder.getWidth())
      eventBits_2 = numpy.zeros(eventEncoder2.getWidth())
      eventBits_3 = numpy.zeros(eventEncoder1.getWidth())
      # eventBits_7 = numpy.zeros(eventEncoder7.getWidth())
      presBits = numpy.zeros(pressEncoder.getWidth())
      flowBits = numpy.zeros(flowEncoder.getWidth())
      baseline_Bits = numpy.zeros(baselineEncoder.getWidth())

      # Now we call the encoders to create bit representations for each value.
      eventEncoder.encodeIntoArray(event_value, eventBits)
      eventEncoder2.encodeIntoArray(event_value_2, eventBits_2)
      eventEncoder1.encodeIntoArray(event_value_3,eventBits_3)
      # eventEncoder7.encodeIntoArray(event_value_7, eventBits_7)

      baselineEncoder.encodeIntoArray(bezline_all, baseline_Bits)
      flowEncoder.encodeIntoArray(flow_value, flowBits)
      pressEncoder.encodeIntoArray(pres_data,presBits)

      # Concatenate all these encodings into one large encoding for Spatial
      # Pooling.
      encoding = numpy.concatenate(
        [eventBits,flowBits,baseline_Bits,eventBits_2,flowBits,baseline_Bits,eventBits_3,flowBits,baseline_Bits]
      )


      # enc = numpy.concatenate(encoding, encoding)
      # Create an array to represent active columns, all initially zero. This
      # will be populated by the compute method below. It must have the same
      # dimensions as the Spatial Pooler.


      column_count = sp2.getColumnDimensions()
      print "Column count:", column_count
      activeColumns = numpy.zeros(column_count)


      # Execute Spatial Pooling algorithm over input space.
      sp2.compute(encoding, False, activeColumns)


      activeColumnIndices = numpy.nonzero(activeColumns)[0]


      # Execute Temporal Memory algorithm over active mini-columns.
      tm2.compute(activeColumnIndices, learn=False)

      activeCells = tm2.getActiveCells()

      # Get the bucket info for this input value for classification.
      bucketIdx = eventEncoder.getBucketIndices(event_value)[0]
      bucketIdx_2 = eventEncoder2.getBucketIndices(event_value_2)[0]
      bucketIdx_3 = eventEncoder1.getBucketIndices(event_value_3)[0]
      # bucketIdx_7 = eventEncoder7.getBucketIndices(event_value_7)[0]


      # Run classifier to translate active cells back to scalar value.
      classifierResult = classifier.compute(
        recordNum=count+20000,
        patternNZ=activeCells,
        classification={
          "bucketIdx": bucketIdx,
          "actValue": event_value
        },
        learn=False,
        infer=True
      )
      classifierResult1 = classifier1.compute(
        recordNum=count+ 20000,
        patternNZ= activeCells,
        classification={
          "bucketIdx": bucketIdx_3,
          "actValue": event_value_3
        },
        learn=False,
        infer=True
      )

      classifierResult2 = classifier2.compute(
        recordNum=count+ 20000,
        patternNZ= activeCells,
        classification={
          "bucketIdx": bucketIdx_2,
          "actValue": event_value_2
        },
        learn=False,
        infer=True
      )
      # Print the best prediction for 1 step out.
      oneStepConfidence, oneStep = sorted(
        zip(classifierResult[1], classifierResult["actualValues"]),
        reverse=True
      )[0]
      oneStepConfidence1, oneStep1 = sorted(
        zip(classifierResult1[1], classifierResult1["actualValues"]),
        reverse=True
      )[0]

      oneStepConfidence2, oneStep2 = sorted(
        zip(classifierResult2[1], classifierResult2["actualValues"]),
        reverse=True
      )[0]
      print("1-step: {:16} ({:4.4}%)".format(oneStep, oneStepConfidence * 100))
      testing_time_end = time()
      print "Time testing", (testing_time_end - testing_time)
      results.append([oneStep])
      results1.append([oneStep1])

      results2.append([oneStep2])
      result_testing.append(oneStep)
      result_testing1.append(oneStep1)
      result_testing2.append(oneStep2)
      # result_testing7.append(oneStep7)

    with open('result_graphic.csv', 'w') as csv_file:
        csv_writer = csv.writer(csv_file)
        headers = ("prediction_1","event_1","prediction_3","event_3","prediction2","event2","encoding","bezline","flow","pressure","id","time")
        csv_writer.writerow(headers)

        for l in range(len(result_testing)):
            if result_testing[l] == 1:
                res5.append(1)
            else:
                res5.append(0)
            if sum_event[l] == 1:
                evnt5.append(-1)
            else:
                evnt5.append(0)
            if result_testing1[l] == 1:
                res3.append(3)
            else:
                res3.append(0)
            if sum_event3[l] == 1:
                evnt3.append(-3)
            else:
                evnt3.append(0)
            if result_testing2[l] == 1:
                res2.append(2)
            else:
                res2.append(0)
            if sum_event2[l] == 1:
                evnt2.append(-2)
            else:
                evnt2.append(0)
            print "Len prediction 1", len(res5)
            print "Len event 1", len(evnt5)
            print "Len prediction 3", len(res3)
            print "Len event 3", len(evnt3)
            print "Len prediction 2", len(res2)
            print "Len event 2", len(evnt2)
            print  "Encoding ", len(encoding_csv)
            print "Len baseline", len(bezline)
            print "Len flow", len(flow_rate)
            print "Len pressure", len(pressure)
            csv_writer.writerow([res5[l],evnt5[l],res3[l],evnt3[l],res2[l],evnt2[l],encoding_csv[l],bezline[l],flow_rate[l],pressure[l]])
    testing_time_end = time()
    print "Time testingL",(testing_time_end - testing_time)
    return results

if __name__ == "__main__":
  runTesting(4000)

I’m sorry @Sergey but I don’t understand why you posted that code. I want to know about how well your model is behaving before and after you serialize / resurrect it from disk. We should be talking about the same code, not two versions of it.

Also, when I say model behavior, I am talking about how well it is doing what you want it to do. I am assuming the model prediction accuracy (anomaly score) changes drastically as soon as you resurrect the model and continue processing data?

I’m sorry, I don’t understand the previous question. I hope you are not angry at me :slight_smile: Before serialization I get about a hundred inferences, but after serialization I don’t get any inferences at all. I hope that answers your question.

I think the problem is only a language barrier. I will read the code better :wink:

I see your first example is where you computed 20,000 rows of input and saved both sp and tm. :+1:

I see your 2nd example is where you want to load the sp and tm instances you saved in the first script, but I do not see the code that actually loads them from the file system. Just like there is a writeToFile function on each, there is also a readFromFile function to get them back into memory. Also see the serialization guide. I hope that helps you!
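For reference, here is a minimal sketch of how the two calls pair up. The import paths are an assumption on my part; adjust them to match however you constructed sp and tm in your learning script.

from nupic.algorithms.spatial_pooler import SpatialPooler
from nupic.algorithms.temporal_memory import TemporalMemory

# Learning script: persist the trained instances.
with open("out_sp.tmp", "wb") as f:
  sp.writeToFile(f)
with open("out_tm.tmp", "wb") as f:
  tm.writeToFile(f)

# Inference script: restore them from the same files before computing with learn=False.
with open("out_sp.tmp", "rb") as f:
  sp2 = SpatialPooler.readFromFile(f)
with open("out_tm.tmp", "rb") as f:
  tm2 = TemporalMemory.readFromFile(f)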

I did that before this function:

with open("out_sp.tmp", "rb") as f1:
      sp2 = SpatialPooler.readFromFile(f1)
    with open("out_tm.tmp", "rb") as f2:
      tm2 = TemporalMemory.readFromFile(f2)
    classifier = SDRClassifier(
        steps  = [1],alpha=0.5050,verbosity= 0
    )
    classifier1 = SDRClassifier(
        steps=[1], alpha=0.5050, verbosity=0
    )
    classifier2 = SDRClassifier(
        steps=[1], alpha=0.5050, verbosity=0
    )

So you are saying that when you run the 2nd script, nothing prints to the screen and there are no errors? If so, I think you should investigate using a debugging tool to find out where you are snagged.

Yes, I will try to investigate it. But maybe you can advise me what to check first. I will improve my English skills :slight_smile:

I suggest you find a python debugger like pdb, or else keep adding print statements until you find out where the process got hung up.
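For example, a minimal sketch around the line where you load the spatial pooler (pdb ships with Python; the loading line is taken from your own script):

import pdb

with open("out_sp.tmp", "rb") as f1:
  pdb.set_trace()  # execution pauses here; type "n" to step, "s" to step into readFromFile, "p f1" to inspect
  sp2 = SpatialPooler.readFromFile(f1)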

Sorry, I want to come back to my question once again. I want to increase n in my ScalarEncoder:

baselineEncoder = ScalarEncoder(name="baseline", w=21, n=2625, minval=51, maxval=75, forced=True)

flowEncoder = ScalarEncoder(name="flow", w=15, n=1050, minval=0, maxval=6, forced=True)
encodingWidth = (eventEncoder.getWidth() + flowEncoder.getWidth() + baselineEncoder.getWidth()
                 + eventEncoder1.getWidth() + flowEncoder.getWidth() + baselineEncoder.getWidth()
                 + eventEncoder2.getWidth() + flowEncoder.getWidth() + baselineEncoder.getWidth())

And I got the following error:

File "experiment_load.py", line 76, in &lt;module&gt;
  sp2 = SpatialPooler.readFromFile(f1)
File "/home/japanes/calc/venv/local/lib/python2.7/site-packages/nupic/serializable.py", line 94, in readFromFile
  proto = schema.read_packed(f)
File "capnp/lib/capnp.pyx", line 2962, in capnp.lib.capnp._StructModule.read_packed (capnp/lib/capnp.cpp:61515)
File "capnp/lib/capnp.pyx", line 3554, in capnp.lib.capnp._PackedFdMessageReader.__init__ (capnp/lib/capnp.cpp:69069)
capnp.lib.capnp.KjException: capnp/serialize.c++:197: failed: expected totalWords <= options.traversalLimitInWords; Message is too large. To increase the limit on the receiving end, see capnp::ReaderOptions.
stack: 0x7f9e8dbb297b 0x7f9e8dbb2a1c 0x7f9e8daa4f87 0x4b669c 0x7f9e8da95d28 0x4b0c93 0x4c9f9f 0x4c2705 0x4ca088 0x4c2705 0x4c24a9 0x4f19ef 0x4ec372 0x4eaaf1 0x49e208 0x7f9ea5866830 0x49da59

Maybe I have problems with capnp.
Thanks for your help!
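If the traversal limit is the problem, one possible workaround is to bypass readFromFile and call the capnp reader directly with a larger limit. This is only a sketch: it assumes SpatialPooler exposes the getSchema() and read() class methods used inside nupic/serializable.py, and that your pycapnp version's read_packed accepts the traversal_limit_in_words keyword.

# Sketch only: read the saved SpatialPooler with a raised traversal limit.
schema = SpatialPooler.getSchema()
with open("out_sp.tmp", "rb") as f1:
  proto = schema.read_packed(f1, traversal_limit_in_words=2**61)
  sp2 = SpatialPooler.read(proto)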