Problem running newly saved / loaded SP & TM

Sergey · June 20, 2018, 2:39pm

I have two scripts. One of them trains the model (Algorithms API); the second runs inference. But my trained model doesn’t save. I used this code to save the model:

with open("out_sp.tmp", "wb") as f1:
  sp.writeToFile(f1)
with open("out_tm.tmp", "wb") as f2:
  tm.writeToFile(f2)

And for loading I used this code:

eventEncoder = ScalarEncoder(name="event", w=7, n=14, minval=0, maxval=1,forced=True)
eventEncoder1 = ScalarEncoder(name="event1", w=7, n=14, minval=0, maxval=1,forced=True)
eventEncoder7 = ScalarEncoder(name="event7", w=7, n=14, minval=0, maxval=1,forced=True)
eventEncoder2 = ScalarEncoder(name="event2", w=7, n=14, minval=0, maxval=1,forced=True)
#eventEncoder2 = ScalarEncoder(name="event2", w=9, n=18, minval=0, maxval=1,forced=True)
baselineEncoder = ScalarEncoder(name = "baseline",w = 21, n = 315,minval= 49,maxval=64,forced= True)
pressEncoder = ScalarEncoder(name = "pressure",w = 21, n = 462,minval= 44,maxval=66,forced= True)

flowEncoder = ScalarEncoder(name="flow", w=11, n=143, minval=0, maxval=13,forced = True)
encodingWidth = (eventEncoder.getWidth()+flowEncoder.getWidth()+baselineEncoder.getWidth()
                 +eventEncoder1.getWidth()+flowEncoder.getWidth()+baselineEncoder.getWidth()
                 +eventEncoder2.getWidth()+flowEncoder.getWidth()+baselineEncoder.getWidth())

encodingWidth1 =(eventEncoder1.getWidth()
                 + flowEncoder.getWidth())

with open("out_sp.tmp", "rb") as f1:
  sp2 = SpatialPooler.readFromFile(f1)
with open("out_tm.tmp", "rb") as f2:
  tm2 = TemporalMemory.readFromFile(f2)
classifier = SDRClassifier(
  steps = [1],alpha=0.5050,verbosity= 0
)
classifier1 = SDRClassifier(
  steps=[1], alpha=0.5050, verbosity=0
)
classifier2 = SDRClassifier(
  steps=[1], alpha=0.5050, verbosity=0
)
I hope you can help.
Thanks a lot.

rhyolight · June 20, 2018, 3:26pm

Are there errors? What happens when the script runs? If no file is created, there should be an error to stdout.
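A quick way to answer the "is a file created?" part is to check that both output files exist and are not empty. The sketch below is only a sanity check on the file system; it says nothing about whether the contents are a valid model. The helper name `describe_saved_files` is made up for illustration, and the file names are the ones from the save snippet above.

```python
import os


def describe_saved_files(paths):
    """Return {path: size in bytes, or None if the file is missing}."""
    report = {}
    for path in paths:
        report[path] = os.path.getsize(path) if os.path.isfile(path) else None
    return report


if __name__ == "__main__":
    # File names taken from the save snippet in the first post.
    report = describe_saved_files(["out_sp.tmp", "out_tm.tmp"])
    for path in sorted(report):
        size = report[path]
        if size is None:
            print("%s: missing" % path)
        elif size == 0:
            print("%s: empty (0 bytes)" % path)
        else:
            print("%s: %d bytes" % (path, size))
```

A missing or zero-byte file would point at the save step; non-empty files shift the suspicion to the loading or inference side.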

Sergey · June 20, 2018, 3:50pm

The files were created. There were no errors. But I don’t get good predictions (inference). I can show my whole script. Would that help?
Thanks a lot for your help!

rhyolight · June 20, 2018, 4:04pm

What does your data look like?
What does your model and encoder configuration look like?

Sergey · June 20, 2018, 6:12pm

This is my encoder configuration:

eventEncoder = ScalarEncoder(name="event", w=7, n=14, minval=0, maxval=1,forced=True)
eventEncoder1 = ScalarEncoder(name="event1", w=7, n=14, minval=0, maxval=1,forced=True)
eventEncoder7 = ScalarEncoder(name="event7", w=7, n=14, minval=0, maxval=1,forced=True)
eventEncoder2 = ScalarEncoder(name="event2", w=7, n=14, minval=0, maxval=1,forced=True)
#eventEncoder2 = ScalarEncoder(name="event2", w=9, n=18, minval=0, maxval=1,forced=True)
baselineEncoder = ScalarEncoder(name = "baseline",w = 21, n = 315,minval= 49,maxval=64,forced= True)
pressEncoder = ScalarEncoder(name = "pressure",w = 21, n = 462,minval= 44,maxval=66,forced= True)

flowEncoder = ScalarEncoder(name="flow", w=11, n=143, minval=0, maxval=13,forced = True)
encodingWidth = (eventEncoder.getWidth()+flowEncoder.getWidth()+baselineEncoder.getWidth()
                 +eventEncoder1.getWidth()+flowEncoder.getWidth()+baselineEncoder.getWidth()
                 +eventEncoder2.getWidth()+flowEncoder.getWidth()+baselineEncoder.getWidth())

This is my model:

     with open("test3.csv", "r") as fin:
        reader = csv.reader(fin)
        headers = reader.next()
        reader.next()
        reader.next()

        for count, record in enumerate(reader):
          print "Count",count
          if count >= numRecords: break

          # Convert data string into Python date object.
          #dateString = datetime.datetime.strptime(record[0], "%m/%d/%y %H:%M")
          # Convert data value string into float.
          event_value = float(record[2]) # device 1
          event_value_3 = float(record[4]) # device 3
          event_value_2 = float(record[3]) #device 2
          # event_value_7 = float(record[8]) # device 7
          bezline_all = float(record[10])
          pres_data    = float(record[11])
          flow_value  = float(record[0])
          # To encode, we need to provide zero-filled numpy arrays for the encoders
          # to populate.
          eventBits = numpy.zeros(eventEncoder.getWidth())
          eventBits_2 = numpy.zeros(eventEncoder2.getWidth())
          eventBits_3 = numpy.zeros(eventEncoder1.getWidth())
          presBits = numpy.zeros(pressEncoder.getWidth())

          baseline_Bits = numpy.zeros(baselineEncoder.getWidth())
          flowBits = numpy.zeros(flowEncoder.getWidth())


          # Now we call the encoders to create bit representations for each value.
          eventEncoder.encodeIntoArray(event_value, eventBits)
          eventEncoder1.encodeIntoArray(event_value_3,eventBits_3)
          eventEncoder2.encodeIntoArray(event_value_2,eventBits_2)
          pressEncoder.encodeIntoArray(pres_data,presBits)

          baselineEncoder.encodeIntoArray(bezline_all,baseline_Bits)
          flowEncoder.encodeIntoArray(flow_value, flowBits)


          # Concatenate all these encodings into one large encoding for Spatial
          # Pooling.
          encoding = numpy.concatenate(
            [eventBits,flowBits,baseline_Bits,eventBits_2,flowBits,baseline_Bits,eventBits_3,flowBits,baseline_Bits]
          )

          # Create an array to represent active columns, all initially zero. This
          # will be populated by the compute method below. It must have the same
          # dimensions as the Spatial Pooler.
          activeColumns = numpy.zeros(spParams["columnCount"])
          # activeColumns1 = numpy.zeros(spParams["columnCount"])


          # Execute Spatial Pooling algorithm over input space.

          sp.compute(encoding,True,activeColumns)

         # sp.compute(encoding1, True, activeColumns)

          activeColumnIndices = numpy.nonzero(activeColumns)[0]

          # Execute Temporal Memory algorithm over active mini-columns.
          tm.compute(activeColumnIndices, learn=True)

          activeCells = tm.getActiveCells()

          # Get the bucket info for this input value for classification.
          bucketIdx = eventEncoder.getBucketIndices(event_value)[0]
          bucketIdx_2 = eventEncoder2.getBucketIndices(event_value_2)[0]
          bucketIdx_3 = eventEncoder1.getBucketIndices(event_value_3)[0]



          # Run classifier to translate active cells back to scalar value.
          classifierResult = classifier.compute(
            recordNum=count,
            patternNZ=activeCells,
            classification={
              "bucketIdx": bucketIdx,
              "actValue": event_value
            },
            learn=True,
            infer=False
          )
          classifierResult1 = classifier1.compute(
            recordNum=count,
            patternNZ=activeCells,
            classification={
              "bucketIdx": bucketIdx_3,
              "actValue": event_value_3
            },
            learn=True,
            infer=False
          )

          classifierResult2 = classifier2.compute(
            recordNum=count,
            patternNZ=activeCells,
            classification={
              "bucketIdx": bucketIdx_2,
              "actValue": event_value_2
            },
            learn=True,
            infer=False
          )
          learning_time_end = time()
          # Elapsed time is end minus start (the original order printed a negative value).
          print "Time",(learning_time_end - learning_time)
     with open("out_sp.tmp", "wb") as f1:
        sp.writeToFile(f1)
     with open("out_tm.tmp", "wb") as f2:
        tm.writeToFile(f2)

My data looks like columns of digits. Maybe I don’t understand your question about the data, sorry.

rhyolight · June 20, 2018, 6:26pm

Let me restate what you have said so I can better understand you. Tell me if I am wrong.

You have two types of data: binary and scalar. There are events that occur, which are either 0 or 1.
Your data is many scalar values over time, each encoded into a portion of a larger encoding that combines them. No time semantics are encoded. The model runs and is fed data and makes predictions. You save the model. The model starts back up where it left off, but its behavior is different.

Questions:

1. How exactly is the model’s behavior different before and after serialization?
2. Is the model fed data at the same interval?
3. Are events continuous or do they only happen once?
4. When the model saves and comes back online, does the data continue where it left off? Or will there be gaps in the data while the model is offline?
5. What percentage of the input space is each field taking? Is each one getting enough space? Are the activations large enough that they make a difference to the SP?
6. Does human timekeeping make a difference at all? Are there periodic patterns you can mark against a calendar, like daily patterns?

In any case, each “event” could be better represented as a category. If you answer my questions I’ll keep trying to help.
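The question about input-space share can be worked out directly from the encoder parameters posted earlier. The sketch below copies the `n` (total bits) and `w` (active bits) values from the `ScalarEncoder` calls and assumes, as the `numpy.concatenate` call in the thread shows, that each field type appears three times in the combined encoding. The function name `field_shares` is made up for illustration.

```python
# (n, w) per encoder type, copied from the ScalarEncoder calls in the thread.
ENCODERS = {
    "event": (14, 7),
    "flow": (143, 11),
    "baseline": (315, 21),
}
# The concatenated encoding holds three event encodings, and repeats
# flowBits and baseline_Bits three times each.
REPEATS = 3


def field_shares(encoders, repeats):
    """Return (total bits, total active bits, {name: (% of bits, % of active bits)})."""
    total_bits = repeats * sum(n for n, _ in encoders.values())
    total_active = repeats * sum(w for _, w in encoders.values())
    shares = {}
    for name, (n, w) in encoders.items():
        shares[name] = (100.0 * repeats * n / total_bits,
                        100.0 * repeats * w / total_active)
    return total_bits, total_active, shares


if __name__ == "__main__":
    bits, active, shares = field_shares(ENCODERS, REPEATS)
    print("total bits: %d, active bits per record: %d" % (bits, active))
    for name in sorted(shares):
        print("%-8s %5.1f%% of bits, %5.1f%% of active bits"
              % ((name,) + shares[name]))
```

With these numbers the three event fields together occupy about 3% of the 1416 input bits and about 18% of the 117 active bits, while the baseline field, repeated three times, takes roughly two thirds of the input space. The pressure encoder is filled in the posted code but never added to the concatenation, so it is left out here.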

Sergey · June 20, 2018, 8:07pm

1. 20,000 iterations before serialization and 5,000 after.
2. Yes, the model is now fed data at the same interval.
3. The events are continuous.
4. No, the model is trained once and comes back only to make inferences.
5. The model works just fine before saving, so my question relates only to the issue I have while saving/restoring my model.
6. No, only the sequence of values matters in this case.

rhyolight · June 20, 2018, 8:31pm

What version of NuPIC are you using? Be sure you are using 1.0.5.

Sergey · June 20, 2018, 8:49pm

Requirement already satisfied: nupic in /home/japanes/calc/venv/lib/python2.7/site-packages (1.0.5)

rhyolight · June 20, 2018, 8:58pm

Please double check that you have nupic.bindings==1.0.6 by running pip list | grep nupic.
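When several Python environments are present (for example a virtualenv next to system-wide packages), `pip list` may report on a different environment than the one that runs the script. A hedged alternative is to ask the interpreter itself which versions it sees. The helper name `installed_version` is made up for illustration; the sketch uses `importlib.metadata` where available and falls back to `pkg_resources` on older interpreters such as Python 2.7.

```python
import sys


def installed_version(dist_name):
    """Return the installed version of dist_name, or None if it is not found."""
    try:
        from importlib import metadata  # available on Python 3.8+
    except ImportError:
        metadata = None
    if metadata is not None:
        try:
            return metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            return None
    # Fallback for older interpreters (requires setuptools).
    import pkg_resources
    try:
        return pkg_resources.get_distribution(dist_name).version
    except pkg_resources.DistributionNotFound:
        return None


if __name__ == "__main__":
    # Shows which interpreter is actually running, then what it can import.
    print("interpreter: %s" % sys.executable)
    for name in ("nupic", "nupic.bindings"):
        print("%s: %s" % (name, installed_version(name)))
```

Running this with the same interpreter that runs the training and inference scripts avoids any ambiguity about which environment is being inspected.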

Sergey · June 20, 2018, 9:00pm

nupic 0.8.0
nupic-studio 1.1.3 /usr/local/lib/python2.7/dist-packages
nupic.bindings 0.7.0

Wow. That is very strange.

rhyolight · June 20, 2018, 9:00pm

Ok, please uninstall nupic completely, then reinstall with pip and try again?

Sergey · June 21, 2018, 1:11am

I tried again, but nothing changed.

rhyolight · June 21, 2018, 7:13am

What does pip list | grep nupic say you have installed now?

Sergey · June 21, 2018, 7:42am

nupic 1.0.5
nupic.bindings 1.0.6

rhyolight · June 21, 2018, 2:13pm

That looks right. I still don’t quite understand the problem. Can you explain more how the model’s behavior is different after a save? Do the anomaly scores change drastically? Was the model performing well before the save? If so, how do you define “well”?
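One way to make "well" concrete, offered only as a sketch, is a one-step prediction accuracy: the prediction produced at step t is compared with the value actually observed at step t+1. The function name `one_step_accuracy` and the sample lists are made up for illustration; they are not data from this thread.

```python
def one_step_accuracy(predictions, actuals):
    """Fraction of steps where the prediction made at t equals the actual at t+1."""
    # Shift the two sequences against each other by one step.
    pairs = list(zip(predictions[:-1], actuals[1:]))
    if not pairs:
        return 0.0
    hits = sum(1 for predicted, actual in pairs if predicted == actual)
    return float(hits) / len(pairs)


if __name__ == "__main__":
    # Hypothetical event values (0 or 1) and the model's one-step predictions.
    actual_events = [0, 1, 0, 0, 1]
    predicted_next = [1, 0, 1, 1, 0]
    print("one-step accuracy: %.2f"
          % one_step_accuracy(predicted_next, actual_events))
```

Computing the same number on the same records before saving and after loading gives a direct comparison of the two runs.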

Sergey · June 21, 2018, 2:15pm

My script before serialization:

> def runLearning(numRecords):
>  
>   learning_time = time()
>   with open("test3.csv", "r") as fin:
>     reader = csv.reader(fin)
>     headers = reader.next()
>     reader.next()
>     reader.next()
> 
>     for count, record in enumerate(reader):
>       print "Count",count
>       if count >= numRecords: break
> 
>       # Convert data string into Python date object.
>       #dateString = datetime.datetime.strptime(record[0], "%m/%d/%y %H:%M")
>       # Convert data value string into float.
>       event_value = float(record[2]) # device 1
>       event_value_3 = float(record[4]) # device 3
>       event_value_2 = float(record[3]) #device 2
>       # event_value_7 = float(record[8]) # device 7
>       bezline_all = float(record[10])
>       pres_data    = float(record[11])
>       flow_value  = float(record[0])
>       # To encode, we need to provide zero-filled numpy arrays for the encoders
>       # to populate.
>       eventBits = numpy.zeros(eventEncoder.getWidth())
>       eventBits_2 = numpy.zeros(eventEncoder2.getWidth())
>       eventBits_3 = numpy.zeros(eventEncoder1.getWidth())
>       presBits = numpy.zeros(pressEncoder.getWidth())
> 
>       baseline_Bits = numpy.zeros(baselineEncoder.getWidth())
>       flowBits = numpy.zeros(flowEncoder.getWidth())
> 
> 
>       # Now we call the encoders to create bit representations for each value.
>       eventEncoder.encodeIntoArray(event_value, eventBits)
>       eventEncoder1.encodeIntoArray(event_value_3,eventBits_3)
>       eventEncoder2.encodeIntoArray(event_value_2,eventBits_2)
>       pressEncoder.encodeIntoArray(pres_data,presBits)
> 
>       baselineEncoder.encodeIntoArray(bezline_all,baseline_Bits)
>       flowEncoder.encodeIntoArray(flow_value, flowBits)
> 
> 
>       # Concatenate all these encodings into one large encoding for Spatial
>       # Pooling.
>       encoding = numpy.concatenate(
>         [eventBits,flowBits,baseline_Bits,eventBits_2,flowBits,baseline_Bits,eventBits_3,flowBits,baseline_Bits]
>       )
> 
>       # Create an array to represent active columns, all initially zero. This
>       # will be populated by the compute method below. It must have the same
>       # dimensions as the Spatial Pooler.
>       activeColumns = numpy.zeros(spParams["columnCount"])
>       # activeColumns1 = numpy.zeros(spParams["columnCount"])
> 
> 
>       # Execute Spatial Pooling algorithm over input space.
> 
>       sp.compute(encoding,True,activeColumns)
> 
>      # sp.compute(encoding1, True, activeColumns)
> 
>       activeColumnIndices = numpy.nonzero(activeColumns)[0]
> 
>       # Execute Temporal Memory algorithm over active mini-columns.
>       tm.compute(activeColumnIndices, learn=True)
> 
>       activeCells = tm.getActiveCells()
> 
>       # Get the bucket info for this input value for classification.
>       bucketIdx = eventEncoder.getBucketIndices(event_value)[0]
>       bucketIdx_2 = eventEncoder2.getBucketIndices(event_value_2)[0]
>       bucketIdx_3 = eventEncoder1.getBucketIndices(event_value_3)[0]
> 
> 
> 
>       # Run classifier to translate active cells back to scalar value.
>       classifierResult = classifier.compute(
>         recordNum=count,
>         patternNZ=activeCells,
>         classification={
>           "bucketIdx": bucketIdx,
>           "actValue": event_value
>         },
>         learn=True,
>         infer=False
>       )
>       classifierResult1 = classifier1.compute(
>         recordNum=count,
>         patternNZ=activeCells,
>         classification={
>           "bucketIdx": bucketIdx_3,
>           "actValue": event_value_3
>         },
>         learn=True,
>         infer=False
>       )
> 
>       classifierResult2 = classifier2.compute(
>         recordNum=count,
>         patternNZ=activeCells,
>         classification={
>           "bucketIdx": bucketIdx_2,
>           "actValue": event_value_2
>         },
>         learn=True,
>         infer=False
>       )
>       learning_time_end = time()
>       print "Time",(learning_time_end - learning_time)
>   with open("out_sp.tmp", "wb") as f1:
>     sp.writeToFile(f1)
>   with open("out_tm.tmp", "wb") as f2:
>     tm.writeToFile(f2)
> 
> if __name__ == "__main__":
>   runLearning(20000)

Sergey · June 21, 2018, 2:17pm

My script after serialization:

def runTesting(numRecords):
  testing_time = time()
  global  result_testing,oneStep,result_testing1,oneStep1,result_testing7,oneStep7,result_testing2,oneStep2
  with open("test3.csv", "r") as fin:
    reader = csv.reader(fin)
    headers = reader.next()
    reader.next()
    reader.next()

    for count, record in enumerate(reader):
      print "Testing count",count
      if count >= numRecords: break

      # Convert data string into Python date object.
      #dateString = datetime.datetime.strptime(record[0], "%m/%d/%y %H:%M")
      # Convert data value string into float.
      priv = count
      event_value = result_testing[count]
      event_value_2 = result_testing2[count]
      event_value_3 = result_testing1[count]
      # event_value_7 = result_testing7[count]
      pres_data = float(record[11])
      bezline_all = float(record[10])
      flow_value  = float(record[0])

      # bezline = float(record[10])
      # encoding = float[record[9]]
      # To encode, we need to provide zero-filled numpy arrays for the encoders
      # to populate.
      eventBits = numpy.zeros(eventEncoder.getWidth())
      eventBits_2 = numpy.zeros(eventEncoder2.getWidth())
      eventBits_3 = numpy.zeros(eventEncoder1.getWidth())
      # eventBits_7 = numpy.zeros(eventEncoder7.getWidth())
      presBits = numpy.zeros(pressEncoder.getWidth())
      flowBits = numpy.zeros(flowEncoder.getWidth())
      baseline_Bits = numpy.zeros(baselineEncoder.getWidth())

      # Now we call the encoders to create bit representations for each value.
      eventEncoder.encodeIntoArray(event_value, eventBits)
      eventEncoder2.encodeIntoArray(event_value_2, eventBits_2)
      eventEncoder1.encodeIntoArray(event_value_3,eventBits_3)
      # eventEncoder7.encodeIntoArray(event_value_7, eventBits_7)

      baselineEncoder.encodeIntoArray(bezline_all, baseline_Bits)
      flowEncoder.encodeIntoArray(flow_value, flowBits)
      pressEncoder.encodeIntoArray(pres_data,presBits)

      # Concatenate all these encodings into one large encoding for Spatial
      # Pooling.
      encoding = numpy.concatenate(
        [eventBits,flowBits,baseline_Bits,eventBits_2,flowBits,baseline_Bits,eventBits_3,flowBits,baseline_Bits]
      )


      # enc = numpy.concatenate(encoding, encoding)
      # Create an array to represent active columns, all initially zero. This
      # will be populated by the compute method below. It must have the same
      # dimensions as the Spatial Pooler.


      column_count = sp2.getColumnDimensions()
      print "Column count:", column_count
      activeColumns = numpy.zeros(column_count)


      # Execute Spatial Pooling algorithm over input space.
      sp2.compute(encoding, False, activeColumns)


      activeColumnIndices = numpy.nonzero(activeColumns)[0]


      # Execute Temporal Memory algorithm over active mini-columns.
      tm2.compute(activeColumnIndices, learn=False)

      activeCells = tm2.getActiveCells()

      # Get the bucket info for this input value for classification.
      bucketIdx = eventEncoder.getBucketIndices(event_value)[0]
      bucketIdx_2 = eventEncoder2.getBucketIndices(event_value_2)[0]
      bucketIdx_3 = eventEncoder1.getBucketIndices(event_value_3)[0]
      # bucketIdx_7 = eventEncoder7.getBucketIndices(event_value_7)[0]


      # Run classifier to translate active cells back to scalar value.
      classifierResult = classifier.compute(
        recordNum=count+20000,
        patternNZ=activeCells,
        classification={
          "bucketIdx": bucketIdx,
          "actValue": event_value
        },
        learn=False,
        infer=True
      )
      classifierResult1 = classifier1.compute(
        recordNum=count+ 20000,
        patternNZ= activeCells,
        classification={
          "bucketIdx": bucketIdx_3,
          "actValue": event_value_3
        },
        learn=False,
        infer=True
      )

      classifierResult2 = classifier2.compute(
        recordNum=count+ 20000,
        patternNZ= activeCells,
        classification={
          "bucketIdx": bucketIdx_2,
          "actValue": event_value_2
        },
        learn=False,
        infer=True
      )
      # Print the best prediction for 1 step out.
      oneStepConfidence, oneStep = sorted(
        zip(classifierResult[1], classifierResult["actualValues"]),
        reverse=True
      )[0]
      oneStepConfidence1, oneStep1 = sorted(
        zip(classifierResult1[1], classifierResult1["actualValues"]),
        reverse=True
      )[0]

      oneStepConfidence2, oneStep2 = sorted(
        zip(classifierResult2[1], classifierResult2["actualValues"]),
        reverse=True
      )[0]
      print("1-step: {:16} ({:4.4}%)".format(oneStep, oneStepConfidence * 100))
      testing_time_end = time()
      print "Time testing", (testing_time_end - testing_time)
      results.append([oneStep])
      results1.append([oneStep1])

      results2.append([oneStep2])
      result_testing.append(oneStep)
      result_testing1.append(oneStep1)
      result_testing2.append(oneStep2)
      # result_testing7.append(oneStep7)

    with open('result_graphic.csv', 'w') as csv_file:
        csv_writer = csv.writer(csv_file)
        headers = ("prediction_1","event_1","prediction_3","event_3","prediction2","event2","encoding","bezline","flow","pressure","id","time")
        csv_writer.writerow(headers)

        for l in range(len(result_testing)):
            if result_testing[l] == 1:
                res5.append(1)
            else:
                res5.append(0)
            if sum_event[l] == 1:
                evnt5.append(-1)
            else:
                evnt5.append(0)
            if result_testing1[l] == 1:
                res3.append(3)
            else:
                res3.append(0)
            if sum_event3[l] == 1:
                evnt3.append(-3)
            else:
                evnt3.append(0)
            if result_testing2[l] == 1:
                res2.append(2)
            else:
                res2.append(0)
            if sum_event2[l] == 1:
                evnt2.append(-2)
            else:
                evnt2.append(0)
            print "Len prediction 1", len(res5)
            print "Len event 1", len(evnt5)
            print "Len prediction 3", len(res3)
            print "Len event 3", len(evnt3)
            print "Len prediction 2", len(res2)
            print "Len event 2", len(evnt2)
            print "Encoding ", len(encoding_csv)
            print "Len baseline", len(bezline)
            print "Len flow", len(flow_rate)
            print "Len pressure", len(pressure)
            csv_writer.writerow([res5[l],evnt5[l],res3[l],evnt3[l],res2[l],evnt2[l],encoding_csv[l],bezline[l],flow_rate[l],pressure[l]])
    testing_time_end = time()
    print "Time testing",(testing_time_end - testing_time)
    return results

if __name__ == "__main__":
  runTesting(4000)

rhyolight · June 21, 2018, 3:06pm

I’m sorry @Sergey but I don’t understand why you posted that code. I want to know about how well your model is behaving before and after you serialize / resurrect it from disk. We should be talking about the same code, not two versions of it.

Also, when I say model behavior, I am talking about how well it is doing what you want it to do. I am assuming the model prediction accuracy (anomaly score) changes drastically as soon as you resurrect the model and continue processing data?

Sergey · June 21, 2018, 5:55pm

I’m sorry, I didn’t understand the previous question. I hope you are not too angry with me. Before serialization I get about a hundred inferences, but after serialization I don’t get any inferences at all. I hope that answers your question.
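One detail in the posted scripts may be relevant here, although nothing in the thread confirms it as the cause: the loading code restores `sp2` and `tm2` from disk but constructs `classifier`, `classifier1`, and `classifier2` anew, and the testing script then calls them with `learn=False`. A classifier that has never been trained has nothing to predict from. The toy below illustrates only that general effect. `ToyClassifier` is a hypothetical stand-in, not NuPIC's `SDRClassifier`, and JSON is used purely to keep the sketch self-contained.

```python
import json


class ToyClassifier(object):
    """Hypothetical stand-in for a learned classifier (NOT a NuPIC class)."""

    def __init__(self):
        # Maps a pattern key such as "1,5,9" to {"1.0": times_seen}.
        self.counts = {}

    @staticmethod
    def _key(pattern):
        return ",".join(str(i) for i in sorted(pattern))

    def learn(self, pattern, value):
        per_value = self.counts.setdefault(self._key(pattern), {})
        label = repr(float(value))
        per_value[label] = per_value.get(label, 0) + 1

    def infer(self, pattern):
        """Return the most frequently seen value for pattern, or None."""
        per_value = self.counts.get(self._key(pattern))
        if not per_value:
            return None
        return float(max(per_value, key=per_value.get))

    def dumps(self):
        return json.dumps(self.counts)

    @classmethod
    def loads(cls, text):
        restored = cls()
        restored.counts = json.loads(text)
        return restored


# Training process: the classifier learns alongside the rest of the model.
trained = ToyClassifier()
for _ in range(3):
    trained.learn([1, 5, 9], 1.0)

# Inference process, variant A: learned state is restored from the saved text.
restored_prediction = ToyClassifier.loads(trained.dumps()).infer([1, 5, 9])
# Inference process, variant B: a brand-new classifier that never learns.
fresh_prediction = ToyClassifier().infer([1, 5, 9])

if __name__ == "__main__":
    print("restored: %r, fresh: %r" % (restored_prediction, fresh_prediction))
```

If this is what is happening, the classifiers would need to be saved and restored alongside the SP and TM, or trained again in the inference script before `learn` is turned off.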
