Problem running newly saved / loaded SP & TM

serialization

#1

I have two scripts. One trains the model (Algorithms API); the second runs inference. But my trained model doesn’t get saved. I used this code to save the model:

with open("out_sp.tmp", "wb") as f1:
  sp.writeToFile(f1)
with open("out_tm.tmp", "wb") as f2:
  tm.writeToFile(f2)

And for loading I used this code:

eventEncoder = ScalarEncoder(name="event", w=7, n=14, minval=0, maxval=1,forced=True)
eventEncoder1 = ScalarEncoder(name="event1", w=7, n=14, minval=0, maxval=1,forced=True)
eventEncoder7 = ScalarEncoder(name="event7", w=7, n=14, minval=0, maxval=1,forced=True)
eventEncoder2 = ScalarEncoder(name="event2", w=7, n=14, minval=0, maxval=1,forced=True)
#eventEncoder2 = ScalarEncoder(name="event2", w=9, n=18, minval=0, maxval=1,forced=True)
baselineEncoder = ScalarEncoder(name = "baseline",w = 21, n = 315,minval= 49,maxval=64,forced= True)
pressEncoder = ScalarEncoder(name = "pressure",w = 21, n = 462,minval= 44,maxval=66,forced= True)

flowEncoder = ScalarEncoder(name="flow", w=11, n=143, minval=0, maxval=13,forced = True)
encodingWidth = (eventEncoder.getWidth()+flowEncoder.getWidth()+baselineEncoder.getWidth()
                 +eventEncoder1.getWidth()+flowEncoder.getWidth()+baselineEncoder.getWidth()
                 +eventEncoder2.getWidth()+flowEncoder.getWidth()+baselineEncoder.getWidth())

encodingWidth1 =(eventEncoder1.getWidth()
                 + flowEncoder.getWidth())

with open("out_sp.tmp", "rb") as f1:
  sp2 = SpatialPooler.readFromFile(f1)
with open("out_tm.tmp", "rb") as f2:
  tm2 = TemporalMemory.readFromFile(f2)
classifier = SDRClassifier(
  steps=[1], alpha=0.5050, verbosity=0
)
classifier1 = SDRClassifier(
  steps=[1], alpha=0.5050, verbosity=0
)
classifier2 = SDRClassifier(
  steps=[1], alpha=0.5050, verbosity=0
)

I hope for your help.
Thanks a lot.


#2

Are there errors? What happens when the script runs? If no file is created, there should be an error to stdout.


#3

The files were created. There were no errors. But I don’t get good predictions (inference). I can show my whole script. Would that help?
Thanks a lot for your help!


#4
  • What does your data look like?
  • What do your model and encoder configuration look like?

#5

This is my encoder configuration:

eventEncoder = ScalarEncoder(name="event", w=7, n=14, minval=0, maxval=1,forced=True)
eventEncoder1 = ScalarEncoder(name="event1", w=7, n=14, minval=0, maxval=1,forced=True)
eventEncoder7 = ScalarEncoder(name="event7", w=7, n=14, minval=0, maxval=1,forced=True)
eventEncoder2 = ScalarEncoder(name="event2", w=7, n=14, minval=0, maxval=1,forced=True)
#eventEncoder2 = ScalarEncoder(name="event2", w=9, n=18, minval=0, maxval=1,forced=True)
baselineEncoder = ScalarEncoder(name = "baseline",w = 21, n = 315,minval= 49,maxval=64,forced= True)
pressEncoder = ScalarEncoder(name = "pressure",w = 21, n = 462,minval= 44,maxval=66,forced= True)

flowEncoder = ScalarEncoder(name="flow", w=11, n=143, minval=0, maxval=13,forced = True)
encodingWidth = (eventEncoder.getWidth()+flowEncoder.getWidth()+baselineEncoder.getWidth()
                 +eventEncoder1.getWidth()+flowEncoder.getWidth()+baselineEncoder.getWidth()
                 +eventEncoder2.getWidth()+flowEncoder.getWidth()+baselineEncoder.getWidth())

This is my model:

     with open("test3.csv", "r") as fin:
        reader = csv.reader(fin)
        headers = reader.next()
        reader.next()
        reader.next()

        for count, record in enumerate(reader):
          print "Count",count
          if count >= numRecords: break

          # Convert data string into Python date object.
          #dateString = datetime.datetime.strptime(record[0], "%m/%d/%y %H:%M")
          # Convert data value string into float.
          event_value = float(record[2]) # device 1
          event_value_3 = float(record[4]) # device 3
          event_value_2 = float(record[3]) #device 2
          # event_value_7 = float(record[8]) # device 7
          bezline_all = float(record[10])
          pres_data    = float(record[11])
          flow_value  = float(record[0])
          # To encode, we need to provide zero-filled numpy arrays for the encoders
          # to populate.
          eventBits = numpy.zeros(eventEncoder.getWidth())
          eventBits_2 = numpy.zeros(eventEncoder2.getWidth())
          eventBits_3 = numpy.zeros(eventEncoder1.getWidth())
          presBits = numpy.zeros(pressEncoder.getWidth())

          baseline_Bits = numpy.zeros(baselineEncoder.getWidth())
          flowBits = numpy.zeros(flowEncoder.getWidth())


          # Now we call the encoders to create bit representations for each value.
          eventEncoder.encodeIntoArray(event_value, eventBits)
          eventEncoder1.encodeIntoArray(event_value_3,eventBits_3)
          eventEncoder2.encodeIntoArray(event_value_2,eventBits_2)
          pressEncoder.encodeIntoArray(pres_data,presBits)

          baselineEncoder.encodeIntoArray(bezline_all,baseline_Bits)
          flowEncoder.encodeIntoArray(flow_value, flowBits)


          # Concatenate all these encodings into one large encoding for Spatial
          # Pooling.
          encoding = numpy.concatenate(
            [eventBits,flowBits,baseline_Bits,eventBits_2,flowBits,baseline_Bits,eventBits_3,flowBits,baseline_Bits]
          )

          # Create an array to represent active columns, all initially zero. This
          # will be populated by the compute method below. It must have the same
          # dimensions as the Spatial Pooler.
          activeColumns = numpy.zeros(spParams["columnCount"])
          # activeColumns1 = numpy.zeros(spParams["columnCount"])


          # Execute Spatial Pooling algorithm over input space.

          sp.compute(encoding,True,activeColumns)

         # sp.compute(encoding1, True, activeColumns)

          activeColumnIndices = numpy.nonzero(activeColumns)[0]

          # Execute Temporal Memory algorithm over active mini-columns.
          tm.compute(activeColumnIndices, learn=True)

          activeCells = tm.getActiveCells()

          # Get the bucket info for this input value for classification.
          bucketIdx = eventEncoder.getBucketIndices(event_value)[0]
          bucketIdx_2 = eventEncoder2.getBucketIndices(event_value_2)[0]
          bucketIdx_3 = eventEncoder1.getBucketIndices(event_value_3)[0]



          # Run classifier to translate active cells back to scalar value.
          classifierResult = classifier.compute(
            recordNum=count,
            patternNZ=activeCells,
            classification={
              "bucketIdx": bucketIdx,
              "actValue": event_value
            },
            learn=True,
            infer=False
          )
          classifierResult1 = classifier1.compute(
            recordNum=count,
            patternNZ=activeCells,
            classification={
              "bucketIdx": bucketIdx_3,
              "actValue": event_value_3
            },
            learn=True,
            infer=False
          )

          classifierResult2 = classifier2.compute(
            recordNum=count,
            patternNZ=activeCells,
            classification={
              "bucketIdx": bucketIdx_2,
              "actValue": event_value_2
            },
            learn=True,
            infer=False
          )
          learning_time_end = time()
          # Elapsed time is end minus start (the original order gave a negative value).
          print "Time",(learning_time_end - learning_time)
      with open("out_sp.tmp", "wb") as f1:
        sp.writeToFile(f1)
      with open("out_tm.tmp", "wb") as f2:
        tm.writeToFile(f2)

My data looks like columns of numbers. Maybe I don’t understand your question about the data, sorry.


#6

Let me restate your words to better understand you. Tell me if I am wrong.

You have two types of data: binary and scalar. There are events that occur, which are either 0 or 1.
Your data is many scalar values over time, each encoded into a portion of a larger encoding that combines them. No time semantics are encoded. The model runs, is fed data, and makes predictions. You save the model. The model starts back up where it left off, but its behavior is different.

Questions:

  • How exactly is the model’s behavior different before and after serialization?
  • Is the model fed data at the same interval?
  • Are events continuous or do they only happen once?
  • When the model saves and comes back online, does the data continue where it left off? Or will there be gaps in the data while the model is offline?
  • What percentage of the input space is each field taking? Is each one getting enough space? Are the activations large enough that they make a difference to the SP?
  • Does human timekeeping make a difference at all? Are there periodic patterns you can mark against a calendar, like daily patterns?
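
For the input-space question above, a rough calculation may help. This is only a sketch: it assumes the encoder sizes (n and w) from the configuration posted earlier in this thread, and that each field appears three times in the concatenated encoding, as in the posted numpy.concatenate call. The function name input_space_shares is made up for illustration.

```python
# Encoder sizes copied from the configuration posted in this thread:
# n = total bits, w = active bits per encoded value.
ENCODERS = {
    "event": {"n": 14, "w": 7},
    "flow": {"n": 143, "w": 11},
    "baseline": {"n": 315, "w": 21},
}

# Assumption: each field contributes three times to the concatenated
# input (three event encoders, plus flow and baseline repeated three times).
REPEATS = 3


def input_space_shares(encoders, repeats):
    """Return {field: (share of total bits, share of active bits)}."""
    total_bits = sum(e["n"] for e in encoders.values()) * repeats
    total_active = sum(e["w"] for e in encoders.values()) * repeats
    shares = {}
    for name, e in encoders.items():
        shares[name] = (
            float(e["n"] * repeats) / total_bits,
            float(e["w"] * repeats) / total_active,
        )
    return shares


shares = input_space_shares(ENCODERS, REPEATS)
for name in sorted(shares):
    bit_share, active_share = shares[name]
    print("%-8s %5.1f%% of input bits, %5.1f%% of active bits"
          % (name, 100 * bit_share, 100 * active_share))
```

With these numbers the three event fields together take 42 of 1416 input bits (about 3%), while the baseline field takes about two thirds.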

In any case, each “event” could be better represented as a category. If you answer my questions I’ll keep trying to help. :)
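
To illustrate what "represented as a category" could mean, here is a toy sketch. It is not NuPIC's own category encoder; encode_category and the "off"/"on" labels are hypothetical names chosen for this example. Each category simply gets its own block of w bits, so two different categories never share an active bit.

```python
def encode_category(value, categories, w=7):
    """Toy category encoder: each category owns its own block of w bits."""
    if value not in categories:
        raise ValueError("unknown category: %r" % (value,))
    bits = [0] * (w * len(categories))
    start = categories.index(value) * w
    for i in range(start, start + w):
        bits[i] = 1
    return bits


# Hypothetical labels standing in for the event values 0 and 1.
EVENT_STATES = ["off", "on"]

off_bits = encode_category("off", EVENT_STATES)
on_bits = encode_category("on", EVENT_STATES)

# Count positions that are active in both encodings.
overlap = sum(a & b for a, b in zip(off_bits, on_bits))
print(off_bits)
print(on_bits)
print("shared active bits: %d" % overlap)
```

The point of the design is that categories carry no notion of "closeness", so their encodings should not overlap at all.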


#7
  1. 20,000 iterations before serialization and 5,000 after.
  2. Now, yes. The model is fed at the same interval.
  3. Events are continuous.
  4. No, the model is trained once, and comes back only to make inferences.
  5. The model works just fine before saving, so my question relates only to the issue I have while saving/restoring my model.
  6. No, only the sequence of values matters in this case.

#8

What version of NuPIC are you using? Be sure you are using 1.0.5.


#9

Requirement already satisfied: nupic in /home/japanes/calc/venv/lib/python2.7/site-packages (1.0.5)


#10

Please double check that you have nupic.bindings==1.0.6 by running pip list | grep nupic.


#11

nupic 0.8.0
nupic-studio 1.1.3 /usr/local/lib/python2.7/dist-packages
nupic.bindings 0.7.0

Wow. That is very strange :(
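
The mismatch above can also be checked with a few lines of Python. This is a sketch only: it assumes the "name version" layout that pip list printed here (other pip versions format the list differently), and the helper names parse_pip_list and version_mismatches are made up for this example. The expected versions are the ones requested earlier in the thread.

```python
# Versions requested earlier in this thread.
EXPECTED = {"nupic": "1.0.5", "nupic.bindings": "1.0.6"}


def parse_pip_list(text):
    """Map package name -> version from 'name version [location]' lines."""
    versions = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            versions[parts[0]] = parts[1]
    return versions


def version_mismatches(installed, expected):
    """Return {name: (installed version or None, expected version)}."""
    return {name: (installed.get(name), wanted)
            for name, wanted in expected.items()
            if installed.get(name) != wanted}


# The output pasted in the post above.
sample = """nupic 0.8.0
nupic-studio 1.1.3 /usr/local/lib/python2.7/dist-packages
nupic.bindings 0.7.0"""

mismatches = version_mismatches(parse_pip_list(sample), EXPECTED)
for name in sorted(mismatches):
    found, wanted = mismatches[name]
    print("%s: found %s, expected %s" % (name, found, wanted))
```

Since an earlier pip message pointed at a virtualenv path while this list shows /usr/local, it may also be worth confirming that pip and the script use the same Python environment.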


#12

OK, please uninstall nupic completely, then reinstall with pip and try again.


#13

I tried again, but nothing changed.


#14

What does pip list | grep nupic say you have installed now?


#15

nupic 1.0.5
nupic.bindings 1.0.6


#16

That looks right. I still don’t quite understand the problem. Can you explain more how the model’s behavior is different after a save? Do the anomaly scores change drastically? Was the model performing well before the save? If so, how do you define “well”?
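
One possible way to put a number on "well", offered as a sketch rather than the way NuPIC measures it: count how often the one-step prediction made at step t equals the actual value at step t + 1, once for a run before saving and once for a run after loading. The function name one_step_accuracy and all values below are made up for illustration.

```python
def one_step_accuracy(predictions, actuals):
    """Fraction of steps where predictions[t] equals actuals[t + 1]."""
    pairs = list(zip(predictions[:-1], actuals[1:]))
    if not pairs:
        return 0.0
    hits = sum(1 for predicted, actual in pairs if predicted == actual)
    return float(hits) / len(pairs)


# Made-up example data: actual event values and two sets of predictions.
actuals = [0, 0, 1, 1, 0]
before_save = [0, 1, 1, 0, 0]   # hypothetical predictions before saving
after_load = [0, 0, 0, 0, 0]    # hypothetical predictions after loading

print("before save: %.2f" % one_step_accuracy(before_save, actuals))
print("after load:  %.2f" % one_step_accuracy(after_load, actuals))
```

Comparing the two numbers on the same stretch of data would show whether the behavior really changes at the save/load boundary.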


#17

My script before serialization:

> def runLearning(numRecords):
>  
>   learning_time = time()
>   with open("test3.csv", "r") as fin:
>     reader = csv.reader(fin)
>     headers = reader.next()
>     reader.next()
>     reader.next()
> 
>     for count, record in enumerate(reader):
>       print "Count",count
>       if count >= numRecords: break
> 
>       # Convert data string into Python date object.
>       #dateString = datetime.datetime.strptime(record[0], "%m/%d/%y %H:%M")
>       # Convert data value string into float.
>       event_value = float(record[2]) # device 1
>       event_value_3 = float(record[4]) # device 3
>       event_value_2 = float(record[3]) #device 2
>       # event_value_7 = float(record[8]) # device 7
>       bezline_all = float(record[10])
>       pres_data    = float(record[11])
>       flow_value  = float(record[0])
>       # To encode, we need to provide zero-filled numpy arrays for the encoders
>       # to populate.
>       eventBits = numpy.zeros(eventEncoder.getWidth())
>       eventBits_2 = numpy.zeros(eventEncoder2.getWidth())
>       eventBits_3 = numpy.zeros(eventEncoder1.getWidth())
>       presBits = numpy.zeros(pressEncoder.getWidth())
> 
>       baseline_Bits = numpy.zeros(baselineEncoder.getWidth())
>       flowBits = numpy.zeros(flowEncoder.getWidth())
> 
> 
>       # Now we call the encoders to create bit representations for each value.
>       eventEncoder.encodeIntoArray(event_value, eventBits)
>       eventEncoder1.encodeIntoArray(event_value_3,eventBits_3)
>       eventEncoder2.encodeIntoArray(event_value_2,eventBits_2)
>       pressEncoder.encodeIntoArray(pres_data,presBits)
> 
>       baselineEncoder.encodeIntoArray(bezline_all,baseline_Bits)
>       flowEncoder.encodeIntoArray(flow_value, flowBits)
> 
> 
>       # Concatenate all these encodings into one large encoding for Spatial
>       # Pooling.
>       encoding = numpy.concatenate(
>         [eventBits,flowBits,baseline_Bits,eventBits_2,flowBits,baseline_Bits,eventBits_3,flowBits,baseline_Bits]
>       )
> 
>       # Create an array to represent active columns, all initially zero. This
>       # will be populated by the compute method below. It must have the same
>       # dimensions as the Spatial Pooler.
>       activeColumns = numpy.zeros(spParams["columnCount"])
>       # activeColumns1 = numpy.zeros(spParams["columnCount"])
> 
> 
>       # Execute Spatial Pooling algorithm over input space.
> 
>       sp.compute(encoding,True,activeColumns)
> 
>      # sp.compute(encoding1, True, activeColumns)
> 
>       activeColumnIndices = numpy.nonzero(activeColumns)[0]
> 
>       # Execute Temporal Memory algorithm over active mini-columns.
>       tm.compute(activeColumnIndices, learn=True)
> 
>       activeCells = tm.getActiveCells()
> 
>       # Get the bucket info for this input value for classification.
>       bucketIdx = eventEncoder.getBucketIndices(event_value)[0]
>       bucketIdx_2 = eventEncoder2.getBucketIndices(event_value_2)[0]
>       bucketIdx_3 = eventEncoder1.getBucketIndices(event_value_3)[0]
> 
> 
> 
>       # Run classifier to translate active cells back to scalar value.
>       classifierResult = classifier.compute(
>         recordNum=count,
>         patternNZ=activeCells,
>         classification={
>           "bucketIdx": bucketIdx,
>           "actValue": event_value
>         },
>         learn=True,
>         infer=False
>       )
>       classifierResult1 = classifier1.compute(
>         recordNum=count,
>         patternNZ=activeCells,
>         classification={
>           "bucketIdx": bucketIdx_3,
>           "actValue": event_value_3
>         },
>         learn=True,
>         infer=False
>       )
> 
>       classifierResult2 = classifier2.compute(
>         recordNum=count,
>         patternNZ=activeCells,
>         classification={
>           "bucketIdx": bucketIdx_2,
>           "actValue": event_value_2
>         },
>         learn=True,
>         infer=False
>       )
>       learning_time_end = time()
>       print "Time",(learning_time_end - learning_time)
>   with open("out_sp.tmp", "wb") as f1:
>     sp.writeToFile(f1)
>   with open("out_tm.tmp", "wb") as f2:
>     tm.writeToFile(f2)
> 
> if __name__ == "__main__":
>   runLearning(20000)

#18

My script after serialization:

def runTesting(numRecords):
  testing_time = time()
  global  result_testing,oneStep,result_testing1,oneStep1,result_testing7,oneStep7,result_testing2,oneStep2
  with open("test3.csv", "r") as fin:
    reader = csv.reader(fin)
    headers = reader.next()
    reader.next()
    reader.next()

    for count, record in enumerate(reader):
      print "Testing count",count
      if count >= numRecords: break

      # Convert data string into Python date object.
      #dateString = datetime.datetime.strptime(record[0], "%m/%d/%y %H:%M")
      # Convert data value string into float.
      priv = count
      event_value = result_testing[count]
      event_value_2 = result_testing2[count]
      event_value_3 = result_testing1[count]
      # event_value_7 = result_testing7[count]
      pres_data = float(record[11])
      bezline_all = float(record[10])
      flow_value  = float(record[0])

      # bezline = float(record[10])
      # encoding = float[record[9]]
      # To encode, we need to provide zero-filled numpy arrays for the encoders
      # to populate.
      eventBits = numpy.zeros(eventEncoder.getWidth())
      eventBits_2 = numpy.zeros(eventEncoder2.getWidth())
      eventBits_3 = numpy.zeros(eventEncoder1.getWidth())
      # eventBits_7 = numpy.zeros(eventEncoder7.getWidth())
      presBits = numpy.zeros(pressEncoder.getWidth())
      flowBits = numpy.zeros(flowEncoder.getWidth())
      baseline_Bits = numpy.zeros(baselineEncoder.getWidth())

      # Now we call the encoders to create bit representations for each value.
      eventEncoder.encodeIntoArray(event_value, eventBits)
      eventEncoder2.encodeIntoArray(event_value_2, eventBits_2)
      eventEncoder1.encodeIntoArray(event_value_3,eventBits_3)
      # eventEncoder7.encodeIntoArray(event_value_7, eventBits_7)

      baselineEncoder.encodeIntoArray(bezline_all, baseline_Bits)
      flowEncoder.encodeIntoArray(flow_value, flowBits)
      pressEncoder.encodeIntoArray(pres_data,presBits)

      # Concatenate all these encodings into one large encoding for Spatial
      # Pooling.
      encoding = numpy.concatenate(
        [eventBits,flowBits,baseline_Bits,eventBits_2,flowBits,baseline_Bits,eventBits_3,flowBits,baseline_Bits]
      )


      # enc = numpy.concatenate(encoding, encoding)
      # Create an array to represent active columns, all initially zero. This
      # will be populated by the compute method below. It must have the same
      # dimensions as the Spatial Pooler.


      colum_count = sp2.getColumnDimensions()
      print "Column count:", colum_count
      activeColumns = numpy.zeros(colum_count)


      # Execute Spatial Pooling algorithm over input space.
      sp2.compute(encoding, False, activeColumns)


      activeColumnIndices = numpy.nonzero(activeColumns)[0]


      # Execute Temporal Memory algorithm over active mini-columns.
      tm2.compute(activeColumnIndices, learn=False)

      activeCells = tm2.getActiveCells()

      # Get the bucket info for this input value for classification.
      bucketIdx = eventEncoder.getBucketIndices(event_value)[0]
      bucketIdx_2 = eventEncoder2.getBucketIndices(event_value_2)[0]
      bucketIdx_3 = eventEncoder1.getBucketIndices(event_value_3)[0]
      # bucketIdx_7 = eventEncoder7.getBucketIndices(event_value_7)[0]


      # Run classifier to translate active cells back to scalar value.
      classifierResult = classifier.compute(
        recordNum=count+20000,
        patternNZ=activeCells,
        classification={
          "bucketIdx": bucketIdx,
          "actValue": event_value
        },
        learn=False,
        infer=True
      )
      classifierResult1 = classifier1.compute(
        recordNum=count+ 20000,
        patternNZ= activeCells,
        classification={
          "bucketIdx": bucketIdx_3,
          "actValue": event_value_3
        },
        learn=False,
        infer=True
      )

      classifierResult2 = classifier2.compute(
        recordNum=count+ 20000,
        patternNZ= activeCells,
        classification={
          "bucketIdx": bucketIdx_2,
          "actValue": event_value_2
        },
        learn=False,
        infer=True
      )
      # Print the best prediction for 1 step out.
      oneStepConfidence, oneStep = sorted(
        zip(classifierResult[1], classifierResult["actualValues"]),
        reverse=True
      )[0]
      oneStepConfidence1, oneStep1 = sorted(
        zip(classifierResult1[1], classifierResult1["actualValues"]),
        reverse=True
      )[0]

      oneStepConfidence2, oneStep2 = sorted(
        zip(classifierResult2[1], classifierResult2["actualValues"]),
        reverse=True
      )[0]
      print("1-step: {:16} ({:4.4}%)".format(oneStep, oneStepConfidence * 100))
      testing_time_end = time()
      print "Time testing", (testing_time_end - testing_time)
      results.append([oneStep])
      results1.append([oneStep1])

      results2.append([oneStep2])
      result_testing.append(oneStep)
      result_testing1.append(oneStep1)
      result_testing2.append(oneStep2)
      # result_testing7.append(oneStep7)

    with open('result_graphic.csv', 'w') as csv_file:
        csv_writer = csv.writer(csv_file)
        headers = ("prediction_1","event_1","prediction_3","event_3","prediction2","event2","encoding","bezline","flow","pressure","id","time")
        csv_writer.writerow(headers)

        for l in range(len(result_testing)):
            if result_testing[l] == 1:
                   res5.append(1)
            else :
                    res5.append(0)
            if sum_event[l] == 1:
                 evnt5.append(-1)
            else :
                evnt5.append(0)
            if result_testing1[l] == 1:
                res3.append(3)
            else:
                res3.append(0)
            if sum_event3[l] == 1:
                evnt3.append(-3)
            else:
                evnt3.append(0)
            if result_testing2[l] == 1:
                res2.append(2)
            else:
                res2.append(0)
            if sum_event2[l] == 1:
                evnt2.append(-2)
            else:
                evnt2.append(0)
            print "Len prediction 1", len(res5)
            print "Len event 1", len(evnt5)
            print "Len prediction 3", len(res3)
            print "Len event 3", len(evnt3)
            print "Len prediction 2", len(res2)
            print "Len event 2", len(evnt2)
            print  "Encoding ", len(encoding_csv)
            print "Len baseline", len(bezline)
            print "Len flow", len(flow_rate)
            print "Len pressure", len(pressure)
            csv_writer.writerow([res5[l],evnt5[l],res3[l],evnt3[l],res2[l],evnt2[l],encoding_csv[l],bezline[l],flow_rate[l],pressure[l]])
    testing_time_end = time()
    print "Time testing",(testing_time_end - testing_time)
    return results

if __name__ == "__main__":
  runTesting(4000)

#19

I’m sorry @Sergey but I don’t understand why you posted that code. I want to know about how well your model is behaving before and after you serialize / resurrect it from disk. We should be talking about the same code, not two versions of it.

Also, when I say model behavior, I am talking about how well it is doing what you want it to do. I am assuming the model prediction accuracy (anomaly score) changes drastically as soon as you resurrect the model and continue processing data?


#20

I’m sorry, I didn’t understand the previous question. I hope you are not very angry with me :) Before serialization I get about a hundred inferences. But after serialization I don’t get any inferences at all. I hope that answers your question.
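
One thing that stands out in the two scripts, offered as a guess rather than a confirmed diagnosis: the training script writes only sp and tm to disk, while the inference script constructs new SDRClassifier objects and calls them with learn=False. A classifier that has never learned has nothing to infer from. Whether and how NuPIC's SDRClassifier can be serialized is not covered in this thread, so the sketch below uses a hypothetical ToyClassifier and Python's pickle module purely to show the effect.

```python
import pickle


class ToyClassifier(object):
    """Stand-in for a learned component; NOT NuPIC's SDRClassifier."""

    def __init__(self, counts=None):
        # pattern (tuple of active cells) -> {value: times seen}
        self.counts = counts if counts is not None else {}

    def learn(self, pattern, value):
        seen = self.counts.setdefault(tuple(pattern), {})
        seen[value] = seen.get(value, 0) + 1

    def infer(self, pattern):
        seen = self.counts.get(tuple(pattern))
        if not seen:
            return None  # nothing was ever learned for this pattern
        return max(seen, key=seen.get)


# "Training script": learn, then persist the learned state.
trained = ToyClassifier()
trained.learn([1, 5, 9], 1.0)
saved_bytes = pickle.dumps(trained.counts, protocol=2)

# "Inference script", variant 1: a freshly constructed classifier.
fresh = ToyClassifier()

# "Inference script", variant 2: a classifier rebuilt from the saved state.
restored = ToyClassifier(pickle.loads(saved_bytes))

print(fresh.infer([1, 5, 9]))     # None
print(restored.infer([1, 5, 9]))  # 1.0
```

If this is what is happening, every component that learns during training (here the three classifiers as well as sp and tm) would need to be saved and restored, or the classifiers retrained in the second script.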