Serialization sizes :(

We are using capnproto to serialize the HTMPredictionModel with the method below:

def writeToCheckpoint(self, checkpointDir):
    """Serializes model using capnproto and writes data to ``checkpointDir``"""
    proto = self.getSchema().new_message()

    self.write(proto)

    checkpointPath = self._getModelCheckpointFilePath(checkpointDir)

    # Clean up old saved state, if any
    if os.path.exists(checkpointDir):
      if not os.path.isdir(checkpointDir):
        raise Exception(("Existing filesystem entry <%s> is not a model"
                         " checkpoint -- refusing to delete (not a directory)") \
                          % checkpointDir)
      if not os.path.isfile(checkpointPath):
        raise Exception(("Existing filesystem entry <%s> is not a model"
                         " checkpoint -- refusing to delete"\
                         " (%s missing or not a file)") % \
                          (checkpointDir, checkpointPath))

      shutil.rmtree(checkpointDir)

    # Create a new directory for saving state
    self.__makeDirectoryFromAbsolutePath(checkpointDir)

    with open(checkpointPath, 'wb') as f:
      proto.write(f)
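
For context, this is roughly how we call it and measure the size on disk (a minimal sketch; the checkpoint directory path is hypothetical, and since the checkpoint file name is an internal detail we just sum everything under the directory):

import os

checkpointDir = "/tmp/htm-checkpoint"  # hypothetical location
model.writeToCheckpoint(checkpointDir)

# Sum the size of everything written under the checkpoint directory
totalBytes = sum(
    os.path.getsize(os.path.join(root, name))
    for root, _, names in os.walk(checkpointDir)
    for name in names)
print("checkpoint size: %.2f MB" % (totalBytes / 1e6))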

When we look at the size of the serialized files, the total is over 9 MB; if we use to_bytes() instead of to_bytes_packed(), the size is 27 MB.
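
For reference, this is roughly how we compared the two encodings (a sketch using the standard pycapnp message API; `model` is the HTMPredictionModel instance from above):

# Build the capnp message the same way writeToCheckpoint() does,
# then compare the packed and unpacked encodings.
proto = model.getSchema().new_message()
model.write(proto)

unpacked = proto.to_bytes()
packed = proto.to_bytes_packed()
print("to_bytes():        %.2f MB" % (len(unpacked) / 1e6))
print("to_bytes_packed(): %.2f MB" % (len(packed) / 1e6))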

Can anyone explain why the serialized buffer is so large? I just want to point this out in case something is wrong. Can anyone confirm this is working the way it’s supposed to?


What are the model parameters you used to create this model? And how many rows of input data has it seen?

We are using the same model params as the hot gym example. We also used swarming and tried a lot of different model params; they don’t seem to change the resulting size.

But to clarify, we get the same results with the hot gym params. If this isn’t working the way it’s supposed to, please let me know and I will start looking into that code too.

It still depends on how much data the model has seen. It doesn’t surprise me that the model is that large if it has been running for a long time and has seen a lot of data. If that is unacceptable, you might be able to trim some segments before serialization and still retain the behavior you want.

The size we measured was after the model had computed only 300 samples.

I have also found that the serialized models are too large.
You can run a test: compress the model. If it gets much smaller, it’s NuPIC’s fault; if it stays roughly the same size, there is nothing you can do and the data is just that large.

@breznak

I was curious, so I compressed the raw bytes with zlib. An 8.99 MB model was compressed to 1.91 MB.
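
Roughly what I ran (a quick sketch; `proto` is the capnp message built as in the snippet above):

import zlib

# Compress the packed bytes to see how much redundancy they contain
raw = proto.to_bytes_packed()
compressed = zlib.compress(raw)
print("raw:        %.2f MB" % (len(raw) / 1e6))
print("compressed: %.2f MB" % (len(compressed) / 1e6))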

What do you think @scott?

To be fair, if it’s a new model, there are going to be a lot of zeros. Try it with a model that has learned (300 to 1000 records)…


I just talked to Scott, and he thinks that 10 MB is a typical size.