Requirements from Serialization

Currently:

  • nupic.core and nupic provide Cap'n Proto serialization for most classes/regions. There is also YAML (where is it used?)

@chhenning temporarily removed the feature in his fork because it caused code complications.

So, do we need serialization? What for? Which framework to use?

Yes, but for me the current approach is too complicated.
My use cases:

  1. User runs an HTM model, serializes it, and continues the next day.
  2. User trains an HTM model, then reloads the saved model and re-runs it on multiple (test) cases (sketched below).
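A rough sketch of that save-and-continue workflow, assuming a hypothetical Model class with stream-based save/load methods (all names here are illustrative, not an existing nupic API):

```cpp
#include <fstream>
#include <string>

// Hypothetical model class -- the method names are illustrative only,
// not an existing nupic.cpp API.
class Model {
public:
  void train(const std::string &inputFile) { /* learn from inputFile ... */ }
  void run(const std::string &inputFile)   { /* infer on inputFile ... */ }
  void save(std::ostream &out) const       { /* write model state ... */ }
  void load(std::istream &in)              { /* read model state ... */ }
};

int main() {
  // Day 1: run/train and serialize to disk.
  {
    Model m;
    m.train("monday_data.csv");
    std::ofstream out("model.bin", std::ios::binary);
    m.save(out);
  }

  // Day 2 (or each test case): restore the model and continue.
  {
    Model m;
    std::ifstream in("model.bin", std::ios::binary);
    m.load(in);
    m.run("tuesday_data.csv");
  }
  return 0;
}
```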

EDIT: Poll

  • cross platform (Linux, Win, OSX)
  • cross languages (C++, Py,…)
  • little intrusiveness (capnp introduces a lot of code and compiler warnings)
  • low complexity (e.g. not compiling *.capnp files)
  • header only (cpp only)
  • c++11 or later
  • text (zipped)
  • binary
  • fast dump (aka memory mapped file; speed)
  • I don’t need Serialization
  • any working (basic) serialization is OK
  • commonly used (high user base)

So no cross platform?

Cross-platform would be nice to have, but probably not necessary for me; I don’t know what others need.
In a future-proof vision it’s a solid requirement, since you could train models on a cloud/farm and then distribute them.

Why does it matter? Do you have a framework that is platform dependent?

What do you mean by framework?
The easiest and fastest would be a shared memory file. Basically just dump the current process memory and reload when necessary. Obviously that’s not cross platform.
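For illustration, a minimal POSIX-only sketch of that idea (the struct and file name are invented; this only works for trivially copyable, pointer-free state, and, as noted, it is not cross platform):

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Invented example of a pointer-free, trivially copyable blob of state.
struct ModelState {
  float permanences[1024];
  int   iteration;
};

int main() {
  int fd = open("state.mmap", O_RDWR | O_CREAT, 0644);
  if (fd < 0) return 1;
  if (ftruncate(fd, sizeof(ModelState)) != 0) return 1;  // size the backing file

  // Map the file into memory; writes end up on disk.
  void *addr = mmap(nullptr, sizeof(ModelState),
                    PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  if (addr == MAP_FAILED) return 1;

  ModelState *state = static_cast<ModelState *>(addr);
  state->iteration += 1;                      // mutate the "persistent" state in place

  msync(addr, sizeof(ModelState), MS_SYNC);   // flush to disk
  munmap(addr, sizeof(ModelState));
  close(fd);
  return 0;
}
```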

Moving this here.

Also,

What do you mean by framework?

Cereal, capnp, yaml …serialization libraries.

Should we vote on the following features:

  • cross platform
  • cross languages
  • intrusiveness (capnp introduces a lot of code and compiler warnings)
  • complexity (e.g. compiling *.capnp files)
  • header only (cpp only)
  • c++11 or later
  • text (zipped) or binary
  • fast dump (aka memory mapped file)

YAML and Cap'n Proto are both implemented throughout the core, side by side. YAML is text based (think of it as a superset of JSON), while Cap'n Proto is a binary serialization. Both work as far as I know, with YAML being a little slower. The Zip packaging is there as well, I think; I know we link with it.

Both are cross platform and cross language. Neither is header only, but I don’t think it’s worth the trouble to rip out both of what we have and implement a new one. Besides, a header-only implementation would only work for C++.
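To make the text-based side concrete, here is a minimal round-trip sketch with yaml-cpp (I am assuming that is the YAML library being linked; the keys are invented):

```cpp
#include <yaml-cpp/yaml.h>
#include <iostream>
#include <sstream>

int main() {
  // Serialize: build a YAML node and emit it as human-readable text.
  YAML::Node node;              // the keys below are invented for illustration
  node["numColumns"] = 2048;
  node["sparsity"]   = 0.02;

  std::stringstream text;
  text << node;                 // text output, diff-able and editable by hand

  // Deserialize: parse the text back and read the fields.
  YAML::Node loaded = YAML::Load(text.str());
  std::cout << loaded["numColumns"].as<int>() << "\n";
  return 0;
}
```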

The idea of a memory mapped file is one that I have been thinking about as well. All data is automatically saved, even if the program crashes. No startup time and no shutdown time. But execution might be slower, since the disk I/O has to happen in the background.

This would require structuring our classes quite differently. For example, we could not allocate each object from the heap (i.e. we cannot use new). We would have to allocate a fixed block of space, map it to disk, and then write our own memory allocation for classes on top of that block. This is doable, and there are probably tools out there for it, but it would take some re-thinking of the algorithms.
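A very rough sketch of the “allocate one fixed block and construct objects inside it instead of using new” idea, as a toy bump allocator; in the memory-mapped variant the block would be the mmap’d region, and the stored objects could not contain raw pointers (offsets only), which is part of the re-thinking mentioned above:

```cpp
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Toy bump allocator over one fixed block.  In the memory-mapped variant,
// the block would live in the mmap'd file instead of a std::vector.
class ArenaAllocator {
public:
  explicit ArenaAllocator(std::size_t bytes) : block(bytes), offset(0) {}

  void *allocate(std::size_t size, std::size_t align) {
    std::size_t aligned = (offset + align - 1) & ~(align - 1);  // power-of-two align
    if (aligned + size > block.size()) throw std::bad_alloc();
    offset = aligned + size;
    return block.data() + aligned;
  }

private:
  std::vector<std::uint8_t> block;
  std::size_t offset;
};

struct Cell { int state; float permanence; };   // invented example class

int main() {
  ArenaAllocator arena(1 << 20);                // one fixed 1 MiB block

  // Construct in place with placement new instead of heap allocation.
  void *mem  = arena.allocate(sizeof(Cell), alignof(Cell));
  Cell *cell = new (mem) Cell{1, 0.5f};

  cell->~Cell();                                // manual destruction
  return 0;
}
```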

However, I don’t think this level of ‘saving’ is needed at this point. Perhaps when the algorithms mature to the point where we can actually write a program that implements a true AGI, it will become important. We are a long way from that point.


Added a beautiful poll, go for it :wink:

@David_Keeney What is an AGI?

Artificial Generic Intel… human like and above.

Artificial General Intelligence (https://en.wikipedia.org/wiki/Artificial_general_intelligence)

Came across Google’s FlatBuffers. I think it’s worth looking into.

http://google.github.io/flatbuffers/

So we ended up implementing a simpler, C++-only, cross-platform serialization using bitstreams (and ditching capnp). It is quite easy to implement serialization for a new class with this framework. See https://github.com/htm-community/nupic.cpp/blob/master/src/nupic/types/Serializable.hpp for the Serializable interface that is required.
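For reference, a minimal sketch of what implementing such a stream-based interface can look like; the method names and signatures here are an assumption, so check the linked Serializable.hpp for the actual contract:

```cpp
#include <iostream>
#include <sstream>

// Hedged sketch of a stream-based serialization interface; the real
// interface lives in Serializable.hpp in the repo and may differ.
class Serializable {
public:
  virtual void save(std::ostream &out) const = 0;
  virtual void load(std::istream &in) = 0;
  virtual ~Serializable() = default;
};

class Counter : public Serializable {
public:
  void save(std::ostream &out) const override { out << count << " "; }
  void load(std::istream &in) override        { in >> count; }
  int count = 0;
};

int main() {
  Counter a;
  a.count = 42;

  std::stringstream stream;        // any std::iostream works: file, string, ...
  a.save(stream);

  Counter b;
  b.load(stream);                  // b.count is now 42
  std::cout << b.count << "\n";
  return 0;
}
```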