Serialization for htm-community/nupic.cpp (the Community version of nupic.core)

Guys, I would like your opinion.

Background: A few of us hackers are in the process of re-working the code in htm-community/nupic.cpp. The plan is to separate the C++ library from the Python so it can be the basis of C++-only implementations and provide a cleaner interface for language bindings. The third-party utility libraries are being replaced with C++17 std:: facilities, leaving no (or almost no) dependency libraries. The directive is to retain the documented API and not change any HTM algorithm logic.

The Python interface would be implemented in a separate parallel repository using pybind rather than SWIG, supporting Python 2.7 and Python 3.6+. Additional programming languages (such as CSharp) could be added as parallel repositories as needed without changing the core C++ library.

A few years ago capnproto was introduced into nupic as a way to do faster serialization. It did work, but it was very invasive and difficult to maintain. In an effort to simplify all of the modules, we are removing capnproto in favor of binary stream serialization (writing directly to a stream). This means the saved files will not be cross-platform compatible, but they should write and read as fast as capnproto without the complications.

My current plan is to have the Serializable class as a base class for all classes that use the streaming serialization. It contains the following virtual functions:

  • void saveToFile(path) – contains default code to create ostream and calls save()
  • void loadFromFile(path) – contains default code to create istream and calls load()
  • void save(ostream) – subclass must implement
  • void load(istream) – subclass must implement
  • void write(ostream) – calls save(ostream) for backward compatibility with capnproto
  • void read(istream) – calls load(istream) for backward compatibility with capnproto
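
For discussion, here is a minimal sketch of what such a base class might look like. It is only an illustration of the list above, not the code in the repository: the `const` qualifiers, the error handling, and the `Counter` subclass are assumptions added for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical sketch of the Serializable base class described above.
class Serializable {
public:
  virtual ~Serializable() = default;

  // Subclasses must implement these two.
  virtual void save(std::ostream &out) const = 0;
  virtual void load(std::istream &in) = 0;

  // Default helpers: open a binary file stream and delegate to save()/load().
  virtual void saveToFile(const std::string &path) const {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) throw std::runtime_error("cannot open for writing: " + path);
    save(out);
  }
  virtual void loadFromFile(const std::string &path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open for reading: " + path);
    load(in);
  }

  // Names kept for backward compatibility with the capnproto-era API.
  virtual void write(std::ostream &out) const { save(out); }
  virtual void read(std::istream &in) { load(in); }
};

// Hypothetical subclass, only here to show the round trip.
class Counter : public Serializable {
public:
  std::uint32_t value = 0;

  // Writes the raw bytes of the value: fast, but tied to the native byte
  // order, so the output is not portable across platforms.
  void save(std::ostream &out) const override {
    out.write(reinterpret_cast<const char *>(&value), sizeof(value));
  }
  void load(std::istream &in) override {
    in.read(reinterpret_cast<char *>(&value), sizeof(value));
    if (!in) throw std::runtime_error("truncated stream");
  }
};

// Round trip through an in-memory stream using the legacy names.
inline void roundTripDemo() {
  Counter a;
  a.value = 42;
  std::stringstream buf;
  a.write(buf);   // forwards to save()
  Counter b;
  b.read(buf);    // forwards to load()
  assert(b.value == 42);
}
```

Because write() and read() only forward to save() and load(), a subclass has just two functions to implement and test.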

The problem is with the Network Class. The API defines:

  • void save(path) – creates a Bundle directory, then writes the Network & Link classes to one file and each RegionImpl to its own file in the Bundle directory.
  • Network(path) – a constructor that loads from the Bundle directory and finds all of the parts.

It also has a write(ostream) and read(istream) for capnproto which streams everything to one file (or memory stream). The problem is that this file can get very large.

What I was thinking was to keep all four functions and perhaps, to be consistent with Serializable, add save(ostream) and load(istream) that do the same thing as write(ostream) and read(istream). Or should we break the API in this case and provide only the streaming save(ostream) and load(istream) for the Network object, with overloads of the << and >> operators?
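
To make the second option concrete, here is a rough sketch of a streaming-only interface with the operator overloads. The `Network` below is a hypothetical stand-in that stores only a list of region names; the real class holds far more state, and the length-prefixed layout is just an assumption for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the Network class; only region names are stored.
class Network {
public:
  std::vector<std::string> regionNames;

  // Streams everything as: count, then (length, bytes) for each name.
  void save(std::ostream &out) const {
    writeU64(out, regionNames.size());
    for (const auto &name : regionNames) {
      writeU64(out, name.size());
      out.write(name.data(), static_cast<std::streamsize>(name.size()));
    }
  }

  void load(std::istream &in) {
    regionNames.clear();
    const std::uint64_t count = readU64(in);
    for (std::uint64_t i = 0; i < count; ++i) {
      std::string name(static_cast<std::size_t>(readU64(in)), '\0');
      in.read(&name[0], static_cast<std::streamsize>(name.size()));
      if (!in) throw std::runtime_error("truncated Network stream");
      regionNames.push_back(std::move(name));
    }
  }

  // Legacy names could stay as thin forwarders if the API must not break.
  void write(std::ostream &out) const { save(out); }
  void read(std::istream &in) { load(in); }

private:
  // Raw native-endian integers: fast, but not portable across platforms.
  static void writeU64(std::ostream &out, std::uint64_t v) {
    out.write(reinterpret_cast<const char *>(&v), sizeof(v));
  }
  static std::uint64_t readU64(std::istream &in) {
    std::uint64_t v = 0;
    in.read(reinterpret_cast<char *>(&v), sizeof(v));
    if (!in) throw std::runtime_error("truncated Network stream");
    return v;
  }
};

// Stream operators simply forward to save()/load().
inline std::ostream &operator<<(std::ostream &out, const Network &net) {
  net.save(out);
  return out;
}
inline std::istream &operator>>(std::istream &in, Network &net) {
  net.load(in);
  return in;
}
```

With this shape, `out << net;` and `in >> net;` work against a file stream or a memory stream alike, and the path-based save(path) and Network(path) could be layered on top if the Bundle directory format is kept.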

What are your thoughts?

P.S. Anyone interested in helping with this project is welcome to participate.

I would be interested in helping out with the refactoring/reimplementation of the C++ library. Can you point me to the appropriate sites/resources where you are coordinating this effort?

Hi Eric,
Great to hear! :) Check out the nupic.cpp repo in htm-community on GitHub.

There’s lots going on and planned both in terms of programming and design.

Cheers, mark

Hi,

The core functions should not include serialization. It was a blocking point before. Serialization is a nice-to-have feature that made the whole code unusable.

The second point is that C++ is also a nice-to-have feature. But given the current (catastrophic) situation, I would second you on that.

So I suggest creating a brand new branch with a brand new architecture and a brand new software engineering team.

The main focus should be on algorithms and functions, not on data structures that are closely tied to everyday abstractions such as Cell, Region…

I would be interested in having a closer look, but only if real software engineering knowledge shows in the work, avoiding the superficial complexity that makes the current code base obsolete.

Best

Welcome.

Currently there are three of us working on htm-community/nupic.cpp:

@breznak, @Christian Henning, and myself, @David Keeney

breznak and I are trying to move code into github.com/htm-community/nupic.cpp, feature by feature, from prototypes that Christian and I developed (https://github.com/dkeeney/nupic.core/tree/base and https://github.com/chhenning/nupic). At each step we are testing and reviewing, and making some ‘adjustments’ as we go.

You may join us in conversations: https://github.com/htm-community/nupic.cpp/issues

Hi,
Many thanks for the links.

I would start at the beginning.

Here the SDR should be used intensively.

I have never seen any effort in NUPIC to produce an SDR representation on the stack.

People implement it using heap-allocated containers such as vector, map…

Now, for an SDR of, let’s say, 10000 elements, they expect to get speed by using pointers.

Would it help?
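
To show what I mean, here is a rough sketch of a fixed-size SDR held in a `std::bitset`, which mainstream standard library implementations store inline (about 1.25 KB for 10000 bits) with no heap allocation. The names `kSdrSize`, `StackSdr` and `overlap` are made up for this example and are not part of NuPIC.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical fixed SDR width, taken from the 10000-element example above.
constexpr std::size_t kSdrSize = 10000;

// One bit per element; the storage lives inside the object itself.
using StackSdr = std::bitset<kSdrSize>;

// Overlap score: the number of bits that are active in both SDRs.
inline std::size_t overlap(const StackSdr &a, const StackSdr &b) {
  return (a & b).count();
}

// Builds two SDRs as local (stack) variables and returns their overlap.
inline std::size_t overlapDemo() {
  StackSdr a;
  StackSdr b;
  a.set(3);
  a.set(500);
  a.set(9999);
  b.set(3);
  b.set(42);
  b.set(9999);
  return overlap(a, b);  // bits 3 and 9999 are shared
}
```

The trade-off is that the size has to be known at compile time, which may not fit every region.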

Best,

Are you volunteering to do this?
Remember that this is all volunteer work and these volunteers use the tools they know and they want to use.

Thank you for the links. I will take a look at the current status and see if there’s anything I can help you with.