2014 January Core API Meeting Notes

This meeting was called to discuss moving forward with an extraction of the “core” NuPIC API C++ codebase from the current nupic repository, which was the #1 request of the NuPIC community when polled.

Goals of the Meeting

  • Define the technical tasks required to identify and extract the core components within NuPIC
  • Identify concrete, tangible tasks for advancing the API and extracting nupic-core

In Attendance

  • Subutai Ahmad
  • Scott Purdy
  • Austin Marshall
  • Matt Taylor

Terms

  • nupic-core: refers to the C++ codebase within NuPIC, which will be extracted into its own repository and contain its own API
  • nupic: the current NuPIC codebase as it is today, containing all C++ and Python code.

Topics Discussed

  • Extraction of C++ Core into nupic-core
  • Definition of nupic-core API
  • Renaming terms
  • How to handle encoders during extraction
  • Standardized serialization
  • CMake

nupic-core Defined

CLA as a technology is applicable to many industries and problem domains. The goal of nupic-core is to support as many environments, languages, and platforms as possible. As such, nupic-core should contain only C++ code. It will contain reference implementations of the core algorithms within NuPIC. It should have as few external dependencies as possible (apr and boost are two that we rely on today). There will be a high bar for code changes within this repository, as it will be the central engine of NuPIC, and any change (no matter how insignificant) could affect the performance of the CLA.

nupic-core will be designed so that bindings for other languages can be created easily. nupic-core itself, though, will not contain these bindings.

C++ Core Extraction

Current “core”

C++ Core currently contains:

  • support routines (sparse matrix libraries, OS-independent code such as a timer class, etc.)
  • reference spatial pooler implementation
  • Network API (Link, Region, etc.). These are described here

Some things that currently exist in Python code should eventually be ported into nupic-core as pure C++:

  • reference “temporal pooler” implementation
  • core encoders: some of them are general enough to exist within nupic-core, and will likely be reused by many client projects.
  • OPF and CLA Model (??), currently in Python

Details of the Initial Extraction

  • temporal pooler would still be in Python
  • build system would be more complex because of dependency issues
    • how do we update CI and define a passing build?
    • Python codebase (nupic) will still need to compile nupic-core and run integration tests against it
  • nupic-core needs its own set of tests around the official API
  • Could be done in a few days
  • current nta dir has all C++ (the core)
  • Python bindings are in lang/py

See NuPIC Core Extraction Plan for details.

nupic-core Extensibility

Users can provide their own TP or SP in whatever language they wish.

nupic-core API

There will be both a “high-level” and “low-level” API for nupic-core. At the high level, users will be able to define their own Network with the Network API. But for users who don’t require hierarchy or don’t want to use the Network API, there will also be a lower-level API with direct access to CLA components like SP and TP.

This API should support the use cases of the entire community, including distributed computation and flow-based programming models. Anyone should be able to create their own nupic-core client in C++, or create language bindings for different runtimes.
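To make the two levels concrete, here is a minimal sketch in Python. It is purely illustrative: ToySpatialPooler, ToyNetwork, add_region and the modulo-based overlap rule are hypothetical names and logic invented for this example, not the nupic-core API or the real spatial pooler. It only shows the same component being called directly (low level) and through a network wrapper (high level).

```python
from typing import Callable, List, Tuple


class ToySpatialPooler:
    """Hypothetical low-level component; NOT the real NuPIC spatial pooler."""

    def __init__(self, num_columns: int, num_active: int) -> None:
        self.num_columns = num_columns
        self.num_active = num_active

    def compute(self, input_bits: List[int]) -> List[int]:
        """Return sorted indices of the `num_active` best-matching columns.

        Toy rule: input bit i feeds column (i % num_columns); a column's
        overlap is the count of its on bits. Ties go to the lower index.
        """
        overlaps = [0] * self.num_columns
        for i, bit in enumerate(input_bits):
            if bit:
                overlaps[i % self.num_columns] += 1
        ranked = sorted(range(self.num_columns), key=lambda c: (-overlaps[c], c))
        return sorted(ranked[: self.num_active])


class ToyNetwork:
    """Hypothetical high-level wrapper: named regions run in the order added."""

    def __init__(self) -> None:
        self._regions: List[Tuple[str, Callable[[List[int]], List[int]]]] = []

    def add_region(self, name: str, compute: Callable[[List[int]], List[int]]) -> None:
        self._regions.append((name, compute))

    def run(self, data: List[int]) -> List[int]:
        # Feed the output of each region into the next one.
        for _name, compute in self._regions:
            data = compute(data)
        return data


bits = [1, 0, 1, 1, 0, 0, 1, 0]

# Low-level use: call the component directly, no network involved.
sp = ToySpatialPooler(num_columns=4, num_active=2)
direct = sp.compute(bits)

# High-level use: wrap the same component in a network and run it.
net = ToyNetwork()
net.add_region("sp", sp.compute)
via_network = net.run(bits)
```

With a single region, both paths produce the same active columns. The network layer only adds composition, which is why users who do not need hierarchy could skip it.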

Encoders

  • Want to enable users to write their own encoders in whatever language they want
  • Regions are pluggable (can write in language of choice as long as bindings exist)
  • Many encoders are very specialized and don’t belong in nupic-core
  • Regions work in a self-discovery fashion
  • Encoders currently must be hard-coded in an init file
  • Scott wants to allow encoders to be imported and passed into nupic-core explicitly, instead of relying on a discovery system
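The "import and pass in" idea from the last bullet could look roughly like the sketch below. Everything in it is an assumption made for illustration: the Encoder protocol, ToyScalarEncoder, encode_record and the bit layout are hypothetical and do not reflect existing NuPIC encoders. The point is only that the caller hands encoder objects to the core explicitly, so no registry or init file needs to know about them.

```python
from typing import Dict, List, Protocol


class Encoder(Protocol):
    """Hypothetical minimal contract an encoder would have to satisfy."""

    def encode(self, value: float) -> List[int]: ...


class ToyScalarEncoder:
    """Hypothetical encoder: a run of `width` on bits whose offset tracks the value."""

    def __init__(self, min_val: float, max_val: float, size: int, width: int) -> None:
        self.min_val = min_val
        self.max_val = max_val
        self.size = size
        self.width = width

    def encode(self, value: float) -> List[int]:
        # Clip to the supported range, then scale to a start offset.
        clipped = min(max(value, self.min_val), self.max_val)
        fraction = (clipped - self.min_val) / (self.max_val - self.min_val)
        start = int(fraction * (self.size - self.width))
        return [1 if start <= i < start + self.width else 0 for i in range(self.size)]


def encode_record(record: Dict[str, float], encoders: Dict[str, Encoder]) -> List[int]:
    """Concatenate field encodings in the order the encoders were passed in.

    The encoders arrive as plain arguments; nothing is looked up in a
    global registry. A field without an encoder raises KeyError.
    """
    bits: List[int] = []
    for field, encoder in encoders.items():
        bits.extend(encoder.encode(record[field]))
    return bits


# The caller imports (or defines) its encoders and passes them in.
encoders = {"temperature": ToyScalarEncoder(min_val=0, max_val=10, size=8, width=3)}
encoded = encode_record({"temperature": 10}, encoders)
```

A user-written encoder in this scheme only needs to provide an encode method; it never has to be added to a hard-coded list.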

Serialization

  • Network can be saved (network.save()), which calls each Region’s save routine
    • Each Region is responsible for its own serialization / deserialization
    • Currently the PyRegion implementation pickles itself
  • Region implementation must define serialization
  • Longer term, we need a language-agnostic serialization format within nupic-core
  • As long as a Network is serialized using standard encoders included in nupic-core, it should be transportable. If a user provides custom encoders, it will not be transportable.
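A rough sketch of how the points above fit together, with JSON standing in for the language-agnostic format. No format was chosen in the meeting, so JSON is an assumption, as are all names here (ToyRegion, ToyNetwork, STANDARD_ENCODERS, is_transportable). The sketch shows the network delegating to each region's own save routine and a simple check for whether only standard encoders are involved.

```python
import json
from typing import Any, Dict, List

# Assumed names for encoders that would ship with nupic-core.
STANDARD_ENCODERS = {"scalar", "category"}


class ToyRegion:
    """Hypothetical region that owns its own (de)serialization."""

    def __init__(self, name: str, encoder: str, params: Dict[str, Any]) -> None:
        self.name = name
        self.encoder = encoder
        self.params = params

    def save(self) -> Dict[str, Any]:
        # Plain dict of JSON-friendly values instead of a Python pickle.
        return {"name": self.name, "encoder": self.encoder, "params": self.params}

    @classmethod
    def load(cls, state: Dict[str, Any]) -> "ToyRegion":
        return cls(state["name"], state["encoder"], state["params"])


class ToyNetwork:
    """Hypothetical network whose save() calls each region's save()."""

    def __init__(self, regions: List[ToyRegion]) -> None:
        self.regions = list(regions)

    def save(self) -> str:
        state = {"regions": [region.save() for region in self.regions]}
        return json.dumps(state, sort_keys=True)

    @classmethod
    def load(cls, text: str) -> "ToyNetwork":
        state = json.loads(text)
        return cls([ToyRegion.load(s) for s in state["regions"]])

    def is_transportable(self) -> bool:
        # Transportable only if every region relies on a standard encoder.
        return all(region.encoder in STANDARD_ENCODERS for region in self.regions)


network = ToyNetwork([ToyRegion("sensor", "scalar", {"size": 8, "width": 3})])
saved = network.save()
restored = ToyNetwork.load(saved)
```

Because the saved form is plain text rather than a pickle, a client written in another language could in principle read it, provided it also has the encoders the network refers to.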

CMake

  • Would make the build simpler
  • Like a meta-make
  • Creates different developer environments
  • We want to consider switching to a CMake-based build system

Name changes

Before we solidify the nupic-core API, we should use this opportunity to make name changes.

  • Temporal Pooler might be renamed
  • Region in Network API might change