Re-Organisation of the C++, PY Community repos

I prefer to keep the core and all of the language interfaces together so we don’t have version issues. But if we must break it up into separate repositories then perhaps it would be something like

  • core (the algorithms implemented in C++) as a static library.
  • Execution Harness (Network and Region classes)
  • Python interface
  • C++ interface
  • C++/clr interface
  • C# interface

The problem is that each of the interfaces needs include files from the core (not just a library). So if they are in separate repositories, the repositories MUST be at the same version and placed in parallel directories or nothing will build.

The original nupic repository (all of the Python code) is a separate thing. The Python 3 conversion will change those so we need a place to put them but that is not the current topic. The Java port is also a separate thing.

oops, the Python interface is actually two… 3.x and 2.7

@rhyolight how do you reference bigger groups here? #htm-hackers #committer-lounge , this is a big step, so it would be good to know how most people feel. Or at least come up with some ITERATIVE process, so we can learn and adapt along the way. Maybe simply just a separation of nupic.cpp (core) and nupic.bindings (from core) …??

@breznak have you read the discussions we have been having over the last few weeks?

Ok, I think we are getting to understand each other…

core (the algorithms implemented in C++) as a static library.

  • tests and include headers

Execution Harness (Network and Region classes)

Sounds good. Will the serialization be kept here? (I don’t think it’s a good idea to ditch it completely.)

Python interface
C++ interface
C++/clr interface
C# interface

All the interfaces. So you propose it’s better to have separate interface repos, “bindings.py”, “bindings.c#”, rather than all of them in one codebase? Probably yes; the repos will then be quite small and manageable.

The problem is that each of the interfaces need include files from the core (not just a library). So if they are in separate repositories the repositories MUST be the same version

git submodules, or ideally some dependency management (like pip for C++).
The problem you mention would manifest only with BREAKING changes in the API, while you can still do a lot of development inside “your submodule” repo without breaking the outside consumers. Interfaces also enforce good coding practices.
My biggest point for this is that NuPIC is already quite old, so the interfaces should be more or less stabilized.

@chhenning I would be very interested in optimizing the hell out of the C++ HTM code. Would love to see your changes and merge them into the proposed structure!

I did some performance analysis… do you have a thread about #optimization? If not, can we start one? I’ve done

  • some micro-benchmarks
  • considered ditching the hand-made ASM
  • initial test with eigen
  • performance analysis of the HTM chain

Not quite.

What I prefer is to have everything as one repo.
That list is the sections that it contains.
We have been working on this for a while so that the Python interface is not as intertwined with the other code.

The Execution Harness calls serialization. The original code had two types of serialization, Cap’n Proto and YAML. The YAML one is simpler (although slower). The Cap’n Proto serialization was quite invasive and complicated. By removing it we simplified everything a lot.

The execution harness is not really an application although I guess you could use it as such. Read about it in the API specification (Network API).

If nobody is actually using the Network API and the Regions, it would simplify everything to drop that too. About half the code (I would guess) is in some way supporting that API.

I suggest you start a topic in #htm-hackers with a poll if you have questions about the preferences of the community.


@rhyolight do you know if the Network ‘framework’ is being used? Or would it be simpler for people to learn if we just provided language interfaces to the algorithms and perhaps some example code in each language showing how to connect them up and pass data around?

I personally think the Network harness is kind of nice (and efficient) but do people understand what it is for?

@breznak As @David_Keeney already pointed out we have a somewhat working community port. There is a visual studio solution which showcases the current structure. You can even run python inside the solution. :slight_smile:

A few things I did to nupic.core:

  • removed all python code and python helper code
  • removed dependencies such as the Apache Portable Runtime (will use C++11 or Boost)
  • removed Cap’n Proto. This might be controversial, but so far it has caused too many issues. Amazingly, a lot of compiler warnings disappeared. :wink: Eventually, we will have to find a way to save and load networks.
  • Added the use of smart pointers to the engine (needed that for pybind)

I agree with you that nupic.core’s biggest offering is the algorithms. The network stuff is neat, but there might be better solutions out there.

There is Eigen, but there is also Blaze. I’m not sure which one fits our purposes best. A good start for benchmarking, no?

Getting all the better :wink: … is it this one? https://github.com/chhenning/nupic.core

There is a visual studio solution which showcases the current structure.

Is it this thread? Or where is it drafted? https://discourse.numenta.org/t/nupic-core-c-bindings
I don’t have VS, but I can review the branch; some image or description would be helpful.

removed all python code and python helper code
removed dependencies such as the Apache Portable Runtime (will use C++11 or Boost)
removed Cap’n Proto. This might be controversial, but so far it has caused too many issues. Amazingly, a lot of compiler warnings disappeared. :wink: Eventually, we will have to find a way to save and load networks.
Added the use of smart pointers to the engine (needed that for pybind)

All sounds good to me, except the removed serialization. Is something (YAML) working? Do you think you can get something working in a reasonable time? As for many experiments… you need a model that trains a long time.

Getting all the better :wink: … is it this one? https://github.com/chhenning/nupic.core

This one is old and should be deleted soon. Please use https://github.com/chhenning/nupic

Here is the structure that I have created in conjunction with @David_Keeney and @heilerm.

  1. nupic.core – static cpp lib
  2. nupic.core_test – nupic.core unit tests using gtest
  3. nupic.python.algorithms – pybind11 module
  4. nupic.python.engine – pybind11 module
  5. nupic.python.math – pybind11 module
  6. nupic.python27.algorithms – pybind11 module
  7. nupic.python27.engine – pybind11 module
  8. nupic.python27.math – pybind11 module
  9. nupic.python_test – cpp app to test python modules
  10. python3 - nupic python 3 port
  11. yaml-cpp - static cpp yaml lib

I also provide all necessary binaries to build the system. This obviously only works for Windows.

There are some more serialization methods, like writeToString, for some of the algorithms. Matrices can also be saved to and loaded from file. But it’s all rather crude. I’m open to suggestions like boost::serialization (saving to a zipped archive, for instance), Cap’n Proto, cereal, etc. But I think we need to talk about what we are actually trying to achieve.

These are binaries, folder structure, right? Not separate repositories?

In hindsight, it seems to me that you focus more on “bindings, python, windows/platform”, while my focus is on “c++, optimizations, refactoring”. It’s good if we can define both of these views and try to work through them.

This would be exactly “mine” :


nupic.python.algorithms – pybind11 module
nupic.python.engine – pybind11 module
nupic.python.math – pybind11 module
nupic.python27.algorithms – pybind11 module
nupic.python27.engine – pybind11 module
nupic.python27.math – pybind11 module
nupic.python_test – cpp app to test python modules

And these nupic.bindings.py(2) :

python3 - nupic python 3 port

a separate nupic.bindings.py3 (?)

yaml-cpp - static cpp yaml lib

could this even be removed? (serialization)

I can see the benefits of merged py+cpp repos.

TL;DR: we more or less agree on the structure; we should just consider STRICT isolation of the separate layers. And possible modularization into repos: what will happen when @David_Keeney merges the C# bindings? And some others? And alternative implementations of the algorithms? …I think the repo might get huge, too crowded with issues and activity to manage.
Another thing is 3rd-party projects that would like to use some functionality (swarming, python, the C++ lib) and don’t want to pull, manage, and build all the other stuff.

Or we could brainstorm the suggested alternative, which provides a way to get the best of both worlds:

  • atomic repos
  • a layered “OSI” model, where at a higher layer the API is shielded
  • The good thing is that all of this can be changed later, unless you base your changes on forks of nupic.core, nupic.
  • It depends on what you expect from the repo: for your python-feature repo, the structure is fine; for a community or generic-use repo, the design might change a bit.

Correct.

In a nupic installation the bindings are actually a mixture of python and cpp modules (algorithm, engine, math)

As far as I can see it’s just used for configuration.

I think for now we are good. In the future we might restructure. I’m still stuck on “small” issues like recompiling nupic.core with 64-bit floats vs 32-bit floats. Or even 128-bit floats…

Hope you don’t take it the wrong way. Right now I really like having everything in one visual studio solution, including nupic. Great for testing and tinkering.
cmake is on the todo list!

Yes, let’s brainstorm!


I’m eager to try this on linux

@David_Keeney we are using C++ Network API and Region for working with different hierarchies in htmresearch

I am looking for people (C++, Py, …) who would like to participate in the community Reviewers team. I think it’s important to have more eyes on the code (not only for bugs, but for design and usability). But I also hope for quick feedback and rapid development in the community repo.

  • I’ve set “rules” for merging so that at least one review is required to get new code in.

Goals

Q: What would you like the goals and direction of a community repo to be?
After we get more ideas, I’ll make a poll again.

For me it is:

  • at the start, continue with active development based on numenta/nupic.core/py
  • active PR review, fixes, openness to big changes, …
  • keep multiplatform support
  • keep a high level of code quality (tests, reviews)
  • gradual development toward new big features (optimization, py3, restructuring, …)
  • ease of use for the programmers/developers (= us :wink: )

All of this discussion is good. But we must keep in mind what this library is for. This is not a product or production application. This is a framework for experimentation.

From my viewpoint this library is intended for people who want to experiment with their own implementations of any of the components.

  • So the first priority is understandability of how the algorithms in the library work.
  • The second priority is flexibility in how the parts can be connected.
  • The third priority is flexibility on platform and programming language of the user.
  • Performance is important but not at the expense of the other three goals.

We should assume that the number of algorithms and variations on those algorithms will continue to grow as people find new things that work… after all, that is the purpose of the library. Consequently, do not expect any of the API to remain constant. As new ideas are presented we should try to incorporate them. For example, if someone discovers a new way to do SP and has a working implementation in Python, we should help them port it to C++ and add it to the library if they are unable to do that themselves. I would not expect to see polished code being offered, and that should be OK. Every module in the library must have a corresponding unit-test module, but don’t expect offered modules to have one, so we must help their authors provide one.

@breznak your thing seems to be C++ optimization… that is great. You can help optimize the submitted modules as long as it does not make them harder to understand or lose flexibility.

Having said that, the actual layout of the library should be focused on how easy it is to understand, even by someone who is not a professional programmer. In my opinion it is not mandatory for the community core library to be a clone of nupic’s production core, as long as we can identify the changes they have made so we can incorporate them into our library. I expect there to be considerable deviation.
