Re-organisation of the C++ and Python community repos

If nobody is actually using the Network API and the Regions, it would simplify everything to drop that too. About half the code (I would guess) is in some way supporting that API.

I suggest you start a topic in #htm-hackers with a poll if you have questions about the preferences of the community.


@rhyolight do you know if the Network ‘framework’ is being used? Or would it be simpler for people to learn if we just provided language interfaces to the algorithms, and perhaps some example code in each language showing how to connect them up and pass data around?

I personally think the Network harness is kind of nice (and efficient) but do people understand what it is for?

@breznak As @David_Keeney already pointed out we have a somewhat working community port. There is a visual studio solution which showcases the current structure. You can even run python inside the solution. :slight_smile:

A few things I did to nupic.core:

  • removed all python code and python helper code
  • removed dependencies such as Apache Portable Runtime (will use c++11 or boost)
  • removed cap’n proto. This might be controversial, but so far it has caused too many issues. Amazingly, a lot of compiler warnings disappeared. :wink: Eventually, we will have to find a way to save and load networks.
  • Added the use of smart pointers to the engine (needed that for pybind)

I agree with you that nupic.core’s biggest asset is its algorithms. The network stuff is neat, but there might be better solutions out there.

There is Eigen, but there is also Blaze. I’m not sure which one fits our purposes best. A good starting point for benchmarking, no?

Getting all the better :wink: … is it this one? GitHub - chhenning/nupic.core: Implementation of core NuPIC algorithms in C++ (under construction)

There is a visual studio solution which showcases the current structure.

Is it this thread, or where is it drafted? https://discourse.numenta.org/t/nupic-core-c-bindings
I don’t have VS, but I can review the branch; some image or description would be helpful.

removed all python code and python helper code
removed dependencies such as Apache Portable Runtime (will use c++11 or boost)
removed cap’n proto. This might be controversial, but so far it has caused too many issues. Amazingly, a lot of compiler warnings disappeared. :wink: Eventually, we will have to find a way to save and load networks.
Added the use of smart pointers to the engine (needed that for pybind)

All of that sounds good to me, except the removed serialization. Is something (YAML) working? Do you think you can get something working in a reasonable time? For many experiments you need a model that trains a loong time.

Getting all the better :wink: … is it this one? https://github.com/chhenning/nupic.core1

This one is old and should be deleted soon. Please use GitHub - chhenning/nupic: Community Fork of Numenta's Platform for Intelligent Computing (NuPIC)

Here is the structure that I have created in conjunction with @David_Keeney and @heilerm.

  1. nupic.core – static cpp lib
  2. nupic.core_test – nupic.core unit tests using gtest
  3. nupic.python.algorithms – pybind11 module
  4. nupic.python.engine – pybind11 module
  5. nupic.python.math – pybind11 module
  6. nupic.python27.algorithms – pybind11 module
  7. nupic.python27.engine – pybind11 module
  8. nupic.python27.math – pybind11 module
  9. nupic.python_test – cpp app to test python modules
  10. python3 - nupic python 3 port
  11. yaml-cpp - static cpp yaml lib

I also provide all necessary binaries to build the system. This obviously only works for Windows.

There are some more serialization methods, like writeToString, for some of the algorithms. Matrices can also be saved to and loaded from file. But it’s all rather crude. I’m open to suggestions like boost::serialization (saving to a zipped archive, for instance), cap’n proto, cereal, etc. But I think we need to talk about what we are actually trying to achieve.

These are binaries and a folder structure, right? Not separate repositories?

In hindsight, it seems to me that you focus more on “bindings, python, windows/platform”, while mine is on “c++, optimizations, refactoring”. It’s good if we can define both these views and try to reconcile them.

This would be exactly “mine”:

nupic.python.algorithms – pybind11 module
nupic.python.engine – pybind11 module
nupic.python.math – pybind11 module
nupic.python27.algorithms – pybind11 module
nupic.python27.engine – pybind11 module
nupic.python27.math – pybind11 module
nupic.python_test – cpp app to test python modules

And these would be nupic.bindings.py(2):

python3 - nupic python 3 port

a separate nupic.bindings.py3 (?)

yaml-cpp - static cpp yaml lib

Could this even be removed? (serialization)

I can see the benefits of merged py+cpp repos.

TL;DR: we more or less agree on the structure; we should just consider STRICT isolation of the separate layers. And possible modularization into repos: what will happen when @David_Keeney merges the C# bindings? And some others? And alternative implementations of the algorithms? …I think the repo might get huge, too crowded with issues and activity to manage.
Another thing is 3rd-party projects that would like to use some functionality (swarming, python, c++ lib) and don’t want to pull, manage, and build all the other stuff.

Or we could brainstorm the suggested alternative, which provides a way to serve both worlds:

  • atomic repos
  • layered “OSI” model, where on a higher layer the API is shielded
  • Good thing is all of this can be changed later, unless you base your changes on forks of nupic.core, nupic.
  • It depends on what you expect from the repo: for your python-feature repo, the structure is fine; for a community or generic-use repo, the design might change a bit.

Correct.

In a nupic installation the bindings are actually a mixture of python and cpp modules (algorithm, engine, math)

As far as I can see it’s just used for configuration.

I think for now we are good. In the future we might restructure. I’m still stuck on “small” issues like recompiling nupic.core with 64-bit floats vs. 32-bit floats. Or even 128-bit floats…

Hope you don’t take it the wrong way. Right now I really like having everything in one visual studio solution, including nupic. Great for testing and tinkering.
cmake is on the todo list!
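For when CMake comes off the todo list: a hypothetical top-level CMakeLists.txt mirroring the numbered solution layout might start like this (all paths, file names, and target details are placeholders to be filled in):

```cmake
# Hypothetical top-level CMakeLists.txt; paths and sources are placeholders.
cmake_minimum_required(VERSION 3.10)
project(nupic.core CXX)

set(CMAKE_CXX_STANDARD 11)

add_library(nupic.core STATIC src/placeholder.cpp)         # 1. static cpp lib
add_subdirectory(external/yaml-cpp)                        # 11. yaml-cpp

enable_testing()
add_executable(nupic.core_test tests/placeholder_test.cpp) # 2. gtest suite
target_link_libraries(nupic.core_test nupic.core)
add_test(NAME core COMMAND nupic.core_test)

# The pybind11 modules (items 3-8) would each use pybind11_add_module(...).
```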

Yes, let’s brainstorm!


I’m eager to try this on Linux.

@David_Keeney we are using the C++ Network API and Regions for working with different hierarchies in htmresearch.

I am looking for people (C++, Py, …) who would like to participate in the community Reviewers team. I think it’s important to see the code with more eyes (not only for bugs, but also for design and usability). But I also hope for quick feedback and rapid development in the community repo.

  • I’ve set “rules” for merging that at least one review is required to get the new code in.

Goals

Q: What would you like the goals and direction of a community repo to be?
After we get more ideas, I’ll make a poll again.

For me it is:

  • at start, continue with active development based on numenta/nupic.core/py
  • active PR review, fixes, open for big changes,…
  • keep multiplatform support
  • keep high level of code (tests, reviews)
  • gradual development to new big features (optimization, py3, restructuring,…)
  • ease of use for the programmers/developers (= us :wink: )

All of this discussion is good. But we must keep in mind what this library is for. This is not a product or production application. This is a framework for experimentation.

From my viewpoint this library is intended for people who want to experiment with their own implementations of any of the components.

  • So the first priority is understandability of how the algorithms in the library work.
  • The second priority is flexibility in how the parts can be connected.
  • The third priority is flexibility on platform and programming language of the user.
  • Performance is important but not at the expense of the other three goals.

We should assume that the number of algorithms and variations on those algorithms will continue to grow as people find new things that work… after all, that is the purpose of the library. Consequently, do not expect any of the API to remain constant. As new ideas are presented we should try to incorporate them. For example, if someone discovers a new way to do SP and has a working implementation in Python, we should help them port it to C++ and add it to the library if they are unable to do that themselves. I would not expect to see polished code being offered, and that should be OK. Every module in the library must have a corresponding unit test module, but don’t expect offered modules to come with one, so we must help their authors provide it.

@breznak your thing seems to be C++ optimization…that is great. You can help optimize the submitted modules as long as it does not make them harder to understand or lose flexibility.

Having said that, the actual layout of the library should be focused on how easy it is to understand even by someone that is not a professional programmer. In my opinion it is not mandatory for the community core library to be a clone of nupic’s production core as long as we can identify the changes they have made so we can incorporate them into our library. I expect there to be considerable deviation.


@thanh-binh.to Thank you. That is reason enough to include it in the C# interface.

We use it heavily internally. The OPF is built on top of it. I suggest you keep it, it provides the flexibility to construct layers and columns.


Thank you for your points, I will definitely add them to the poll.

This is not a product or production application. This is a framework for experimentation.

I agree, in a way. I’m open to more rapid and extreme changes, but on the other hand, I’d like to have a fork that is “an actively developed continuation of Numenta’s repositories”. So that Numenta can try to sync once in a while, if they wish, and people who build their apps on top of it can continue to use a stable, actively developed descendant. So I’ll also add

  • compatibility (more or less) with the current Numenta API
  • rapid (vs conservative) development (API breakage)
  • unit-test coverage (each new feature is tested)
  • keep c++ / Py feature+API parity (vs the repos can separately diverge and live on their own)

Note, I’m collecting ideas to ask here, not that I’d agree with all the points I’m listing here.

library should be focused on how easy it is to understand even by someone that is not a professional programmer

I’m not sure about this one. Either they are scientists who focus mainly on the papers/NeuroSci, or programmers who focus on (and know) the internal workings, or application users, who use products based on HTM (Grok, HTM School, …)…imho

I would not really care about whitespace and coding style so much (Matt always had to punch me to do that :wink: )

…if someone discovers a new way to do SP and has a working implementation in Python we should help them port that to C++ and add it to the library if they are unable to do that themselves

Careful with this: of course we’ll do it if we like it or it’s an uber-cool feature, but you might soon end up porting code you are not interested in.

More points and ideas for the poll: what do you want from a future nupic?

  • pure python functionality (no other dependency on bindings, for quick prototyping)
  • focus on Py repo
  • focus on c++ repo
  • provide releases for binary installs (pypi)

Great job coordinating, you all. It would be a Very Good Thing to get everyone working on the same forked codebase with a set of objectives. You seem to be heading in the right direction.

Just be careful about letting “new algorithms” into the project. When this happens, be very clear about where they originated, whether they are biologically inspired or not (cite papers). It will help in the future.

@rhyolight Ah, yes I agree. These should be HTM algorithms. I was thinking in terms of some of the variations of the HTM modules listed in the API specifications…like backtracking TM, and perhaps some more encoders and classifiers or even some monitoring tools. Hopefully some new things will eventually come out of Numenta’s current research that we can add.

I know you already set up the repositories, but…
My stab at it…

// core repository (C++)
/nupic.core
  /packages // [machine]/[vendor]/[operating system]/[package].[tar.gz|zip]
    /x64
      /gcc
        /linux (*.tar.gz)
          ...
        /windows (*.zip)
          ...
      /msvc
        /windows (*.zip)
          ...
    /x86
      /gcc
        /linux (*.tar.gz)
          ...
        /windows (*.zip)
          ...
      /msvc
        /windows (*.zip)
          ...
  /modules // can these go in separate repositories?
    /nupic.core
      /include
        /nupic // headers
          ...
      /src
        ...
    /nupic.core.cs // bindings
      /include
        /nupic // headers
          ...
      /src // header-only?
        ...
    /nupic.core.py // bindings
      /include
        /nupic // headers
          ...
      /src // header-only?
        ...
  /tests
    ...
  CMakeLists.txt

// client repository (C++)
/nupic -> [nupic.core]
  /include
    /nupic // headers
      ...
  /src
    ...

// client repository (C#)
/nupic.cs -> [nupic.core, nupic.core.cs]
  ...

// client repository (Python)
/nupic.py -> [nupic.core, nupic.core.py]
  ...