Proposal to introduce pybind11 to move toward Python 3 compatibility

To answer my own question: as it turns out, boost::python has lost some steam and there haven’t been many commits in recent years.
Luckily, a new project, pybind11, has emerged over the last few years, and I think it looks like a good fit for nupic.core. A few highlights:

  • header only library
  • supports modern c++11 and beyond
  • active development
  • supports both exposing c++ code to python and embedding python in c++ code
  • supports python’s numpy
  • supports c++'s chrono and the Eigen library

I have been toying around with this lib and have already replaced some code in nupic. The goal is to replace all Python C API code!

I’ll present a proposal in the next few days and hope we can have an active discussion around it.

Thanks,
Christian

Is pybind11 a replacement for swig?

pybind11 doesn’t claim to be a replacement for swig. It’s really just a nice way to avoid calling the native Python C API directly, and as such it’s in “competition” with boost::python. I honestly like it better than boost::python, which needs to stay compatible with old c++03 code.

The main purpose is to create python c++ extensions, but it also provides ways to embed python calls inside c++ code.

Hope this explains it.

Here is another explanation:

So do you imagine keeping swig or replacing it with pybind11?

I have to understand a little more about nupic.core. But classes like PyArray, WrappedVector, and all the utility classes in PyHelpers.cpp look like they could be replaced with pybind11 equivalents.

If this is going to take a considerable amount of code changes (sounds like it will), you might think about creating a community fork so you don’t have to wait on nupic.core committers to merge your PRs. We are only planning on working on nupic.core to support ongoing research. It doesn’t change much, so you might be able to keep a community fork updated without too much trouble.

A post was split to a new topic: What is a community fork?

Here is a small example of how to exchange data between cpp and python. Imagine the Matrix class as nupic’s SparseBinaryMatrix or similar.

Please note the following:

  • it uses the header-only pybind11 cpp lib
  • it doesn’t matter which version of python is being used (2.7 or 3.6); the python c api is included and linked when the module is built
  • no need to define numpy interface classes, like nupic’s NumpyMatrixT, etc.
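
To make the "no numpy interface classes" point a bit more concrete, here is a minimal Python-side sketch of the mechanism this relies on, the buffer protocol. It is not the example from the post: the `array` object below is only a stand-in for a C++ Matrix that exposes its memory (on the C++ side, pybind11 offers `py::buffer_protocol()` and `def_buffer` for that), and the names `storage` and `view` are made up for illustration. It uses only the standard library and needs Python 3.

```python
from array import array

# Stand-in for a C++ Matrix (hypothetical): a flat, row-major block of
# doubles plus its shape, exposed through the buffer protocol.
rows, cols = 2, 3
storage = array("d", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# View the same memory as a 2 x 3 matrix without copying.
# memoryview.cast needs a byte format on one side, hence the detour via "B".
view = memoryview(storage).cast("B").cast("d", shape=[rows, cols])

# A change made by the owner of the data shows up in the view,
# because both refer to the same memory.
storage[5] = 60.0
last_cell = view[1, 2]

# Copy out as nested lists when a plain Python structure is needed.
matrix_as_lists = view.tolist()
```

Anything that understands the buffer protocol, numpy included, can consume such an object directly, which is why separate wrapper classes become unnecessary.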

Good news: I have finished migrating sparse_binary_matrix_test.py to python 3. For that I defined a new python module using pybind11 (calling it nupic_ext for now) and also made some minor changes to the python 2.7 code.

Here are a few takeaways:

  • the module can be compiled with python 2.7 and python 3.
  • no need for swig interface files like sparse_matrix.i, or for any of Python’s C API functions
  • no need for any code changes in nupic’s core source code.
  • still no need for any code from nupic’s py_support folder, like NumpyVectorT, etc.
  • using pybind11 allows the module definition to be included in a native cpp project, like this
    • that means, the module definition, module loading and debugging/testing can all be in the same project! It’s just c++11
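
The post doesn’t list which "minor changes" the python 2.7 test code needed, so the following is only a sketch of the kind of edits that typically let one file run on both 2.7 and 3. The function names are hypothetical and do not come from nupic.

```python
from __future__ import division, print_function


def density(non_zeros, rows, cols):
    """Fraction of non-zero cells; true division on both 2.7 and 3."""
    return non_zeros / (rows * cols)


def row_indices(n_rows):
    # range() is lazy on python 3 and a list on 2.7; list() makes both agree.
    return list(range(n_rows))


def half(n):
    # Use // wherever the 2.7 code relied on integer division.
    return n // 2


print("density:", density(3, 2, 3))
```

The `__future__` imports are no-ops on python 3, so the same source works unchanged under both interpreters.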

The next steps are to migrate more tests and swig files and of course, to clean up the code…

Any comments are welcome.

@chhenning I’ll grab the latest from your fork. I had it working the other day… curious to see the updates.

@David_Keeney is still working on updating his fork as well - he’s trying to get a VS 2017 solution and update to Python 3. There’s obviously some overlap between you two.

He’s trying to get core updated to work with VS 2017 and then hopefully we can add some language bindings for C#, Java, Ruby, etc. This work sounds like it could simplify the setup process.

@heilerm Please note that I have created a new branch “pybind11”.

In general, as an MSVC user, would you like to have the binaries be part of the repository so you don’t need to build anything yourself?

Using CMake, I think I would prefer to exclude binaries from the repository and instead configure the package registry as a solution for Windows. Other platforms could just use straight-up find_package. I would imagine core supports this… I haven’t played with it yet. I’m teaching myself with a sandbox at the moment, piecing everything together, bit-by-bit. This is all new to me.

@heilerm There are already instructions for building on Windows with cmake.

It’s not quite working anymore, I believe, but it should be a starting point, no?

A manifest would have to be defined. Also, nupic patches a few dependencies, which requires patch.exe - not available on Windows without GNU… so, that’ll have to be addressed. Would not a newer version of that dependency suffice? A separate distribution?
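
Until that is settled, one cheap safeguard is to check for the tool before the build starts. This is a minimal sketch using only the standard library (python 3); `missing_tools` is a made-up helper name, and the tool list is an assumption based on the `patch` requirement mentioned above.

```python
import shutil


def missing_tools(tools):
    """Return the tools from `tools` that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


# "patch" is the tool mentioned above; extend the list as needed.
missing = missing_tools(["patch"])
if missing:
    print("Missing build tools:", ", ".join(missing))
```

On Windows, `shutil.which` also tries the extensions in PATHEXT, so looking up `patch` will find `patch.exe` if it is installed.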

If you mean the build binaries… I guess the common approach is to use the CMake generator.

Yeah, it’s not working for me. @David_Keeney seems to be making progress, though. I’m working on a sandbox to see if an alternative solution can be offered that’s a little more Windows friendly, but the Python dependencies kind of put a damper on that. Trying to find the time to work on it. Not sure that I’m doing much more than playing at this point.

I’m a noob with cmake. But I assume you will be using vcpkg which needs to be installed during the cmake process. Why not just install patch as well?

By the way I have not patched anything when I created my Visual Studio solution.

CMake wouldn’t install vcpkg, I’d imagine. Not sure that’s the concern of that utility. That would be a solution for Windows package management, instead of apt-get. The repository CMake config could be updated to work with CMake’s find_package using the package registry. I think nupic already does this - or can do this - on unix systems. The manifest would update the CMake config to look for a package of a certain version and issue a warning if it’s a different version. That’s how I understand it… at the moment, at least. Still working on it, and looking at what nupic has.

Basically… CMake is configured to look for packages at a couple built-in or configured locations. CMake could be configured to point where vcpkg drops packages… or apt-get, etc.

Oh yes, I mixed up cmake with the CI tool AppVeyor. Have you considered just creating an AppVeyor script instead of using your own sandbox? The integration with github is very nice, and it seems easy to set up a Windows machine with the right tools before you start with cmake.

nupic already relies on some CI tools for other platforms like Linux and Mac.

I haven’t looked at it yet, but certainly will, although it looks like that’s more for CI than strictly development, which is where I am at the moment. I thought I’d have the sandbox on Github last weekend, but… priorities. I’ll get it up soon. Very simplistic at the moment.

Also, I would love to see if there’s a pure C++ solution that could be offered. I’m sure it’d break current dependent packages, but if offered as a community fork and offered a migration path… I’m sure there’s a well-supported library that offers what numpy has, only in C++ without the interop. Not sure why pycapnp is still required yet and not just capnp.