Proposal to introduce pybind to move toward Python 3 compatibility

Guys, I have created the nupic.core fork. The original code is saved in the branch “from_20180121”.

The master contains the following changes:

  • removed all content from /external/
  • added all external header dependencies to /include/
  • replaced /src/nupic
  • replaced /src/test
  • added /src/yaml-cpp
  • added /external/vs2017, which includes a Visual Studio solution plus binaries (it does not work right now)
  • removed /bindings/

The repo is here: https://github.com/chhenning/nupic.core

Let me know what you think.


I also added my nupic fork here: https://github.com/chhenning/nupic-1

Same procedure as last time: I created a branch which serves as a copy of Numenta’s nupic repo.

The master contains the following changes:

  • added /python_migration/ – contains the 2to3 script input and output plus command line output
  • replaced /examples/
  • replaced /src/
  • replaced /script/
  • replaced /tests/

Please note that this is Python 3 and very much a work in progress.


@chhenning Very nice. I can see where my C# interface can be inserted. Everything seems to be located in reasonable places. Cool. I can tell that was a lot of work.

Hopefully my jury duty will end tomorrow so I can get back to work on the C# interface.

@breznak I got thinking about the repository names. I think it is stupid of me to be quibbling about such a small thing. I am sure that what you have will work. I apologize.


really no problem :slight_smile:


@chhenning I wanted to review and pull your changes in Switch to pybind from SWIG by breznak · Pull Request #27 · htm-community/htm.core · GitHub
but it’s too big for me to chew, so I’ll need your help, please.

removed all content from /external/

Note: in the future I’d suggest making more atomic changes. As you delete thousands of files, I’m unable to understand a “diff” to master, so I’d look at separate commits for their meaning and review them one by one. The “remove external” change would be easier to understand as separate commits such as “remove the AprUtil lib from external”, “remove SWIG from external”, “move the Boost headers to /include/”, …

added all external header dependencies to /include/

we don’t have any of our headers in include/, right? I’m just noting that keeping 3rd-party code clearly separate from ours is useful.

replaced /src/nupic
replaced /src/test

Does “replaced” mean moved, added something, …?

added /src/yaml-cpp

This is moved from external? Could it be /src/external/yaml-cpp then?

added /external/vs2017, which includes a Visual Studio solution plus binaries (it does not work right now)

  • for the lib/ and bin/ there… are those prebuilt binaries, or do hosts build them themselves (then .gitignore)?
  • how about keeping the external/ files as is (much less “changes” in PR)?
  • and putting the VS project in /IDE/vs2017 ?

removed /bindings/

Removed and broke support for py-bindings (for now), or are you already able to replace that with pybind?
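
For reference, this is roughly the style of binding code pybind11 involves (a minimal hypothetical sketch; the class and module names are made up and are not the actual nupic.core bindings):

```cpp
// Hypothetical sketch only: illustrates the pybind11 style that would
// replace the SWIG-generated wrappers. ScalarEncoderStub and the module
// name are invented for this example; they are not real nupic.core code.
#include <pybind11/pybind11.h>

namespace py = pybind11;

// Stand-in for a C++ class that previously had a SWIG interface.
class ScalarEncoderStub {
public:
    explicit ScalarEncoderStub(int n) : n_(n) {}
    int size() const { return n_; }
private:
    int n_;
};

PYBIND11_MODULE(example_bindings, m) {
    m.doc() = "Illustrative pybind11 module (not the real nupic bindings)";
    py::class_<ScalarEncoderStub>(m, "ScalarEncoder")
        .def(py::init<int>(), py::arg("n"))
        .def("size", &ScalarEncoderStub::size);
}
```

The wrapper is ordinary C++ that only needs the pybind11 headers and the Python headers to build.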


I have tests failing, but it’s most certainly just the changed structure where I don’t know where the files moved. Could you have a look please?

https://ci.appveyor.com/project/breznak/nupic-cpp/build/0.3.0.30#L71
https://travis-ci.org/htm-community/nupic.cpp/jobs/332109248#L556

What is the overall status of your PR? Like lin/win/osx builds, tests, py bindings,…?

I would like to break this into even smaller pieces and discuss (and merge) them one by one, if you want to explain the individual changes to me; like

  • move boost first (takes a huge number of file changes away)
  • why do we need the move from external/ to include/ and src/yaml-cpp?
  • fix all the build recipes so we have tests at hand

@breznak Thanks for looking into the changes. I agree it’s a huge change, and I’m fairly certain it will be the only one of this size. If you like, we could take a more step-by-step approach to digest all the changes. I don’t mind redoing it. This way we could agree on the structure of the changes.

For instance, I agree with you on having 3rd-party stuff in a separate folder. The folder /include/ might not be the correct one, and it’s even incorrect for Boost: we need some of the Boost libraries, like system and filesystem, and those sources are not in the repo right now.
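
To illustrate why headers alone are not enough here, this is a small hypothetical example (not taken from nupic.core) of the kind of code that needs the compiled boost_system and boost_filesystem libraries at link time:

```cpp
// Illustrative only, not nupic.core code: boost::filesystem calls like
// these are implemented in the compiled library, so the build must link
// against boost_filesystem (which in turn needs boost_system).
#include <boost/filesystem.hpp>
#include <iostream>

namespace fs = boost::filesystem;

int main() {
    const fs::path dir("output_state");   // hypothetical directory name
    if (!fs::exists(dir)) {
        fs::create_directories(dir);      // requires the linked library
    }
    std::cout << fs::absolute(dir).string() << std::endl;
    return 0;
}
```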

Here is my suggested folder structure for external (or should it be “3rd party”?):

/external/boost
/external/gtest
/external/pybind11
/external/python36
/external/yaml-cpp
/external/zlib

Does that look correct?

One downside I’m seeing is that the INCLUDE search path has gotten a lot longer, since we have to pull in the include path for each dependency.
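
As a rough illustration of that trade-off (the includes below are arbitrary, not a real nupic.core file): with everything consolidated under /include/, a translation unit like this builds with a single -Iinclude, while a per-dependency layout such as /external/&lt;dep&gt;/include needs one -I entry per dependency:

```cpp
// Illustrative translation unit, not from nupic.core. With all third-party
// headers copied under /include/, this compiles with just -Iinclude;
// with per-dependency folders, each dependency adds its own -I path.
#include <boost/filesystem.hpp>   // from the Boost headers
#include <yaml-cpp/yaml.h>        // from the yaml-cpp headers
#include <zlib.h>                 // from the zlib headers

int main() {
    return 0;  // trivial body; the example is only about header lookup
}
```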

Let me know what you think.


If you don’t mind doing the rebasing for the 3rd time, I’d much appreciate a coordinated, step-by-step merging. That way we’ll ensure tests pass and functionality stays, and others can more easily discuss your changes on “single feature” PRs.

Good point. I don’t have a strong opinion on /include/ vs /external/xxx_lib, as both have pros and cons. Or would it be possible to keep the current /external/{src,lib,include}/{boost,gtest,…}?

If you are up for the 3rd round of changes, can we start with

  1. the huge boost? Add the boost/windows stuff that you have. Optionally move it to the location (/include, /external/include/boost/, /external/boost) that you see fit.
  2. add pybind, (and remove swig + …)
  3. … I was guessing the changes you made in your fork.
    The only requirement I’d like to achieve is to make PRs which always leave us in a working state with tests and the same level of features. Is that possible?

I don’t see this in your code yet. Is it just amendments for py3, or do we really want to include and build from source all of Py3 and Py2.7?

No, I don’t mind and it’s actually not much work. It’s important we get this right.

I would prefer all headers to be in include/, including the external ones. It makes it easier to configure builds, correct?

What is boost/windows? You mean the prebuilt binaries?

Here is what I suggest in regards to boost:

  1. add the complete boost source tree under external/boost
  2. add the boost headers under include/boost

Same for all other dependencies.

The way I understand it, the “HTM community” repo is self-reliant, meaning it keeps all dependencies in the repo. In addition, each dependency should be the complete source tree, including headers, sources, tests, cmake scripts, etc.

Not sure this is possible for the big changes we are doing now. How about we create a branch and do it there? Then, when we are done, we merge to master.

I think we decided to stick to python 3. So no need to include python2.7 headers.

Am I still making sense?

1/ rebase: perfect, thank you for your willingness to make this right, even with the obstacles!

2/ /include/ …yep, sounds good.

3/ boost …no, I noticed some new headers related to the Windows platform (I hope I was looking at the right thing).
I agree with the boost steps you’re suggesting; let’s start making PRs with the dependencies, and we can resolve the little details directly in the discussion over the codebase.

Hmm, I’m not sure about this one. We have been on/off in nupic with both camps - “we build all from source” vs. “try to use binary deps”. The all-source approach makes it easier to include the deps, but builds take longer (not much of a problem) and it is more error-prone (with regard to ‘something screwed up in my XYZ configuration’, mostly for non-dev users).
I think we included headers (no problems), and some libs that caused problems with the official library (capnproto). Otherwise, dependencies were mostly binary - via pip, numpy, python, swig, …

I would like to ask @rhyolight or some others who have been taking care of the dependencies, installs, and users’ problems to weigh in on this question: “Use binary distributions of dependencies where possible, or rather build everything ourselves from source?”

TL;DR: I’d shy away from including too many dependencies in our source build and rather provide instructions to obtain them. Need advice from more involved people.

4/

Not sure this is possible for the big changes we are doing now. How about we create a branch and do it there? Then, when we are done, we merge to master.

so at least as much as possible: we merge the deps and the restructuring you’ve made. Ideally we could also go on with the pybind replacement (?), then other changes can follow. This publish-early, publish-often approach would also make it simpler for people working on their forks or depending on your work to go ahead.

Yes, that’s basically the situation now with your repo. Just make the changes in small enough batches to keep them reviewable.

Correct, my bad. Cross off py2.7.

My question was (as above) more about including the headers vs. “install python3 + headers for your OS” (Win, Linux, OSX instructions).


I agree with you. Bundling dependencies bloats the size of the distribution, and complicates installations where those binaries are pre-installed and customized.


I have a small sample project here that uses CMake and toolchains to create a project. It is just a proof of concept that I could reach those dependencies without the include folder.

I added:

  • apr
  • apr-util
  • boost
  • yaml-cpp
  • zlib
  • CapnProto

You could take CapnProto out of the CMakeLists.txt if you don’t want to build it. My hope was that you could use different toolchains to bring in those dependencies. apt-get for Unix systems? Not sure if this is the most robust solution. You are limited - when using vcpkg - to the following ports: https://github.com/Microsoft/vcpkg/tree/master/ports.

This is a very simple example:


Hey, love the redux repo! :+1:

Especially since you provide the vcpkg setup (something like Linux repositories, right?) for MS Windows.

…reviewed the repo: I’d really like the reduction; for Unix systems with decent package managers it would be a no-brainer! For Windows, I’m not sure how people would handle the vcpkg stuff. Or can we somehow automate that on Windows?

I’ve suggested Setup CI on AppVeyor · Issue #2 · mattheiler/nupic.core · GitHub; if you can make it run, we can apply the same procedure to nupic.cpp!

Hi guys, if you don’t want to include the dependencies, that’s fine with me. We could provide documentation on how to build the dependencies on the individual platforms. Let me know and I’ll make the adjustments.

We don’t need these two libs anymore.

If there’s a decently easy way to install the dependency for all supported platforms (i.e., we can do it in the CI setups), then I’d say don’t include it in the source.

Of course, that changes for cases where we need to compile with special flags or something; then it’s fine to keep it in.

Here is the list of all dependencies and how we are using them:

  • boost (header)
  • boost.filesystem (lib)
  • boost.system (lib)
  • boost.dll (not sure)
  • gtest (lib)
  • pybind11 (header only)
  • python36 (lib)
  • yaml-cpp (lib): I suggest we still include the source, since nupic.core is using an outdated version which is hard to find.
  • zlib (lib)

When I say “lib” I mean we are linking against it at build time. That could mean a static lib or a dynamic lib; we are now leaving it up to the user. In a “self-reliant” repo we would make this decision.

On Windows, all libs can be built with vcpkg.
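
As a purely illustrative example (not nupic.core code) of what “lib” means above: the following compiles against the yaml-cpp headers but also has to link against the yaml-cpp library, whether that library comes from bundled source, a system package manager, or vcpkg:

```cpp
// Illustrative yaml-cpp usage, not taken from nupic.core. YAML::Load is
// implemented in the yaml-cpp library, so this needs the library at
// link time in addition to the headers.
#include <yaml-cpp/yaml.h>
#include <iostream>

int main() {
    // Parse a small, made-up configuration document.
    YAML::Node config = YAML::Load("{columns: 2048, cellsPerColumn: 32}");
    std::cout << "columns: " << config["columns"].as<int>() << "\n"
              << "cellsPerColumn: " << config["cellsPerColumn"].as<int>() << "\n";
    return 0;
}
```

Whether the linked yaml-cpp is static or dynamic is exactly the choice that is currently left to the user.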

@breznak What should we do?

  1. Include all sources and headers
  2. Include headers but no sources; cmake will have to be configured
  3. Provide neither headers nor sources; cmake will have to be configured

My preference is 1, but it has the already mentioned drawbacks of a huge repo and slow builds.


I’d be deciding between 1 and 3, with a preference for 3 (no dependency sources). That is, for those deps where there is a convenient method to obtain them. The biggest drawback is that “random” things can mess up a user’s install.

I have a PR to bump it to the recent version. CC’ed you for AppVeyor problems there. Otherwise it works on Linux/OSX already.

In your opinion, how much trouble is it for a user to use that method to install the required deps?

BUT, this thread is about pybind, and we need to move forward. I’d say unless you still have some open tasks where you’d need to resolve issues with dependencies, we go with

and implement your changes: removed libs, pybind mainly.

Then we can do a new PR with @hellerm’s changes for the redux dependencies.
Thoughts? @hellerm, @chhenning

Excellent. They aren’t configured for CMake, so they weren’t as nice to play with.
