I noticed you’ve opened some issues. Are these the issues you’ve encountered during conversion? By the way, what is your method of converting Python 2 to 3? Will your changes support both 2 and 3, or just 3?
Unfortunately it is not ready for prime time yet. We still need to replace SWIG with pybind11 and a few other goodies (see Issues). Progress has been slow.
Jose, these are great questions and we need to re-visit these periodically to make sure we are keeping on track.
What does nupic community version mean? In terms of long-term goals and ownership, how does it differ from the nupic code provided by Numenta?
The nupic community version is a fork off of the Numenta nupic code set and is maintained entirely by community members. It remains under the Numenta open source license. The goal is to remain true to the HTM algorithms developed by Numenta as a foundation and provide a framework for the community to experiment with their own ideas and concepts without having to start from scratch.
What is the need for building a nupic community version? Is Numenta’s nupic unmanageable or reached its EOL?
The Numenta code base has been somewhat locked in an older technology with Python 2.7 and older C++ libraries. The Numenta staff have been focusing their attention on extending the model where it is most important rather than expending resources on upgrading to the latest tools. The older technology is probably not a problem for the Numenta staff but for the community members trying to learn the HTM model it is a distraction if not a deterrent.
Is there a consolidated effort for updating nupic & nupic.core to python 3?
Is nupic.core conversion purely a conversion effort or an upgrade as well?
The objective is to upgrade the nupic Python code to python 3 and reduce the complications of installing and learning the code for community members. The first effort is to just mechanically convert the Python code to python 3 using the conversion tools. If more community members are interested they can take on clean up and streamlining portions of the Python code as well.
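To give a sense of what a mechanical conversion touches, here is a minimal sketch using a hypothetical function (`summarize` is not taken from the nupic code). The comments show the Python 2 form of each line. Conversion tools such as 2to3 typically rewrite the `iteritems()` call and the `print` statement automatically; the division is a semantic change that a tool cannot decide for you, so it usually needs manual review.

```python
# Python 3 form of a small, hypothetical Python 2 function.
# The Python 2 original of each changed line is shown in a comment.

def summarize(scores):
    total = 0
    # py2: for name, value in scores.iteritems():
    for name, value in scores.items():
        total += value
    # py2: mean = total / len(scores)   (truncating division on ints)
    # In Python 3, "/" is true division, so "//" keeps the old behaviour.
    mean = total // len(scores)
    # py2: print "mean:", mean
    print("mean:", mean)
    return mean
```

This is also why test coverage matters for the conversion: the first two changes are safe rewrites, while the third silently changes results if it is missed.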
My personal effort has been on the C++ library. The C++ portion of the code was originally intended to be just an extension of the Python code base to provide a little more performance. With the community version, the C++ portion is being fleshed out to the point that it could also be a stand-alone library that can be used by community members who may be interested in experimenting entirely in C++. We are removing older unused logic, upgrading to C++17, removing most of the old dependencies, and streamlining the code. It is intended that it could also be a foundation library for C# and possibly other languages. We are targeting Linux, OSX, and Windows platforms, both 64-bit and 32-bit. The guideline is to retain the nupic HTM API as much as possible.
The first step was to remove the capnproto serialization from the C++ library and replace it with simple binary streaming serialization. That step is nearly complete. This simplified the code base considerably. The next step is replacing the APR and boost dependencies with C++17 std:: library routines. Then we will replace SWIG with pybind11 and isolate the Python interface as an independent interface to the C++ library followed by an effort to streamline the infrastructure. Then we can port more Python modules to C++ as it seems appropriate.
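The C++ code itself is not shown in this thread. As a language-neutral illustration of what "simple binary streaming serialization" can mean, here is a minimal Python sketch that writes a length-prefixed array of 32-bit floats to a stream and reads it back. The function names and the byte layout are assumptions made for the example, not the format used by the community library.

```python
import io
import struct

def write_array(stream, values):
    """Write a little-endian uint32 count followed by that many float32 values."""
    stream.write(struct.pack("<I", len(values)))
    stream.write(struct.pack("<%df" % len(values), *values))

def read_array(stream):
    """Read back an array written by write_array."""
    (count,) = struct.unpack("<I", stream.read(4))
    return list(struct.unpack("<%df" % count, stream.read(4 * count)))

# Round trip through an in-memory stream.
buf = io.BytesIO()
write_array(buf, [0.5, 0.25, 1.0])
buf.seek(0)
restored = read_array(buf)
```

The appeal of this style is that it needs no schema compiler or extra dependency, at the cost of having to manage versioning and layout by hand.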
As the Numenta staff expand the HTM model and firm up the logic we will extend the community C++ library to include these new algorithms.
Anyone interested in participating, take a look at the issues list in GitHub https://github.com/htm-community/nupic.cpp and join in the conversation. If you find something you would like to do, create a fork and make it happen.
I definitely support the community version efforts, so don’t take this comment the wrong way. I personally believe Python 3 compatibility in upstream NuPIC is necessary, and that it should be made to work with both Python 2 and Python 3. The reason is that there are still many other libraries which are not yet Python 3 compatible, so having compatibility for both provides the best opportunity for folks to include NuPIC in their projects. At some point down the road, when Python 2 is past EOL and the majority of legacy projects have themselves either been updated or reached EOL, that would be a better time to do a full refactor and take advantage of Python 3 specific improvements.
I agree that the ability to support both Python 2 and 3 is very desirable. We welcome any volunteers who would like to take on that project. You can start by adding an Issue to the nupic.py repository and describing what you would like to do, perhaps starting with @chhenning’s contribution and testing for compatibility with both Python 2 and 3.
@David_Keeney Thank you very much for the overview. It was extremely helpful for newcomers like me. Now I understand why there is a community version of nupic. Great work on the efforts done so far.
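One common way to keep a single code base running on both interpreters is a small compatibility shim that the rest of the code imports from. A minimal sketch, assuming hypothetical helper names (`PY2`, `text_type`, `iteritems`, `to_text`) that are not taken from NuPIC:

```python
import sys

# True only when running under a Python 2 interpreter.
PY2 = sys.version_info[0] == 2

if PY2:
    text_type = unicode  # noqa: F821 (this name exists only on Python 2)

    def iteritems(d):
        return d.iteritems()
else:
    text_type = str

    def iteritems(d):
        return iter(d.items())

def to_text(value, encoding="utf-8"):
    """Return value as a text string on either interpreter."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return text_type(value)
```

Callers then use `iteritems(d)` and `to_text(x)` instead of version-specific spellings, which keeps the version check in one place and makes it easy to delete once Python 2 support is dropped.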
@Paul_Lamb I’d like to help on the conversion for sure. As of now, it is a bit hard to decide which code base I should work on converting from py2 to py3. My main concern is that I do not want our efforts to be duplicated. My assumption (maybe I’m wrong) is that these two codebases may have diverged already or will diverge in the future, so efforts made might not benefit both codebases and the impact would not be significant. What are your thoughts on this? At this stage, I’m still trying to find some time to read the code and understand its high-level architecture, to check what type of beast I’ll be dealing with for the conversion work. The code, by the way, is highly OO. I don’t have anything against that; it’s just that it has a steeper learning curve than loosely OO code. Another thing to determine is whether the upstream C++ code needs to be converted as well, because that would be a separate full-time task besides the py code conversion. I will have a think about this.
I think that is probably a safe assumption, based on my past experiences with forks which undertake significant refactoring right out of the gate like this one. I’m definitely not here to try and impose my own views on anyone else’s efforts.
From my perspective, the correct move is to fork from upstream NuPIC and make the Python 2/3 compatibility changes with a goal of minimizing divergence from the upstream code to the extent possible (i.e. don’t take the opportunity to fix other things like tweaking code formatting, etc). Focus should be entirely on achieving the one singular goal with a minimalist mindset. When complete, submit a PR to merge the changes into upstream master. It will then be up to any downstream forks to decide whether or not it is worth resolving any conflicts that would arise from pulling the changes into their forks.
As far as the C++ code is concerned, we are trying to keep the NuPIC API unchanged as much as possible. The objective being that no matter which Python codebase is being used the C++ interface will be the same. However, there are a few breaking changes that we will document in the file htm-community/nupic.cpp/API_CHANGELOG.md
The first big change is the removal of CapnProto serialization in favor of binary stream serialization within C++.
Later we will be replacing SWIG with pybind11 but I do not expect that to have much if any impact on the Python code.
My initial goal is similar to this. When I first used nupic, I found it a bit disappointing that it wasn’t compatible with py3 yet, because I’ve got py3 apps in mind that I want to build with it. It is not a problem for now, but soon it will be. So why not achieve the simplest goal first: convert py2 to py3. This is almost a mechanical task, which I believe is good because it is straightforward: not easy, but simple. I would also prefer the upstream to be the minimal set, or the subset, of all downstream code. This minimizes divergence of code, which can cause issues later both in development and in the growth of understanding of HTM applications. Most importantly, I prefer that we prioritize maintaining the upstream one, as it is still managed by Numenta. They have a stronger force and binding goals, enough to keep this code intact and usable; this is just based on my experience working on public codebases that are managed by the public and by a private company. I will get back to you regarding this effort. Please let me know your thoughts; maybe let’s open another topic if that is fine.
Will the new nupic C++ API you guys are working on be designed to be agnostic to the nupic.py component? The reason I ask is that if we convert nupic upstream to py3, in the future there can be an option to switch to this new C++ API. Thanks.
Yes, the intent is to be compatible with py2 and py3 and to “look like” the C++ nupic.core code in the Numenta repository. We will do that by avoiding any change that would break the documented API. Some minor changes are unavoidable, but they should be well documented, and there will be Python-side wrappers to hide those changes from the main body of Python code.
Just my 2 cents, but the reason everyone’s switching from Python 2 to 3 is because Python 2 is going away. So I honestly don’t see the point in maintaining both Python versions in a single code base. Instead I recommend making Python 3 work well and then dropping version 2. This transition is going to be painful, like ripping a band-aid off.
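As a rough illustration of what such a Python-side wrapper could look like, here is a minimal sketch. Every name in it (`_NewBinding`, `NetworkCompat`, `save`, `save_to_file`) is hypothetical and invented for the example; it assumes a case where a method was renamed on the C++ side and the wrapper keeps the older name available to existing Python callers.

```python
class _NewBinding(object):
    """Hypothetical stand-in for a rebuilt C++ binding with a renamed method."""

    def save_to_file(self, path):
        return "saved:" + path


class NetworkCompat(object):
    """Hypothetical adapter exposing the older method name to Python callers."""

    def __init__(self, impl):
        self._impl = impl

    def save(self, path):
        # Older name, forwarded to the renamed method on the binding.
        return self._impl.save_to_file(path)

    def __getattr__(self, name):
        # Called only for attributes not found on the wrapper itself,
        # so everything that did not change passes straight through.
        return getattr(self._impl, name)


net = NetworkCompat(_NewBinding())
result = net.save("model.bin")
```

With this shape, the main body of Python code keeps calling `save(...)` and only the wrapper needs to know about the change.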
After considering the possibilities involved in this effort (converting to py3), in terms of the support and continuity of nupic, and considering my own interests as well, I think that it might be more favorable for me (at first) to write an HTM library in Python 3 using Python ML libs; after all, the theory is well documented. My main interests are to learn HTM as best as I can, contribute to the community, and build practical apps from it. My limited knowledge about nupic tells me that it is not progressing at the moment, and I can understand why. Another thing is that the test coverage is not clear, which is very important in a py3 conversion. Conversion to py3 is a big job, and I believe it should provide a big immediate impact to make the effort worthwhile; however, this is not guaranteed for nupic IMO. IOW I’m not sure the conversion to py3 will continue smoothly even given clear test coverage, because of the separate effort on a moving part (C++), and the question of “do people really care about py3 for now?” I believe people care more about learning and building apps. It is hard to decide for now with my limited knowledge of nupic.
I recently live-streamed my investigation of porting NuPIC into a Python 3 project. I was just trying to get a sense of the work it would take to port NuPIC to Python 3, especially by removing some of the less-used and ill-fitting features of the code. I thought I would share this here in case anyone else is interested.
I’ll be talking soon about how we’ll be dealing with the Python 2.7 EOL.