The Yaml code is written to its old API, too. I don’t know what we’d get if it were upgraded, but there’s that to consider as well. If it can be replaced with just one solution… fantastic.
I’d guess CapnProto was chosen for its format and performance. Here are some benchmarks: Cereal is a bit more than twice as slow, according to them. I’d say CapnProto is still a nice fit.
I think moving away from Python in the core entirely would be best. Working with just one language will reduce the complexity and… I’d imagine it would perform better.
@chhenning I have removed the macros. That was sloppy conversion. I will upload it to my GitHub as soon as it compiles cleanly.
Now…what to do next.
What I am thinking is that I can rework the CMake so that the core C++ code is compiled separately from the interface modules. Right now the generated code is mixed in and compiled at the same time. The idea is to have a primary build target for the core C++, one for the Python interface, and one for the C# interface. The BUILD_ALL target would still build them all.
I don’t want to duplicate work anyone else is doing so if there is something else that would be more useful let me know. At some point we need to put our work together.
@David_Keeney Working on cmake would be great, if that interests you of course. Also an AppVeyor script for CI would be cool.
As for removing python related stuff from nupic.core, have a look at the py_support, ntypes, and regions folders. I’m working on that but any help is welcome. BTW, there is a good chance that nupic might use some embedded python somewhere. At least the unit tests indicate it.
Ok, the changes to remove the compatibility macros have been uploaded to my GitHub.
There are lots of embedded calls into the Python library and the numpy library to implement the Python interface. I have yet to find anything else that calls them, but I would not be surprised if the C++ code calls numpy for something other than interface-related tasks. Once we separate the C++ core from the interface code so that Python.h is not included anywhere, we should be able to tell.
Currently, arrays (at least some of them), whether passed in or returned, use numpy arrays on the Python interface. What type of array structure should we return for C#? Array, List, ArrayList, or even Collection?
I think ntypes is used for Yaml serialization. Py_support is part of the Python interface. The regions folder, I think, is for Capnproto serialization to file, but I am not sure.
I think separating the interface in a different build should help identify what code goes with what.
@David_Keeney I’ll fork your project and work from that.
Anyone have a rough plan at the moment? A good starting point we can all tackle?
I would vote for arrays. Does it have to be a broad configuration, or can we determine this as needed? I imagine we lose some usability as we gain flexibility, so the broader approach makes sense.
Not just in an include folder? I’m also wondering if it would make sense to break up the code into smaller projects, rather than its current monolithic structure.
I am still studying the code so there are still lots of unknowns. But I don’t think we need to break up the core routines. It is not that large. Just separate the interfaces during the build such that it is possible to build the core without the Python interface or any Python stuff. After the core is built, the user can build a Python interface or a C# interface as separate build targets. Not sure if the core should be a separate library and each interface be a separate library. I know C# prefers a dynamic library (dll) rather than a static library (lib).
Core:
algorithms
encoders
engine
math
os
utils
types
ntypes (may be part of Yaml which is obsolete)
regions (not sure where this goes, it is serialization but might be part of Python interface).
proto (This is serialization but is language independent.)
Interface: One target for Python and one target for C#
bindings – This will have both Python and C# portions
py_support (python only) – only this includes Python.h and numpy/arrayObject.h
Tests:
Unit Tests (built with the C++ Core)
Integration Tests (built with the Interface code)
Python Tests (something new…Python code that tests the interface)
C# Tests (something new…C# code that tests the interface)
External:
boost – used by core
gtest – used by the testing
apr (with iconv & util) – used by core
capnproto – used by core for serialization
*pycapnproto – interface; (Python only)
swig and pcre – interface only
yaml – deprecated serialization; could be removed.
*zlib – used by core for serialization
Just a note: if we change any data type or entry point in the core that is exposed to the interface, we will have to modify the .i file in bindings and the .capnp file in proto. A corresponding change to the unit tests will also be required, so consider this before changing anything.
The extraction plan suggests a library per interface would be preferred. I think generating those libraries alongside core and packaging them would be sufficient, keeping only core under source control. Not entirely sure.
For projects, I’m thinking the same thing, I think…
@David_Keeney py_support is for python integration and embedding python into cpp. But there is more like pyRegion which is part of regions/ folder. Probably some more.
@chhenning Yes, regions is an odd folder. It is obviously Python interface related but does not seem to be tied to the SWIG files. The test for it is in the integration folder rather than unit tests. It is an entirely separate executable. Don’t know if that means anything.
Anyway, I am not sure what to do with it yet. Perhaps it is the server side of a client-server interface with Python? For the moment I am setting those files aside to see what breaks without them.
@David_Keeney Here is my understanding. From a cpp standpoint you can make your code available to python, aka creating a python module or extension. On the other side you can call python from within cpp, aka embedding. That’s one thing.
The second is that when creating a nupic app you create a network with nodes (regions?), let’s call it a dataflow network. I’m not an expert here. But the way I understand it right now is that these nodes or regions are usually cpp, but pyRegion might open up the playing field to python code. That would be one example of embedding python into cpp. Does that make any sense?
@chhenning Yes actually it does. When you create a Network you can use pre-built regions in the nupic.core or build your own in Python and link them into the network. So when you execute the network it will need to be able to call out to your Python generated region to execute that part of the network.
So the challenge is to build this so you can build your own region in C# or C++ or whatever as well as Python and have it executed from within the network. So every language will need to be able to allow callbacks.
I am not positive this is the way it works, but reading the Network specs that seems to make sense.
I almost have the Python support build separated from nupic.core, so we can have a C++-only or C#-only implementation that does not need Python to be installed. But there is a problem.
The file nupic/engine/RegionImplFactory.cpp includes nupic/regions/PyRegion.hpp. This is so that it can create region objects that are implemented in Python. This module contains calls into the Python language interpreter. So even if you never instantiate a Python-based region you still have to link with Python.lib.
This is going to take some redesign to get around. If anyone has any suggestions I am open to ideas.
I am taking the weekend off but I will resume work on this after Christmas.
@David_Keeney Funny, I’m working on my PyBindRegion as we speak. This is my attempt to create a PyRegion without any python c api calls but rather to use pybind11 functionality.
In regards to your question, I think you can move DynamicPythonLibrary into a separate compilation unit and then remove the PyRegion.hpp dependency. Are you trying to build a nupic.core without anything Python?
Guys, I gotta admit that I have not been following your progress. But I see that it is continuing onward. I noticed you talking about serialization. If you want to simplify the build, remove capnp. It adds a TON of complexity and dependency on pycapnp, which has caused a lot of problems. If you don’t care about fast serialization, just leave it out.
@rhyolight Removing capnp serialization would indeed simplify things a lot but the current Python interface uses this rather extensively. Specifically in RegionImplFactory.cpp it uses capnp serialization to instantiate a Python implemented region. Without this, the core Cpp code could not call back into Python to execute functions in this class.
However, @chhenning is replacing the Python interface with pybind11. If successful, we could not only remove capnp but SWIG as well. What I am working on uses the SWIG and capnp interface but I would not complain about replacing the Python interface with pybind11 if it is simpler.
@heilerm For the C# interface I think the normal approach is to build a Managed C++ wrapper for the core C++ code, which C# can call directly. I have not yet started to look into how complicated this is but I think it can be done.
I don’t think any of this is ready to be put into the community GitHub yet, but we are making progress.
Guys, do you have a good idea of how to create a c# wrapper for nupic.core? Just asking since I came across this example: https://github.com/ccerhan/LibSVMsharp
It might not be the best and is kinda old. Just as a reference.
@rhyolight @heilerm @chhenning
I updated my version of the nupic.core at https://github.com/dkeeney/nupic.core
This version builds two versions of nupic.core library. nupic_core_cpp.lib is the core library for C++ only client and does not require Python 3.x to be installed. nupic_core_py.lib is the core library containing both C++ and Python interfaces. Note that it does not build nupic_core.lib to avoid confusion with the production version.
We can now start looking at how to wrap the C++ interface with Managed C++ so it can be accessed using C#.
@heilerm @chhenning
For the C# interface, we will wrap our existing unmanaged C++ code with Managed C++ classes that are compiled with the C++/CLI switch. The ultimate authority for all things C++/CLI is the Microsoft manual https://msdn.microsoft.com/en-us/library/68td296t.aspx. However, that is a bit like drinking from a fire hose.
My thinking is to add a new folder nupic.core\src\nupic\cs_support that will contain all Managed C++ wrappers, one file per managed C++ class. C# can directly link to the Managed C++ wrappers.
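To show roughly what one of those wrappers might look like, here is a hedged C++/CLI sketch (it needs MSVC with the /clr switch; the class and method names are hypothetical stand-ins, not the real cs_support code — only nupic::Network and its run() come from nupic.core):

```cpp
// Hypothetical managed wrapper for nupic::Network, one class per file
// under cs_support. Compile with /clr (MSVC only).
#include "nupic/engine/Network.hpp"

namespace NupicManaged {

    public ref class Network {
    public:
        Network() : native_(new nupic::Network()) {}
        ~Network() { this->!Network(); }        // destructor (Dispose)
        !Network() { delete native_; native_ = nullptr; }  // finalizer

        // Forward to the unmanaged core object.
        void Run(int iterations) { native_->run(iterations); }

    private:
        nupic::Network* native_;  // owned unmanaged pointer
    };
}
```

C# code would then just `new NupicManaged.Network()` and call `Run()` directly; the destructor/finalizer pair makes sure the unmanaged object is freed whether or not the C# side calls Dispose.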
We will also need a test program in nupic.core\src\test\integration, probably a C# program to verify that the integration works. Ideally this should be written BEFORE we write the wrappers, using the API document as the guide.
And we will need a CMake module (comparable to nupic.core\src\NupicPythonInterface.cmake that I made for the Python build) which will build everything for the C# interface into a new dynamic library nupic_core_cs.dll that C# can link to.
We don’t need the SWIG interfaces. We can pass-through the capnproto calls for archiving serialization but I don’t think we need any new capnp definitions. We will not need anything like the PyRegion because the callback can be handled by the managed code wrappers. Since C# is primarily a Windows thing we don’t have to verify that this builds or even runs under Linux.
We will need to port at least part of the Python code in the nupic repository to C#…at least some of the sample code. The majority of the effort will be writing the wrappers. If either of you already have some Managed C++ wrappers written or would like to take on some of them, let me know so we don’t duplicate effort. I will identify the classes that need to be wrapped based on Numenta’s API document, then we can start coding.
@chhenning I think that when you have the pybind11 interface working we can fold that into the package without affecting the C# project. Remember that the Python code in the nupic repository (or at least part of it) needs to be ported to Python 3.x at some point.
Starting with the API Documentation, I tried to identify those classes that are exposed at the interface. Classes that are only internal to the C++ code do not need to be exposed at the API interface. The list below is my first cut at the classes that do need to be exposed. Note that some classes are still in Python, so to complete the API we need to implement them in C++ so that C# will have access to them.
I welcome anyone’s input on this list. Are there more? Are there some that are not part of the API interface?
==================================================
From the viewpoint of API Documentation
These are the classes exposed to the client.
Engine
NuPIC C++
Network C++
Region C++
Dimensions C++ container
Collection C++ container
Vector C++ std:: class
ostringstream C++ std:: class
GenericRegisteredRegionImpl ?? – base class for new C# custom implemented regions
Regions (These all seem to be Python-implemented classes backed by the algorithm classes below)
AnomalyRegion
SPRegion
TMRegion
AnomalyLikelihoodRegion
KNNAnomalyClassifierRegion
KNNClassifierRegion
SDRClassifierRegion
Sensors
PluggableEncoderSensor (python)
RecordSensor (python)
ScalarSensor C++
ImageSensorLite C++
Encoder
Encoder (Python)
SDRCategoryEncoder (Python)
ScalarEncoder C++
DateEncoder (Python)
CoordinateEncoder (Python)
MultiEncoder (Python)
PassThroughEncoder (Python)
Algorithms
SpatialPooler C++
TemporalMemory C++
BacktrackingTM (Python)
BacktrackingTMCPP (??)
Connections C++
SDRClassifier C++
KNNClassifier (python)
ClassifierResult C++
SDRClassifier C++
Anomaly C++
AnomalyLikelihood (python)
Data
FieldMetaInfo (Python)
FileRecordStream (Python)
RecordStreamIface (Python)
StreamReader (Python)
Serializable C++ – part of capnp
Utils (parts that can be used to build custom Regions)