@csbond007, you said the problem went away after you switched to AWS Ubuntu 14 with 4 GB.
What about Ubuntu 15? Can you confirm that the tests run fine on Ubuntu 15 (no memset SEGFAULT, etc.)?
I am experiencing the exact same SEGFAULT, with the same backtrace and the same python run_nupic_tests.py -u hang behavior, on Ubuntu 16.04 running in VirtualBox on OS X El Capitan. I configured the VM with lots of RAM, anywhere between 4 GB and 12 GB, and that didn't help.
The real root cause, as I discovered during my work on the manylinux nupic.bindings wheel, was the unanticipated confluence of runtime symbol preemption and C++ ABI incompatibility between nupic.bindings and pycapnp. I am going to document it here for "posterity":
When you encountered the SEGFAULT, you were running on Ubuntu 16.04, whose system headers and libraries were built using the updated C++11 ABI.
The capnproto sources in pycapnp are compiled upon installation using the Ubuntu 16.04 toolchain, and therefore with that new C++11 ABI.
nupic.bindings was either obtained from Numenta's S3 or built elsewhere on Ubuntu 12.04 or 14.04, using an older C++ ABI that is incompatible with the one on Ubuntu 16.04 (where pycapnp was built).
nupic.bindings includes its own copy of the capnproto C++ sources, which is very close to the one included in pycapnp's python extension. However, nupic.bindings's capnproto C++ code was compiled as part of nupic.bindings using the older toolchain.
Neither pycapnp nor nupic.bindings was hiding its symbols, so all the symbols in the pycapnp and nupic.bindings extensions were public, including the similar capnproto symbols. This subjected both libraries to symbol preemption during runtime linking.
Notice in the stack trace quoted below that control from the destructor capnp::SchemaLoader::Impl::~Impl in the pycapnp extension (capnp.so) is inadvertently transferred to methods compiled into the nupic.bindings extension _math.so. Recall that pycapnp's capnp.so and nupic.bindings' _math.so were compiled on different platforms using incompatible C++ ABIs. This explains the SEGFAULT on Ubuntu 16.04 (the pycapnp and nupic.bindings extensions were compiled using INCOMPATIBLE C++ ABIs) and the absence of a SEGFAULT on Ubuntu 14.04 (the pycapnp and nupic.bindings extensions were compiled using compatible C++ ABIs).
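For anyone who wants to check whether two extensions are exposed to this kind of preemption, here is a minimal Python sketch. It assumes you have captured the output of nm -D --defined-only for each DSO as text; the function names and the sample listings are hypothetical, made up for illustration, and are not taken from the real capnp.so or _math.so. It also looks for the __cxx11 / B5cxx11 markers that GCC puts into mangled names of symbols using the new ABI; their absence does not prove that a library was built with the old ABI.

```python
# nm type letters for defined, globally visible symbols (GNU nm).
GLOBAL_DEFINED_TYPES = {"T", "W", "V", "D", "B", "R", "u"}


def exported_symbols(nm_output):
    """Return the set of defined global symbol names in `nm -D` style output."""
    symbols = set()
    for line in nm_output.splitlines():
        parts = line.split()
        # Defined symbols have three fields: address, type, name.
        # Undefined ones ("U memset") have only two and are skipped.
        if len(parts) == 3 and parts[1] in GLOBAL_DEFINED_TYPES:
            symbols.add(parts[2])
    return symbols


def preemptible_overlap(nm_a, nm_b, marker="capnp"):
    """Symbols exported by BOTH libraries whose name contains `marker`."""
    common = exported_symbols(nm_a) & exported_symbols(nm_b)
    return sorted(name for name in common if marker in name)


def mentions_cxx11_abi(symbols):
    """True if any mangled name carries a new-ABI marker."""
    return any("__cxx11" in name or "B5cxx11" in name for name in symbols)


# Hypothetical listings, for illustration only.
CAPNP_SO = """\
0000000000012340 T _ZN5capnp12SchemaLoader4ImplD1Ev
0000000000012400 T initcapnp
"""
MATH_SO = """\
0000000000045670 T _ZN5capnp12SchemaLoader4ImplD1Ev
0000000000045700 T init_math
                 U memset
"""

if __name__ == "__main__":
    for name in preemptible_overlap(CAPNP_SO, MATH_SO):
        print("exported by both:", name)
```

Any symbol reported by preemptible_overlap is defined publicly in both libraries, so the runtime linker may bind calls in one library to the definition in the other.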
As part of the manylinux wheel effort, I have taken several steps to alleviate this issue in the nupic.bindings build:
1. Hide all symbols in the nupic.bindings extension DSOs (except the python extension initialization function, of course) on *nix builds. This prevents unintended preemption of nupic.bindings symbols by other extensions and vice versa.
2. Exclude the capnproto sources from the nupic.bindings extensions build and force preload of pycapnp, so that a single capnproto build is used. This solves the "hang" problem that both this thread's author and I ran into. That problem resulted from an object created by pycapnp's capnproto code (compiled on Ubuntu 16.04 with the newer C++ ABI) being manipulated by nupic.bindings' capnproto code (compiled on a system with an older, incompatible C++ ABI). It's easy to see how this would lead to problems.
3. Link the nupic.bindings extension DSOs with static libstdc++. In combination with the first item above, this ensures that nupic.bindings extensions can catch exceptions whose C++ ABI changed in newer toolchains (e.g., std::ios_base::failure). See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66145 .
NOTE: Step number 2 above is only a short-term solution, as it relies on "promiscuous" behavior by a 3rd-party python extension (pycapnp) that exposes all its symbols, against python extension best practices. We will of course need to find a more robust solution for the long term.
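The symbol-hiding step can be sanity-checked after a build. The following is only a sketch under stated assumptions, not the check used in the nupic.bindings build: unexpected_exports is a hypothetical helper, the input is assumed to be the text output of nm -D --defined-only for one extension DSO, and LINKER_SYMBOLS is my assumption about bookkeeping symbols that the linker commonly leaves in the dynamic symbol table.

```python
# Symbols the linker commonly emits itself (an assumption; adjust as needed).
LINKER_SYMBOLS = {"_init", "_fini", "_edata", "_end", "__bss_start"}

# nm type letters for defined, globally visible symbols (GNU nm).
GLOBAL_DEFINED_TYPES = {"T", "W", "V", "D", "B", "R", "u"}


def unexpected_exports(nm_output, module_name):
    """List exported symbols other than the extension's init function.

    `module_name` is the extension module name, e.g. "_math".
    Python 2 extensions export init<name>; Python 3 ones export PyInit_<name>.
    """
    allowed = LINKER_SYMBOLS | {"init" + module_name, "PyInit_" + module_name}
    leaked = set()
    for line in nm_output.splitlines():
        parts = line.split()
        # address, type, name; undefined symbols have two fields and are skipped
        if len(parts) == 3 and parts[1] in GLOBAL_DEFINED_TYPES:
            if parts[2] not in allowed:
                leaked.add(parts[2])
    return sorted(leaked)
```

An empty result for every extension DSO would suggest that only the initialization function remains public; any capnproto symbol in the result would mean the DSO can still take part in preemption.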