Segmentation Fault while running basic swarm

@rhyolight The AWS Ubuntu 14 instance with 4 GB RAM is running fine now, including the basic swarm test. :slight_smile: Thanks a lot!

@helge I have an Ubuntu 15 server with 16 GB RAM, so memory should not be an issue, I guess.

@csbond007, you said the problem went away after you switched to AWS Ubuntu 14 with 4 GB.

What about Ubuntu 15? Can you confirm that the tests run fine on Ubuntu 15 (no memset SEGFAULT, etc.)?

I am experiencing the exact same SEGFAULT, with the same backtrace and the same hang in python run_nupic_tests.py -u, on Ubuntu 16.04 running in VirtualBox on OS X El Capitan. I configured the VM with plenty of RAM, anywhere between 4 GB and 12 GB, and that didn’t help.

The real root cause, as I discovered during my work on the manylinux nupic.bindings wheel, was an unanticipated confluence of runtime symbol preemption and C++ ABI incompatibility between nupic.bindings and pycapnp. I am documenting it here for “posterity”:

  1. When you encountered the SEGFAULT, you were running on Ubuntu 16.04, whose system headers and libraries were built using the updated C++11 ABI.
  2. The capnproto sources in pycapnp are compiled upon installation using the Ubuntu 16.04 toolchain, and therefore with the new C++11 ABI.
  3. nupic.bindings was obtained either from Numenta’s S3 or built elsewhere on Ubuntu 12.04 or 14.04, using an older C++ ABI that is incompatible with the one on Ubuntu 16.04 (where pycapnp was built).
  4. nupic.bindings includes its own copy of the capnproto C++ sources, which is very close to the one included in pycapnp’s python extension. However, nupic.bindings’ capnproto C++ code was compiled as part of nupic.bindings using the older toolchain.
  5. Neither pycapnp nor nupic.bindings was hiding its symbols, so all the symbols in the pycapnp and nupic.bindings extensions were public, including the similar capnproto symbols. This subjected both libraries to symbol preemption during runtime linking.
  6. Notice in the stack trace quoted below that control from the destructor capnp::SchemaLoader::Impl::~Impl in the pycapnp extension (capnp.so) is inadvertently transferred to methods compiled into the nupic.bindings extension _math.so. Recall that pycapnp’s capnp.so and nupic.bindings’ _math.so were compiled on different platforms using incompatible C++ ABIs. This explains the SEGFAULT on Ubuntu 16.04 (the pycapnp and nupic.bindings extensions were compiled using INCOMPATIBLE C++ ABIs) and the absence of a SEGFAULT on Ubuntu 14.04 (the pycapnp and nupic.bindings extensions were compiled using compatible C++ ABIs).
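
As a rough illustration of points 5 and 6, here is a toy Python model of symbol preemption. It is not how the real dynamic loader works; it only assumes a flat global scope searched in load order, with the first exported definition winning. The class ToyLoader, the helper describe_call, the symbol name capnp::helper, the init function names, and the load order are all hypothetical and chosen for illustration.

```python
class ToyLoader(object):
    """Toy model: one global symbol scope, searched in load order."""

    def __init__(self):
        self._scope = []  # [(library name, set of exported symbol names)]

    def load(self, library, exported):
        self._scope.append((library, set(exported)))

    def resolve(self, symbol):
        """Return the first loaded library that exports `symbol`."""
        for library, exported in self._scope:
            if symbol in exported:
                return library
        raise KeyError(symbol)


# Build ABIs as described in the list above (labels are illustrative).
BUILD_ABI = {"_math.so": "pre-C++11 ABI", "capnp.so": "C++11 ABI"}

# Hypothetical name standing in for the duplicated capnproto code.
SHARED = "capnp::helper"


def describe_call(loader, caller, symbol):
    """Return (provider, abi_mismatch) for `caller`'s use of `symbol`."""
    provider = loader.resolve(symbol)
    return provider, BUILD_ABI[provider] != BUILD_ABI[caller]


# Case 1: both extensions export everything (assumed: _math.so loads first).
exposed = ToyLoader()
exposed.load("_math.so", ["init_math", SHARED])
exposed.load("capnp.so", ["initcapnp", SHARED])

# Case 2: _math.so hides everything except its init function.
hidden = ToyLoader()
hidden.load("_math.so", ["init_math"])
hidden.load("capnp.so", ["initcapnp", SHARED])

print(describe_call(exposed, "capnp.so", SHARED))  # ('_math.so', True)
print(describe_call(hidden, "capnp.so", SHARED))   # ('capnp.so', False)
```

In the first case, a call that originates in capnp.so lands in code built with a different ABI, which is the situation the stack trace shows. In the second case the call stays inside capnp.so.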

Simple as that! :wink:

As part of the manylinux wheel effort, I have taken several steps to alleviate this issue in the nupic.bindings build:

  1. Hide all symbols in the nupic.bindings extension DSOs (except the python extension initialization function, of course) on *nix builds. This prevents unintended preemption of nupic.bindings symbols by other extensions and vice versa.
  2. Exclude the capnproto sources from the nupic.bindings extensions build and force a preload of pycapnp, so that a single capnproto build is used. This solves the “hang” problem that both this thread’s author and I ran into. That problem resulted from an object created by pycapnp’s capnproto code (compiled on Ubuntu 16.04 with the newer C++ ABI) being manipulated by nupic.bindings’ capnproto code (compiled on a system with an older, incompatible C++ ABI). It’s easy to see how this would lead to problems.
  3. Link the nupic.bindings extension DSOs with static libstdc++. In combination with the first item above, this ensures that nupic.bindings extensions can catch exceptions whose C++ ABI changed in newer toolchains (e.g., std::ios_base::failure). See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66145 .
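
One way to sanity-check item 1 is to list the dynamic symbols an extension defines and flag anything other than its init function. The sketch below parses text in the format printed by nm -D --defined-only (binutils). The helper names, the sample output, and the addresses in it are made up for illustration, and the set of linker-generated names that are tolerated is an assumption that may need adjusting per toolchain.

```python
# Names the linker commonly emits on its own (assumed list; adjust as needed).
LINKER_SYMBOLS = {"_init", "_fini", "__bss_start", "_edata", "_end"}


def defined_dynamic_symbols(nm_output):
    """Extract names of defined symbols from `nm -D` style output."""
    names = []
    for line in nm_output.splitlines():
        fields = line.split()
        # Defined symbols have three fields: address, type letter, name.
        # Undefined ones (e.g. "U malloc") have no address and are skipped.
        if len(fields) == 3:
            names.append(fields[2])
    return names


def unexpected_exports(nm_output, module_name):
    """Return exported names other than the module's init function."""
    allowed = set(LINKER_SYMBOLS)
    allowed.add("init" + module_name)     # Python 2 extension entry point
    allowed.add("PyInit_" + module_name)  # Python 3 extension entry point
    return [n for n in defined_dynamic_symbols(nm_output) if n not in allowed]


# Fabricated sample for illustration only, not real output from _math.so.
SAMPLE_NM_OUTPUT = """\
0000000000000f10 T _init
0000000000001a40 T init_math
0000000000001c80 T _ZN5capnp12SchemaLoaderD1Ev
0000000000002b04 T _fini
                 U malloc
"""

print(unexpected_exports(SAMPLE_NM_OUTPUT, "_math"))
# ['_ZN5capnp12SchemaLoaderD1Ev']
```

An empty result would suggest the extension exports nothing that another library could preempt or be preempted by. To use it on a real file, the text could come from something like subprocess.check_output(["nm", "-D", "--defined-only", path]).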

NOTE: Step 2 above is only a short-term solution, as it relies on “promiscuous” behavior by a 3rd party python extension (pycapnp), which exposes all its symbols, contrary to python extension best practices. We will of course need to find a more robust long-term solution.

Hope this helps someone.

Thank you, Vitaly! Your work is going to make it easier to install NuPIC on many systems for many people.