Segmentation Fault while running basic swarm

There is definitely something wrong with your NuPIC installation. I know you are on Ubuntu 15, and you said you were using gcc 4.7. Are you on a VM? At this point, I would suggest that you start fresh using the latest NuPIC code that works with gcc 5.2.

You are ahead of me, so I don’t have a valid wiki document on how to install on Ubuntu 15. You could use this one but ignore the stuff about switching GCC versions. If you get any errors during the installation at all, stop and paste them into pastebin or gist (not in this thread) and link to them here. I’ll try to help.

I am on gcc 5.2 now. The installation went fine without any errors, and I cloned nupic and nupic.core.

But when I ran the example, I still get a segmentation fault:

cd examples/opf/clients/hotgym/simple
python hotgym.py

@csbond007 I just ran some tests on Ubuntu 15.10 with gcc 5.2.1 and Python 2.7.10 and on Ubuntu 16.04 with gcc 5.3.1 and Python 2.7.11+. No segfaults at all.

I’ve created a gist with all required installation steps:

https://gist.github.com/hstm/b375369a22dc63c7a9d469da160f8adc

They are mostly taken from @rhyolight 's instructions (https://github.com/numenta/nupic/wiki/Compiling-NuPIC-on-Ubuntu-14#install-1).

If you follow them carefully step by step, everything should be fine.

python $NUPIC/scripts/run_nupic_tests.py -u

The unit tests are getting stuck here:

tests/unit/nupic/encoders/random_distributed_scalar_test.py:437: RandomDistributedScalarEncoderTest.testVerbosity PASSED

OK. Please execute this in your terminal:

gdb -ex r --args python $NUPIC/scripts/run_nupic_tests.py -u

If the process hangs or segfaults, we should get some information.

Please follow @rhyolight 's advice and paste the output into pastebin or gist (not in this thread).

I had success on Ubuntu server 16.04 and mysql 5.7.12 with the swarming tests, using pull request https://github.com/numenta/nupic.core/pull/984.

  1. Run the swarming tests

     $ ./scripts/run_nupic_tests.py -w
    
  2. Run hotgym

~/nta/nupic/examples/opf/clients/hotgym/simple$ python hotgym.py
INFO:__main__:After 100 records, 1-step altMAPE=23.183145
INFO:__main__:After 200 records, 1-step altMAPE=21.548877
INFO:__main__:After 300 records, 1-step altMAPE=21.227594
INFO:__main__:After 400 records, 1-step altMAPE=20.686270
INFO:__main__:After 500 records, 1-step altMAPE=20.417234
INFO:__main__:After 600 records, 1-step altMAPE=20.852339
INFO:__main__:After 700 records, 1-step altMAPE=20.907660
INFO:__main__:After 800 records, 1-step altMAPE=21.137106
INFO:__main__:After 900 records, 1-step altMAPE=20.875763
INFO:__main__:After 1000 records, 1-step altMAPE=20.789707
~/nta/nupic/examples/opf/clients/hotgym/simple$ echo $?
0

@rhyolight, I moved the conversation about mysql user configuration and NTA_CONF_PATH to its own thread: When swarming, got ERROR 1698 (28000): Access denied for user 'root'@'localhost' mysql

I am running on an AWS free-tier Ubuntu box. The OS version is Ubuntu 14.04.4 LTS and the default Python version was 2.7.6.

Failed building wheel for pycapnp

sudo apt-get update -y
sudo apt-get install git g++ cmake python-dev -y
git clone https://github.com/numenta/nupic.core.git
git clone https://github.com/numenta/nupic.git
export NUPIC=$HOME/nupic

I don’t think this is necessary, because the current location of the nupic-default.xml file is the default location NuPIC will look for it. I remember updating this a couple years ago. Hopefully this is still the case?

I found MySQL mentioned on the Running Swarms page, and there is also a MySQL Settings page. Maybe we need something more comprehensive?

Tried with 2.7.11 also
sudo add-apt-repository ppa:fkrull/deadsnakes-python2.7
sudo apt-get update
sudo apt-get upgrade

ubuntu@ip-172-31-12-69:~$ python --version
Python 2.7.11

On running curl https://bootstrap.pypa.io/get-pip.py | sudo python
I am getting:

The directory '/home/ubuntu/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.

Same errors as before

@csbond007 From your logs:

virtual memory exhausted: Cannot allocate memory

You need more RAM!
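Before starting a build, it is easy to check whether a box is likely to run out of memory. This is just a sketch: the ~4 GB threshold is a rough assumption based on the experience in this thread, not an official requirement.

```shell
# Rough pre-flight check before building nupic.core: compare total RAM
# (from /proc/meminfo, reported in kB) against an assumed ~4 GB need.
required_mb=4000
total_mb=$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo)
if [ "$total_mb" -lt "$required_mb" ]; then
    echo "Only ${total_mb} MB RAM: expect 'virtual memory exhausted' during compile"
else
    echo "RAM looks sufficient: ${total_mb} MB"
fi
```

On an instance that is too small, adding swap or picking a larger instance type are the usual workarounds.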

AWS Ubuntu 14 – @rhyolight – will it work with 2 GB RAM?

Ubuntu 15 – @helge – gdb -ex r --args python $NUPIC/scripts/run_nupic_tests.py -u
even with this, it hangs at:

tests/unit/nupic/encoders/random_distributed_scalar_test.py:437: RandomDistributedScalarEncoderTest.testVerbosity PASSED

@helge

kaustavsaha@ubuntu-precision-server:~/nupic/examples/opf/clients/hotgym/simple$ gdb -ex r --args python hotgym.py -u
GNU gdb (Ubuntu 7.10-1ubuntu2) 7.10

This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...(no debugging symbols found)...done.
Starting program: /usr/bin/python hotgym.py -u
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff25cb700 (LWP 25223)]
[New Thread 0x7ffff1dca700 (LWP 25224)]
[New Thread 0x7fffed5c9700 (LWP 25225)]

Program received signal SIGSEGV, Segmentation fault.
__memset_sse2 () at ../sysdeps/x86_64/multiarch/../memset.S:78
78 ../sysdeps/x86_64/multiarch/../memset.S: No such file or directory.
(gdb)

@helge

The backtrace

I don’t know. I would try with 4GB. The segfault is very likely because of this installation problem, so focus on that first.

Select at least an m3.medium instance (a micro instance fails to compile due to insufficient memory)

An m3.medium has 3.75 GB RAM


@rhyolight The AWS Ubuntu 14 instance with 4 GB RAM is running fine now, including the basic swarm test. :slight_smile: Thanks a lot!

@helge I have an Ubuntu 15 server with 16 GB RAM, so memory should not be an issue there, I guess.


@csbond007, you said the problem went away after you switched to AWS Ubuntu 14 with 4 GB.

What about Ubuntu 15? Can you confirm that the tests run fine on Ubuntu 15 (no memset SEGFAULT, etc.)?

I am experiencing the exact same SEGFAULT with the same backtrace, and the same python run_nupic_tests.py -u hang behavior, on Ubuntu 16.04 running in VirtualBox on OS X El Capitan. I configured it with plenty of RAM, anywhere between 4 GB and 12 GB, and that didn't help.

The real root cause, as I discovered during my work on the manylinux nupic.bindings wheel, was the unanticipated confluence of runtime symbol preemption and c++ ABI incompatibility between nupic.bindings and pycapnp. I am going to document it here for "posterity":

  1. When you encountered the SEGFAULT, you were running on Ubuntu 16.04, whose system headers and libraries were created using the updated c++11 ABI.
  2. The capnproto sources in pycapnp are compiled upon installation using the Ubuntu 16.04 toolchain with that new c++11 ABI.
  3. nupic.bindings was obtained either from Numenta’s S3 or built elsewhere on Ubuntu 12.04 or 14.04, using an older c++ ABI that is incompatible with the one on Ubuntu 16.04 (where pycapnp was built).
  4. nupic.bindings includes its own copy of capnproto c++ sources that is very close to the one included in pycapnp’s python extension. However, nupic.bindings’s capnproto c++ code was compiled as part of nupic.bindings using the older toolchain.
  5. Neither pycapnp nor nupic.bindings was hiding its symbols, so all the symbols in the pycapnp and nupic.bindings extensions were public, including the near-identical capnproto symbols, subjecting both libraries to symbol preemption during runtime linking.
  6. Notice in the stack trace quoted below that control from the destructor capnp::SchemaLoader::Impl::~Impl in the pycapnp extension (capnp.so) is inadvertently transferred to methods compiled into the nupic.bindings extension _math.so. Recall that pycapnp’s capnp.so and nupic.bindings’ _math.so were compiled on different platforms using incompatible c++ ABIs. This explains the SEGFAULT on Ubuntu 16.04 (the pycapnp and nupic.bindings extensions were compiled using INCOMPATIBLE c++ ABIs) and no SEGFAULT on Ubuntu 14.04 (the extensions were compiled using compatible c++ ABIs).

Simple as that! :wink:


As part of the manylinux wheel effort, I have taken several steps to alleviate this issue in the nupic.bindings build:

  1. Hide all symbols in the nupic.bindings extension DSOs (except the python extension initialization function, of course) on *nix builds. This prevents unintended preemption of nupic.bindings symbols by other extensions and vice versa.
  2. Exclude capnproto sources from the nupic.bindings extensions build and force preload of pycapnp, thus ensuring that a single capnproto build is used. This solves the “hang” problem that both this thread’s author and I ran into. That problem resulted from an object created by pycapnp’s capnproto code (compiled on Ubuntu 16.04 with the newer c++ ABI) being manipulated by nupic.bindings’ capnproto code (compiled on a system with an older, incompatible c++ ABI). It’s easy to see how this would lead to problems.
  3. Link nupic.bindings extension DSOs with static libstdc++. In combination with the first item above, this ensures that nupic.bindings extensions can catch exceptions whose c++ ABI changed in newer toolchains (e.g., std::ios_base::failure). See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66145 .
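A toy illustration of the flags behind items 1 and 3 (a sketch with hypothetical file names, assuming g++ and nm are available; the real nupic.bindings build wires equivalent options through its own build system). With -fvisibility=hidden, internal symbols stay out of the dynamic symbol table, so only the explicitly exported entry point remains visible to (and preemptible by) other extensions:

```shell
cat > demo_ext.cpp <<'EOF'
// Internal symbol: hidden by -fvisibility=hidden below.
int internal_helper() { return 42; }
// Exported entry point, analogous to a python extension's init function.
extern "C" __attribute__((visibility("default")))
int init_demo() { return internal_helper(); }
EOF
g++ -shared -fPIC -fvisibility=hidden -static-libstdc++ demo_ext.cpp -o demo_ext.so
# Only init_demo should appear among the dynamic (preemptible) symbols:
nm -D --defined-only demo_ext.so
```

The nm output shows init_demo but not internal_helper, which is exactly the property that stops two extensions from stepping on each other's internals at runtime.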

NOTE Step number 2 above is only a short-term solution, as it relies on “promiscuous” behavior by a 3rd-party python extension (pycapnp) that exposes all its symbols, against python extension best practices. We will of course need to find a more robust solution for the long term.

Hope this helps someone.


Thank you, Vitaly! Your work is going to make it easier to install NuPIC on many systems for many people.