Inconsistency between SpatialPooler in nupic and nupic.bindings

I’m taking my very first steps with nupic.
I installed nupic and nupic.bindings for Python 2.7 (x86) via pip on a Windows 10 x64 machine.
I wrote a very simple program to test the SpatialPooler, inspired by this example: http://nbviewer.jupyter.org/github/numenta/nupic/blob/master/examples/NuPIC%20Walkthrough.ipynb.
Here it is:

import sys
from optparse import OptionParser
import numpy as np

from nupic.encoders import CategoryEncoder

def importAlgorithms(research):
    global SpatialPooler
    if(research):
        from nupic.research.spatial_pooler import SpatialPooler
    else:
        from nupic.bindings.algorithms import SpatialPooler
    

INPUT_DIMENSIONS = 15
INPUT_WIDTH = 3
COLUMNS_COUNT = 4
ACTIVE_PCT = 25
ACTIVE_COUNT = COLUMNS_COUNT * ACTIVE_PCT / 100
TRAIN_STEPS = 200

def train(sp, inputs, enc, count):
    for i in inputs:
        for step in xrange(count):
            output = np.zeros((COLUMNS_COUNT,), dtype="int")
            sp.compute(enc.encode(i), True, output)

def printConnections(headerMessage, sp):
    print headerMessage
    for column in xrange(COLUMNS_COUNT):
        connected = np.zeros((INPUT_DIMENSIONS,), dtype="int")
        sp.getConnectedSynapses(column, connected)
        print "col{}\t{} ({})".format(column, connected, np.sum(connected))


def main(argv = None):
    if(argv is None):
        argv = sys.argv

    parser = OptionParser()
    parser.add_option("-r", "--research", action="store_true", dest="research", help="do not use cpp bindings", default=False)
    (options, args) = parser.parse_args(argv[1:])

    importAlgorithms(options.research)

    inputs = ("cat", "dog", "monkey", "loris")
    enc = CategoryEncoder(w=INPUT_WIDTH, categoryList=inputs, forced=True)
    sp = SpatialPooler(inputDimensions = (INPUT_DIMENSIONS,),
                       columnDimensions = (COLUMNS_COUNT,),
                       potentialRadius = INPUT_DIMENSIONS,
                       numActiveColumnsPerInhArea = ACTIVE_COUNT,
                       globalInhibition = True,
                       synPermActiveInc = 0.03,
                       potentialPct = 1.0)

    printConnections("Connections before training:", sp)
    train(sp, inputs, enc, TRAIN_STEPS)

    print
    for i in inputs:
        columns = np.zeros((COLUMNS_COUNT,), dtype="int") 
        sp.compute(enc.encode(i), False, columns)
        print "{}:\t{} {}".format(i, enc.encode(i), columns)

    print
    printConnections("Connections after training:", sp)

    return 0

if __name__ == "__main__":
    main()

My problem is that when I run it with SpatialPooler from nupic.research.spatial_pooler (the -r flag), I get the expected result:

cat:    [0 0 0 1 1 1 0 0 0 0 0 0 0 0 0] [0 1 0 0]
dog:    [0 0 0 0 0 0 1 1 1 0 0 0 0 0 0] [0 0 0 1]
monkey: [0 0 0 0 0 0 0 0 0 1 1 1 0 0 0] [1 0 0 0]
loris:  [0 0 0 0 0 0 0 0 0 0 0 0 1 1 1] [0 0 1 0]

Connections after training:
col0    [0 0 0 0 0 0 0 0 0 1 1 1 0 0 0] (3)
col1    [0 0 0 1 1 1 0 0 0 0 0 0 0 0 0] (3)
col2    [0 0 0 0 0 0 0 0 0 0 0 0 1 1 1] (3)
col3    [0 0 0 0 0 0 1 1 1 0 0 0 0 0 0] (3)

But without the -r option (i.e. using SpatialPooler from nupic.bindings.algorithms), the SP does not learn correctly and I get very wrong results:

cat:    [0 0 0 1 1 1 0 0 0 0 0 0 0 0 0] [0 0 0 1]
dog:    [0 0 0 0 0 0 1 1 1 0 0 0 0 0 0] [1 0 0 0]
monkey: [0 0 0 0 0 0 0 0 0 1 1 1 0 0 0] [1 0 0 0]
loris:  [0 0 0 0 0 0 0 0 0 0 0 0 1 1 1] [1 0 0 0]

Connections after training:
col0    [1 1 0 0 0 1 0 0 1 1 0 0 0 1 0] (6)
col1    [1 0 1 0 1 1 1 1 0 1 1 0 0 0 1] (9)
col2    [1 1 0 1 0 1 0 0 0 0 0 0 0 0 1] (5)
col3    [0 0 0 1 1 1 1 1 0 1 1 1 0 0 0] (8)

What am I doing wrong there?


I would like to know what version of NuPIC you are running. Please run this command from the command line to find out and paste your results here.

python -c 'import pkg_resources;n=pkg_resources.get_distribution("nupic");b=pkg_resources.get_distribution("nupic.bindings");print n.project_name, n.version;print b.project_name, b.version'
nupic.bindings 0.4.10
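Windows cmd.exe does not treat single quotes as quoting characters, so the one-liner above may fail there. A small script can do the same lookup; this sketch assumes either importlib.metadata (Python 3.8+) or setuptools' pkg_resources is available, and `installed_version` is a hypothetical helper name.

```python
# Print the installed versions of nupic and nupic.bindings (Python 2 or 3).
try:
    from importlib.metadata import version as _version, PackageNotFoundError
except ImportError:  # Python 2 / <3.8: fall back to setuptools' pkg_resources
    import pkg_resources

    class PackageNotFoundError(Exception):
        pass

    def _version(name):
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            raise PackageNotFoundError(name)


def installed_version(name):
    """Return the installed version string of `name`, or None if it is missing."""
    try:
        return _version(name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    for dist in ("nupic", "nupic.bindings"):
        print("%s %s" % (dist, installed_version(dist)))
```

Saved as a file and run with `python file.py`, it avoids shell quoting entirely.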

When I run your program I get this:

Connections before training:
col0	[4294967296 4294967296 4294967297 4294967296 4294967296          1
 4294967297          0          0          0          0          0
          0          0          0] (25769803779)
col1	[         0 4294967297 4294967296 4294967296          0 4294967296
 4294967297          0          0          0          0          0
          0          0          0] (21474836482)
col2	[         0 4294967297 4294967297 4294967297 4294967297          0
          0          1          0          0          0          0
          0          0          0] (17179869189)
col3	[         0          0 4294967296          1 4294967296 4294967297
          0          0          0          0          0          0
          0          0          0] (12884901890)

cat:	[0 0 0 1 1 1 0 0 0 0 0 0 0 0 0] [0 1 0 0]
dog:	[0 0 0 0 0 0 1 1 1 0 0 0 0 0 0] [0 1 0 0]
monkey:	[0 0 0 0 0 0 0 0 0 1 1 1 0 0 0] [0 1 0 0]
loris:	[0 0 0 0 0 0 0 0 0 0 0 0 1 1 1] [0 1 0 0]

Connections after training:
col0	[4294967296 4294967296 4294967297 4294967296 4294967296 4294967297
 4294967297          0          0          0          0          0
          0          0          0] (30064771075)
col1	[         0 4294967297 4294967296 4294967296          1 4294967297
 4294967297          1          0          0          0          0
          0          0          0] (21474836485)
col2	[         0 4294967296 4294967297 4294967296 4294967297 4294967297
          1          1          0          0          0          0
          0          0          0] (21474836485)
col3	[         0          0 4294967296          1 4294967296 4294967297
          0          0          0          0          0          0
          0          0          0] (12884901890)

The 4294967296 value is suspicious because it is 2^32, just past the largest 32-bit unsigned value, so something is probably overflowing. Math error somewhere? You are running this program on Windows, right?
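Splitting the suspicious numbers into 32-bit halves with plain integer arithmetic hints that they are not garbage but two 0/1 values packed into one 64-bit integer. `split_words` is a hypothetical helper; the sample values are copied from the output above.

```python
WORD = 2 ** 32  # 4294967296


def split_words(value):
    """Split a non-negative 64-bit integer into its (high, low) 32-bit words."""
    return divmod(value, WORD)


for v in (4294967296, 4294967297, 1, 25769803779):
    print("%d -> high=%d, low=%d" % ((v,) + split_words(v)))
```

For example, col0's sum 25769803779 splits into (6, 3): read as 32-bit words, that column holds 6 + 3 = 9 ones, so the underlying data is still only zeros and ones.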

Hm, interesting, I don’t have those MAXINT values, only zeroes and ones.
Yes, I run it on Windows.
I have numpy 1.9.2.
Do you get those 2^32 values if you run it with the -r flag?
They are filled by this code:

    for column in xrange(COLUMNS_COUNT):
        connected = np.zeros((INPUT_DIMENSIONS,), dtype="int")
        sp.getConnectedSynapses(column, connected)
        print "col{}\t{} ({})".format(column, connected, np.sum(connected))

which suggests they should be zeroed first and then filled in by SpatialPooler.getConnectedSynapses.

Interesting: I get a similar problem (plenty of 4294967297 and 4294967296 values) if I change dtype="int" to dtype="int64" everywhere (I run it in 32-bit Python on Windows).
But I still have my original problem if I explicitly specify int32 or uint32.
I’m going to try the 64-bit Python version.

Hm, no, running in 64-bit Python doesn’t change anything.

Wondering if any other @committer has some insight on this problem? :confounded:

But can you confirm that the C++ and Python SPs behave differently in your environment as well (i.e. that it is not specific to my environment)?

The code is very simple and just trains 4 columns on 4 input categories.
With the Python SP it works perfectly, while switching to the C++ SP doesn’t work the same way (the connections change somehow, but not as expected).

So I assume it is related either to how the bindings pass arguments (including numpy arrays) or to how the C++ SP behaves compared to the Python implementation.

I would say it might be related to how I write Python code, but it works with the Python SP ¯\_(ツ)_/¯

I also get the 4294967297 values, etc., and the issue goes away when I change the dtypes to ‘int32’. So it happens when numpy maps ‘int’ to ‘int64’, which means our documentation probably shouldn’t recommend using “int”.
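The pattern can be reproduced with NumPy alone. This sketch assumes the C++ side writes 32-bit unsigned values into whatever buffer it is handed, and a little-endian machine; `write_uint32_into` is a hypothetical stand-in for the bindings, not a nupic function.

```python
import sys

import numpy as np


def write_uint32_into(buf, values):
    """Hypothetical stand-in for C++ code that assumes `buf` holds uint32 values.

    Copies `values` into the raw bytes of `buf`, whatever buf's real dtype is.
    """
    raw = buf.view(np.uint32)  # same memory, reinterpreted as uint32
    raw[:len(values)] = values
    return buf


bits = np.array([0, 1, 1, 1, 0, 0], dtype=np.uint32)

# Like dtype="int" where NumPy's default int is 64-bit (e.g. 64-bit Linux/macOS).
wrong = write_uint32_into(np.zeros(6, dtype=np.int64), bits)
# The fix: allocate the array as uint32 in the first place.
right = write_uint32_into(np.zeros(6, dtype=np.uint32), bits)

print("%s %s %s" % (sys.byteorder, wrong.tolist(), right.tolist()))
print("dtype='int' is %d-bit here" % (np.dtype("int").itemsize * 8))
# On a little-endian machine: wrong == [4294967296, 4294967297, 0, 0, 0, 0]
```

The last line shows what dtype="int" means on a given platform. With NumPy 1.x it was 32-bit on Windows even in 64-bit Python, which would explain why the values never showed up there.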

Anyway, the main issue is that our encoders use a numpy array type of ‘uint8’: https://github.com/numenta/nupic/blob/6924a904644bbb8cdba6ca7a62050cf8456614f8/src/nupic/encoders/base.py#L30

So our encoders don’t create outputs that are ready for the C++ SpatialPooler. If you create your own numpy array of type ‘uint32’ and use encoder.encodeIntoArray, it will work.
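A minimal sketch of that workaround, assuming the encoder follows the nupic Encoder base-class interface (`getWidth()` and `encodeIntoArray(inputData, output)`) and the SP's `compute(input, learn, active)` call used in the question; `encode_for_cpp_sp` and `train_cpp_sp` are hypothetical helper names.

```python
import numpy as np


def encode_for_cpp_sp(encoder, value):
    """Encode `value` into a fresh uint32 array instead of encode()'s uint8 default."""
    output = np.zeros((encoder.getWidth(),), dtype=np.uint32)
    encoder.encodeIntoArray(value, output)  # fills `output` in place
    return output


def train_cpp_sp(sp, encoder, inputs, steps, num_columns):
    """Variant of the question's train() that uses uint32 input and output arrays."""
    active = np.zeros((num_columns,), dtype=np.uint32)
    for value in inputs:
        sdr = encode_for_cpp_sp(encoder, value)
        for _ in range(steps):
            sp.compute(sdr, True, active)
    return active
```

In the question's program this would replace train(), with every other dtype="int" changed to dtype="uint32" as well.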

(This is strange – I think that line in ‘base.py’ should be

defaultDtype = numpy.uint32

).

A minor third issue: glancing at your code, you’re going to end up with different synPermInactiveDec values in the two implementations because of https://github.com/numenta/nupic.core/issues/1104
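One way to sidestep that discrepancy is to pass synPermInactiveDec explicitly, so both implementations receive identical arguments. This sketch reuses the question's parameter values; the 0.008 value is only an illustrative assumption, not a recommended setting, and `make_sp` is a hypothetical helper.

```python
# Shared constructor arguments for both SpatialPooler implementations.
SP_PARAMS = dict(
    inputDimensions=(15,),
    columnDimensions=(4,),
    potentialRadius=15,
    numActiveColumnsPerInhArea=1,
    globalInhibition=True,
    synPermActiveInc=0.03,
    synPermInactiveDec=0.008,  # illustrative value; explicit so neither default applies
    potentialPct=1.0,
)


def make_sp(sp_class, **overrides):
    """Build `sp_class` (Python or C++ SpatialPooler) from the shared parameters."""
    params = dict(SP_PARAMS)
    params.update(overrides)
    return sp_class(**params)
```

Calling make_sp once with each SpatialPooler class keeps the comparison between the two implementations apples to apples.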


My humble view of the problem is the same as yours. I don’t think 1s and 0s are enough any longer to produce the data we seek.

Thanks!
Indeed, this works like a charm.

Also, thanks for the hint about the difference in some parameters between the implementations.

:bow:


More like 1’s where there should be 0’s, due to bad type conversion :stuck_out_tongue:

I would like you to explain how an encoder works, if you don’t mind :slight_smile:

Sure. Rather than hijacking this thread though, if you start your own I’ll be happy to post some info and talk about encoders.


I already know how an encoder works: it converts input from a number of lines to a selected number of output lines. I was asking whether an HTM encoder differs.

@sea If I change every dtype="int" to dtype="uint32", it works fine.

Mac OSX 10.13
nupic 1.0.3
nupic.bindings 1.0.2

We just added stricter SP input type checking in nupic.bindings 1.0.3; I suggest everyone upgrade.
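For older bindings without that check, a caller-side guard can catch the mistake before it reaches C++. This is a hypothetical helper, not the check nupic.bindings 1.0.3 actually performs; it simply requires uint32 arrays, the type the posts above found to work.

```python
import numpy as np


def require_uint32(name, array):
    """Raise TypeError unless `array` is a NumPy array of dtype uint32."""
    if not isinstance(array, np.ndarray) or array.dtype != np.uint32:
        got = getattr(array, "dtype", type(array).__name__)
        raise TypeError("%s must be a numpy.uint32 array, got %s" % (name, got))
    return array


def checked_compute(sp, input_vector, learn, active_array):
    """Call sp.compute only after both arrays pass the uint32 check."""
    require_uint32("input_vector", input_vector)
    require_uint32("active_array", active_array)
    sp.compute(input_vector, learn, active_array)
```

Passing an encoder's default uint8 output, or an array created with dtype="int", then fails loudly instead of silently mis-training.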