Project to compare mraptor's bbHTM to biology

@robf brought up an idea in HTM Theory that I thought might make a good programming project, so sharing here.

Food for thought: if you read my explanation of how I implemented the Spatial Mapper (http://ifni.co/spatial_mapper.html),

you will probably want to look for biological equivalents of both union-SDRs and the Hamming-distance-inspired “search”.

The Hamming distance is very straightforward and almost cuts it, if you count each bit as equal.
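For concreteness, here is a minimal sketch of the Hamming distance between two binary SDRs (the vector size and active bits are illustrative, not taken from bbHTM):

```python
import numpy as np

def hamming_distance(a, b):
    """Number of bit positions where the two SDRs differ (XOR popcount)."""
    return int(np.count_nonzero(a != b))

# Two sparse 20-bit vectors sharing two active bits
a = np.zeros(20, dtype=bool); a[[1, 5, 9, 13]] = True
b = np.zeros(20, dtype=bool); b[[1, 5, 10, 14]] = True

print(hamming_distance(a, b))  # differ at positions 9, 13, 10, 14 -> 4
```

Because every differing bit counts the same, two SDRs with the same overlap always get the same distance, which is what “if you count each bit as equal” refers to.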

Have you looked at any alternatives to the Hamming distance?

Hi @mraptor ! I read your bbHTM write-up. Very nice. Well done!

There’s one point, however, that struck me as potentially oversimplified: to avoid the memory-consumption problem you reduce the permanence from a scalar to a binary value, so a synapse is either “ON” or “OFF”. That is in contrast to the theory/biology, where a permanence represents the stages of growth of a synapse and its value is incremented and decremented using a Hebbian-like rule. If you reduce the permanence to binary, then there are no stronger and weaker connections (synapses); all are the same. For humans, however, repetition is obviously key to learning, so there must be some kind of strengthening and weakening of connections/synapses.

So this simplification seems to me to render the whole algorithm ineffective, and we probably need to come up with some other way of addressing the memory issue.

Second, you mention that your tests indicate that your HTM “works”. As I’m coding HTM myself, I’m wondering what methods and data you use for testing, and what result indicates that the code “works”.

Thanks.

Matt

You are right, it is an ON/OFF thing … my speculation/simplification on top of Numenta’s ideas is that I do the stepwise Hebbian learning at the “whole dendrite group” level rather than per single synapse, if I can call it that :wink:
In the union-SDR case I can only learn; unlearning I hoped to do randomly (or figure out something else later). The union-SDR works because the transitions are self-selective, one-step, and not that many.

But doing the Spatial Mapper showed me I can use the Hamming distance (switching bits as they drift towards the data) to do “Hebbian group” learning and unlearning. (I will try in the coming days whether this HD idea works.)

As for testing, the only thing I test for now is how well it follows the source signal with one-step-ahead prediction. (If you look at the images, I compute MAE, RMSE, MAPE, NLL and R2 scores; see the stats.py module for details.)
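For anyone unfamiliar with these scores, a sketch of how MAE/RMSE scoring of one-step-ahead predictions can be computed (this is illustrative, not the actual stats.py code; the function name is made up):

```python
import numpy as np

def one_step_scores(signal, predictions):
    """Compare predictions[t] (made at step t-1) against signal[t]."""
    err = np.asarray(signal) - np.asarray(predictions)
    mae = np.mean(np.abs(err))          # mean absolute error
    rmse = np.sqrt(np.mean(err ** 2))   # root mean squared error
    return mae, rmse

signal = [10.0, 12.0, 11.0, 13.0]
preds  = [ 9.0, 12.5, 11.0, 14.0]
mae, rmse = one_step_scores(signal, preds)
print(mae, rmse)  # 0.625 0.75
```

The other scores mentioned (MAPE, NLL, R2) follow the same pattern: compare the predicted value at each step against the value that actually arrived.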

I would love it if somebody explained how and what they test. How do you test?

Once I get more things done I want to figure out how to do multi-step predictions. How do you do it?

Matt, mraptor,

I think you are both making a mistake in your conception of what the TM does, and should do, on this weights issue which Matt has raised. You are transferring the thinking habits of Deep Learning, Hebbian rules, etc., where abstraction is essential to the whole algorithm and partial connection weights are necessary to capture abstraction.

But the TM is not an abstraction. It is a memory. It simply states if it has seen a sequence or not.

The implementation of “permanence” in Numenta’s conception is a partial remembering. It is not the same thing as NN connection weights, which are all about abstractions.

Partial remembering is not crucial to a transition memory application. If we had a big enough memory we could remember everything without a loss of performance. This is different to a NN connection weight, where the weight is essential to express abstractions, which are always partial.

I think HTM specifically excludes synapse weights. That’s why they call it permanence. Perhaps because Jeff observed in the biology that this was the case(?) And I believe the biology is directing us correctly on this point. It is one of the places where HTM is superior to NNs and DL. There are theoretical reasons why I believe TM should be on or off, and not partial. (You can’t have partially seen something. It is essential to the memory idea that it be on or off.)

As far as testing goes, if it remembers a sequence, it is working.

There is no implementation of generalization yet. Unless it is pushed into the SDR by the spatial pooler. Then you would just be testing how well the spatial pooler generalizes a state, and thus generalizes the predictions of that state.

To test this, I think @mrcslws did some nice tests on Hot Gym data (which showed some simple stats did better than the spatial pooler for that task.)

Hi @robf

I don’t understand how a binary permanence would suffice.

Example: Say you feed the sequence “a b” 10 times, then you feed the sequence “a c” 2 times. If you then feed “a” what should the prediction be?

If the permanence is scalar, the prediction would be “b”, because the “a->b” connection/synapse is stronger. If the permanence is binary, predictions for “b” and “c” would be similarly probable, because both connections/synapses are “ON”.
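The contrast can be sketched in a few lines (a toy illustration of the two policies, not how NuPIC or bbHTM actually store connections):

```python
from collections import Counter

transitions = ["ab"] * 10 + ["ac"] * 2   # the sequences seen

# Scalar "permanence": count-based strengths, predict the strongest successor
strength = Counter(t[1] for t in transitions if t[0] == "a")
scalar_prediction = strength.most_common(1)[0][0]   # 'b' wins, 10 vs 2

# Binary "permanence": a successor is either remembered or not
binary_prediction = sorted(set(t[1] for t in transitions if t[0] == "a"))

print(scalar_prediction, binary_prediction)  # b ['b', 'c']
```

So the scalar version answers “b”, while the binary version can only say “b or c, both have been seen”, which is exactly the point under debate here.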

Please enlighten me if I’m missing something :slight_smile:

Matt

b or c :slight_smile: (@rhyolight how do you get rid of those goddam kindergarten emoticons? I googled it and only found the Discourse developers talking about how much people hate them, but no solution.)

Is that the way it works? I don’t think it should. I think, I hope, you are wrong that permanences compete.

Or maybe it is OK if they do. It would just be the implementation of a simple probabilistic prediction about the likelihood of states.

I don’t know; if they do that over all the columns of an SDR, it might create interesting effects. A mixture of the probability of a sequence of columns, and the spatial pooler overlap of b and c (and eventually other symbols with overlapping columns.)

But generalizing using probabilities is ugly. I hope they haven’t sunk to that.

If they have just a pure transition memory that will be much better.

A truly distributed generalization can be in the SDR. Or it can be across connections in the TM as I believe it should be. I certainly don’t think it should be simple probability, like an old fashioned Markov model.

Wouldn’t that mean you would simply never see a black swan?

So I’m asserting that is the way it should be. The memory should be that both sequences are possible. Then you can generalize about them according to overlaps in the SDR’s, or looking at the network connectivity (what I want to do: generalizing over groupings of nodes which are richly connected.)

I’m hoping Numenta settled on that. That would be why they are called permanences. Otherwise they would have called them likelihoods. Surely?

Even if they do have them competing probabilistically, we should not. It’s not biological, and it is not useful (except most of the time :slight_smile: (Ugh, that cartoon thing again. You can’t even use it to close parentheses.)

Fixed in all future posts. :slight_smile:

Hi @robf

As far as I understand HTM and NuPIC, the permanence is definitely scalar, not binary. Here’s an extract from Jeff’s white paper [1]:

We assign each potential synapse a scalar value called “permanence” which represents stages of growth of the synapse. A permanence value close to zero represents an axon and dendrite with the potential to form a synapse but that have not commenced growing one. A 1.0 permanence value represents an axon and dendrite with a large fully formed synapse. … The permanence value is incremented and decremented using a Hebbian-like rule. If the permanence value exceeds a threshold, such as 0.3, then the weight of the synapse is 1, if the permanence value is at or below the threshold then the weight of the synapse is 0.

So, yes, in the end the synapse is either ON/1 or OFF/0, but that switch depends on reaching a defined threshold. And for that we need to keep track of the scalar value, which causes the memory-consumption issue that @mraptor alludes to.
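A minimal sketch of the quoted rule. Only the 0.3 threshold comes from the paper; the increment/decrement amounts are placeholders standing in for `synPermActiveInc`/`synPermInactiveDec`:

```python
def update_permanence(perm, active, inc=0.05, dec=0.01, threshold=0.3):
    """Hebbian-like rule: increment the permanence of an active synapse,
    decrement an inactive one; the binary weight is perm > threshold."""
    perm = min(1.0, perm + inc) if active else max(0.0, perm - dec)
    weight = 1 if perm > threshold else 0
    return perm, weight

perm = 0.28                                      # potential, not yet connected
perm, w = update_permanence(perm, active=True)   # crosses 0.3 -> connected
print(perm, w)
```

Note how the scalar value has to be stored per potential synapse even while the weight is still 0, which is precisely where the memory cost comes from.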

Here are the corresponding parameters in the current NUPIC code [2]:

      @param synPermInactiveDec The amount by which the permanence of an
            inactive synapse is decremented in each learning step.
      @param synPermActiveInc The amount by which the permanence of an
            active synapse is incremented in each round.
      @param synPermConnected The default connected threshold. Any synapse
            whose permanence value is above the connected threshold is
            a "connected synapse", meaning it can contribute to
            the cell's firing.

[1] Why Neurons Have Thousands of Synapses, A Theory of Sequence Memory in Neocortex, p.6, http://arxiv.org/pdf/1511.00083.pdf
[2] https://github.com/numenta/nupic.core/blob/master/src/nupic/algorithms/SpatialPooler.hpp


I agree Matt. My only argument is about the function of that scalar value.

My understanding is that it is a memory control mechanism (ironically); otherwise you will remember everything you see. It does not affect the function of the TM. Binary activations should not change the TM’s function, only its sensitivity to new memories.

Right. In the end it is either on or off. So in the end it is the same thing.

Having scalar permanence only stops the system reacting to every transient circumstance. As if you remembered the exact cloud pattern when you stepped out the door this morning. Not useful.

But as a test bed it is not going to matter. The system will work the same, just remember more quickly.

That’s the way I think it should work, anyway.

Hi @mraptor

I took a second look at your matrices for modelling the distal synapses in the TM. In your example of a region with 3 columns and 2 cells per column you use a 6x6 matrix for full connectivity. That means you also model

  • cells linking to themselves as well as
  • cells linking to other cells in the same column.

I’d think that both cases should be excluded from the range of potential synapses since by definition distal synapses process horizontal input (i.e. coming from other columns).
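The exclusion Matt describes can be expressed as a boolean mask over the 6x6 matrix (a sketch of the idea; bbHTM’s actual representation may differ):

```python
import numpy as np

n_cols, cells_per_col = 3, 2
n = n_cols * cells_per_col   # 6 cells -> 6x6 full-connectivity matrix

# Column index of each cell (cells 0,1 in col 0; cells 2,3 in col 1; ...)
col_of = np.arange(n) // cells_per_col

# Allowed distal connections: different column only.
# This also excludes self-loops, since a cell shares its own column.
mask = col_of[:, None] != col_of[None, :]

print(mask.sum())  # 24 of 36 entries remain as potential distal synapses
```

Each column contributes a 2x2 same-column block (including the diagonal), so 3 × 4 = 12 of the 36 entries are masked out.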

Yes, that is true :slight_smile: That is why I have the option “loopy_cols”, which cleans those connections before processing, if you wish.
But generally I don’t use it. I don’t have the time to do detailed testing yet, but the option is there if you want more brain-like behaviour.

For the limited tests I’ve done, it makes no noticeable change. If you switch your thinking cap from neurons & synapses to Hamming space & SDR unions, I think you can ignore it.
My idea with the whole project is to go for generalization and abstraction until it breaks, and then work backwards, adding the complications of more believable brain constraints later.
Every complication slows the code, and Python is slow enough already :slight_smile:

Hope that makes sense.

Yes, makes sense. :+1: I’m kind of doing the same, using C. I’m currently looking at the model design and its performance implications. Since I haven’t run your HTM, could you share some performance statistics, e.g.

  • how many columns/dendrites/synapses max do you use?
  • how large (memory consumption) is your network then?
  • how long (processing time) does it take to process a certain data set (e.g. NYC-Taxi)?

HTM is quite memory and computation (looping) intensive, so I’m wondering whether a Python implementation can be large and fast enough for meaningful tests with larger datasets.

Normally a region of 5x300 works fine … it processes 1000 points of NYC-Taxi in 30-40 secs on my slow 5Y71 CPU. Process memory is about 200 MB, but this is mostly Python; the binary matrix itself is small (1500x1500/8 bytes), i.e. adding more TMs won’t consume much more memory. It is primarily CPU intensive.

 : dt.tests['5x300'].tm.memory.mem.kbytes
 : 274.658203125
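As a quick sanity check, the number above is exactly what a bit-packed 1500x1500 matrix comes to (plain arithmetic, not bbHTM code):

```python
bits = 1500 * 1500          # one fully connected binary TM matrix
kbytes = bits / 8 / 1024    # packed at one bit per synapse
print(kbytes)               # 274.658203125, matching the figure above
```

That is roughly 275 KB per TM, which is why adding more TMs barely moves the memory footprint; the 200 MB is interpreter overhead.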

And by the way, I don’t swarm, I just run it; there are not many options to tune yet, which will probably change when I add more modules.
(Still running on 1 CPU; looking to do multi-process when I start running several of them. It sucks that there is no good Actor-like multiprocessing framework for Python. I even toyed with the idea of re-implementing the whole thing on the Erlang/Elixir VM, but then I would have to figure out how to adapt some C/C++ bitarray lib, arghh.)

BTW, 5x300 works because a TM row (dendrites) is then a 1500-bit-wide union, which is the lower bound of a workable SDR size. I mean the input is 300 bits, so a TM bitarray row is 1500 bits; 5x400 gives a 2000-bit SDR.

BTW, I’m not using Cython yet, which will probably speed things up a lot, and my current code is probably ripe for optimization at some point, once I’m sure the whole thing works.


I would love it if somebody who runs NuPIC provided similar statistics, to know what to strive for! Anyone?

I have now figured out a mechanism to do slow learning and unlearning … it will be available in the next iteration of the software. It is still an on/off synapse thing, but it is partial when looking at all the dendrites as a whole :wink:

Keep in mind that NuPIC contains both python and C++ implementations of all HTM algorithms. We use C++ when we need speed.

Yes, Matt. I was referring to @mraptor’s implementation.

As for NuPIC, I’m wondering what part of the algorithm is actually done in C++.

Also, when I look at the code of the TM implementation, I don’t see where it leverages C++. Or did I miss something?

@mraptor

Normally a region of 5x300 works fine … it processes 1000 points of NYC-Taxi in 30-40 secs on my slow 5Y71 CPU. Process memory is about 200 MB, but this is mostly Python; the binary matrix itself is small (1500x1500/8 bytes), i.e. adding more TMs won’t consume much more memory. It is primarily CPU intensive.

Thanks for the info. As soon as mine is up and running I can share some similar numbers.