How Do Neurons Operate On Sparse Distributed Representations? A Mathematical Theory Of Sparsity, Neurons And Active Dendrites

By @subutai & @jhawkins

Please discuss this paper below.

Has this been published in the meantime? It’s been almost 2 years since the first preprint.

Not yet :frowning:

I have been working on handling reviewer feedback, which included some new derivations and running new simulations. It has been low priority (the Columns paper and associated work are higher priority), but I am slowly making progress. The main conclusions haven’t changed, though.

@subutai,

I am troubled by the absence of any consideration of “ambiguous input”, i.e. the overlap of multiple patterns, in this and the related spatial pooler and sequence memory papers. Only “single pattern” inputs seem to be considered, and the question is only whether the system under investigation (single neuron, spatial pooler or sequence memory) produces the correct output for that pattern. As per these papers, the single pattern presented on input may have noise, defects, etc., but a single pattern with noise or defects is still a single pattern.

What about an ambiguous situation where, let’s say, the vision system sees something that could be the letter “B” or the number “8” according to the features, and only mechanisms higher up the processing chain, e.g. the sequence memory or even higher cortical regions, have the knowledge to disambiguate between the two? Presumably the “B-or-8” pattern would be represented as a superposition of the binary vectors representing “B” and “8”, the way sequence memory holds superpositions of possible next steps (the A-B-C vs. A-B-D example). Do single neurons and the spatial pooler as a whole preserve this superposition so that it can be disambiguated by the sequence memory?
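
To make concrete what I mean by superposition, I’m imagining simply OR-ing the binary vectors, something like this toy sketch (the bit encodings are made up, purely for illustration):

```python
# Made-up feature-bit indices for "B" and "8" (purely illustrative encodings).
sdr_B = {2, 9, 17, 25, 33, 41}
sdr_8 = {2, 9, 17, 25, 40, 50}

# The ambiguous "B-or-8" input as a superposition (union) of the two patterns.
sdr_B_or_8 = sdr_B | sdr_8
print(len(sdr_B_or_8))   # 8 ON bits: denser than either pattern alone (6 each)
```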

Thanks

– Rik

Hi Rik,

Yes, this superposition of patterns is quite an important aspect of SDRs and HTM. The example you give of multiple predictions is exactly right - the TM can predict a superposition of next steps. It also shows up in pooling, where a single dendritic segment may represent many different patterns. In this case the synapses represent a superposition of all these patterns.

Many of the basic properties of SDRs are preserved under unions, but it becomes easier to get mix-and-match errors (spurious matches assembled from pieces of different stored patterns), particularly at low dimensionality.

We refer to superposition in our papers as the “union” property. In [1] it’s in Section 3.3, from the viewpoint of a segment that has learned a bunch of patterns. The same equations apply if you have a superposition of multiple input patterns. In [2] it’s discussed in Section 2G & H. It’s similar to the way Bloom filters work.

The TM can preserve this property (even for pretty large unions) and carry multiple predictions forward in time, but the SP doesn’t always preserve it if you have more than a few input patterns superimposed.
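
For anyone following along, here is a minimal sketch of that union property in plain Python (the parameter values and helper name are mine, chosen only for illustration): a segment whose synapses are the union of several patterns still matches each of them against a threshold, while an unrelated random pattern rarely does, which is the Bloom-filter-like behaviour mentioned above.

```python
import random

def random_sdr(n, w):
    """A random SDR: w active bit positions out of n (e.g. n=2048, w=40)."""
    return set(random.sample(range(n), w))

n, w, theta = 2048, 40, 15    # dimensionality, active bits per pattern, match threshold

patterns = [random_sdr(n, w) for _ in range(5)]
segment = set().union(*patterns)   # synapses = union (superposition) of all 5 patterns

# Each stored pattern still matches the segment (its overlap is the full w >= theta)...
print(all(len(p & segment) >= theta for p in patterns))   # True

# ...while a random unrelated pattern almost never reaches the threshold, though the
# chance of such a false match grows as more patterns are added to the union.
print(len(random_sdr(n, w) & segment) >= theta)           # almost always False
```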

Thanks,

–Subutai

Hi @subutai,

Thanks, which paper does “[2]” refer to? I can’t find a Section 2G & H in either the spatial pooler or temporal memory paper.

In any case, regarding “SP doesn’t always preserve it if you have more than a few input patterns superimposed”: I’d like to see this experimentally verified and quantified. I’ll look into doing this at some point. Surely this ability to preserve superpositions depends on the model parameters chosen. Here again we need a quantitative model (which will materialize one day) that lets us compute that capacity directly from the chosen parameters.

This is in the context of broadening the HTM model beyond brains into machines and other territory. E.g. machine intelligence applications might need to detect relevant patterns in a thick soup of plausible but ultimately irrelevant patterns. Think of following a conversation at a cocktail party where all sorts of other conversations are going on.

Regards

– Rik

Sorry - forgot the references. I was referring to this one on arXiv:

Ahmad, S. & Hawkins, J. (2015) Properties of Sparse Distributed Representations and their Application to Hierarchical Temporal Memory.

I agree - it would be interesting to see it quantified. I think the main reason is that the SP is a competitive process that selects the best matching columns. If you are only selecting the top 40 out of 2048 (for example) you don’t have room for including too many superpositions. (You’d like to allocate at least 15-20 bits per pattern.) Yes, you may be able to improve this by changing some of these numbers.
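
A rough back-of-the-envelope version of that argument, using the numbers above (the variable names are mine and the values are only illustrative):

```python
# Rough union-capacity estimate for the SP output (illustrative numbers from above:
# 2048 columns with the top 40 active, i.e. ~2% sparsity).
num_active = 40          # winning columns per input
bits_per_pattern = 20    # roughly the minimum you'd want to keep each pattern recognizable

print(num_active // bits_per_pattern)  # -> 2: not much room for superpositions
```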

Could somebody please elucidate equation (5) for me?

The paper says (page 5):

Eq. (4) computes the probability of a false positive, but what about false negatives? If a pattern corresponding to one stored on a segment is corrupted with noise, it is possible that fewer than $\theta$ synapses will overlap. Assume a dendritic segment $D$ represents a subsample of some presynaptic activity pattern $A_t$ using $s$ synapses. Let $A^*_t$ represent a corrupted version of $A_t$ such that $v$ of the ON bits are now off. If $v$ is sufficiently small, i.e. $v \leq s - \theta$, the probability of a false negative is 0. As $v$ increases the probability of a false negative, i.e. $A^*_t \cdot D < \theta$, increases. We can compute the probability of such false negatives in a similar manner as above, by using overlap sets. The number of vectors $A^*_t$ that have exactly $b$ bits of overlap with $D$ is:

$$\lvert \Omega_D(a_t, v, b) \rvert = \binom{s}{b} \times \binom{a_t - s}{v - b} \qquad (5)$$

I can’t see why that equation should hold. For instance, if $a_t = 1000$, $s = 20$, $v = 1$, and $b = 20$, is the number of vectors $A^*_t$ with exactly 20 bits of overlap with $D$ equal to $\binom{20}{20} \times \binom{980}{-19}$?
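
In case it’s useful, here is the small brute-force counter I’d use to sanity-check the closed form on toy parameters (the function name and example numbers are mine; it just enumerates all ways of dropping $v$ bits and tallies the resulting overlaps):

```python
from itertools import combinations

def count_overlap_vectors(a_t, s, v, b):
    """Brute-force count of |Omega_D(a_t, v, b)|: corrupted vectors A*_t
    (A_t with v of its ON bits turned off) whose overlap with the segment's
    s synapses is exactly b.  Only feasible for toy parameters, since it
    enumerates all C(a_t, v) choices of dropped bits."""
    synapses = set(range(s))      # WLOG the segment samples the first s ON bits of A_t
    count = 0
    for dropped in combinations(range(a_t), v):
        if len(synapses - set(dropped)) == b:   # synapses still ON in A*_t
            count += 1
    return count

# Compare against Eq. (5) for small a_t, s, v and each possible overlap b.
print([count_overlap_vectors(30, 5, 2, b) for b in range(6)])
```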

edit: change the equation number at the top of post