Help explain the probability of a false positive classification of a list of SDR vectors

hsm207 · November 4, 2018, 8:49am

Hello,

I don’t understand equation 9 in the SDR chapter of the BAMI book.

My expectation of what equation 9 is supposed to calculate is that if for example we have a list of SDRs, X, with only two elements x1 and x2, then equation 9 is supposed to calculate the probability that a random SDR y will match any of the SDRs in X.

This means that we should calculate:

Probability false positive between x1 and y + Probability false positive between x2 and y

But from the way equation 9 is written, it looks to me that it is calculating:

Probability of no false positive between x1 and y + Probability of no false positive between x2 and y

Can someone please explain the intuition behind equation 9 and illustrate how it works using a small example?

rhyolight · November 5, 2018, 4:42pm

The formula for probability of a false positive is basically its overlap set divided by its uniqueness. You can see this clearly in the HTM School code that runs behind this episode.

htm-community/htm-school-viz/blob/master/static/js/ep2/matching.js#L26


var rightColor = "green";
var size = 40;




function sdrsMatch() {
    return SDR.tools.population(SDR.tools.overlap(sdr, noisySdr)) >= theta;
}


function updateDisplayValues() {
    var overlapSet = SDR.tools.getOverlapSet(sdr, theta, w);
    var falsePositiveChance = overlapSet / SDR.tools.getUniqueness(sdr);
    $('#false-positive-display').html(falsePositiveChance);
    $('#sparsity-display').html();
    $('#noise-display').html(noise);
    $('#theta-display').html(theta);
    if (sdrsMatch()) {
        $match.html('MATCH').removeClass('bg-danger').addClass('bg-success');
    } else {
        $match.html('NOPE').removeClass('bg-success').addClass('bg-danger');
    }
}

You can follow this down through the code to get a specific understanding. Here is how SDR uniqueness is calculated:

htm-community/htm-school-viz/blob/master/static/js/lib/sdrs/tools.js#L14-L23


function overflowSafeUniqueness(n, w) {
    var bigN = math.bignumber(n);
    var bigW = math.bignumber(w);


    var nf = math.factorial(bigN);
    var wf = math.factorial(bigW);
    var nwf = math.factorial(math.subtract(bigN, bigW));


    return math.divide(nf, math.multiply(wf, nwf));
}

And here is how we use it to get the overlap set:

htm-community/htm-school-viz/blob/master/static/js/lib/sdrs/tools.js#L175-L187


getOverlapSet: function(sdr, b, w) {
    var n = sdr.length;
    var wx = this.population(sdr);
    return this._getOverlapSet(n, wx, b, w);
},


_getOverlapSet: function(n, wx, b, w) {
    var term1 = this._getUniqueness(wx, b);
    var n2 = n - wx;
    var w2 = w - b;
    var term2 = this._getUniqueness(n2, w2);
    return math.multiply(term1, term2);
},
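To make the counting above concrete, here is a minimal Python sketch (not the HTM School code itself). It assumes Python 3.8+ for `math.comb`; the function names mirror the JavaScript but are hypothetical, and `false_positive` sums the overlap set over every overlap `b` from `theta` up to `w`, which is one reading of "at least theta bits match". The tiny parameters (n = 8, w = 3, theta = 2) are chosen only so that a brute-force check is feasible.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def uniqueness(n, w):
    """Number of distinct SDRs with w active bits out of n."""
    return comb(n, w)


def overlap_set(n, wx, b, w):
    """Number of SDRs with w active bits that share exactly b bits
    with a fixed SDR that has wx active bits."""
    return comb(wx, b) * comb(n - wx, w - b)


def false_positive(n, w, theta):
    """Chance that a random SDR shares at least theta bits with a fixed one
    (assumption: sum over all overlaps b >= theta)."""
    total = sum(overlap_set(n, w, b, w) for b in range(theta, w + 1))
    return Fraction(total, uniqueness(n, w))


# Brute-force check on a tiny example: enumerate every possible y.
n, w, theta = 8, 3, 2
x = set(range(w))  # a fixed SDR with bits 0..w-1 active
matches = sum(1 for y in combinations(range(n), w) if len(x & set(y)) >= theta)
brute = Fraction(matches, comb(n, w))

print(false_positive(n, w, theta), brute)  # prints: 2/7 2/7
```

Both values come out as 2/7, because 16 of the 56 possible SDRs share at least 2 bits with x. Note that the JavaScript above passes only `b = theta`, so it counts the exact-theta term alone; when n is much larger than w, that term dominates the sum.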

helena_Thielen · December 22, 2019, 4:53pm

Hey everybody,

I had the same issue, and in my opinion the formula is not correct.
I also tried to reproduce the example below, but without success.
So please let me explain:

We want the following probability:
P(y matches at least one of the x_i) = 1-P(y matches none of the x_i)

Now let's look at a single probability (as shown in formula (4)):
P(y matches x) = fp(\theta)
(fp(\theta) means the probability of a false positive)

According to that
P(y does not match x) = 1 - fp(\theta)

Assuming the matches are independent (I think we can), it follows that:
P(y matches none of the x_i) = [1 - fp(\theta)]^M

So in total we should have:
P(y matches at least one of the x_i) = 1-P(y matches none of the x_i)
= 1 - [1 - fp(\theta)]^M
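
A small brute-force check of this derivation, as a sketch rather than anything from the book: it assumes the M SDRs in the list are drawn independently and uniformly at random (with replacement), and uses tiny hypothetical parameters (n = 8, w = 3, theta = 2, M = 2) so that every combination can be enumerated.

```python
from fractions import Fraction
from itertools import combinations, product

n, w, theta, M = 8, 3, 2, 2  # tiny illustrative values, not from the book

# Every SDR with w active bits out of n, as a set of active indices.
sdrs = [frozenset(c) for c in combinations(range(n), w)]
N = len(sdrs)

# matches[i][j] is True when SDR i and SDR j share at least theta bits.
matches = [[len(a & b) >= theta for b in sdrs] for a in sdrs]

# Single-pair match probability; identical for every row by symmetry.
fp = Fraction(sum(matches[0]), N)

# Assumption: the list X holds M SDRs drawn independently and uniformly.
# Count every (y, X) combination where y matches at least one x in X.
hits = 0
for y in range(N):
    for xs in product(range(N), repeat=M):
        if any(matches[x][y] for x in xs):
            hits += 1
brute = Fraction(hits, N ** (M + 1))

print(brute)              # enumeration:            24/49
print(1 - (1 - fp) ** M)  # 1 - [1 - fp]^M:         24/49
print(M * fp)             # plain sum of the fps:   4/7
print(1 - fp ** M)        # 1 - fp^M:               45/49
```

The enumeration agrees exactly with 1 - [1 - fp]^M. The plain sum M * fp (4/7 = 28/49) is slightly too large because it double-counts the cases where y matches both list entries, and 1 - fp^M is the chance that y fails to match at least one entry, which is a different event.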

I hope there is no mistake.
As I said, I wanted to reproduce the results (with the result of fp_x(\theta) = 10^20).
So I wrote some simple code, and it supported my calculations.

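import scipy.special

n = 1024  # SDR size
w = 21    # number of active bits
T = 14    # match threshold (theta)
M = 10    # number of SDRs in the list

def capacity(n, w):
    # number of distinct SDRs with w active bits out of n
    return scipy.special.binom(n, w)

def overlap_set(n, w, T):
    # number of SDRs that overlap a given SDR in exactly T bits
    return scipy.special.binom(w, T) * scipy.special.binom(n - w, w - T)

def false_positive(n, w, T):
    # chance that a random SDR matches a single given SDR
    return overlap_set(n, w, T) / capacity(n, w)

def false_positive_set(n, w, T, M):
    # formula (9) as I read it
    return 1 - false_positive(n, w, T) ** M

def false_positive_set_alternative(n, w, T, M):
    # my version: 1 - P(y matches none of the x_i)
    return 1 - (1 - false_positive(n, w, T)) ** M

print(false_positive_set(n, w, T, M))
print(false_positive_set_alternative(n, w, T, M))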
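
One caveat when running this with floats, sketched here under the same n, w, T and M: the single-pair probability is so small that `1 - fp` rounds to exactly 1.0 in double precision, so the alternative formula prints 0.0 even though the true value is a tiny positive number. The variant below is only a suggestion; it assumes Python 3.8+ for `math.comb` and keeps the same single-term overlap count as `overlap_set(n, w, T)` above.

```python
import math
from fractions import Fraction

n, w, T, M = 1024, 21, 14, 10

# Exact single-pair probability using integer binomials
# (same single-term overlap count as in the post, an assumption kept as-is).
fp = Fraction(math.comb(w, T) * math.comb(n - w, w - T), math.comb(n, w))
fp_float = float(fp)

# Naive float evaluation: 1 - fp_float rounds to 1.0, so this is 0.0.
naive = 1 - (1 - fp_float) ** M

# Numerically stable form of 1 - (1 - fp)^M for very small fp.
stable = -math.expm1(M * math.log1p(-fp_float))

# Exact rational arithmetic, converted to float only at the end.
exact = float(1 - (1 - fp) ** M)

print(fp_float, naive, stable, exact)
```

The `expm1`/`log1p` form and the exact fraction agree with each other and are very close to M * fp, which is what you would expect when fp is tiny.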

Please correct me if I am wrong, but I think you can even see this in the (wrong) formula itself:
If the probability of a false positive is super small and you raise it to the power M, the number gets even smaller.
So we end up with a super small number, close to zero, and then subtract it from 1. So the result of formula (9) has to be close to 1.

I hope I could express myself clearly. Please correct me if I am wrong,
or give me some other feedback on this.
Thank you all very much (and merry Christmas)
