Help explain the probabilty of a false positive classification of a list of SDR vectors

Hello,

I don’t understand equation 9 in SDR chapter of the BAMI book:

My expectation of what equation 9 is supposed to calculate is that if for example we have a list of SDRs, X, with only two elements x1 and x2, then equation 9 is supposed to calculate the probability that a random SDR y will match any of the SDRs in X.

This means that we should calculate:

Probability false positive between x1 and y + Probability false positive between x2 and y

But from the way equation 9 is written, it looks to me that it is calculating:

Probability of no false positive between x1 and y + Probability of no false positive between x2 and y

Can someone please explain the intuition behind equation 9 and illustrate how it works using a small example?

The formula for probability of a false positive is basically its overlap set divided by its uniqueness. You can see this clearly in the HTM School code that runs behind this episode.

You can follow this down through the code to get a specific understanding. Here is how SDR uniqueness is calculated:

And here is how we use it to get the overlap set:

2 Likes

Hey everybody,

I had the same issue… and in my oppinion the formular is not correct.
I also tried to reproduce the example down below, but without success.
Soop please let my explain myself:

We want the following probability:
P(y matches at least one of the x_i) = 1-P(y matches none of the x_i)

Now lets check a single probability (as shown in formular (4):
P(y matches x) = fp(\theta)
(fp(\theta) means the probabitity for a false positive)

According to that
P(y does not match x) = 1 - fp(\theta)

(I think we can assume that the matchings are independent) it follows:
P(y matche none of the x_i) = [1 - fp(\theta)]^M

Sooo in total we should have:
P(y matches at least one of the x_i) = 1-P(y matches none of the x_i)
= 1 - [1 - fp(\theta)]^M

I hope there is no mistake.
Well How I said I wanted to reproduce the results (with the result of fp_x(\theta) = 10^20).
Soo I wrote a simple code. And this code supported my caculations.

import scipy.special

n = 1024
w = 21
T = 14
M = 10

def false_positive(n, w, T):
    return overlap_set(n, w, T) / capacity(n, w)

def capacity(n, w):
    return scipy.special.binom(n, w)

def overlap_set(n, w, T):
    return scipy.special.binom(w, T) * scipy.special.binom(n - w, w - T)

def false_positive_set(n, w, T, M):
    return 1 - (false_positive(n, w, T))** M

def false_positive_set_alternative(n, w, T, M):
    return 1 - (1 - false_positive(n, w, T))** M

print(false_positive_set(n, w, T, M))
print(false_positive_set_alternative(n,w,T,M))

Please, correct me if I am wrong. But I think you can even see that in the (wrong) formular itself:
If the prob. for false positives is super small and the calculate ^M the number gets even smaller.
And finally we have a super small number, close to zero, and then subtracting that from 1. So the result of formular (9) has to be close to 1.

Well, I hope I could express myself. Please correct me if I am wrong.
Or give me some other feedback to that.
Thank you all very much (and merry christmas :blush: :santa:)

1 Like