Hey everybody,
I had the same issue… and in my oppinion the formular is not correct.
I also tried to reproduce the example down below, but without success.
Soop please let my explain myself:
We want the following probability:
P(y matches at least one of the x_i) = 1-P(y matches none of the x_i)
Now lets check a single probability (as shown in formular (4):
P(y matches x) = fp(\theta)
(fp(\theta) means the probabitity for a false positive)
According to that
P(y does not match x) = 1 - fp(\theta)
(I think we can assume that the matchings are independent) it follows:
P(y matche none of the x_i) = [1 - fp(\theta)]^M
Sooo in total we should have:
P(y matches at least one of the x_i) = 1-P(y matches none of the x_i)
= 1 - [1 - fp(\theta)]^M
I hope there is no mistake.
Well How I said I wanted to reproduce the results (with the result of fp_x(\theta) = 10^20).
Soo I wrote a simple code. And this code supported my caculations.
import scipy.special
n = 1024
w = 21
T = 14
M = 10
def false_positive(n, w, T):
return overlap_set(n, w, T) / capacity(n, w)
def capacity(n, w):
return scipy.special.binom(n, w)
def overlap_set(n, w, T):
return scipy.special.binom(w, T) * scipy.special.binom(n - w, w - T)
def false_positive_set(n, w, T, M):
return 1 - (false_positive(n, w, T))** M
def false_positive_set_alternative(n, w, T, M):
return 1 - (1 - false_positive(n, w, T))** M
print(false_positive_set(n, w, T, M))
print(false_positive_set_alternative(n,w,T,M))
Please, correct me if I am wrong. But I think you can even see that in the (wrong) formular itself:
If the prob. for false positives is super small and the calculate ^M
the number gets even smaller.
And finally we have a super small number, close to zero, and then subtracting that from 1. So the result of formular (9) has to be close to 1.
Well, I hope I could express myself. Please correct me if I am wrong.
Or give me some other feedback to that.
Thank you all very much (and merry christmas
)