For each minicolumn in the SP, I’m randomly assigning it to cover 85% of the input space. So if the input space is 100 bits, I’d randomly assign 85 bits to each column. It might look like:
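A quick sketch of that assignment (in Python rather than R; the names `INPUT_SIZE`, `POTENTIAL_PCT`, and `NUM_COLUMNS` are just illustrative):

```python
import random

INPUT_SIZE = 100       # total input bits
POTENTIAL_PCT = 0.85   # fraction of the input each column can see
NUM_COLUMNS = 4        # small number of columns for illustration

# For each minicolumn, sample 85 of the 100 input bit indexes
# without replacement to form its potential pool.
potential_pools = [
    sorted(random.sample(range(INPUT_SIZE), int(POTENTIAL_PCT * INPUT_SIZE)))
    for _ in range(NUM_COLUMNS)
]

print(len(potential_pools[0]))  # 85 bits per column
```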

Now each bit gets a connection strength - a probability(?) based on (in the BHTMS video) the Bates distribution. R doesn’t have a Bates distribution package, so for the moment I’m using a normalised set of random values from a Gaussian distribution. I trust this will give a similar result, but I lose the spread parameter. I’m not sure what spread does in the scheme of things or how I can recreate it.

A sample (set at 0.5 for the mean) could look like:
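A sketch of generating such a sample (the `SD` value is my guess at a reasonable spread; clamping to [0, 1] is a stand-in for a proper Bates draw):

```python
import random

MEAN = 0.5    # center the permanences on the connection threshold
SD = 0.1      # spread; smaller values concentrate perms near the mean
N_BITS = 85   # one permanence per bit in the column's potential pool

# Draw Gaussian values and clamp them to [0, 1] so they stay
# valid permanences.
perms = [min(1.0, max(0.0, random.gauss(MEAN, SD))) for _ in range(N_BITS)]
```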

I’d recommend you check out the source code for SP:

Specifically this part where the SP columns perms are initialized:

for columnIndex in xrange(numColumns):
    potential = self._mapPotential(columnIndex)
    self._potentialPools.replace(columnIndex, potential.nonzero()[0])
    perm = self._initPermanence(potential, initConnectedPct)
    self._updatePermanencesForColumn(perm, columnIndex, raisePerm=True)

It looks like all permanence values for each SP column are uniform, so I think your use of normally distributed values is a departure from this. What I’m not totally sure about is what the perms are initialized to. There is a hard-coded parameter in the constructor of the ‘SpatialPooler’ class called ‘initConnectedPct’, though I’m not sure whether it sets the initial values for all perms. Maybe you could verify, @rhyolight or @subutai?

Yes, I diverged from the NuPIC code to establish the initial connections in a normal distribution around a center point. The most important thing is that perms are initially close to the connection threshold. The normal distribution is not necessary, but it makes more sense to me. I think @alavin suggested it to me years ago.

Thanks Matt. Has any experimentation been done regarding performance / learning for an SP configured this way compared with a fixed initial value? It would be interesting to see if it makes a difference.

It’s been a while and I’m out of touch with HTM, but maybe this helps: The Bates distribution is the distribution of the mean of n independent uniform random variables on the unit interval. Intuitively it makes sense to use a Gaussian distribution (with mean at the threshold) such that most of the density is around that threshold. The implication being most permanence values won’t need to fluctuate much to converge, while still having flexibility to explore the space further away from the threshold; Bates may put too much density far away from the threshold.
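A quick simulation of that point (the choice of Bates n = 3, Gaussian sd = 0.1, and the +/- 0.1 "near the threshold" window are all my own illustrative picks):

```python
import random

THRESHOLD = 0.5
WINDOW = 0.1    # "near the threshold" means within +/- 0.1
N = 100_000

def bates(n=3):
    """Mean of n independent Uniform(0, 1) draws."""
    return sum(random.random() for _ in range(n)) / n

# Estimate how much probability mass each distribution puts
# within the window around the threshold.
bates_near = sum(abs(bates() - THRESHOLD) < WINDOW for _ in range(N)) / N
gauss_near = sum(
    abs(random.gauss(THRESHOLD, 0.1) - THRESHOLD) < WINDOW for _ in range(N)
) / N

# A narrow Gaussian concentrates more mass near the threshold than Bates(3).
print(bates_near, gauss_near)
```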

Thanks for that clarification Alex.
So to distribute the permanences I can generate a set of values between 0 and 1 and then use the Gaussian density function to allocate them across the bit indexes. I think this will give the same result as using a random Bates function except, as you say, with more density closer to the required threshold. My confusion was in trying to constrain random Gaussian values to between 0 and 1, which didn’t make a lot of sense.
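For what it’s worth, one common way to constrain Gaussian draws to [0, 1] without distorting the shape near the mean is rejection sampling; a minimal sketch (the function name and parameters are mine):

```python
import random

def truncated_gauss(mu=0.5, sigma=0.1, lo=0.0, hi=1.0):
    """Rejection-sample a Gaussian value constrained to [lo, hi]:
    keep drawing until a value lands inside the interval."""
    while True:
        x = random.gauss(mu, sigma)
        if lo <= x <= hi:
            return x

# One permanence per bit in the column's potential pool.
perms = [truncated_gauss() for _ in range(85)]
```

With mu = 0.5 and sigma = 0.1 the rejection rate is negligible, so this stays cheap in practice.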