I see that you’re taking your array of synapses and converting it into an array of 1s and 0s, then splitting that array into equal-sized partitions. Does that mean your synapse array holds a constant number of synapses for each column? (I’m assuming `self.columnCount` here is an integer, not an array of integers.)
If I’m understanding this correctly, a couple of things slowing your code down are:

`(sa['permanence'] > self.synPermConnected)` is comparing lots of permanences that are 0. In the SparseMatrix, zeros are not stored or compared.

`(encoding[sa['bitIdx']] == 1)` has the same problem. It’s doing this lookup for every potential synapse. In the SparseMatrix this comparison would only happen for synapses with permanence > 0.
Same with the `&`. It’s comparing bools for every potential synapse.
Each of these operations (`>`, `==`, `&`, `encoding[]`, `np.split`) is allocating a new array. The SparseMatrix, by contrast, loops over all the synapses and performs all the logic on a synapse-by-synapse basis, never allocating intermediate arrays. (This is just the nature of C and Python – I don’t see a way you can solve this other than by writing this logic in C.)
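To make the allocation cost concrete, here is a hypothetical reconstruction of the vectorized approach as I understand it. The names (`sa`, `encoding`, `synPermConnected`, `columnCount`) mirror your snippets, but the data and layout (a structured array, column-major, equal synapses per column) are my assumptions:

```python
import numpy as np

# Hypothetical reconstruction of the vectorized overlap computation.
# Assumes `sa` is a structured array of potential synapses with
# 'permanence' and 'bitIdx' fields, stored column-major with an
# equal number of synapses per column. The data here is made up.
rng = np.random.default_rng(42)
columnCount = 4
synapsesPerColumn = 8
synPermConnected = 0.5
inputSize = 16

sa = np.zeros(columnCount * synapsesPerColumn,
              dtype=[('permanence', 'f8'), ('bitIdx', 'i8')])
sa['permanence'] = rng.random(sa.size)        # many entries below threshold
sa['bitIdx'] = rng.integers(0, inputSize, sa.size)
encoding = rng.integers(0, 2, inputSize)      # dense 0/1 input vector

# Each step below allocates a fresh intermediate array over ALL
# potential synapses, including the ones with permanence 0:
connected = sa['permanence'] > synPermConnected   # new bool array
active = encoding[sa['bitIdx']] == 1              # new bool array
overlap_mask = connected & active                 # new bool array
overlaps = [part.sum() for part in np.split(overlap_mask, columnCount)]
```

Four full-length temporaries get created per compute call, which is exactly the overhead the synapse-by-synapse C loop avoids.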
Fundamentally, you’re running the same algorithm as the SparseMatrix’s `rightVecSumAtNZGteThreshold`, except that you’re also comparing potential synapses with permanence 0. And the vectorized approach to the algorithm is bound to be a bit slower, since it involves a fair amount of allocation.
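For contrast, here is a conceptual Python sketch of what `rightVecSumAtNZGteThreshold` does (the real implementation is C++; the data layout and function shape here are illustrative). The key point is that only stored, nonzero synapses are ever visited, and no intermediate arrays are allocated:

```python
# Conceptual sketch of SparseMatrix.rightVecSumAtNZGteThreshold.
# `rows` stores only the nonzero synapses: one list of
# (input_bit, permanence) pairs per column.
def right_vec_sum_at_nz_gte_threshold(rows, input_vector, threshold):
    overlaps = []
    for row in rows:
        count = 0
        # Zero permanences are simply never stored, so they are
        # never compared; no temporaries are created.
        for input_bit, permanence in row:
            if permanence >= threshold:
                count += input_vector[input_bit]
        overlaps.append(count)
    return overlaps
```

For example, with `rows = [[(0, 0.7), (2, 0.3)], [(1, 0.9)]]`, `input_vector = [1, 1, 1]`, and threshold 0.5, this returns `[1, 1]`.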
There’s no particular optimization in the SparseMatrix’s overlap counting. It’s a pretty straightforward “loop over the synapses and count how many connect to these bits”. The Connections’ `computeActivity`, on the other hand, does have an optimization: it maintains a reverse lookup from input bits to the synapses that connect to each input bit. With this approach you can count overlaps without iterating over every synapse – you only have to iterate over the active synapses, i.e. the synapses to the active input bits.
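A rough sketch of that reverse-lookup idea, to make it concrete. The function names and the column/bit data layout are illustrative, not the real Connections API:

```python
from collections import defaultdict

# Illustrative sketch of the reverse index used by computeActivity-style
# overlap counting. Not the real API; names are made up.
def build_reverse_index(columns):
    """columns: dict column_id -> iterable of connected input bits."""
    synapses_for_bit = defaultdict(list)
    for column_id, bits in columns.items():
        for bit in bits:
            synapses_for_bit[bit].append(column_id)
    return synapses_for_bit

def compute_activity(synapses_for_bit, active_bits, num_columns):
    # Only synapses on ACTIVE input bits are ever touched;
    # inactive bits and their synapses are skipped entirely.
    overlaps = [0] * num_columns
    for bit in active_bits:
        for column_id in synapses_for_bit[bit]:
            overlaps[column_id] += 1
    return overlaps
```

The trade-off is the one described below: every method that adds, removes, or reconnects a synapse also has to keep `synapses_for_bit` up to date.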
If you’re curious why the SparseMatrix doesn’t use this optimization: The SparseMatrix stores synapses in rows and it’s good at traversing rows, and this optimization would be equivalent to being able to efficiently traverse columns in the matrix. That would mean doing more bookkeeping in every method that mutates the matrix. So this optimization involves making the learning code slower in order to make the overlap computing faster. Maybe it’s worth it, maybe it’s not.