No influence of learning based on the permanence of proximal connections

I figured out an answer to the second question: with the current algorithm for selecting the active neurons, the ones that are sensitive to spatially smaller patterns in most cases lose the competition to neurons sensitive to less specific but bigger patterns.
I believe it’s possible to create a boosting algorithm which can solve this issue (I’m not talking about the existing boosting here).
@rhyolight, @scott, is there any experience or ideas in this direction?

Those cameras are just the best sensors for testing the sensorimotor concept, where the camera must move slightly to see non-moving objects better, like the retina does. Unfortunately, they provide asynchronous events, so we either have to modify HTM to handle them or convert the sensor data into regular time steps somehow.

A common post-processing step is to integrate events over a short synchronous window, e.g. 1ms.
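As a rough sketch, assuming events arrive as (x, y, t, polarity) tuples (the usual DVS-style output; the exact field layout varies per camera):

```python
import numpy as np

def events_to_frame(events, width, height, t_start, window=1e-3):
    """Accumulate all events that fall inside a synchronous time window
    (default 1 ms) into a single binary frame.

    `events` is assumed to be an iterable of (x, y, t, polarity) tuples
    with t in seconds; this layout is illustrative, not a fixed API.
    """
    frame = np.zeros((height, width), dtype=np.uint8)
    for x, y, t, polarity in events:
        if t_start <= t < t_start + window:
            frame[y, x] = 1  # any event inside the window turns the bit on
    return frame
```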

Thank you for the more detailed explanation; I didn’t get it after looking at some of the materials you recommended. It is really similar to the retina’s behavior, especially if it is possible to support receptors sensitive to crossing a threshold both upward and downward at the same time (I mean by two different groups of receptors).
At the same time, unlike your camera, the retina has a very uneven distribution of receptors, with many more cones in the fovea, which is responsible for detailed vision. So here the representation of a solid edge should be quite dense.
In any case, the retina itself is just a part of the visual encoder, and I’m not sure about the sparsity of the resulting input to the neocortex. I think it should be quite dense just to be economically reasonable, and I know it can be dense because that works for me.
From my experience, it is more important to organize the input to maximize semantics and the capability for generalization (isomorphism in the case of visual data) than to maintain low sparsity.

@jakebruce it is clear that we can collect all events within a time window like 1 ms. However, you then have to deal with many frames containing no events, e.g. for a scene in your garden at midnight. We discussed this topic some time ago, but from my understanding the current NuPIC does not support it.

Just wanted to say that you gave a very good answer to your own question. Also, to avoid confusing other readers, I think you meant active columns rather than neurons.

@jakebruce The video and approach seem like a very good fit for HTM. Lately I have been thinking about a visual sensor that just captures edges (I can access the environment geometry) or color changes in visual data to sparsify the input. The RGB color sensor I am using at the moment has fixed sparsity, but it is not sparse at all, and I think I am crippling HTM because of that. Columns need to map to a very large subset of the input bits. A sparse visual sensor has become one of the priorities for the agent at the moment. I can detect the changes in intensity as well, as discussed above. Thanks for the direction.

I am looking for the simplest starting point. Would it work well enough if I just turned on the bits whose intensity value changed by more than a threshold? Also, do we apply inhibition to the neighboring pixels? If not, why not?
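A minimal sketch of what I mean, assuming grayscale frames and an arbitrary threshold value:

```python
import numpy as np

def change_bits(prev_frame, curr_frame, threshold=16):
    """Turn on a bit wherever the pixel intensity changed by more than
    `threshold` between two consecutive grayscale frames.

    The frames are assumed to be uint8 arrays of the same shape; the
    threshold of 16 is just a placeholder to tune.
    """
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return (diff > threshold).astype(np.uint8)
```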

It’s a good point in general, but in this exact case, since I was talking about the SP only, it was about neurons, or the input for the corresponding columns, depending on how you would like to look at it.

Could you elaborate on that? What do you mean by it? Every element of the SP is potentially connected to a fixed number of input elements, so from this perspective there shouldn’t be any difference whether the representation is sparse or not.


Below are just observational thoughts which may be off.

It makes a difference in the overlaps of representations for similar inputs. If a column learns dense patterns, then each column is actually encoding more of the whole image (input space) rather than bits and pieces. So if you change the image slightly, either the active columns are not affected or almost all of them change. The situation worsens as you increase density. In addition, every column with the same activation starts representing the same stuff. The things they represent overlap more as the density increases. So you lose distributedness in your representations. You can shrink the size of the potential fields of columns to limit what they learn, but then you are not using all the information in your input, which leads to underfitting if your possible input patterns are rich enough.

Of course you can adjust some SP parameters to remedy some of this, which is what I do mostly. So there is that. However, just because the brain can recognize objects in chaotic images, it does not mean that its ability is fully utilized. That’s my general concern with dense inputs.

Oh, I forgot another reason for sparsity specific to my case: performance. Dense representations cost more because of the increased number of synapses per column needed to encode them.

How is that possible? The part of the input space covered by proximal connections is determined by the percentage of potential connections you use, so it doesn’t matter whether the input is sparse or not.
Let’s say you use 70% potential connections, so you initially connect each SP element to 70% of all elements in your input space. Whether 1% or 80% of these connections are active, it is still a representation of 70% of your input space.
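A quick numerical sketch of that point (the sizes and the 70% figure are just the example values above):

```python
import numpy as np

rng = np.random.default_rng(0)
input_size = 1000
potential_pct = 0.7  # each SP element samples 70% of the input space

# The potential pool is drawn once and does not depend on input sparsity.
pool = rng.choice(input_size, size=int(potential_pct * input_size), replace=False)

sparse_input = np.zeros(input_size, dtype=np.uint8)
sparse_input[rng.choice(input_size, size=10, replace=False)] = 1   # ~1% of bits on

dense_input = np.zeros(input_size, dtype=np.uint8)
dense_input[rng.choice(input_size, size=800, replace=False)] = 1   # ~80% of bits on

# The pool always covers 70% of the input space; only the overlap count changes.
print(pool.size, sparse_input[pool].sum(), dense_input[pool].sum())
```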

It confuses me too. At the level of columns, you always have fixed sparsity (usually 2%), so you should have the same number of distal connections for sparse and dense input, and it can’t affect your performance. Unless you use your own implementation with non-fixed sparsity in the SP.

Efficient implementations usually only consider the bits in the input that are on. Sparser input saves computational cost in that case.


At the level of the SP you’ll have 2% active neurons in any case, so how can a denser input affect computations in the TM?

Not in the temporal memory, but in the spatial pooler.

Consider the situation after columns adapt to the patterns through competition. You said it yourself: columns with more overlap dominate if others do not catch up, which is a problem for variable sparsity. In time, every column adapts itself to the dense input, because the ones that are connected to more active input bits dominate and the rest are encouraged to do so by competition. So if potentialPct is 70% and your input sparsity is 50%, more synapses become connected among the potential pool, compared to an input sparsity of, say, 5%. Connected synapses are the ones that encode input data, not potential synapses. Denser input leads to connected synapses that cover more of the input space because of competition. If there were no inhibition / boosting / bumping mechanisms to ensure every column is getting used and adapting to the input, what you said would be true.
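To make the learning part concrete, here is a sketch of the usual SP permanence update for one winning column (the increment, decrement, and connection threshold values are illustrative, not NuPIC’s defaults): with a denser input, more of the potential pool is active on any given step, so more permanences drift above the connected threshold over time.

```python
import numpy as np

def adapt_column(perms, potential_idx, active_input,
                 inc=0.05, dec=0.01, connected_thresh=0.2):
    """Permanence update for one winning column, in the usual SP style.

    `perms[i]` (a numpy array) is the permanence of the potential synapse
    to input bit `potential_idx[i]`; `active_input` is a binary array over
    the input. Synapses to active bits are reinforced, others decremented.
    """
    for i, bit in enumerate(potential_idx):
        if active_input[bit]:
            perms[i] = min(1.0, perms[i] + inc)
        else:
            perms[i] = max(0.0, perms[i] - dec)
    connected = perms >= connected_thresh  # denser input -> more of these
    return perms, connected
```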

@jakebruce is right about only considering the on bits. The efficient way of computing overlaps only iterates over the active input bits and accesses the columns sampling from them. Even in the vanilla implementation, sparser inputs would have better performance, because you iterate over connected synapses when computing the proximal input overlap, not potential synapses.
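A sketch of that active-bit-driven overlap computation, assuming an inverted index from each input bit to the columns that have a connected synapse on it (the names are illustrative, not NuPIC’s API):

```python
def compute_overlaps(active_bits, columns_on_bit, num_columns):
    """Compute proximal overlaps by iterating only the active input bits.

    `columns_on_bit` maps input bit -> list of columns with a connected
    synapse on that bit, so the cost scales with input sparsity.
    """
    overlaps = [0] * num_columns
    for bit in active_bits:
        for col in columns_on_bit.get(bit, []):
            overlaps[col] += 1
    return overlaps

# Toy usage: 3 columns, a few connected synapses, 2 active input bits.
columns_on_bit = {0: [0, 2], 5: [1], 7: [0, 1, 2]}
print(compute_overlaps([0, 7], columns_on_bit, num_columns=3))  # -> [2, 1, 2]
```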

Edit: A couple of words to prevent confusion.

As I understood it, @sunguralikaan was talking about the performance of the TM, which makes sense, since it’s the more computationally heavy part of HTM. Nevertheless, even for the SP I can’t see how it can affect performance; perhaps it’s possible in some implementations (but not in mine).

Briefly: in the loop where you compute the feedforward activations of your columns, if you loop over the list of active bits in the input, then that will take longer if there are more active bits in the input.

I see, it can work that way, but to do it you have to keep indexes of proximal connections on the input side. I just prefer to follow a real-world object-oriented approach, where proximal connections are part of a neuron object, and their relationships, states, and permanences are its properties. From this perspective, it doesn’t matter whether the input is sparse or dense.

@jakebruce gave an example of how it can influence the performance of the SP, so if your algorithm works similarly, it makes sense.
However, in most cases the input is the smallest part compared with the SP and especially the TM, so it shouldn’t make a big difference. Do you see a significant change in your case?