Not in the temporal memory, but in the spatial pooler.
Consider the situation after the columns have adapted to the patterns through competition. You said it yourself: columns with more overlap dominate if the others do not catch up, which is a problem for variable sparsity. In time, every column adapts itself to the dense input, because the ones connected to more active input bits dominate and the rest are pushed in the same direction by competition. So if potentialPct is 70% and your input sparsity is 50%, more synapses become connected within the potential pool than with an input sparsity of, say, 5%. Connected synapses are the ones that encode the input data, not potential synapses. Denser inputs lead to connected synapses that cover more of the input space because of competition. If there were no inhibition / boosting / bumping mechanisms to ensure that every column is getting used and adapting to the input, what you said would be true.
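To make that drift toward connectedness concrete, here is a toy simulation, not the actual SP code: a single column's potential pool under uniformly random inputs of a given sparsity, with the usual HTM-style permanence rule (increment on active bits, decrement on inactive ones). The parameter values are made-up but in the range of typical defaults.

```python
import random

def connected_after_learning(input_sparsity, steps=300, n_synapses=100,
                             perm_init=0.1, threshold=0.1,
                             inc=0.05, dec=0.008, seed=42):
    """Count how many of one column's potential synapses end up connected
    after learning on random inputs of the given sparsity."""
    rng = random.Random(seed)
    perms = [perm_init] * n_synapses
    for _ in range(steps):
        for i in range(n_synapses):
            if rng.random() < input_sparsity:          # this input bit is ON
                perms[i] = min(1.0, perms[i] + inc)    # bump toward connected
            else:
                perms[i] = max(0.0, perms[i] - dec)    # decay toward disconnected
    return sum(p >= threshold for p in perms)

print(connected_after_learning(0.50))  # dense input: nearly the whole pool connects
print(connected_after_learning(0.05))  # sparse input: most permanences decay away
```

With 50% sparsity the expected permanence change per step is positive for every synapse, so essentially the entire potential pool becomes connected; at 5% it is negative, so only a handful of synapses stay above the threshold.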
@jakebruce is right about only considering the on bits. The efficient way of computing overlaps iterates only over the active input bits and accesses the columns sampling from them. Even in the vanilla implementation, sparser inputs perform better, because you iterate over connected synapses when computing the proximal input overlap, not potential synapses.
Edit: A couple of words to prevent confusion.
As I understood it, @sunguralikaan was talking about the performance of the TM, which makes sense, since it's the more computationally heavy part of HTM. Nevertheless, even for the SP I can't see how it can affect performance; perhaps it's possible in some implementations (but not in mine).
Briefly: in the loop where you compute the feedforward activations of your columns, if you loop over the list of active bits in the input, then that will take longer if there are more active bits in the input.
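A minimal sketch of that scheme, with a hypothetical toy connectivity map: build an inverted index from each input bit to the columns connected to it, then accumulate overlaps by walking only the ON bits.

```python
from collections import defaultdict

# Toy proximal connectivity (made-up): column -> connected input bits.
connected = {
    0: {1, 4, 7},
    1: {2, 4, 9},
    2: {0, 3, 7},
}

# Build the inverted index once: input bit -> columns sampling from it.
columns_of_bit = defaultdict(list)
for col, bits in connected.items():
    for bit in bits:
        columns_of_bit[bit].append(col)

def overlaps(active_bits):
    """Cost scales with the number of active input bits (times average
    fan-out), not with the total synapse count."""
    counts = defaultdict(int)
    for bit in active_bits:              # iterate only the ON bits
        for col in columns_of_bit[bit]:
            counts[col] += 1
    return counts

print(dict(overlaps({4, 7})))  # -> {0: 2, 1: 1, 2: 1}
```

The inner loop runs once per (active bit, connected column) pair, which is why denser inputs directly cost more time in this formulation.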
I see, it can work that way, but to do it you need to keep indexes of the proximal connections on the input side. I just prefer to follow a real-world, object-oriented approach, where proximal connections are part of a neuron object and their relationships, states, and permanences are its properties. From that point of view it doesn't matter whether the input is sparse or dense.
@jakebruce gave an example of how it can influence the performance of the SP, so if your algorithm works similarly, it makes sense.
However, in most cases the input is the smallest structure compared with the SP and especially the TM, so it shouldn't make a big difference. Do you see a significant change in your case?
The algorithmic complexity of the Spatial Pooler's overlap computation scales linearly with the number of active input bits if you do it the efficient way: complexity is O(N), where N is the number of active input bits, and you also get better cache locality. Whether that is worth it to you is for you to answer. I had around 3000 potential synapses and 1000 connected synapses per column on dense inputs, so it is a concern for me.
However, I would rather focus on what I described about connected synapses in my answer above. That is the more important point: dense representations lead to a loss of distributedness, because the connected inputs of the columns start to overlap more.
I totally agree that the accuracy of the model is much more important at this stage; I just wanted to clarify the simpler point about performance first.
I still can’t see what the fundamental issue with a dense representation of the input is in terms of accuracy. You said
and I agree that some inhibition / boosting logic can have no impact, or even a negative one, in the case of a dense representation, but only because such logic was created for sparse ones.
From my point of view, if a dense representation of the input carries more semantics, then it’s better to create appropriate inhibition / boosting logic for it. Unless there is a fundamental issue with dense representations that I’m not aware of.
I agree that this is what we should do if dense representations were the objective. However, the sparsity of HTM is a crucial factor in how it functions. @jhawkins says in some of his presentations something along the lines of: “If you can only remember a single thing from this presentation, it should be sparse distributed representations.” SDRs are a core idea of HTM because of their merits. Especially when the theory extends to hierarchies, sparsity really shines when you work with unions of activations, or when you try to check whether an SDR belongs to a union of SDRs (temporal/union pooling). But then this discussion becomes about the merits of sparse distributed representations, on which there are publications.
For now, if the question is “Why should the inputs be sparse for the current SP algorithm?”, my short answer would be: “Because it is designed around sparsity and suffers on dense inputs.” If the question is “Why sparsity?”, there are publications by Numenta focused on answering that exact question.
I totally agree that sparsity is a cornerstone of HTM theory, but only for SP and TM.
@rhyolight even deliberately mentioned it in the video about encoders (or about SP - sorry, I don’t remember exactly).
I think there is a misconception about the sparsity of an encoder’s output, caused by using the term SDR for it, when it is only technically similar to a real SDR and can in fact be a DDR (with Dense for the first D).
What is really important is that it carries semantics, and that it satisfies some formal requirements, such as having the same size for all patterns. As I figured out, it’s also important to have a similar level of sparsity/density when using the standard learning algorithm, but I believe that is a technical matter, not a fundamental one.
For those who are interested, I figured out a well-grounded answer to this question too, but only for one case: using topology.
There is a very unpleasant pitfall when applying topology: because of local inhibition, with a dense representation it is hard to choose winners that reflect the semantics of the input well, because there are too many columns with an equally high number of active proximal connections.
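A toy illustration of the limiting case, assuming (as argued earlier in the thread) that adaptation to a dense input stream has driven every column to connect to essentially the whole input space. The sizes and connectivity here are made-up for the sketch:

```python
import random

rng = random.Random(0)
n_inputs, n_columns = 200, 20

# Limiting case: after adapting to dense inputs, every column has ended up
# connected to the whole input space.
connected = [set(range(n_inputs)) for _ in range(n_columns)]

# A dense input pattern: 50% of the bits are active.
dense_input = set(rng.sample(range(n_inputs), n_inputs // 2))

overlaps = [len(cols & dense_input) for cols in connected]
print(len(set(overlaps)))  # -> 1: every column ties at the same overlap
```

With every column scoring identically, local inhibition has no basis for choosing winners, so the selection within a neighborhood becomes arbitrary rather than semantic.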