Assuming that I don’t want to use any temporal memory, does having segments to every column in the layer above ruin the functionality of the spatial pooler?
I changed the name of this thread from “Can i have segments to every column in the above layer?” to “What if a minicolumn’s receptive field includes all input cells?”. I think this frames the question more in HTM language.
That being said, I’m not sure what the answer is.
So what percentage of the input do you usually use as a column’s receptive field for regions of 500, 1000, or 2000 columns?
From what I have seen, it is about 85%. You can see this in this video.
The problem with columns having a potential pool of 100% of the inputs is that each column’s actually-connected cells will drift over time and eventually represent something completely different than when they started.
We want to allow representations to change through learning, but with some bias or anchoring to prevent them from changing completely. Simply reducing the potential pool to 85% or 50% provides some level of anchoring.
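To make the anchoring idea concrete, here is a rough sketch (my own toy code, not Numenta’s implementation; the names and sizes are just illustrative): each minicolumn gets a fixed random potential pool at initialization, and learning can only adjust permanences inside that pool, which is what keeps the column tethered to one region of the input space.

```python
import numpy as np

def build_potential_pools(num_columns, input_size, potential_pct, seed=42):
    rng = np.random.default_rng(seed)
    pool_size = int(round(input_size * potential_pct))
    # Each row lists the input indices this minicolumn is ever allowed to connect to.
    return np.array([rng.choice(input_size, size=pool_size, replace=False)
                     for _ in range(num_columns)])

pools_full = build_potential_pools(2048, 1024, 1.0)   # every column can reach every input bit -> free to drift
pools_half = build_potential_pools(2048, 1024, 0.5)   # each column is anchored to its own half of the input
print(pools_full.shape, pools_half.shape)             # (2048, 1024) (2048, 512)
```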
To me, the question is typically more about resources. In my experience, I don’t see a noticeable difference when I change the potential percent even by a large amount. I would expect that an overly high percentage might result in many minicolumns with nearly identical overlap scores early on, and probably take more inputs before the minicolumns specialize.
For processing efficiency, I prefer to use a small number of potential synapses. In my demos, I have even gone as low as 2% potential percent (when using SP to project output from one layer to another) since I am concerned with resources when running in a browser. I’m sure that is not a typical use case though. I recall that NuPIC uses 50% potential percent by default. I’m guessing there is an ideal range, but I haven’t seen that analysis anywhere myself.
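To give a feel for why the resource angle matters, here is a back-of-the-envelope count of potential synapses (purely illustrative numbers; `potentialPct` is the NuPIC-style parameter name for the potential percent):

```python
input_size, num_columns = 1024, 2048
for potential_pct in (1.0, 0.5, 0.02):
    potential_synapses = int(num_columns * input_size * potential_pct)
    print(f"potentialPct={potential_pct:<4}: ~{potential_synapses:,} potential synapses to store and update")
# potentialPct=1.0 : ~2,097,152 potential synapses to store and update
# potentialPct=0.5 : ~1,048,576 potential synapses to store and update
# potentialPct=0.02: ~41,943 potential synapses to store and update
```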
But from an answer I received earlier about temporal memory, I concluded that the sparsity of the feedforward segments is also crucial to the functionality of the temporal memory… am I right?
Yes, but don’t forget that the potential percent of SP is not directly related to the sparsity of minicolumn activations. The potential percent impacts the number of synapses that can “vote” during the minicolumn scoring step (to measure how well/poorly each minicolumn connects with the active input cells). The SP process then follows that step with a sparsification step, whereby only a sparse number of minicolumns with the highest scores are activated, and all other minicolumns are inhibited.
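A rough sketch of those two steps in plain numpy (my own toy code, not NuPIC’s implementation; the sizes and the global-inhibition shortcut are just illustrative):

```python
import numpy as np

def spatial_pool(input_bits, connected, num_active=40):
    # connected: (num_columns, input_size) boolean matrix of connected proximal synapses
    overlap = connected.astype(int) @ input_bits     # "voting" step: overlap score per minicolumn
    winners = np.argsort(overlap)[-num_active:]      # sparsification step: keep only the top scorers
    return winners

rng = np.random.default_rng(0)
input_bits = (rng.random(1024) < 0.02).astype(int)   # sparse binary input
connected = rng.random((2048, 1024)) < 0.1           # toy connectivity matrix
active_columns = spatial_pool(input_bits, connected)
print(len(active_columns))                            # 40 active minicolumns; all others are inhibited
```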
–EDIT–
I may have misinterpreted your question – my answer above assumes you meant “sparsity of minicolumn activations”. I do not believe sparsity of feedforward segments has any impact on TM (the classic TM algorithm doesn’t have any access to the proximal segments – it just picks up where SP leaves off, with a sparse collection of active minicolumns that hopefully have preserved the semantics of the input space).
– “As I understood it, the number of contexts this algorithm can remember is of linear order in the number of cells per minicolumn. And if 32 cells make 32 different contexts possible, then in a problem like natural language that covers only one or two previous words, which definitely doesn’t get the job done. So don’t we need to recognize where we are in the larger sequence? I remember a paper by Numenta saying that it can predict not only the next letter, but to some extent the next syllable and the next word.”
++ “It is not limited in the way you are speaking. Remember that there are potentially thousands of minicolumns, each looking at different spatial aspects of the input. They all have different receptive fields. Each one is looking at a specific aspect of the input space and recognizing those spatial patterns temporally. Each column is limited in the number of temporal contexts in which one input can be recognized, but working together they put together a much richer picture of the spatio-temporal space.”
– "yes, but considering the language example again, is the spatial pattern of ‘A’ any different from the spatial pattern of ‘A’ ? and if 32 contexts make… one letter (that’s the exact number in my mother tongue ) so… what we can do with that? does it give us anything other than a one letter context?
++ “A and A’ spatial patterns will be the same. BUT there are not only 32 contexts for this spatial pattern. Each minicolumn will see a different part of “A” because they have different receptive fields for the input space. Each will create different temporal contexts for the receptive field of “A” that it sees. One column might recognize the bar across the letter. Other letters will also have a bar (like H). “A” is only recognized when many minicolumns predict that A is coming next, each looking at a different receptive field of the spatial input space.”
The conversation is a bit out of context, so I might need to read it a few more times. This seems to be related to processing words/letters visually, which is not something I have had any experience with myself.
One thing that stands out to me from the conversation:
This makes an assumption that the SP is only activating a single minicolumn for a particular input. That is not the typical use case. In a typical 2048-minicolumn layer with 2% sparsity, you are activating 40 minicolumns. TM then selects which individual cells in each of those 40 minicolumns become active to depict its context. This means a particular input theoretically has over 32^40 different ways its context could be represented.
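Just to put a number on that (a quick back-of-the-envelope check, assuming exactly one active cell is chosen per winning minicolumn):

```python
# 40 active minicolumns, 32 cells per minicolumn, one active cell chosen per column:
cells_per_column = 32
active_columns = 40
contexts = cells_per_column ** active_columns   # 32**40 == 2**200
print(f"{contexts:.2e}")                        # ~1.61e+60 distinct context representations
```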