Thank you for taking the time to digest my wall of text. I had a hard time deciding how much information was needed for the intended context.
I’m referring to the algorithm as described in HTM School, with some additions of my own relating to inhibition and dendrite activity. I guess I’m a bit stuck in the nomenclature as it looked two or more years ago.
It would of course be best if the category stays stable, but I’m not surprised that data of this sort can give unstable results. And yes, the training of layer 2/3 is intended to give the same result as in the previous paper on object detection: the first exposure activates as many candidate object patterns as possible, and for every new exposure the state in layer 4 is biased by the state in layer 2/3. Layer 2/3 then uses the resulting layer 4 state to further narrow down its own possibilities, since the neurons in layer 2/3 have been trained to be activated by specific object patterns in layer 4.
My implementation follows my understanding of the previous paper on object detection. Maybe I’ve misunderstood how layer 2/3 is supposed to strengthen the internal connections between neurons in the same pattern. Either way, this training of neurons in layer 2/3 seems to help narrow down the possible patterns after only a few exposures, and my column performs worse if I remove this functionality.
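The narrowing-down loop described above can be sketched with plain Python sets standing in for the real HTM cell states. All names here are made up for illustration, and set intersection is only a crude stand-in for the layer 2/3 biasing; this is not the actual column implementation.

```python
# Toy sketch: each learned object is a set of feature "patterns"; the first
# exposure activates every consistent candidate, and each further exposure
# narrows the candidate set, mimicking the layer 2/3 bias on layer 4.

def narrow_candidates(learned_objects, exposures):
    """Keep only objects whose learned patterns are consistent with
    every sensory exposure seen so far."""
    # First exposure: every object containing this feature is a candidate.
    candidates = {name for name, features in learned_objects.items()
                  if exposures[0] in features}
    # Each new exposure biases the state and narrows the candidates.
    for feature in exposures[1:]:
        candidates &= {name for name, features in learned_objects.items()
                       if feature in features}
    return candidates

objects = {
    "mug":   {"lip", "handle", "flat_bottom", "cylinder"},
    "glass": {"lip", "flat_bottom", "cylinder"},
    "bowl":  {"lip", "flat_bottom"},
}
print(narrow_candidates(objects, ["lip", "flat_bottom"]))  # all three remain
print(narrow_candidates(objects, ["lip", "handle"]))       # only the mug
```

The point the toy makes is the same one as above: a single exposure is ambiguous, and it is the accumulation of exposures that collapses the candidate set.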
Perhaps this is a result of me using a column with an order of magnitude fewer mini-columns than are typically used in Numenta research and in Nupic. My reasoning regarding the number of mini-columns is that if I can get 100 mini-columns to perform well enough, having an order of magnitude more of them should result in dramatic improvements. I haven’t decided what counts as enough, but my gut feeling is that if I can reach a stable 50% with 100 mini-columns, stepping up to the common 2048 mini-columns would make sense.
Further, getting a small network to perform well enough to solve simple problems opens up more opportunities for running the algorithm on very limited hardware.
Ah, ok. Then I’ll assume I’ve overlooked something. I’ll spend some more time with the paper.
This sounds a bit unlikely. If we look at the popular coffee mug example, many different coffee mugs will, taking subsampling and SDR attributes into account, appear very similar. For example, you cannot sense colour with your fingertip, so the same model of mug in a different colour will appear identical even though, in one sense, the two are very different.
So, feeling a lip on the edge, a cylindrical form with a closed bottom and open top, together with a handle starting somewhere close to the edge and terminating somewhere close to the bottom, should make category detection very possible. It would of course be possible to get into more detail with a finer sensor, but I claim that moving from the category “mugs” to “mugs with texture on the outside” is a very small step.
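A rough way to see why two mugs look alike to a touch sensor: an SDR built only from touch-accessible attributes simply has no bits for colour. The encoder below is invented for this example (it hashes each attribute to a few active bits); it is not a real Nupic encoder, just a sketch of the idea.

```python
import random
import zlib

def encode(attributes, sdr_size=256, bits_per_attr=8):
    """Map each attribute to a small, deterministic set of active bits."""
    active = set()
    for attr in attributes:
        rng = random.Random(zlib.crc32(attr.encode()))  # stable per attribute
        active |= {rng.randrange(sdr_size) for _ in range(bits_per_attr)}
    return active

def overlap(a, b):
    return len(a & b)

# Colour is never part of the touch encoding, so these two are identical.
red_mug  = encode({"lip", "handle", "cylinder", "flat_bottom"})
blue_mug = encode({"lip", "handle", "cylinder", "flat_bottom"})
cat      = encode({"fur", "ears", "tail", "paws"})

print(overlap(red_mug, blue_mug) == len(red_mug))  # identical representations
print(overlap(red_mug, cat))                       # typically a small overlap
```

The same logic extends to subsampling: any attribute the sensor cannot reach contributes no bits, so objects differing only in unreachable attributes fall into the same category.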
I’d say that my results show that the ability to detect categories, even if not intended, works on at least some level with a combination of location, the sequence of sensory inputs, and category biasing.
But, just to be clear, a network that has trained on mugs will of course not do well if you show it a cat or something from some other very different domain.
To me, this sounds like a description of what I’ve done. The sensory patch is small and projects to a small number of mini-columns. Sub-sampling removes even more of the information needed to properly separate a “1” from an “8” or a “4” from a “9”. I therefore expose the sensor to overlapping patches (smaller than the training image) that provide both separation of location for similar features and the topological information that connects those features.
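The overlapping-patch idea can be sketched in a few lines: slide a patch smaller than the image across it with a stride below the patch size, so neighbouring patches share pixels and keep the topology that connects features. The patch size and stride here are arbitrary choices for illustration, not the values from my implementation.

```python
# Slide a patch-by-patch window over a 2D image; because stride < patch,
# consecutive patches overlap and each (row, col) offset gives the
# contained feature its location context.

def overlapping_patches(image, patch=4, stride=2):
    """Yield (row, col, patch) for every patch position."""
    rows, cols = len(image), len(image[0])
    for r in range(0, rows - patch + 1, stride):
        for c in range(0, cols - patch + 1, stride):
            yield r, c, [row[c:c + patch] for row in image[r:r + patch]]

# Toy 8x8 "digit" image of 0s and 1s.
image = [[(r * 8 + c) % 2 for c in range(8)] for r in range(8)]
patches = list(overlapping_patches(image))
print(len(patches))  # 3 positions per axis -> 9 overlapping patches
```

Feeding each patch to the sensor with its (row, col) offset is what supplies the location signal that separates otherwise similar features.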