Hi everyone,
I’m a newcomer here, having read “On Intelligence” about a week ago, and I’m now trying to catch up on your more recent research.
On Intelligence was a great read, and I thought it gave me a clear view of the organisation of the neocortex. With the current HTM models and the papers about columns and allocentric locations, however, I was excited at first but now feel a bit disoriented, having lost sight of the big picture somewhere along the way.
Now, I understand that the hierarchy between regions got somewhat relegated to the background while you focus on understanding the full capabilities of the (macro)column and the intricate connections between layers. And having each and every patch of cortex, down to the primary sensory (and motor?) levels, able to form a notion of complex objects is mind-blowing. However, this “shift in perspective” comes, if I understand correctly, at the cost of capacity limitation problems all over the place. Powerful, granted, but a few hundred possible stable representations per macrocolumn now feels somewhat cramped to account for our mind’s capacity, especially if the same few very familiar objects are each repeated across most columns of the primary regions… I did come across a poster yesterday which hinted at a more feature-focused storage, where full objects would more likely be combinations of features (a coffee cup gets a cylinder and a handle, things like that…), but I’m struggling to find more details about that idea in your fully detailed papers or even in your popular-science videos.
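Just to make the capacity argument concrete for myself, here is a toy sketch of what I took away from that poster (entirely my own illustration with made-up numbers, not anything from your papers): if a column stores features rather than whole objects, a modest number of stored features can combine into a far larger number of objects.

```python
from math import comb

# Hypothetical numbers, purely for the combinatorics argument.
stored_features = 100        # imagined per-column feature capacity
features_per_object = 3      # e.g. "coffee cup" = cylinder + handle + rim

whole_object_capacity = stored_features                      # one slot per whole object
compositional_capacity = comb(stored_features, features_per_object)

print(whole_object_capacity)     # 100
print(compositional_capacity)    # 161700 possible 3-feature combinations
```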
Detailing the feature-oriented insights you have would maybe also allow me to reconcile the fact that SDRs “should” overlap more when they encode semantically close, ehm… “things”, with the fact that, according to the paper about a virtual hand palpating a virtual object, the current implementation of the simulation does not seem to do that.
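(For clarity, by “overlapping more” I just mean the number of shared active bits; a minimal sketch with made-up bit indices:)

```python
# SDRs as sets of active bit indices (toy values of my own).
sdr_coffee_cup = {3, 57, 112, 840, 1501, 1777}
sdr_mug        = {3, 57, 112, 901, 1501, 1999}    # semantically close: shares bits
sdr_giraffe    = {14, 230, 666, 905, 1300, 1750}  # unrelated: little or no sharing

def overlap(a, b):
    """Semantic similarity in SDR terms: count of shared active bits."""
    return len(a & b)

print(overlap(sdr_coffee_cup, sdr_mug))      # 4 -> close concepts
print(overlap(sdr_coffee_cup, sdr_giraffe))  # 0 -> unrelated concepts
```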
Also, I have trouble reconciling the current simulation of sequence memory with what I think our memory abilities should be (a set of beliefs mostly developed while reading “On Intelligence”). Okay, the simulated layers of a current HTM implementation remember sequences. Of notes, say, if the input SDRs were to represent notes. But there is nothing like a pitch-invariant form here. In fact, even if the network’s output layers are able to form a signal somewhat more stable than what was input to them, there is nothing like the scale-agnostic, translation-agnostic “invariants” I imagined were at the core of the matter while reading the book. Having learned “Somewhere over the Rainbow” starting on a flat, in the current HTM model, all the other notes in the sequence are remembered in the context of that starting flat, but were we to input F# as a starting point, all the network’s guesses would be off.
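To show what I mean by the missing pitch-invariant form, here is a toy sketch (my own, not from any Numenta code, and the exact pitch numbers don’t matter): the same melody encoded as absolute notes changes completely under transposition, while the sequence of intervals stays identical.

```python
# Roughly the opening of "Somewhere over the Rainbow" as semitone numbers (toy values).
melody_on_a_flat  = [68, 80, 79, 75, 77, 79, 80]        # starting on a flat
melody_on_f_sharp = [n - 2 for n in melody_on_a_flat]   # same tune, transposed down

def intervals(notes):
    """Relative form: successive pitch deltas instead of absolute pitches."""
    return [b - a for a, b in zip(notes, notes[1:])]

print(melody_on_a_flat == melody_on_f_sharp)                        # False: absolute form differs
print(intervals(melody_on_a_flat) == intervals(melody_on_f_sharp))  # True: deltas match
```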
I understand this could be a simple “okay, we’ll tackle this later” issue. But to my mind there is something more fundamental at work here. Unless you have perfect pitch, you cannot even dream of remembering songs in this fashion. Our ability to remember the “relative” form comes either preferentially, or at least before, the ability to recognize an absolute version, which only a few very trained (or talented?) music enthusiasts develop. And I do not believe this is a matter of encoding the raw audio signal in a relative form instead of an absolute one: from the little I understand about hair cells in the ear, the most basic and direct information they can deliver is an absolute (range of) pitch.
So I was going to post about this yesterday, because it was puzzling me, and then I had a possible insight. Automagically recognizing pitch “differences” given an absolute pitch signal requires a quite specific information transform. I was thus wondering whether basal dendritic trees on pyramidal neurons could represent transforms such as {(A and B and C) or (D and E and F) or (G and H and I)} (which I realize they can, from the little I’ve read about it, by having one dendritic segment per parenthesized expression and one synapse per letter), which would give them the ability to respond well to a fixed spacing between two absolute values within a given subset of the range. The next cell could specialize in recognizing the same spacing across another subset, the cell after that in recognizing another spacing… until finally there are enough of these cells in the output layer to cover the whole absolute range with all possible deltas.

However, consistently reaching such constant-delta encodings by letting a multi-segment version of the current HTM model learn on its own seems unlikely. But then I got this déjà-vu feeling, between this problem and the question of how “grid cell” mechanisms could be consistently generated. I don’t really know any details about this, other than that something similar seems to be at stake. Since your view on allocentric location signals now requires that a mechanism akin to grid cells is taking place all over the, ehm… place, and since my eyebrow-frowning about pitch invariance could be solved by having such spacing-between-values encodings be more commonplace than chance alone would produce with the current HTM implementation, I’m wondering if… I don’t know… if a mechanism akin to what regulates sparsity and per-column activity in the spatial pooler would make reaching such stable “delta recognizers” more likely. A mechanism from which grid-cell-like transforms would spontaneously “emerge” all over the place.
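To make the {(A and B and C) or (D and E and F) or …} idea concrete, here is a minimal sketch of such a “delta recognizer” cell (entirely my own toy model, not HTM code: segments are just sets of input indices, and the cell fires if any one segment has enough of its synapses active):

```python
class DeltaRecognizerCell:
    """Toy pyramidal cell: fires if ANY dendritic segment has >= threshold
    active synapses (an OR over segments of ANDs over synapses)."""

    def __init__(self, segments, threshold):
        self.segments = segments      # list of sets of absolute-pitch input indices
        self.threshold = threshold

    def fires(self, active_inputs):
        return any(len(segment & active_inputs) >= self.threshold
                   for segment in self.segments)

# A cell tuned to "a pitch plus the pitch 3 semitones above it", for a few
# absolute positions within its subset of the range (made-up indices):
delta3_cell = DeltaRecognizerCell(
    segments=[{40, 43}, {41, 44}, {42, 45}],   # each segment = one absolute pair
    threshold=2,
)

print(delta3_cell.fires({40, 43}))  # True: delta of 3, starting at 40
print(delta3_cell.fires({42, 45}))  # True: same delta, different absolute start
print(delta3_cell.fires({40, 44}))  # False: delta of 4
```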
Just wanted to share that problem-similarity feeling. I don’t really know what to make of it from where I am, but maybe this could help somehow.
While I’m at it, another thing bothering me about sequences is the current model’s inability to account for sub-sequence recognition. Assuming I’m a poet, I would very likely play with sub-sequences of words. Imagining I’m good at algebra, I would quite likely want to factor out similar subsequences of symbols. As an engineer, I’d likely want to abstract away, or design copying devices for, any recognized subsequence of any stuff. And if I were more skilled in music, I believe I could have fun with subsequences of notes as well. Yet nothing in the current HTM implementation of sequence memory allows any of the simulated layers to extract a subsequence of notes from its context, and that context spans all the way back to the first note (I’ve tried to caricature the issue below). Do you believe this is a question to be tackled along with attention mechanisms? Or is there something more fundamental to it as well?
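Here is the caricature I mean (my own toy abstraction of high-order sequence memory, not actual HTM cells): if each element’s representation depends on the full history, the same subsequence embedded in two different songs gets entirely different representations, so nothing common is available to extract.

```python
def high_order_states(sequence):
    """Caricature of high-order memory: each element is represented
    'in the context of' everything that came before it."""
    states, context = [], ()
    for note in sequence:
        context = context + (note,)
        states.append(context)        # representation = note plus its full history
    return states

song_a = ["C", "E", "G", "A", "G"]
song_b = ["D", "E", "G", "A", "G"]    # same "E G A G" subsequence, different start

states_a = high_order_states(song_a)[1:]   # representations of "E G A G" in song A
states_b = high_order_states(song_b)[1:]   # representations of "E G A G" in song B

# Even though the notes are identical, no representation is shared:
print(any(a == b for a, b in zip(states_a, states_b)))   # False
```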
Regards,
Guillaume Mirey