This is more of a thought problem than a question about why this doesn’t actually happen (or maybe it does, shrugs).
Thinking about how few neurons are active at any one time, and the reasons behind this, I’m wondering if it wouldn’t be more efficient - or even possible - to condense the processing by using different triggering mechanisms (perhaps this is what neurotransmitters do to some degree anyway). So, for instance, let’s say there’s a 3x3 image (for simplicity), with R = red, G = green, B = blue.
Couldn’t this be fed into a network where each of the 9 neurons per layer is capable of representing any one of the 3 colors? The neurons in a layer would all be horizontally connected to each other, and each synapse would be capable of sending and receiving three signals (let’s just call our pretend neurotransmitters R, G, and B respectively). So while any one neuron is in a B state, it’s horizontally reinforced by the other neurons in a B state by way of the B neurotransmitter. This keeps each horizontally active network separate from the others at any one time. Depolarization wouldn’t be necessary because each neuron is always active. And because no one neuron would ever need to represent more than one pixel, it could never interfere with the pattern of any other color. Each pattern would be completely safe from interference. Instead of turning the neuron “on”, you would only need to change the neuron’s representational state.
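To make the idea concrete, here is a rough Python sketch of the 3x3 example. Everything in it (`Channel`, `StateNeuron`, `lateral_support`, `build_layer`) is a made-up name for illustration, not anything from the NuPIC codebase, and it only models the "always active, one state per neuron" part, with horizontal reinforcement reduced to counting same-state neighbours.

```python
from enum import Enum


class Channel(Enum):
    """The three pretend neurotransmitters from the example."""
    R = "R"
    G = "G"
    B = "B"


class StateNeuron:
    """Hypothetical always-active neuron that holds exactly one state."""

    def __init__(self, channel):
        self.channel = channel

    def set_state(self, channel):
        # No on/off switch: only the representational state changes.
        self.channel = channel


def build_layer(image):
    """Build one layer of neurons, one per pixel, from rows of 'R'/'G'/'B'."""
    return [[StateNeuron(Channel(pixel)) for pixel in row] for row in image]


def lateral_support(layer, row, col):
    """Count the other neurons in the layer sharing this neuron's state."""
    me = layer[row][col]
    return sum(
        1
        for r, line in enumerate(layer)
        for c, other in enumerate(line)
        if (r, c) != (row, col) and other.channel is me.channel
    )


image = ["RGB", "BBR", "GBB"]  # arbitrary 3x3 test image
layer = build_layer(image)
print(lateral_support(layer, 1, 1))  # support for the centre (B) neuron
```

Changing a pixel is just `layer[0][0].set_state(Channel.B)`, after which the B neurons gain one more supporter and the R pattern loses one.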
I’m super new to a lot of this, so be kind - but it would seem to me, that this would - at the very least - save on material resources, no?
Implementing this digitally might require adding a list of ports and port handlers for neural communication for any one particular application, plus some sort of state management for the neuron. I have no idea what I’m even saying right now - I haven’t even looked at the codebase yet (which is, ironically, one of the main reasons I registered this account).
Well that’s exactly my question - I understand the basics of SDRs and why they’re sparse and how they work. But wouldn’t you be able to condense things by using mutually exclusive pattern types (which by themselves would fall within the parameters of SDRs) as long as each pattern can’t interfere with each other via frequency or neurotransmitter or whatever. If each type of pattern is completely independent from the others per input - could you not combine them physically? Giving the system a way to keep each pattern completely separate from the other patterns functionally but not physically would still work - right? Maybe I’m missing something.
Do you understand how spatial pooling works and how it relates to temporal memory? This is what led us to SDRs, and how they are useful in the brain. Spatial pooling groups neurons and forces them to have a common potential pool. This allows the brain to enforce sparsity on incoming sensory activations, which may not be sparse. Temporal Memory operates within the parameters established by the SP, which normalizes the input. We’ve increased the sparsity of the SP in the past, but it leads to more computation (lower performance) and diminishing returns. I think the brain settled on 2% because that is a sweet spot for computation.
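For anyone who wants to see the sparsity-enforcing part in code, here is a minimal sketch. It is not NuPIC's implementation or API: it leaves out learning, permanences and boosting, and the sizes (400 input bits, 2048 columns, 40-bit potential pools) and the function names `overlap_scores` and `k_winners` are assumptions picked for illustration. Each column scores its overlap with the input, and only the top 2% stay active.

```python
import random


def overlap_scores(input_bits, potential_pools):
    """Overlap of each column's potential pool with the active input bits."""
    active = set(input_bits)
    return [len(active & pool) for pool in potential_pools]


def k_winners(overlaps, sparsity=0.02):
    """Indices of the top-k columns by overlap (a crude global inhibition)."""
    k = max(1, round(len(overlaps) * sparsity))
    ranked = sorted(range(len(overlaps)), key=lambda i: overlaps[i], reverse=True)
    return sorted(ranked[:k])


rng = random.Random(42)
n_inputs, n_columns = 400, 2048  # assumed sizes, not taken from the thread

# Each column watches a random subset of the input space.
potential_pools = [set(rng.sample(range(n_inputs), 40)) for _ in range(n_columns)]

# A dense input: half of the input bits are on, so it is not sparse at all.
dense_input = rng.sample(range(n_inputs), 200)

overlaps = overlap_scores(dense_input, potential_pools)
active_columns = k_winners(overlaps)
print(len(active_columns), "of", n_columns, "columns active")
```

However dense the input is, the output always has the same number of active columns, which is the normalization that Temporal Memory then works within.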
From a neuroscience perspective, the brain has so many neurons that it’s hard to see what it does with them.
For example, in just one region, rats have something like 15,000 neurons for each whisker. The input encoding signals from that whisker comes from something like 400 neurons in the thalamus. Even 400 seems like a lot to encode signals from just one whisker, which are things like vibration as the hair passes over a surface, the direction it is bent, and how far it is bent.
Even if there isn’t any redundancy in the thalamus, that’s roughly 40 neurons in the cortex (15,000 / 400 ≈ 37.5) per neuron in the input to cortex. That’s a similar scale to the 2% sparsity of SDRs, or 1 in 50 neurons on. Since not all neurons in the thalamus are active at a time, that’s actually a bit strangely close.
If you spend some time with the papers section at the Numenta mothership you will see that much of the recent activity has centered around what gets coded - feature locations - and how the massive code space offered by SDRs works well to remember the huge number of things a whisker might brush against.
If this seems overkill just think of the fantastic number of things that might be encoded in a strip of Braille dots and you should get some idea of what a mouse may be able to sense with its whiskers.
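To put rough numbers on "massive code space", here is a small calculation. The SDR size used (n = 2048 bits with w = 40 active, about 2%) is an assumption chosen as a typical HTM-style configuration, not a figure from this thread, and `sdr_capacity` is just a helper name for the binomial coefficient. The Braille part assumes the standard six-dot cell.

```python
from math import comb


def sdr_capacity(n, w):
    """Number of distinct SDRs with exactly w active bits out of n."""
    return comb(n, w)


# One six-dot Braille cell: every dot is raised or flat.
braille_cell_patterns = 2 ** 6

# A short strip of 10 cells already gives 64**10 possible patterns.
braille_strip_patterns = braille_cell_patterns ** 10

# Assumed SDR parameters: 2048 bits, 40 active (roughly 2% sparsity).
n, w = 2048, 40
capacity = sdr_capacity(n, w)

print("patterns per Braille cell:", braille_cell_patterns)
print("patterns in a 10-cell strip:", braille_strip_patterns)
print("digits in the SDR capacity:", len(str(capacity)))
```

The point is only the order of magnitude: even at 2% activity the number of possible codes is far beyond anything a whisker will ever need to tell apart.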
Why it seems like overkill to me will take some explaining. The fact that it seems like overkill just means more is going on, which maybe I just haven’t learned about yet, but I like to speculate.
I don’t think primary cortex represents larger, more complex objects because none of its cells stay on much longer than the stimulus. I’m not even sure it represents objects, at least in allocentric terms. Don’t the what and where streams start after primary cortex?
From the podcast, one question is whether locations get represented in terms of the sensory surface or the object’s surface. Maybe there’s a third option. In barrel cortex, there’s a map of the whisked space (at least in L2/3), which warps to match the whisking amplitude. Exactly what that would look like in other senses is ambiguous. It could be closely linked to behavior. For example, maybe it’s the space through which the sensor just moved, or the space where the sensor can move when only the fingertip, arm, or something else is moved. Maybe it’s not exactly spatial, and instead about timing during a movement.
There are a lot of possibilities, but it’s not location on or relative to the sensor, and it’s not location on the object or in Cartesian space. It’s also probably closely linked to behavior. That seems like a good starting point for both the what stream and the where/how stream. The locations are encoded in a way dependent on behavior, which helps determine the locations it can reach and how to reach them. The locations are also not on the sensor, so there’s a coordinate transform off the sensor’s surface which could lead into a coordinate transform onto the object’s surface.
One of the tenets I go by is that cortex is cortex - it all does much the same thing everywhere. Yes, there are local differences, but on the whole the same basic processes are done everywhere in the cortex.
With this in mind you have the primary visual cortex where you can feed in basic visual stimuli and get some idea what is being done with it.
From the original Hubel and Wiesel (H&W) papers in the 1960s we have a pretty good idea that the cortex is mapping features of the outside world in some version of the sensor space, combined with extensive feature extraction. This mapping and aggressive feature extraction continues at the next map and so on up the hierarchy.
What is not always immediately obvious is that there is a counter-flowing stream of information that Numenta likes to identify as location information. This is used to help frame the sensations and on some local level - reduce noise and parse the input.
This makes the local task much more involved than just forming the space of the whisker or even the ensemble of whiskers.
Are you talking about feedback? I thought the location signal wasn’t feedback. I don’t know much about feedback and connections between regions so maybe that’s what I’m missing. Are you saying the location signal isn’t just a location signal?
I agree, but the coordinate transformations aren’t the same since the what and where streams both exist. The different inputs are probably the cause. All or most of the cortical circuitry needs to be applicable to both.
For example, it might need to convert from location on the retina to a map of the visual field which is insensitive to eye direction, and convert that to location relative to the arm. That’s a made up example, but there are egocentric coordinate systems besides location on the sensor. Conditions like hemineglect show that.
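Here is a toy version of that chain of transforms, just to show what "insensitive to eye direction" means. It is purely illustrative: positions are treated as flat 2D offsets in degrees that simply add and subtract, and the function names `retina_to_head` and `head_to_arm` are invented for the example. Nothing here is a claim about how cortex actually computes it.

```python
def retina_to_head(retinal, eye_direction):
    """Head-centred location = location on the retina + where the eye points."""
    return (retinal[0] + eye_direction[0], retinal[1] + eye_direction[1])


def head_to_arm(head_centred, arm_position):
    """Location relative to the arm = head-centred location - arm position."""
    return (head_centred[0] - arm_position[0], head_centred[1] - arm_position[1])


# The same target seen with two different eye directions lands on different
# parts of the retina, but maps to the same head-centred location.
view_a = retina_to_head(retinal=(6, 5), eye_direction=(4, 0))
view_b = retina_to_head(retinal=(13, 3), eye_direction=(-3, 2))

# Second egocentric frame: the same target relative to the arm.
reach = head_to_arm(view_a, arm_position=(2, -1))

print(view_a, view_b, reach)
```

Both intermediate frames are egocentric, but neither is location on the sensor, which is the distinction being made above.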
I’m not saying that the brain has a lot of neurons for no reason, just that I have no idea why egocentric regions need as many neurons as allocentric regions. They’re doing the same things, so I guess the mystery to me is how the same circuit does both depending on its inputs.
The brain is a tangled nest of signal paths. The WHAT and WHERE streams eventually recombine to end up in the temporal lobe to be registered as your experiential memory.
There is a counterflowing stream that matches much of this all the way back to the lowest level of sensation. Much of what I have read does not claim to know exactly what flows through this path but data from brain imaging makes it clear that these paths exist.
The DeepLeabra model camp uses this as a top-down training signal to help form the bottom-up representation. They make a very compelling case for this interpretation.
To be clear, I’m not saying they’re different. I’m just trying to produce a conclusion based on an apparent contradiction (they need to be the same but it’s unclear how the same circuit deals with all types of coordinates), even though there’s probably an obvious answer to how the circuit can do both.
I assume the circuit can only do one type of coordinate processing, meaning the same manipulations or whatnot, with differences resulting only from differences in the inputs to a region. That might be wrong if learning is a big factor. For example, if it uses inputs as anchors and figures out how it gets between those places, it could learn path integration I guess without us needing to care about the types of coordinates. I’m not being very clear, sorry about that. A lot of what I’m saying is based on gut feelings.
Both streams process coordinates, but identifying location on an object seems pretty different from identifying location relative to the body. The same circuit performs two opposite coordinate transformations. I don’t think it can convert to location on the object in primary cortex and then add back in egocentric information in egocentric regions, because that might not work for everything. Maybe I’m wrong.
The what and where pathways both process coordinates, but it’s probably not easy to implement completely different coordinate transforms with the same circuit.
That’s why I think it might process coordinates which aren’t exactly egocentric or allocentric in primary cortex. Besides location relative to the body and location relative to the object, there are other things, like location in the space the sensor is moving through. A circuit which could do something like that seems more easily applicable to both egocentric and allocentric coordinates, by which I mean it’s easier for me to see how it could do both depending on the inputs, and it could produce a coordinate system in primary cortex which the what and where streams can both build on. I don’t have more than a vague notion of what this system would be like. I guess the point is that we shouldn’t limit our thinking to egocentric and allocentric.
I have what I am sure is a crazy idea about how the transform from sensor-egocentric space to allocentric space works, but I will share it here for public amusement:
The senses are put into relative space by “decorating” the posture control system.
We all possess an elaborate body sensing system that is hardwired to combine every joint and muscle from the ground all the way to the vestibular sensing / ocular pointing system. This has been trained and tuned every day of your life as a coordinate transformation system and is your basic reference frame. This reference frame extends to all the places you have learned to put your hands and feet - your personal space. This paper points out the relationship between egocentric sensation and spatial processing. My one quibble with this paper is that the proprioception defects happened well after the subjects had formed a language and allocentric relationships.
So how do we get from “here” to relate that to “out there?”
The rest of the world is on “the other end” of the vestibular system. Learning the relationship between your internal reference system and all its permutations and the perceived “out there” is the dimensional mapping that forms your transform to the allocentric universe.
The interaction of your hands and perceived space is formed by the learned relationship between the aforementioned personal reference frame and your visually learned perception of your near working field. There is a reason that damage to the upper end of the WHERE stream impairs your manual manipulation skills; this is where your vision processing and somatosensory streams join.
Well - tell me the 100 ways I am wrong about all this.
Yeah, that was ye olde epiphany smacking me upside the head. I wasn’t thinking it through all the layers. It may take a bit of ramping up for me. I’ve been a software engineer for 15 years but I’ve only very mildly tinkered with any sort of neural net. I am learning tons though - this is extremely fascinating territory. I’m obviously in the midst of a lot of brilliant people here. That’s good. It means there’s a lot I can learn.
So - now that I get it - when you did ramp up sparsity, even though it was slow, was there any sort of benefit at all? Also, is there a glass ceiling to sparsity and layers? I mean, if you just added like 5 layers atop the already established 6 - does computational capacity begin to suffer? Would there be any benefit?
Simulating multiple layers is still research code. Our current open-source project, NuPIC, is focused on simulating what’s happening in at least one of the layers: the SP/TM combo. I think SP could be happening in other places in the brain, even outside the neocortex. It’s a very useful process. Our current research re-uses the TM concept for movement and for storing object behavior. But calling out which layers do what is still iffy.