Continuous Learning in Nervous Systems

Live soon!

By the way, I have been away from the forum for the past week, but I will catch up on all conversations this week. Thanks for your patience.

5 Likes

@mrcslws has, on occasion, referred to a basis representation for specific instantaneous neural states. I’ve been working with linear combinations of basis functions as mathematical approximations to spatially and temporally varying phenomena for several years. Although I don’t think I’ve heard Marcus expand on the details of how he thinks basis functions apply to neural network states, I believe I have a pretty good idea of the sense in which he has been using these terms when speaking about SDRs. So, the following is my interpretation, and Marcus can chime in if he has a different take on the matter.

So, basis functions can be thought of as basic building blocks. For example, sine waves with different frequencies can serve as basis functions for a Fourier decomposition of a complex waveform. Another example: a linear function in 3D can be represented as a linear combination of four basis functions:

b0(x,y,z) = 1
b1(x,y,z) = x
b2(x,y,z) = y
b3(x,y,z) = z

f(x,y,z) = a0 * b0(x,y,z) + a1 * b1(x,y,z) + a2 * b2(x,y,z) + a3 * b3(x,y,z)

Higher-order representations can be obtained by adding more basis functions (e.g. basis polynomials). More generally, I think of basis functions as some characteristic spatial or temporal distribution of a specific field value.
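To make the linear example above concrete, here is a small sketch (my own illustration; the coefficient values and sample counts are arbitrary) that recovers the coefficients a0..a3 of f(x,y,z) by least-squares projection onto the basis {1, x, y, z}:

```python
import numpy as np

rng = np.random.default_rng(0)
true_coeffs = np.array([2.0, -1.0, 0.5, 3.0])  # a0..a3, chosen arbitrarily

# Sample f(x, y, z) = a0 + a1*x + a2*y + a3*z at random points in 3D.
points = rng.uniform(-1.0, 1.0, size=(100, 3))
f_values = true_coeffs[0] + points @ true_coeffs[1:]

# Evaluate the basis functions at each sample point: columns are b0..b3.
basis = np.column_stack([np.ones(len(points)), points])

# Solve for the coefficients in the least-squares sense.
fitted, *_ = np.linalg.lstsq(basis, f_values, rcond=None)
print(np.round(fitted, 6))  # ≈ [ 2.  -1.   0.5  3. ]
```

Because f is exactly representable in this basis, the fit recovers the coefficients up to floating-point error; with noisy samples the same projection gives the best approximation in the basis.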

Okay. So that’s how I typically use basis functions to approximate the behavior of real-world continuous field variables. To bring this back to neural network representations, imagine that you have a spatially varying sensory input that is being discretized by the individual responses of nerve cells (e.g. in the retina, or in a fingertip). These responses are spatially correlated and persistent, such that whenever the sensor returns to a specific position, the response is likely to be nearly identical. Furthermore, let’s assume that the pattern of responses across all observed inputs is not random, but heavily dependent upon the structure and behavior of the world being sensed. Thus, it is highly likely that the same kinds of input patterns are going to be sensed quite frequently, but perhaps in different combinations (i.e. this pattern is more intense than that one; this pattern is translated/rotated with respect to that one, etc.).

If these input patterns arrive in the brain as a collection of neural spikes that are correlated with the spatial variation of the persistent input stimulus, then the synapses on the dendrites picking up on these spikes will begin to learn those patterns. The connectivity of each dendrite becomes a sort of basis function/filter by acting as a coincidence detector. Thus, the SDR that results from the post-synaptic activations can be thought of as a projection of the input onto the basis space represented by the dendritic synapses. This is a very efficient way of storing semantically similar input: simply learn which spatial patterns occur together most often, and trigger SDR bits in proportion to the strength of those patterns in the input.
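A toy sketch of that projection idea (my own simplification, not an HTM implementation; all sizes are made up): each row of `synapses` stands in for one dendrite’s learned connectivity, the overlap with a binary input acts as coincidence detection, and a k-winners-take-all step produces the sparse output:

```python
import numpy as np

rng = np.random.default_rng(42)
n_inputs, n_dendrites, sdr_width = 100, 50, 5

# Random binary connectivity: 1 where a synapse is connected.
synapses = (rng.random((n_dendrites, n_inputs)) < 0.2).astype(int)

# A binary input pattern (e.g. discretized sensory responses).
input_bits = (rng.random(n_inputs) < 0.1).astype(int)

# "Projection" onto the dendritic basis: overlap = coincidence count.
overlaps = synapses @ input_bits

# k-winners-take-all: only the best-matching dendrites become active bits.
winners = np.argsort(overlaps)[-sdr_width:]
sdr = np.zeros(n_dendrites, dtype=int)
sdr[winners] = 1
print(sdr.sum())  # 5 active bits out of 50
```

Similar inputs produce similar overlap scores, so they activate overlapping winner sets, which is exactly the semantic-similarity property described above.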

My thinking in this area has grown rapidly in the past year due to discussions here in the forum as well as my reading into the theory and application of sparse and redundant representations in signal and image processing applications. It’s a fascinating topic that I think can be applied in a number of interesting ways.

[edit] @rhyolight: Feel free to move this to a new topic if you feel it needs its own thread. [/edit]

2 Likes

Thanks Eric for those ideas and the resources you pointed out. The projects from Perrinet’s lab look interesting!

1 Like

There is another discourse group focusing on continual learning:
https://continualai.discourse.group/

Off-topic: I was steered to this interesting paper on Deep Random Neural Networks:
https://arxiv.org/abs/2002.12287
There are activation functions that don’t work with back-propagation or evolution; however, random construction followed by pruning should allow you to use them. I found that idea very interesting. I might try it with hill climbing, random projections, and some of those difficult activation functions.

1 Like

Yes, we’re both using the word “basis” in the same way. I like to refer to two examples:

  1. The HTM spatial pooler learns a basis for representing the sensory input. It has a lot of overlap with the process you describe.
  2. The result in Yamins et al.’s 2014 paper “Performance-optimized hierarchical models predict neural responses in higher visual cortex”, which gave impressive evidence that higher visual cortex is tuned to encode sensory input in a basis that enables novel objects to be linearly classified.

I talked a lot about that second paper a while back, first here: https://youtu.be/-Z7OhH2pULQ then some followup here: https://youtu.be/HLNWiq-nfrs?t=3974 . They train the network on one set of classes. Then they freeze the network and feed it a totally different set of classes and train a linear classifier on the network’s output. If the network has been well-trained on the statistics of the world, then it is often possible for a linear classifier to learn to classify these novel objects based on the neural network output (without the neural network needing any retraining). They showed that if the network had this property, it could also be used to predict higher visual neural activity. So the top layer of this network, and by implication the higher visual areas, have learned to represent sensory input in a basis that enables linear classification of novel objects. This is a cool idea.
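The protocol described above can be sketched in a few lines (a deliberately simplified stand-in: the actual models in that paper were deep convolutional networks trained on image categories, whereas here a frozen random nonlinear projection plays the role of the pretrained network, and the "novel classes" are synthetic Gaussian clusters):

```python
import numpy as np

rng = np.random.default_rng(1)

def frozen_network(x, W):
    # A fixed (frozen) nonlinear feature map standing in for a network
    # pretrained on a different set of classes. It is never retrained.
    return np.maximum(0.0, x @ W)  # ReLU features

W = rng.normal(size=(20, 200))  # frozen weights

# "Novel" two-class data the network was never trained on.
n = 200
labels = rng.integers(0, 2, size=n)
x = rng.normal(size=(n, 20)) + labels[:, None] * 1.5

features = frozen_network(x, W)

# Train ONLY a linear readout on the frozen features (least squares).
readout, *_ = np.linalg.lstsq(features, 2.0 * labels - 1.0, rcond=None)
predictions = (features @ readout > 0).astype(int)
accuracy = (predictions == labels).mean()
print(accuracy)  # high if the feature basis supports linear separation
```

The point is structural: only the linear readout sees the novel classes. If the frozen features form a good basis, a plain linear classifier suffices.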

1 Like

Wow @CollinsEM, this is a great explanation.

Do you think it’s possible that each biological minicolumn can get tuned to one of those bases?

And by the same reasoning, do you think there could be a fundamental set of bases that can encode every form of input (sensory or abstract) our brain needs to process? A bit like a fundamental universal code?

It’s certainly possible, but I would be concerned that such a system might be too brittle for production use. The complementary idea to sparse representation is redundancy. For computational and storage efficiency, one would want the basis functions to be as orthogonal as possible; that would permit greater information density per degree of freedom. However, for robustness, and possibly for generalization purposes, you would probably want a large number of basis functions that are semantically similar, but still unique. (The orientation selectivity across minicolumns in V1 comes to mind.) Such redundancy would allow the system to function in the presence of internal noise or damage, and could also permit greater nuance in the generated SDRs.
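One nice numerical fact behind the orthogonality point: random sparse binary vectors are already nearly orthogonal, so many of them can coexist with little interference, and redundancy can then be added deliberately. A quick check (dimensions borrowed from typical HTM discussions; the vector count is arbitrary):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(7)
n, w, count = 2048, 40, 50  # vector length, active bits, number of vectors

# Generate random sparse binary vectors with exactly w active bits each.
vectors = np.zeros((count, n), dtype=int)
for v in vectors:
    v[rng.choice(n, size=w, replace=False)] = 1

# Expected overlap of two random vectors is w*w/n = 40*40/2048 ≈ 0.78 bits,
# i.e. nearly orthogonal despite each having 40 active bits.
overlaps = [int(a @ b) for a, b in combinations(vectors, 2)]
print(max(overlaps), sum(overlaps) / len(overlaps))
```

Semantically similar basis functions, by contrast, would be represented as vectors with deliberately high overlap, which is where the redundancy/robustness trade-off comes in.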

Again, this is theoretically possible, but it’s hard to say if such a system would be practical. As we have seen, the representational capacity of SDRs is immense. However, these SDRs are generated through specific combinations of activated network connections. While the number and complexity of these connections could, in theory, increase to accommodate all possible representations, in practice we are dealing with finite computational resources. Also, these basis functions are not known a priori; they are learned (in the Hebbian sense). Therefore, the system would have to either be exposed to all possible sensory and abstract states, or have some internal capacity for synthesizing such states. I think the former would be impractical, and the latter improbable (though not necessarily impossible).
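The "immense capacity" claim is easy to make concrete: the number of distinct SDRs with n bits and w active bits is the binomial coefficient C(n, w). Using the dimensions that often come up in HTM discussions (n = 2048, w = 40):

```python
import math

n, w = 2048, 40
capacity = math.comb(n, w)

# The count dwarfs the estimated number of atoms in the observable
# universe (~10^80), which is why exhaustive exposure to all possible
# states is impractical.
print(capacity > 10**80)  # True
```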

1 Like

Well, @jhawkins at some point suggested a cluster of 400 minicolumns, and I think he pictured a sea urchin-like object where each needle is the orientation of one minicolumn. I found that concept very intriguing.

If the minimum set of basis functions is much lower than 400, that would allow for overlap.

But could it be that this fundamental basis set is intrinsic to nature and physics, and that the evolved structure of minicolumns is able to learn them, no matter what input they are exposed to?

I feel I’m freewheeling too much here. Do you have a good introduction to this topic? I found tutorials on signal processing, but they were very specific to a domain (mostly visual). I’d love to find a clue as to whether the same set of fundamental basis functions could be applied to visual, sonar, and different types of abstract representations.