Optimization algorithm of the brain

nunoo · December 9, 2017, 7:34pm

Assume that human beings, at their most basic, are driven by the need to reduce the level of surprise in the world, meaning prediction error. Assume also that the physical translation of that drive is a mechanism that builds a model of the world, whose convergence engine is the minimization of the energy spent on surprises; that the brain is that model; and that its plasticity is the mechanism for reaching the lowest average energy consumption over time. Of all the optimization algorithms out there, which one would best model, computationally, the dynamics of neuronal plasticity? Did nature decide to implement gradient descent? Seems “too mathematical”!

Thank you!

rhyolight · December 11, 2017, 5:39pm

I disagree with the assumption that surprises are by nature a bad thing and should be avoided.

sheiser1 · December 11, 2017, 5:57pm

If I understand your question right, I think you’ve come to the right place! Disclaimer: I’m not a neuroscience expert by any means, but I think this concept of plasticity is something that HTM does completely differently from any other neural network I’ve come across (as in, HTM actually does it).

I’m not sure how much of the HTM algorithms you’ve gone through, but I think the model of dynamics you speak of is what the Temporal Memory is doing. If you haven’t already, I’d highly recommend the HTM School videos on YouTube. There are two so far on Temporal Memory. To me, the theory behind the TM is the closest answer to your question (or at least the coolest, in my opinion).

TM Episode 1:

TM Episode 2:

If you want to get a nice, clear visual overview of HTM theory and how the algorithms work, I’d say watch all the episodes.

dimitrispp · December 11, 2017, 5:58pm

In the context of the Bayesian Brain hypothesis, “Surprises” do not have an emotional valence. There is no assumption that they are “bad” or “good”, or that they should be “avoided” or “welcomed”. Mathematically, they are just a way of formulating the loss function. One can think of this as an alternative to “Rules”: minimising Surprise = learning a Rule.
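To make “Surprise as a loss function” concrete, here is a minimal sketch. It assumes the common formalization of surprise as Shannon surprisal, the negative log of the probability the model assigned to what actually happened; the function names are hypothetical and not taken from any library mentioned in this thread.

```python
import math

def surprisal(probability):
    """Shannon surprisal (in nats) of an outcome the model assigned this probability."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log(probability)

def mean_surprisal(probabilities):
    """Average surprisal over a series of observed outcomes.

    Minimising this quantity is the same as minimising the usual
    negative log-likelihood loss of a predictive model.
    """
    return sum(surprisal(p) for p in probabilities) / len(probabilities)
```

A model that assigned probability 1 to everything it later observed would have zero loss; the less probable the model found what actually happened, the larger the loss.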

sheiser1 · December 11, 2017, 7:06pm

The ‘surprise’ of a given input is quantified in the anomaly score, which is the proportion of cells that became active at a given time step and were not expected to. Since this can be very volatile from time step to time step depending on the noise level of the data, another value (the anomaly likelihood) is used to look at the total surprise over a certain time window relative to the system at large.
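As a rough illustration of the two quantities described above, here is a minimal Python sketch. It is not NuPIC’s implementation: the names, the window sizes, and the Gaussian assumption in the likelihood are all assumptions made for the example, and the code does not care whether the units are cells or columns.

```python
import math
from collections import deque

def raw_anomaly_score(active, previously_predicted):
    """Fraction of currently active units that were not predicted beforehand."""
    active = set(active)
    if not active:
        return 0.0  # nothing is active, so nothing was unexpected
    unexpected = active - set(previously_predicted)
    return len(unexpected) / len(active)

class WindowedAnomalyLikelihood:
    """Simplified stand-in for an anomaly likelihood (assumption, not NuPIC code).

    Compares the mean of the most recent scores against the mean and spread
    of a longer history, treating the history as roughly Gaussian.
    """

    def __init__(self, history_size=100, recent_size=10):
        self.history = deque(maxlen=history_size)
        self.recent_size = recent_size

    def update(self, score):
        self.history.append(score)
        scores = list(self.history)
        if len(scores) < 2 * self.recent_size:
            return 0.5  # not enough history yet: stay neutral
        mean = sum(scores) / len(scores)
        variance = sum((s - mean) ** 2 for s in scores) / len(scores)
        std = max(math.sqrt(variance), 1e-6)  # guard against zero spread
        recent = scores[-self.recent_size:]
        z = (sum(recent) / len(recent) - mean) / std
        # Gaussian CDF: near 1.0 when recent scores sit far above the history
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

The raw score reacts to every noisy time step, while the windowed value only approaches 1 when the recent scores are unusual compared with what the system has been seeing.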

dimitrispp · December 11, 2017, 7:19pm

thanks, can you suggest a reference?

sheiser1 · December 11, 2017, 7:43pm

Certainly. Here are the things I would recommend looking at first:

HTM School on YouTube, episodes 11 and 12 (though I’d definitely watch them all for a complete understanding)
Numenta’s BAMI book, which has a lot including algorithm pseudocode with theoretical explanations and visualizations: https://numenta.com/biological-and-machine-intelligence/
Subutai Ahmad talking about anomaly detection with HTM; they did a big study (‘NAB’) comparing anomaly detection methods in streaming environments: https://www.youtube.com/watch?v=Nf2BNqrSg28
The source code for temporal memory, which I’d recommend once you have a good sense of the pseudocode from the BAMI book. To test my understanding I first read and digested the pseudocode, then read through the source code and re-wrote my own pseudocode from it: https://github.com/numenta/nupic/blob/master/src/nupic/algorithms/temporal_memory.py

Happy hunting

dimitrispp · December 11, 2017, 8:13pm

thanks

CollinsEM · December 11, 2017, 11:42pm

I’ve been working on a similar hypothesis for some time now. The way I see it, there are roughly three general states that a given (intelligent) system might be in: bored, confused, and surprised. (There are likely more states, but these should suffice for this discussion.) For the sake of a more concrete metaphor (but at the risk of creating confusion), I’ll attempt to use HTM layers as an example. A bored network layer is successfully predicting each successive input with high confidence (i.e. no ambiguous predictions, nearly all predictive cells become active). A confused layer is also predicting successfully, but is currently making many simultaneous predictions (i.e. insufficient input data to resolve sequence ambiguities, many more predictive cells than are becoming active). Finally, a surprised layer is experiencing novel or unexpected input sequences (i.e. many non-predictive cells becoming active and/or columns bursting).

The over-arching theme of this hypothesis is that our attention is naturally drawn to phenomena that we are not currently able to predict with high confidence. Obviously, we evolved to ignore the mundane aspects of our environment and to focus our attention on the things behaving erratically or otherwise causing us uncertainty. If my hypothesis is correct (and I’m still working on the experiments to test it out), then there should be some natural behaviors that arise in response to these different states. These might include: expressing boredom and/or curiosity in the form of novelty-seeking, or being able to rapidly gather additional input to efficiently resolve ambiguous or confusing data.
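The bored / confused / surprised distinction from this post can be sketched as a simple classifier over the sets of predictive and active cells. This is only an illustration: the function name and both thresholds are hypothetical, chosen for the example rather than taken from the experiments mentioned above.

```python
def classify_layer_state(active, predicted,
                         surprise_threshold=0.5, ambiguity_threshold=2.0):
    """Label a layer as 'bored', 'confused', or 'surprised'.

    active    -- cells that became active on this time step
    predicted -- cells that were in the predictive state beforehand
    Both thresholds are illustrative assumptions.
    """
    active, predicted = set(active), set(predicted)
    if not active:
        raise ValueError("no active cells to classify")

    # Surprised: a large share of the active cells were not predicted.
    unexpected_fraction = len(active - predicted) / len(active)
    if unexpected_fraction > surprise_threshold:
        return "surprised"

    # Confused: predictions were right, but far more cells were
    # predictive than actually became active (ambiguous sequence).
    if len(predicted) > ambiguity_threshold * len(active):
        return "confused"

    # Bored: nearly everything that was predicted is what became active.
    return "bored"
```

For example, `classify_layer_state({1, 2}, {1, 2, 3, 4, 5, 6})` returns `"confused"`: both active cells were predicted, but three times as many cells were predictive as became active.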

Bitking · December 12, 2017, 5:10pm

Perhaps you may want to consider some other states:
Reproduction, hunger, thirst, tired (looking for a safe place to shelter), exploration of the environment (play), grooming, fighting, fleeing.
Within these states are sub-activities.
Attention is a part of most of these. (identification of goal)
So is goal seeking. (reduction of distance between current state and current goal)

Ask yourself: WWLD (What would a lizard do) to identify the basic states.

jakebruce · December 12, 2017, 9:14pm

Think of it this way. You want a predictive model that is minimally surprised in the maximal number of situations. That’s the quality measure of a predictive model. However, the cool thing about a learning machine is that you minimize surprise in the long term by maximizing surprise in the short term!

nunoo · December 12, 2017, 10:40pm

Thank you all !!

If HTM stands on the grounds that the neocortex is a prediction mechanism, and if by definition, prediction is trying to minimize surprises (“error”), my assumption states that it must be doing so for a purpose. It’s possible and likely that the emotional equipment and its wiring have evolved to enforce the idea that chances for survival would be higher if there was some sort of behavioral drive for controlled exposure to surprise - call it curiosity, avoidance of boredom, etc, but my point is not behavioral, it’s about the assumption that (1) the anatomical reality of the neocortex is driven by minimization of energy consumption while interacting with the world and (2) that the resulting grid spends less energy on the expected inputs than on the unexpected ones.

(2) Most of your life is spent unsurprised, otherwise there would be no point in evolving a neocortex. This one depends on surprise to evolve, but not as much as your survival depends on it to help you NOT be surprised. It therefore makes sense to expect that transforming the unknown into the known brings down the energy bill of the overall grid, otherwise you would be consuming nutrients episodically (surprise), only to consume even more most of the time (no surprise). Concretely, this should mean that preparing and firing a polarized soma is cheaper than firing an unpolarized one. But there could be local compromises made on behalf of the optimum for the whole neocortex that could make this false; see below.

(1) The brain encodes the experience in real time, in the structures that receive the input. It’s as optimized for immediate reality as it can possibly be at any point in time, but it’s possible that, if there is an underlying long-term global optimization process taking place, its convergence is more erratic, as the locally induced anatomical changes propagate to affect the form of a hypothetical global energy cost function. If, over time, minimizing the consumption of energy - a crucially scarce biological resource - is driving the composite modifications of the neocortex, the question would therefore be about the computational model of that evolution. But there would have to be some medium for this global state, and not only do I have no idea whether this even exists, but I also have serious doubts about the quality of life this would entail. Rationality is driven by causality, not energy consumption.

rhyolight · December 13, 2017, 1:23am

2 posts were split to a new topic: Prediction and Representation

Bitking · December 13, 2017, 7:40am

I think it is only fair to point out that the error signal from the mismatch between prediction and ground truth could modulate the learning rate. “Surprise” could be a useful name for the error signal. This relationship was mentioned in the Crick “spotlight of attention” paper.
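As a toy illustration of an error signal modulating the learning rate, here is a minimal sketch of a delta-rule update whose step size grows with the size of the prediction error. The function name, the linear scaling, and every parameter value are assumptions made for the example; it is not a model of any specific brain circuit or of HTM.

```python
def surprise_modulated_update(estimate, observation,
                              base_rate=0.1, gain=1.0, max_rate=1.0):
    """Move an estimate toward an observation, learning faster when surprised.

    Returns the new estimate and the learning rate that was used.
    """
    error = observation - estimate  # prediction error ("surprise")
    # A larger absolute error gives a larger learning rate, capped at max_rate.
    rate = min(max_rate, base_rate * (1.0 + gain * abs(error)))
    return estimate + rate * error, rate
```

With these defaults, a perfectly predicted input leaves the estimate unchanged at the base rate of 0.1, while an error of 1.0 doubles the rate to 0.2.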

It’s a relatively well-known fact that the outputs from the amygdala modulate learning rates. Adding an “emotional” flavor to the sensed event is a clever way to code in “judgment” and “values.” That would not be surprise but more of a “scold” or a “kiss.” (Yes - your caregivers’ words shape your learning.)

Learning your environment and your physical agency in the world demands “exploration.” “Play” is a reasonable behavior that cubs of many species engage in. Perhaps “playfulness” will be the expected behavior of social robots.

nunoo · December 13, 2017, 11:48am

Mark, do you know if there’s any scientific evidence that would support a general model of emotion as a modulation of surprise, as an internal knob to amplify resistance to surprise (lower the energy budget by anatomical synapse reinforcement) or to inject surprise (explicit connectivity degradation => increase the energy budget => trigger behavioral change/exploration) in the neocortex?

Could rational behavior in the end be just the continuous, incremental solving of a thermodynamic equation in the neocortex?

Bitking · December 13, 2017, 2:18pm

Contagious surprise - see section 5.

Since this modulation is in response to the surprise of others, it implies that the amygdala is able to influence the response of the host. Understanding the source of this surprise in the PFC, in turn, modulates the activity in the amygdala.

Bitking · December 13, 2017, 2:27pm

Note the excitatory effects of arousal which modulate surprise.
A stick gets a very different response compared to a snake.

nunoo · December 14, 2017, 2:05pm

Thank you !
