Optimization algorithm of the brain


Assume that human beings, at their most basic, are driven by the need to reduce surprise in the world, meaning prediction error; that the physical translation of this drive is a mechanism that builds a model of the world whose convergence engine is the minimization of the energy spent on surprises; that the brain is that model; and that its plasticity is the mechanism for reaching the lowest average energy consumption over time. Of all the optimization algorithms out there, which one would best model, computationally, the dynamics of neuronal plasticity? Did nature decide to implement gradient descent? It seems "too mathematical"!

Thank you!

Free Energy Principle

I disagree with the assumption that surprises are by nature a bad thing and should be avoided.


If I understand your question right, I think you've come to the right place! Disclaimer: I'm not a neuroscience expert by any means, but plasticity is something that HTM does completely differently from any other neural network I've come across (as in, HTM actually does it).

I'm not sure how many of the HTM algorithms you've gone through, but I think the model of dynamics you speak of is what the Temporal Memory is doing. If you haven't already, I'd highly recommend the HTM School videos on YouTube. There are two so far on Temporal Memory; to me, the theory behind the TM is the closest answer to your question (or at least the coolest, in my opinion).

TM Episode 1:

TM Episode 2:

If you want a nice, clear visual overview of HTM theory and how the algorithms work, I'd say watch all the episodes.


In the context of the Bayesian Brain hypothesis, "surprises" do not have an emotional valence. There is no assumption that they are "bad" or "good", or that they should be "avoided" or "welcomed". Mathematically, they are just a way of formulating the loss function. One can think of this as an alternative to "rules": minimising surprise = learning a rule.
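To make that concrete: in information-theoretic terms, the "surprise" of an observation is just its surprisal, the negative log of its probability under your model. Here's a minimal sketch (the function name is mine, not from any library):

```python
import math

def surprisal(p):
    """Shannon surprisal of an event with probability p, in nats.
    Certain events carry zero surprisal; rare events carry a lot."""
    return -math.log(p)

# A fully expected input contributes nothing to the loss...
print(surprisal(1.0))
# ...while an improbable one contributes a large penalty.
print(surprisal(0.01))
```

Minimising the average surprisal of your inputs is exactly what "learning a rule" means here: the model that assigns high probability to what actually happens pays the lowest loss.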


The 'surprise' of a given input is quantified in the anomaly score, which is the proportion of cells that became active at a given time step without having been predicted. Since this can be very volatile from one time step to the next, depending on the noise level of the data, another value (the anomaly likelihood) looks at the total surprise over a time window relative to the system at large.
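As a rough sketch of how that raw score could be computed (column sets here are illustrative, and this is a simplification of what NuPIC actually does):

```python
def anomaly_score(active_columns, predicted_columns):
    """Fraction of currently active columns that were NOT predicted
    at the previous time step. 0.0 = fully expected, 1.0 = fully novel."""
    active = set(active_columns)
    predicted = set(predicted_columns)
    if not active:
        return 0.0
    return len(active - predicted) / len(active)

# A perfectly predicted input produces no surprise...
print(anomaly_score({1, 2, 3}, {1, 2, 3}))
# ...while an entirely novel input produces maximal surprise.
print(anomaly_score({7, 8, 9}, {1, 2, 3}))
```

The anomaly likelihood then smooths this per-step score over a window, so a single noisy step doesn't register as a meaningful anomaly.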


thanks, can you suggest a reference?


Certainly. Here are the things I would recommend looking at first:

  1. HTM School on YouTube, episodes 11 and 12 (though I'd definitely watch them all for a complete understanding)

  2. Numenta’s BAMI book, which has a lot including algorithm pseudocode with theoretical explanations and visualizations: https://numenta.com/biological-and-machine-intelligence/

  3. Subutai Ahmad talking about anomaly detection with HTM, they did a big study (‘NAB’) comparing anomaly detection methods in streaming environments: https://www.youtube.com/watch?v=Nf2BNqrSg28

  4. The source code for temporal memory, which I’d recommend once you have a good sense of the pseudocode from the BAMI book. To test my understanding I first read and digested the pseudocode, then read through the source code and re-wrote my own pseudocode from it: https://github.com/numenta/nupic/blob/master/src/nupic/algorithms/temporal_memory.py

Happy hunting :slightly_smiling_face:




I've been working on a similar hypothesis for some time now. The way I see it, there are roughly three general states that a given (intelligent) system might be in: bored, confused, and surprised. (There are likely more states, but these should suffice for this discussion.) For the sake of a more concrete metaphor (but at the risk of creating confusion) I'll attempt to use HTM layers as an example. A bored network layer is successfully predicting each successive input with high confidence (i.e. no ambiguous predictions; nearly all predictive cells become active). A confused layer is also predicting successfully, but is currently making many simultaneous predictions (i.e. insufficient input data to resolve sequence ambiguities; many more predictive cells than are becoming active). Finally, a surprised layer is experiencing novel or unexpected input sequences (i.e. many non-predictive cells becoming active and/or columns bursting).
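The three states above could be read straight off the layer's cell counts. Here's a toy classifier along those lines; the thresholds are made up for illustration and aren't taken from any HTM implementation:

```python
def layer_state(n_active, n_predictive, n_unpredicted_active,
                ambiguity_ratio=3.0, surprise_ratio=0.5):
    """Crude heuristic for the bored/confused/surprised taxonomy.
    Thresholds are illustrative only."""
    if n_active and n_unpredicted_active / n_active > surprise_ratio:
        return "surprised"   # many active cells were not predicted
    if n_active and n_predictive / n_active > ambiguity_ratio:
        return "confused"    # far more predictions than confirmations
    return "bored"           # confident, mostly correct predictions

print(layer_state(n_active=40, n_predictive=42, n_unpredicted_active=2))
print(layer_state(n_active=40, n_predictive=200, n_unpredicted_active=2))
print(layer_state(n_active=40, n_predictive=42, n_unpredicted_active=30))
```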

The over-arching theme of this hypothesis is that our attention is naturally drawn to phenomena that we are not currently able to predict with high confidence. Obviously, we evolved to ignore the mundane aspects of our environment and to focus our attention on the things behaving erratically or otherwise causing us uncertainty. If my hypothesis is correct (and I’m still working on the experiments to test it out), then there should be some natural behaviors that arise in response to these different states. These might include: expressing boredom and/or curiosity in the form of novelty-seeking, or being able to rapidly gather additional input to efficiently resolve ambiguous or confusing data.


Perhaps you may want to consider some other states:
Reproduction, hunger, thirst, tiredness (looking for a safe place to shelter), exploration of the environment (play), grooming, fighting, fleeing.
Within these states are sub-activities.
Attention is a part of most of these (identification of a goal).
So is goal seeking (reduction of the distance between the current state and the current goal).

Ask yourself: WWLD (What would a lizard do) to identify the basic states.


Think of it this way. You want a predictive model that is minimally surprised in the maximal number of situations. That’s the quality measure of a predictive model. However, the cool thing about a learning machine is that you minimize surprise in the long term by maximizing surprise in the short term!
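That tradeoff is easy to see in a toy learner. Below, a simple delta-rule estimator (parameters chosen arbitrarily for illustration) tracks a hidden probability: early on, every sample is "surprising", but eating that short-term surprise is precisely what drives the long-term prediction error down:

```python
import random

random.seed(0)
p_true = 0.8          # hidden regularity the agent must learn
estimate = 0.5        # initial, uninformed prediction
lr = 0.1              # learning rate

early_errors, late_errors = [], []
for t in range(1000):
    outcome = 1.0 if random.random() < p_true else 0.0
    surprise = abs(outcome - estimate)       # per-step prediction error
    estimate += lr * (outcome - estimate)    # learn from the surprise
    (early_errors if t < 100 else late_errors).append(surprise)

# Average surprise is high while exploring, low once the model converges.
print(sum(early_errors) / len(early_errors))
print(sum(late_errors) / len(late_errors))
```

The quality measure described above is exactly the second number: a good predictive model is one whose long-run average surprise is low, even if getting there meant being wrong a lot at the start.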


Thank you all !!

If HTM stands on the premise that the neocortex is a prediction mechanism, and if, by definition, prediction tries to minimize surprises ("error"), my assumption is that it must be doing so for a purpose. It's possible, even likely, that the emotional equipment and its wiring evolved to enforce the idea that chances of survival are higher when there is some behavioral drive for controlled exposure to surprise (call it curiosity, avoidance of boredom, etc.), but my point is not behavioral. It's about the assumption that (1) the anatomical reality of the neocortex is driven by the minimization of energy consumption while interacting with the world, and (2) the resulting grid spends less energy on expected inputs than on unexpected ones.

(2) Most of your life is spent unsurprised; otherwise there would be no point in evolving a neocortex. The neocortex depends on surprise to evolve, but not as much as your survival depends on it to keep you NOT surprised. It therefore makes sense to expect that transforming the unknown into the known brings down the energy bill of the overall grid; otherwise you would be consuming nutrients episodically (surprise), only to consume even more most of the time (no surprise). Concretely, this should mean that preparing and firing a polarized soma is cheaper than firing an unpolarized one. But local compromises made on behalf of the optimum for the whole neocortex could make this false; see below.

(1) The brain encodes experience in real time, in the structures that receive the input. It's as optimized to immediate reality as it can possibly be at any point in time, but if there is an underlying long-term global optimization process taking place, its convergence may be more erratic, as the locally induced anatomical changes propagate and affect the shape of a hypothetical global energy cost function. If, over time, minimizing the consumption of energy, a crucially scarce biological resource, is what drives the composite modifications of the neocortex, the question becomes: what is the computational model of that evolution? But there would have to be some medium for this global state, and not only do I have no idea whether it even exists, I also have serious doubts about the quality of life it would entail :slight_smile: Rationality is driven by causality, not energy consumption.


2 posts were split to a new topic: Prediction and Representation


I think it is only fair to point out that the error signal from the mismatch between prediction and ground truth could modulate the learning rate. "Surprise" could be a useful name for that error signal. This relationship was mentioned in Crick's "spotlight of attention" paper.

It's a relatively well-known fact that the outputs from the amygdala modulate learning rates. Adding an "emotional" flavor to the sensed event is a clever way to code in "judgment" and "values." That would not be surprise, but more of a "scold" or a "kiss." (Yes, your caregivers' words shape your learning.)
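That modulation idea can be sketched as a Hebbian-style update whose effective learning rate scales with the size of the error signal. Everything here is illustrative: the function names, the gain factor standing in for amygdala output, and the update form are my own, not from any published model:

```python
def modulated_lr(base_lr, error, gain=2.0):
    """Effective learning rate grows with the magnitude of the
    'surprise' (error) signal; gain stands in for emotional modulation."""
    return base_lr * (1.0 + gain * abs(error))

def update_weight(w, pre, error, base_lr=0.01):
    """One weight update: pre-synaptic activity times error,
    scaled by the surprise-modulated learning rate."""
    return w + modulated_lr(base_lr, error) * pre * error

# A big surprise drives a proportionally bigger weight change.
print(update_weight(0.5, 1.0, 0.1))   # small prediction error
print(update_weight(0.5, 1.0, 0.9))   # large prediction error
```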

Learning your environment and your physical agency in the world demands "exploration." "Play" is a reasonable behavior that the cubs of many species engage in. Perhaps "playfulness" will be the expected behavior of social robots.


Mark, do you know of any scientific evidence that would support a general model of emotion as a modulation of surprise: an internal knob either to amplify resistance to surprise (lowering the energy budget through anatomical synapse reinforcement) or to inject surprise (explicit connectivity degradation => higher energy budget => triggering behavioral change/exploration) in the neocortex?

Could rational behavior, in the end, be just the continuous, incremental solving of a thermodynamic equation in the neocortex?


Contagious surprise - see section 5.

Since this modulation is in response to the surprise of others, it implies that the amygdala is able to influence the response of the host. Understanding the source of this surprise in the PFC, in turn, modulates activity in the amygdala.


Note the excitatory effects of arousal which modulate surprise.
A stick gets a very different response compared to a snake.


Thank you !