What is the Cerebellum doing?

It is clear, because the cart-pole cannot be successfully controlled by classical controllers.
I am also interested in how you implement the negative controllers based on the current HTM.

@dmac, could you please explain it in more detail, or share your code with us?

Thanks

@thanh-binh.to I don’t know what you mean by “classical controller”, but a PID controller with the right parameters will easily balance the cartpole. At least that’s what hoverboards and other two-wheel balancing bots use.
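
As a rough illustration of the PID suggestion, here is a minimal discrete-time PID controller in Python. The class, the gains and the 0.02 s timestep are hypothetical. The error is assumed to be the pole angle from upright and the output is read as a force on the cart. Whether that force needs a sign flip depends on the environment's conventions, and nothing here shows that these particular gains balance any specific cart-pole.

```python
class PID:
    """Minimal discrete-time PID controller (illustrative sketch)."""

    def __init__(self, kp: float, ki: float, kd: float, dt: float) -> None:
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None  # no derivative term on the very first step

    def step(self, error: float) -> float:
        # Accumulate the integral term with a simple rectangle rule.
        self.integral += error * self.dt
        # Backward-difference derivative; zero on the first call.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Hypothetical gains; error = pole angle (rad) measured from upright.
pid = PID(kp=10.0, ki=0.1, kd=1.0, dt=0.02)
force = pid.step(0.05)
```

The derivative term is what damps growing oscillations; the integral term removes a steady lean.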

It is not clear from @dmac’s video whether the cartpole controller is applying a learning algorithm to update its coefficients over the 3 episodes, or whether it is showcasing the same set of already trained (or hand-tuned?) coefficients.

In all three, the oscillations seem to grow slowly (and over longer episodes the pole would fall); a PID controller should be able to damp them.

I use the HTM to predict one timestep into the future. It predicts the sensory input that it will see, given the current motor action.

The way this works is:
First, I ask the simple controller what it wants to do to keep the pole balanced.
Then I ask the HTM predictor what it thinks is going to happen as a result of the controller’s action.
Next, I tell the controller “here is what’s going to happen as a result of what you just did” and give it a chance to correct any mistakes in its actions.
Finally, I combine the original and the revised actions, send the result to the motor, and wait until the next timestep.
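
The four steps above could be sketched roughly as follows. This is not @dmac's code: `simple_controller`, its constant gains, `toy_predictor` (a crude stand-in for the learned HTM predictor) and the plain average used to combine the two actions are all hypothetical placeholders, chosen only to show the order of operations.

```python
from typing import Callable, Sequence

# (cart position, cart velocity, pole angle, pole angular velocity)
State = Sequence[float]


def simple_controller(state: State) -> float:
    """Hypothetical hard-coded controller: linear feedback with fixed constants."""
    gains = (1.0, 1.0, 20.0, 3.0)  # placeholder coefficients, not tuned
    return sum(g * s for g, s in zip(gains, state))


def toy_predictor(state: State, action: float) -> State:
    """Hypothetical stand-in for the HTM predictor: one Euler step of a made-up model."""
    x, x_dot, theta, theta_dot = state
    dt = 0.02
    return (x + dt * x_dot, x_dot + dt * action,
            theta + dt * theta_dot, theta_dot - dt * action)


def control_step(state: State, predict: Callable[[State, float], State]) -> float:
    # 1. Ask the simple controller what it wants to do now.
    action = simple_controller(state)
    # 2. Ask the predictor what the next sensory input will be, given that action.
    predicted_next = predict(state, action)
    # 3. Show the controller the predicted outcome so it can correct itself.
    revised_action = simple_controller(predicted_next)
    # 4. Combine original and revised actions (plain average: an assumption).
    return 0.5 * (action + revised_action)
```

In this arrangement only the predictor would learn; the controller's coefficients stay constant.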

@cezar_t Yes, something like PID. When I was a PhD student 30 years ago, after tuning we could get it to work for a limited number of episodes. Later we tested it with an ANN and got better results.

For HTM-based RL I use the idea from sunguralikaan (HTM Forum profile) and only get a maximum of 49 episodes (best run).

Thanks for your feedback, @dmac!
For how many episodes can you balance the cart-pole?

The coefficients for the closed loop controller are hardcoded constants. Only the HTM-based predictor learns. The video shows the first three episodes, demonstrating zero-shot online learning.

Sounds interesting. Have you shared this CartPole solution anywhere, like GitHub?

You can get his software (in Python) here and adapt it for Cart-Pole.

Thanks, that’s quite intimidating.

It seems to attempt to solve environments in “visual” mode by learning (and simulating) feedback via mouse interaction, which is not what simple Gym tasks (like CartPole, MountainCar, LunarLander, etc.) are meant for.

That makes it a very complicated structure for anyone trying to approach simple RL and HTM problems this way.

Exactly, it is not very helpful. Most software/ML people did not study neurology, so they don't know what those cryptic letter-digit codes mean.

The forward model hypothesis was discovered in the context of process-control engineering by Otto Smith in 1957, and was applied to the cerebellum before I was born.

Overcoming process deadtime with a Smith Predictor

A controller equipped with an accurate process model can ignore deadtime. Deadtime generally occurs when material is transported from the actuator site to the sensor measurement location. Until the material reaches the sensor, the sensor cannot measure any changes effected by the actuator.

By Vance VanDoren, PhD, PE February 17, 2015

For the purposes of feedback control, deadtime is the delay between the application of a control effort and its first effect on the process variable. During that interval, the process does not respond to the controller’s activity at all, and any attempt to manipulate the process variable before the deadtime has elapsed inevitably fails.

[…]

Read More At: https://www.controleng.com/articles/overcoming-process-deadtime-with-a-smith-predictor/
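
To make the Smith predictor idea concrete, here is a small discrete-time simulation sketch. The first-order plant, its coefficients, the PI gains and the deadtime are assumptions for illustration, not values from the article. The controller acts on the output of a delay-free internal model, corrected by the difference between the real (delayed) measurement and a delayed copy of the model output. With a perfect model that correction is always zero, so the loop behaves as if the deadtime were not there.

```python
from collections import deque


def simulate_smith(steps, deadtime, a=0.9, b=0.1, model_a=0.9, model_b=0.1,
                   kp=2.0, ki=0.2, setpoint=1.0):
    """Plant y[k+1] = a*y[k] + b*u[k - deadtime] under PI control with a Smith predictor.

    (model_a, model_b) are the controller's internal process model; deadtime >= 1.
    Returns the measured outputs and the model-mismatch corrections at each step.
    """
    assert deadtime >= 1
    y = 0.0          # real process output, as seen by the sensor
    y_model = 0.0    # output of the delay-free internal model
    integral = 0.0
    u_line = deque([0.0] * deadtime, maxlen=deadtime)      # transport delay in the process
    model_line = deque([0.0] * deadtime, maxlen=deadtime)  # delayed copy of model output
    outputs, corrections = [], []
    for _ in range(steps):
        # Real measurement minus delayed model output: zero whenever the model is exact.
        correction = y - model_line[0]
        # The PI controller acts on the undelayed model output plus that correction.
        error = setpoint - (y_model + correction)
        integral += error
        u = kp * error + ki * integral
        # The real process only feels the effort applied `deadtime` steps ago.
        u_delayed = u_line[0]
        u_line.append(u)
        y = a * y + b * u_delayed
        # The internal model responds immediately; keep a copy for the later comparison.
        model_line.append(y_model)
        y_model = model_a * y_model + model_b * u
        outputs.append(y)
        corrections.append(correction)
    return outputs, corrections
```

Passing a mismatched model (for example a different `model_b`) makes the correction term nonzero, which is how the scheme compensates for model error.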




Is the cerebellum a smith predictor?

Miall, Weir, Wolpert, Stein (1993)
https://doi.org/10.1080/00222895.1993.9942050
Free PDF: https://wolpertlab.neuroscience.columbia.edu/sites/default/files/content/papers/MiaWeiWol93.pdf

Abstract

The motor system may use internal predictive models of the motor apparatus to achieve better control than would be possible by negative feedback. Several theories have proposed that the cerebellum may form these predictive representations. In this article, we review these theories and try to unify them by reference to an engineering control model known as a Smith Predictor. We suggest that the cerebellum forms two types of internal model. One model is a forward predictive model of the motor apparatus (e.g., limb and muscle), providing a rapid prediction of the sensory consequences of each movement. The second model is of the time delays in the control loop (due to receptor and effector delays, axonal conductances, and cognitive processing delays). This model delays a copy of the rapid prediction so that it can be compared in temporal register with actual sensory feedback from the movement. The result of this comparison is used both to correct for errors in performance and as a training signal to learn the first model. We discuss evidence that the cerebellum could form both of these models and suggest that the cerebellum may hold at least two separate Smith Predictors. One, in the lateral cerebellum, would predict the movement outcome in visual, egocentric, or peripersonal coordinates. Another, in the intermediate cerebellum, would predict the consequences in motor coordinates. Generalization of the Smith Predictor theory is discussed in light of cerebellar involvement in nonmotor control systems, including autonomic functions and cognition.
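
The training idea in the abstract (a rapid forward prediction, a delayed copy of it, and a comparison with the actual feedback used as a learning signal) can be sketched with a toy scalar model. The linear forward model, the LMS-style update, the sinusoidal command, the 3-step delay and the true gain of 2.5 are all hypothetical choices for illustration; none of them comes from the paper.

```python
import math
from collections import deque


def train_forward_model(true_gain=2.5, delay=3, steps=1000, lr=0.01):
    """Learn w in the hypothetical forward model y_hat = w * u from delayed feedback."""
    w = 0.0  # forward-model parameter, starts untrained
    # Delay model: (command, prediction) pairs held until their feedback arrives.
    pending = deque()
    # Physical transport delay: real consequences not yet sensed.
    in_flight = deque()
    for k in range(steps):
        u = math.sin(0.1 * k) + 1.5      # hypothetical motor command, always nonzero
        pending.append((u, w * u))       # rapid prediction made now
        in_flight.append(true_gain * u)  # actual consequence, still travelling
        if len(in_flight) > delay:
            feedback = in_flight.popleft()
            u_old, prediction = pending.popleft()
            # Compare prediction and feedback "in temporal register".
            error = feedback - prediction
            # Use the mismatch as the training signal (LMS update).
            w += lr * error * u_old
    return w
```

The learning rate is kept small on purpose: because each update uses a prediction made several steps earlier, a large rate can make the delayed update oscillate or diverge.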

Hey, good find! I think the Miall, Weir, Wolpert and Stein paper proposes essentially the same model as the one I like and described above, or at least a related one. I have just added it to my blog post as a reference. Thanks again. :)
