I have a hypothesis, please help me find out where I'm wrong

I'm coming up with a reinforcement learning algorithm based on TBT. I want to know how close I am, but I suck at reading papers or searching for them.

I have a few predictions (more like guesses) about the brain that are kinda required for this algorithm to be implemented in neural hardware, so I'd appreciate it if you guys could help me figure out which ones are right (if any).

guess #1: muscle spindle signals arrive at layer 5/6.
guess #2: synapses between layer 5 neurons have asymmetric anti-Hebbian plasticity.
guess #3: layer 2/3 neuron synapses have symmetric Hebbian plasticity.
guess #4: basal ganglia’s “reward” signals project to either layer 5 or 6.
guess #5: connections between neurons of the same type in layer 5 are dense and mostly distal, while connections from layer 2/3 to layer 5 are rare but mostly proximal.
guess #6: projections arriving at layer 5 from subcortical areas that do not relate to “reward” are mostly proximal.

Again, I have no idea of how good or bad those guesses are.

I’d say, don’t worry about the neural hardware too too much. I guess check that it’s not totally inconsistent with the hypothesis, but if it’s unclear (like usual), focus on the theory instead.

I’ll try to answer your questions, but to really say anything useful, I’d need to know the algorithm and spend like a month reading papers to check each thing. For AI, neuroscience facts are a tricky thing, because it’s not really about individual facts but about generic cortical principles. Exceptions don’t exactly invalidate them, because the hypothesized principle could just be slightly wrong. A lack of exceptions doesn’t validate them either, because there’s always a lot going on, so cherry-picking things is too easy.

I don’t mean to discourage you from learning about neuroscience, though. Some facts are more straightforward than others, and it’s good to know what tools you have to work with. There are also ways to use neuroscience for designing AI besides generic cortical principles. For example, sometimes things seem to contradict assumptions, and sometimes things inspire theoretical ideas.

When I started, the terminology alone was pretty overwhelming. It’s a bit better in things besides papers, like Wikipedia articles, but it still takes time. It’s probably easiest to focus on something specific, like a particular topic or part of the brain. Every time I read about a new topic, there are lots of words and concepts I don’t understand.

I don’t think so, but I’m not sure.

I’d worry last about whether this part is true.

Generally, plasticity is asymmetric, although I’d guess not always. The usual rule is: if the presynaptic cell fires first and the postsynaptic cell fires second, the synapse is reinforced; otherwise it’s weakened. In layer 5, that order can reverse. The Scholarpedia article on spike-timing-dependent plasticity seems to say that in general, but I’m not sure it’s always true, because I recall it being based on burst firing mode.
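For concreteness, the asymmetric rule above can be sketched as an exponential STDP window (the constants here are illustrative, not measured values):

```python
import math

def stdp_delta_w(t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Weight change under asymmetric STDP.

    t_pre, t_post: spike times in ms.
    a_plus, a_minus, tau: illustrative constants, not measured values.
    """
    dt = t_post - t_pre
    if dt > 0:
        # presynaptic spike preceded the postsynaptic one -> potentiate
        return a_plus * math.exp(-dt / tau)
    # postsynaptic spike came first (or simultaneous) -> depress
    return -a_minus * math.exp(dt / tau)

dw_causal = stdp_delta_w(t_pre=0.0, t_post=5.0)   # pre leads post: dw > 0
dw_acausal = stdp_delta_w(t_pre=5.0, t_post=0.0)  # post leads pre: dw < 0
```

An anti-Hebbian version (as in guess #2) would just flip the signs of the two branches.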

As far as I know, the basal ganglia don’t project directly to the cortex. Instead, they project to the thalamus.

There are multiple classes/groups of cells in each layer, each with distinct connections and properties. One of the most prominent connections between layers is from L2/3 to L5. I’m not sure whether those synapses are proximal, so maybe the proximal ones are rare. There are connections within layer 5, but I don’t know whether they’re dense and distal.

This is definitely possible, at least for sensory input sent by the thalamus.

A lot of what the muscles do is mediated directly by the spinal cord.
The brain establishes a “setpoint” and the spinal cord works to adjust the muscle to reach that setpoint.
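As a toy picture of that setpoint idea (purely illustrative; real spindle and alpha/gamma motor circuitry is far richer than a single feedback loop):

```python
def spinal_loop(setpoint, length, gain=0.3, steps=20):
    """Toy feedback loop: drive muscle length toward a setpoint.

    setpoint, length, gain, steps are all made-up illustrative values.
    """
    for _ in range(steps):
        error = setpoint - length   # spindle-like error signal
        length += gain * error      # correction applied toward the setpoint
    return length

final = spinal_loop(setpoint=1.0, length=0.0)  # converges near 1.0
```

The point is just that the brain only needs to update the setpoint, while the loop does the moment-to-moment correction.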

How do you arrive at that conclusion - just on the basis of the complexity of the system? I would guess that a lot of the circuitry involves coordination and timing.

It seems that the voting on action happens in subcortical structures.

By reading many research papers on the actual circuitry configuration and instrumented behavioral studies. If you read the paper attached to that statement, you may see why I made it.

I had already read the paper, but I did just read it again, and I still don’t understand how it relates to voting. Since I know you know more about this than I do, I was hoping for some explication. Maybe I should ask a different question, then. Since the study is strictly anatomical, what elements of the anatomy indicate a voting capability?

Please go back and re-read my post; I have copied the relevant sections of the linked thread.

The PFC, thalamus, and basal ganglia have many well-studied pathways.

If I were to bet my money, I’d say that subcortical RL regions innervate mostly the PFC and, for the most part, either don’t have access to or ignore lower-level and association regions. That’s because they need conceptually higher-order predictions to work.

It’s not surprising, as lower-level predictions have a lot of noise, and they only become useful once consensus is reached among them. So the RL parts probably have access to the brain’s symbols and representations in the PFC that are formed from lower levels (completely different from famous RL algorithms that work on raw pixels and raw configurations, since those run right on the lower-level predictions). So I doubt the RL part of the brain works on the V1 or V2 areas. It probably doesn’t have access to that part, or completely ignores it.

Cortical layers in the PFC are structurally different from those in other parts of the brain. They have a famous type of neuron that is only seen in the PFC and runs through all the layers: the von Economo neuron (see Wikipedia).

That’s probably a way to help them form representations accessible to the lower-level algorithms that the brain runs (RL being one of them).

So, more like this: you know there are Jennifer Aniston neurons. That’s a neuron that, in one specific person, only fires when shown a picture of Jennifer Aniston. So my guess is that instead of RL innervating every layer of the cortex, it just works at the Jennifer Aniston level. If that makes sense.

Thank you guys for your help.


I have not seen it mentioned here yet, so in the unlikely event that you are not already aware, Nature’s latest edition (Oct 7 2021) will be of interest in this and many other regards relating to HTM.

I think I’m getting it to work. It’s still not complete and is only using the pooling layer directly, without feedback, for now, but I guess I’m on the right track because it’s already learning something.

This is basically a hello world, but I’m happy with the result.


Very interesting and oddly alien looking (whatever that is).

What code did you use?

It was supposed to be some sort of racetrack, and that little rectangle is an agent whose task is to learn to stay in the lane by turning left or right.

I’m doing this from scratch, so I didn’t use anyone else’s code other than the pygame lib.


I dunno about reward-based control over cortex, but in terms of signals from cortex to basal ganglia, corticostriatal cells are ubiquitous as far as I’ve seen.

That’s good to know. I’m having trouble figuring out the context. What were you responding to specifically?


The OP was talking about spindle cells projecting to cortex. It’s considerably more complicated than that.


Made a few tweaks to the model. It’s not yet using an HTM because I want the basics to work first; so far, the input pooling layer is already a good predictor of reward. Later I’ll add an HTM in between.

The output (action) layer is a spatial pooler too, but the winner cells are constrained to all code for the same action. If the winning action increased the predicted reward on the next step, those cells get their permanences updated; otherwise they get inhibited on the next step.
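Roughly, the winner-cell update could be sketched like this (all names and constants are made up for illustration; this isn’t the actual code):

```python
def update_action_cells(winner_cells, reward_delta, permanences,
                        inhibited, inc=0.05, dec=0.05):
    """Reward-gated update for the cells coding the chosen action.

    winner_cells: indices of cells that all code for the same action.
    reward_delta: change in predicted reward after taking that action.
    permanences: dict mapping cell index -> permanence value.
    inhibited: set of cells to suppress on the next step.
    inc/dec are illustrative increments.
    """
    if reward_delta > 0:
        # the action increased predicted reward -> reinforce its cells
        for c in winner_cells:
            permanences[c] = min(1.0, permanences[c] + inc)
    else:
        # otherwise weaken and inhibit those cells on the next step
        for c in winner_cells:
            permanences[c] = max(0.0, permanences[c] - dec)
            inhibited.add(c)

perms = {0: 0.5, 1: 0.5, 2: 0.5}
inhib = set()
update_action_cells([0, 1], reward_delta=0.2,
                    permanences=perms, inhibited=inhib)
```

The inhibited set is what forces a different action to win on the following step when the last one didn’t help.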

Interestingly enough, this algorithm seems to like actions that cause a lot of change to the input: if I add cells that turn the bot less and cells that turn it more, it will start to prefer the stronger cells and ignore the weaker ones.


@JarvisGoBrr cool animation!

haha, the graphical power of pygame.

Yes, that’s what I was trying to convey. RL is just localized to narrow parts of the cortex that are specialized to present data in a way that RL algorithms can use; that’s why I was saying it most likely doesn’t access every part.