HTM Based Autonomous Agent

Sure, here you go. However, I am not sure if it is self-explanatory because it mostly consists of figures from the thesis.


Wow! These 3D visualizations are insane!!! Congratulations!! They gave me new ideas for my new robotics tool, which will be integrated into Nupic Studio (which now looks like a stone-age tool compared to yours… :sweat_smile: Maybe it’s time to replace the PyQtGraph library with a 3D game engine to show neurons… :thinking:).


Thanks for your ppt!


I am having some difficulty visualizing how a forward RL strategy might be implemented with HTM-like neurons. Forgetting about pooling, layers, and other brain structures for a minute, at the most basic level, it is easy to visualize how future rewards can be learned and predicted by visualizing a cell representing reward growing connections with other cells that represent motor/context over time. Something like:


This is of course depicting a backward RL strategy. The motor/context cells are essentially an eligibility trace which a current reward is able to connect with when it happens some time after that motor/context occurred, allowing it to be predicted in advance when a semantically similar motor/context occurs again.
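
A minimal sketch of that backward view, assuming a decaying trace over motor/context cells and a single reward cell whose connection strengths are stored in one vector. The function names, the decay factor, and the increment are hypothetical and only illustrate the idea:

```python
import numpy as np

def update_trace(trace, active_cells, decay=0.8):
    """Decay the eligibility of all cells, then mark the currently active ones."""
    trace = trace * decay
    trace[active_cells] = 1.0
    return trace

def reinforce(permanences, trace, reward, increment=0.1):
    """Strengthen reward-cell connections in proportion to each cell's eligibility."""
    return np.clip(permanences + increment * reward * trace, 0.0, 1.0)

# Two steps of motor/context activity, then a reward arrives.
trace = np.zeros(4)
trace = update_trace(trace, [0])   # cell 0 was active two steps before the reward
trace = update_trace(trace, [2])   # cell 2 was active one step before the reward
permanences = reinforce(np.zeros(4), trace, reward=1.0)
```

Cells that were active closer in time to the reward end up with stronger connections, which is what lets the reward be predicted when a similar motor/context pattern occurs again.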

What would be the basic connections between neurons (and what would those neurons represent) in a forward RL strategy? I know this is getting off topic from sunguralikaan’s paper, so we can break this into a new topic if the answer isn’t simple.


Hey Paul, I totally agree. I’m not sure whether a forward strategy is at all plausible in a biological model, which I suppose is why the backward view is referred to as the mechanistic view. I was referring to the functional equivalence of the two views, so it becomes an implementation detail having more to do with constraints on the substrate than the algorithm.

For example, you could do a forward approach in HTM by batching short rollouts (typical policy gradient implementations these days use around 30 timesteps) and computing the gamma and lambda returns on those batches to update your permanence values. Not aesthetically appealing or bio-plausible, but research has shown this approach to be computationally equivalent to the backward view under mild assumptions.
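
As a rough sketch of the forward view on a batched rollout, the lambda-returns can be computed backward over the batch as below. The function name, the argument layout, and the default `gamma` and `lam` values are assumptions for illustration; how the resulting returns would scale permanence updates in HTM is left open here.

```python
import numpy as np

def lambda_returns(rewards, values, bootstrap_value, gamma=0.99, lam=0.95):
    """Forward-view lambda-returns for one rollout.

    rewards[t] is the reward received after step t, values[t] is the value
    estimate of the state at step t, and bootstrap_value is the estimate for
    the state that follows the last step of the batch.
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    next_values = np.append(values[1:], bootstrap_value)
    returns = np.zeros_like(rewards)
    g = bootstrap_value
    for t in reversed(range(len(rewards))):
        # G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1})
        g = rewards[t] + gamma * ((1.0 - lam) * next_values[t] + lam * g)
        returns[t] = g
    return returns
```

With `lam=1` this reduces to the discounted (gamma) return with a bootstrap at the end of the batch, and with `lam=0` to the one-step TD target. The difference `returns - values` would be the per-step error signal.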

Whether you’d want to do that for HTM I don’t know, but it means the usual policy gradient implementations are fair enough comparison targets (backpropagation objections aside).


For a nice model like yours, I have to recommend the neuroscientific moving invisible shock zone arena as a benchmark environment. I ended up needing one for my minimal system, where the challenge is to demonstrate (from neurobiological information) two-frame place avoidance as well as or better than a live rat, using the least amount of code.

A simple virtual 2D environment with robot platform type left/right and forward/reverse control helps keep the test equal. This focuses on what is most important to fully understand and test before adding a third dimension to the environment.

Game engines look wonderful and have fast graphics. But neuroscience needs what neuroscientists use in the lab, where normally it is a 2D problem that the animals must solve.

I found that this test, which delivers a shock when the animal is at the wrong place at the right time, is very good at indicating when the code needs work. The paper that goes with it also contains useful signal information for live rats. This adds another level of testing where the model must somehow match experimental data. A neuroscientist then better understands how a model relates to biology, and has a reason to take it seriously.

You have talent this forum needs. Numenta is on a neuroscientific mission where the best thing to have is what the animal labs use to test navigational skills and other behaviors. I hope that sounds like something you would be interested in helping to develop.


Hello @sunguralikaan,
First of all congratulations for this amazing thesis work! I am really impressed by it.

I have some detailed questions regarding the L5 layer, where everything integrates, that I could not quite answer from the paper (maybe I overlooked something). I am currently playing around with RL and HTM using a customized version of Nupic’s Network API oriented on your architecture, as the experimenting and researching you did is extremely valuable to learn from.

In your model, the cyclic flow is described as follows:

  • L5_SP layer is pooling from the feedforward input from L4 (spatial, weighted = active and predicted neurons as the TP in Nupic supports, but no temporal abstraction)
  • L5_TM is predicting the active columns in the context of L2 and L4 active neural input.
  • The active columns of the L5_SP layer are feedforward input for D1/D2
  • The active neurons of the L5_TM layer are distal input for D1/D2, which both try to predict the next state of L5 with the given information. These predictions are utilized to determine the TD error.
  • L5_Apical connections to D1 and D2 are activated and permanences updated depending on the TD error.

In L5 we now have active cells, distal depolarized cells and apical (D1/D2) depolarized cells.

  • In the given framework it is stated that cells which are depolarized by distal AND apical connections will become active.
  • The next step is learning the motor-activation by association. This is done through apical connections from the L5 layer to the motor neurons. Here learning depends on the firing-type leading to either excitation (D1 activated) or inhibition (D2 activated).

The last integration step is where I am slightly confused.

  1. Does that mean motor neurons only connect to neurons in L5 that are “voluntarily” active, meaning active through a combination of apical and distal depolarization? Or also to the neurons that are active through proximal input interpreted in distal context? How do these neurons then influence the excitation level?

  2. Are the apical connections (formed at previous timesteps) from D1/D2 also used in L5 to sparsify the active columns to neural activation (in addition to using distal connections to L2/L4)?

In case of 2. I believe not. But if not wouldn’t it be possible to input (and learn) apical D1/D2 and L5 distal connections directly to the Motor layer and then take their intersection to excite/inhibit the neurons without loss of generality from the algorithm (as the learned apical connections are only used for Motor excitement?)?

This would be interesting, as NUPIC’s Network API does not support symmetric computation well, and it would need a lot of customization hacking.


Maybe you could clarify some of it and I hope you support the idea of a NUPIC implementation that builds on your architecture.

@Gary_Gaulin I thought about this as well, following up Numenta’s paper with a version implemented in a neural simulator such as NEST and integrated e.g. in the HBP (Human Brain Project) Neurorobotics platform. It would definitely be a very interesting project and might make it easier to compare with experimental data and collaborate with neuroscientists.

Kind regards


@Gary_Gaulin thank you for the interest and good points. I am trying to steer into this sort of biological experimental data to use as a reference point in my PhD for the reasons you stated above.

On the other hand, it is hard to beat games on the marketing front. Drawing the attention of many young researchers is also very important in the long run. Speaking from a game development perspective, applying any sort of agent AI is a lot easier in 2D, while AAA games are mostly interested in 3D. Here we have a contender for a real-time 3D approach.

The game industry does not utilize the advancements in machine learning or AI for many reasons, and I think we are missing out hugely on this alone. They spearhead state-of-the-art computer graphics, so why not machine intelligence?


Thank you for the kind words @kaikun. I just had a wedding so I could not respond in time :sweat_smile:. The summary above seems about right.

Motor neurons map to layer 5 neurons meaning that the L5 activation at time t -L5(t)- is mapped to Motor(t) via apical synapses (can be distal or even proximal as long as motor layer is associated with layer 5). Association was easier to do with temporal memory so I used apical synapses.

At any time, motor neurons have random activations, generating random behavior by default. If there are L5 cells that are both apically and distally depolarized, these are the voluntarily activated cells that override the random activation of motor neurons. So other than the default random activations, motor neurons are receptive to and excited by only these voluntarily activated cells of L5.
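
A minimal sketch of this override, assuming boolean vectors for the depolarization states and a hypothetical boolean matrix `l5_to_motor` standing in for the learned L5-to-motor associations. It ignores the D1/D2 distinction between excitation and inhibition and is not the thesis implementation:

```python
import numpy as np

def voluntary_cells(distal_depolarized, apical_depolarized):
    """Cells depolarized by BOTH distal and apical input count as voluntarily active."""
    distal = np.asarray(distal_depolarized, dtype=bool)
    apical = np.asarray(apical_depolarized, dtype=bool)
    return distal & apical

def motor_activation(voluntary, l5_to_motor, n_random, rng):
    """Return a boolean vector of active motor neurons.

    l5_to_motor has shape (n_l5, n_motor); entry [i, j] is True when
    L5 cell i is associated with motor neuron j (assumed, for illustration).
    """
    # Count how many voluntarily active L5 cells drive each motor neuron.
    excitation = l5_to_motor[voluntary].sum(axis=0)
    if excitation.any():
        return excitation > 0
    # No voluntary drive: fall back to the default random activation.
    n_motor = l5_to_motor.shape[1]
    active = np.zeros(n_motor, dtype=bool)
    active[rng.choice(n_motor, size=n_random, replace=False)] = True
    return active
```

Passing `rng = np.random.default_rng(seed)` keeps the random default behavior reproducible while debugging.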

Those apical connections are used to filter out the salient (important in terms of reward) L5 states among the union of the next possible states. On your suggestion, I think it would be functionally possible, but that is not the biological workflow based on my research which I wanted to follow.

Feel free to experiment with what makes sense and hopefully share any findings with us.


The following discussion started as a PM between @kaikun and @sunguralikaan. They would like to make it public, so I have moved that discussion below.


Dear Sungur Ali Kaan,

First of all I want to tell you that I am really impressed by your master’s thesis.
It is very comprehensive and high level and an incredible work, even more so with respect to the time frame for a Master project.

Regarding myself, I am a 3rd year computer science bachelor student and currently writing my final thesis on a combination of HTM and RL for character recognition. I have read a lot about it and about its complexity, which makes it hard to design an agent that produces voluntary, goal-oriented behavior. After some experimentation and designing, I decided to orient my agent very strongly on your published work, also to stay within the very limited time constraints.

However to the best of my knowledge you never published any source code for your implementation. In the thesis it is written that the engine is written in C++ and I could imagine your HTM implementation too? Is it designed to be reusable?

It would be great to get some insights from you as I will basically attempt to re-implement most of your architecture in python in the weeks to come. I will try to use as much from NUPIC as possible and otherwise customize the algorithms.

Kind regards
Jakob Heyder


Hi Jakob,

I was away for some time trying to marry, I just saw your email.

I did not publish any source code as my implementation is embedded in my game engine. The engine is around 150k lines and the HTM implementation is around 5k. I do not think the code would be of much use to you as it is too dependent on the engine and the visualization. I have also incrementally changed my architecture since then to allow for recent allocentric location discovery and grid cells. However, I am happy to provide you with any information you need. I can assure you that my implementation follows the Nupic codebase closely, with very few tweaks to the actual Spatial Pooler and Temporal Memory algorithms.

If you want to base your implementation on the architecture I proposed, my first tip would be to exclude layers 2 and 3 for now. They may complicate things for your final thesis. Concentrate on layers 4 and 5 if you have limited time, because 4 and 5 provide the actual goal-oriented functionality (2 and 3 only refine it).

I hope you can accomplish your goal. I can answer any questions you have throughout the process, even very detailed ones. Always happy to see someone taking a shot at HTM + RL.


Hello Kaan,

Thank you for the reply. I hope everything went well at the wedding! Congratulations!

I started rebuilding the architecture including layer 2/3 in NUPIC and have some more detailed questions regarding the L5/Motor layer that I formulated in the forum thread. In my current attempt I try to calculate the excitation from D1/D2 directly in the motor layer without layer 5 in between again, as these symmetric, cyclic computations are not really supported in NUPIC.

However, I am currently having a hard time trying to figure out why the network is not doing as intended, due to a lack of good visualization tools for NUPIC’s Network API. My layers have 2048x32 neurons, which makes them very big, and I see that the agent initially has no active apical connections at all to excite/inhibit neurons. Probably it just takes more learning or, more likely, debugging in my architecture.

I am also wondering about how much training it took your agent to learn behaviour and the environment? I use the same parameters (PermInc etc.) initially but am not sure how much time it will take the agent to learn and adapt - in case it works in the first place.

Kind regards



Thank you for the kind words. I will write a separate answer for the thread. Visualization indeed helps to track the learning progress. On the other hand, you can ensure that the system works by introducing a mechanism to directly give inputs to the agent. These would be very simple 1, 2 or 3 step inputs. Then you could give the control back to the agent at the same point in the action sequence. This was how I made sure it was at least learning.

The required number of steps is not concrete because the layers need to settle their self-competition and stabilize. This takes time and input variety. After they settle, the number of steps should be around (connection permanence / permanence increase) for every new step of a sequence. Still, the convergence time varies greatly. You can limit the number of environmental states to reduce the time needed to stabilize.
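
As a quick worked example of that ratio, the helper below counts how many reinforcing repetitions a synapse needs to cross the connection threshold. The function name and the `initial_permanence` argument are assumptions added for illustration; with a starting permanence of 0 it reduces to the ratio above.

```python
import math

def repetitions_to_connect(connected_permanence, permanence_increment,
                           initial_permanence=0.0):
    """Number of increments needed for a synapse to reach the connected threshold."""
    gap = connected_permanence - initial_permanence
    if gap <= 0:
        return 0
    # Round first so floating-point noise does not add a spurious extra step.
    return math.ceil(round(gap / permanence_increment, 9))

print(repetitions_to_connect(0.5, 0.1))        # 5
print(repetitions_to_connect(0.5, 0.1, 0.21))  # 3
```

This only covers the phase after the layers have settled; the settling time itself is not captured by the estimate.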


I meant that you could make the agent learn a very simple sequence by repeatedly giving direct motor inputs. You then give the control back at some point to see what it does. It is hard to see whether it learns with random exploration at first.


Thank you, I saw that approach of human control in your thesis; it is indeed a good idea for testing the network. I will try to test it that way. Yes, I believe so; I have reduced the environmental complexity greatly for now. Currently the task consists only of 4 cubes on a screen, and the agent should click the upper left corner to get a reward, with a timeout of 10 seconds.

Some more questions coming to my mind:

  • Which layer sizes did you experiment with?

  • My layers are quite large but I sample only a very small number of active neurons with the given parameters, which makes me wonder if that can be correct and working this way. Did you keep sparsity at about a ~2% rate or much lower in some layers?

  • And I currently do not include topology… so my input is just encoded as a 1D black/white array. This takes a lot of information away from the agent. Did you run any tests on how important the topological information (2D) was in your framework?


Seems like the task is simple enough. I assume the agent is controlling the mouse on a 2D screen? Also, why the timeout?

At the end of the thesis there are various layer size measurements, both in terms of performance and learning behavior. My environment worked well enough with a layer size of 512 columns and 8 neurons per column. I experimented in the range of 256 to 4096 columns. I went with 512 for performance reasons, as it did not seem to lose learning capability. If you design your encoder well, you can get away with smaller layer sizes.

The sparsity was between 2% and 4%. It did not have a drastic effect. With 512 columns, an activation sparser than 2% results in fewer than 10 active columns, which hurts overlap capacity.
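
The arithmetic behind those numbers can be checked with a small sketch. The helper names are hypothetical, and the count of distinct activation patterns is used here only as a crude stand-in for representational capacity:

```python
import math

def active_columns(n_columns, sparsity):
    """Whole number of columns that are active at a given sparsity."""
    return int(n_columns * sparsity)

def pattern_count(n_columns, n_active):
    """Number of distinct ways to choose n_active columns out of n_columns."""
    return math.comb(n_columns, n_active)

print(active_columns(512, 0.02))   # 10
print(active_columns(512, 0.04))   # 20
print(active_columns(512, 0.01))   # 5
print(pattern_count(512, 5) < pattern_count(512, 10))  # True
```

With only a handful of active columns, two similar inputs have very few columns in which they can overlap, which is the capacity concern mentioned above.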

My input was the RGB values of a 2D image converted to binary. I tried sampling without a topology (all columns can connect to all pixel locations) and sampling based on proximity (for example, columns near the center of the layer sampled RGB values at the image center). I am not sure about the necessity of a topology, but it certainly helped visualizations, as local sampling provided activations corresponding to their image location. Local sampling also helps to disconnect distant values in the image so that a column does not learn to combine values on opposing ends of the image. However, as long as similar inputs have overlapping activations based on the strength of the similarity, you shouldn’t need topology. The agent does not need to have the same ‘sense’ of similarity as we do at this point. I would start with 1D.
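
To make the two sampling schemes concrete, here is a simplified 1D sketch (the experiments described above used a 2D image). The function name, the `radius` parameter, and the linear mapping of columns onto input positions are assumptions for illustration:

```python
import numpy as np

def potential_pool(n_columns, n_inputs, pool_size, radius=None, seed=0):
    """Boolean (n_columns, n_inputs) mask of potential proximal synapses.

    radius=None: every column may sample any input bit (no topology).
    radius=r:    a column samples only bits within r of its own center.
    """
    rng = np.random.default_rng(seed)
    mask = np.zeros((n_columns, n_inputs), dtype=bool)
    for c in range(n_columns):
        if radius is None:
            candidates = np.arange(n_inputs)
        else:
            # Spread column centers evenly over the input space.
            center = int(round(c * (n_inputs - 1) / max(n_columns - 1, 1)))
            lo = max(0, center - radius)
            hi = min(n_inputs, center + radius + 1)
            candidates = np.arange(lo, hi)
        size = min(pool_size, len(candidates))
        mask[c, rng.choice(candidates, size=size, replace=False)] = True
    return mask

global_mask = potential_pool(8, 64, pool_size=5)           # no topology
local_mask = potential_pool(8, 64, pool_size=5, radius=4)  # proximity based
```

In the local variant a column can never pair bits from opposite ends of the input, while the global variant leaves that entirely to learning.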

One downside to topology is that the layer is not utilised fully because some receptive fields have a larger variety of inputs than others. Some columns need to learn more than the others because they sense more patterns. You absolutely have to enable some sort of boosting or bumping with topology to balance the activations but then you introduce more instability.


Yes, the agent sees a 160x160 2D screen and can control the mouse in this field. The action space is discrete, as I only allow it to click, move left/right/top/bottom by 10px, or do nothing at the moment. I map the motor neurons (of which there are currently 65k) to the range of 6 and choose the action with the most active winner cells (the most excited neurons). The maximum number of winner cells can be modified with a parameter, but I kept it low at 4, and I am planning to reduce the number of motor neurons, which is currently so large because it mirrors layer 5.
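
One way such a mapping could look is sketched below. The modulo assignment of neuron indices to actions, the action names, and the fallback to doing nothing when no winner cells exist are all assumptions for illustration, not necessarily what the actual code does:

```python
import numpy as np

# Hypothetical labels for the six discrete actions described above.
ACTIONS = ["click", "left", "right", "top", "bottom", "nothing"]

def choose_action(winner_cells, n_actions=len(ACTIONS)):
    """Pick the action whose group of motor neurons has the most winner cells.

    winner_cells is a list of active motor neuron indices; neuron i is
    assigned to action i % n_actions (an assumed mapping).
    """
    winner_cells = np.asarray(winner_cells, dtype=int)
    if winner_cells.size == 0:
        return ACTIONS.index("nothing")
    votes = np.bincount(winner_cells % n_actions, minlength=n_actions)
    return int(np.argmax(votes))

print(ACTIONS[choose_action([1, 7, 13, 2])])  # left
```

Ties are resolved toward the lowest action index by `np.argmax`, which may bias behavior when only a few winner cells are allowed.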

I use the timeout to ensure that the agent is not learning to just do nothing (as it just gets negative reward if it clicks or the task times out). However at the moment the task does not change after the timeout so it is pretty useless.

Yes I completely agree on the thoughts on topology, that was the same from what I figured out so far. I will keep it in 1D for now and try to make that work.

I experiment with the sparsity at the moment and will shrink the layer size as it should really not have troubles with representational-space. However to really be able to make systematic experiments I need to implement serialization and maybe also a better debug-printout than console^^

In your thesis you also speak about the boosting factor and in the appendix set it to 4, but also set it to false. Was it completely disabled, or somehow used as 4 and I misinterpreted that?

Thank you for the quick answers!


Also in your framework you activate the cells which have basal and apical depolarization in layer 5 and use them to drive excitation/inhibition of motor neurons (as I believe without the proximal activated layer 5 neurons). It is grounded in biology but what are your thoughts why this works/improves performance?

I think of it so far as a filter that filters the apical depolarizations with the basal depolarized neurons to let the “more realistic” prediction come through (as D1/D2 are predicting from L5, which in this case is the same information input as the basal activation).
However that part is not quite clear to me yet. I see how the D1/D2-L5 loop can work, makes predictions and utilizes the TD-Error to excite/inhibit actions/states that are favourable, but as the distributions in L5 and D1/D2 are different, how/why should there be an overlap of apical and distal depolarization besides that they are both contextualizing and implicitly driven from the same source. (In my current version they do mostly not overlap at all).


I believe reducing the number of possible motor outputs would make the whole process a lot easier for both you and the agent.

I utilized a tweaked version of boosting. Normally, boosting artificially increases a column’s overlap based on its activation frequency. I did not like this, so I used the calculated boosting factor to continuously increment the permanences of all proximal synapses of all columns (referred to as bumping). In Nupic, this synaptic increment is based on the average overlap of a column with the input, not the activation frequency. In my implementation, proximal synapses always get stronger, and the amount depends on the activation frequency (boosting factor) of each column. It worked better for me; less-used columns grow more. There is a subsection for this in the thesis. Boosting is disabled, but bumping uses the boosting factor.
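
A rough sketch of what such bumping could look like is given below. The exponential boost formula, the parameter names, and the default values are assumptions chosen for illustration; the thesis subsection is the authoritative description of the actual mechanism.

```python
import numpy as np

def bump_permanences(permanences, potential_mask, duty_cycles, target_density,
                     base_increment=0.001, strength=2.0):
    """Increment all proximal permanences, more strongly for rarely active columns.

    permanences and potential_mask have shape (n_columns, n_inputs);
    duty_cycles holds each column's recent activation frequency.
    """
    # Assumed boost factor: above 1 for under-used columns, below 1 for over-used ones.
    boost = np.exp(strength * (target_density - np.asarray(duty_cycles, dtype=float)))
    # Every potential synapse grows; the growth scales with the column's boost factor.
    bumped = permanences + potential_mask * (base_increment * boost)[:, None]
    return np.clip(bumped, 0.0, 1.0)
```

Because overlap scores are left untouched, a rarely used column only starts winning once its synapses have actually become connected, rather than through an artificial multiplier.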

If you can accomplish this, it allows new and in depth ways of debugging and assessing learning.

The idea was that the predictions should somehow alter the subsequent actions to produce behavior. The problem is that a cell cannot affect its postsynaptic targets in a depolarized state. There is just no way of knowing that a cell is predictive from the perspective of the other cells. So this prediction cannot be directly sent to other cells and could only be used to make the cell’s own firing easier. The main assumption is that if apical and basal sources both depolarize a cell at the same time, it causes a spike, which means the predictions from multiple sources are so strong that they fire the cell before the actual input. This was the only bioplausible way that I could find to produce behavior out of depolarization among the many mechanisms I tried.

Distal connections of layer 5 predict its own activity. Among all the layer 5 activations, some are to be avoided and some are desired. D1 learns the desired activations and D2 learns the ones that are to be avoided, based on TD error. At a given time there are predictive cells in layer 5 caused by distal input (layer 5 cells). Among these predictions, some are also predicted by the D1 or D2 layers. As you said, D1 and D2 filter out the salient ones among all the predictions. If there are any, this results in the respective predictive layer 5 cells becoming ‘voluntarily active’ before the proximal input of the next step. These voluntarily active cells stimulate motor neurons, which are otherwise configured to produce random behavior, to produce action.

Layer 5, D1 and D2 all predict the next activity of Layer 5. If connected properly, the predictions of D1 and D2 on layer 5 should contain a subset of the predictions from distal input (layer 5 itself). So there is indeed expected to be an overlap, the whole behavior production is built on this overlap; if the agent predicts something (layer 5 distal input from layer 5) and if that thing is important (layer 5 apical input from D1 and D2) activate that thing. I can clarify a bit more if you could pinpoint the source of confusion. The activations of D1/D2 and layer 5 are different at a given time but they are all mapped to the next layer 5 activation.