Hey Paul, I totally agree. I’m not sure whether a forward strategy is at all plausible in a biological model, which I suppose is why the backward view is referred to as the mechanistic view. I was referring to the functional equivalence of the two views, so it becomes an implementation detail having more to do with constraints on the substrate than the algorithm.
For example, you could do a forward approach in HTM by batching short rollouts (typical policy gradient implementations these days use around 30 timesteps) and computing the gamma and lambda returns on those batches to update your permanence values. Not aesthetically appealing or bio-plausible, but research has shown this approach to be computationally equivalent to the backward view under mild assumptions.
Whether you’d want to do that for HTM I don’t know, but it means the usual policy gradient implementations are fair enough comparison targets (backpropagation objections aside).
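To make the batched forward view concrete, here is a minimal numpy sketch of λ-returns computed over one short truncated rollout, with a bootstrap value closing off the batch. Names and hyperparameters are illustrative, not taken from any particular implementation:

```python
import numpy as np

def lambda_returns(rewards, next_values, bootstrap, gamma=0.99, lam=0.95):
    """Forward-view lambda-returns for one truncated rollout.

    rewards[t] = r_t, next_values[t] = V(s_{t+1}); `bootstrap` closes
    off the truncated episode (typically V of the final state).
    """
    T = len(rewards)
    returns = np.empty(T)
    g = bootstrap
    # Computed right-to-left, but each entry equals the forward-view
    # definition: G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1}).
    for t in reversed(range(T)):
        g = rewards[t] + gamma * ((1 - lam) * next_values[t] + lam * g)
        returns[t] = g
    return returns

# A 30-step batch, matching the typical rollout length mentioned above.
rng = np.random.default_rng(0)
G = lambda_returns(rng.normal(size=30), rng.normal(size=30), bootstrap=0.0)
```

With `lam=1.0` and `bootstrap=0.0` this reduces to plain discounted Monte Carlo returns, which is a handy sanity check.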
For a nice model like yours I have to recommend, as a benchmark environment, the neuroscientific moving invisible shock zone arena. I ended up needing one for my minimal system, where the challenge is to (from neurobiological information) demonstrate two-frame place avoidance as well as or better than a live rat, using the least amount of code.
A simple virtual 2D environment with robot-platform-style left/right and forward/reverse control helps keep the test fair. This focuses on what is most important to fully understand and test before adding a third dimension to the environment.
Game engines look wonderful and have fast graphics. But neuroscience needs what neuroscientists use in the lab, where normally it is a 2D problem that the animals must solve.
I found that this test, which delivers a shock when the animal is at the wrong place at the right time, is very good for indicating when the code needs work. The paper that goes with it also contains useful signal information recorded from live rats. This adds another level of testing, where the model must somehow match experimental data. A neuroscientist then better understands how a model relates to biology, and has a reason to take it seriously.
You have talent this forum needs. Numenta is on a neuroscientific mission where the best thing to have is what the animal labs use to test navigational skills and other behaviors. I hope that sounds like something you would be interested in helping to develop.
First of all, congratulations on this amazing thesis work! I am really impressed by it.
I have some detailed questions regarding the L5 layer, where everything integrates, that I could not quite answer from the paper (maybe I overlooked something). I am currently experimenting with RL and HTM using a customized NuPIC Network API setup oriented on your architecture, as the experimentation and research you did is extremely valuable to learn from.
In your model, the cyclic flow is described as follows:
The L5_SP layer pools the feedforward input from L4 (spatial, weighted = active and predicted neurons, as the TP in NuPIC supports, but with no temporal abstraction).
L5_TM predicts the active columns in the context of the active neural input from L2 and L4.
The active columns of the L5_SP layer are the feedforward input for D1/D2.
The active neurons of the L5_TM layer are the distal input for D1/D2, which both try to predict the next state of L5 with the given information. These predictions are utilized to determine the TD error.
The L5 apical connections to D1 and D2 are activated, and their permanences are updated depending on the TD error.
In L5 we now have active cells, distal depolarized cells and apical (D1/D2) depolarized cells.
In the given framework it is stated that cells which are depolarized by distal AND apical connections will become active.
The next step is learning the motor activation by association. This is done through apical connections from the L5 layer to the motor neurons. Here, learning depends on the firing type, leading to either excitation (D1 activated) or inhibition (D2 activated).
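To make sure I read this right, one timestep of that integration, written with plain sets of cell indices (all SP/TM mechanics elided, names hypothetical), reduces the "voluntary" rule to an intersection:

```python
# One timestep of the described L5/D1/D2 integration, with cell ids as
# plain sets. All layer mechanics (SP pooling, TM matching) are elided;
# only the activation rule from the summary above is shown.
def l5_integration(active_cells, distal_depolarized, apical_depolarized):
    # Cells depolarized by BOTH distal (L2/L4/L5 context) and apical
    # (D1/D2) input become "voluntarily" active.
    voluntary = distal_depolarized & apical_depolarized
    return active_cells | voluntary

active = {1, 2, 3}            # driven by proximal L4 input
distal = {4, 5, 6}            # predicted from L2/L4/L5 context
apical = {5, 6, 7}            # predicted as salient by D1/D2
assert l5_integration(active, distal, apical) == {1, 2, 3, 5, 6}
```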
The last integration step is where I am slightly confused.
Does that mean motor neurons only connect to neurons in L5 that are "voluntarily" active, meaning active through a combination of apical and distal depolarization? Or also to the neurons that are active through proximal input interpreted in distal context? How do these neurons then influence the excitation level?
Are the apical connections (formed at previous timesteps) from D1/D2 also used in L5 to sparsify the active columns into neural activation (in addition to using distal connections to L2/L4)?
In the case of question 2, I believe not. But if not, wouldn't it be possible to feed (and learn) the apical D1/D2 and L5 distal connections directly into the motor layer, and then take their intersection to excite/inhibit the neurons, without loss of generality for the algorithm (since the learned apical connections are only used for motor excitement)?
This would be interesting, as NuPIC's Network API does not support symmetric computation well, and it would need a lot of customization hacking.
Maybe you could clarify some of it and I hope you support the idea of a NUPIC implementation that builds on your architecture.
@Gary_Gaulin I thought about this as well, following up on Numenta's paper, for a version implemented in a neural simulator such as NEST and integrated, e.g., in the HBP (Human Brain Project) Neurorobotics Platform. It would definitely be a very interesting project and might make it easier to compare with experimental data and collaborate with neuroscientists.
@Gary_Gaulin thank you for the interest and the good points. I am trying to steer toward this sort of biological experimental data to use as a reference point in my PhD, for the reasons you stated above.
On the other hand, it is hard to beat games on the marketing front. Drawing the attention of many young researchers is also very important in the long run. Speaking from a game development perspective, applying any sort of agent AI is a lot easier in 2D, while AAA games are mostly interested in 3D. Here we have a contender for a real-time 3D approach.
The game industry does not utilize the advancements in machine learning or AI, for many reasons, and I think we are missing out hugely on this alone. They spearhead state-of-the-art computer graphics; why not machine intelligence?
Thank you for the kind words @kaikun. I just had a wedding so I could not respond in time. The summary above seems about right.
Motor neurons map to layer 5 neurons, meaning that the L5 activation at time t, L5(t), is mapped to Motor(t) via apical synapses (these could be distal or even proximal, as long as the motor layer is associated with layer 5). Association was easier to do with temporal memory, so I used apical synapses.
At any time, motor neurons have random activations, generating random behavior by default. If there are L5 cells that are both apically and distally depolarized, these are the voluntarily activated cells that override the random activation of the motor neurons. So other than the default random activations, motor neurons are receptive to and excited by only these voluntarily activated cells of L5.
Those apical connections are used to filter out the salient (important in terms of reward) L5 states from the union of the next possible states. As for your suggestion: I think it would be functionally possible, but that is not the biological workflow my research pointed to, which is what I wanted to follow.
Feel free to experiment with what makes sense and hopefully share any findings with us.
First of all I want to tell you that I am really impressed by your master thesis.
It is very comprehensive and high-level, and an incredible piece of work, even more so given the time frame of a Master's project.
Regarding myself, I am a 3rd-year computer science bachelor student, currently writing my final thesis on a combination of HTM and RL for character recognition. I have read a lot about the field, and its complexity makes it hard to design an agent that produces voluntary, goal-oriented behavior. After some experimentation and design work, I decided to orient my agent very strongly on your published work, also to stay within my very limited time constraints.
However, to the best of my knowledge you never published any source code for your implementation. The thesis says the engine is written in C++, and I imagine your HTM implementation is too? Is it designed to be reusable?
It would be great to get some insights from you, as I will basically attempt to re-implement most of your architecture in Python in the weeks to come. I will try to use as much of NuPIC as possible and otherwise customize the algorithms.
I was away for some time getting married, and I just saw your email.
I did not publish any source code, as my implementation is embedded in my game engine. The engine is around 150k lines and the HTM implementation around 5k. I do not think the code would be of much use to you, as it is too dependent on the engine and the visualization. I have also incrementally changed my architecture since then to allow for the recent allocentric location discovery and grid cells. However, I am happy to provide you with any information you need. I can assure you that my implementation follows the NuPIC codebase closely, with very few tweaks to the actual Spatial Pooler and Temporal Memory algorithms.
If you want to base your implementation on the architecture I proposed, my first tip would be to exclude layers 2 and 3 for now. They may complicate things for your final thesis. Concentrate on layers 4 and 5 if you have limited time, because they provide the actual goal-oriented functionality (layers 2 and 3 only refine it).
I hope you can accomplish your goal. I can answer any questions you have throughout the process, even very detailed ones. Always happy to see someone taking a shot at HTM + RL.
Thank you for the reply. I hope everything went well at the wedding! Congratulations!
I started rebuilding the architecture, including layer 2/3, in NuPIC and have some more detailed questions regarding the L5/motor layer that I formulated in the forum thread. In my current attempt I again calculate the excitation from D1/D2 directly in the motor layer, without layer 5 in between, as these symmetric, cyclic computations are not really supported in NuPIC.
However, I am currently having a hard time figuring out why the network is not doing as intended, due to a lack of good visualization tools for NuPIC's Network API. My layers have 2048x32 neurons, which makes them very big, and I see that the agent initially has no active apical connections at all to excite/inhibit neurons. Probably it just needs more learning, or, more likely, there is a bug in my architecture.
I am also wondering how much training it took your agent to learn the behaviour and the environment. I use the same parameters (PermInc etc.) initially, but I am not sure how much time it will take the agent to learn and adapt, in case it works in the first place.
Thank you for the kind words. I will write a separate answer for the thread. Visualization indeed helps to track the learning progress. On the other hand, you can ensure that the system works by introducing a mechanism to give inputs directly to the agent. These would be very simple 1-, 2-, or 3-step inputs. Then you could give the control back to the agent at the same point in the action sequence. This was how I made sure it was at least learning.
The required number of steps is not concrete, because the layers need to settle their self-competition and stabilize. This takes time and input variety. After they settle, the number of steps should be around (connection permanence / permanence increase) for every new step of a sequence. Still, the convergence time varies greatly. You can limit the number of environmental states to reduce the time necessary to stabilize.
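As a rough worked example of that rule of thumb, with illustrative NuPIC-style numbers (not the thesis settings):

```python
# Rough repeats needed before a new synapse becomes connected, using
# the (connection permanence / permanence increase) rule of thumb.
# Parameter values are illustrative NuPIC-style defaults, not the
# thesis settings.
connected_perm = 0.5      # threshold at which a synapse counts as connected
perm_increment = 0.1      # increase per correct prediction
initial_perm = 0.0        # worst case: the synapse starts at zero

steps = (connected_perm - initial_perm) / perm_increment
print(steps)  # -> 5.0 repeats per new step of sequence
```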
I meant that you could make the agent learn a very simple sequence by repeatedly giving it direct motor inputs. You then give control back at some point to see what it does. It is hard to tell whether it is learning from random exploration at first.
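In outline, that warm-up-then-handover check might look like this; the agent and environment below are toy stand-ins (hypothetical names) just to show the control flow, not the HTM network itself:

```python
# Sketch of the "scripted warm-up, then hand control back" sanity check.
SCRIPTED = ["left", "left", "click"]  # a very simple 3-step input sequence

class ToyAgent:
    # Toy stand-in: memorizes forced observations, replays them on demand.
    def __init__(self):
        self.learned = []
    def observe(self, obs):
        self.learned.append(obs)
    def act(self):
        return self.learned.pop(0) if self.learned else "random"

class ToyEnv:
    def step(self, action):
        return action  # observation trivially mirrors the action here

def warmup_then_handover(agent, env, repeats=1):
    for _ in range(repeats):          # phase 1: forced motor inputs
        for action in SCRIPTED:
            agent.observe(env.step(action))
    # phase 2: hand control back at the start of the sequence
    return [agent.act() for _ in range(len(SCRIPTED))]

replay = warmup_then_handover(ToyAgent(), ToyEnv())
assert replay == SCRIPTED  # a learning agent should reproduce the warm-up
```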
Thank you, I saw that approach of human control in your thesis; it is indeed a good way to test the network, and I will try it. And yes, I believe so; I have greatly reduced the environmental complexity for now. Currently the task consists only of 4 cubes on a screen, and the agent should click the upper-left corner to get a reward, with a timeout of 10 seconds.
Some more questions coming to my mind:
Which layer sizes did you experiment with?
My layers are quite large, but I sample only a very small number of active neurons with the given parameters, which makes me wonder whether that can be correct and working this way. Did you keep sparsity at about ~2%, or much lower in some layers?
And I currently do not include topology… so my input is just encoded as a 1D black/white array. This takes a lot of information away from the agent. Did you run any tests on how important the topological (2D) information was in your framework?
Seems like the task is simple enough. I assume the agent is controlling the mouse on a 2D screen? Also, why the timeout?
At the end of the thesis there are various layer size measurements, both in terms of performance and learning behavior. My environment worked well enough with 512 columns and 8 neurons per column. I experimented in the 256 to 4096 column range. I went with 512 for performance reasons, as it did not seem to lose learning capability. If you design your encoder well, you can get away with smaller layer sizes.
The sparsity was between 4% and 2%. It did not have a drastic effect. With 512 columns, activation sparser than 2% results in fewer than 10 active columns, which hurts overlap capacity.
My input was the RGB values of a 2D image converted to binary. I tried sampling without a topology (all columns can connect to all pixel locations) and sampling based on proximity (for example, columns near the center of the layer sampled RGB values at the image center). I am not sure about the necessity of a topology, but it certainly helped visualizations, as local sampling produced activations corresponding to their image locations. Local sampling also helps to disconnect distant values in the image, so that a column does not learn to combine values on opposing ends of the image. However, as long as similar inputs have overlapping activations proportional to the strength of the similarity, you shouldn't need topology. The agent does not need to have the same 'sense' of similarity as we do at this point. I would start with 1D.
One downside to topology is that the layer is not utilized fully, because some receptive fields have a larger variety of inputs than others. Some columns need to learn more than others because they sense more patterns. You absolutely have to enable some sort of boosting or bumping with topology to balance the activations, but then you introduce more instability.
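A minimal sketch of the two sampling schemes being compared here, assuming a 1D input and illustrative parameters (this is not NuPIC's actual potential-pool code):

```python
import numpy as np

def potential_pool(n_columns, n_inputs, pool_size, topology=False, radius=8, seed=0):
    """Pick which input bits each column may connect to.

    topology=False: sample anywhere (no topology, as with a 1D encoding).
    topology=True: sample near the column's mapped input position, giving
    columns local receptive fields. Parameters are illustrative.
    """
    rng = np.random.default_rng(seed)
    pools = []
    for c in range(n_columns):
        if topology:
            center = int(c * n_inputs / n_columns)
            lo, hi = max(0, center - radius), min(n_inputs, center + radius)
            candidates = np.arange(lo, hi)   # only nearby input bits
        else:
            candidates = np.arange(n_inputs)  # any input bit
        size = min(pool_size, len(candidates))
        pools.append(rng.choice(candidates, size=size, replace=False))
    return pools

pools = potential_pool(64, 256, pool_size=12, topology=True)
```

With `topology=True`, a column's activity directly reflects its patch of the image, which is what made visualization easier; with `topology=False`, all columns compete over the full input.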
Yes, the agent sees a 160x160 2D screen and can control the mouse in this field. The action space is discrete, as at the moment I only allow it to click, move left/right/top/bottom by 10px, or do nothing. I map the motor neurons (currently 65k) onto the range of 6 actions and choose the action with the most active winner cells (the most excited neurons). The maximum number of winner cells can be modified with a parameter, but I kept it low at 4, and I am planning to reduce the number of motor neurons, which is currently so large because it mirrors layer 5.
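The mapping described here (many motor neurons onto 6 discrete actions via the most excited winner cells) could be sketched like this; the modulo bucketing and all names are my assumptions, not the actual implementation:

```python
import numpy as np

ACTIONS = ["click", "left", "right", "up", "down", "noop"]

def select_action(motor_excitation, n_winners=4):
    """Map a large motor layer onto 6 discrete actions.

    Each neuron index is assigned to an action bucket; the bucket that
    captures the most of the top-k (winner) neurons wins. The bucketing
    scheme is a guess at the setup described above.
    """
    winners = np.argsort(motor_excitation)[-n_winners:]  # most excited neurons
    buckets = winners % len(ACTIONS)                     # neuron -> action bucket
    counts = np.bincount(buckets, minlength=len(ACTIONS))
    return ACTIONS[int(np.argmax(counts))]

excitation = np.zeros(65536)
excitation[[0, 6, 12, 7]] = [3.0, 2.0, 1.5, 1.0]  # indices 0, 6, 12 -> bucket 0
print(select_action(excitation))  # prints "click"
```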
I use the timeout to ensure that the agent is not learning to just do nothing (as it just gets a negative reward if it clicks or the task times out). However, at the moment the task does not change after the timeout, so it is pretty useless.
Yes, I completely agree with the thoughts on topology; that matches what I have figured out so far. I will keep it 1D for now and try to make that work.
I am experimenting with the sparsity at the moment and will shrink the layer size, as it should really not have trouble with representational space. However, to really be able to make systematic experiments I need to implement serialization and maybe also a better debug printout than the console^^
In your thesis you also speak about the boosting factor and in the appendix set it to 4, but also set it to false. Was it completely disabled, or somehow used as 4 and I misinterpreted that?
Also, in your framework you activate the cells which have both basal and apical depolarization in layer 5 and use them to drive excitation/inhibition of the motor neurons (without, as I believe, the proximally activated layer 5 neurons). It is grounded in biology, but what are your thoughts on why this works/improves performance?
So far I think of it as a filter that filters the apical depolarizations with the basally depolarized neurons to let the "more realistic" prediction come through (as D1/D2 predict from L5, which in this case is the same information as the basal activation input).
However, that part is not quite clear to me yet. I see how the D1/D2-L5 loop can work, make predictions, and utilize the TD error to excite/inhibit actions/states that are favourable. But as the distributions in L5 and D1/D2 are different, how/why should there be an overlap of apical and distal depolarization, besides the fact that they are both contextualizing and implicitly driven by the same source? (In my current version they mostly do not overlap at all.)
I believe reducing the number of possible motor outputs would make the whole process a lot easier for both you and the agent.
I utilized a tweaked version of boosting. Normally, boosting artificially increases a column's overlap based on its activation frequency. I did not like this, so I instead used the calculated boosting factor to continuously increment the permanences of all proximal synapses of every column (referred to as bumping). In NuPIC this synaptic increment is based on the average overlap of a column with the input, not the activation frequency. In my implementation, proximal synapses always get stronger, and the amount depends on the activation frequency (boosting factor) of each column. It worked better for me; lesser-used columns grow more. There is a subsection on this in the thesis. Boosting is disabled, but bumping uses the boosting factor.
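A sketch of that bumping rule, assuming dense permanence arrays and an illustrative base increment (not the thesis values):

```python
import numpy as np

def bump_permanences(perms, boost_factors, base_increment=0.001):
    """Bumping variant described above: every column's proximal
    permanences are continuously incremented, scaled by its boosting
    factor, so rarely active columns catch up faster.
    `base_increment` is an illustrative value, not the thesis setting.
    """
    return np.clip(perms + base_increment * boost_factors[:, None], 0.0, 1.0)

perms = np.full((4, 3), 0.2)              # 4 columns x 3 proximal synapses
boost = np.array([1.0, 1.0, 4.0, 1.0])    # column 2 rarely activates
perms = bump_permanences(perms, boost)
# Column 2's synapses grew 4x as much as the others'.
```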
If you can accomplish this, it allows new and in-depth ways of debugging and assessing learning.
The idea was that the predictions should somehow alter the proceeding actions to produce behavior. The problem is that a cell cannot affect its postsynaptic targets while in a depolarized state. There is just no way of knowing that a cell is predictive from the perspective of other cells. So this prediction cannot be sent directly to other cells and can only be used to make the cell's own firing easier. The main assumption is that if apical and basal sources both depolarize a cell at the same time, it causes a spike, meaning the predictions from multiple sources are so strong that they fire the cell before the actual input arrives. This was the only bio-plausible way I could find to produce behavior out of depolarization, among the many mechanisms I tried.
The distal connections of layer 5 predict its own activity. Among all the layer 5 activations, some are to be avoided and some are desired. D1 learns the desired activations and D2 learns the ones to be avoided, based on the TD error. At a given time there are predictive cells in layer 5 caused by distal input (layer 5 cells). Among these predictions, some are also predicted by the D1 or D2 layers. As you said, D1 and D2 filter out the salient ones among all the predictions. If there are any, the respective predictive layer 5 cells become 'voluntarily active' before the proximal input of the next step. These voluntarily active cells stimulate the motor neurons, which are configured to produce random behavior otherwise, to produce an action.
Layer 5, D1, and D2 all predict the next activity of layer 5. If connected properly, the predictions of D1 and D2 on layer 5 should contain a subset of the predictions from distal input (layer 5 itself). So there is indeed expected to be an overlap; the whole behavior production is built on this overlap: if the agent predicts something (layer 5 distal input from layer 5) and that thing is important (layer 5 apical input from D1 and D2), activate that thing. I can clarify a bit more if you can pinpoint the source of confusion. The activations of D1/D2 and layer 5 are different at a given time, but they are all mapped to the next layer 5 activation.
Ok, yes, I read about the bumping, but the parameter declaration was slightly confusing, so I wasn't sure. I will experiment with and without NuPIC's boosting option for now but keep bumping in mind.
Agreed. But your motor layer, from which you choose the 3 winner cells, had the same size as the other layers (512). Did you keep all layer sizes the same for simplicity?
Do you believe it hurts the architecture much if the apical connections from the voluntarily active layer 5 neurons to the motor neurons (to excite/inhibit them) are not learned but constantly mapped instead?
As it is learning by association, I thought it might work this way too, and it would simplify one layer.
The main confusion is caused by the fact that we sample in D1/D2 using a Spatial Pooler from layer 5. This means we will have a completely different columnar activation than in layer 5, as the connections are randomly initialized. Then, even though the neural activation is still the prediction for layer 5, the indices cannot really be mapped to layer 5, as they are distributed completely differently. Thus it does not seem systematic to me that the basal and apical depolarizations aim at the same thing.
Maybe I misunderstood this part and we use the same columnar activation as in layer 5 without Spatial Pooler learning; that would, in my intuition, work more as intended. (I have not tested it yet.)
I also currently have the problem that the mouse movement does not have enough effect on the sensory encoding and the layer 4 neural activation to make a difference.
I believe this is a general problem, especially once the task becomes more complex and detailed: individual pixels become more important but get washed out by the Spatial Pooler. In general, some "attention" mechanism could be tried here to give a certain image region more importance in the encoding (or using saccades and a limited perceptual field, as you did).
In my case I could encode the mouse coordinates next to the image visuals and then combine them in the layer 4 sensory input, sizing them by feature importance. Or I could increase the layer size and sparsity enough to encode the details in the activation. Or do it specifically for the pixels around the mouse (attention-like).
Do you have thoughts on this and what might work best?
I have been travelling the last couple of days so I couldn’t respond quickly.
The motor layer consists of only 30 neurons and no columns. It is as shown on page 28. There are 4 rows just to visualize different states of the same neurons; there is actually a single row of neurons.
So the motor layer can be treated as 30*1 (30 columns and 1 neuron per column). The functionality could be realized by a full layer, but I simplified it this way.
I thought of this myself at some point. It works, but there is a catch: what happens when the layer 5 activation changes slightly because of a new pattern or boosting? In that case, there will be no motor command mapped to the slightly changed layer 5 activation. It can work if the layers are highly stable, but you basically remove noise robustness, and any change requires a new mapping, which kind of contradicts what HTM does.
I think I understand your confusion.
Each layer 5 activation corresponds to different activations in D1 and D2. Suppose that at time t, L5 has activation L5(t), D1 has activation D1(t), and D2 has activation D2(t). Activations D1(t) and D2(t) take activation L5(t) as their input. Therefore D1(t), D2(t), and L5(t) are all different activations. However, there is a relation: both D1(t) and D2(t) occur when they get L5(t) as their input, so they each encode activation L5(t) in their own unique way. As time goes on, D1 and D2 learn all layer 5 activations. On the temporal memory side, D1(t) and D2(t) take their distal input from L5(t-1).
This is like the motor layer association with layer 5, with a single difference. For the motor layer you associate L5(t) with Motor(t). In this case, you associate D1(t-1) with L5(t) through apical connections, instead of D1(t). Same with D2.
So there are also apical connections forming to L5(t) from D1(t-1) and D2(t-1). Any activation occurring in D1 or D2 therefore depolarizes the cells in L5 that are expected to be active at the next time step. What you end up with is that at any given time there are predictive cells in L5(t) that are distally depolarized by the activation from L5(t-1) and apically depolarized by the activations from D1(t-1) and D2(t-1).
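The timing can be summarized in code: apical links into L5 come from the previous D1/D2 activations, while the motor association is same-timestep. Activations are opaque labels here, and the helper is hypothetical:

```python
# Timing of the associations described above, with activations as opaque
# labels. Apical links to L5 come from the PREVIOUS D1/D2 activation,
# while the motor association is same-timestep.
def association_pairs(l5, d1, d2, motor):
    """l5/d1/d2/motor: lists of per-timestep activations (t = 0..T-1)."""
    pairs = []
    for t in range(1, len(l5)):
        pairs.append(("apical", d1[t - 1], l5[t]))   # D1(t-1) -> L5(t)
        pairs.append(("apical", d2[t - 1], l5[t]))   # D2(t-1) -> L5(t)
    for t in range(len(l5)):
        pairs.append(("motor", l5[t], motor[t]))     # L5(t) -> Motor(t)
    return pairs

pairs = association_pairs(["L5a", "L5b"], ["D1a", "D1b"],
                          ["D2a", "D2b"], ["Ma", "Mb"])
assert ("apical", "D1a", "L5b") in pairs
assert ("motor", "L5b", "Mb") in pairs
```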
You could achieve the same thing by directly using the same columnar activation of L5 in D1 and D2; however, that does not seem to be how biology does it, as one activation is in the cortex and one is in the striatum. These should be mapped, but not identical.
This is a VERY crucial problem; I had this sort of problem from the beginning. Currently, you can only get around it by redesigning your encoders. This is a research area on its own. In my case, I was interested in the parts of the image that changed, so I tried implementing an event-based visual sensor here. This allowed the agent to sense what actually changed (what matters). Maybe you can come up with an encoder that 'magnifies' what you need until we can have a crack at the attention problem.
Any of these can be a starting point; you will probably understand what is important about the input after some trials. I really spent weeks trying to come up with something semi-universal to "zoom in" on the important bits of the input. However, biology has very sophisticated tools tailored to just this task, such as the retina and thalamus. If you are interested in how the eye does it, you can read up on neuromorphic vision sensors, but then again, that is another area of research.
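A minimal caricature of the event-based idea, assuming grayscale frames and an arbitrary threshold (this is a sketch, not the thesis implementation):

```python
import numpy as np

def event_encoding(prev_frame, frame, threshold=0.1):
    """Minimal event-style encoder: a bit is active only where the pixel
    changed by more than `threshold` since the last frame, so the
    representation carries what CHANGED rather than what is there.
    Illustrative sketch; the threshold is an arbitrary choice.
    """
    return (np.abs(frame - prev_frame) > threshold).astype(np.uint8)

prev = np.zeros((4, 4))
cur = prev.copy()
cur[1, 2] = 1.0           # the mouse moved: exactly one pixel changed
events = event_encoding(prev, cur)
assert events.sum() == 1 and events[1, 2] == 1
```

A static scene encodes to an all-zero vector, which is exactly why such an encoder emphasizes small but behaviorally relevant changes like mouse movement.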
I edited this bit. For time t, activations D1(t) and D2(t) take their distal input from L5(t-1). It also works if you configure the temporal memory of D1 and D2 as in vanilla HTM, but I found that the prior layer 5 activation as distal input works better, as in the architecture diagram.