@jakebruce You’re welcome
So the scope of the thesis was to propose a real-time, HTM-based autonomous agent guided by neurobiological research. The task itself was daunting enough given the aim. I would have loved to include comparisons, but they were low priority in the end.
Do you mean an HTM-QL system? I have not compared HTM-TD(lambda) to HTM-QL because it did not make sense in terms of neurobiology. QL decouples actions from states, which we (the HTM community) believe is not true: Layer 5 output is both the state and the action. Also, in QL what the agent actually does next does not affect the value update (it is off-policy). So there is that too. In addition, none of the computational models of the basal ganglia utilize QL, but there are ones imitating TD(lambda) because of its correlation with dopamine secretion in the striatum.
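To make the on-policy vs. off-policy point concrete, here is a minimal tabular sketch of the two textbook update rules. It is not the HTM architecture from the thesis; the function names, table shapes and hyperparameters (`alpha`, `gamma`, `lam`) are assumptions chosen for illustration.

```python
import numpy as np

def td_lambda_update(V, trace, s, r, s_next, alpha=0.1, gamma=0.9, lam=0.8):
    """One TD(lambda) step on a state-value table V with accumulating traces."""
    delta = r + gamma * V[s_next] - V[s]  # TD error (the signal compared to dopamine)
    trace *= gamma * lam                  # decay every eligibility trace
    trace[s] += 1.0                       # mark the state that was just visited
    V += alpha * delta * trace            # credit all recently visited states
    return delta

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One Q-learning step on a state-action table Q."""
    # The target bootstraps from the greedy next action, so whatever the
    # agent actually does in s_next never enters the update (off-policy).
    target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
    return target

# Toy usage: three states, reward on the second transition.
V = np.zeros(3)
trace = np.zeros(3)
td_lambda_update(V, trace, s=0, r=0.0, s_next=1)
td_lambda_update(V, trace, s=1, r=1.0, s_next=2)

Q = np.zeros((3, 2))
Q[1] = [0.5, 2.0]
q_learning_update(Q, s=0, a=0, r=0.0, s_next=1)
```

In the TD(lambda) case the reward on the second step also raises the value of state 0 through its trace, which is the kind of delayed credit assignment the basal ganglia models mentioned above try to imitate.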
If you are asking about just applying QL or advantage actor-critic to the task without HTM, I guess the agent state would be the 2D image. No, I have not tried that, but I am almost sure that a basic QL would beat the architecture by quite a margin IF you could represent all the possible visual data as distinct states and fit them in memory. I think this would also run faster than the proposed architecture given the complexity of the learning task, a basic POMDP.
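As a rough sketch of what "every image is its own state" would mean, a tabular QL agent could key its table on the raw frame bytes. Everything here is hypothetical (the frame size, `N_ACTIONS`, the helper names); it only shows why the memory caveat matters.

```python
from collections import defaultdict
import numpy as np

N_ACTIONS = 3  # hypothetical action count

# One row of action values per distinct frame ever seen.
Q = defaultdict(lambda: np.zeros(N_ACTIONS))

def state_key(frame):
    """Treat the raw pixels as the state identity (no generalization at all)."""
    return frame.tobytes()

def q_update(frame, action, reward, next_frame, alpha=0.1, gamma=0.9):
    s, s_next = state_key(frame), state_key(next_frame)
    target = reward + gamma * Q[s_next].max()
    Q[s][action] += alpha * (target - Q[s][action])

# Five distinct toy frames; every new frame adds a table entry.
frames = [np.full((4, 4), i, dtype=np.uint8) for i in range(5)]
for t in range(4):
    q_update(frames[t], action=t % N_ACTIONS, reward=1.0, next_frame=frames[t + 1])

table_size = len(Q)
# Even a binary 4x4 image already has 2 ** 16 possible states.
binary_4x4_states = 2 ** (4 * 4)
```

The table only stays manageable while the number of distinct frames is small, which is the big IF above.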
I am definitely interested in this, especially in VizDoom, since I was at the conference where it was first presented (CIG 2016). They also host some other video game AI benchmarks like GVGAI (General Video Game AI). On the other hand, I argued at that same conference that our current benchmark environments are severely limiting us on the path to general intelligence. We are only evaluating the output of the underlying intelligence through these benchmarks, not the functionality of intelligence. This is why I am interested in HTM.
I totally agree with you on that, and the study is surely missing some form of comparison (other than random walk) at this point. Then again, I am sure HTM in its current state, and by extension this architecture, would get butchered by the other state-of-the-art approaches in these benchmarks, and I think you know why. One could only present that it learns in real time, online and continuously as advantages if you leave out the neurobiology part. Now if there was another approach that claimed neurobiological plausibility and also had these benchmarks, then a comparison would be meaningful. The closest one is Nengo/Spaun, and they are understandably not interested in these benchmarks. So I guess what I am trying to say is, this sort of approach misses the point of HTM.
There was another AI benchmark proposed a year ago: GoodAI's General AI Challenge. I think this would be a better candidate. Having read their evaluation metrics, I am not sure they were able to come up with a proper benchmark design, but it looks better than what we have for evaluating general intelligence. It is hard to evaluate GI after all.
Goals of the Round
o To get working examples of agents that can acquire skills in a gradual manner and use learned skills to learn new skills (increasing the efficiency of learning).
o We are not optimizing for agent’s performance of existing skills (how good an agent is at delivering solutions for problems it knows). Instead, we are optimizing for agent’s performance on solving new/unseen problems.
Example:
if an agent is presented with a new/unseen problem, how fast (i.e. in how many simulation steps) will it deliver a good solution? This also includes a question of how fast the agent will be at discovering this new solution. If the agent has already learned to find solutions for similar problems, it should use existing skills in order to discover the new skill.
o Agents must provably use gradual learning and will be evaluated on how fast they are (how many simulation steps do they need) at discovering acceptable solutions to new tasks.
o Agents won’t be evaluated on task performance, cumulative reward etc.
This looks like a much better fit for the architecture I proposed. I just spawn the agent in and watch it learn. I could take the same agent, put it in a new task and watch it learn again. Of course, it is not very good at it at this point. I would love it if they came up with a way to evaluate spatiotemporal abstractions, where the agent gradually works on higher-level abstractions (union/temporal pooling, as you also know).
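The metric in the quoted goals (how many simulation steps the same agent needs on each new task) could be measured along these lines. This is only a toy sketch: `GuessTask`, `SweepAgent` and their methods are made up for illustration and have nothing to do with the challenge's actual API or with my architecture.

```python
class GuessTask:
    """Hypothetical toy task: solved when the agent outputs the hidden target."""
    def __init__(self, target):
        self.target = target

    def reset(self):
        return None  # no observation in this toy

    def step(self, action):
        return None, action == self.target


class SweepAgent:
    """Hypothetical agent that keeps its last answer and sweeps upward from it."""
    def __init__(self):
        self.guess = 0

    def act(self, obs):
        return self.guess

    def learn(self, solved):
        if not solved:
            self.guess += 1  # keep what worked, otherwise move on


def steps_to_solution(agent, task, max_steps=1000):
    """Count simulation steps until the task reports an acceptable solution."""
    obs = task.reset()
    for step in range(1, max_steps + 1):
        obs, solved = task.step(agent.act(obs))
        agent.learn(solved)
        if solved:
            return step
    return None  # not solved within the budget


# The same agent is dropped into one task after another, without a reset.
agent = SweepAgent()
steps = [steps_to_solution(agent, GuessTask(t)) for t in (5, 7)]

# A fresh agent on the second task, for comparison.
fresh_steps = steps_to_solution(SweepAgent(), GuessTask(7))
```

Here the persistent agent needs fewer steps on the second task than a fresh one, which is the kind of gradual-learning effect the round is meant to reward, rather than cumulative reward on a single task.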
TL;DR: Thanks for the crucial questions. A VizDoom benchmark would certainly be helpful but it had a lower priority compared to presenting a better architecture. Hopefully in the future.