The CC algorithm so far covers recording and recall, and assumes it receives correct motor commands/actions.
There are still “unused” layers and neurons that will probably add more functionality, but I don’t expect decision making to happen in the same CC.
That’s why I’m trying to figure out the Decision loop; only then can we have simple Agents doing useful tasks.
We can start with simple tasks such as object recognition OR path-finding.
The current CC model relies on the L6-L4 loop.
The process as I understand it is:
INPUT: Sensor information and Motor Action
OUTPUT: Location and Features (to the Thalamus)
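To pin that interface down, here is a hypothetical Python sketch; the class and method names are my own, not part of any existing HTM library:

```python
from typing import FrozenSet, Tuple

SDR = FrozenSet[int]  # an SDR represented as the set of its active bit indices

class CorticalColumn:
    """Recording and recall only; decision making is assumed to live elsewhere."""

    def step(self, sensor: SDR, motor_action: SDR) -> Tuple[SDR, SDR]:
        # One step of the L6-L4 loop: given the current sensation and the
        # motor action that produced it, return (location, features).
        raise NotImplementedError
```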
So far we have elaborated a RECORDING device.
What we need next is a Decision maker that converts L&F to an Action.
I know of two mechanisms to do that: Reinforcement Learning and Planning.
Let’s concentrate on RL in this post.
What are the requirements of RL? At its simplest, a lookup table with the following format:
State:Action:Accumulated-discounted-reward
i.e.:
Loc&Feature:Action:Q-value
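To make the table concrete, here is a minimal tabular Q-learning sketch (plain textbook RL, nothing CC-specific; states and actions are ordinary hashable keys for now, and how to key this with SDRs is exactly the open question below):

```python
import random
from collections import defaultdict

# Minimal tabular Q-learning: (state, action) -> accumulated discounted reward.
class QTable:
    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)            # (state, action) -> Q-value
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, state):
        if random.random() < self.epsilon:     # occasional exploration
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```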
In addition we need a GOAL: recognize an object (Layer 2/3 representation stabilizes ???) OR find a path (reach a destination landmark/Sense).
I have no idea how the RL loop is connected: does it use the Basal Ganglia?
Or does it go somewhere in the Cortex via the Thalamus?
We have two problems:
The Goal: in both cases, Obj-recognition and Path-finding, we need an SDR stored in memory against which to compare (a simple overlap check, sketched after this list).
If the Goal is unclear/fuzzy (i.e. we have never seen this object) we need a separate process; let’s ignore it for now.
The Q-value: in traditional RL this is a real number, but we can’t use real numbers here!
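For the Goal side, the comparison itself can be plain SDR overlap against the memorized SDR; a minimal sketch (the 0.8 threshold is an arbitrary assumption):

```python
# Goal test as overlap between the current SDR and a memorized goal SDR.
# SDRs are represented as frozensets of active bit indices.
def goal_reached(current: frozenset, goal: frozenset, threshold: float = 0.8) -> bool:
    if not goal:
        return False
    overlap = len(current & goal) / len(goal)   # fraction of goal bits matched
    return overlap >= threshold
```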
The Loc&Feature => Action mapping can probably be solved by a simplified TMem used as a Classifier, as sketched below.
(In fact, every time I tried implementing TM /3 times/ the CORE was a classifier which I then extended to TM, so this functionality comes for free.)
The Goal could be just another transition: MemorizedSDR => END-Action.
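Here is a rough sketch of what I mean by a TM core used as a Classifier: memorize SDR => action transitions and recall the action of the best-overlapping stored SDR; the Goal then really is just one more entry whose action is END. (My own toy sketch, not an existing TM API.)

```python
# Simplified "TM-as-classifier": store SDR -> action transitions and recall by
# best overlap. SDRs are frozensets of active bit indices.
class SDRClassifier:
    def __init__(self):
        self.transitions = []                          # list of (sdr, action)

    def memorize(self, sdr: frozenset, action: str):
        self.transitions.append((sdr, action))

    def recall(self, sdr: frozenset, min_overlap: int = 1):
        best_action, best_overlap = None, min_overlap - 1
        for stored, action in self.transitions:
            overlap = len(sdr & stored)
            if overlap > best_overlap:
                best_action, best_overlap = action, overlap
        return best_action                             # None if nothing matches

# The Goal as just another transition:
# classifier.memorize(memorized_goal_sdr, "END")
```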
The biggest problem I have is the Q-value … real numbers with precision are not suitable for SDR representation.
So one solution is to keep a real-valued table outside of the SDR loop (sketched below).
The other option would be a different mechanism altogether?
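One concrete version of the “outside the SDR loop” option: key the scalar Q-values by the SDR itself (a frozenset of active bits is hashable), so the real numbers never have to be encoded as SDRs. A hypothetical sketch:

```python
from collections import defaultdict

# Real-valued Q-table kept outside the SDR loop: the SDR (frozenset of active
# bit indices) is only used as a dictionary key; the Q-values stay plain floats.
q_values = defaultdict(float)                 # (sdr, action) -> float

state = frozenset({3, 17, 42, 101})           # an example SDR
q_values[(state, "move-north")] = 0.75        # ordinary float, never an SDR

actions = ["move-north", "move-south"]
best_action = max(actions, key=lambda a: q_values[(state, a)])
```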
So take it from here. What are your proposals?
PS> Here I’m assuming we use RL when we have a partial map/reference frame or none at all, i.e. we are exploring. Planning/Search would be used when we know the steps but there are different paths. If you don’t agree, restate the problem your way.