This is a diagram of my agent.
Note the difference from the usual diagram.
The policy consists of nodes (a spiking network or something else).
But the policy is split into two disjoint pieces, A and B, that do not communicate directly with each other.
They still communicate indirectly, as I will show shortly.
A receives the state, which causes particular nodes, dependent on the state, to fire.
B is stuck with nodes that fire in a pattern through feedback, so the pattern is constantly firing, with different nodes switching on and off, initially at random.
The particular pattern B fires with determines the actions. The pattern is randomly initialized, so the actions start out random, but they are always determined by the pattern B fires with.
The actions then cause the state to transition, which changes the inputs to A and causes A's nodes to fire in a particular way.
In this way A's pattern depends indirectly on what B's pattern does, because
B's patterns -> Actions -> State transitions -> A's pattern
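A minimal Python sketch of this loop, under assumptions I am choosing only for illustration: B is modeled as a small recurrent binary threshold network with random weights and biases, B's pattern is read as a binary number to pick one of a few actions, the environment is a hypothetical random transition table, and A simply fires a fixed random pattern for each state. The sizes and the names `PatternB`, `pattern_to_action` and `run_loop` are hypothetical; the point is only that A and B never touch each other except through the B -> actions -> state -> A pathway.

```python
import random

# Hypothetical sizes; nothing in the design fixes these.
N_A, N_B, N_STATES, N_ACTIONS = 8, 8, 16, 4


class PatternB:
    """B: a recurrent binary threshold network whose pattern feeds back into itself."""

    def __init__(self, n, rng):
        # Random recurrent weights, biases and starting pattern (random initialization).
        self.w = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
        self.bias = [rng.uniform(-0.5, 0.5) for _ in range(n)]
        self.pattern = [rng.randint(0, 1) for _ in range(n)]

    def step(self):
        # Node i fires if its weighted input from B's previous pattern plus its bias is positive.
        n = len(self.pattern)
        self.pattern = [
            1 if sum(self.w[i][j] * self.pattern[j] for j in range(n)) + self.bias[i] > 0 else 0
            for i in range(n)
        ]
        return list(self.pattern)


def pattern_to_action(pattern, n_actions=N_ACTIONS):
    # Read B's pattern as a binary number and fold it onto the action set.
    return sum(bit << i for i, bit in enumerate(pattern)) % n_actions


def run_loop(steps, seed=0):
    rng = random.Random(seed)
    b = PatternB(N_B, rng)
    # Hypothetical environment: a deterministic, randomly generated transition table.
    transitions = [[rng.randrange(N_STATES) for _ in range(N_ACTIONS)] for _ in range(N_STATES)]
    # A only sees the state: each state makes a fixed set of A nodes fire.
    a_encoding = [[rng.randint(0, 1) for _ in range(N_A)] for _ in range(N_STATES)]
    state = 0
    trace = []
    for _ in range(steps):
        b_pattern = b.step()                   # B fires from its own feedback only
        action = pattern_to_action(b_pattern)  # B's pattern -> action
        state = transitions[state][action]     # action -> state transition
        a_pattern = list(a_encoding[state])    # state -> A's pattern
        trace.append((b_pattern, action, state, a_pattern))
    return trace
```

For example, `for b_pat, action, state, a_pat in run_loop(5): print(b_pat, action, state, a_pat)` prints five steps of the loop. Because B's update here is deterministic after initialization, its pattern eventually settles into a cycle, which is one simple reading of "constantly firing" feedback.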
If we assign the nodes in A and B to the notes on a musical keyboard as if they were one combined set of nodes (not disjoint), and then give B a reward according to the consonance of B *combined* with A,
then B has to play a progression that, through the indirect pathway, causes A to play in such a way that the two sets of nodes harmonize.
B controls everything. Depending on what it plays, A will respond through the effect B has on the state transitions, and hence on which states A receives as input.
This in turn affects the reward B gets when the consonance of the combined set of A and B nodes is evaluated.
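One way to make the consonance reward concrete, as a sketch only: lay the combined A+B nodes out on consecutive semitone keys (A's nodes first, then B's), take every pair of keys that are firing, and average a per-interval consonance score. The `CONSONANCE` table is a crude, hypothetical ranking of interval classes assumed for illustration, not a psychoacoustic model, and `consonance_reward` is a hypothetical name. The reward is computed over A and B combined but would be delivered to B only.

```python
from itertools import combinations

# Hypothetical consonance score per interval class (semitones mod 12).
# Unison/octave and fifths score high; seconds, sevenths and the tritone score low.
CONSONANCE = {0: 1.0, 1: 0.0, 2: 0.2, 3: 0.7, 4: 0.7, 5: 0.8,
              6: 0.1, 7: 0.9, 8: 0.6, 9: 0.6, 10: 0.3, 11: 0.0}


def active_keys(a_pattern, b_pattern):
    # Treat A and B as one combined keyboard: A occupies the lower keys, B the keys after it.
    combined = list(a_pattern) + list(b_pattern)
    return [key for key, fired in enumerate(combined) if fired]


def consonance_reward(a_pattern, b_pattern):
    # Average pairwise consonance of all keys firing in A and B together.
    keys = active_keys(a_pattern, b_pattern)
    pairs = list(combinations(keys, 2))
    if not pairs:
        return 0.0  # fewer than two notes: no interval to judge
    return sum(CONSONANCE[abs(x - y) % 12] for x, y in pairs) / len(pairs)
```

With 8 A nodes, `consonance_reward([1, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 1, 0, 0, 0])` fires keys 0 and 12, an octave apart, and returns 1.0. B can only raise this reward by choosing actions that lead to states whose A pattern fits with its own.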
Why is this useful?
For one, we know that the optimal strategy for B is to arrange for similar states to be visited: since A's firing depends only on the state, B can only reliably harmonize with A if it can predict which states will occur. The expected state visits should have low entropy, which makes the problem more tractable.
From the perspective of B's problem, unpredictability is bad and randomness is the worst case, so order, or lower entropy, will naturally be favored.
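To make "low entropy of state visits" concrete, one option (an assumption about how to measure it, not the only choice) is the Shannon entropy, in bits, of the empirical distribution of states visited over an episode. It is 0 when a single state is revisited every step and log2(k) when k states are visited equally often; `visit_entropy` is a hypothetical name.

```python
import math
from collections import Counter


def visit_entropy(states):
    # Shannon entropy (bits) of the empirical state-visit distribution.
    counts = Counter(states)
    total = len(states)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

For example, `visit_entropy([3, 3, 3, 3])` is 0.0 and `visit_entropy([0, 1, 2, 3])` is 2.0, so an agent that keeps returning to the same few states scores lower than one that wanders at random.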
So, in order to form a grammar with its activations, B must treat the state as an extension of that grammar so that it can modulate the A nodes appropriately.
This is what we want: an agent that seeks the lowest net cumulative entropy in its state visits.
Why is this useful? When the state is classified according to a T system, the agent, if it is human-looking enough, will place itself in the same tile, since this is an optimization process and that is optimal.
From there, other tiles include language and, in the process of modulating language, higher "thought".
In fact, the A nodes will form a manifold that interpolates these tiles in the direction of reducing entropy, which is exactly what we want.
Humans do this, and it has led to civilization, science, art, literature, etc.