Abstract—We make heavy use of the principle of symmetry in the music group to craft a policy for simulated robots. Using the same principles, we create a method for evaluating how close the simulated environment is to a video dataset of movies, and we use this measure to reward the agents in the environment. The hope is that they will learn to fool the algorithm into thinking it is being fed a movie when frames from the simulation are passed to it.

I. INTRODUCTION
The principle of symmetry is useful because it breaks a complex subject into related pieces. The music group shows a surprising amount of rotational symmetry, in particular the symmetry induced by the generator of D12. A musical keyboard can be viewed as a highly regular, non-local cellular automaton, with a particular (well-structured) song forming paths between the leading notes of consecutive chords. Because of the group generator there are also connections within any particular chord. What is interesting about this cellular automaton is that it can be optimised with the whole body of music theory in mind. If we were simply to sample chords on a keyboard without favouring any scale, we would produce mostly dissonant chords. If we instead rewarded this sampling process according to the consonance of each chord, and trained the sampling agent to attain the greatest expected reward, it would learn the whole body of music theory just by maximising this objective. We will describe a way of using this fact to create a system that takes in frames from a movie. When there are people in the frames, the neural network's output layer, which is split into groups of 12 like a keyboard's octaves, is encouraged to have activations that would harmonise had it been a keyboard. When there are no people in the frames, dissonance is encouraged. This helps to segment out human behaviour. We will then train agents in a simulation to fool this network into harmonising its activations.
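The following is a minimal sketch of such a consonance score over an output layer split into groups of 12, where the strongest units in each group are read as sounded pitch classes. The interval-class weights, the top-k selection and the function names are illustrative assumptions rather than fixed parts of the method.

import numpy as np

# Illustrative consonance weights per interval class (0..6 semitones):
# unison/octave and perfect fourths/fifths score high, seconds and the
# tritone score low. These numbers are assumptions, not fixed constants.
CONSONANCE = np.array([1.0, -1.0, -0.5, 0.5, 0.7, 0.9, -0.8])

def consonance_reward(activations, top_k=3):
    # `activations` is a flat vector whose length is a multiple of 12;
    # each block of 12 is read like one octave of a keyboard, and the
    # top_k strongest units per block are treated as sounded pitch classes.
    blocks = np.asarray(activations).reshape(-1, 12)
    pitch_classes = set()
    for block in blocks:
        for idx in np.argsort(block)[-top_k:]:
            pitch_classes.add(int(idx))              # pitch class 0..11
    pcs = sorted(pitch_classes)
    score, pairs = 0.0, 0
    for i in range(len(pcs)):
        for j in range(i + 1, len(pcs)):
            interval = (pcs[j] - pcs[i]) % 12
            interval_class = min(interval, 12 - interval)   # fold to 0..6
            score += CONSONANCE[interval_class]
            pairs += 1
    return score / pairs if pairs else 0.0

Frames containing people would be used to push this score up, while frames without people, or frames from the simulation, would push it down.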
II. MINING FMRI DATA
We plan to use the music group in two ways. The first is as a means of compressing fMRI data from different people into a single cellular-automaton-based representation. Here the cellular automaton refers to the connections induced on D12 by its generators. Different people have different brain graphs, but they all follow a distribution, and that distribution is particular to the capabilities of the human brain in learning, memory and processing. To this end we employ a Vector Quantised Variational AutoEncoder (VQ-VAE) [1]. A VQ-VAE belongs to the variational autoencoder family of algorithms, which are generative in nature and employ some sort of bottleneck in their construction as they map an input back to itself. After training, the bottleneck is forced to contain the essential attributes of the inputs, since it must recreate the same sample after losing information. In a VQ-VAE the bottleneck consists of discrete codes that are indexed by the encoder and fed to the decoder. We will feed frames from an fMRI dataset into the VQ-VAE. In our formulation the codes are fixed in a cyclic vector representation and grouped in 12s, and the distances between the encoder output and the codes are rewarded for harmonising. Furthermore, the progression the codes make is encouraged to follow a circle-of-fifths progression, with shorter jumps favoured. This process creates a single representation for the fMRI data from different brains using the graph-based non-local cellular automaton of D12. We shall train the agents in the simulation to minimise the distance between their internal representation and the codes found by this VQ-VAE.
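As one way of making the circle-of-fifths constraint concrete, the sketch below penalises long jumps between consecutive codebook indices, reading each index modulo 12 as a pitch class. The mapping and the name progression_penalty are illustrative assumptions; the term would be added to the usual VQ-VAE reconstruction and commitment losses.

import numpy as np

def circle_of_fifths_distance(pc_a, pc_b):
    # Steps between two pitch classes (0..11) on the circle of fifths.
    # Multiplying by 7 semitones (a perfect fifth) reorders the twelve
    # pitch classes into circle-of-fifths order, so neighbouring fifths
    # end up one step apart.
    pos_a, pos_b = (pc_a * 7) % 12, (pc_b * 7) % 12
    diff = abs(pos_a - pos_b)
    return min(diff, 12 - diff)

def progression_penalty(code_indices):
    # `code_indices` holds the codebook entries the encoder picked for
    # consecutive fMRI frames; each index mod 12 is read as a pitch
    # class, and long jumps between consecutive frames are penalised so
    # the code sequence drifts around the circle of fifths in small steps.
    pcs = np.asarray(code_indices) % 12
    steps = [circle_of_fifths_distance(pcs[t], pcs[t + 1])
             for t in range(len(pcs) - 1)]
    return float(np.mean(steps)) if steps else 0.0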
III. A REWARD OF CONSONANCE
As mentioned, we will pass a movie dataset to an algorithm that seeks to maximise harmonisation between its activations whenever there are people in the frames, while maximising dissonance when there are none or when the frames show simulated agents. This is so that it identifies harmony with human-like behaviour. The setup functions somewhat like a GAN, because we will later pass frames from the simulation to this algorithm and reward the agents, this time, for making the algorithm's activations harmonise, while the algorithm itself is not trained. The agents' policy will be formed from a VQ-VAE that maps actions to actions, conditioned on the state. We will also minimise the distance between the agents' code activations and those from the fMRI data in the following way. Similar to how we train the agents, we pass both sets of codes to a second algorithm. On the fMRI codes this algorithm is encouraged to harmonise; on the agents' codes it is not. We then train the encoder of the agents' VQ-VAE to fool this new algorithm.
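The alternation can be summarised schematically as follows, reusing the consonance score sketched in the Introduction. The names critic, has_people and agent_reward are placeholders for components not specified in detail here, and the squared-error form of the critic objective is only one possible choice.

# Schematic of the GAN-like alternation. `critic` stands for the network
# whose final layer is split into groups of 12, and `consonance_reward`
# is the scoring sketch from the Introduction; both are placeholders.

def critic_target(has_people, is_simulated):
    # Movie frames containing people should harmonise; empty frames and
    # frames rendered from the simulation should not.
    return 1.0 if (has_people and not is_simulated) else -1.0

def critic_loss(critic, frame, has_people, is_simulated):
    # Trained on the movie dataset plus simulator frames: the consonance
    # score of the critic's activations is pushed toward +1 or -1
    # depending on the frame's label.
    score = consonance_reward(critic(frame))
    return (score - critic_target(has_people, is_simulated)) ** 2

def agent_reward(critic, sim_frame):
    # With the critic frozen, the agents are rewarded purely for making
    # its activations harmonise on frames rendered from the simulation.
    return consonance_reward(critic(sim_frame))

The same scheme, applied to the two sets of codes rather than to frames, is what drives the agents' code activations toward those extracted from the fMRI data.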