Seems interesting. Thoughts?
Only just started reading it, but one of the initial assertions strikes me as a bit odd:
Human cognition processes can make future predictions without any previous learning? I’m pretty sure I’m misinterpreting their point… will have to revisit this statement after I get a better handle on the theory I think.
I raised an eyebrow at that too, but I felt I understood them a bit better once I was done with the document. I kind of understand the concepts, but not via the math.
The lack of need for learning depends on having access to a simulator which itself has either perfect knowledge, or a good approximation to it, of the behaviour of the environment and the consequences of actions. If you start with no (or inadequate) knowledge of the environment and the effect of actions, the simulator would need to be learnt before Fractal AI could be used to choose decisions.
The key difference is that learning the incremental behaviour of the environment and the effect of actions on state is a supervised learning problem, whereas learning a policy for behaviour from scratch is a reinforcement learning problem. Reinforcement learning problems are in a sense the more complex of the two. FAI gives a systematic, reliable way to bridge from the solution of the supervised learning problem to the reinforcement learning problem.
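To make the supervised half concrete, here is a minimal sketch of learning a next-state predictor from logged transitions. Everything in it is an assumption for illustration: the toy 1-D environment, the linear model, and the function names are mine, not anything taken from the Fractal AI paper.

```python
import numpy as np

def fit_linear_predictor(states, actions, next_states):
    """Least-squares fit of next_state ~= [state, action, 1] @ weights."""
    inputs = np.hstack([states, actions, np.ones((len(states), 1))])
    weights, *_ = np.linalg.lstsq(inputs, next_states, rcond=None)
    return weights

def predict_next_state(weights, state, action):
    """Apply the fitted linear model to a single (state, action) pair."""
    features = np.concatenate([state, action, [1.0]])
    return features @ weights

# Hypothetical toy environment: a 1-D point whose position shifts by the action.
rng = np.random.default_rng(0)
states = rng.normal(size=(200, 1))
actions = rng.uniform(-1.0, 1.0, size=(200, 1))
next_states = states + actions  # the "labels" for supervised learning

weights = fit_linear_predictor(states, actions, next_states)
prediction = predict_next_state(weights, np.array([0.5]), np.array([0.25]))
```

The point is only that the targets (next states) are observed directly, so an ordinary regression fit is enough; a real environment would need a far richer model than a linear one.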
Hi Paul, as a disclaimer I am the main author,
Yeah, I should have been more cautious in this statement. Learning is actually needed in order to build a reliable prediction of your system's next state (a “simulator”, a word not well received in this community, so feel free to translate it into “predictor”). But the idea is that, given a function that predicts the next state from an initial state and an action, we focus on how to use this information to deal with the basic goal of intelligence: taking the proper decision on which action to take next.
In a more general setup, you would need some form of NN such as a VAE to get a good embedding of the state, plus an LSTM to model the cause and effect of the system dynamics, so that given a time series of state embeddings you can actually predict the next states properly.
So basically here I try to answer the question: given an already learned predictor of the next state of your system (or a magically given one, like an Atari emulator), plus a secondary reward predictor (in the Atari games, a way to read the game score), what is the best policy you could build to choose among your available actions so that your agent behaves as “intelligently” as possible?
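The problem setup can be sketched as code. Note that this is a naive random-rollout baseline and not the Fractal AI algorithm; it only shows the interface being assumed, namely a `predict(state, action)` function and a `reward(state)` function. Those names, the toy problem, and the default parameters are all hypothetical.

```python
import random

def choose_action(state, actions, predict, reward,
                  n_rollouts=50, horizon=10, seed=0):
    """Return the first action whose random rollouts collect the most reward."""
    rng = random.Random(seed)
    totals = {a: 0.0 for a in actions}
    for first_action in actions:
        for _ in range(n_rollouts):
            # Take the candidate action, then act randomly until the horizon.
            s = predict(state, first_action)
            total = reward(s)
            for _ in range(horizon - 1):
                s = predict(s, rng.choice(actions))
                total += reward(s)
            totals[first_action] += total
    return max(actions, key=lambda a: totals[a])

# Hypothetical toy problem: walk along a line; reward is closeness to position 5.
def predict(state, action):
    return state + action

def reward(state):
    return -abs(state - 5)

best = choose_action(0, [-1, 1], predict, reward)
```

Any planner that fits this signature could be swapped in; the question in the post is what the best such planner looks like.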
Hi Jordan, thanks for opening this discussion!
The math part of the paper is not closely related to the idea behind the algorithm. The algorithm aims to make the distribution of visits (the states sampled by the algorithm) equal to the distribution of rewards found at those states. Rewards can be negative or oddly shaped, so we first “normalize” them using the relativize() function, a variation of the z-score function. So basically you don’t need all the math for this.
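As a rough illustration of the normalization step, here is a sketch of a relativize-style function. The first step is a plain z-score, as described above. The second step, which squashes z-scores into strictly positive numbers, is an assumption of this sketch and the post does not spell out the exact variation used.

```python
import numpy as np

def relativize(values):
    """Map arbitrary rewards to strictly positive scores, preserving order.

    Step 1: plain z-score.
    Step 2 (assumed here): exp(z) for z <= 0, and 1 + log(1 + z) for z > 0.
    """
    values = np.asarray(values, dtype=float)
    std = values.std()
    if std == 0:
        # All rewards equal: every state gets the same score.
        return np.ones_like(values)
    z = (values - values.mean()) / std
    positive_branch = 1.0 + np.log1p(np.maximum(z, 0.0))
    negative_branch = np.exp(np.minimum(z, 0.0))
    return np.where(z > 0, positive_branch, negative_branch)

rewards = np.array([-10.0, 0.0, 3.0, 250.0])  # negative and oddly shaped
scores = relativize(rewards)
target = scores / scores.sum()  # reward distribution the visits should match
```

Once the scores are positive they can be normalized into a proper distribution, which is what the visit distribution is then compared against.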
The math part is about defining a measure of how efficient a planning algorithm is: compute those two distributions, get their mutual information, and finally normalize it so that 0 is a random policy and 1 is maximal mutual information (where the mutual information equals the entropy of one of the distributions). This is just a way of benchmarking different versions of the algorithm against each other, or against other planning algorithms like MCTS.
Somehow I regret adding all this math for so little meaning, but I also felt some formalization was OK, as the main idea is so simple and difficult to compare to other approaches.
Also there was a feeling of completeness: the idea starts with a version of the 2nd law, maximizing a kind of entropy, but this is lost in the simplicity of the algorithm, which looks as if it had diverged totally from the underlying first principles. Showing that we are still maximizing a kind of entropy (mutual information) solved this lack of completeness, and also showed a nice way to add rewards into the entropy formulation used, so in the end I kept these ugly formulas in the doc.
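For readers who prefer code to formulas, a normalized mutual information score of this general kind can be sketched as follows. How the paper builds the joint distribution from visits and rewards is not spelled out in this thread, so the joint table is a hypothetical input here, and dividing by the smaller marginal entropy is one common normalization that I am assuming.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def normalized_mutual_information(joint):
    """Mutual information of a joint table, divided by the smaller marginal entropy."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()
    px = joint.sum(axis=1)  # marginal over rows
    py = joint.sum(axis=0)  # marginal over columns
    outer = np.outer(px, py)
    mask = joint > 0
    mi = float((joint[mask] * np.log(joint[mask] / outer[mask])).sum())
    h = min(entropy(px), entropy(py))
    return mi / h if h > 0 else 0.0

# Independent variables carry no mutual information (score near 0).
independent = np.outer([0.5, 0.5], [0.25, 0.75])
# Perfectly matched variables: mutual information equals the entropy (score 1).
matched = np.diag([0.2, 0.3, 0.5])

score_independent = normalized_mutual_information(independent)
score_matched = normalized_mutual_information(matched)
```

The two toy tables correspond to the two ends of the scale described above: no relationship at one end, and mutual information equal to the entropy of one of the distributions at the other.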
It should be possible to implement fractals, and also chaotic and pseudorandom behavior, with recurrent ReLU networks. It would be interesting if someone were to prove that mathematically.
Fractal AI is all about sampling the space intelligently, right? It answers the question, “where should attention flow?” (attention management) rather than, “what am I looking at here?” (pattern recognition).
Since I discovered Fractal AI I found Karl Friston’s work on the free energy principle and I’d like to get your thoughts. It seems his work is answering the same question as yours, am I right about that?
My very cursory glance tells me that his answer has something to do with looking wherever maximum information will be gained. In other words, if I have a prediction that has a moderately high chance of being violated (my confidence is low), and if it is violated I learn a lot about the world (huge information payoff), I should look at that part of the environment (I should test the prediction).
It seems to me that Fractal AI does the same kind of thing (seeking the boundary of things), but does it in a heuristic fashion. Is that an accurate comparison of the two?
I always find it enlightening to compare two things that seem similar, or seem to accomplish similar things. That way I learn the differences and where they align.