Using HTM for branch prediction

Hey! I know a fair bit about HTM, and know a fair bit about microarchitecture! I’ll throw in my two cents here! Feel free to ask me any questions as well!

First off, it seems like most people here don’t have a great idea of how modern branch predictors work, and what actually makes them work as well as they do.

These predictors don't actually consist of just one predictor. They're normally a large table of thousands of small predictors. Every branch gets mapped to a table entry based on a hash of its memory address, and that entry then learns patterns for that specific branch (and any others that happen to map to the same entry).

A table of 2-bit saturating counters (literally working off "which way has this branch gone the last few times it appeared?") can get up to about 93% accuracy on most code.
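To make that concrete, here's a toy software sketch of that kind of table. The class name, table size, and addresses are all made up for illustration; nothing here is from a real chip:

```python
# Toy sketch of a "bimodal" predictor: a table of 2-bit saturating counters
# indexed by a hash of the branch's address (here just a modulo).
# Counter values 0-1 predict not-taken, 2-3 predict taken.

class BimodalPredictor:
    def __init__(self, table_size=4096):
        self.table = [1] * table_size  # start every counter weakly not-taken
        self.size = table_size

    def predict(self, branch_addr):
        return self.table[branch_addr % self.size] >= 2

    def update(self, branch_addr, taken):
        i = branch_addr % self.size
        if taken:
            self.table[i] = min(3, self.table[i] + 1)  # saturate at 3
        else:
            self.table[i] = max(0, self.table[i] - 1)  # saturate at 0
```

Run a typical loop branch through it (taken 9 times, then not taken once, repeated) and after one warm-up pass it only mispredicts the loop exit, which is exactly the behavior that gets you into the ~90% range.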

There are more sophisticated schemes as well. For example, most Intel CPUs from the past decade or so have used some form of two-level predictor. Instead of the address mapping to a single predictor in a 1D table, it maps to a row of predictors in a 2D table, and the outcomes of the previous N branches (the history) select exactly which predictor from that row (which column) to use.
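Here's the same toy model extended with that 2D indexing (again, the sizes and names are mine, picked for illustration):

```python
# Toy sketch of a two-level predictor: the branch address selects a row of
# 2-bit counters, and the global history of the last HISTORY_BITS outcomes
# selects the column within that row.

HISTORY_BITS = 4

class TwoLevelPredictor:
    def __init__(self, rows=1024):
        # 2**HISTORY_BITS counters per row: this is the exponential cost
        self.table = [[1] * (2 ** HISTORY_BITS) for _ in range(rows)]
        self.rows = rows
        self.history = 0  # last HISTORY_BITS outcomes packed into an int

    def predict(self, branch_addr):
        return self.table[branch_addr % self.rows][self.history] >= 2

    def update(self, branch_addr, taken):
        row = self.table[branch_addr % self.rows]
        if taken:
            row[self.history] = min(3, row[self.history] + 1)
        else:
            row[self.history] = max(0, row[self.history] - 1)
        # shift the actual outcome into the global history register
        self.history = ((self.history << 1) | int(taken)) % (2 ** HISTORY_BITS)
```

The payoff: a strictly alternating branch (taken, not-taken, taken, ...) defeats a single 2-bit counter, but here the two history states it cycles through land on different counters, so after a short warm-up it predicts the pattern perfectly.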

The reason AMD is using neural networks now is that the whole problem here is getting as much predictive power out of as little circuitry as possible. The problem with two-level predictors is that the size of each row in the table scales exponentially with the length of the history (for N history bits, you need 2^N predictors per row). Throw a perceptron at the problem instead and you can get similar predictive power with only linear scaling (one weight per history bit). Each individual predictor is more expensive to implement, but the overall storage grows linearly rather than exponentially.

It’s also important to remember that these aren’t very sophisticated neural networks. Basically they’re just “take the last N history bits, multiply each by a corresponding weight, sum them up, and check whether the sum is positive or negative. Positive = branch taken, negative = branch not taken. Adjust the weights based on the actual outcome.” No deep neural networks, no fancy backprop, just a single-layer perceptron: a weighted sum of the last N branch outcomes.
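Here's roughly what that looks like as a toy software model, loosely in the style of the published perceptron-predictor work (the history length, table size, and training threshold are all made-up illustrative values):

```python
# Toy sketch of a perceptron predictor: one small weight vector per table
# entry, dotted with the global history (outcomes encoded as +1 / -1).

HISTORY_LEN = 8
THRESHOLD = 2 * HISTORY_LEN  # stop training once the sum is confident enough

class PerceptronPredictor:
    def __init__(self, entries=256):
        # one bias weight plus one weight per history bit: linear scaling
        self.weights = [[0] * (HISTORY_LEN + 1) for _ in range(entries)]
        self.entries = entries
        self.history = [1] * HISTORY_LEN  # +1 = taken, -1 = not taken

    def _output(self, branch_addr):
        w = self.weights[branch_addr % self.entries]
        return w[0] + sum(wi * hi for wi, hi in zip(w[1:], self.history))

    def predict(self, branch_addr):
        return self._output(branch_addr) >= 0

    def update(self, branch_addr, taken):
        w = self.weights[branch_addr % self.entries]
        y = self._output(branch_addr)
        t = 1 if taken else -1
        # train only on a mispredict, or while the sum is still small
        if (y >= 0) != taken or abs(y) <= THRESHOLD:
            w[0] += t
            for i in range(HISTORY_LEN):
                w[i + 1] += t * self.history[i]
        self.history = self.history[1:] + [t]  # shift in the real outcome
```

Feed it a repeating pattern and the weights quickly converge: the weight on whichever history bit correlates with the outcome grows until the sum clears the threshold, and training stops.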

It’s also important to bring up some other aspects here. There are many cases where branches are unpredictable. Say we have the following code:

array arr;
for(int i = 0; i < length(arr); i++){
    if(arr[i] < n){
        //do something
        //do something else
    }
}
If the values of arr are randomly distributed around n, then the branch predictor can’t learn anything here, because the pattern to be learned isn’t in the code, it’s in the data. Any benefit you gain is basically the branch predictor equivalent of overfitting, and even then it’ll only actually work if arr is short and frequently iterated over.
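You can see this directly in a quick simulation. Here's a toy comparison (all numbers picked arbitrarily) of a single 2-bit counter on a heavily biased branch versus one driven by 50/50 random data:

```python
import random

# Toy illustration: one 2-bit saturating counter facing (a) a branch taken
# 90% of the time and (b) a branch decided by a fair coin flip in the data.

def run_counter(outcomes):
    counter, correct = 1, 0
    for taken in outcomes:
        if (counter >= 2) == taken:
            correct += 1
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return correct / len(outcomes)

random.seed(0)
biased = [random.random() < 0.9 for _ in range(10000)]
coinflip = [random.random() < 0.5 for _ in range(10000)]

print(run_counter(biased))    # lands close to 0.9
print(run_counter(coinflip))  # lands close to 0.5, i.e. nothing to learn
```

No amount of extra cleverness in the predictor fixes the second case; the information it would need simply isn't in the branch history.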

So it’s not really a matter of learning that “the code goes 0011001100110011 through branches”. It’s about learning complex patterns of how each branch in the code influences every other branch, and doing all of that with as few transistors as possible.

If you wanted to use HTM for branch prediction, here’s how it would have to work:

  • The HTM network would have to be quite compact; no wasted transistors! You can’t throw a million-neuron network with 10k synapses per neuron at this problem. You might only be able to manage a few hundred neurons with a couple hundred synapses each before this thing gets more expensive than even the largest traditional predictors. Anyone here willing to learn some Verilog or VHDL could try making one.

  • Encoding the history bits isn’t enough. You also need to give the predictor hints about exactly where in the code the current branch is. Other hints, such as a few bits from registers and what kinds of instructions are nearby, might be useful too, though as far as I’m aware traditional branch predictors don’t take those into account.

  • You’ll need some way to rewind the network to a previous state on a mispredict. Modern CPUs are deeply pipelined, so the CPU might not know a branch was mispredicted until it’s already predicted the 5-10 branches that come after it. Not to mention, the whole reason for building more accurate predictors is to make longer pipelines feasible.

  • You’ll also have to get a clear taken/not-taken decision out of the network, again on a tight transistor budget. This might be a case where reinforcement learning has to come in.
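To illustrate the rewind point from the list above, here's a toy model of checkpointing a global history register. (A real design would also squash every branch younger than the mispredicted one; this sketch only shows restoring the register itself, and all the names are made up.)

```python
# Toy sketch of speculative history with rollback: each prediction shifts a
# *predicted* outcome into the history register, and a mispredict restores
# the register to the state it had when that branch was fetched.

HISTORY_BITS = 16

class SpeculativeHistory:
    def __init__(self):
        self.history = 0
        self.checkpoints = {}  # branch tag -> history before that prediction

    def on_predict(self, tag, predicted_taken):
        # snapshot the register, then speculatively shift in the prediction
        self.checkpoints[tag] = self.history
        self.history = ((self.history << 1) | int(predicted_taken)) % (2 ** HISTORY_BITS)

    def on_resolve(self, tag, actual_taken, predicted_taken):
        if actual_taken != predicted_taken:
            # mispredict: roll back and shift in the real outcome instead
            self.history = ((self.checkpoints[tag] << 1) | int(actual_taken)) % (2 ** HISTORY_BITS)
        del self.checkpoints[tag]
```

For an HTM network the equivalent would be snapshotting (or reconstructing) the network's internal state per in-flight branch, which is a much bigger ask than saving one shift register.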

I’m not confident that HTM would actually provide a large competitive advantage here, but hey, if anyone wants to learn some Verilog/VHDL and embed it in a RISC-V sim or something, I’d love to see it. If it is useful, it’ll probably be from making it easier to incorporate other information from around the CPU into making better decisions.

Then of course, all of this completely ignores Branch Target Buffers, where HTM would likely provide much less of an advantage, and memory prefetching, where HTM (done right) would likely actually do a better job and make a far bigger impact than a better branch predictor. After all, a branch mispredict costs 15-20 cycles. A cache miss costs about 300.