@Jose_Cueto and @Charles_Rosenbauer (as there are some overlapping ideas from both of you)
Until now I was only thinking of using the TM to catch patterns that a human could spot just by looking at the data for a short while. The idea of using more not-so-human-friendly data to make these predictions would require an SP and a different way to represent the output, right? More like a continuous output rather than a binary one.
Also, let me better explain the problem with the data I was referring to: in a local predictor, data about nearby instructions would not be useful, since they do not change. So only register data could be useful (even the cache would take too long to deliver anything), but by the time the predictor needs this data, the registers' contents do not refer to that specific instruction but to some instructions before it (the ones further along the pipeline), so the TM would probably have to spot an indirect pattern here (sorry, I know this is still confusing, I hope you get the idea). A global predictor, however, could perhaps make use of the instructions to get some insight. It would be interesting to test.
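To illustrate what I mean by "indirect pattern", here is a tiny sketch of how I imagine building training pairs from a trace: the register snapshot the predictor actually sees is a few instructions stale. The lag value, the trace format and the field names are all my own assumptions, nothing standard:

```python
# Illustrative only: shows why register data seen at predict time is "stale".
# PIPELINE_LAG, the trace format and the snapshot fields are assumptions.
PIPELINE_LAG = 5  # how far ahead of the executing instructions the predictor runs

def build_training_pairs(trace):
    """Pair each branch outcome with the register snapshot the predictor
    would actually see, i.e. the architectural state from a few
    instructions earlier (the ones still further down the pipeline)."""
    pairs = []
    for i, instr in enumerate(trace):
        if instr["is_branch"]:
            visible = max(0, i - PIPELINE_LAG)          # stale snapshot index
            pairs.append((trace[visible]["registers"],  # what the TM sees
                          instr["taken"]))              # what it must predict
    return pairs
```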
@Charles_Rosenbauer Thank you so much for the detailed post. Your comment has actually dampened my hopes of getting interesting results (or any at all), but the earlier I find this out, the better, so I can even change the topic of my thesis if needed. I already knew most of what you wrote in the first paragraphs, but the final takeaways were really valuable.
So, I have some questions. I don't expect you to answer them in great detail (as @Jose_Cueto said, it's my job to investigate further, I just need a path =])
Q: Could a TM replace a Two-Level Predictor table with less hardware and more or less the same results, while learning in fewer cycles than a perceptron?
Expected answer: No, a TM would require more hardware.
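Just to make sure we are comparing against the same baseline, this is roughly the structure I have in mind when I say "Two-Level Predictor table": a global history register indexing a table of 2-bit saturating counters, gshare-style. The sizes and the XOR hashing are just illustrative choices:

```python
# Rough model of a two-level predictor: global history register + pattern
# history table of 2-bit saturating counters. Sizes are arbitrary assumptions.
HISTORY_BITS = 12
PHT_SIZE = 1 << HISTORY_BITS

class TwoLevelPredictor:
    def __init__(self):
        self.history = 0            # global branch history register
        self.pht = [1] * PHT_SIZE   # 2-bit counters, start at weakly not-taken

    def _index(self, pc):
        return (pc ^ self.history) & (PHT_SIZE - 1)   # gshare-style hashing

    def predict(self, pc):
        return self.pht[self._index(pc)] >= 2         # taken if counter is 2 or 3

    def update(self, pc, taken):
        i = self._index(pc)
        self.pht[i] = min(3, self.pht[i] + 1) if taken else max(0, self.pht[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & (PHT_SIZE - 1)
```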
Q: Is it too difficult (or even impossible) to rewind a TM? If so, would the solution for rewinding be keeping copies of all of the model's parameters? If it is, hardware will definitely be a big problem.
Expected answer: there is a way, but it would be very difficult to implement (isn't the default TM difficult enough already?).
Q: Since I expect 'No' for the first question, a TM would only make sense as a global predictor, right? One that could be reused, forgetting about instructions that haven't been seen for some time. My idea is to combine a 2-bit saturating counter with the TM, to help when the TM is not so sure (or, from another perspective, to use a TM to improve on the 2-bit saturating counter when possible); see the sketch below.
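To make the idea concrete, here is a rough sketch of the hybrid I have in mind. The `tm_predict` interface and the confidence threshold are placeholders (I have not decided yet how to turn TM activity into a prediction plus confidence), so this is not based on any actual HTM library API:

```python
# Sketch of the hybrid idea: fall back to a 2-bit saturating counter whenever
# the TM is not confident. `tm_predict` is a hypothetical (prediction, confidence)
# interface; the threshold and table size are assumptions to be tuned.
CONFIDENCE_THRESHOLD = 0.7

class HybridPredictor:
    def __init__(self, num_counters=4096):
        self.counters = [1] * num_counters    # 2-bit counters, weakly not-taken

    def predict(self, pc, tm_predict):
        tm_taken, confidence = tm_predict(pc)      # hypothetical TM interface
        if confidence >= CONFIDENCE_THRESHOLD:
            return tm_taken                        # trust the TM when it is sure
        return self.counters[pc % len(self.counters)] >= 2   # else plain 2-bit counter

    def update(self, pc, taken):
        i = pc % len(self.counters)
        self.counters[i] = min(3, self.counters[i] + 1) if taken \
                           else max(0, self.counters[i] - 1)
        # TM learning would also happen here (omitted in this sketch).
```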
Q: (From above) The idea of using more not-so-human-friendly data to make these predictions would require an SP and a different way to represent the output, right? More like a continuous output rather than a binary one.
Q: How can Memory Prefetching be modeled as a temporal problem? (don't waste too much time on this one haha) [EDIT: @vpuente has linked an interesting paper below. I think I can find the answer by reading it beyond the abstract, which I'll do later]
Thanks so much to everyone who is taking some time to guide me on this topic.