Using HTM for branch prediction

Thank you!

First of all, thanks for the suggestion. I have not implemented this encoder, but I thought about it a bit and I don’t think this encoding could be performed as fast as I would need for this application. Computing the pseudo-random generator on the fly, several times for each SDR, seems very hard to optimize. Please correct me if I’m mistaken.
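To make the cost concrete, here is a minimal sketch of the kind of encoder I mean: seed a generator with the (PC, history) pair and draw one pseudo-random index per active bit. The splitmix64-style hash and the sizes are placeholders of mine, not the actual proposal:

```c
#include <stdint.h>

#define SDR_BITS    1024   /* SDR width: placeholder             */
#define ACTIVE_BITS   20   /* number of active bits: placeholder */

/* splitmix64 finalizer, standing in for "the pseudo-random generator". */
static uint64_t mix64(uint64_t x) {
    x += 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

/* One hash evaluation per active bit -- the per-SDR cost in question.
 * (Duplicate indices are possible and ignored here for simplicity.) */
static void encode(uint64_t pc, uint64_t history, uint16_t active[ACTIVE_BITS]) {
    uint64_t seed = mix64(pc ^ (history << 1));  /* seed from the input */
    for (uint64_t i = 0; i < ACTIVE_BITS; i++)
        active[i] = (uint16_t)(mix64(seed + i) % SDR_BITS);
}
```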

I was not even thinking about a table, but about a hardwired solution. Since the synapses in the network are much more expensive to implement than the encoding process, I didn’t think it would be a problem to use this method.

A HW random generator can be really simple (e.g. using XOR gates), although not a very good one. In any case, TM will take you more than 100 clock cycles. Since the CLA algorithm is easily pipelineable, the encoding should not be an issue.
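For example, a Fibonacci LFSR is just a shift register plus a few XOR gates per step; this is the standard 16-bit, maximal-length tap configuration (fine to illustrate the point, but a weak RNG):

```c
#include <stdint.h>

/* 16-bit Fibonacci LFSR, taps 16/14/13/11: one shift and three XORs per
 * step, giving a maximal-length (2^16 - 1) sequence. Cheap in hardware,
 * statistically weak. Seed must be non-zero. */
uint16_t lfsr_step(uint16_t s) {
    uint16_t bit = ((s >> 0) ^ (s >> 2) ^ (s >> 3) ^ (s >> 5)) & 1u;
    return (uint16_t)((s >> 1) | (bit << 15));
}
```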

The benefit of this is that it can substantially increase the amount of PC history that can be used.

That might be impractical if you want some flexibility, for example if you need static partitioning (say, because you are using SMT). Additionally, wiring is not free: it might require less area, but it will certainly have a negative impact on power.

In any case, the work is really nice.


Oh, are you sure there is no way to make it faster? Even if it’s pipelineable, I don’t think it would make sense to use TM for branch prediction if it takes that long to run.

That’s true, but according to my tests, using more history does not provide that much benefit =/

I’m sorry, I’m not familiar with this.

Yeah, it would require some really nice improvements in materials in order to work well.

Thank you very much! I’m glad you liked it.

I don’t think so. With 1024 mini-columns there is a lot of work to do each cycle. With a large area and power budget it could be improved, but it will hardly be faster than that. In any case, the LSTM paper presents this as a “theoretical” exercise… my understanding is that your assumptions point in the same direction. [And LSTM inference would be millions of clock cycles per “prediction” :slight_smile: and with offline training.]

The only way to make it faster is to use smaller systems (a few tens of mini-columns) and build a hierarchy :wink:

SMT stands for Simultaneous Multithreading. Some processors, such as Intel’s, statically partition the branch predictor tables (and the history register) per thread when you enable SMT (Intel calls it Hyper-Threading).
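In code terms, static partitioning just reserves some index bits for the hardware thread id, so each thread gets a private slice of the table. A hypothetical gshare-style sketch (the table size and hash are made up, not Intel’s actual design):

```c
#include <stdint.h>

#define TABLE_BITS 14   /* 16K 2-bit counters: illustrative size */
#define TID_BITS    1   /* 2-way SMT -> one thread-select bit    */

static uint8_t counters[1u << TABLE_BITS];   /* 2-bit saturating counters */

/* The top index bit is the hardware thread id, so each thread owns a
 * private half of the table; the low bits hash the PC with that
 * thread's own (also partitioned) global history. */
static uint32_t bp_index(uint32_t pc, uint32_t ghist, uint32_t tid) {
    uint32_t lo = (pc ^ ghist) & ((1u << (TABLE_BITS - TID_BITS)) - 1u);
    return (tid << (TABLE_BITS - TID_BITS)) | lo;
}

/* Predict taken when the counter's upper bit is set. */
static int bp_predict(uint32_t pc, uint32_t ghist, uint32_t tid) {
    return (counters[bp_index(pc, ghist, tid)] >> 1) & 1;
}
```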


My understanding of the topic was a little off, then hahaha

Yes, they were, but I didn’t expect it to be by that much hahaha

Right, I can see how hardwired solutions would prevent that.