Just sharing in case someone is looking into the same thing.
I mentioned in my project report, Hierarchical Temporal Memory Agent in standard Reinforcement Learning Environment, that HTM on a VLIW may be a good idea: the instructions can be scheduled well into consistent bundles of 3~4 instructions each. I demonstrated this by writing the assembly by hand.
Yesterday I found out that Compiler Explorer supports Kalray’s VLIW processor, so I put the SP (Spatial Pooler) overlap algorithm into it and tried it. The result is disappointing. Their compiler spits out code that effectively turns a VLIW into a single-issue in-order processor (a lot of single-instruction bundles). I’m not sure whether this is caused by their VLIW architecture or by the compiler not being able to find an optimal schedule. In any case, HTM on their processor with the current compiler will be quite slow. I still hope HTM on a VLIW can be fast.
Source code: https://godbolt.org/z/owBBpr
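For anyone who doesn't want to open the link, here is a rough sketch of what an SP overlap loop generally looks like. This is not the code behind the link above, just a minimal illustration. The `Column` layout, the name `computeOverlap`, and the `connectedThreshold` parameter are my own assumptions. The overlap of a column is taken as the number of its synapses that are both connected (permanence at or above the threshold) and attached to an active input bit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout (an assumption for this sketch): every column stores
// the input bit each of its synapses samples, plus that synapse's permanence.
struct Column {
    std::vector<std::uint32_t> inputIndex;
    std::vector<float> permanence;
};

// Returns, for each column, how many connected synapses see an active input bit.
std::vector<std::uint32_t> computeOverlap(const std::vector<std::uint8_t>& input,
                                          const std::vector<Column>& columns,
                                          float connectedThreshold)
{
    std::vector<std::uint32_t> overlap(columns.size(), 0);
    for (std::size_t c = 0; c < columns.size(); ++c) {
        const Column& col = columns[c];
        assert(col.inputIndex.size() == col.permanence.size());

        std::uint32_t score = 0;
        for (std::size_t s = 0; s < col.inputIndex.size(); ++s) {
            // Branch-free body: two loads, a gather from the input, a compare and an add.
            const bool connected = col.permanence[s] >= connectedThreshold;
            const bool active = input[col.inputIndex[s]] != 0;
            score += static_cast<std::uint32_t>(connected & active);
        }
        overlap[c] = score;
    }
    return overlap;
}
```

The only dependency between iterations of the inner loop is the running sum in `score`, which is the kind of loop I would expect a scheduler to be able to pack into wider bundles.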