IMO, the voting part is relatively easy to hack together, so long as you aren’t concerned with doing it the “right” way. The simplest naive approach would be to run output from the lower layers in the same cortical column (CC) through the SP (Spatial Pooling) algorithm to activate minicolumns in the Object layer. Then run the TM (Temporal Memory) algorithm on cells in the Object layer, but taking their input from the cells in the Object layers of the other CCs. Finally, run a separate round of the TM algorithm (for the apical dendrites) on cells in the lower layers, taking their input from cells in the Object layer of the same CC.
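To make that concrete, here’s a minimal numpy sketch of the naive scheme. To keep it self-contained I’m using toy stand-ins rather than the real algorithms: a fixed random projection plus top-K winner-take-all in place of SP, and plain Hebbian matrices in place of TM segments. All names, sizes, and learning rates here are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

N_CC = 3      # number of cortical columns
N_IN = 256    # lower-layer (feature) SDR size per CC
N_OBJ = 128   # Object-layer cells per CC
K = 10        # active Object cells per CC

# Stand-in for SP: a fixed random projection plus top-K winner-take-all.
proj = [rng.random((N_OBJ, N_IN)) for _ in range(N_CC)]

# Stand-in for TM lateral segments: Hebbian association matrices between
# Object layers. assoc[i][j][a, b] links cell a in CC i to cell b in CC j.
assoc = [[np.zeros((N_OBJ, N_OBJ)) for _ in range(N_CC)] for _ in range(N_CC)]

# Stand-in for the apical TM pass: weights from each CC's Object layer
# back down to its own lower layer.
apical = [np.zeros((N_OBJ, N_IN)) for _ in range(N_CC)]

def voting_step(features, prev_obj, learn=True):
    """features[i]: binary feature SDR of CC i (lower-layer output).
    prev_obj[i]: previously active Object cells of CC i."""
    new_obj = []
    for i in range(N_CC):
        ff = proj[i] @ features[i]           # feedforward ("SP") drive
        lat = np.zeros(N_OBJ)                # lateral votes from the other CCs
        for j in range(N_CC):
            if j != i:
                lat += assoc[j][i].T @ prev_obj[j]
        winners = np.argsort(ff + lat)[-K:]  # top-K winner-take-all
        act = np.zeros(N_OBJ)
        act[winners] = 1.0
        new_obj.append(act)
    if learn:
        for i in range(N_CC):
            # Apical links: same-CC Object cells -> lower-layer cells.
            apical[i] += 0.1 * np.outer(new_obj[i], features[i])
            for j in range(N_CC):
                if i != j:  # lateral Hebbian update between co-active cells
                    assoc[i][j] += 0.1 * np.outer(new_obj[i], new_obj[j])
    return new_obj

def apical_prediction(i, obj_act):
    """Apical feedback: CC i's Object layer biases (predicts) its own
    lower-layer cells, standing in for the extra TM round."""
    return apical[i].T @ obj_act

# Toy usage: each CC senses a random sparse feature; iterate so votes settle.
features = [(rng.random(N_IN) < 0.05).astype(float) for _ in range(N_CC)]
obj = [np.zeros(N_OBJ) for _ in range(N_CC)]
for _ in range(5):
    obj = voting_step(features, obj)
print("CC 0 active Object cells:", np.flatnonzero(obj[0]))
```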
Now of course that wouldn’t be very optimized (every feature in each CC would need to learn associations with every feature in all the other CCs), so it wouldn’t scale well. You’d probably want to put a little more work into it and implement TP (Temporal Pooling) in the Object layer, rather than (or in conjunction with) SP, to eliminate that combinatorial explosion problem. There are a few different TP implementations floating around (including one written by Numenta in NuPIC’s research code).
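The rough idea behind most of those TP implementations (e.g. the Union Temporal Pooler in Numenta’s research code) is that Object-layer cells stay active across the whole sequence of features belonging to one object, instead of being recomputed from scratch each timestep. A toy persistence-based sketch, again with invented names and parameters, and not Numenta’s actual algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

N_IN, N_POOL, K = 256, 128, 10
DECAY = 0.9    # how slowly pooled activity fades between timesteps
BOOST = 5.0    # head start given to cells already in the pooled trace

proj = rng.random((N_POOL, N_IN))  # feedforward stand-in, as before
trace = np.zeros(N_POOL)           # decaying "union" of recent winners

def pool_step(feature_sdr):
    """Cells already in the trace get a strong head start, so the same
    small set of Object cells keeps winning for every feature of one
    object. Distinct feature combinations no longer need distinct
    Object cells, which is what tames the combinatorial explosion."""
    global trace
    drive = proj @ feature_sdr + BOOST * trace
    winners = np.argsort(drive)[-K:]
    act = np.zeros(N_POOL)
    act[winners] = 1.0
    trace = np.maximum(act, trace * DECAY)  # persist, then decay
    return act

# Toy usage: three different features of the same object map to a largely
# overlapping pooled representation (reset `trace` between objects).
feats = [(rng.random(N_IN) < 0.05).astype(float) for _ in range(3)]
reps = [np.flatnonzero(pool_step(f)) for f in feats]
print(reps)
```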
To me, the much bigger grey area in TBT (the Thousand Brains Theory) is how to implement a learning algorithm that establishes grid-cell-based reference frames. Since reference frames are really a core concept of TBT, without an algorithm for them you are kind of stuck in the realm of sequence learning and unable to unlock SMI (sensorimotor inference).
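To be clear about where the gap is: the mechanical half, a grid-cell module path-integrating movement so that a location can serve as a reference frame, is easy to sketch (a toy 2-D module on a torus below, all sizes invented). It’s the learning half, anchoring those modules to sensed features on real objects, that is the missing algorithm.

```python
import numpy as np

SIZE = 8                        # one grid-cell module: an 8x8 torus of cells
phase = np.zeros(2, dtype=int)  # current activity-bump location (row, col)

def move(delta):
    """Path integration: shift the bump by the movement vector, wrapping
    around the torus, so the module tracks location relative to the
    object regardless of which features are sensed along the way."""
    global phase
    phase = (phase + np.asarray(delta, dtype=int)) % SIZE
    return tuple(int(p) for p in phase)

# A closed loop of movements returns the module to the same phase, which
# is what makes location usable as a stable reference frame. The open
# problem is *learning* to anchor this phase to sensory input.
move([2, 1]); move([1, 0]); print(move([-3, -1]))   # -> (0, 0)
```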