Implementing HTM.Java on Apache Spark



I’m really intrigued by HTM theory and I’m thinking of implementing a version of the algorithm on Apache Spark. I’m aware of the Apache Flink and Akka implementations, but I would like to get my hands dirty and implement a Spark version.
However, I was wondering if there are any known limitations that have prevented others from implementing HTM on Spark.
What I mean by limitations is: whether the Spark framework is unable to run HTM at scale, whether its micro-batch streaming model is unsuitable for HTM, etc.
From my point of view I cannot see anything that would prevent me from running HTM on Spark, or even from scaling up the algorithm. I might be wrong though…

It’s weird that I couldn’t find any implementations on Spark, just some old mailing list threads…

Thanks !



Hi @papajim,

There is a community member, @EronWright, who implemented a Flink version that you can check out. At some point, both I and the company I work for (sponsors of HTM.Java) had a Spark implementation in mind - however, I have had (and still have) a more pressing goal I’m working on right now. As far as I know, though, there should be no limitations preventing a Spark implementation. Last year we worked on a persistence framework to serialize HTM networks to a stream or to disk, in anticipation of eventually doing a Spark implementation - then Eron came in and did a Flink implementation.
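To show why that persistence work matters for a framework like Spark or Flink: a distributed engine checkpoints operator state as bytes, so the model has to survive a serialize/deserialize round trip. This is only a minimal sketch of that idea - `ModelState` is a hypothetical stand-in, not the actual HTM.Java network class, whose persistence framework handles the full network graph.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical minimal model state; a real HTM network carries far more
// (encoder config, spatial pooler columns, temporal memory segments, ...).
class ModelState implements Serializable {
    private static final long serialVersionUID = 1L;
    final long recordsSeen;
    ModelState(long recordsSeen) { this.recordsSeen = recordsSeen; }
}

public class CheckpointDemo {
    // Serialize the model state to bytes, as a checkpoint would.
    static byte[] checkpoint(ModelState s) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(s);
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Rebuild the model state from checkpointed bytes, as a restore would.
    static ModelState restore(byte[] bytes) {
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (ModelState) ois.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        ModelState restored = restore(checkpoint(new ModelState(42)));
        System.out.println(restored.recordsSeen); // prints 42
    }
}
```

Once the model state round-trips like this, either engine can snapshot it: Flink through its checkpoint barriers, Spark through its state store.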

EDIT: There is also an Akka implementation!



Thanks @cogmission. @papajim, the beauty of using HTM in Flink is that you get a true streaming system with good connectors to various data sources (e.g. Kafka), one that scales by creating a separate HTM model instance for each ‘key’ in your data, with full checkpointing support.
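The one-model-per-key idea is framework-agnostic, so the same pattern should carry over to a Spark implementation. Here is a minimal sketch of it in plain Java - `Model` is a hypothetical placeholder, not the HTM.Java API; the point is only that records with the same key are always routed to the same model instance:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an HTM network instance. A real implementation
// would wrap an HTM.Java network and return anomaly scores.
class Model {
    double compute(double value) {
        return value; // placeholder; a real model learns from each record
    }
}

// One model instance per key: a keyed stream (Flink) or grouped state
// (Spark) guarantees all records for a key reach the same instance.
public class KeyedModels {
    private final Map<String, Model> models = new HashMap<>();

    // Route the record to its key's model, creating the model on first use.
    double process(String key, double value) {
        return models.computeIfAbsent(key, k -> new Model()).compute(value);
    }

    int modelCount() { return models.size(); }

    public static void main(String[] args) {
        KeyedModels km = new KeyedModels();
        km.process("sensorA", 1.0);
        km.process("sensorB", 2.0);
        km.process("sensorA", 3.0);
        System.out.println(km.modelCount()); // two keys -> two models
    }
}
```

In a distributed engine the `HashMap` is replaced by the engine's own keyed state, which is what makes the per-key models checkpointable and lets them scale across workers.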

Here’s the integration library for this:

I’ve been meaning to update the library with the latest Flink and HTM dependencies. With renewed interest from you and others, I’ll prioritize that.