Replacing array storage in nupic/research/connections with R-trees

A while ago, I was trying to add spatial indexing to HTM neurons so that they could mimic neurogenesis/neurodeterioration and be stored in a corruption-resistant file. I could’ve used hash maps for that, but I ran into a few problems with that idea. How could synapses take input from neurons far outside the bounds of the original array without storing the index of every neuron and reconfiguring each synapse-neuron link whenever a neuron was added or deleted? And how could things stay ordered so that spatial algorithms could find nearby neurons?

R-trees could solve those problems while giving a few extra important benefits, such as using the index locations to read only a portion of a much larger array into an HTM spatial pooler, and allowing efficient algorithms to be developed that change those input locations over time. This could allow for algorithms that take a small subset of the input from the cells in the temporal memory arrays and feed it into another spatial pooler, while still being able to read every cell in the temporal memory array.
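To make the idea concrete, here is a minimal sketch of coordinate-keyed cell storage with a box ("window") query. It is not NuPIC code: `SparseCellStore`, `add`, `remove`, and `query_box` are hypothetical names, and the query is a brute-force scan standing in for what an R-tree would answer without visiting every cell.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, ...]


class SparseCellStore:
    """Hypothetical coordinate-keyed cell storage (not part of NuPIC)."""

    def __init__(self) -> None:
        self._cells: Dict[Point, float] = {}

    def add(self, point: Point, value: float = 0.0) -> None:
        self._cells[tuple(point)] = value

    def remove(self, point: Point) -> None:
        # Deleting a cell does not shift the coordinates of any other cell,
        # so links that refer to cells by coordinate stay valid.
        self._cells.pop(tuple(point), None)

    def query_box(self, lower: Point, upper: Point) -> List[Point]:
        """Return the sorted coordinates of all cells inside the inclusive box.

        This is a linear scan; an R-tree would return the same set of cells
        while pruning most of the search.
        """
        hits = [
            p for p in self._cells
            if len(p) == len(lower)
            and all(lo <= c <= hi for lo, c, hi in zip(lower, p, upper))
        ]
        return sorted(hits)


store = SparseCellStore()
for x in range(4):
    for y in range(4):
        store.add((x, y))
store.add((10, -3))    # a cell far outside the original 4x4 extent
store.remove((1, 1))   # a deleted cell; nothing else is re-indexed

# Read only a small window of the cells, e.g. as input to another pooler.
window = store.query_box((0, 0), (1, 2))
```

Because cells are addressed by coordinate rather than by array position, adding or deleting one does not invalidate references to the others, which is the property the hash-map approach above was struggling to provide together with spatial ordering.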

So far I’ve managed to create an algorithm that generates an n-dimensional set of points, and I’m just getting into R-trees. However, a set of n-d points should be enough to take input from n-d arrays, so I might be at the point where I can start combining my code with some of the Python NuPIC research code. But if I’m replacing a core part of the NuPIC research libraries or NuPIC core libraries with R-trees, where should I start? Would I just replace the vector or array classes and work up from there, making sure everything still works with the change, or would I have to change much of how some of the algorithms work? If it’s the former, then where would I want to replace arrays and where wouldn’t I?
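For illustration only, one simple way to produce an n-dimensional set of points is to enumerate the integer lattice inside an axis-aligned box. The actual fill-box algorithm is not shown in this thread, so `grid_points` and its parameters are assumptions, not the code described above.

```python
import itertools
from typing import Iterator, Sequence, Tuple


def grid_points(
    lower: Sequence[int], upper: Sequence[int], step: int = 1
) -> Iterator[Tuple[int, ...]]:
    """Iterate over lattice points in the inclusive box [lower, upper].

    Works for any number of dimensions; `step` is the spacing along every axis.
    """
    if len(lower) != len(upper):
        raise ValueError("lower and upper must have the same number of dimensions")
    if step < 1:
        raise ValueError("step must be a positive integer")
    # One range per axis; the Cartesian product of the ranges is the point set.
    axes = [range(lo, hi + 1, step) for lo, hi in zip(lower, upper)]
    return itertools.product(*axes)


# A 2 x 3 x 2 box of points in three dimensions.
points = list(grid_points((0, 0, 0), (1, 2, 1)))
```

Each tuple can then serve directly as an index into an n-d array of the same dimensionality, or as a key in a spatial index.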

Hi @SimLeek,

I suspect every implementation of HTM stores its constructs in its own way, with variations arising from language differences or, within a single implementation, from the algorithm being served. There is also some overlap, which arises because certain structures are simply the most efficient for the job at hand. For instance, the SP actually has a few data structures which consist of the same conceptual elements, but are each used in different ways and at different steps within that algorithm. The treatment of each of these can vary between Python, C++, Java, and Clojure.

So the short answer is yes, feel free to use whatever data structure you see fit. In my experience porting from one language to another, it’s best to test as you go along by making sure the unit tests produce the same values. At least imho.
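As a rough sketch of that kind of check, the snippet below compares a dense, array-style overlap computation with an alternative sparse storage and asserts that both give the same values. The overlap here is a simplified stand-in (a count of active connected inputs per column), and `overlaps_dense` and `overlaps_sparse` are hypothetical names rather than NuPIC functions.

```python
from typing import Dict, List, Set


def overlaps_dense(connected: List[List[int]], active: List[int]) -> List[int]:
    """Reference version: one row of 0/1 flags per column, one flag per input bit."""
    return [sum(c * a for c, a in zip(row, active)) for row in connected]


def overlaps_sparse(
    connected: Dict[int, Set[int]], active: List[int], n_columns: int
) -> List[int]:
    """Alternative storage: each column keeps only the indices of its connected inputs."""
    active_set = {i for i, bit in enumerate(active) if bit}
    return [len(connected.get(col, set()) & active_set) for col in range(n_columns)]


# Build the sparse structure from the dense one so both describe the same network.
dense = [[1, 0, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
sparse = {
    col: {i for i, flag in enumerate(row) if flag}
    for col, row in enumerate(dense)
}
active = [1, 1, 0, 0]

expected = overlaps_dense(dense, active)
actual = overlaps_sparse(sparse, active, len(dense))
assert actual == expected, "new storage changed the computed overlaps"
```

The same pattern applies to any replaced structure: keep the original implementation as the reference and run both on identical inputs after every change.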

If you do come up with a novel approach, be sure and share it here - there are lots of devs here other than myself who would be interested :wink:

Oh, I definitely agree with the advice about unit tests. I had to test each individual part of the n-dimensional fill-box-with-points algorithm before it worked right. And thanks for making me realize I should definitely look for pre-built unit tests. These tests should help immensely for developing Python code, and I may eventually use the C++ tests too. I really don’t know why I didn’t think to look for those.

Also, I’ll definitely be sure to share my approach when I’m ready. Right now my code’s a bit uncommented/unprofessional though, and I want to make it look kind of good before I show it off.

I didn’t know whether that was a question, but yep - just to confirm - those are the tests to which I was referring…

It wasn’t a question, but thanks anyway!

I’ll see if I can use those without much of the rest of the repository, since right now I just want the spatial pooler/temporal memory.
