Hi mraptor, I read through your documentation, and if I understand it correctly, you use multiple arrays of binary data to manage the temporal data changes between ticks, right?
I only reacted to the memory usage because it seems abnormally high for what you are doing. (Maybe a Python thing, not sure.)
Edit: As an example, my cells with all components loaded take about 97 bytes each. That is 390MB for 4 million cells (2048x2048), with some overhead.
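A quick back-of-the-envelope check of that figure, as a minimal sketch. It is plain arithmetic on the numbers quoted above; the names `BYTES_PER_CELL`, `GRID_SIDE` and `cell_memory_bytes` are made up for illustration and do not come from either implementation.

```python
# Numbers quoted in the post above; everything else is illustrative.
BYTES_PER_CELL = 97   # approximate size of one cell with all components loaded
GRID_SIDE = 2048      # 2048 x 2048 grid, about 4 million cells


def cell_memory_bytes(side, bytes_per_cell):
    """Raw memory for a square grid of cells, ignoring any container overhead."""
    return side * side * bytes_per_cell


total = cell_memory_bytes(GRID_SIDE, BYTES_PER_CELL)
print(f"{total / 1e6:.0f} MB (decimal), {total / 2**20:.0f} MiB (binary)")
```

The raw total comes out at about 407 MB in decimal units, or 388 MiB, which is in line with the roughly 390MB mentioned once some overhead is added.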
Probably a Python thing, but you should probably not use such big arrays. Even if it works, it will be super slow.
Most of what I have tested so far works OK with 5x300 to 5x1000 regions. (If you use the Spatial Mapper you will probably have to raise the lower bound to 5x500.)
I mentioned in the docs that it most probably works because, even if you use a 5x300 region, i.e. the input SDR is just 300 bits, the UNION in the connection memory uses 1500 bits (5*300) per UNION, so the “incoming data” has only 0.4% sparsity (i.e. (300 * 2%)/1500).
So use small regions… and increase them only if the prediction performance degrades.
(Always try with as small a region as you can get away with; you get the benefit that it is also faster.)
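The 0.4% figure above can be reproduced with a few lines. This is only a sketch of the arithmetic, not code from the library; the function name `union_density` and its parameter names are assumptions made for this example.

```python
def union_density(width, rows, input_sparsity):
    """Fraction of the union space occupied by one incoming SDR.

    width          -- bits in the input SDR (300 for a 5x300 region)
    rows           -- number of rows in the region (5 for a 5x300 region)
    input_sparsity -- fraction of active bits in the input SDR (0.02 = 2%)
    """
    active_bits = width * input_sparsity   # 300 * 2% = 6 active bits
    union_bits = width * rows              # 5 * 300 = 1500 bits per UNION
    return active_bits / union_bits


print(f"{union_density(300, 5, 0.02):.2%}")
```

For a 5x300 region at 2% input sparsity this gives 0.40%, matching the number in the post.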
If you need bigger regions, I’m afraid you have to do it with multiple regions. I did the Spatial Mapper exactly for this reason, so that I can chain multiple encoders and TMs in different configurations and start experimenting with hierarchies.
Thanks for your quick answer and explanation. I will experiment a bit during the week, and will probably have some follow-up questions/thoughts next weekend.
Hey @mraptor, I sent you a PM about this (you can check yours at HTM Forum). Because you’re posting about your code implementation, it would be fitting to cross-post into HTM Hackers. You can do so by hovering over your original post and clicking “Reply as a linked Topic”.
Something interesting… now that you mention it, this is one more variable that can be fine-tuned. In my current experiments I always used the same sparsity (winp) across all modules (Enc, TM, SM), but it is also possible to use different sparsities at different levels. It may be beneficial to have lower sparsity in some situations…
Just did a quick test with a 5x300 region and a sparsity of 5% on the Encoder and 2% on the TM, and it seems to work!
(300 * 0.05)/1500 = 1%, i.e. 15 active bits out of 1500
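To make the mixed-sparsity setup concrete, here is a minimal sketch that builds random SDRs as sets of active bit indices, one per module, each with its own sparsity. It is not the library's code: `random_sdr`, the `SPARSITY` table and the seed are all assumptions for illustration, and only the Encoder and TM values come from the quick test above.

```python
import random

WIDTH = 300              # input SDR width for a 5x300 region
UNION_BITS = 5 * WIDTH   # 1500 bits in the union space

# Per-module sparsity, as in the quick test (Encoder 5%, TM 2%).
SPARSITY = {"Enc": 0.05, "TM": 0.02}


def random_sdr(n_bits, sparsity, rng):
    """Return a random SDR as a set of active bit indices."""
    n_active = round(n_bits * sparsity)
    return set(rng.sample(range(n_bits), n_active))


rng = random.Random(0)   # fixed seed so the run is repeatable
sdrs = {name: random_sdr(WIDTH, s, rng) for name, s in SPARSITY.items()}

for name, sdr in sdrs.items():
    density = len(sdr) / UNION_BITS
    print(f"{name}: {len(sdr)} active bits, {density:.2%} of the union space")
```

With these settings the Encoder SDR has 15 active bits (1% of 1500) and the TM SDR has 6 (0.4% of 1500). Which bits are active depends on the seed; the counts do not.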
Interesting… does anyone have experience with different sparsities?
OK @mraptor, since I’ve been talking up the theoretical perspective of your transition memory as a connection matrix, I need to state clearly here what I told you offline: I don’t think it makes sense to talk about optimizing a spatial pooler until you know what signal you want to optimize for. And since I believe “meaning” is in generalizations of temporal relations, any optimization should depend on the transition memory.
I’ve been talking about this in the Universal Encoder thread. I don’t think the spatial pooler matters. Or if it does, it will be optimizing data so it can be better coded with temporal generalizations in the TM.
Perhaps you should optimize your spatial pooler to distribute or concentrate the connectivity of the transition memory. But until we start trying to generalize temporal structure in the TM, we won’t know.
And when you talk about hierarchies, I think they will emerge from the connection matrix too, so they will not require separate transition memory instances.
I’m not sure what you’ll be optimizing for with the spatial mapper you’ve implemented here. A self-organizing map, so spatial distribution? It looks like you are optimizing in a clever way. But unless you are optimizing on the right parameter, it won’t matter.
Frankly, I don’t think spatial distribution will matter much. It might. I don’t know.
The interesting questions for me at the moment are in the transition memory. With it expressed as a connection matrix, we have a new perspective from which to analyse it. The connectivity can give us generalizations for meaning and hierarchy.
I still have to do some more experiments, but from what I can guess so far… the Spatial Mapper is a special case of Temporal Memory (in my impl.).
From today’s Encoders you need a transition layer, which is what the SM is for.
Also, you cannot physically use one large matrix, as Franky also just figured out, i.e. you have to split it into pieces… and if you do that you need plumbing: SM again.
Third, if you don’t split this theoretical giant matrix you waste a lot of space, because it is sparse.
So no matter how you twist it, you have to have many TMs which act in concert to simulate a sort of one giant TM.
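A rough sketch of the space argument, under a loudly stated assumption: connections exist only inside each module, so the giant matrix would be block-diagonal. That assumption, the sizes used and the function names `dense_bits` and `block_bits` are all hypothetical, chosen only to illustrate the point.

```python
def dense_bits(n_cells):
    """Bits needed for one giant n_cells x n_cells binary connection matrix."""
    return n_cells * n_cells


def block_bits(n_cells, n_modules):
    """Bits needed if the same cells are split into equal, separate modules.

    Assumes connections exist only inside a module (block-diagonal matrix),
    which is an idealisation made for this example.
    """
    if n_cells % n_modules:
        raise ValueError("n_cells must divide evenly into n_modules")
    block = n_cells // n_modules
    return n_modules * block * block


# Hypothetical sizes: ten modules of 1500 cells each (ten 5x300 regions).
giant = dense_bits(15_000)
split = block_bits(15_000, 10)
print(f"giant: {giant:,} bits, split: {split:,} bits, ratio: {giant // split}x")
```

Under that assumption the split layout needs a tenth of the space: the saving grows linearly with the number of modules, because the off-diagonal blocks of the giant matrix would be all zeros.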
One additional point on why you need hierarchies: complexity needs NEARLY DECOMPOSABLE SYSTEMS.
I would say yes to the idea that a spatial mapper/pooler is a special case of TM. They should both group columns.
I’m only arguing about the parameters for doing that, which I believe should be the connectivity of the TM.
I’m totally with you on the need for hierarchies. But I don’t think they are going to be physically distinct sets of cells. Not entirely separate TM instances.
Take a look at that avian telencephalon paper in the Universal Encoder thread. They find hierarchy. They find it in the connectivity of their networks. As @floybix points out, the hierarchy they find is top level functionality - motor, vision etc. I don’t know how distinct that is from our low level temporal networks, but I think the decompositional principles are the same.
I’m also not saying it will be “distinct sets of cells”, but if you are implementing them using bit or other matrices you have to group them. This is where the TM becomes a distinct module, which you then use to build hierarchies.
I’m simply using a different abstraction level, not a single neuron but a bunch of them as a group, so there are consequences to that.
You cannot insist on abstracting at the region level and keep all the “fluidity” of the single-neuron scenario. Right?
Something has to give.
No, I still disagree that you need a separate representation. See the example below, where it is done using activations alone, contrasted with inhibition, all within a single network.
I think you can.
Take a look at this tutorial I came across the other day. These guys “pool” a representation for “touch” without making a separate copy of the network to represent it.
We can do the same, where the activation spreads out to highlight the “small neighbourhood connectivity” (Ed. “small-world networks” in the Shanahan paper) of our connection matrix.
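As a toy illustration of activation spreading over a single connection structure, here is a minimal sketch. It is not taken from the tutorial or from any TM implementation: the six-node graph, the `decay` and `steps` values, the 0.5 threshold and the function name `spread` are all assumptions chosen for the example.

```python
# Hypothetical network: two tight clusters (a, b, c) and (d, e, f)
# joined by a single link between c and d.
GRAPH = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e", "f"],
    "e": ["d", "f"],
    "f": ["d", "e"],
}


def spread(graph, start, decay=0.5, steps=2):
    """Spread activation from `start`, weakening by `decay` on every hop."""
    activation = {start: 1.0}
    for _ in range(steps):
        updated = dict(activation)
        for node, level in activation.items():
            for neighbour in graph[node]:
                # Keep the strongest activation that has reached each node.
                updated[neighbour] = max(updated.get(neighbour, 0.0), level * decay)
        activation = updated
    return activation


activation = spread(GRAPH, "a")
neighbourhood = {node for node, level in activation.items() if level >= 0.5}
print(sorted(neighbourhood))
```

Starting from `a`, the nodes that stay at or above the threshold are `a`, `b` and `c`, i.e. the local cluster, while `d` only receives a weaker activation of 0.25. The grouping is read off the activations in the one network; no second copy of the network is built.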