Ogma's Sparse Predictive Hierarchies

The purpose of this topic is not only to draw attention to Ogma’s ideas/framework, currently named SPH (Sparse Predictive Hierarchies), but mostly to discuss how to apply SPH to SDR-based processing. I’m thinking of both:

  • HTM tools, specifically Spatial Pooler and Temporal Memory as encoder/decoder
  • Associative Memory tools, as in Dyadic/Triadic memory, and possibly any other bit-pair maps.

Why?
One reason is that Ogma has already produced some interesting results on quite limited hardware like Raspberry Pis and even microcontrollers.
The other reason is that HTM feels kind of stuck at TM and SP - they seem to work, but I haven’t seen very explicit proposals to further expand (or assemble) these basic blocks into more complex architectures.

To start with, here are a couple of links detailing the core concepts of SPH:


One main difference between Ogma’s system and HTM is that theirs is not based on SDRs as the data representation, but on a similar yet quite different structure called CSDRs (columnar SDRs). I won’t delve into the differences, because the papers above are much clearer, and what I am going to propose here is using SDRs instead of Ogma’s CSDRs in an SPH-like system.


One of the most interesting charts describing SPH is on page 5 of the paper, or page 14 of the slide presentation.
I don’t know how to pull an image out of a PDF, so you’ll have to look there to make sense of what follows:

That image shows a layered stack of encoder-decoder pairs, layer 1 being the bottom-most encoder-decoder pair and layer N (usually N = 3 or greater) sitting at the top of the SPH hierarchy.

From what @ericlaukien kindly explained, there is no inherent constraint on what the “encoder” and “decoder” blocks are made of; they experimented with a wide variety of encoders, while for decoders they mainly used (relatively) simple logistic regression.

One key feature (not obvious in the ladder schematic) is that each upper layer operates at half the time rate of the layer below it (i.e., it updates once every two of that layer’s time steps). That means increasing the number of layers expands the time span of the whole system without a linear increase in computing costs. They call this “exponential memory”, in the sense that each upper layer “sees” changes over twice the time span of the layer below it.
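
Here is a minimal sketch of that clocking scheme (the structure and names are mine, not Ogma’s): layer k only updates once every 2^k base time steps, so an N-layer stack covers on the order of 2^N steps of history.

```python
# Illustrative sketch of SPH-style exponential clocking (assumed
# structure, not Ogma's actual code). Layer k updates every 2**k base
# steps, so the top of an N-layer stack "sees" ~2**N steps of history.

def run_hierarchy(inputs, num_layers=3):
    for t, x in enumerate(inputs):
        signal = x
        for k in range(num_layers):
            if t % (2 ** k) != 0:
                break                       # this layer (and all above) waits
            signal = encode(k, signal)      # hypothetical per-layer encoder
            print(f"t={t}: layer {k} updated")

def encode(layer_index, signal):
    return signal  # placeholder; a real encoder would pool/compress

run_hierarchy(range(8))
```

Layer 0 updates on every step, layer 1 on every second step, layer 2 on every fourth, which is exactly the halving of the time rate described above.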


Now, how could an SPH architecture be built with HTM bricks?

  • Use a Spatial Pooler as encoder.
  • Use a variation of Temporal Memory as decoder.

I will discuss further how TM needs to be altered in order to be usable in an SPH, because by default TM predicts its own next input, while in an SPH it has to predict the future output(s) of the encoder below it.
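
To make the wiring concrete, here is a rough sketch of one such layer (the encoder/decoder classes and method names are hypothetical placeholders, not an existing library API). The key point is that the decoder is trained on transitions between successive encoder outputs, rather than on its own input stream:

```python
# Sketch of one SPH-style layer from HTM-like parts (hypothetical API).
# The decoder learns to predict the encoder's *next* output, which is
# the modification to vanilla TM discussed above.

class SPHLayer:
    def __init__(self, encoder, decoder):
        self.encoder = encoder      # e.g. a Spatial Pooler
        self.decoder = decoder      # e.g. a modified Temporal Memory
        self.prev_code = None

    def up(self, input_sdr):
        """Feed-forward pass: encode the input SDR."""
        code = self.encoder.encode(input_sdr)
        if self.prev_code is not None:
            # Train on (previous code -> current code) transitions.
            self.decoder.learn(self.prev_code, code)
        self.prev_code = code
        return code

    def down(self, feedback_sdr):
        """Feedback pass: predict the encoder's next output,
        conditioned on feedback from the layer above."""
        return self.decoder.predict(self.prev_code, feedback_sdr)
```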

One should also notice that a triadic memory might be a good decoder too, while as an encoder one could test a FlyHash encoder that simply “translates” X input bits into X/2 output bits.
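
For reference, here is a minimal FlyHash-style encoder sketch: a fixed sparse random projection followed by top-k winner-take-all. The output size of X/2 is from above; the projection density and output sparsity are my own guesses.

```python
import numpy as np

class FlyHashEncoder:
    """Fixed sparse random projection + k-winners-take-all (sketch)."""

    def __init__(self, n_in, density=0.1, sparsity=0.05, seed=42):
        n_out = n_in // 2                   # X input bits -> X/2 output bits
        rng = np.random.default_rng(seed)
        self.proj = (rng.random((n_out, n_in)) < density).astype(np.uint8)
        self.k = max(1, int(sparsity * n_out))  # number of winning bits

    def encode(self, bits):
        overlaps = self.proj @ bits         # matched input bits per output cell
        winners = np.argsort(overlaps)[-self.k:]
        out = np.zeros(self.proj.shape[0], dtype=np.uint8)
        out[winners] = 1
        return out

x = (np.random.default_rng(0).random(1024) < 0.02).astype(np.uint8)
print(FlyHashEncoder(1024).encode(x).sum())  # prints k (active output bits)
```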

2 Likes

It looks like fun to play around with robots running these kinds of algorithms.
We should try to build stuff like this, even if just robots inside simulated game-like environments.

I once made a very rudimentary toy self-driving car which was kinda similar but used a stack of spatial poolers.

1 Like

That’s cool.
It would be interesting to attack Gym’s car racing environment, which presents the challenge of being image-based.
Well, you could handcraft some “forward-looking lasers” to get results similar to those in these videos, but it would be more interesting to work more directly with the screen image itself.

I’d have to downscale the image and apply a high-pass filter to get it to run at real-time speeds, but it should work.
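
Something like the following would do (a rough numpy-only sketch; the block size and threshold are arbitrary guesses): block-average the frame down, subtract a local mean as a crude high-pass, and threshold the result into bits a spatial pooler can consume.

```python
import numpy as np

def box_blur(img):
    # 3x3 local mean via shifted copies (edges wrap; fine for a sketch)
    acc = sum(np.roll(np.roll(img, dy, 0), dx, 1)
              for dy in (-1, 0, 1) for dx in (-1, 0, 1))
    return acc / 9.0

def preprocess(frame, block=8, thresh=10.0):
    g = frame.mean(axis=2)                  # RGB -> grayscale
    h = (g.shape[0] // block) * block
    w = (g.shape[1] // block) * block
    small = g[:h, :w].reshape(h // block, block, -1, block).mean(axis=(1, 3))
    highpass = small - box_blur(small)      # keep edges, drop flat regions
    return (np.abs(highpass) > thresh).astype(np.uint8)

frame = np.random.default_rng(0).integers(0, 256, (96, 96, 3))
print(preprocess(frame).shape)              # (12, 12) binary grid
```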

@cezar_t

SPH, HTM, and Sparsey (official website/code) are the pioneering exemplars of what I call Binary Pattern Neural Networks (BPNNs), with Sparsey being the first. I describe BPNNs below, as I do in my draft paper (which I have yet to complete).

BPNNs generally have the following properties: 1) they receive input in the form of binary vectors, 2) they use a form of Winner-Take-All (WTA) computation for selecting the neurons to activate, and 3) the neurons have a binary activation function for output. Implementations of BPNNs differ in how neuron activation is implemented, how the network learns, and how the network is architected. BPNNs are not to be confused with Binary Neural Networks (BNNs) [?], which are traditional ANNs but with activation functions that transform the underlying scalar weights and neuron states into binary. Unlike BNNs, BPNNs natively operate on binary states.
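
Those three properties are easy to see in code. A toy construction (my own minimal illustration, not from the draft paper):

```python
import numpy as np

# Toy BPNN-style layer showing the three properties listed above:
# (1) binary input vector, (2) winner-take-all selection of neurons,
# (3) binary activation as output. Sizes and sparsity are arbitrary.

def bpnn_layer(x_bits, weights, k=5):
    overlaps = weights @ x_bits            # (1) operate on a binary vector
    winners = np.argsort(overlaps)[-k:]    # (2) k-winners-take-all
    y = np.zeros(weights.shape[0], dtype=np.uint8)
    y[winners] = 1                         # (3) binary output states
    return y

rng = np.random.default_rng(1)
W = (rng.random((128, 256)) < 0.2).astype(np.uint8)
x = (rng.random(256) < 0.05).astype(np.uint8)
print(np.flatnonzero(bpnn_layer(x, W)))    # indices of the k active neurons
```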

Given the previous discussion on clusterons, I think including dendrites as first-order computational objects should also fall under this umbrella definition. The power of research in this area lies in the clear, visual exploration of distributed representation and computation over discrete information packets. This is in contrast to the ANN approach, which linearizes computation and embeds information into vector spaces. I’m starting to strongly believe that the latter’s wealth of existing linear-algebra and mathematical tools is severely inhibiting scientific advancement in artificial cognition.

What’s missing for BPNNs, and why HTM seems to have stalled, is a clear theoretical framework for how information is represented and transformed through BPNN networks. Without that theory, trying to connect spatial poolers and temporal sequence memories together to create some effect is just like wiring black boxes together to see what happens. When you fail, there’s no way to understand why you failed or how to improve without that theoretical understanding.

I think I have part of this theory, but it’s still a long way from explaining what’s happening, and how to get desired effects.

4 Likes

@JarvisGoBrr your self-driving car looks nice!
Do you use HTM + RL?

Personally, I find SPH a very powerful framework, continuously developed by Ogma: high performance and very fast.

One thing I find very interesting is the image encoder/decoder of AOgmaNeo, which allows us to check the potential of HTM for image prediction and classification.

In my experiments HTM works perfectly with CSDRs for both image prediction and classification!

1 Like

I only use spatial poolers for the self-driving car.

One is an input encoder, whose output is decoded into a predicted reward and an action by the other two spatial poolers.

1 Like

@JarvisGoBrr
How do you calculate rewards?
Which RL algorithm do you use?
What are the two actions? Velocity and car orientation in the driving direction?

1 Like

I don’t know which algorithm I’m using, I just made it up.

First, a primary input pooler learns to represent the visual input.

The supervisory reward is 0 if the car is on the road and -1 if it’s touching the border.

I train a spatial pooler to decode the visual input into a reward prediction, but it is trained on its own prediction plus the real reward times an adaptation constant.

This leads to a “reward smearing” over space and time that turns the sparse reward into a smoothly varying dense reward, more negative close to the edges of the road and fading off as the car gets further away.

I use this dense reward to modulate the learning of a second pooler that decodes the input into a left-right action.

If the dense reward increased relative to the previous time step, the previous action taken is reinforced; otherwise the action is forgotten.
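
If I read that correctly, the loop looks roughly like this (my reconstruction, not the actual code; `reward_sp`/`action_sp` stand in for the two decoding poolers and `alpha` for the adaptation constant):

```python
# Rough reconstruction of the reward-smearing scheme described above.
# All class/method names are hypothetical placeholders.

alpha = 0.1          # adaptation constant (guess)
prev_dense = 0.0
prev_action = None

def step(visual_sdr, real_reward):
    global prev_dense, prev_action
    # Train the reward pooler on its own prediction plus the real reward
    # scaled by alpha; this "smears" the sparse reward into a dense one.
    predicted = reward_sp.decode(visual_sdr)
    dense = predicted + alpha * real_reward
    reward_sp.learn(visual_sdr, target=dense)

    # The dense reward's change modulates learning of the action pooler.
    action = action_sp.decode(visual_sdr)  # left/right
    if prev_action is not None:
        if dense > prev_dense:
            action_sp.reinforce(visual_sdr, prev_action)  # keep it
        else:
            action_sp.weaken(visual_sdr, prev_action)     # forget it
    prev_dense, prev_action = dense, action
    return action
```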

2 Likes

@JarvisGoBrr ok, and thanks.
AOgmaNeo learns very quickly and the car runs several laps without any problem!

1 Like

@jacobeverist your paper sounds interesting. Do you have a public draft?

When I googled “BPNNs”, the top results were about Back-Propagated Neural Networks, which usually means ordinary deep networks.

That’s a potential source of confusion.

1 Like

@thanh-binh.to did you use RL or “teacher learning” as in the featured YouTube videos?

1 Like

@cezar_t I don’t think that acronym is very commonly used, given that nearly all deep neural networks use backpropagation for learning these days. Do you have a better name?

Sadly, my draft is under corporate lock & key at the moment, so I’m unable to release the partial work without going through a release process. I want it to be complete before I do that, so I don’t have to go through it again.

The paper actually focuses mostly on encoders and how to build them, which is a sadly neglected but essential topic.

2 Likes

@cezar_t I do not know about this video.
In OgmaNeo they use an actor-critic algorithm.

1 Like

@jacobeverist Thanks. Then I assume the paper isn’t just a review of BPNNs. About the acronym collision: I just noticed it; it could be “representations” instead of “patterns”, or ML instead of NN.
You have a better perspective on how important this detail is.
PS: BRNN is taken by the less notorious Bidirectional Recurrent NNs, which I hadn’t heard of before, but they sound interesting.


@thanh-binh.to the “interesting results” link in the first message here is their YouTube channel of demonstrations.
The most impressive ones feature either some form of imitation learning, or a simple kind of path memorization followed by goal setting.
In both, the robot is first manually driven through an environment, which is not what mainstream RL is about.
However, their system can also be set to work in RL settings. I played only with the cartpole example, which, if time-scaled to real time, doesn’t perform as well as the much more complex Raspberry Pi RC car on the park alley with imitation learning.

That’s why I asked what kind of results you are talking about.

1 Like

I am speaking about the car racing demo!

Another piece of prior art is the “cerebellar model articulation controller” (CMAC), which almost meets your definition (IIRC it does not use competition, but it does incorporate sparsity). It’s from the ’70s. The author’s analysis of how sparsity affects it is basically correct, but much less mathematically formalized than the state-of-the-art theories.

4 Likes