Simple Cortex


#1

It’s been a while since I’ve posted on the forum. I hope everyone is well and enjoying the summer.

Anyway, I had been working feverishly on an OpenCL HTM implementation when I got inspired to write some other algorithms based on HTM theory. The work has turned into a project called Simple Cortex (SC). SC uses a slightly modified pyramidal neuron model to detect coincidences in patterns and predict the future. Here’s the first demo of it in action:

Over the next few weeks I will be documenting my ideas and the theory for public consumption. This should be a somewhat easy task because the theory behind the algorithms is relatively easy to understand, hence “Simple” in the name. Because of this I’m thinking about writing my first scientific paper. The journey begins…

Here’s the code:

Any general advice or comments about the project would be greatly appreciated!

Dave


#2

Very cool work, @ddigiorg! (and the soundtrack is so dramatic!)

Also thanks for noting “The work is heavily inspired by Numenta’s Hierarchical Temporal Memory (HTM)”, we appreciate the credit! :smiley:


#3

Thanks Matt!


#4

I upgraded my demo (with more dramatic music :wink:) and finished the github repo’s readme. I want to do a few more demos before branching out into the wider public eye, but it’s my hope that more people will become interested in the applications of neocortical-based machine intelligence. HTM can do whatever SC can, so hopefully my work will attract more interested parties to the forum!


#5

I looked over your examples.

Perhaps you could provide a simpler example without the graphics and dynamics simulations?

Perhaps you can provide a dataset, read it in and show how to train and predict on it.

Also, do you have a conceptual mapping of HTM to Simple Cortex terminology? From your diagram it’s not clear how things are working. I understand all the HTM stuff, but how does SC work?

Also, it looks like you made some new things like Forests so that the code is amenable to parallelization. Is that correct?


#6

Thank you for the feedback.

A simpler example like the one you suggest is definitely a good idea and easily doable. I’m going to do a few tutorial videos, somewhat similar to Matt’s HTM School videos, to explain how it all works. I think this would really help provide a clearer explanation of SC.

As for a dataset, I think MNIST is a great starting point because it’s so simple and standard. I’ll have to figure out how to read the data files into my code, but it shouldn’t be too difficult.

You are correct about forests. I created that structure for better OpenCL parallelization because I can keep all the synapse data that responds to and learns from the same input space in one large buffer. For example, in the Ball demos, forest 0 contains all neurons’ “proximal dendrites”, which learn the active inputs. Forest 1 contains all neurons’ “distal dendrites”, which learn previously active neurons.
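To make the "one large buffer" idea concrete, here is a minimal Python sketch of how a forest might be laid out and scanned. It is not the actual SC code: the names (`forest_addrs`, `forest_perms`, `dendrite_overlaps`) and the tiny sizes are made up for illustration, and the outer loop stands in for what would be one OpenCL work-item per neuron.

```python
# Hypothetical sizes, far smaller than the ball demo.
NUM_NEURONS = 4
SYNS_PER_DENDRITE = 3

# One forest = one dendrite per neuron, packed into a single flat buffer.
# Neuron n owns entries [n * SYNS_PER_DENDRITE, (n + 1) * SYNS_PER_DENDRITE).
forest_addrs = [0, 1, 2,   1, 2, 3,   4, 5, 6,   0, 5, 7]
forest_perms = [1] * (NUM_NEURONS * SYNS_PER_DENDRITE)  # permanence > 0 means connected


def dendrite_overlaps(addrs, perms, input_states, num_neurons, syns):
    """Count, per neuron, how many connected synapses point at an active input."""
    overlaps = [0] * num_neurons
    for n in range(num_neurons):      # in OpenCL, n would be get_global_id(0)
        base = n * syns
        for s in range(syns):
            i = base + s
            if perms[i] > 0 and input_states[addrs[i]]:
                overlaps[n] += 1
    return overlaps


input_states = [1, 1, 1, 0, 0, 0, 0, 0]  # inputs 0, 1 and 2 are active
overlaps = dendrite_overlaps(
    forest_addrs, forest_perms, input_states, NUM_NEURONS, SYNS_PER_DENDRITE
)
```

Because every neuron's dendrite lives at a fixed offset in the same buffer, each neuron can be processed independently, which is the property that makes the layout parallel-friendly.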


I apologise for the wall of text below (the tutorial videos will be much more effective, imo), but here’s a quick explanation of terms (HTM on left, SC on right).

  • synapse -> synapse
  • segment -> dendrite
  • cell -> neuron
  • region -> area
  • spatial pooling -> forest 0 encode and learn
  • temporal memory -> forest 1 encode, learn, and predict
  • htm “classifier” -> decode

There are many architectural and design differences between HTM and SC:

  1. SC does not have “spatial pooling” and “temporal memory”. Instead, SC has 4 functions (encode, learn, predict, and decode) that, if used correctly, replicate HTM’s functionality.
  2. For the demos, SC activates just 1 neuron, whereas HTM would activate 2% of a population of neurons. SC can activate a percentage of a population as well, but for simplicity I decided to use just 1 active neuron.
  3. An SC synapse’s “connected/unconnected” threshold is 0, a departure from the biological model. This means SC synapses are always connected to their pre-synaptic address unless the permanence falls to 0, in which case the synapse will look for an unused pre-synaptic address.
  4. SC does not use terms like “proximal”, “distal”, and “apical” for the dendrites. An SC dendrite may function like any of those in HTM depending on its usage.
  5. SC does not make use of the “column” data structure. Like the neocortex and HTM, SC may have neurons with dendrites that share the same receptive field. This would be considered a “minicolumn” or HTM column, but SC does not specify it or organize the data as such.
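As a rough illustration of point 3, the sketch below shows one way a dendrite's synapses could be updated when the connected threshold is 0. It is only a guess at the idea, not the SC implementation: the function name, the step size of 1, and the `perm_max` cap are all assumptions.

```python
def learn_dendrite(addrs, perms, active_inputs, perm_max=99):
    """Update one dendrite in place.

    addrs / perms: per-synapse pre-synaptic address and permanence.
    active_inputs: set of currently active input addresses.
    """
    # Connected synapses (permanence > 0) are reinforced or weakened.
    for s in range(len(addrs)):
        if perms[s] > 0:
            if addrs[s] in active_inputs:
                perms[s] = min(perms[s] + 1, perm_max)
            else:
                perms[s] -= 1

    # A synapse whose permanence reached 0 is free: move it to an active
    # input that no connected synapse on this dendrite already covers.
    covered = {a for a, p in zip(addrs, perms) if p > 0}
    unused = sorted(active_inputs - covered)
    for s in range(len(addrs)):
        if perms[s] == 0 and unused:
            addrs[s] = unused.pop(0)
            perms[s] = 1


addrs = [0, 1, 9]
perms = [5, 5, 1]
learn_dendrite(addrs, perms, active_inputs={0, 1, 2})
```

In this run the synapse pointing at input 9 decays to 0 and is reassigned to input 2, the only active input the dendrite was not yet watching.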

#7

Some comments and questions:

  1. Pattern uses “node” terminology. Not defined or used elsewhere.

  2. Dendrite Forest handles a route of connections from one layer of inputs (synapses to input) to a layer of outputs (neurons that activate)

  3. One distal set of connections managed by Dendrite Forest with neuron activity as input and neuron inputs as the output. Becomes prediction network.

  4. No actual neuron data structure. The workhorse is the Dendrite Forest object, whose arrays of inputs and outputs are passed to the parallel algorithm. It works with arrays: inputs, activity states, boosting count, overlap count, permanences.

  5. Decoding is accomplished by tracing the dendrite from each active neuron back to the input cell through each connected synapse. All synapses are backtracked, so the input prediction kind of seems like it would be “noisy” based on reading the code, but if the synapses are pruned and reassigned during learning, then that noise would disappear.

  6. Boosting works during the encoding process. The boost value accumulates each time a neuron is not activated, up to a max. The most-boosted neurons are selected to be active during the learning process if no other activation occurs.

  7. Only one-step prediction occurs in the current implementation. Without the mini-columns, there’s no way to do higher-order sequence prediction because input A only has one prediction value based on its learning. No other predictions are possible. (I bet if you occasionally reverse gravity in your experiment, it will simultaneously predict both up and down falling with no way to disambiguate it based on past behavior.)

  8. Is prediction biasing activation on the current input pattern? I can’t figure that out yet. Since you only have Active/Inactive states instead of the Active/Inactive/Predictive states of HTM neurons, it’s difficult to see how that would work. There is no predictive state that will inhibit competing neurons in a column either which allows for the higher order sequence learning.

  9. It might make more sense in your demos if you explicitly label or comment on what each Pattern represents. Furthermore, the same thing with what each Forest represents. Referring to them by Index makes it difficult to follow what each input/output means. I figured it out eventually.

  10. How do multiple Forests relate to each other in a single Area? I see that they are referenced indirectly depending on the number of Patterns received in a function call. Seems like a problematic design feature.

  11. Each OpenCL call operates on a single neuron and its connected synapses correct? I am guessing that’s what the uint n = get_global_id(0); index refers to.
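My reading of the boosting behaviour in point 6 can be sketched as follows. This is a simplified Python model, not the OpenCL kernel: resetting a winner's boost to 0, the tie-breaking by lowest index, and the names `d_thresh`, `num_active` and `boost_max` are assumptions on my part.

```python
def encode(overlaps, boosts, d_thresh, num_active=1, boost_max=1000):
    """Pick active neurons, falling back to the most-boosted ones."""
    # Neurons whose dendrite overlap reaches the threshold recognise the input.
    winners = [n for n, o in enumerate(overlaps) if o >= d_thresh][:num_active]

    if not winners:
        # Nothing recognised the input: choose the neurons that have gone
        # unused the longest (highest boost); ties go to the lowest index.
        ranked = sorted(range(len(boosts)), key=lambda n: boosts[n], reverse=True)
        winners = ranked[:num_active]

    # Boost accumulates for every neuron that did not activate, up to a max.
    for n in range(len(boosts)):
        if n in winners:
            boosts[n] = 0
        else:
            boosts[n] = min(boosts[n] + 1, boost_max)
    return winners


boosts = [0, 3, 1]
first = encode(overlaps=[0, 0, 0], boosts=boosts, d_thresh=2)   # unlearned input
boosts_after_first = list(boosts)
second = encode(overlaps=[2, 0, 0], boosts=boosts, d_thresh=2)  # neuron 0 recognises it
```

The first call has no neuron over threshold, so the most-boosted neuron is recruited. The second call finds a match and boosting plays no part in the selection.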

Like the neocortex and HTM, SC may have neurons with dendrites that share the same receptive field. This would be considered a “minicolumn” or HTM column, but SC does not specify it or organize the data as such.

  12. You would also need to implement predictive states and column inhibition to replicate minicolumns.

For the demos, SC activates just 1 neuron whereas HTM would activate 2% of a population of neurons.

  13. Can you explain what you mean by this? I didn’t see anything that indicates a single neuron. I saw 64k neurons though.

#9

Thank you for your comments. It’s helpful having a fresh perspective on this project so please let me know where I could better clarify my ideas.

I’ve upgraded my concept diagrams. It just shows one neuron for now but I’ll add a few more to show how the algorithms operate in the vids.

image

I’ve buried nActThresh in the encode function and nPreThresh in the predict function. I plan to pull these out and make them Area class variables. Perhaps it would make sense to make a Neuron class as well for better organization.

To answer your comments and address your questions:

  1. I had some trouble with naming conventions, so I threw in “node” for now to represent 1 element of the OpenCL buffer in the Pattern class. Ultimately, the class is supposed to be a vector representing the pre-synaptic inputs and/or neuron states (the outputs) of an SC area. Therefore, I am going to change my naming convention to “Stimulae” as the class name, “states” as the OpenCL data buffer, and “stimulus” replacing “node” as one item in that buffer.

  2. I think that’s right. For example, forest 0 lets me calculate every neuron’s dendrite 0 overlap in 1 OpenCL pass.

  3. Correct. Forest 0 (every neuron’s 1st dendrite) observes and learns from input stimulae. Forest 1 (every neuron’s 2nd dendrite) observes and learns from previous active neuron stimulae.

  4. Correct.

  5. Decoding looks at active (or predictive, see 8) neurons, fetches the synapse addresses of the specified forest, and stores it in a vector. For example, let’s say during a predict pass a neuron is activated by dendrite 1. Then we can grab the values in dendrite 0 because in the past we’ve seen the stimulae learned by dendrite 0 and the stimulae learned by dendrite 1 occur together.

  6. Yes, boosting helps the encode algorithm select an active neuron when unlearned stimulation occurs.

  7. I’m not sure I follow. An HTM active neuron represents an input pattern (proximal dendrite) in a certain context (distal dendrite). An HTM column has neurons that share the same input pattern, but have different contexts. Neurons are chosen based on having a pattern in the context of previous active neurons (which have their previous contexts). An SC neuron in the ball demo has a dendrite which represents an input pattern and a dendrite which represents a previous active neuron context. I can have as many or as few neurons that respond to the same input pattern, but in different previous neuron contexts, as needed, which allows for learning and predicting high-order sequences. Ultimately HTM and SC are coincidence detectors, which allows them to operate similarly, but SC doesn’t use the minicolumn structure, which has its advantages (dynamic number of neurons per column) and disadvantages (no shared proximal dendrites).

  8. It’s true I only have active/inactive states; however, I get active and predict neurons based on which function sets the neuron states. This is why I use “nStates” as the buffer name instead of “nActivations”. So “active neurons” are the neuron states after the encode process, stored in one stimulae vector, and “predict neurons” are the neuron states after the predict process, stored in a separate stimulae vector. For forecasting many timesteps into the future, the first predict pass is fed the current neuron activations and the subsequent predict passes are fed the current neuron predictions.

  9. That’s a good idea. I’ll modify my demos when I get a chance.

  10. For flexibility, I let the user select what pattern gets processed by which forest in the encode, learn, predict, and decode algorithms. You may be right that it’s probably not the best feature, but I’m not sure how to do it any other way.

  11. Yes, that’s correct.

  12. I hope I was able to answer this in 7 and 8. My claim is that I indirectly replicate minicolumns because HTM and SC neurons both operate on dendrite contexts.

  13. I have a variable in area.h called _numAN which represents the number of active neurons an area is allowed to have at one timestep. For now I set it to 1, which means out of ~64k neurons, only 1 may be activated by the encode step. This restriction does not affect prediction, which can have as many or as few predicted neurons as needed.
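The multi-step forecasting described in answer 8 can be sketched as a loop that feeds each prediction back into the next predict pass. The Python below is a toy model under assumptions: `distal` is a hypothetical stand-in for forest 1 (each neuron's second dendrite stored as a set of previously active neuron indices), and `n_pre_thresh` loosely mirrors the nPreThresh parameter mentioned above.

```python
# distal[n] = neuron indices that neuron n's second dendrite has learned.
# Here neuron 1 follows neuron 0, neuron 2 follows 1, and so on.
distal = {1: {0}, 2: {1}, 3: {2}}


def predict(prev_states, distal, n_pre_thresh=1):
    """Return neurons whose distal dendrite overlaps the given neuron states."""
    return {
        n for n, sources in distal.items()
        if len(sources & prev_states) >= n_pre_thresh
    }


def forecast(active, distal, steps):
    """First pass is fed the active neurons, later passes the predictions."""
    history = []
    states = active
    for _ in range(steps):
        states = predict(states, distal)
        history.append(states)
    return history


history = forecast(active={0}, distal=distal, steps=3)
```

Note that the predicted set is not capped the way the encode step is capped by _numAN, so several neurons can be predicted at once when the context is ambiguous.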


#10

Okay, that clarifies some things for me. It’s starting to make sense.

  1. Based on your diagram, it looks like the “pink” dendrite only activates when all synapses are activated by input according to dThresh. Furthermore, the neuron only activates when both dendrites are active according to nActThresh. This is the encoding step. This also answers my question about prediction biasing activation since the prediction has to be correct in order for the neuron to activate in that diagram.

  2. Furthermore, with the requirement that all synapses are activated to activate the neuron, this means that the decode step will produce only valid input predictions. There are no loose synapses like you would have in HTM and the SC synapses can be used for predicting the input.

I’m not sure I follow. An HTM active neuron represents an input pattern (proximal dendrite) in a certain context (distal dendrite). An HTM column has neurons that share the same input pattern, but have different contexts. Neurons are chosen based on having a pattern in the context of previous active neurons (which have their previous contexts). An SC neuron in the ball demo has a dendrite which represents an input pattern and a dendrite which represents a previous active neuron context. I can have as many or as few neurons that respond to the same input pattern, but in different previous neuron contexts, as needed, which allows for learning and predicting high-order sequences. Ultimately HTM and SC are coincidence detectors, which allows them to operate similarly, but SC doesn’t use the minicolumn structure, which has its advantages (dynamic number of neurons per column) and disadvantages (no shared proximal dendrites).

  1. I understand now. What HTM does that SC does not is share the same encoding for the same inputs that are part of different learned sequences. For instance, take the two sequences from the HTM examples: ABCD and XBCY. In HTM, you have B’/B’’ and C’/C’’, which share the same SDR representation but activate different cells in the minicolumn. You can compare the B’/B’’ and C’/C’’ SDRs and they will be equal. However, in the SC implementation, B’/B’’ and C’/C’’ are not equal. They activate different cells with different proximal dendrites. In SC I see you can still compare them by doing the decoding step, but it’s not guaranteed that the synapses are completely the same, so the decoded values may not be completely equal.

If you wanted to start to recognize a sequence from the middle starting with input B, you need to tune the parameters so that you can carry multiple predictions forward until you receive the resolving value D or Y. Does the _numAN=1 allow you to do this if you start from B?

Now that I think about it, you would activate a single neuron during the encode step representing one of the Bs. It would be either B’ or B’’ since there’s no way to disambiguate. However, the prediction step would be stuck predicting either C’ or C’’ respectively, but not both. Which means, it would assume one sequence from the start. A way out is if it learned the sequence starting from B and then it can make multiple predictions to C’ and C’’. However, it would still select either C’ or C’’ on the next step and you would have the same problem.

The only way out I think is to increase _numAN sufficiently. But there is no way to guarantee that you will be encoding and carrying forward all possible known contexts starting from B. That’s the challenge I see for SC right now. Without the explicit shared encoding for same input but different contexts, I’m not sure what kind of performance it will have. Maybe you have already thought about this.


#11

Doesn’t this mean that temporal pooling is not possible?


#12

Maybe. Probably.

Is temporal pooling well-defined? If you have a list of its capabilities, then I can say whether or not it would work. Without knowing how TP is implemented, it’s hard to say whether the columns are necessary.


#13

@jacobeverist

  1. Yes, at least for the diagram and ball demo all synapses must activate for the dendrite to be active. I have tried different dendrite activation thresholds with success so I could make it a user-specified parameter.

  2. Interestingly in the ball demo with the 50% dendrite threshold value, most of the time the ball prediction is slightly off and sometimes displays multiple slightly off predictions. This makes sense to me because even if SC hasn’t seen the new ball starting location and trajectory, it has learned trajectories that are close to it.

  3. This is true, SC cannot easily compare similar proximal inputs at the neuron layer, which is an advantage of HTM’s minicolumn structure.

Ahh you noticed the annoying flaw! I’ve been trying to solve this issue since I started developing SC. This has to do with dealing with ambiguous distal contexts, which also relates to HTM’s minicolumn bursting. I think it’s possible if I can get the right neuron activation and/or inhibition algorithm, but I have been unsuccessful in my thinking so far. You may have noticed every time a new sequence occurs I “reset” the previous active vector. This is just my temporary workaround until I can get the proper algorithm working.

The reason I haven’t got it working is the difficulty in finding a balance between activating new neurons with the greatest boost values and/or activating ambiguous previously learned neurons. For example, let’s say we have a proximal input that’s been seen in many distal contexts. However, we’re unsure if the distal context is one of the previously learned distal contexts or if it’s a new context and therefore a new neuron is needed.

Anyway, I put the cart before the horse in saying SC can do everything HTM can do (I got excited and forgot about this issue). I need to be able to recognize a sequence starting from the middle before I can make that claim.


#14

So I thought about this and tried to figure out what was missing and how you might be able to fix it.

Mini-columns provide two things that SC doesn’t have. 1) Neurons with shared synapses (all neurons on a column), and 2) inhibition of neurons that activate on same value but different context.

Because you don’t have this, the SDR of same values but different contexts will not be equal. Furthermore, without the inhibition on a column, you can’t disambiguate what sequence you’re in.

For example, take the sequences ABCD and XBCY again. If you input A, then it will predict BCD. If the next input is B, it will resurrect the other sequence and will simultaneously predict CD and CY. If you have inhibition of neurons that share the same synapses, you would solve this problem. That is, an implementation of mini-columns.

I actually don’t think columns will be too difficult to implement in your current design. You have to do the following:

  1. Add a “grouping” of neurons. Basically an array of indices of states that are part of a column.

  2. Allow group of neurons to share same synapse addresses. No need to learn synapses individually. The column can train and share the same set of synapses. (This is the proximal dendrite, not the distal/predictive dendrite)

  3. An inhibition step that either bursts the group of neurons or zeroes them out except for the neuron that was predicted. This can be an intersection of the predicted state and the encoded state to decide which neuron to set. Of course, you would need some way to select a winner if there is more than one.
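A minimal sketch of the inhibition step in item 3, written in Python for readability. The function name and the tie-breaking rule (lowest neuron index wins) are placeholders I made up; a real implementation would need a better way to pick a winner.

```python
def inhibit(column, predicted):
    """Resolve one active column.

    column: neuron indices that share the same proximal synapses.
    predicted: set of neuron indices predicted on the previous step.
    """
    # Intersection of the encoded column with the predicted state.
    hits = [n for n in column if n in predicted]
    if hits:
        # A predicted neuron wins and the rest of the column is zeroed out.
        return {hits[0]}
    # No neuron was predicted, so the whole column bursts.
    return set(column)


column_b = [4, 5, 6]                      # hypothetical column encoding input "B"
in_context = inhibit(column_b, {5, 9})    # neuron 5 was predicted
out_of_context = inhibit(column_b, set()) # nothing predicted
```

The bursting case is what lets every known context for "B" stay alive until a later input disambiguates the sequence.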

Now, the SDRs will be comparable across different contexts and can be reused in learning new sequences. Furthermore, you cut the number of synapses you have to manage by a factor of N, where N is the size of the column. Also, you will now be able to disambiguate predictions by managing the context with inhibitions.

By keeping the inputs in columns and inhibiting them there, you don’t have to find some weird logic that would be able to find two different neurons that activate on the same input but on different contexts and go about inhibiting each other.

Another thing you can do that HTM doesn’t do is be greedy about the neuron groupings. Group them starting with only 1 neuron and add more neurons as more context is needed. Perhaps you have a good heuristic for determining when more context neurons are needed?


#15

@ddigiorg It is still really cool. Keep up the good work! I’m sure that some derivatives of HTM will have uses in the future.


#16

@ddigiorg

See the latest HTM school video which basically explains the same thing I explained here. Maybe it will be easier to understand. It’s funny how this came out right after I figured this out :slight_smile:


#17

(the story of my life :wink: )


#18

While on vacation I’ve given some thought to the basics of sequence learning and prediction applied to Simple Cortex architecture. So far this is only conceptual and I will have to modify SC code a little to achieve the desired results. Let’s say an SC Area has learned two sequences, ABCD and XBCY:

NEURN 0 1 2 3 4 5 6 7
STIM0 A B C D X B C Y
STIM1 _ 0 1 2 _ 4 5 6

Stimulus 0 handles the proximal, feed-forward context, “A”, “B”, “C”… etc., and Stimulus 1 handles the lateral distal context, the previous active neuron index. The “_” symbol represents a cleared stimulus, indicating the start of a sequence. After learning those two sequences let’s input “A” into Stimulus 0 and reset Stimulus 1 context:

step 0: (A, _) activates n0, n0 predicts n1, n1 decodes B
step 1: (B, 0) activates n1, n1 predicts n2, n2 decodes C
step 2: (C, 1) activates n2, n2 predicts n3, n3 decodes D
step 3: (D, 2) activates n3, no predict neurons

Now what if we input “B” into Stimulus 0? At first glance we might think this to be true:

step 0: (B, _) activates n1, n1 predicts n2, n2 decodes C
               activates n5, n5 predicts n6, n6 decodes C

However, this is incorrect. Our input stimulae (B, _) is an entirely new, unique, and unobserved set of stimulae. Therefore, it is a mistake to assume either of the C’s will follow. A starting “B” in a sequence could represent something vastly different than a B in the middle of either of the two learned sequences. Therefore, the Area has to learn a new neuron and start a new learned sequence:

NEURN 0 1 2 3 4 5 6 7 8
STIM0 A B C D X B C Y B
STIM1 _ 0 1 2 _ 4 5 6 _
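The tables above can be walked through with a few lines of Python. This is a conceptual toy only, not SC code: each neuron is reduced to one exact (STIM0, STIM1) pair, `None` plays the role of "_", and the function names are made up.

```python
# Neuron n has learned the pair (STIM0[n], STIM1[n]).
STIM0 = ["A", "B", "C", "D", "X", "B", "C", "Y"]
STIM1 = [None, 0, 1, 2, None, 4, 5, 6]


def encode(symbol, prev):
    """Return the neuron that has learned exactly this pair, if any."""
    for n, (s0, s1) in enumerate(zip(STIM0, STIM1)):
        if s0 == symbol and s1 == prev:
            return n
    return None


def predict(active):
    """Return neurons whose lateral context is the active neuron."""
    if active is None:
        return []
    return [n for n, s1 in enumerate(STIM1) if s1 == active]


def decode(neurons):
    """Read back the feed-forward symbol each neuron has learned."""
    return [STIM0[n] for n in neurons]


def encode_or_learn(symbol, prev):
    """Encode, allocating a new neuron when the pair has never been seen."""
    n = encode(symbol, prev)
    if n is None:
        STIM0.append(symbol)
        STIM1.append(prev)
        n = len(STIM0) - 1
    return n


trace = []
prev = None  # cleared context: start of a sequence
for symbol in "ABCD":
    active = encode_or_learn(symbol, prev)
    predicted = predict(active)
    trace.append((active, predicted, decode(predicted)))
    prev = active

new_neuron = encode_or_learn("B", None)  # (B, _) has never been observed
```

Feeding "ABCD" reproduces steps 0 to 3, and the final call allocates neuron 8 for the unseen pair (B, _), matching the extended table.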

But what if we want to predict what follows a B, no matter its distal context? We would have to feed Stimulus 0 into a Predict step first, then continue feeding the predicted neurons into a Predict step on subsequent steps.

step 0: B predicts n2 and n6, n2 decodes C and n6 decodes C
step 1: n2 predicts n3, n3 decodes D
        n6 predicts n7, n7 decodes Y

This is how SC can recognize sequences from the middle, although it will not be able to differentiate overlapping sequences until the proper context is observed which activates the right neuron.
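Here is a small Python sketch of that context-free prediction. How the "B" stimulus would actually be fed into the Predict step is not settled above, so this toy takes one possible reading: find every neuron that has learned "B" on Stimulus 0, then follow the lateral links forward. All names are hypothetical.

```python
STIM0 = ["A", "B", "C", "D", "X", "B", "C", "Y"]
STIM1 = [None, 0, 1, 2, None, 4, 5, 6]


def neurons_for(symbol):
    """All neurons that learned this feed-forward symbol, in any context."""
    return [n for n, s0 in enumerate(STIM0) if s0 == symbol]


def successors(neurons):
    """Neurons whose lateral context is any of the given neurons."""
    return [n for n, s1 in enumerate(STIM1) if s1 in neurons]


def decode(neurons):
    return [STIM0[n] for n in neurons]


step0 = successors(neurons_for("B"))  # B predicts n2 and n6
step1 = successors(step0)             # n2 predicts n3, n6 predicts n7

decoded0 = decode(step0)
decoded1 = decode(step1)
```

Both branches are carried forward in parallel, so the prediction stays ambiguous (D or Y) until an observed input selects one of them.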


@jacobeverist I think your minicolumn grouping idea would work, but I actually might not need it after all based on these two reasons:

  1. Based on the above thinking I believe the 2nd problem of not having minicolumn-like architecture “2) inhibition of neurons that activate on same value but different context” is not an issue for SC to properly learn and predict sequences.

  2. The 1st problem “1) Neurons with shared synapses” I can solve by having a SDR of dendrite activations and one or more vectors of dendrite addresses. Whenever a dendrite is active, the neuron will learn the active dendrite’s address.

This type of architecture is nice for recognizing, learning, and predicting neurons with lots of contexts. I could see Simple Cortex neurons responding to more than 2 stimulae like HTM’s sensory-motor theory:

  1. Feed-forward observed sensation
  2. Lateral temporal context
  3. Sensor position context
  4. Apical top-down perception
  5. etc.

#19

@rhyolight That video is hilarious and informative. Please don’t ever change, Matt.


#20

Apparently turning videos into .gifs is really easy and quite nice for a github readme. Anyway, I’m almost done with my tutorial video. I have the slides ready so now I just need to get my voice to make the right noises.

@jacobeverist Thanks again for helping me with my project. The code is much cleaner now, so reading it will hopefully be a little less laborious. I haven’t been able to implement either my idea or your idea for getting multiple neurons to share feed-forward proximal dendrites. I’ll have to shelve it for a future upgrade whenever I have the time. For now Simple Cortex works and I need to start shifting my focus towards wrapping it up and using it to find employment.

Dave


#21

That’s great! The predictions look much more reasonable and understandable now.

The README looks great too. For your image, I recommend making a box for your Forest instead of lines, and I would also recommend drawing a few lines from your stimulae to the synapses so you know where the data is coming into the system and where the data is coming out.

You know, since it’s OpenCL, you might want to add some simple benchmark results on performance in comparison to NuPIC. It would be very interesting to see what kind of performance gains you can get from using GPUs. That’s part of the benefit you were proposing with your approach.

If you can’t really compare it, at least specify a problem size and then give the benchmark result in time with your specified hardware.
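For the standalone case, a timing harness could look roughly like the Python sketch below. The workload here is a dummy stand-in, not SC or NuPIC, and the names and sizes are placeholders. With a real OpenCL step, the command queue would also need to finish before the clock is read, otherwise only the enqueue time is measured.

```python
import time


def benchmark(step_fn, num_steps):
    """Time num_steps calls of step_fn and report total and per-step seconds."""
    start = time.perf_counter()
    for _ in range(num_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return {
        "steps": num_steps,
        "total_s": elapsed,
        "per_step_s": elapsed / num_steps,
    }


# Stated problem size for the (dummy) workload.
NUM_NEURONS = 1000
SYNS_PER_DENDRITE = 16


def dummy_step():
    # Placeholder for one encode/learn/predict pass over all synapses.
    return sum(n % SYNS_PER_DENDRITE for n in range(NUM_NEURONS))


report = benchmark(dummy_step, num_steps=50)
print(NUM_NEURONS, SYNS_PER_DENDRITE, report)
```

Reporting the problem size and the hardware next to the per-step time is what makes the number comparable later.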