A Theory of How Columns in the Neocortex Enable Learning the Structure of the World

Hi, Sheiser1
Thank you so much. It's really a nice explanation.

2 Likes

Agreed. Very nice.

There is a problem with terminology. You have used words with neuroanatomical meanings (basal/apical, L2/3/4/5/6, depolarise), but it’s not clear what these are intended to mean in an HTM context.

It’s not clear to me how a ‘column’ or a L2/4/5/6 column relate to a mini-column, hyper column or cortical column mentioned elsewhere.

It’s not clear to me what form the 3 kinds of input take, or where SDRs fit into the whole thing, or what form the output takes (when something is recognised).

In the HTM context, a cell is depolarized when it is put into the predictive state. This happens when any of the cell’s distal dendrite segments activate – meaning enough synapses on any segment are active. This is the TM’s job, to connect any activated cell to a set of prior active cells (from t-1), by storing these prior active cells on a distal dendrite segment.

These cells all belong to mini-columns. Each layer/region is composed of a set of mini-columns (2048 by default in NuPIC). At every time step a fixed percent of those mini-columns activate (2% by default, so about 40 of the 2048). Which 40 these are is decided by the SP. Each mini-column is connected to some subset of the input space (its receptive field). This input space is often an encoding vector but not necessarily, though it is always a list of the indices which are active within a vector. The SP assigns an overlap score to each mini-column based on how many of the bits in its receptive field are currently active. The 40 mini-columns with the highest overlap scores activate and inhibit all other mini-columns at that time step.
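The SP's overlap-and-inhibit selection can be sketched in a few lines. This is a toy illustration under my own names (`num_minicolumns`, `receptive_fields`, `active_minicolumns`), not NuPIC's actual SpatialPooler API, and it omits permanences, boosting and learning:

```python
import numpy as np

rng = np.random.default_rng(0)
num_minicolumns = 2048   # mini-columns per region (NuPIC default)
input_size = 400         # bits in the input space
sparsity = 0.02          # 2% of mini-columns activate each step

# Each mini-column connects to a random subset of the input (its receptive field).
receptive_fields = [rng.choice(input_size, size=50, replace=False)
                    for _ in range(num_minicolumns)]

def active_minicolumns(active_input_bits):
    """Return the indices of the ~40 mini-columns with the highest overlap."""
    active = set(active_input_bits)
    overlaps = np.array([sum(1 for bit in rf if bit in active)
                         for rf in receptive_fields])
    k = int(num_minicolumns * sparsity)  # 40 of 2048
    # Top-k mini-columns activate; all others are inhibited this time step.
    return np.argsort(overlaps)[-k:]

encoding = rng.choice(input_size, size=40, replace=False)  # active indices
winners = active_minicolumns(encoding)
print(len(winners))  # 40
```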

So each HTM region is composed of a set of mini-columns, which are composed of a set of cells. The cells predict their own imminent activation (depolarization) through their distal dendrite segments, which monitor sets of cells that activated soon before. So if a TM has learned from the sequence: ABCABCABC, then cells representing “C” will depolarize when “B” arrives, because the “C” cells’ distal dendrite segments have learned to monitor sets of “B” cells.
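A minimal sketch of that predictive mechanism (toy code with my own names, not NuPIC's TemporalMemory API): a cell depolarizes when any of its distal segments has at least a threshold number of synapses onto cells that were active at t-1.

```python
ACTIVATION_THRESHOLD = 3  # active synapses needed for a segment to fire

class Cell:
    def __init__(self):
        self.segments = []  # each distal segment: a set of presynaptic cell ids

    def is_depolarized(self, prev_active_cells):
        # Predictive state: some segment sufficiently overlaps t-1 activity.
        return any(len(seg & prev_active_cells) >= ACTIVATION_THRESHOLD
                   for seg in self.segments)

# A "C" cell whose segment learned to monitor a set of "B" cells.
b_cells = {101, 102, 103, 104}
c_cell = Cell()
c_cell.segments.append(set(b_cells))

print(c_cell.is_depolarized({101, 102, 103, 200}))  # True: 3 of 4 B cells active
print(c_cell.is_depolarized({7, 8, 9}))             # False: no overlap
```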

This is where SDRs come in. Each input like A, B or C is represented by a certain set of mini-columns (some 40 of the 2048). These representations are “distributed” across 40 bits, which is “sparse” relative to the total 2048. This sparsity leaves room for many different inputs, which may be totally distinct or may overlap somewhat.

In the ABC example there's no overlap since A, B and C are distinct categories with no similarity - so they are each represented by distinct sets of 40 mini-columns. However if the data type allows for semantic/qualitative overlap, the different inputs could share some bits. For instance, let's say instead you have numerical inputs which may be in the range of 0-10. In the sequence 123123123 there would be overlap in the representations of 1 and 2, since they are qualitatively similar. There would be no overlap though between 1 and 10, since they are not similar given the 0-10 range.
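A toy scalar encoder makes this concrete (hypothetical parameters and function name, not NuPIC's ScalarEncoder API): nearby values share active bits, distant values share none.

```python
def encode(value, n_bits=100, w=20, v_min=0, v_max=10):
    """Encode a scalar as a set of w contiguous active bit indices."""
    start = int((value - v_min) / (v_max - v_min) * (n_bits - w))
    return set(range(start, start + w))

overlap_1_2 = len(encode(1) & encode(2))    # similar values overlap
overlap_1_10 = len(encode(1) & encode(10))  # 1 and 10 share nothing in 0-10
print(overlap_1_2, overlap_1_10)  # 12 0
```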

Basal and Apical inputs are just ways of saying depolarizing inputs. They are inputs to a TM, informing the decision of which cells in the active mini-columns to make predictive. The difference (afaik) is that a cell’s basal segments monitor other cells from the same layer/region, while apical segments monitor other cells from another layer/region (Like the effect of L2 on L4 in the prior post).

So each layer/region (like L2/4/5/6) is comprised of a set of mini-columns which are comprised of cells and compete for activation through their proximal dendrite segments. Each of these cells creates and updates a set of distal dendrite segments, to monitor other cells which activated soon before.

The macro-column (synonymous with cortical column) is the largest structure, which is comprised of a set of layers/regions that connect to each other in some way. One example of this is shown in the L2456 network in the prior post. Each layer/region has some activating input (SP choosing mini-columns to activate) and some depolarizing input (TM choosing which cells to make predictive). Some of the regions like L4 and L6 are activated by Sensor inputs (raw data sources), and some of the regions are activated by the outputs of other regions (like how L2 is activated by output from L4). Connecting layers/regions together into these macro-columns allows multiple regions containing different information to inform each other.

I haven't actually heard the term hyper column, but I bet it's also synonymous with macro/cortical column.

3 Likes

Hi, sheiser1,

I have found two algorithms in Numenta's Git repo:
1. apical_dependent_TM.py
2. apical_tiebreak_TM.py

But what about the case of sequence learning?

Can you tell me one thing: for sequence learning in an L4-L2 feedforward network, which algorithm would you suggest?

For example, if I want to train this sequence
12345,12345,6783,6783,12345,6783
in an L4-L2 network with apical feedback, which one would you suggest:
apical_dependent_TM or
apical_tiebreak_TM?

If you can suggest the proper algorithm from your experience I will be really grateful.
Thank you

Very nice. Ever thought about adding that to BAMI?

But that’s just a solid explanation of SM, covering roughly the same territory as Building Brains to Understand the World’s Data in 2013 and the videos in HTM School. Does HTM theory now extend to anything else?

  • Is there something comparable for location data, like the finger on the coffee mug example?
  • How far does sequence recognition take us down the path to feature or object recognition?

It looks like the default for L2-L4 is apical_tiebreak_temporal_memory, here’s how I know:

In the l2_l4_inference script the default L4 RegionType is *ApicalTMPairRegion*.

Checking the source for ApicalTMPairRegion, I see the default implementation is ApicalTiebreak.

Further down in that script there’s logic for choosing which TM implementation to use:

So the default implementation is ApicalTiebreak.

1 Like

Thanks! I hadn’t considered it but am certainly willing to, especially if you think it’d enhance BAMI at all.

Yes! That finger-on-the-coffee-mug example uses a set of (L2-L4) macro-columns. The source for that is in a research repo here:

Jeff talks about it in this talk also (starting at 16:20, diagrams starting at 18:35 until about 22:00)

The locations paper seems quite relevant too:

According to Numenta’s theory it seems quite far down the path!
From what they say it seems that bio-realistic physical object recognition requires sequences of sensory features at locations on the objects.

As I understand, each cortical column is sensing a small slice of sensory space, maybe the tip of a finger or a certain patch of the retina for instance. So no one column can see/feel the whole picture/object. This limitation requires the columns to share information with each other in order to reach consensus on what's actually there. This is borne out in the fingers on coffee mug example, where 3 columns (each sensing a different part of the mug) can collectively recognize the object much faster than any 1 column could.

This short video visualizes it well I think (starting at 3:13)

There’s also their newer theory of displacement cells (shout out @mrcslws), as discussed in this most recent paper in case you haven’t seen it yet.

As always I’m eager to be corrected or built upon by anyone here.

I suspect the reason most of the newer TBT parts of the theory are not yet in BAMI is because this part of the theory is still evolving rather quickly.

I wonder if it would be worthwhile for us that are deeper into the theory to write a community version of BAMI. If we can get enough folks involved, it may be easier for us to keep it updated more frequently than BAMI. We’d need to agree on some threshold for the newest ideas to make it in though (since there are often conflicting ideas being explored at a given point in time, and things that are touched on then abandoned rather quickly)

3 Likes

Hi, sheiser1,

In the case of sequence learning (like ABC, CDA, ABC, …) in a multilayer L4-L2 network it is clear that
the L4 region has SP + TM + apical feedback from L2.
But I am confused about what the L2 region has.

Is it
L2 = SP + TM
or
L2 = only the pooler algorithm ColumnPoolerRegion.py, with no SP & TM?

And if L2 has only the Column Pooler, can you please tell me how this algorithm works with the incoming active-cells SDR from L4?

1 Like

Hey @MukitCSTE

L2 is essentially just an SP, since it has activating input but no depolarizing input (basal or apical) as L4 does.

Each SP column must have a receptive field – a set of encoder bits in the common case, but in this case it's a set of "predictedActiveCells" from L4. Each SP column's receptive field has some number of its bits active, in this case all those contained in "activeCells" from L4.

So outputs from L4 to L2 ("activeCells" & "predictedActiveCells") should be arrays of shape (2048, 32), while the output of L2 ("apicalInput" to L4) should be an array of shape (2048, 1).

1 Like

Hi David, this might be helpful.
'Column' is the macro-column (in Wikipedia also known as hypercolumn). In the HTM context, the cortical layer is also known as an HTM region.

In HTM we create a slice of the neocortex called an HTM region (or HTM layer), which consists of a set of mini-columns, which consist of HTM neurons (pyramidal neurons). An HTM network connects HTM regions (HTM layers) into larger entities called columns (macro-columns).
Please note that most technical HTM documentation and HTM papers very often use the term 'column' instead of mini-column. This is sometimes confusing, but when talking about the SP and TM we almost always mean the mini-column.

3 Likes

sheiser1,

Did you mean arrays of shape (number of rows = 2048, number of columns = 32)?

But one thing has me confused. On each iteration the TM of L4 passes active or predictive cells to the L2 SP, and the set of predicted-active cells at layer 2 will also change over each iteration. So if L2 has no TM, only an SP as you said earlier, how will the CONTEXT work at L2 for detecting long sequences at the upper layer?

Thanks @ddobric, that’s a nice diagram.

Jeff Hawkins said to do machine intelligence we must:

  1. Discover operating principles of neocortex
  2. Build systems based on these principles

Neuroanatomy belongs to part 1, but my focus is part 2. The central part of your diagram is what helps.

Niklaus Wirth said Algorithms + Data Structures = Programs. At present I am aware of:

Two data structures:

  • Sensory data, consisting of analog values (pulse trains) on individual sensors (fibres)
  • SDRs, consisting of (say) 40 bits set out of 2000

Three algorithms

  • Encoding of sensory data (will vary by sensory input)
  • Union of SDRs
  • SM, recognition of SDR sequences over time.

Nothing here about location, so where does that fit?

Anything else?

Location?
Watch this and see if you can work it out.

2 Likes

Thanks. That’s high quality research material on a very promising line of enquiry.

And no, I can’t work it out. I’m more down the engineer end of things than lab white coat.

1 Like

Hey @MukitCSTE,

It’s the reverse, 2048 is the number of columns and 32 is the number of rows (cells in each column). By default 2% of columns activate (so 40 out of 2048), and one cell in each column is chosen as the winner cell out of 32.

If any of the column's cells are in the predictive state (depolarized) when the column activates, then that cell is chosen as the winner. If none are predictive the column bursts (all cells activate) and the winner is the cell which best matches the prior activation (was closest to being predictive). If no cells have any overlap with the prior activation, then a cell is chosen at random amongst the cells with the fewest existing distal dendrite segments.
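Roughly, that selection might look like this in Python (an illustrative toy, not NuPIC's TemporalMemory internals; `choose_winner` and the dict layout are my own):

```python
import random

def choose_winner(cells, prev_active, threshold=3):
    """Pick the winner cell for one activating mini-column.

    cells: list of dicts, each with 'segments' (a list of sets of
    presynaptic cell ids). Returns (winner_cell, did_burst).
    """
    def best_overlap(cell):
        return max((len(seg & prev_active) for seg in cell['segments']),
                   default=0)

    predictive = [c for c in cells if best_overlap(c) >= threshold]
    if predictive:
        # A correctly predicted cell wins; the column does not burst.
        return predictive[0], False

    # No prediction: the column bursts (all cells activate).
    overlaps = [best_overlap(c) for c in cells]
    if max(overlaps) > 0:
        # Winner is the cell that came closest to being predictive.
        return cells[overlaps.index(max(overlaps))], True
    # No overlap at all: choose randomly among cells with fewest segments.
    fewest = min(len(c['segments']) for c in cells)
    return random.choice([c for c in cells if len(c['segments']) == fewest]), True

cells = [{'segments': []}, {'segments': [{1, 2, 3}]}]
winner, burst = choose_winner(cells, prev_active={1, 2, 3})
print(burst)  # False: the predicted cell wins without bursting
```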

Good questions, I’d wonder about them too.

So I dug into the ColumnPooler more, and it seems it's much more complex than I said before. It's true there are no depolarizing TM-style inputs to L2 from other layers, but it can do TM learning internally (like a common TM region).

Here’s where I see this, in ColumnPooler.compute():

Since predictedInput is not passed in to ColumnPooler.compute(), the first condition never enters, as seen here in ColumnPoolerRegion.compute():

It gets more complex the more I delve into it, so I don't have a complete answer at this point, but I do know that TM learning in L2 depends on there not being too many active cells.

If the number of activeCells is < minSdrSize, the input is considered unrecognized and a new representation is learned; if it is > maxSdrSize, the input is considered ambiguous and learning is not done.
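That gate might be paraphrased like this (minSdrSize/maxSdrSize are the real parameter names; the default values and function name here are made up for illustration):

```python
def learning_mode(num_active_cells, min_sdr_size=5, max_sdr_size=80):
    """Decide how the pooler learns, given how many of its cells are active."""
    if num_active_cells < min_sdr_size:
        # Input not recognized: learn a brand-new object representation.
        return "learn_new_representation"
    if num_active_cells > max_sdr_size:
        # Too many candidate cells: input is ambiguous, so skip learning.
        return "skip_learning"
    # Otherwise reinforce the existing representation.
    return "reinforce"

print(learning_mode(2))    # learn_new_representation
print(learning_mode(200))  # skip_learning
print(learning_mode(40))   # reinforce
```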

I think these are the functions to look into in order to really understand it to its core:

Yeah, it's really complex to grasp. I guess ColumnPooler.py is doing the real work here, I mean at layer 2 for context. It would be better if we got a paper from Numenta regarding this algorithm's principles.

1 Like

From that video, I see that there is a lot of experimental evidence showing the various cells described in the EC are more global in nature, and that the hippocampus is necessary to create specific contexts. This is contrary to some of my previous understanding (for example, I thought grid cell firing patterns to be specific to the room – but seems they do not actually start out that way). It makes me question the theory of grid cells in the neocortex firing in patterns that are specific to the object – seems like we would need to be looking for a similar mechanism to EC>hippocampus (not just a copy of the EC functions alone)

One experiment that would be interesting (though logistically difficult since it would need to be measured in humans), would be to see if the various types of EC cells described which are global in nature, also fire when a person is playing a video game involving moving a virtual character around an environment.

Another would be exploring a miniature room with a finger. Both of these involve “placing oneself” into the perspective of something else (the virtual character or finger), and presumably operating on a higher level of abstraction than directly walking around a room.

It would be interesting to see if one/both of these older structures of the brain are necessary for these more abstract tasks, and would certainly be applicable to some of the current HTM theory.

2 Likes

There have been several humans that have little or no hippocampus function. HM was tested on many tasks and the results are available for your review. I don’t know for sure but there may be some tests that address your questions.
I will add that videos of these patients that I have seen do not show any gross defects in spatial perception or manipulation. Motion through the space of rooms and between rooms seems normal to me.
This does not tell me what the firing pattern on the EC is, but it does place some limits on what the interaction between the various parts of the EC/HC complex could be.
Note that Moser mentions in various talks that they severed the connections between various parts of these structures when trying to learn more about place cells. Reading those papers could shed some light on what theories are plausible. I don’t have links to any papers from this time period but I suspect that they must exist.

2 Likes

On second thought, there is evidence of that also presented in the Moser video above (starting around 16:49) – he mentioned an experiment where they lesioned input that had passed through the hippocampus, and still observed that the firing patterns were not very noticeably different (which is what led them to explore the medial EC as their origin). This is a good indication that passing through the hippocampus isn't necessary for generating "specific to the room" grids in context.

So the question remains as to where the “specific to the room” context might be coming from. A couple possibilities come to mind – maybe the cells in hippocampus that EC projects to are directly generating context (but from what?). Or maybe there is a step between where they were measuring in the EC and the hippocampus.

2 Likes