Jeff Hawkins on Scale and Orientation in Cortical Columns - 30 November, 2020

The book “Perceptrons” (Minsky and Papert) showed convincingly that layers of linear units could be replaced by a suitably constructed single layer. This implied hard limits, such as being unable to compute the XOR function no matter how many linear layers you used.

This book effectively killed research in the area for a long time.

True believers kept at it and eventually worked out how to surpass this basic limit.

Adding a nonlinear (limiting) activation function to the mix allowed transcending the limits of the classic perceptron. It was now possible to form islands of meaning between the layers.
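The XOR point is easy to make concrete. A minimal sketch (weights hand-picked by me for illustration, not taken from the post): a step activation lets two hidden units form the “islands” (roughly OR and AND) that a single linear layer cannot combine into XOR.

```python
# Why a nonlinear (step) activation lets two layers compute XOR,
# which no single linear layer can. Weights are hand-picked.

def step(x):
    return 1 if x > 0 else 0

def xor_net(a, b):
    # Hidden layer: two units forming "islands of meaning"
    h1 = step(a + b - 0.5)   # fires if a OR b
    h2 = step(a + b - 1.5)   # fires if a AND b
    # Output: OR but not AND
    return step(h1 - h2 - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))  # prints the XOR truth table
```

Remove the `step` calls and the two layers collapse back into one linear layer, which is exactly the limit the book described.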

Layers of HTM modules bring much the same benefits. Adding the H to HTM radically enhances the representational and computational possibilities. It’s not the same thing then - it does more. The SDRs are now able to pool, both spatially and temporally, and these pools can be sampled to form conjunctions of their semantic meanings.
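As a toy rendering of the pooling idea (the SDRs and bit indices here are invented, and real HTM pooling is learned rather than hand-built): treat an SDR as a set of active bit indices, pool by union over a window, and compare by overlap.

```python
# Toy sketch: SDRs as sets of active bit indices, pooled by union,
# compared by overlap. All values are hand-made for illustration.

def union_pool(sdrs):
    """Pooling sketch: union of the SDRs seen over a window."""
    pooled = set()
    for s in sdrs:
        pooled |= s
    return pooled

def overlap(a, b):
    """Similarity between two SDRs = number of shared active bits."""
    return len(a & b)

# Two 'features' sharing some semantic bits
edge = {1, 7, 42, 99}
corner = {7, 42, 55, 100}

pooled = union_pool([edge, corner])
print(overlap(pooled, edge))    # the pool still fully matches each member
```

The point of the sketch is only that a pooled SDR keeps full overlap with each of its members, so a higher layer sampling the pool sees the conjunction of their meanings.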


I never found perceptrons convincing and always saw that entire body of work as a blind alley.

I don’t find ‘layers’ a helpful concept. There are anatomical layers, but they are an artefact of the way that the cortex has evolved. And I haven’t seen anything to convincingly justify the H in HTM either.

In my view the broad organisation of cortex into columns speaks to multi-processing. Large numbers of columns doing the same processing job requires a common data representation, and SDRs fill that need. The representation of sequence memory that learns by prediction as ‘stacked’ SDRs is credible. Something similar seems to apply for location, although I don’t find it as convincing yet.

Rather than a hierarchy or layers I am left with a mental model of SDRs that:

  • in the sensory areas represent raw sensory inputs and successive refinements
  • in some higher centres represent more abstract properties, concepts and plans
  • in the motor areas represent broad motor intentions and successive refinement down to individual muscle movements.

It’s SDRs all the way down. But this is a computational model and while we have a data representation (totally unlike any computer we know) we do not have an ‘instruction set’ (also likely to be entirely novel).

I’m guessing there will be 10-100 ‘instructions’ that represent ways of generating SDRs, of which we know of or can guess at a mere handful. I’m guessing both SDRs and ‘instructions’ will be found in far simpler brains, from which the cortex has evolved. That’s where real science comes into its own.


I’m sure that the deep learning people will be shocked to hear that!

Seriously- that branch of research captures some important properties of the brain.
HTM captures some different properties.
The brain does use elements of both technologies and they are both worthy of study.

As far as localization of function- there has been some very fine-grained work in this area. If this interests you I would suggest looking into the work on the connectome.

I will add that the preservation of topology and the hierarchy of maps is a unifying theme through much of the cortex. This allows a spot in an early map to project to a higher level and still be meaningful in processing at that higher level.
The first speaker in this series (Margaret Livingstone) points out implications of this in her talk on category-based domains. Pay special attention to the bit about how information that goes DOWN the hierarchy trains the lower levels, even if the related sensory stream is not present. The implication is that the connection scheme is critical to forming categories, and not so much the content, as is normally assumed.


By dead end I mean: not on the path to AGI. ANNs and their ilk are astonishingly successful at recognising and classifying, given large amounts of labelled training data. They can do things no biological system ever could, but it’s still a dead end. And no, I have no reason to believe they capture important properties of brains. Processing sensory input: just maybe, but beyond that: no way.

The connectome tells me just one thing: brains are packed full of neurones, which are deeply inter-connected. I read the book: it takes us nowhere. The basic premise is wrong: we are not our connectome.

That video is 4 hours! Sorry, but if there is something relevant in there you’re going to need to pinpoint it for me.

I’m a software guy with a medical/scientific background. I see multi-processing, a data representation and a storage mechanism, and I look for the software, the instruction set, the programming language. People without my background think that neuroanatomy and connections will get us there, but they’re dead wrong, that’s just the hardware. If I give you the full wiring diagram of the computer you are using right now, you know nothing about what it does or how it works. That’s software.

So go find the software for a maggot brain and we’re on the path to AGI. Sooner or later we’ll leave the ANNs in the dust.


What I don’t find quite convincing in the argument “reproduce the hardware and it won’t do anything; you need to understand and replicate the software too” is that software happens in the realm of physics too - it isn’t projected from some platonic ideal realm.

If you actually make an accurate copy of a computer you will have to include charge states on its SSD or magnetization of its HDD. And there is its software! It will work without having to understand how it works.

If you replicate a brain you will replicate synapse position, size, and whatever is relevant to a synapse state, and there is its software.

The funny thing is that the brain replica would have the better chance of being a functional one, since the brain is much more robust against losses and noise. With lots of replication errors it would merely exhibit dementia, whereas the computer replica would not work at all with even a few replication errors.


Margaret Livingstone is the first speaker in the video; you can safely skip the rest.

In “normal” computer programs you have data and the sequence of operations that are performed on that data. You have a collection of locations where the data is stored and you specify the sequence of those operations. Data in a stored-program computer is accessed sequentially and the operations performed are sequential with the output from one step being the input to the next step.

In the brain, data and operations are mixed together. The connections ARE part of the computation. As you indicated, the pipelines of data all run at the same time. The processing is still sequential, but massively parallel.

Since the brain connections are essentially fixed, data is not accessed and stored, it flows from one processing stage to the next.

The connections are part of the algorithm.

For this reason, I see that understanding the connection scheme is a critical part of understanding the computation.
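A tiny sketch of that claim, with the stages and wiring invented purely for illustration: if the connections are fixed and data simply flows through, then rewiring the same stages yields a different computation - the wiring is part of the algorithm.

```python
# Sketch: with fixed connections, the 'program' is the wiring itself.
# The stages never change; only the connection scheme does.

stages = {
    "edge":  lambda x: x * 2,   # stand-in processing steps,
    "shape": lambda x: x + 1,   # invented for illustration
    "label": lambda x: -x,
}

def run(wiring, x):
    """Data is never fetched or stored; it flows from stage to stage."""
    for stage in wiring:
        x = stages[stage](x)
    return x

print(run(["edge", "shape", "label"], 3))  # -7
print(run(["shape", "edge", "label"], 3))  # same stages, new wiring: -8
```

Same three stages, two different computations - which is why the connection scheme has to count as part of the algorithm, not just the hardware.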

For the basic computations, I have compiled a list of the operations that I can see the CC doing here:


This is becoming quite the art assignment. ;-).

I think you’re missing Jeff’s point.

Let’s say you have a coffee cup. And then you have three pictures, each showing one cup. One shows a cup in another color. One shows a cup with a slightly different shape. And the third one shows exactly the cup you have in front of you.

You recognise that third cup. Even though it may look smaller in the picture; even though the handle is facing another direction; even though the picture is taken from a different angle and under other lighting conditions, you still recognise your cup.

How does the brain do that?


I am also in the “collection of features” camp.

I see that in the association regions the stream of feature & displacement is the thing that accumulates to object recognition.

The visual scanning primitives have been reliably documented in many studies of saccades.

This subconscious saccade mechanism force-feeds a constant sequential stream of feature and displacement data into the visual stream, to be assembled into object recognition. This makes use of the spatial and temporal pooling mechanisms that are thought to be the key operations of the cortical column computation.
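A toy sketch of that stream (the object models, feature names and displacements are all invented here): recognition as voting over stored sets of (feature, displacement) pairs, accumulated as the saccades deliver them.

```python
# Toy sketch: recognising an object from a stream of
# (feature, displacement) pairs, as a saccade sequence might deliver them.
# Models and features are invented for illustration.

cup = {("rim", (0, 0)), ("handle", (3, -1)), ("base", (0, -4))}
bowl = {("rim", (0, 0)), ("base", (0, -2))}
models = {"cup": cup, "bowl": bowl}

def recognise(stream):
    """Vote for the model sharing the most (feature, displacement) pairs."""
    scores = {name: len(set(stream) & feats) for name, feats in models.items()}
    return max(scores, key=scores.get)

saccades = [("rim", (0, 0)), ("handle", (3, -1))]
print(recognise(saccades))  # 'cup'
```

Note that each pair on its own is ambiguous (a rim at the fixation point matches both models); it is the accumulated sequence that disambiguates - which is the point of the pooling above.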


As far as recognizing the image on the screen as not an actual object, I think that is a learned thing.
I think that until this screen thing is learned a pipe is a pipe.
Or an ant is an ant.
I don’t have this in a maggot example, but in this case, I do have a frog:

Note that the frog is nailing the faux insects!


Physics has no software (that we know of). It represents a set of more-or-less inviolable rules to get from one state to the next, and the next state is probabilistically determined.

Brains are different. They receive inputs and decide upon outputs. The decision process is modified by previous states ie memory. It takes software to do that.

Computer state is indeterminate at the quantum level. As I’ve said many times: the past is immutable, the present is chaotic, the future is probabilistic. Your thought experiment fails: it cannot be done, you cannot replicate quantum states. And even if it were, you understand nothing about how a computer works.

My point is: you do not see what I see. To me it is blindingly obvious that we know a lot about the hardware, a little about one particular data representation (SDR) and almost nothing about the software. You don’t see it, but it’s there.

@david.pfx Are you proposing a way forward or just commenting that the people trying are somehow doing things wrong?

If it is the “way forward” thing, can you elaborate?

What you call ‘normal’ is a reference to the von Neumann architecture, and there is no reason to believe brains use anything like that. So forget that whole line of reasoning, it doesn’t apply.

At some abstract level:

  • NPU is a neuronal processing unit, possibly a column, consisting of hundreds of neurones. NPUs are general purpose: they are all the same (within a region).
  • Connectome includes all the connections inside a NPU, and the connections between NPUs
  • SDR is an atomic unit of data exchanged between NPUs. Its physical structure might be represented as (say) 15 ON bits and (say) 200 OFF bits.
  • An NPU instruction determines the output SDR: which output bits get switched ON in response to some input given the current NPU state. One example that comes to mind is a clock generator: an instruction that would cause a general purpose NPU to emit an SDR at regular time intervals.
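The list above can be rendered as a toy, with all names and numbers merely illustrative: an SDR as a frozen set of ON bits, and the clock-generator “instruction” handed to an otherwise generic NPU.

```python
# Toy rendering of the abstract model above: an SDR as a set of ON bits
# out of a fixed width, and a 'clock generator' instruction that makes a
# generic NPU emit an SDR every k ticks. All numbers are illustrative.

import random

SDR_WIDTH = 215   # e.g. 15 ON bits + 200 OFF bits, as in the post

def random_sdr(on_bits=15, width=SDR_WIDTH):
    return frozenset(random.sample(range(width), on_bits))

class NPU:
    """General-purpose unit; its behaviour is set only by the instruction."""
    def __init__(self, instruction):
        self.instruction = instruction
        self.state = {"tick": 0}

    def step(self, inp=None):
        return self.instruction(self.state, inp)

def clock_generator(period, pulse=random_sdr()):
    """Instruction: emit a fixed SDR every `period` ticks, else nothing."""
    def instruction(state, inp):
        state["tick"] += 1
        return pulse if state["tick"] % period == 0 else frozenset()
    return instruction

npu = NPU(clock_generator(period=3))
outputs = [npu.step() for _ in range(6)]
print([len(o) for o in outputs])  # [0, 0, 15, 0, 0, 15]
```

Swapping in a different `instruction` function gives the same NPU a different job, which is the sense in which the units are general purpose.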

As wild speculation, we might find the instructions in junk DNA. DNA is the instructions for generating proteins, so it has form in this role.

That’s what I see. If you don’t, it’s because our perspectives are so different.

Are you doing anything with your special knowledge or is this just an exercise in letting everyone else know they are somehow doing things wrong?

If you have anything concrete to offer as to what people should be doing it would be welcome. Telling everyone that you have special insights and that what everyone else is doing is wrong is not constructive.

What is it you want me (and others) to do with your special knowledge?


Sorry, wish I could help. I’m an engineer, which means I can’t do the research but I can use it to do useful things. This is what I see, but now all I can hope is that others see it too, and go looking in the right places.

Can you point to those places and explain why you think they are important?

Where is the instruction stored in this conception?

In HTM you could imagine the NPU as a hyper-column, having a functionality that allows it to predict spatio-temporal patterns by learning.

The concept of “instruction” comes from the Turing conception of computing. This analogy seems inappropriate for a learning system. Turing’s model assumes an infinite instruction tape generated by something that is not part of the Turing machine. In an intelligent system, learning is like having a loop from the output of the Turing machine back to its input - as if it generates its own program. This is why I think a Turing model is inappropriate for building learning systems. For many people the definition of computable is the Turing model, which makes it reasonable to claim learning systems are not computable in the classic sense.


The Turing machine manipulates symbols on a tape, but the instructions are stored elsewhere (unspecified). It was the von Neumann architecture that stored instructions in the same memory as data. The Turing machine could ‘learn’ by storing symbols on the tape, not by modifying its instructions.

If you want to express the concept in these terms, each NPU is a Turing machine, the symbols on the tape are SDRs and the instructions are what we need to find. This model seems highly appropriate.

Imagine for a moment that the instructions are encoded in (junk) DNA. Each NPU is executing a DNA program in the same way that a ribosome uses a DNA program to build a protein. The programs are developed by evolution, inherited as DNA on chromosomes, transcribed into mRNA for execution by the NPU. Why wouldn’t that work?


The input of the Turing machine is specified by a programmer. An assumption of Turing is that if you can provide a definition of the problem in terms that can be translated into a (potentially infinitely long) input tape, then it can be computed. Note how this shifts the burden of implementing a learning system from Turing to the programmer. An intelligent system can’t be specified in this way.

The Turing model does not scale in the way you imagine. A network of Turing machines is still a Turing machine - it still needs a program at the input.

The symbols on the tape going into the Turing machine are the instructions. The Turing machine transforms the instructions into logical operations.


I really think you need to update your understanding of Turing machines. Wikipedia is as good as any: the original TM consists of a tape (for symbols), a read/write head, a state register and a table of instructions. Please read. [You may be thinking of a Universal TM, but that’s not what he wrote.]
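For concreteness, here is a minimal machine with exactly those four parts: tape, read/write head, state register, and a table of instructions. The instruction table (a unary incrementer) is my own toy example, chosen only to keep the sketch tiny.

```python
# Minimal Turing machine: tape, read/write head, state register,
# and a table of instructions, per the original definition.

def run_tm(tape, table, state="start", head=0, blank="_", halt="halt"):
    tape = dict(enumerate(tape))          # sparse tape
    while state != halt:
        symbol = tape.get(head, blank)    # read under the head
        write, move, state = table[(state, symbol)]   # look up instruction
        tape[head] = write                # write, then move the head
        head += 1 if move == "R" else -1
    return "".join(tape[i] for i in sorted(tape))

# Instruction table: skip the 1s, turn the first blank into a 1, halt.
table = {
    ("start", "1"): ("1", "R", "start"),
    ("start", "_"): ("1", "R", "halt"),
}

print(run_tm("111", table))  # '1111'
```

Note that the instruction table is part of the machine’s definition, not something read off the tape - which is the distinction being argued over above.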

Not that it matters: HTM tells us that an NPU takes SDRs as input and output, and that it can update local storage so it can learn sequences. All I claim is that this function is controlled by running a little program, and that the same NPU will perform a different function if given a different program. It really isn’t rocket science.

That’s why all the columns look the same: they are the same, bar the program and the accumulated state.

And yes, an intelligent system is specified by the program fed into the machine, not by the machine itself.


This is a fine hypothesis (and I always welcome a good prod from time to time, to remind me to check my assumptions), but of course the next step is for you to try and falsify it (versus arguing that it is obviously how the system works). For example, if you make the prediction that the programming might be controlled by “junk DNA”, the next step is to find evidence that supports that prediction (to make sure it can actually hold water), and then rigorously look for evidence that refutes it. When you find refuting evidence, then further refine your hypothesis.