The empty brain (an essay)

In contrast to DL, HTM seems to fit this view. You probably already know this essay. IMHO, it is an interesting view …


This has come up before, I believe. One of the many problems with that article, in my view, is that Epstein seems to view information and the processing of information very narrowly. When I look closely, the article seems to me to dissolve into trivialities:

If by “information” he means the Shannon sort, where you’re communicating across a channel, then your brain obviously processes information. What else does he think those spikes are doing?

If by “computer” he means a programmable Turing machine, then (aside from the trivial sense in which any sufficiently complex lump of matter can be programmed to do Turing-complete computation) your brain obviously is not a computer.

If by “retrieve knowledge” he means activating relevant learned representations in response to internal and external cues, then this is obviously something the brain is doing.

If by “store memories” he means making changes in response to experience such that those representations are likely to be reactivated under relevant circumstances in the future, then again, obviously the brain does this.

The thing is, he doesn’t mean those things. He means symbolic algorithmic computation, which no one seriously thinks is happening in the brain in the first place. I think the problem I have with his article can be boiled down to this excerpt:

Misleading headlines notwithstanding, no one really has the slightest idea how the brain changes after we have learned to sing a song or recite a poem. But neither the song nor the poem has been ‘stored’ in it. The brain has simply changed in an orderly way that now allows us to sing the song or recite the poem under certain conditions.

The “orderly changes that allow us to recall” are exactly what we mean by storing memories and retrieving knowledge. He’s just thinking like a 1960s computer scientist, and the field moved on from such simplistic views of information processing many years ago.

Anyway, that’s my rant on Epstein.

I understand your rant (and the 500+ rants in the comments section of the original post). Although that reaction was predictable, the author was bold enough to speak out so loudly against the mainstream.

My understanding is that most current information processing is built on a highly reliable substrate (from both the hardware and software standpoints). This reliability comes at a cost: poor power efficiency (plus costly development and production). Fleshware can’t go there, because its substrate is extremely fragile and its environment is orders of magnitude less predictable. Somehow evolution has found a way around this, achieving an information processing system that relies on approximate “computing”. What I take from the post is that there is little to no accuracy in fleshware, yet it is still capable of identifying the current inputs and predicting accurately what comes next.

I’m not aware of any successful approximate computing approaches beyond tiny toy examples like EnerJ or specific applications such as the TPU (in most cases applied to the data flow). Some people are working on approximate instruction flow (like “opportunistic” thread synchronization), with little real success: we don’t know how to proceed if the instruction flow is imprecise and/or nondeterministic. The beauty of fleshware is that it still works.

I agree with the author. In 10-20 years, we will look back on today’s mainstream the way we now look at pneumatic models of the brain. Theory is important, but as Santiago Ramón y Cajal said about theorists, “When faced with a difficult problem, they feel the irresistible urge to formulate a theory rather than question the nature…” [and here, in the author’s opinion, nature doesn’t fit the theory]

PS: IMHO, it fits HTM.

The article raises some valid points, and most of the narrative seems to fit HTM.
I would like to highlight instead the parts that challenge HTM or at least our current understanding of it.

Worse still, even if we had the ability to take a snapshot of all of the brain’s 86 billion neurons and then to simulate the state of those neurons in a computer, that vast pattern would mean nothing outside the body of the brain that produced it.

The author does not explain well why this is so. Admittedly, this is a bit philosophical (it reminds us of the p-zombie), but according to HTM, consciousness is just a pattern of active neurons, right? Mathematically, it is a relatively stable SDR somewhere high up in the hierarchy. If we simulate this pattern on an arbitrary substrate, we should get the same consciousness ‘working’.
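To make the SDR point a bit more concrete, here is a minimal Python sketch of how similarity between sparse distributed representations is commonly measured in HTM: by counting shared active bits (the overlap). The sizes (2048 bits, 40 active) are illustrative HTM-style values, and all helper names are hypothetical. It shows why a pattern can be “relatively stable” in a well-defined sense: a copy with a quarter of its active bits moved still overlaps strongly with the original, while an unrelated random SDR barely overlaps at all.

```python
import random

N_BITS = 2048   # total number of bits/cells (illustrative value)
N_ACTIVE = 40   # active bits, i.e. ~2% sparsity (illustrative value)

def random_sdr(rng: random.Random) -> set[int]:
    """Return a random SDR as the set of indices of its active bits."""
    return set(rng.sample(range(N_BITS), N_ACTIVE))

def overlap(a: set[int], b: set[int]) -> int:
    """Similarity of two SDRs = number of active bits they share."""
    return len(a & b)

def add_noise(sdr: set[int], n_flip: int, rng: random.Random) -> set[int]:
    """Move n_flip active bits to random inactive positions (sparsity is kept)."""
    kept = set(rng.sample(sorted(sdr), len(sdr) - n_flip))
    inactive = [i for i in range(N_BITS) if i not in sdr]
    return kept | set(rng.sample(inactive, n_flip))

rng = random.Random(0)
original = random_sdr(rng)
noisy = add_noise(original, n_flip=10, rng=rng)  # 25% of the active bits moved
unrelated = random_sdr(rng)

print("overlap with noisy copy:", overlap(original, noisy))  # exactly 30
# For random pairs the expected overlap is 40 * 40 / 2048, i.e. about 0.8 bits.
print("overlap with unrelated SDR:", overlap(original, unrelated))
```

Representing an SDR as a set of active indices (rather than a dense bit array) keeps the sketch short; the overlap is then just the size of the set intersection.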

To me this poses an interesting question:
How much does consciousness depend on stable representations and how much on changing representations? The most stable representation is a fully static one, i.e. a set of neurons is active all the time and no other neurons become active in sequence. Is there any evidence of static neuronal activity, at least for an interval of several seconds? Would that be associated with a feeling of being conscious yet somehow frozen in time?
We can say consciousness is a relatively stable feeling that is also constantly changing, so I think it’s fair to assume that a sequence of representations is still essential in order to ‘feel’ something. In any case, we can simulate the sequences too, as HTM already does successfully, which brings us to the next point.

To understand even the basics of how the brain maintains the human intellect, we might need to know not just the current state of all 86 billion neurons and their 100 trillion interconnections, not just the varying strengths with which they are connected, and not just the states of more than 1,000 proteins that exist at each connection point, but how the moment-to-moment activity of the brain contributes to the integrity of the system.

Again, no clear explanation is given, but according to HTM, by knowing the current state (the context) and the varying strengths (the learned world), we can predict future states and therefore know the coherent sequence of states that the brain undergoes, which represents its conscious integrity. There is also evidence that the 1,000 proteins are biological details that are not fundamental to the intellect, though of course that holds only until contrary evidence is found.

Finally, even supposing there are indeed certain as-yet-undiscovered biological details that prevent running HTM in its fullest form on a classical computer, there’s probably still nothing stopping us from building a new ‘computing’ architecture (based on memristors or even quantum devices) that can sustain it.

I think we’ve deliberately tried to stay away from statements about HTM and consciousness, because that’s not the problem we’re trying to solve. Intelligence is different from consciousness, IMO. You can have the former without the latter.

It is certainly an interesting discussion, however.


A post was split to a new topic: Intelligence vs Consciousness

I take your point that, as users, we can safely assume a computer is a reliable information processing substrate. However, the entire impetus of Shannon’s information theory was to transmit information reliably across noisy, unreliable channels (at rates below the channel capacity), and that includes low-level CPU operations and so on, which are actually very unreliable at the hardware level. In that way, unreliable hardware is analogous to unreliable fleshware, and the reliable outcome of error correction is in some sense analogous to the robust agent-level behavior that fleshware produces. So I’m not sure they’re as vastly different as you suggest.
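The simplest illustration of reliability built on an unreliable channel is a repetition code: each bit is sent n times over a binary symmetric channel that flips bits with probability p, and the receiver takes a majority vote. This is a minimal Python sketch with arbitrary illustrative values (p = 0.1, n = 5) and hypothetical helper names. Note that a repetition code is not what Shannon’s theorem is about: its rate shrinks as n grows, whereas Shannon showed reliable transmission is possible at a fixed rate below capacity. It does make the trade-off visible, though: the error rate drops by an order of magnitude, paid for with five times as many transmitted bits.

```python
import random

def bsc(bits: list[int], p: float, rng: random.Random) -> list[int]:
    """Binary symmetric channel: flip each bit independently with probability p."""
    return [b ^ 1 if rng.random() < p else b for b in bits]

def encode(bits: list[int], n: int) -> list[int]:
    """Repetition code: send every bit n times (n odd)."""
    return [b for b in bits for _ in range(n)]

def decode(received: list[int], n: int) -> list[int]:
    """Majority vote over each consecutive block of n received bits."""
    return [int(sum(received[i:i + n]) > n // 2) for i in range(0, len(received), n)]

def bit_error_rate(sent: list[int], got: list[int]) -> float:
    """Fraction of positions where the decoded bit differs from the sent bit."""
    return sum(s != g for s, g in zip(sent, got)) / len(sent)

rng = random.Random(42)
message = [rng.randint(0, 1) for _ in range(100_000)]
p = 0.1  # the channel flips 10% of bits (illustrative value)

raw = bsc(message, p, rng)
coded = decode(bsc(encode(message, 5), p, rng), 5)

print(f"uncoded BER: {bit_error_rate(message, raw):.4f}")       # theory: 0.1
print(f"5x repetition BER: {bit_error_rate(message, coded):.4f}")  # theory: ~0.0086
```

The theoretical value for n = 5 is the probability of 3 or more flips out of 5, i.e. 10p³(1−p)² + 5p⁴(1−p) + p⁵ ≈ 0.0086 for p = 0.1.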

If you look at something like deep learning, although the information processing is generally reliable (the intentionally low reliability of some GPU operations notwithstanding), the input to the system is not. The system has to learn approximations that are robust to the variation encountered in the data. As a result, unlike traditional programs, which are brittle, modern machine learning systems are generally robust to perturbations of their state. You can go into a deep network, randomly adjust the values of the hidden units by small amounts, and the system will still perform, because it has been trained on noisy data. See the research on adversarial examples for cases where this breaks down; clearly there’s still work to be done, but it doesn’t require abandoning the idea of information processing.

I strongly disagree that we will look back on the current state of computational neuroscience the way we look back on pneumatic theories of the brain. We may however look back on it the way we look back on Newtonian physics in light of relativity: the current theories are a special case of a more general explanation, and although technically wrong, they get 95% of the way there.

As for HTM, my opinion is that it fits quite squarely into the information processing school of thought, and I view that as a good thing.

If I understand correctly, you imply that instruction execution might be unreliable. Data might be unreliable, but instruction execution can’t be. Hardware must be reliable and deterministic (at the functional level); otherwise nothing would work. Data might be unreliable and operations approximate, but the hardware never can be. At the sub-10nm frontier, one might think that a cache cell might or might not work, that a stray particle (e.g. from cosmic radiation) might flip a DRAM cell, or that a wire in an ALU might suffer crosstalk. In fact, all of those things can happen, but the hardware has (costly) mechanisms to prevent them, or to detect and correct them.

For example, process variation can make the threshold voltage of transistors vary from corner to corner of the die. To guarantee that all bits in the cache are stored correctly, we have to raise Vdd (at the expense of static power). DRAM and many structures in a processor, from caches to register files, prevent data corruption using ECC or other, costlier coding approaches; i.e., they cost area and power. My point is that all of this requires energy and transistors. Something similar can be said about fabrication yield: in the ramp-up of a new process, the number of working dies can be quite low, because a tiny defect can render the whole chip useless.
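As a concrete, much simplified example of the ECC overhead mentioned above, here is a Python sketch of the classic Hamming(7,4) code, which protects 4 data bits with 3 parity bits and corrects any single flipped bit. Real DRAM and cache ECC typically uses wider SECDED codes (e.g. 8 check bits per 64 data bits), so this only illustrates the principle; the function names are hypothetical. Even this toy version needs extra stored bits plus encode and check logic, which in hardware means area and power.

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Positions 1..7 hold [p1, p2, d1, p3, d2, d3, d4]; each parity bit covers
    the positions whose 1-based index has the corresponding binary bit set.
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c: list[int]) -> tuple[list[int], int]:
    """Correct at most one flipped bit; return (data bits, error position or 0)."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # re-check positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # re-check positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]  # re-check positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the bad bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], syndrome

data = [1, 0, 1, 1]
word = hamming74_encode(data)
corrupted = list(word)
corrupted[4] ^= 1  # simulate a single-bit upset at position 5
recovered, pos = hamming74_decode(corrupted)
print(recovered, pos)                        # [1, 0, 1, 1] 5
print(f"storage overhead: {3 / 4:.0%}")      # 75%
```

The 75% overhead is specific to this tiny code; wider codes amortize the check bits over more data (8/64 = 12.5% for the SECDED example above), but the cost never drops to zero.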

I don’t know whether Shannon’s theory fits here. Apparently, no one has figured out how to solve these problems at zero energy and/or zero cost. Somehow, fleshware seems to do it 🙂

GPUs suffer from the same problem.

Perhaps you are right. I admire the elegance of the Hodgkin-Huxley (H&H) model and cable theory, but we should look at the facts. H&H nicely predicts the refractory period, which seems to explain sparsity (and SDRs). But using a dendritic tree to build a classifier (via dendritic integration) seems quite pointless. I mean, you can use the theory in the “good” way, to better understand how Na+ and K+ channels interact, but if you run too far ahead (skipping important intermediate facts and steps that are still unknown), you will never understand how fleshware works. The problem is that there are a million intermediate steps, some of them important and some not. The million-dollar question is which ones we can skip.

PS: I’m not saying that computational neuroscience is useless, just that it is still far from the real thing.