The discussion over here got me started on one of my HTM rants. I may have aired variations of this on the mailing list before. It's at once one of my biggest problems with HTM and yet not exactly a problem with HTM.

Input to real neocortex is always heavily preprocessed. We’ve been talking about color but the story’s similar for all sensory input: there’s circuitry between the sensory periphery and cortex that filters and shapes the activity. This generally looks like a statistical cleanup of the sensory input, filtering out lots of information that’s unlikely to be behaviorally relevant and narrowing in on Barlow’s “suspicious coincidences.”
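To make "filters and shapes the activity" concrete: a standard textbook model of one such preprocessing stage is the center-surround receptive field of retinal ganglion cells, often approximated as a difference-of-Gaussians filter. It suppresses uniform regions (which are statistically redundant) and passes edges. This is only an illustrative sketch, not anything from HTM; the kernel sizes and sigmas below are arbitrary toy values:

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """1D Gaussian kernel, normalized to sum to 1."""
    x = np.arange(size) - size // 2
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()

def dog_filter(signal, sigma_center=1.0, sigma_surround=3.0, size=21):
    """Difference-of-Gaussians: a narrow excitatory center minus a broad
    inhibitory surround -- a crude model of retinal center-surround
    preprocessing. Flat regions cancel; edges produce a biphasic response."""
    center = np.convolve(signal, gaussian_kernel(size, sigma_center), mode="same")
    surround = np.convolve(signal, gaussian_kernel(size, sigma_surround), mode="same")
    return center - surround

# A step edge: the flat regions are suppressed, the edge survives.
step = np.concatenate([np.zeros(50), np.ones(50)])
out = dog_filter(step)
```

The point of the toy example is that the output carries the behaviorally interesting structure (the edge) while the redundant flat regions are filtered to near zero before anything downstream ever sees them.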

The resulting input to cortex isn’t just sparse and distributed, but whitened (sort of) and heavily filtered to account for the natural statistics of the stimulus space. Is it reasonable to assume that a model of cortical processing could perform well with input where the only statistical constraint is sparseness? (I suspect not.)
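For anyone who hasn't run into the term: "whitened" here means linearly transformed so that the input's covariance is (approximately) the identity, i.e. correlations inherited from the stimulus statistics are stripped out before downstream processing. A minimal ZCA-whitening sketch on toy correlated data (the data and the `eps` regularizer are illustrative, not drawn from any HTM codebase):

```python
import numpy as np

def zca_whiten(X, eps=1e-5):
    """ZCA whitening: decorrelate features and equalize their variances
    while staying as close as possible to the original data.
    X has shape (n_samples, n_features)."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / Xc.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Rotate into the eigenbasis, rescale, rotate back.
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return Xc @ W

rng = np.random.default_rng(0)
# Toy "stimulus" data with strong correlations across features.
latent = rng.normal(size=(1000, 1))
X = latent @ rng.normal(size=(1, 8)) + 0.1 * rng.normal(size=(1000, 8))
Xw = zca_whiten(X)
# The covariance of Xw is approximately the identity matrix.
```

The contrast with a naive SDR encoding is exactly the worry above: sparseness constrains how many bits are active, but says nothing about the correlations between them, which is what a transform like this removes.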

It might be that HTM can do pretty well on some problems with naive SDR input, but I can't shake the suspicion that clever preprocessing would at least improve performance and might open up whole new classes of problems.

I don't exactly have a question here; it's really just a rant, and I don't have a specific plan of action either. But I'd like to hear whether anyone has thought hard about the statistical properties of SDRs, either in general or for a specific domain.