Is there a Best Way to understand data?

TL;DR:

  • is there an overall ideal way to understand information (one that creates the best predictions in the long run)?
  • is it something we can even know?
  • if evolution were not constrained by the physical world, how would the ideal brain (in abstract mathematical space) be arranged (algorithms and data structures)?

I’ve been following HTM for a long while, as a matter of fact, it was my first exposure to the world of machine learning and intelligence. :blush:

I’ve been wondering lately about how much of the HTM system (data structure + algorithm of how the brain processes data) approximates the ideal way to manage data and how much of it is an expression of physical constraints.

In other words, I’m considering the idea that there are two kinds of constraints evolution is contending with:

  1. informational constraints (i.e., this cannot be computed until we know the answer to that) and
  2. biological, or physical, constraints (i.e., neurons can only reach this far, etc.)

If all the physical constraints were blown away, how much would the theory, or HTM data structure and algorithm change?

It seems to me that data is inherently temporal. (Not that all data is time-series data, but that no data can be seen outside of time, so time-series is actually the norm in a world that is changing, which is any world.) The only hook evolution has to claw itself up the intelligence mountain with is prediction, and prediction is only available to something existing in time.

Data science and machine learning don’t seem to excel at time-series data, and I think part of the reason is that it’s treated as just one subset of data types, not as a necessary requirement of the very idea of “information” in the first place.

So, with intelligence necessarily informed and formed in time, we begin to wonder: is there a universally best way to understand data throughout time? Now, perhaps “best” isn’t the right word.

They say in the world of finance, “nobody can beat the market consistently.” I don’t know if that’s true, but that’s kind of the idea I’m coming from. Is there an overall, most efficient, safest way to build intelligent systems?

Is the ideal intelligent system, in other words, an algorithm itself? Is there a meta-intelligence algorithm that defines exactly how to encode new memory (depending on what it is) given any series of historical observations?

Can we even know if the answer to this question is yes? It seems like it must be “yes”; it seems like it must be provable that there is, in fact, a peak to the mountain or a global minimum. I suppose the real question, the useful question, is: can we know it when we’ve found the peak, or can we only know if it’s local?

If there is a “best way” to understand data (to see it in its own light), given data, what would it be? What kinds of predictions would it produce?

Jeff has talked about using the knowledge gained by reverse engineering the brain to extract its ‘information processing principles.’ We seem to have a vague outline of what some of those principles are, but perhaps they can be refined or even wholly deduced from first principles exemplified through thought experiments.

One principle in HTM is “semantic representation.” Everything needs to be encoded in memory semantically. But why? Because the computational structure is the memory structure; different areas of the memory structure therefore need to understand one another, and they need to seek a common, evolving language. Each region, at some level, acts as a translation element from all the regions it hears from to all the regions it talks to. That translation is what computation is.
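To make “semantic representation” a bit more concrete, here is a minimal sketch of the usual intuition behind sparse distributed representations. This is my own toy illustration, not HTM/Numenta code, and the sizes and names are arbitrary assumptions: inputs that share meaning share active bits, so overlap can be read as semantic similarity.

```python
import random

SDR_SIZE = 2048     # total bits in a representation
ACTIVE_BITS = 40    # ~2% sparsity, roughly the ballpark HTM descriptions use

def random_sdr():
    """An SDR modelled as the set of its active bit indices."""
    return set(random.sample(range(SDR_SIZE), ACTIVE_BITS))

def related_sdr(base, shared=30):
    """Build an SDR that shares `shared` active bits with `base`."""
    kept = set(random.sample(sorted(base), shared))
    free = [i for i in range(SDR_SIZE) if i not in base]
    return kept | set(random.sample(free, ACTIVE_BITS - shared))

def overlap(a, b):
    """Number of shared active bits; read this as semantic similarity."""
    return len(a & b)

cat = random_sdr()
dog = related_sdr(cat, shared=30)  # encoded to overlap heavily with "cat"
car = random_sdr()                 # encoded independently

print(overlap(cat, dog))  # large overlap: the encodings share meaning
print(overlap(cat, car))  # near zero: unrelated concepts
```

The point of the sketch is only that a common “language” between regions can be as simple as agreeing on which bits carry which meaning.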

It seems to me that we can deduce that intelligent systems are computational-memory structures. That is to say, they’re “networks”: they’re composed of nodes of memory that change each other.

In other words, principles that inform optimized distributed computation are the same principles that comprise intelligent systems. But that’s just one example of what I’m talking about.

Anyway, maybe I got off track there, but the main question is: given a historical set of observations (with each observation made up of a series of symbols), is there an optimal way to predict future observations? If so, what are the principles that determine what that optimal way is?
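Just to pin down what I mean by “predict future observations from a history of symbols,” here is a deliberately crude baseline, nothing optimal, just the simplest thing I can think of: count which symbol tends to follow which, and predict the most frequent follower. The first-order assumption and the names are mine.

```python
from collections import Counter, defaultdict

def train(history):
    """Count how often each symbol follows each other symbol (a first-order model)."""
    followers = defaultdict(Counter)
    for prev, nxt in zip(history, history[1:]):
        followers[prev][nxt] += 1
    return followers

def predict_next(followers, last_symbol):
    """Guess the most frequently observed follower of the last symbol."""
    if last_symbol not in followers:
        return None  # never seen before: no basis for a prediction
    return followers[last_symbol].most_common(1)[0][0]

history = list("101010111")
model = train(history)
print(predict_next(model, history[-1]))  # the model's best guess at the next symbol
```

The question is whether there is a principled ceiling above baselines like this, and what the principles are that get you there.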

2 Likes

I have been thinking about the same things from the “bottom up” for a while now.
I like that you are taking this from the “top down.”

This is where I started, and it burned out without going much further:

From an earlier thread that was slightly related: how much computation does a critter need for the hardware it is equipped with and the environment it inhabits? This is the computation that must be performed by the spine and lizard brain, augmented with special hardware like the cortex (a math coprocessor?).

1 Like

I feel like if you had one observation in your history and you made a prediction of the future, you’d be constrained to merely predict exactly what you’ve seen.

That makes me come to two conclusions:

  1. We could measure a “best way” of predicting the future by holding constant the relative number of observations it needs (predict as many observations into the future as you’ve seen).
  2. More importantly, it seems that prediction is fundamentally an identity function. You’re merely changing, over time, what you believe the identity of what you see is.

Speaking of 2 above, that means you’re learning to predict patterns. Patterns are series of symbols that theoretically (are predicted to) repeat.

So if you’ve only ever seen “1” as your history of observations, then you’ll predict “1” as a pattern that continually repeats; but if you’ve seen “10”, then you’ll predict “10”. What else would you predict?
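In code, that naive “assume the whole history repeats” idea might look like this (just a toy sketch of my own):

```python
def predict_by_repetition(history, steps):
    """Assume the entire history is one repeating pattern and extend it."""
    return [history[i % len(history)]
            for i in range(len(history), len(history) + steps)]

print(predict_by_repetition(["1"], 3))       # ['1', '1', '1']
print(predict_by_repetition(["1", "0"], 4))  # ['1', '0', '1', '0']
```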

But what happens when something changes, like 1,0, 1,0, 1,0, 1,1,1?
You don’t want to just memorize one longer and longer pattern; you want to memorize smaller patterns that are simple and composable. So you start to form a hierarchy of patterns. You might interpret 1,0 as a pattern and give it a name, “A”, then 1,1,1 and give it a name, “B”. Then you can predict “A,A,A,B (and repeat)” is the pattern of the universe.

That way, by predicting what the “higher-level” pattern is, the pattern of patterns, you continue to merely predict the identity function: the universe is what the universe is.

So here, we’ve come up with the “hierarchy of composable patterns” principle without ever looking at how the brain works. It seems natural, at least to me, in this entirely naive way of walking through how to understand what something is.

I think if you continue on in this way you start composing patterns into concepts, a concept being a particular pattern of a pattern applied to other patterns, such as the concept of “backwards.” One pattern may be another pattern in reverse; somehow, by observing the relationship between the two patterns, you find a way to reason by analogy. That means you come up with the concept that patterns themselves are “functions”, and you have a computational language built out of the patterns you see.

That is what is needed for distributed, incremental computation across a network: a language where the patterns in the data correspond to particular functions on the data (the patterns in the data are names of functions). That is, the memory structure is the computational algorithm. The network is the function. You need a language where the patterns are the algorithm. (Right now, code is not used as data and data is not used as code, but in an intelligent system, where the memory is the compute, they must become one and the same.)
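One very loose way to picture “the patterns in the data are names of functions” (purely my illustration, nothing like an actual HTM mechanism, with made-up names like “backwards” and “again”): let some symbols in a sequence be interpreted as operations on the patterns next to them, so the data carries its own code.

```python
# Patterns whose names double as functions on other patterns: a crude "code is data" toy.
def reverse(pattern):
    return pattern[::-1]

def repeat(pattern):
    return pattern + pattern

FUNCTIONS = {"backwards": reverse, "again": repeat}

def interpret(sequence):
    """Walk a sequence where some symbols name functions applied to the next pattern."""
    out, i = [], 0
    while i < len(sequence):
        token = sequence[i]
        if token in FUNCTIONS and i + 1 < len(sequence):
            out.append(FUNCTIONS[token](sequence[i + 1]))  # the name acts on the data
            i += 2
        else:
            out.append(token)
            i += 1
    return out

print(interpret([("1", "0"), "backwards", ("1", "0"), "again", ("1", "1")]))
# [('1', '0'), ('0', '1'), ('1', '1', '1', '1')]
```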

This is all just wild prognostication, but it feels right for some reason, though the details elude me, yet.

3 Likes