A stream of bits, about which one knows only that it comprises some unknown lexicon tokenized as `n`-bit words, where `n` is also unknown, may be thought of as a temporal structure. In other words, the HTM’s challenge is to learn a 1-to-`n`-bit demultiplexer which may, then and only then, be represented as an SDR.

Let me illustrate the problem by asking you to follow this recipe:

- Open a shell prompt.
- At the shell prompt, enter
`perl -de42`

- Copy and paste this one-line Perl program:
`for(0..100){print ( (('00') x 1, ('01') x 2, ('10') x 3, ('11') x 4)[rand(10)])}`

- Now note a few things about the resulting string:
  - One can, trivially, induce its lexicon as consisting of `{0, 1}`.
  - Conditioned on this, one can also induce its probability distribution as `{0 x 7, 1 x 13}`, and from this we can see that lexical induction with `n = 1` has a temporal structure.
  - However, one can also induce a lexicon of `{00, 01, 10, 11}`, at `n = 2`, with a probability distribution of `{00 x 1, 01 x 2, 10 x 3, 11 x 4}`, but only if taken on even ticks of the clock (starting at 0 prior to the first bit).
  - Please note that this is, in essence, the Kolmogorov complexity program of the temporal structure, since it is the heart of the program that generated the temporal bit string.
  - Inducing a lexicon at any larger `n` (say `{000, 001, 010, 011, 100, 101, 110, 111}`) will, regardless of the clock offset, produce less structure (higher Shannon entropy) and hence be less representative of the algorithmic information (hence algorithmic probability) of the string.
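The entropy comparison in that last point can be checked numerically. Here is a minimal Python sketch (my own illustration; the function names, and the choice of Python rather than Perl, are mine) that regenerates a string from the same weighted lexicon, tokenizes it at candidate widths `n`, and reports the best per-bit Shannon entropy over all clock offsets. One caveat: on a short sample, the empirical entropy estimate for larger `n` is biased low, so a longer string may be needed to see the trend clearly.

```python
import math
import random
from collections import Counter

def generate(num_tokens=101, seed=None):
    # Same weighted lexicon as the Perl one-liner: {00 x 1, 01 x 2, 10 x 3, 11 x 4}.
    pool = ['00'] * 1 + ['01'] * 2 + ['10'] * 3 + ['11'] * 4
    rng = random.Random(seed)
    return ''.join(rng.choice(pool) for _ in range(num_tokens))

def entropy_per_bit(bits, n, offset=0):
    # Tokenize into n-bit words starting at `offset`; return the
    # empirical Shannon entropy normalized per input bit.
    tokens = [bits[i:i + n] for i in range(offset, len(bits) - n + 1, n)]
    total = len(tokens)
    counts = Counter(tokens)
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return h / n

bits = generate(seed=42)
for n in (1, 2, 3):
    # Minimizing over offsets corresponds to finding the right "tick" of the clock.
    best = min(entropy_per_bit(bits, n, off) for off in range(n))
    print(f"n={n}: {best:.3f} bits of entropy per bit")
```

Picking the `(n, offset)` pair that minimizes entropy per bit is one crude way to "discover" the tokenization, which foreshadows the question below.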

This illustrates a trivial example of lexical induction as temporal structure learning. Despite it being trivial to state, and obviously relevant to virtually all learning (since it discovers “representation”), I’ve not been able to figure out whether HTM can induce such temporal structure.

Can HTM “discover” `n` when it is anything other than a small integer (say 2 or 3)?

If so, let me then throw one more challenge at HTM:

Let’s say the lexicon consists of tokens of 2 different bit lengths. For example, take a probability distribution of `{0 x 1, 1 x 1, 00 x 1, 01 x 2, 10 x 3, 11 x 4}` and learn that temporal structure.
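For concreteness, here is one way such a string could be emitted, a Python sketch under my own assumption that the mixed-length distribution is sampled one weighted token draw at a time, analogous to the Perl one-liner:

```python
import random

def generate_mixed(num_tokens=101, seed=None):
    # Weighted mixed-length lexicon from the text:
    # {0 x 1, 1 x 1, 00 x 1, 01 x 2, 10 x 3, 11 x 4}
    pool = ['0'] * 1 + ['1'] * 1 + ['00'] * 1 + ['01'] * 2 + ['10'] * 3 + ['11'] * 4
    rng = random.Random(seed)
    return ''.join(rng.choice(pool) for _ in range(num_tokens))

print(generate_mixed(seed=42))
```

With tokens of two different lengths, no single clock width cleanly segments the string, which is what makes this a harder induction problem than the fixed-`n` case.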

Now, clearly, the neocortex can learn such temporal structures (at least for small `n`), so I think this is relevant, if not key, to HTM theory.

When I challenged the Computer Science section of StackExchange with this problem – as addressed by probability theory – I got back “TL;DR: use maximum likelihood and discrete optimization.”