What I would like to discuss here is a means to make a time-series AI aware of patterns that span longer periods than its usual “history window”.
The concept should apply to any ML approach (e.g. LSTMs) but SDR-based ones might have a performance advantage (well, GPUs might say otherwise).
Let’s consider Numenta’s anomaly detection benchmark as an exemplar case.
In all of the provided data sets, the recommended way of encoding each record is to concatenate into the record’s SDR both the scalar value and its associated time, detailed down to hour, day of week, day of month, month, etc.
The reason for that is that ordinary predictors like HTM are sensitive to at most a dozen (or a few dozen?) time steps back, while the anomaly data sets may contain patterns repeating over daily (200-300 steps), weekly (~2000) or monthly (~8000) time frames.
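As a rough, library-agnostic sketch of that concatenated encoding (the encoders below are toy stand-ins, and all sizes are illustrative, not any particular encoder library’s API):

```python
# Toy sketch of the concatenated encoding: a bucketed scalar encoder plus a
# crude time encoder (hour of day, day of week), joined into one record SDR.
import numpy as np

def encode_scalar(x, lo, hi, size=400, active=21):
    """A block of `active` ON bits whose position tracks x within [lo, hi]."""
    sdr = np.zeros(size, dtype=np.uint8)
    start = int((x - lo) / (hi - lo) * (size - active))
    sdr[start:start + active] = 1
    return sdr

def encode_record(value, hour, weekday, lo, hi):
    """Concatenate the value SDR with hour-of-day and day-of-week SDRs."""
    return np.concatenate([
        encode_scalar(value, lo, hi),              # the scalar reading itself
        encode_scalar(hour, 0, 23, size=168),      # hour of day
        encode_scalar(weekday, 0, 6, size=168),    # day of week
    ])
```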
Wouldn’t it be nice if, instead of artificially presuming temporal patterns and inserting them into the data, we had a general means to detect the relevant cycles in the data itself?
That would allow us not only to self-encode cycle-phase information into the data SDRs, but also to make predictions over long time spans by feeding these long-term patterns into a higher-level, slow TM that watches for changes in those long-term cycles.
I’m not talking here about predicting the state 5 ms ahead, but about seeing whether the data itself is cycling or not. That means, e.g., if the predictor is confident it is within a 1000-time-step cycle, it can make reasonable predictions 1000 time steps ahead without needing to step through 1000 consecutive predictions, which in the real world never succeeds.
Also:
it doesn’t know whether the chain it currently predicts perfectly is periodic, whether it is a single repeat of a very old past series, or whether there are several overlapping cycles in the current time frame.
it just homes in on the longest known cycle as absolute truth, the only option available. When faced with equally likely futures it either picks one or none, instead of signalling both.
and you didn’t mention how sensitive it is to interference, i.e. how much it remembers in a noisy environment.
By noisy I don’t mean only flipping a couple of bits in every SDR at every step, but also missing steps or adding totally unrelated steps (distraction SDRs) within the stream.
A cycle detector would allow the AI to “sense” that there are, e.g., two overlapping, unrelated streams, each with its own pattern, and would provide hints to separate them.
Look at how the brain segments data.
There are multiple connected maps, each an expert in its temporal/spatial domain.
As you ascend the hierarchy there is temporal pooling that segments the temporal domain.
Each sensory domain has its own pathway in the processing stream.
Many lateral and level skip connections allow interactions between these disparate domains.
I see people assume that “the” SDR in a single map/area is the whole of HTM processing, but in the brain there are many small pools that are aggregated in the hub areas of the parietal/temporal/frontal lobes into “super SDRs”, with a hex organization of activation connecting the hubs.
The H of HTM: building SDRs of SDRs.
Please reflect on how that can provide an answer to the question you are posing.
Thanks, I’m quite sure brains know how to handle it.
What I’m curious about is whether there is any HTM (or HTM-related, i.e. SDR-based) algorithm implementing this, or proposed for it.
I’m not even sure what segmenting the temporal domain would do.
What I assume (I might be wrong) is that we have an innate repetition-highlighting algorithm which is pretty general; by not being domain-specific it can work at any point or level in the hierarchy.
Its basic purpose is to “point out” repeating patterns.
And sure if it exists it can be applied at any level.
In my best reading of the vanilla CLA/HTM/TBT algorithms, there’s really no mention of this longer-term memory retention. This is likely due to their focus on a single layer of cortical columns doing sequence prediction and anomaly detection directly on the incoming streaming data. If you’d like for me to extrapolate a bit, I can possibly give a plausible mechanism/algorithm for accomplishing longer term sequence prediction and/or anomaly detection.
When studying dynamical systems, one often invokes the concept of phase-space as a domain in which the state of the system can be uniquely represented by a set of coordinates denoting the instantaneous position and momentum of its constituent objects/particles. It’s not enough to just know where an object is, but also how it is moving (and accelerating) in that moment.
I think the separation of temporal and spatial pooling is a mistake. The SDRs need to be generated by spatial-temporal pooling. What I mean by that is we are not just interested in learning a fixed representation for a single static presentation of a set of inputs. With streaming data, the system has access to more than just the instantaneous input data; it also possesses instantaneous latent states. These latent states are generated by the accumulation of statistics about the data and how it changes in both space and time. In LSTMs, this is captured by the recurrent layers. In HTM, it is partially captured by the temporal memory encoded in the distal dendrites and winner neurons in each column.
So, what am I proposing? There is sufficient information in the current HTM algorithm to encode the current state of the input as well as the recent past. Rather than focusing on the activation of a particular column as the output of the TM, we should probably be considering that there is additional information encoded in the specific neuron that is activated in each column. In coordination with the active neurons from all of the other nearby columns, this should provide a unique fingerprint of the current state and context.
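As a rough illustration of that point (the identifiers below are mine, not any library’s API): a cell-level SDR distinguishes the same active columns seen in different sequence contexts, while a column-level SDR cannot.

```python
# Illustrative only: building a context-sensitive fingerprint from the TM's
# active cells, i.e. (column, cell-in-column) pairs, instead of from the
# active columns alone.
def cell_fingerprint(active_cells, cells_per_column):
    """Flatten (column, cell) pairs into one SDR of global cell indices."""
    return {col * cells_per_column + cell for col, cell in active_cells}

def column_fingerprint(active_cells):
    """Column-level SDR: identical for the same column activity in any context."""
    return {col for col, _cell in active_cells}
```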
Why does this matter? As @Bitking mentioned, there are a number of cortical regions that play an active role in the recall of specific sequences that have been previously encountered and which store associations with other sequences that share similar characteristics and/or have appeared contemporaneously on other sensors. It is my hypothesis that the associations in these regions are indexed by the aforementioned spatial-temporal SDR patterns. Each one of these indexing patterns is merely a bookmark meant to establish an anchor to a specific context. Through these association regions, the bookmark can be expanded into a sequence of SDRs that can be used as top-down feedback for what is to be expected at the lower levels. The temporal resolution of these associated sequences can be (and probably are) stored at multiple time-scales.
While I have ideas more along the lines of what Eric is describing, I can outline the simplest version to illustrate basic temporal pooling (a toy code sketch follows below). Assume just two maps in series. The first sees a learned sequence “A”, and as long as this learned sequence is in progress, the active “A” column(s) stay active. It is reacting to the changes it is sensing, but its output is stable and unchanging. Now the input changes to a different, also previously learned, sequence “B”. The active “B” column(s) now update to this new state, signalling this new pattern.
Look at this from the context of the second map. It has only seen a sequence with two states, the “A” and “B” states. It operates at a (much?) slower rate, reacting to the output states of the previous map.
These states could be small spatial/temporal patterns embedded in a larger pattern.
This:
A) totally ignores the lateral connections between maps at the same level (TBT on steroids),
B) and level skips,
C) and further assumes that the entire fully connected map forms a single SDR, as in the examples described in the early HTM demonstration models.
It is highly likely that multiple local recognitions are happening simultaneously in any given map, communicating/competing via lateral connections within the map. These local spatial/temporal pattern recognitions could form local islands of variable size/shape phase-space tokens as described by Eric.
When you add in a feedback/context path guiding recognition, you have a much more complex system than the single-HTM models we are used to seeing on the forum.
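Here is a toy sketch of that two-map arrangement, with purely illustrative sequences and names (no real HTM library involved): the first map emits a stable label for whichever learned sequence is in progress, and the second map reacts only when that slow label stream changes.

```python
# Toy sketch (not from the post) of the two-map temporal pooling example.
LEARNED = {
    "A": ("a1", "a2", "a3"),
    "B": ("b1", "b2", "b3"),
}

def first_map(stream):
    """Collapse a fast input stream into stable sequence labels."""
    label, pos = None, 0
    for frame in stream:
        seq = LEARNED.get(label)
        if seq and frame == seq[pos % len(seq)]:
            pos += 1                    # still inside the known sequence
        else:                           # sequence broke: which one starts here?
            label = next((k for k, s in LEARNED.items() if s[0] == frame), None)
            pos = 1
        yield label

def second_map(labels):
    """Emit a new state only when the slower label stream changes."""
    previous = object()                 # sentinel that never equals a label
    for label in labels:
        if label != previous:
            yield label
            previous = label

fast = ["a1", "a2", "a3", "a1", "a2", "a3", "b1", "b2", "b3"]
print(list(second_map(first_map(fast))))   # -> ['A', 'B']
```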
One strategy for detecting long term patterns is to use long term memory and to consciously access the relevant memories in rapid succession. This allows the brain to apply the short term pattern detection mechanisms to long term data sets. I do not think that the brain has any mechanisms for directly detecting patterns that happen over very long time scales.
And I think that humans do this type of activity very often, especially when we socialize or talk to ourselves.
Yeah, but when you say “relevant” it assumes some extra criterion or mechanism to mark or separate relevant frames from those which are not.
Probably the word “long” is a bit misleading. I mean 1000-10000 time steps (or frames) can be somewhere between 30 seconds and 10 minutes, depending on how many ms pass between frames.
Here’s the algorithm I used for the first tests (a rough code sketch follows the list):
For the sake of the example, assume a window of the most recent 1000 time steps.
for each new frame, measure its SDR overlap with every other SDR in the window. What was surprising is how fast this is with sparse representations: on my old CPU core it runs at ~100M ON bits/sec, e.g. for SDRs with 50 ON bits the 1000-frame window is parsed in 0.5 ms.
use a threshold to zero out every irrelevant overlap. The threshold could be a small multiplier (1.5-2x) of the average overlap.
now the 1000-point result looks more like a nice sparse SDR. Let’s call this the “deja-vu SDR”.
when we do this for the following frames/SDRs, if they have the same repetition pattern (e.g. they encode a sine wave, or a sawtooth with constant frequency), all subsequent deja-vu vectors will overlap the first one almost perfectly.
if we keep summing the deja-vu vectors, the cycles become very obvious as high peaks at fixed places.
furthermore, the deja-vu vector can be passed through an FFT (also quite fast over 1000-10000 point vectors); if the recent data contains distinct oscillations, each with its own frequency, the FFT will point out exactly those frequencies.
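Here is a minimal sketch of the above in code, assuming SDRs are passed in as sets of ON-bit indices; the window length, the threshold multiplier and the accumulation step follow the description, while the class and names are just illustrative:

```python
# Minimal sketch of the deja-vu cycle detector described above.
from collections import deque
import numpy as np

class CycleDetector:
    def __init__(self, window=1000, mult=2.0):
        self.window = window
        self.mult = mult                        # threshold = mult * average overlap
        self.history = deque(maxlen=window)     # recent SDRs, newest last
        self.accumulator = np.zeros(window)     # summed deja-vu vectors

    def step(self, sdr):
        """Process one frame; returns this frame's deja-vu vector (indexed by lag)."""
        dv = np.zeros(self.window)
        for lag, past in enumerate(reversed(self.history), start=1):
            dv[lag - 1] = len(sdr & past)       # overlap with the SDR seen `lag` steps ago
        if dv.any():
            dv[dv < self.mult * dv.mean()] = 0  # zero out the irrelevant overlaps
        self.accumulator += dv                  # repeating cycles pile up peaks at fixed lags
        self.history.append(sdr)
        return dv

    def spectrum(self):
        """FFT magnitude of the accumulated deja-vu vector; peaks mark cycle lengths."""
        return np.abs(np.fft.rfft(self.accumulator))

# usage: feed one SDR (set of ON-bit indices) per time step
# det = CycleDetector(window=1000)
# for sdr in stream:
#     det.step(sdr)
```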
Ok, the above doesn’t sound very plausible neurologically, but if you look for a suspect for this kind of “signal processing” I would think of the cerebellum: first for its geometric, repetitive regularity, and second because it is obviously involved in perfecting motion, dealing with both cycles (e.g. walking) and time-lapse coordination/synchronization.
Further hierarchical refinement of the above idea could allow the “animal” to easily spot synchronous events spanning different modalities, e.g. linking the sound of wing flapping with the visual motion of a bird flying by.
This kind of trick is used to encode the relative position of input tokens in transformers. I don’t think they are random, just sinusoids of varying periods, which leads to a specific pattern for every position.
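For reference, this is the standard sinusoidal positional encoding from the original transformer paper (“Attention Is All You Need”); each position gets a deterministic pattern built from sines and cosines of geometrically increasing periods:

```python
# Sinusoidal positional encoding: PE(pos, 2i)   = sin(pos / 10000^(2i/d_model))
#                                 PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
import numpy as np

def positional_encoding(num_positions, d_model):
    positions = np.arange(num_positions)[:, None]        # shape (P, 1)
    dims = np.arange(d_model // 2)[None, :]               # shape (1, D/2)
    angles = positions / np.power(10000.0, 2 * dims / d_model)
    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angles)                          # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)                          # odd dimensions: cosine
    return pe

# every row (position) gets a unique, non-random pattern
print(positional_encoding(4, 8).round(2))
```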
Perhaps the cortex is actually modeling the temporal behavior of the input signal in a phase space (see also dynamical systems). That is to say that instead of (or in addition to) encoding the absolute value, the network is also modeling the rate of change of the signal in time. If you can figure out a method for encoding the sampling rate as part of the input (and assuming that the sampling rate is not changing significantly from one sample to the next), then you should be able to project the sequence forward in time from its current position in phase space (value, rate) to a subsequent position (value, rate). This would be equivalent to path integration, but in the phase space rather than value space.
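A minimal sketch of that forward projection, assuming a roughly constant sampling interval dt and a state made of (value, rate); the function name and numbers are purely illustrative:

```python
# Sketch of forward projection in phase space: given estimates of the value
# and its rate of change, follow the flow forward instead of stepping through
# the value space one prediction at a time. Illustrative only.
def project(value, rate, dt, steps=1):
    """Extrapolate (value, rate) `steps` samples ahead along the phase-space path."""
    for _ in range(steps):
        value += rate * dt      # path integration in phase space
    return value, rate

# e.g. a signal at 3.0 rising at 0.5 units/sample, projected 10 samples ahead
print(project(3.0, 0.5, dt=1.0, steps=10))   # -> (8.0, 0.5)
```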