Degrading impact of unhelpful variables


#1

Hypothetical scenario: two variables. One is exactly correlated with the target value; the other is completely random (uncorrelated with the target).

I use RDSE to encode both variables, so there are two 400-bit encodings, each with 21 active bits (active_width, as per the RDSE HTM School demo). They are concatenated to form the 800-bit input encoding.
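To make the setup concrete, here is a minimal sketch of the concatenated encoding. It does not use a real HTM library: `toy_rdse` is a hypothetical hash-based stand-in for an RDSE, and the `resolution` and `seed` parameters are assumptions made only for illustration. The 400-bit size and 21 active bits are taken from the post.

```python
import hashlib

SIZE = 400    # bits per field, as in the post
ACTIVE = 21   # active bits per field, as in the post

def toy_rdse(value, resolution=1.0, size=SIZE, active=ACTIVE, seed=0):
    """Toy stand-in for an RDSE (NOT a library API): returns active bit indices.

    Neighbouring buckets share most of their hashed keys, so nearby values
    get overlapping encodings. Hash collisions can leave slightly fewer
    than `active` distinct bits.
    """
    bucket = int(round(value / resolution))
    bits = set()
    for offset in range(active):
        key = f"{seed}:{bucket + offset}".encode()
        digest = hashlib.sha256(key).digest()
        bits.add(int.from_bytes(digest[:4], "big") % size)
    return bits

def encode_pair(helpful, noise):
    """Concatenate both fields: bits 0..399 are helpful, bits 400..799 are noise."""
    left = toy_rdse(helpful, seed=1)
    right = {SIZE + i for i in toy_rdse(noise, seed=2)}
    return left | right

sdr = encode_pair(helpful=10.0, noise=73.0)
noise_bits = {i for i in sdr if i >= SIZE}
print(len(sdr), len(noise_bits))
```

Roughly half of the active bits in every input come from the random field, which is why the SP representation keeps changing even when the helpful value repeats.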

Is there a way for the Temporal Memory to “ignore” the unhelpful part of the encoding (the random variable)? With global inhibition, the Spatial Pooler’s range is distributed across the entire input space, so its cells are correlated with the activity of both the unhelpful half and the helpful half of the input space. As a result, the Temporal Memory cannot learn on the SP cells, because the random variable “gets in the way” of the truly correlated variable and constantly “mutates” the SP representation.

Is there a way to automatically degrade the impact of this unhelpful half of the input space, or otherwise have the SP learn to ignore it? The idea is that the predictive performance should converge to what it would be if only the exactly correlated variable were in the input space.


#2

Your scenario is a Type-2 noise problem, i.e. noise as a feature.

I am not sure if TM can do anything about it - but from an ML point of view, that’s a feature selection/elimination problem. It boils down to brute force… remove a feature and see if it improves performance. If you have many features, i.e. not just the hypothetical two of your scenario, you can look into methods like forward selection, backward elimination, etc. That’s pretty standard ML stuff and it’s tedious.
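As a rough sketch of what backward elimination looks like in code: `score_fn` is a hypothetical callback that would train a model on the given fields and return a performance score (higher is better). The `toy_score` function below is made up purely so the example runs; it is not a real HTM evaluation.

```python
def backward_elimination(features, score_fn):
    """Greedily remove one feature at a time while removal strictly improves the score."""
    selected = list(features)
    best = score_fn(selected)
    improved = True
    while improved and len(selected) > 1:
        improved = False
        for feature in list(selected):
            candidate = [f for f in selected if f != feature]
            score = score_fn(candidate)
            if score > best:
                # Dropping this feature helped, so keep the smaller set and restart.
                best, selected, improved = score, candidate, True
                break
    return selected, best

def toy_score(selected):
    """Made-up scorer: the helpful field adds value, each noise field costs a bit."""
    score = 1.0 if "helpful" in selected else 0.0
    score -= 0.3 * sum(1 for f in selected if f.startswith("noise"))
    return score

print(backward_elimination(["helpful", "noise_a", "noise_b"], toy_score))
```

The tedious part is that every call to `score_fn` means a full training and evaluation run, so the cost grows quickly with the number of fields.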

Noise is an interesting problem and it’s difficult to deal with. I have a Type-3 noise problem: lots of bad samples/records, with the occasional nugget showing up. It’s an extreme noise environment, and I am looking for sequences that have a meaning (i.e. where the forecast or label is meaningful) within the massive haystack of meaningless sequences.

I am not sure if HTM can handle that. In theory it might, because if you find enough needles (in the haystack), and the non-needles are all random sequences, then SP learning should still work (though probably very slowly). So far I have only looked into that on paper though… nothing programmed yet.


#3

HTM is very tolerant to noise. You’ll need to adjust the config to balance the learning rate with stability. Lowering the learning rate will increase stability (since it won’t be forming connections with the random bits in a single shot), and increasing the permanence decrement rate will help the system forget wrong connections. You have to balance that with how quickly you want the system to learn (since you are configuring your system the opposite of what you would do to optimize for one-shot learning).
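A toy simulation of that trade-off, assuming a simplified Hebbian-style rule: whenever the column is active, a synapse is incremented if its input bit is on and decremented otherwise. The increment, decrement, and starting permanence are hypothetical values chosen for illustration, not recommended settings.

```python
import random

def simulate_permanence(p_active, steps=1000, inc=0.03, dec=0.01, start=0.2, seed=42):
    """Track one synapse's permanence over `steps` column activations.

    p_active is the probability that the synapse's input bit is on when the
    column is active. All parameter values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    perm = start
    for _ in range(steps):
        if rng.random() < p_active:
            perm += inc   # input bit was on: reinforce
        else:
            perm -= dec   # input bit was off: forget a little
        perm = min(1.0, max(0.0, perm))  # clip to [0, 1]
    return perm

# A bit in the helpful half is on every time the column wins.
helpful_perm = simulate_permanence(p_active=1.0)
# A bit in the random half is on about 21 times out of 400.
noise_perm = simulate_permanence(p_active=21 / 400)
print(helpful_perm, noise_perm)
```

With a small increment, a single coincidence with a random bit cannot push the synapse past a connection threshold, and the decrement pulls it back down between coincidences.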

–EDIT–
I should point out that I am referring to random noise. If the “unhelpful variable” is not actually random, and repeats frequently or has temporal patterns of its own (which do not correlate with the “helpful” one and are not interesting), then the above strategy will not solve the problem.


#4

@Paul_Lamb

That’s what I was thinking. It needs tinkering, depending on the data stream. I also wonder which is more important: that the input is random, or that the forecast/label is random.

If the input is random, but the forecast is biased, e.g. mainly positive values or something like that, then HTM might still learn random patterns (just a lot of them) - but the forecast is essentially meaningless. If the forecast/label is random, then even if input patterns repeat, the system should not learn them because there are no meaningful outputs… or am I wrong here?


#5

@sruefer if I understand your question, it sounds like it is related to classification. I’ve not done any classification problems with HTM myself, so I’m not sure of the answer. I use HTM for predicting the next input and for anomaly detection, in which case randomness in the inputs leads to randomness in the predictions (and likewise repeating patterns in the input lead to patterns in the predictions).


#6

@Paul_Lamb

It’s related to classification, but I also just want to predict the next input. My (made up) data is something like that:

  • Sequences of 3 letters, say ABC, TGH, SGE, etc. Randomly selected from the alphabet.
  • I want to predict what letter comes next.
  • All 3-letter sequences are meaningless, EXCEPT sequences that actually exist in the alphabet, i.e. ABC, BCD, etc.
  • After each “meaningful sequence”, there is an 80% chance that the letter that comes next is correct, i.e. if the input is “ABC”, then there is an 80% probability that the next letter is “D”. The other 20% are random noise.

That means that most inputs are meaningless, with very few “needles” in the haystack.
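A minimal sketch of a generator for this kind of data, to show how sparse the needles are. The function names are hypothetical, and one assumption is added: “XYZ” is not counted as meaningful because it has no next letter, which leaves 23 meaningful triples out of 26^3 = 17576.

```python
import random
import string

ALPHABET = string.ascii_uppercase

def is_meaningful(triple):
    """True for consecutive triples ABC..WXY (assumption: XYZ excluded, no next letter)."""
    start = ALPHABET.find(triple[0])
    return 0 <= start <= 22 and triple == ALPHABET[start:start + 3]

def make_sample(rng, p_correct=0.8):
    """Draw a random triple plus the letter that follows it."""
    triple = "".join(rng.choice(ALPHABET) for _ in range(3))
    if is_meaningful(triple) and rng.random() < p_correct:
        nxt = ALPHABET[ALPHABET.index(triple[0]) + 3]  # the "correct" next letter
    else:
        nxt = rng.choice(ALPHABET)  # random noise
    return triple, nxt

# Fraction of uniformly drawn triples that are needles.
needle_fraction = 23 / 26 ** 3

rng = random.Random(0)
samples = [make_sample(rng) for _ in range(5)]
print(samples, needle_fraction)
```

Under these assumptions only about 0.13% of the inputs are needles, so a lot of data is needed before any one of them repeats often enough to be reinforced.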

The question is, can HTM find them?

The idea is that the meaningful sequences are in reality unknown. So to find them, the 80/20 probability of the next input could be used… that’s really the only indicator available. Given sufficient data, all random inputs should be discarded (i.e. not learned), while the few meaningful ones should be reinforced over time.

I hope my explanation makes sense. It’s a weird problem :)


#7

Ah, I understand… this is a somewhat different scenario (the first scenario was a combination of two fields as part of every input, where one field was meaningful and the other meaningless). The simplest case for the second scenario would be every input has one field, with errors randomly distributed in the sequence (each error input could either be replacing a good input or inserted between two good inputs - behavior will be the same for both cases).

In this case, the minicolumns will burst in the timestep when an error input occurs, and then again in the following timestep. After this, the system will have recovered, and predictions will be correct again. So in the case of errors happening 20% of the time (i.e. on average 1 out of every 5 inputs), if the system is properly configured (balancing learning rate with stability as mentioned earlier), I’d expect to see on average 3 correctly predicted timesteps to every 2 incorrectly predicted timesteps. The errors (if truly random) should be forgotten based on the permanence decrement setting.
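A quick sanity check of that ratio, using a simplified model rather than an actual HTM run: each timestep is independently an error with probability 0.2, and a timestep counts as mispredicted if it or the previous timestep was an error. The function name and parameters are hypothetical.

```python
import random

def mispredicted_fraction(p_error=0.2, steps=100_000, seed=0):
    """Fraction of timesteps that burst, assuming an error disrupts its own
    timestep and the one right after it (simplified model, not an HTM run)."""
    rng = random.Random(seed)
    errors = [rng.random() < p_error for _ in range(steps)]
    bad = 0
    for t in range(steps):
        if errors[t] or (t > 0 and errors[t - 1]):
            bad += 1
    return bad / steps

fraction = mispredicted_fraction()
print(fraction)
```

If errors never fall on adjacent timesteps, each one costs exactly two timesteps and the result is 2 out of 5. When errors are drawn independently, back-to-back errors share a disrupted timestep, so this model gives 1 - 0.8^2 = 0.36, slightly better than 2 out of 5.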