HTM on EEG data

I’m working on a project applying HTM to EEG data for seizure prediction. Strictly speaking, it’s detection of the stage just before a seizure (i.e. pre-seizure or not), so it’s binary classification rather than next-step prediction.

The EEG dataset consists of 16-channel electrode readings in 10-minute recordings.

My questions are:

  1. Can I split a 10-minute recording into epochs (say 30-second segments), extract features on these segments while keeping them sequential, and use those features as input to HTM instead of the raw data? I was thinking this would involve using ‘reset’ at the beginning of each new 10-minute recording.

  2. If this is a valid approach, I’m assuming a rolling window mechanism on the epochs would confuse the TM learning?

  3. What is the effect of having more input fields? If I were to extract cross-channel features, making the number of input fields much larger, would that be a problem?
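For concreteness, here is a sketch of the epoching I have in mind. The sampling rate is an assumption (it varies by dataset), and kurtosis stands in for whatever features end up being used:

```python
import numpy as np

FS = 400          # assumed sampling rate in Hz (dataset-specific)
N_CHANNELS = 16
EPOCH_SEC = 30    # 30-second epochs -> 20 epochs per 10-minute recording

def _kurtosis(x, axis):
    # excess kurtosis computed with plain numpy
    mu = x.mean(axis=axis, keepdims=True)
    sd = x.std(axis=axis, keepdims=True)
    return (((x - mu) / sd) ** 4).mean(axis=axis) - 3.0

def epoch_features(recording):
    """Split a (channels, samples) recording into sequential epochs
    and extract one feature (kurtosis) per channel per epoch."""
    spe = FS * EPOCH_SEC
    n_epochs = recording.shape[1] // spe
    # (channels, n_epochs, samples_per_epoch), epochs kept in order
    epochs = recording[:, :n_epochs * spe].reshape(N_CHANNELS, n_epochs, spe)
    return _kurtosis(epochs, axis=2).T   # shape (n_epochs, channels)

rec = np.random.randn(N_CHANNELS, FS * 600)   # fake 10-minute recording
feats = epoch_features(rec)
print(feats.shape)   # (20, 16): one feature row per epoch
```

Each row of `feats` would then become one sequential input record for the model, with a reset issued before the first row of each recording.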

Thanks!


How many EEG leads does your signal have? If you have more than one, you’ll need an encoder that can encode multiple dimensions of data at once, or you’ll have to run the data through an SP to remove unwanted information.

  1. Yes, that is a valid approach.

  2. Yes it will. You are absolutely correct.

  3. TM can only predict a single variable at a time. Feeding it multiple variables at once will confuse it and make it less effective. You’ll want to do one of the following:

    • Encode the channels separately, then run the combined encoding through a Spatial Pooler to remove redundant information.
    • Use a Grid Cell encoder (up to 2D) or the experimental Hyper Grid Transform encoder (up to 40D) to encode all channels simultaneously.
    • Use an RBM, a neural network, or another algorithm to reduce the dimensionality.
    • Or any combination of the above. They have different properties, and you’ll have to experiment to see which works best.
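The first option (encode each channel separately, concatenate, then feed the Spatial Pooler) can be sketched with a toy scalar encoder. The ranges below reuse the min/max values discussed later in this thread; everything else is illustrative:

```python
import numpy as np

def scalar_encode(value, minval, maxval, n=187, w=23):
    """Toy scalar encoder: a contiguous run of w active bits whose
    position tracks the value's place in [minval, maxval]."""
    sdr = np.zeros(n, dtype=np.uint8)
    value = min(max(value, minval), maxval)
    start = int(round((value - minval) / (maxval - minval) * (n - w)))
    sdr[start:start + w] = 1
    return sdr

# One reading per channel; per-channel ranges from the training data.
readings = [0.5, -1.2, 3.3]
ranges = [(-3.0, 1206.9), (-3.0, 916.6), (-3.0, 1416.5)]

# Encode each channel on its own, then concatenate into one input SDR;
# in NuPIC this combined encoding is what the Spatial Pooler would see.
combined = np.concatenate([scalar_encode(v, lo, hi)
                           for v, (lo, hi) in zip(readings, ranges)])
print(combined.size, int(combined.sum()))   # 561 bits total, 69 active
```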

Hi Martin,

Thank you so much for taking the time to respond to me!!

How many EEG leads does your signal have?

16!

Encode the channels separately, then run the combined encoding through a Spatial Pooler to remove redundant information.

I believe this is what I have started doing (correct me if I’m wrong). I’ve started by using the opf experiments generator. I have an example feature (kurtosis) extracted from each of the different channels and have my encoders in my config file like this:

'encoders': {
	u'kurt0': 
		{'fieldname': u'kurt0',
		'n': 187,
		'name': u'kurt0',
		'maxval': 1206.88293211,
		'minval': -3.0,
		'type': 'ScalarEncoder',
		'w': 23},
	u'kurt1': 
		{'fieldname': u'kurt1',
		'n': 187,
		'name': u'kurt1',
		'maxval': 916.64827936,
		'minval': -3.0,
		'type': 'ScalarEncoder',
		'w': 23},
	u'kurt2': 
		{'fieldname': u'kurt2',
		'n': 187,
		'name': u'kurt2',
		'maxval': 1416.52370922,
		'minval': -3.0,
		'type': 'ScalarEncoder',
		'w': 23},
..........

And so on for the 16 channels.
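Since only `maxval` differs between channels, the repetitive config above could be generated programmatically. The three maxima shown match my config; the rest would come from scanning the training data:

```python
# Per-channel maxima measured from the training data; the three shown
# match the config excerpt above, the others are placeholders.
channel_max = {0: 1206.88293211, 1: 916.64827936, 2: 1416.52370922}

encoders = {}
for ch, maxval in channel_max.items():
    name = 'kurt%d' % ch
    encoders[name] = {
        'fieldname': name,
        'name': name,
        'type': 'ScalarEncoder',
        'minval': -3.0,
        'maxval': maxval,
        'n': 187,
        'w': 23,
    }

print(sorted(encoders))   # ['kurt0', 'kurt1', 'kurt2']
```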

So just to confirm my understanding, if I use cross channel features, the number of encoders will grow significantly, but I could potentially use PCA or another dimensionality reduction technique to help combat this. Right?

The process after those steps would be spatial pooling -> temporal pooling -> the SDR or KNN classifier to produce the binary classification result?

Use a Grid Cell (up to 2D) or the experimental Hyper Grid Transform (up to 40D) encoder ro encode all channels simotaniously

This is very exciting! Have you tried it out yet?

Yes, some kind of dimensionality reduction would be well advised if combining all the features into a single model. I’d also recommend trying an alternative to this, where you have n different models for your n-dimensional data. You can get anomaly metrics at each time step from each model and monitor for times of high simultaneous anomalies across models. So all time steps with > x anomalous models are classified as pre-seizure, and all other times aren’t.

So the main benefits as I see it are:

  1. You can scale to any number of dimensions, without concern for overloading a single model

  2. You can see which individual features are anomalous and which aren’t, rather than a single score on an aggregated model
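The voting step on top of the per-field models could look like the sketch below. The scores and both thresholds are made-up values for illustration; in practice each column would be the anomaly score stream from one single-field NuPIC model:

```python
import numpy as np

# Hypothetical anomaly scores: one column per single-field model,
# one row per time step (NuPIC reports scores in [0, 1]).
scores = np.array([
    [0.1, 0.2, 0.1, 0.0],
    [0.9, 0.8, 0.2, 0.1],
    [0.9, 0.9, 0.9, 0.8],
])

SCORE_THRESH = 0.7   # a model counts as anomalous above this score
MODEL_THRESH = 2     # > x simultaneously anomalous models => pre-seizure

anomalous_models = (scores > SCORE_THRESH).sum(axis=1)
pre_seizure = anomalous_models > MODEL_THRESH
print(pre_seizure)   # [False False  True]
```

The per-column scores also give you benefit 2 directly: you can see which features fired at each flagged time step.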


With the HTM algorithms being an online sequence learning system, any and all behavior in the training set eventually gets learned, and will therefore cease to show as anomalous. If that is the case, and if you intend to flag pre-seizure and seizure events as anomalous, then it must be necessary to train the network on normal scan data only. All anomalous behavior would have to be kept out of the training set and only introduced after learning has been disabled. Would this be an accurate assessment?

So, if my understanding is correct, the only other alternative for categorizing the normal/pre-seizure/seizure states would be to do some form of post processing (i.e. k-nearest neighbor, SVM, etc.) on the learned SDR’s. (I assume the HTMClassifier was developed for that purpose.) In that case, one has to wonder what is HTM doing for you that you couldn’t have gotten from some other form of regression analysis?


Yes, that’s right: any behavior you want to be seen as anomalous should be kept out of the training set – or at least should not repeat in the training set. Good point!


Hi Sam,

Thanks again for your advice!

I’d also recommend trying an alternative to this, where you have n different models for your n-dimensional data

So you achieve this by spawning a process per feature in Python?

You can see which individual features are anomalous and which aren’t, rather than a single score on an aggregated model

Okay wow so this is feature selection in effect?

Hey @dee, certainly!

Right, each numeric column appearing in the data gets its own 1D NuPIC model. Since different fields can have very different statistical distributions, I customize the encoding parameters for each separately.

I see it more like feature individuation, since none of the features (/fields/columns/variables) are being removed. Each is given its own detector, so some features may show themselves more anomalous than others at different times.

For your purpose of detecting seizure onset, you could do feature selection by looking at the time periods soon before known seizures and see which features are most anomalous. It may be that some features are more telling of trouble than others, or maybe they all go off. I’d certainly be curious to know.

As @CollinsEM rightly pointed out, this approach relies on the idea that the models learn only on normal/non-seizure data – to ensure that any seizure activity would be unfamiliar to the models.


Hi Eric,

Thanks for your reply!!

With the HTM algorithms being an online sequence learning system, any and all behavior in the training set eventually gets learned, and will therefore cease to show as anomalous. If that is the case, and if you intend to flag pre-seizure and seizure events as anomalous, then it must be necessary to train the network on normal scan data only. All anomalous behavior would have to be kept out of the training set and only introduced after learning has been disabled. Would this be an accurate assessment?

That makes sense yes, so I would need to set the inference config param to true after training on all ‘normal’ recordings.

So, if my understanding is correct, the only other alternative for categorizing the normal/pre-seizure/seizure states would be to do some form of post processing (i.e. k-nearest neighbor, SVM, etc.) on the learned SDR’s. (I assume the HTMClassifier was developed for that purpose.) In that case, one has to wonder what is HTM doing for you that you couldn’t have gotten from some other form of regression analysis?

That is definitely a valid point, this is a research project however, so I plan to implement and compare multiple approaches to this problem with regards to sensitivity/specificity and efficiency.

Right, each numeric column appearing in the data gets its own 1D NuPIC model. Since different fields can have very different statistical distributions, I customize the encoding parameters for each separately.

This is a really interesting approach. Have you seen any examples of this from Numenta/on the forum?

I customize the encoding parameters for each separately.

How do you decide what is a good ‘w’ for your ‘n’?

It may be that some features are more telling of trouble than others, or maybe they all go off. I’d certainly be curious to know.

From what I’ve read, it will be the features derived from the electrode signals local to the section of the brain where the seizure will occur. It will differ for each patient, so I have to train on a per-patient basis, with some patients’ preictal phases being more difficult to detect than others.

I’m not sure if anyone actively uses it (other than me), but it wasn’t my original idea. I remember it mentioned in some Numenta paper – I actually don’t remember which, maybe NAB. I can’t help but suspect that @subutai and @Scott were involved maybe among others. I just implemented it and added the automated encoder setting.

So I use RDSE encoders with fixed ‘n’ and ‘w’ (400 and 21 as default in the RDSE code). What I do decide is the RDSE ‘resolution’, by finding min/max values and dividing (max-min) by ‘numBuckets’ (=140). I use percentiles for min/max, for example ‘min’ may be the 5th percentile of a feature’s values and ‘max’ may be the 95th percentile. This is meant to avoid influence from any extreme outliers.
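That resolution computation can be sketched in a few lines. The percentile choices (5th/95th) and `numBuckets` = 140 are taken from the description above:

```python
import numpy as np

NUM_BUCKETS = 140

def rdse_resolution(values, lo_pct=5, hi_pct=95):
    """Resolution for an RDSE: the percentile-clipped value range
    divided by a fixed bucket count, so extreme outliers don't
    inflate the range."""
    lo, hi = np.percentile(values, [lo_pct, hi_pct])
    return (hi - lo) / NUM_BUCKETS

values = np.linspace(0, 140, 1401)        # fake feature values 0..140
print(round(rdse_resolution(values), 3))  # (133 - 7) / 140 = 0.9
```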

Also

I think what you want to do is model.disableLearning() once you’re done learning. The key is to avoid learning from the seizure patterns, so I’m confident inference can be on the whole time.


Thanks @sheiser1, I’ll have a search for that paper


Hi @marty1885,

Would you mind explaining to me why a rolling window would not work?


It will work, but it will be less effective than not having a rolling window. HTM is an online sequence learning algorithm, and when it is uncertain it will attempt to predict all possible futures. Take the sequence A, B, C, B, A, … (the same applies to continuous signals, but it’s easier to describe in the discrete case). Asking HTM what comes after B without context will give you both A and C. But give HTM the knowledge that ‘A comes before the current B’ and it will confidently respond C.

The issue of HTM not being confident can arise in three situations:

  1. The network is not deep enough (cells per column) to remember the sequence
  2. No prior context is given (asking HTM to make the first prediction)
  3. No context is available while training (the value is the first value of the training sequence)

Using a sliding window effectively causes situation 3. The network has no way to be sure what comes after the value, so it predicts all possible futures, and thus becomes less effective.
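The A, B, C, B, A example can be made concrete with a toy lookup-table version of the idea (not real TM code, just the distinction between context-free and context-aware prediction):

```python
from collections import defaultdict

seq = list('ABCBA') * 50   # repeating A, B, C, B, A, ...

# First-order: predict from the current symbol alone (no context).
first_order = defaultdict(set)
# Higher-order: predict from (previous, current) -- TM-style context.
higher_order = defaultdict(set)

for i in range(1, len(seq) - 1):
    first_order[seq[i]].add(seq[i + 1])
    higher_order[(seq[i - 1], seq[i])].add(seq[i + 1])

print(sorted(first_order['B']))          # ['A', 'C'] -- ambiguous
print(sorted(higher_order[('A', 'B')]))  # ['C'] -- context resolves it
```

A sliding window keeps restarting the sequence, so the model is repeatedly stuck in the ambiguous, context-free case on the left.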

You might want to watch the TM videos in HTM School.


Thank you @marty1885! That all makes more sense to me. If you could answer one more question: how does temporal memory handle NaN values? If we have pattern A, B, C, D and on another occasion A, B, NaN, D, is that learned? I imagine it is a bad idea to remove NaNs, as it would skip part of a sequence, e.g. A, B, D.


That depends on how you encode NaN. Most encoders only encode real values, so a NaN would likely crash the program or at least raise an error. There are two common solutions:

  1. Encode NaN as nothing

If you set up the encoder so that a NaN generates an SDR of all 0s, this effectively resets the TM. The TM forgets all context while also generating no predictions, and the next prediction will be the TM doing its best without context. Your example of A, B, NaN, D will yield the predictions B, C, NULL.

  2. Give NaN a special encoding

If you give NaN its own encoding, it effectively turns NaN into its own symbol, just like A, B, C and D. In that case, A, B, NaN, D predicts B, (C or NaN or both), D, depending on how many NaNs exist in your data.

Yes, I recommend against removing the NaNs.
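Both strategies fit in one toy encoder. Everything here is illustrative (the sizes mirror the scalar encoder params used earlier in the thread; the value range is a placeholder), but it shows the all-zeros-vs-reserved-symbol distinction:

```python
import numpy as np

N, W = 187, 23

def encode(value, minval=-3.0, maxval=100.0, nan_mode='zeros'):
    """Toy scalar encoder with two NaN strategies:
    'zeros'  -> all-off SDR (acts like a TM reset),
    'symbol' -> a reserved bit range, so NaN is its own symbol."""
    sdr = np.zeros(N, dtype=np.uint8)
    if np.isnan(value):
        if nan_mode == 'symbol':
            sdr[:W] = 1            # reserved encoding just for NaN
        return sdr                 # 'zeros' mode: leave everything off
    value = min(max(value, minval), maxval)
    # shift real values past the reserved NaN range so they never overlap
    start = W + int(round((value - minval) / (maxval - minval) * (N - 2 * W)))
    sdr[start:start + W] = 1
    return sdr

print(int(encode(float('nan')).sum()))                     # 0
print(int(encode(float('nan'), nan_mode='symbol').sum()))  # 23
```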
