The hotgym example shows how to encode date + scalar values. I have a time series with ordinal numbers instead of dates. How can I encode that? Should I use scalar encoders twice? Can anyone help?
The DateEncoder used in the hotgym example is actually just a set of non-overlapping scalar encoders which encode parts of the date-time. Their SDRs are concatenated together into a single SDR. It would be safe to replace the DateEncoder with a single scalar encoder if you have an ordinal value representing an evenly spaced duration of time. You could use either the ScalarEncoder object or the RDSE object; you would only need one.
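To illustrate the point, here is a toy sketch in plain Python (these are not the htm.core classes, just stand-ins) showing that a "date encoder" is nothing more than several scalar encoders whose bit arrays are concatenated, and that an ordinal time column needs only one of them:

```python
# Toy scalar encoder: maps a value in [min_val, max_val] onto a fixed-width
# bit array with a contiguous run of active bits.  This is a simplified
# illustration, not htm.core's ScalarEncoder.
def scalar_encode(value, min_val, max_val, size=40, active=5):
    """Encode a scalar as `size` bits with `active` consecutive 1s."""
    span = size - active
    frac = (value - min_val) / (max_val - min_val)
    start = int(round(frac * span))
    bits = [0] * size
    for i in range(start, start + active):
        bits[i] = 1
    return bits

# A "date encoder" is just several such encoders concatenated:
hour_bits    = scalar_encode(14, 0, 23)   # hour of day
weekday_bits = scalar_encode(5, 0, 6)     # day of week
date_bits = hour_bits + weekday_bits      # concatenated SDR

# With an evenly spaced ordinal time column you keep only ONE encoder:
ordinal_bits = scalar_encode(1234, 0, 10000)
```

The real encoders add details (hashing in the RDSE, wrap-around for periodic fields), but the concatenation idea is the same.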
Doing this will not give exactly the same results because it will be more difficult to identify a “weekend” or the end of a day. But…try it out and see what you get.
Do these ordinal numbers repeat, or just increase monotonically? If they don't repeat, I'd just drop the column and use the raw values only.
Thank you for the quick response.
Still using hotgym as an example: with two columns (date & values), we have this concatenation:
encoding = SDR( encodingWidth ).concatenate([consumptionBits, dateBits])
But if the first column holds values increasing by +1, and I drop it, will this code line simply become:
encoding = consumptionBits
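Roughly, yes, but note that the SpatialPooler's input width has to shrink to match. A hypothetical sketch of the bookkeeping change, using plain Python lists in place of the htm.core SDR objects (the encoder size and value range here are made-up stand-ins for the hotgym parameters):

```python
# Toy scalar encoder standing in for the hotgym ScalarEncoder.
def encode_consumption(value, size=60, active=6, min_val=0.0, max_val=100.0):
    """Encode a consumption value as `size` bits with `active` 1s."""
    span = size - active
    frac = min(max((value - min_val) / (max_val - min_val), 0.0), 1.0)
    start = int(round(frac * span))
    return [1 if start <= i < start + active else 0 for i in range(size)]

# Before: encodingWidth was the sum of both encoders' sizes, and the two
# SDRs were concatenated.  After dropping the date column:
consumptionBits = encode_consumption(42.0)
encoding = consumptionBits        # no concatenation needed
encodingWidth = len(encoding)     # SpatialPooler input width must match this
```

So besides `encoding = consumptionBits`, make sure `encodingWidth` (and anything derived from it, like the SpatialPooler's input dimensions) is recomputed from the single remaining encoder.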
Any field (‘feature’ in common ML speak) that doesn’t have periodicity over time is basically just adding noise to the system.
In the hotgym data set the date column does have periodicity though, through days of the week and hours of the day. I think this helps the system pick up on the pattern of demand shift from weekday to weekend.
The system will naturally learn shorter-term patterns faster because they become apparent faster (in hotgym it's the daily pattern that plays out over hours). The weekday-to-weekend shift pattern, though, plays out over days and repeats weekly, so the periodicity takes much longer to show itself.
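If the samples are evenly spaced, the periodic part of a timestamp can be recovered from the ordinal index itself with modular arithmetic. A hypothetical helper along those lines (the sampling rate is an assumption, not something taken from hotgym):

```python
def periodic_features(ordinal, samples_per_hour=1):
    """Derive hour-of-day and day-of-week style features from an evenly
    spaced ordinal index.  `samples_per_hour` is an assumed sampling rate."""
    samples_per_day = 24 * samples_per_hour
    hour_of_day = (ordinal // samples_per_hour) % 24
    day_of_week = (ordinal // samples_per_day) % 7
    return hour_of_day, day_of_week

print(periodic_features(0))    # (0, 0)
print(periodic_features(25))   # (1, 1) -- one day and one hour in
```

Features like these could then be fed through periodic scalar encoders, restoring the weekday/weekend signal that plain ordinal values lose.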
Before using this algorithm with my data I tried to do some experiments with hotgym example.
- Hotgym without any changes shows the anomaly score on the chart as repeatable peaks (ca. 25 of them), with very low values in between, mostly below 0.05.
- After replacing the time with ordinal numbers (two columns: ordinal numbers and consumption values), I got a very low anomaly score everywhere (after the first ca. 400 values, presumably the learning period): it looks like a very low "comb" in the range 0 to 0.1, placed under the red input values.
- After removing the first column (leaving only the consumption values), I got irregular but very high anomaly scores everywhere, in the range of 0.6 to 1.
How can such big differences between the 2nd and 3rd cases be explained?
BTW - is there a possibility to show that here?
It must be because the encodings are very different in the two cases. In case 2 there's a second column of just ordinal values, and in case 3 there's only one column of raw values. In case 1 ("Hotgym without any changes"), the second column is a timestamp, which is perfectly periodic, adding to the predictability of the total encoding and causing the drop in anomaly scores relative to case 3 (where the timestamp column is not included).