Best model parameters for sales prediction

Hi All, I have the following data and I think swarming gives me bad parameters:

timestamp,number_of_transactions
datetime,float
T,
2015-10-10,188
2015-10-11,272
2015-10-12,257
2015-10-13,239
2015-10-14,277
2015-10-15,227

Swarm gives the following:

{ 'aggregationInfo': { 'days': 0,
                       'fields': [],
                       'hours': 0,
                       'microseconds': 0,
                       'milliseconds': 0,
                       'minutes': 0,
                       'months': 0,
                       'seconds': 0,
                       'weeks': 0,
                       'years': 0},
  'model': 'CLA',
  'modelParams': { 'anomalyParams': { u'anomalyCacheRecords': None,
                                      u'autoDetectThreshold': None,
                                      u'autoDetectWaitRecords': None},
                   'clParams': { 'alpha': 0.03283512472589326,
                                 'regionName': 'SDRClassifierRegion',
                                 'steps': '1',
                                 'verbosity': 0},
                   'inferenceType': 'TemporalMultiStep',
                   'sensorParams': { 'encoders': { '_classifierInput': { 'classifierOnly': True,
                                                                         'clipInput': True,
                                                                         'fieldname': 'number_of_transactions',
                                                                         'n': 190,
                                                                         'name': '_classifierInput',
                                                                         'type': 'AdaptiveScalarEncoder',
                                                                         'w': 21},
                                                   u'number_of_transactions': { 'clipInput': True,
                                                                                'fieldname': 'number_of_transactions',
                                                                                'n': 45,
                                                                                'name': 'number_of_transactions',
                                                                                'type': 'AdaptiveScalarEncoder',
                                                                                'w': 21},
                                                   u'timestamp_dayOfWeek': None,
                                                   u'timestamp_timeOfDay': { 'fieldname': 'timestamp',
                                                                             'name': 'timestamp',
                                                                             'timeOfDay': ( 21,
                                                                                            9.153762368506069),
                                                                             'type': 'DateEncoder'},
                                                   u'timestamp_weekend': None},
                                     'sensorAutoReset': None,
                                     'verbosity': 0},
                   'spEnable': True,
                   'spParams': { 'columnCount': 2048,
                                 'globalInhibition': 1,
                                 'inputWidth': 0,
                                 'maxBoost': 1.0,
                                 'numActiveColumnsPerInhArea': 40,
                                 'potentialPct': 0.8,
                                 'seed': 1956,
                                 'spVerbosity': 0,
                                 'spatialImp': 'cpp',
                                 'synPermActiveInc': 0.05,
                                 'synPermConnected': 0.1,
                                 'synPermInactiveDec': 0.08050806656161408},
                   'tpEnable': True,
                   'tpParams': { 'activationThreshold': 13,
                                 'cellsPerColumn': 32,
                                 'columnCount': 2048,
                                 'globalDecay': 0.0,
                                 'initialPerm': 0.21,
                                 'inputWidth': 2048,
                                 'maxAge': 0,
                                 'maxSegmentsPerCell': 128,
                                 'maxSynapsesPerSegment': 32,
                                 'minThreshold': 10,
                                 'newSynapseCount': 20,
                                 'outputType': 'normal',
                                 'pamLength': 2,
                                 'permanenceDec': 0.1,
                                 'permanenceInc': 0.1,
                                 'seed': 1960,
                                 'temporalImp': 'cpp',
                                 'verbosity': 0},
                   'trainSPNetOnlyIfRequested': False},
  'predictAheadTime': None,
  'version': 1}

I feel like it doesn’t capture the seasonality of the sales data, which spans multiple years. Any advice on how to improve this model?

P.S. The weird thing is that when I try to predict future dates outside the training dataset, it just repeats the values. Why does it do that?

First, since your data consists of counts (integers), change your data header to be:

timestamp,number_of_transactions
datetime,int

How many years of data are you sending into the swarm? If you don’t send multiple years, it won’t find seasonal patterns.
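As a rough sketch of both points above, the following plain Python 3 snippet writes the three-line header with `int` as the field type and reports how many years the timestamps cover. The helper names `to_nupic_csv` and `span_in_years` are hypothetical, and the three sample rows are taken from values quoted in this thread rather than from the full dataset.

```python
import csv
import io
from datetime import datetime


def to_nupic_csv(rows, out):
    """Write (date_string, count) rows under the three-line header used in this thread."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["timestamp", "number_of_transactions"])  # field names
    writer.writerow(["datetime", "int"])                      # field types: counts are integers
    writer.writerow(["T", ""])                                # flags row, copied from the original file
    for day, count in rows:
        writer.writerow([day, int(count)])


def span_in_years(rows, fmt="%Y-%m-%d"):
    """Return the approximate number of years between the first and last timestamp."""
    days = [datetime.strptime(day, fmt) for day, _ in rows]
    return (max(days) - min(days)).days / 365.25


# Sample rows only; the real file would contain every day in between.
rows = [("2015-10-10", 188), ("2015-10-11", 272), ("2016-10-31", 331)]
buf = io.StringIO()
to_nupic_csv(rows, buf)
print(buf.getvalue())
print("years covered: %.2f" % span_in_years(rows))
```

If the reported span is close to one year, each calendar date has only been seen about once, so there is little for a yearly seasonal pattern to be learned from.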

I sent two years of data. I think the bigger issue is the way it handles unknown dates:

147.0 @ 2016-10-29 : 298.496725523
Error: 1.03058996954
452.0 @ 2016-10-30 : 439.4
Error: -0.33960901433
331.0 @ 2016-10-31 : 298.496725523
Error: 0.32749244713
309.617253 @ 2016-11-01:309.617253
309.617253 @ 2016-11-02:309.617253
309.617253 @ 2016-11-03:309.617253
309.617253 @ 2016-11-04:309.617253
309.617253 @ 2016-11-05:309.617253
309.617253 @ 2016-11-06:309.617253
309.617253 @ 2016-11-07:309.617253
309.617253 @ 2016-11-08:309.617253
309.617253 @ 2016-11-09:309.617253
309.617253 @ 2016-11-10:309.617253
309.617253 @ 2016-11-11:309.617253
309.617253 @ 2016-11-12:309.617253
309.617253 @ 2016-11-13:309.617253
309.617253 @ 2016-11-14:309.617253

Looks like it’s just copying them.

What am I looking at above?

I agree it does not make any sense that the swarm gave you a “time of day” encoder when you only have one data point a day. Something certainly is wrong, but it is hard to tell what without seeing your swarm description. How are you running the swarm?

The dates in November are the ones I am trying to predict. I have sales data up to the end of October and need to forecast the next n days. I feed the model the date and the value it predicted for that day to get the next value, and repeat this n times.

My swarm description:

SWARM_DESCRIPTION = {
    "includedFields": [
        {
            "fieldName": "timestamp",
            "fieldType": "datetime"
        },
        {
            "fieldName": "number_of_transactions",
            "fieldType": "int",
        }
    ],
    "streamDef": {
        "info": "pay_view",
        "version": 1,
        "streams": [
            {
                "info": "Pay/View Matrix",
                "source": "file://stores/1.csv",
                "columns": [
                    "*"
                ]
            }
        ]
    },

    "inferenceType": "TemporalMultiStep",
    "inferenceArgs": {
        "predictionSteps": [
            1
        ],
        "predictedField": "number_of_transactions"
    },
    "iterationCount": -1,
    "swarmSize": "medium"
}

You mean you’re feeding the prediction into the model as if it were the next data point n times to predict several steps ahead? I don’t think that will work very well. Better if you used multistep predictions.

Hmm, like predicting n steps ahead? Why won’t it work if I feed the next prediction? My logic was that it should correlate the datetime pattern with possible values and produce a reasonable result. How do people usually use this to forecast ahead?

The next prediction will be the next input for the most probable sequence the TM thinks it is seeing. This could be wrong. If you take that prediction and send it back in as input, the TM will reinforce its assumption about the sequence it is seeing even though it is not right. The next prediction will further reinforce that assumption, and so on.
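The feedback effect can be seen in a toy model. The sketch below is not the TM: it swaps in simple exponential smoothing as a stand-in predictor, with a hypothetical `alpha` of 0.3, purely to show what happens when a model's own prediction is fed back as if it were an observation. The history values are the six counts from the first post.

```python
def smooth(level, value, alpha=0.3):
    """One exponential-smoothing update; the new level is also the next prediction."""
    return level + alpha * (value - level)


def forecast_by_feedback(history, n_steps, alpha=0.3):
    """Train on real values, then forecast by feeding each prediction back as input."""
    level = float(history[0])
    for value in history[1:]:
        level = smooth(level, value, alpha)

    forecasts = []
    for _ in range(n_steps):
        prediction = level
        forecasts.append(prediction)
        # The "observation" equals the prediction, so the error term is zero
        # and the model state never moves again.
        level = smooth(level, prediction, alpha)
    return forecasts


history = [188, 272, 257, 239, 277, 227]
forecasts = forecast_by_feedback(history, 5)
print(forecasts)
```

Every forecast comes out identical, because an input that matches the prediction carries no new information. That is the same shape as the flat run of 309.617253 values in the output above, although the mechanism inside the TM is different.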

Using the multi-step option is better because you are not homing in on one sequence, but taking into account all the possible sequences that the input might represent.

At least that’s how I think it works.
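A minimal sketch of the multi-step route, assuming the swarm description posted earlier in the thread: the only change is to list several horizons under `predictionSteps`. The helper `with_prediction_steps` is hypothetical, the dictionary is trimmed to the relevant keys, and the horizons 1, 7 and 14 are arbitrary examples.

```python
import copy


def with_prediction_steps(swarm_description, steps):
    """Return a copy of the swarm description that requests several prediction horizons."""
    updated = copy.deepcopy(swarm_description)  # leave the caller's dictionary untouched
    updated["inferenceArgs"]["predictionSteps"] = sorted(set(steps))
    return updated


# Trimmed to the keys that matter here; the full description is shown earlier in the thread.
SWARM_DESCRIPTION = {
    "inferenceType": "TemporalMultiStep",
    "inferenceArgs": {
        "predictionSteps": [1],
        "predictedField": "number_of_transactions",
    },
}

multi_step_description = with_prediction_steps(SWARM_DESCRIPTION, [1, 7, 14])
print(multi_step_description["inferenceArgs"]["predictionSteps"])
```

After swarming with a description like this, NuPIC's OPF examples read the per-horizon values from `result.inferences["multiStepBestPredictions"]`, keyed by step count. It is worth checking that against the NuPIC version in use.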