Hi, I am trying to use the algorithm API to detect anomalies in nyc_taxi.csv, but my results are poor, so I tried to reproduce the NAB result using the same parameters. The SP's output is the same as NAB's, but the BacktrackingTMCPP output is not, and I don't know why. Here is my code:
from nupic.algorithms.spatial_pooler import SpatialPooler as SP
from nupic.algorithms.backtracking_tm_cpp import BacktrackingTMCPP as TM
from nupic.encoders.random_distributed_scalar import RandomDistributedScalarEncoder
from nupic.encoders.date import DateEncoder
from nupic.algorithms import anomaly
from datetime import datetime
import numpy as np
from tqdm import tqdm
from matplotlib import pyplot as plt
DATA_URL = "https://github.com/numenta/NAB/tree/master/data/realKnownCause"
PATH = "C:/Users/mi/PycharmProjects/nupic-master/nab/"
DATA = "nyc_taxi.csv"
# Column 0 holds the timestamp, column 1 the value.
value = np.loadtxt(PATH + DATA, usecols=(1,), delimiter=',', skiprows=1, dtype=np.int32)
time = np.loadtxt(PATH + DATA, usecols=(0,), delimiter=',', skiprows=1, dtype=np.str)
sp = SP(inputDimensions=(454, 1),
        columnDimensions=(2048, 1),
        potentialRadius=2048,
        potentialPct=0.8,
        globalInhibition=True,
        localAreaDensity=-1,
        numActiveColumnsPerInhArea=40,
        synPermInactiveDec=0.0005,
        synPermActiveInc=0.003,
        synPermConnected=0.2,
        dutyCyclePeriod=1000,
        boostStrength=0.0,
        seed=1956)
tm = TM(numberOfCols=2048,
        cellsPerColumn=32,
        initialPerm=0.21,
        connectedPerm=0.50,
        minThreshold=10,
        newSynapseCount=20,
        globalDecay=0.0,
        activationThreshold=13,
        doPooling=False,
        segUpdateValidDuration=5,
        burnIn=2,
        collectStats=False,
        seed=1960,
        verbosity=0,
        checkSynapseConsistency=False,
        pamLength=3,
        maxInfBacktrack=10,
        maxLrnBacktrack=5,
        maxAge=0,
        maxSeqLength=32,
        maxSegmentsPerCell=128,
        maxSynapsesPerSegment=32,
        outputType='normal')
dateEncoder = DateEncoder(timeOfDay=(21, 9.49))
randomEncoder = RandomDistributedScalarEncoder(resolution=422.03538461538466, seed=42)

def encode(date, value):
    # Drop the first two and last three characters so the string
    # matches the "%y-%m-%d %H:%M" format.
    t = datetime.strptime(date[2:-3], "%y-%m-%d %H:%M")
    dateSdr = dateEncoder.encode(t)
    valueSdr = randomEncoder.encode(value)
    # Input to the SP: time-of-day bits followed by value bits.
    sdr = np.concatenate((dateSdr, valueSdr))
    return sdr
prdictiveColumns = np.zeros(2048)
x = np.arange(0, 10320)
y = []
for t, v in tqdm(zip(time, value)):
    sdr = encode(t, v)
    # The SP writes its active columns into this dense 0/1 buffer.
    column = np.zeros(2048)
    sp.compute(sdr, True, column)
    activateColumns = np.nonzero(column)[0]
    activateColumns = activateColumns.astype(np.int32)
    print 'activateColumns:', activateColumns
    # Read the TM's prediction before tm.compute updates its state,
    # so this reflects what was predicted at the previous step.
    prdictiveColumnsSdr = tm.topDownCompute().copy()
    prdictiveColumns = prdictiveColumnsSdr.nonzero()[0]
    print 'prdictivaColumns:', prdictiveColumns
    tm.compute(activateColumns, True, True)
    score = anomaly.computeRawAnomalyScore(activateColumns, prdictiveColumns)
    print score
    y.append(score)
y = np.array(y)
plt.plot(x, y)
plt.show()
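For reference, the raw anomaly score collected in the loop is, as I understand it, the fraction of currently active columns that were not among the previously predicted columns. Below is a minimal NumPy sketch of that calculation. It is written from my understanding of `anomaly.computeRawAnomalyScore` and is not the library's code; the name `raw_anomaly_score` is hypothetical, and the sketch assumes both arguments are arrays of unique column indices.

```python
import numpy as np

def raw_anomaly_score(active_columns, prev_predicted_columns):
    """Fraction of active columns that were not predicted (0.0 to 1.0).

    Assumes both arguments hold unique column indices, not dense vectors.
    """
    active = np.asarray(active_columns)
    n_active = active.size
    if n_active == 0:
        # No active columns: treat the step as not anomalous.
        return 0.0
    # Number of active columns that also appear in the prediction.
    n_predicted = np.intersect1d(active, prev_predicted_columns).size
    return (n_active - n_predicted) / float(n_active)

# Two of the four active columns were predicted.
example_score = raw_anomaly_score([1, 2, 3, 4], [2, 4, 9])
```

Under this definition, a prediction that covers nearly every column would push the score toward 0 regardless of the input, so an overly broad prediction hides anomalies rather than flagging them.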
I carefully checked the NAB code and tm_region.py, and I do the same thing.
BacktrackingTMCPP uses the same parameters and gets the same input, but its output is strange:
prdictivaColumns: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
30 31 32 33 34 35 36 37 38 39 40 42 43 44 46
48 50 52 54 56 58 60 62 64 66 68 70 72 74 76
78 80 82 84 85 86 87 88 89 90 91 92 93 94 95
96 97 98 99 100 101 102 103 104 105 106 107 108 109 110
111 112 113 114 115 116 117 118 119 120 121 122 123 124 125
126 127 128 130 131 132 133 134 135 136 137 138 139 140 141
142 143 144 145 146 147 148 149 150 151 152 153 154 155 156
157 158 159 160 161 162 163 164 165 166 167 168 169 170 171
172 174 175 176 177 178 179 180 181 182 183 184 185 186 187
188 189 190 191 192 193 194 195 196 197 198 199 200 201 202
203 204 205 206 207 208 209 210 211 212 213 214 215 216 218
219 220 221 222 223 224 225 226 227 228 229 230 231 232 233
234 235 236 237 238 239 240 241 242 243 244 245 246 247 248
249 250 251 252 253 254 255 256 257 258 259 260 261 262 263
264 265 266 267 268 269 270 271 272 273 274 275 276 277 278
279 280 281 282 283 284 285 286 287 288 289 290 291 292 293
294 295 296 297 298 299 300 301 302 303 304 306 307 308 309
310 311 312 313 314 315 316 317 318 319 320 321 322 323 324
325 326 327 328 329 330 331 332 333 334 335 336 337 338 339
340 341 342 343 344 345 346 347 348 350 351 352 353 354 355
356 357 358 359 360 361 362 363 364 365 366 367 368 369 370
371 372 373 374 375 376 377 378 379 380 381 382 383 384 385
386 387 388 389 390 391 392 393 394 395 396 398 400 402 404
406 408 410 412 414 416 418 420 422 424 426 428 430 432 434
436 438 439 440 441 442 443 444 445 446 447 448 449 450 451
452 453 454 455 456 457 458 459 460 461 462 463 464 465 466
467 468 469 470 471 472 473 474 475 476 477 478 479 480 482
483 484 486 488 490 492 494 496 498 500 502 504 506 508 510
512 514 516 518 520 522 524 526 527 528 529 530 531 532 533
534 535 536 537 538 539 540 541 542 543 544 545 546 547 548
549 550 551 552 553 554 555 556 557 558 559 560 561 562 563
564 565 566 567 568 570 574 575 576 578 580 582 586 587 588
590 592 594 610 611 612 614 1170 1171 1172 1174 1176 1178 1180 1182
1184 1188 1190 1192 1193 1194 1195 1198 1200 1201 1202 1204 1206 1208 1210
1212 1214 1216 1217 1218 1220 1228 1230 1232 1246 1248 1250 1266 1268 1269
1270 1272 1274 1276 1278 1280 1282 1284 1285 1286 1288 1302 1303 1304 1306
1314 1316 1318 1334 1336 1337 1338 1340 1342 1344 1346 1348 1350 1352 1353
1354 1356 1364 1366 1368 1382 1384 1386 1402 1404 1405 1406 1408 1410 1412
1414 1416 1418 1420 1421 1422 1424 1432 1434 1436 1450 1452 1454 1470 1472
1473 1474 1476 1478 1480 1482 1484 1486 1488 1489 1490 1492 1500 1502 1504
1518 1520 1522 1538 1540 1541 1542 1544 1546 1548 1550 1552 1554 1556 1557
1558 1560 1568 1570 1572 1586 1588 1590 1606 1608 1609 1610 1612 1614 1616
1618 1620 1622 1624 1625 1626 1628 1636 1638 1640 1654 1656 1658 1674 1676
1677 1678 1680 1682 1684 1686 1688 1690 1692 1693 1694 1696 1704 1706 1708
1722 1724 1726 1742 1744 1745 1746 1748 1750 1752 1754 1756 1758 1760 1761
1762 1764 1772 1774 1776 1790 1792 1794 1810 1812 1813 1814 1816 1818 1820
1822 1824 1826 1828 1829 1830 1832 1840 1842 1844 1858 1860 1862 1878 1880
1881 1882 1884 1886 1888 1890 1892 1894 1896 1897 1898 1900 1908 1910 1912
1926 1928 1930 1946 1948 1949 1950 1952 1954 1956 1958 1960 1962 1964 1965
1966 1968 1976 1978 1980 1994 1996 1998 2014 2015 2016 2017 2018 2020 2022
2024 2026 2028 2030 2032 2033 2034 2036 2044 2046]
In NAB, the corresponding output looks like this:
prevPredictedColumns: [ 52 628 714 719 788 820 1196 1495]
Something must be wrong somewhere. Can someone tell me what I am doing wrong?
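One thing that may be worth checking is the representation passed to `tm.compute`. The sketch below assumes, without confirmation from the NuPIC source, that the TM expects a dense 0/1 vector of length `numberOfCols` (like the `column` buffer the SP fills) rather than an array of active column indices. It only illustrates how differently the two representations read; `indices_to_dense` is a hypothetical helper, and the example indices are arbitrary.

```python
import numpy as np

NUM_COLUMNS = 2048  # matches columnDimensions / numberOfCols in the post

def indices_to_dense(active_indices, num_columns=NUM_COLUMNS):
    """Build a dense 0/1 vector with ones at the given column indices."""
    dense = np.zeros(num_columns, dtype=np.int32)
    dense[np.asarray(active_indices, dtype=np.int64)] = 1
    return dense

# Arbitrary example indices, in the sparse form produced by np.nonzero.
active_indices = np.array([52, 628, 714, 1495], dtype=np.int32)
dense = indices_to_dense(active_indices)

# Reading the dense vector back recovers the same columns.
recovered = np.nonzero(dense)[0]

# Reading the index array as if it were dense does not: every nonzero
# entry is taken to mean "the column at this position is active".
misread = np.nonzero(active_indices)[0]
```

If that assumption holds, the dense vector (cast to a dtype the TM accepts) would be what goes into `tm.compute`, while the index array remains the form used for the anomaly score.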