Towards a faster local inhibition

I started messing with local inhibition since I drank too much tea last night and can’t force myself to sleep.

Please correct me if I’m wrong. If I understand correctly, local inhibition works like this: for each column in the spatial pooler, you check whether its activation is in the top N% of the columns within a radius (the so-called local area). If it is, the column fires; otherwise it is suppressed.
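To make that definition concrete, here is a minimal C++ sketch of the straightforward sort-based version. It is an illustration, not necessarily the baseline benchmarked below. It assumes details the post does not pin down: activations are floats stored row-major on a width x height sheet, the local area is the square window of radius `radius` clipped at the borders and including the column itself, and "top N%" means "at least the k-th largest value in the window" with k = max(1, round(density * window size)). The function name `localInhibitSort` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical sort-based local inhibition (illustration only).
// act: row-major activations of a width x height sheet of columns.
// The local area of a column is the square window of the given radius around
// it, clipped at the borders and including the column itself.
std::vector<bool> localInhibitSort(const std::vector<float>& act, int width,
                                   int height, int radius, float density) {
  std::vector<bool> active(act.size(), false);
  std::vector<float> window;
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      // Gather the activations in this column's local area.
      window.clear();
      for (int ny = std::max(0, y - radius); ny <= std::min(height - 1, y + radius); ++ny)
        for (int nx = std::max(0, x - radius); nx <= std::min(width - 1, x + radius); ++nx)
          window.push_back(act[static_cast<std::size_t>(ny) * width + nx]);

      // Number of winners allowed in this local area, clamped to [1, size].
      const std::size_t k = std::min(
          window.size(),
          std::max<std::size_t>(1, static_cast<std::size_t>(std::lround(density * window.size()))));

      // Sort descending; the column fires if it is at least the k-th largest value.
      std::sort(window.begin(), window.end(), std::greater<float>());
      active[static_cast<std::size_t>(y) * width + x] =
          act[static_cast<std::size_t>(y) * width + x] >= window[k - 1];
    }
  }
  return active;
}
```

Every column pays for a full sort of its neighborhood, which is the cost the optimizations below try to avoid.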

After some toying around with the 2D case (a 2D sheet of columns), I found some useful optimizations that greatly speed up local inhibition: close to 20x on my computer (415.4 ms vs. 22.2 ms per iteration in the benchmark below, about 18.7x).

  • Reuse rows that have already been loaded.
  • Instead of finding whether a column is in the top N% by sorting:
    • Compare its activation to every neighbor and count how many of them are larger.
  • Perform the above computation while loading the activations.
  • Details are in the source code.
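The bullets above can be sketched in C++ roughly as follows. This is my own reading of them, not the author's proof-of-concept code. It assumes row-major float activations on a width x height sheet, a square window of radius `radius` clipped at the borders and including the column itself, and k = max(1, round(density * window size)) winners per window; the name `localInhibitCount` is hypothetical. Each source row is walked once and, while it is loaded, compared against every target column whose window contains it, accumulating a count of strictly larger neighbors. A column fires if fewer than k neighbors beat it.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical count-based local inhibition (illustration only, not the
// original proof-of-concept). act: row-major activations of a width x height
// sheet; each window is the square of `radius` around a column, clipped at the
// borders and including the column itself.
std::vector<bool> localInhibitCount(const std::vector<float>& act, int width,
                                    int height, int radius, float density) {
  // bigger[i] = number of neighbors of column i with a strictly larger activation.
  std::vector<int> bigger(act.size(), 0);

  // Walk the source rows once. While a row is "loaded", use it to update the
  // counts of every target row whose window contains it, instead of
  // reloading it once per target row.
  for (int ny = 0; ny < height; ++ny) {
    const float* src = &act[static_cast<std::size_t>(ny) * width];
    const int yLo = std::max(0, ny - radius);
    const int yHi = std::min(height - 1, ny + radius);
    for (int y = yLo; y <= yHi; ++y) {
      for (int x = 0; x < width; ++x) {
        const float a = act[static_cast<std::size_t>(y) * width + x];
        const int xLo = std::max(0, x - radius);
        const int xHi = std::min(width - 1, x + radius);
        int count = 0;
        for (int nx = xLo; nx <= xHi; ++nx) count += src[nx] > a;  // compare, no sort
        bigger[static_cast<std::size_t>(y) * width + x] += count;
      }
    }
  }

  std::vector<bool> active(act.size(), false);
  for (int y = 0; y < height; ++y) {
    const int rows = std::min(height - 1, y + radius) - std::max(0, y - radius) + 1;
    for (int x = 0; x < width; ++x) {
      const int cols = std::min(width - 1, x + radius) - std::max(0, x - radius) + 1;
      const std::size_t size = static_cast<std::size_t>(rows) * cols;
      // Number of winners allowed in this window, clamped to [1, size].
      const std::size_t k = std::min(
          size, std::max<std::size_t>(1, static_cast<std::size_t>(std::lround(density * size))));
      // Top-k test without sorting: fewer than k neighbors are strictly larger.
      const std::size_t i = static_cast<std::size_t>(y) * width + x;
      active[i] = static_cast<std::size_t>(bigger[i]) < k;
    }
  }
  return active;
}
```

Under these assumptions the count test picks the same columns as sorting the window and keeping every column that is at least the k-th largest value, ties included: a column has fewer than k strictly larger neighbors exactly when it is not below the k-th largest. The innermost loop is a plain compare-and-add over contiguous memory, which gives the compiler a good chance to vectorize it under -O3 -march=native.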

Here’s my proof-of-concept code. It is by no means production ready, nor have I checked that the results are 100% correct. Please compile with clang using the flags -O3 -march=native; otherwise performance suffers.

There are some optimization strategies I haven’t explored yet. Feel free to use the code and build a better one. (Consider it released under the WTFPL.)

And here are the results on my workstation:

[==========] Running 2 benchmarks.
[ RUN      ] local_inhib.local_inhib_base (5 runs, 5 iterations per run)
[     DONE ] local_inhib.local_inhib_base (10385.058472 ms)
[   RUNS   ]        Average time: 2077011.694 us (~2546.346 us)
                    Fastest time: 2073573.629 us (-3438.065 us / -0.166 %)
                    Slowest time: 2080586.494 us (+3574.800 us / +0.172 %)
                     Median time: 2076693.502 us (1st quartile: 2074947.782 us | 3rd quartile: 2077288.207 us)
                                  
             Average performance: 0.48146 runs/s
                Best performance: 0.48226 runs/s (+0.00080 runs/s / +0.16580 %)
               Worst performance: 0.48063 runs/s (-0.00083 runs/s / -0.17182 %)
              Median performance: 0.48153 runs/s (1st quartile: 0.48194 | 3rd quartile: 0.48140)
                                  
[ITERATIONS]        Average time: 415402.339 us (~509.269 us)
                    Fastest time: 414714.726 us (-687.613 us / -0.166 %)
                    Slowest time: 416117.299 us (+714.960 us / +0.172 %)
                     Median time: 415338.700 us (1st quartile: 414989.556 us | 3rd quartile: 415457.641 us)
                                  
             Average performance: 2.40730 iterations/s
                Best performance: 2.41130 iterations/s (+0.00399 iterations/s / +0.16580 %)
               Worst performance: 2.40317 iterations/s (-0.00414 iterations/s / -0.17182 %)
              Median performance: 2.40767 iterations/s (1st quartile: 2.40970 | 3rd quartile: 2.40698)
[ RUN      ] local_inhib.local_inhib_optim (10 runs, 10 iterations per run)
[     DONE ] local_inhib.local_inhib_optim (2224.816118 ms)
[   RUNS   ]        Average time: 222481.612 us (~244.980 us)
                    Fastest time: 222171.270 us (-310.342 us / -0.139 %)
                    Slowest time: 222918.120 us (+436.508 us / +0.196 %)
                     Median time: 222442.704 us (1st quartile: 222254.075 us | 3rd quartile: 222675.044 us)
                                  
             Average performance: 4.49475 runs/s
                Best performance: 4.50103 runs/s (+0.00628 runs/s / +0.13969 %)
               Worst performance: 4.48595 runs/s (-0.00880 runs/s / -0.19582 %)
              Median performance: 4.49554 runs/s (1st quartile: 4.49936 | 3rd quartile: 4.49085)
                                  
[ITERATIONS]        Average time: 22248.161 us (~24.498 us)
                    Fastest time: 22217.127 us (-31.034 us / -0.139 %)
                    Slowest time: 22291.812 us (+43.651 us / +0.196 %)
                     Median time: 22244.270 us (1st quartile: 22225.408 us | 3rd quartile: 22267.504 us)
                                  
             Average performance: 44.94753 iterations/s
                Best performance: 45.01032 iterations/s (+0.06279 iterations/s / +0.13969 %)
               Worst performance: 44.85952 iterations/s (-0.08801 iterations/s / -0.19582 %)
              Median performance: 44.95540 iterations/s (1st quartile: 44.99355 | 3rd quartile: 44.90849)
[==========] Ran 2 benchmarks.
