Releasing BrainBlocks 0.6: Building ML Applications with HTM-Like Algorithms

We are pleased to announce the public release of our software to the HTM community. Feel free to explore the README and the example code. We will be following up with more detailed analysis, discussion, and documentation of the more interesting and impactful features in the coming weeks. Below is a snippet of the README.

Introduction

BrainBlocks is a framework for building applications using principles from the theory of Hierarchical Temporal Memory (HTM). It targets general machine learning problems, with an emphasis on accuracy and scalability.

Developed internally at The Aerospace Corporation since 2018, the current version is the third complete rewrite. The design of BrainBlocks reflects the practical experience gained from solving machine learning problems using an HTM-like approach.

BrainBlocks is a Python 3 library with a C backend that runs on a single CPU core. It is fast and has a low memory footprint due to novel algorithm design.

Motivation

Although the existing implementations of HTM algorithms retain neuroplausibility, they are not aimed at solving machine learning problems effectively. Furthermore, there are many open questions about how to solve ML problems with HTM-like systems.

We built BrainBlocks with the following goals in mind:

  • be able to solve practical ML applications
  • be highly scalable
  • use it to gain a computer science understanding of HTM divorced from the neuroscience

Gallery

Online Learning of Multivariate Time-Series Abnormalities

Multivariate Abnormalities

Comparison of BrainBlocks and Scikit-Learn Classifiers

Granulated Representation of Space with Hypergrids

Hypergrids

Research Contributions

The work on BrainBlocks has produced a number of useful contributions that are of interest to the HTM community.

Can Now Solve ML Applications

BrainBlocks uses HTM methods to solve practical ML applications. In particular, it solves the following problems:

  • Multivariate abnormality detection on time-series
  • Classification tasks with a distributed supervised learning module

Dramatically Increases Scalability

The BrainBlocks implementation is lean and uses novel algorithm design to achieve efficient computation.
This allows the creation of much larger networks of components that can be run in a real-time setting.

  • Reduced memory footprint by using sub-byte representation of states
  • Novel algorithm design is the source of most gains in scalability
  • Algorithms implemented in C and running on a single CPU core
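
To make the "sub-byte representation" point concrete, here is a minimal sketch of storing each binary state in a single bit and computing the overlap of two patterns with a bitwise AND. This is an illustration only, not the BrainBlocks bitarray code; `pack_bits` and `overlap` are hypothetical helper names.

```python
# Illustrative sketch only: NOT the BrainBlocks bitarray implementation.
# pack_bits and overlap are hypothetical names chosen for this example.

def pack_bits(bits):
    """Pack a sequence of 0/1 states into one integer, one bit per state."""
    word = 0
    for i, b in enumerate(bits):
        if b:
            word |= 1 << i
    return word


def overlap(a, b):
    """Count the positions that are active in both packed patterns."""
    return bin(a & b).count("1")


x = pack_bits([1, 0, 1, 1, 0, 0, 1, 0])  # active at positions 0, 2, 3, 6
y = pack_bits([1, 1, 0, 1, 0, 0, 0, 0])  # active at positions 0, 1, 3
print(overlap(x, y))  # 2 (positions 0 and 3)
```

Eight states fit in one byte this way instead of eight, and the overlap of many states is computed with a single AND per machine word rather than one comparison per state.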

Learns Fast

The modules of BrainBlocks that perform learning are much faster and more stable than the baseline HTM implementations.
In most cases, BrainBlocks will learn a pattern with one training sample and will resist catastrophic forgetting when learning new patterns.
This is achieved by treating the “synaptic connections” of HTM as a scarce resource and being deliberate about when and where they are created.

  • Achieves one-shot learning
  • Resistant to catastrophic forgetting

Can Encode N-Dimensional Data

A major challenge of using HTM for ML applications was how to encode data with more than a couple dimensions and do it effectively.
By taking the “grid cell” concept and generalizing it into N dimensions, we can now effectively encode feature vectors with arbitrary dimensionality, using a new algorithm called the Hypergrid Transform (HGT).
The curse of dimensionality still exists for HGT encodings, but effective ML results have been seen with N<=40.

  • Encode N-dimensional feature vectors to distributed binary patterns
  • Generalization of grid cell concept with Hypergrid Transform
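
The details of the HGT are not given in this excerpt, so the following is only a rough sketch of the general idea of a grid-cell-style encoder in N dimensions, under our own assumptions: each grid projects the feature vector onto a random direction, wraps the result with a random period, and activates one bin. The names `make_grids` and `encode`, the period range, and the one-bit-per-grid layout are all hypothetical and should not be read as the BrainBlocks algorithm.

```python
# Rough sketch of a periodic, grid-cell-style encoder for N-dimensional input.
# NOT the BrainBlocks Hypergrid Transform; all names and parameters are assumed.
import math
import random


def make_grids(num_dims, num_grids, seed=0):
    """Create grids, each with a random unit direction and a random period."""
    rng = random.Random(seed)
    grids = []
    for _ in range(num_grids):
        direction = [rng.gauss(0.0, 1.0) for _ in range(num_dims)]
        norm = math.sqrt(sum(d * d for d in direction))
        direction = [d / norm for d in direction]
        period = rng.uniform(0.5, 4.0)  # assumed range, for illustration
        grids.append((direction, period))
    return grids


def encode(vector, grids, num_bins):
    """Return the indices of the active bits: exactly one bit per grid."""
    active = []
    for g, (direction, period) in enumerate(grids):
        position = sum(v * d for v, d in zip(vector, direction))
        phase = (position / period) % 1.0  # periodic wrap-around
        # Clamp guards against floating-point rounding producing phase == 1.0
        bin_index = min(int(phase * num_bins), num_bins - 1)
        active.append(g * num_bins + bin_index)
    return active


grids = make_grids(num_dims=5, num_grids=16, seed=1)
pattern = encode([0.1, 0.2, 0.3, 0.4, 0.5], grids, num_bins=8)
print(len(pattern))  # 16 active bits out of 16 * 8 = 128
```

Because every grid has its own direction and period, two inputs only share all of their active bits when they are close along every projection, which is what lets a set of simple periodic codes distinguish points in a higher-dimensional space.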

Theory of Distributed Binary Patterns

A major foundation of HTM theory is the Sparse Distributed Representation (SDR), the underlying high dimensional binary vectors.
However, a great deal of HTM processing entails binary vectors that violate the properties of being “distributed” and being “sparse”.
Most forms of data encoding fall into this category.
We then must ask, what are the properties of the binary patterns created by different encodings? How do we know which encodings to choose?
To answer these questions, we have been developing a new theory of Distributed Binary Patterns (DBP), where SDRs are a subset.

  • Define DBPs as a superset of SDRs, without the distribution and sparsity requirements
  • Achieve better understanding of how DBPs work
  • Understand how information in a DBP is processed

Identified Unfair Advantages of an HTM-like Approach

A major question that gets asked about HTM is, why should I use it at all?
Why not use Deep Learning?
This question is perfectly valid and there was no good response.
After overcoming the above issues to build a tool that can actually solve ML problems, we feel we can start to answer.

Using the BrainBlocks tool and leveraging the design decisions it has made, we’ve identified the following advantages:

  • Can learn on small data sets
    • Achieves one-shot learning
  • Robust to data outside training set
    • Out-of-family detection is built-in
  • ML decisions are explainable by applying classifier to internal state
    • Compared to inscrutable Deep Learning networks

Differences from Vanilla HTM

For those familiar with the individual modules of most HTM implementations, we provide a breakdown of the differences with the BrainBlocks approach, as well as the different terms we use.

Encoders

BrainBlocks has three different encoders to convert data to DBP representation.
The Scalar Encoder and the Symbol Encoder are the main workhorses of vanilla HTM and are provided by BrainBlocks virtually unchanged.

BrainBlocks also provides the Hypergrid Transform (HGT), which generalizes the grid cell encoding approach for arbitrary dimensionality.
This is the primary vehicle for encoding data in BrainBlocks.

Pooler

BrainBlocks’ version of the HTM Spatial Pooler (SP) is called the Pattern Pooler (PP).
The main difference between PP and SP is:

  • PP does not use boosting but instead uses a learning percentage parameter to create a randomized bitmask when updating receptor permanences
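
As a hedged illustration of the learning-percentage idea, the sketch below updates only a random subset of a neuron's receptor permanences on each learning step. The README states only that a learning percentage parameter creates a randomized bitmask; how the mask is drawn and applied here is our assumption. The parameter names `pct_learn`, `perm_inc`, and `perm_dec` follow the defaults quoted later in this thread, while `learn` and `perm_max` are hypothetical.

```python
# Sketch of a masked permanence update. The masking scheme is an assumption,
# not the Pattern Pooler source; learn() and perm_max are hypothetical.
import random


def learn(perms, input_bits, pct_learn=0.25, perm_inc=2, perm_dec=1,
          perm_max=99, rng=None):
    """Update roughly pct_learn of the receptor permanences in place."""
    rng = rng or random.Random()
    for i, active in enumerate(input_bits):
        if rng.random() >= pct_learn:
            continue  # this receptor is masked out for this step
        if active:
            perms[i] = min(perm_max, perms[i] + perm_inc)  # reinforce
        else:
            perms[i] = max(0, perms[i] - perm_dec)         # weaken
    return perms


perms = [20] * 8
learn(perms, [1, 1, 0, 0, 1, 0, 1, 0], rng=random.Random(0))
print(perms)
```

Since different neurons draw different masks, neurons that respond to the same input drift apart over time, which is one plausible way a random mask could stand in for boosting.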

Sequence Learner

BrainBlocks’ version of the HTM Temporal Memory (TM) is called the Pattern Sequence Learning (PSL).
The main differences between PSL and TM are:

  • PSL starts with no distal synaptic connections, where TM starts randomly initialized
  • PSL creates new synaptic connections when predictions fail
  • PSL treats a column as a reservoir for representing hidden states
  • PSL does not use bursting. Instead, it hypothesizes new transitions between existing and novel hidden states in a deliberate manner.
  • PSL does not require pooling. An encoder can be connected directly to the input of PSL
  • Using PSL for abnormality detection does not require the use of “anomaly likelihood”. Direct output performs well.
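
For readers unfamiliar with what a "direct" abnormality output might look like, here is a minimal sketch of the raw score commonly used with HTM-style sequence memories: the fraction of currently active columns that were not predicted on the previous step. We are assuming this form for illustration; the excerpt does not say how PSL computes its output, and `abnormality_score` is a hypothetical name.

```python
# Sketch of a raw abnormality score for a sequence memory.
# Assumed form; not confirmed to be what PSL actually outputs.

def abnormality_score(active_columns, predicted_columns):
    """Fraction of active columns that the previous step failed to predict."""
    active = set(active_columns)
    if not active:
        return 0.0
    unpredicted = active - set(predicted_columns)
    return len(unpredicted) / len(active)


print(abnormality_score([1, 2, 3, 4], [1, 2]))  # 0.5
```

A score of 0.0 means the input was fully expected and 1.0 means nothing about it was predicted. "Anomaly likelihood" is a further statistical smoothing step over a history of such scores, which the list above says PSL does not need.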

Classifier

BrainBlocks has a new component that vanilla HTM implementations do not have called the Pattern Classifier (PC).
This adds a supervised learning capability to the HTM ecosystem.
The PC is very similar to the PP with the following differences:

  • PC associates individual neurons to class labels
  • Modulatory label input controls reward/punishment step when neurons activate
  • Predicted class can be computed by counting the class membership of each active neuron
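
The counting step can be sketched as follows. This is a simplified illustration under our own assumptions, not the Pattern Classifier code: neurons are chosen by overlap with the input, the top `num_active` become active, and their labels are tallied into per-class fractions. The function `predict` and the data layout (one set of connected input indices per neuron) are hypothetical.

```python
# Sketch of label voting among active neurons. Hypothetical names and layout;
# not the BrainBlocks Pattern Classifier implementation.
from collections import Counter


def predict(input_bits, receptors, neuron_labels, labels, num_active=2):
    """Return the fraction of active neurons voting for each label."""
    active_inputs = {i for i, b in enumerate(input_bits) if b}
    # Overlap: how many of a neuron's connected inputs are currently active
    overlaps = [len(r & active_inputs) for r in receptors]
    # The num_active neurons with the highest overlap become active
    winners = sorted(range(len(receptors)),
                     key=lambda n: overlaps[n], reverse=True)[:num_active]
    votes = Counter(neuron_labels[n] for n in winners)
    return {label: votes[label] / len(winners) for label in labels}


receptors = [{0, 1}, {1, 2}, {3, 4}, {4, 5}]  # connected inputs per neuron
neuron_labels = [0, 0, 1, 1]                  # class assigned to each neuron
print(predict([1, 1, 1, 0, 0, 0], receptors, neuron_labels, labels=(0, 1)))
# {0: 1.0, 1: 0.0}
```

Because each neuron carries a single label, the vote also shows which neurons, and therefore which learned input patterns, drove a given decision.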

About Us

The Aerospace Corporation

This project was developed internally at The Aerospace Corporation by:

License

This project is licensed under AGPLv3.

© The Aerospace Corporation 2020

Just wow… I have to read the code after my exams. This looks super amazing. :heart:

If you have any specific questions I wrote most of the C backend code and would be happy to help.

TM distal synapses are randomly initialized? I always thought it had no initial distal synapses. It seems like there’s no point to randomly initialize it to me.

I was wondering about that too. I can confirm that HTM.core, NuPIC, and Etaler all start with empty distal synapses.

Wow.

I don’t get the scalability paragraph: on one hand you emphasize its scalability features, on the other you say the algorithm implementation is single CPU.

Maybe we got that wrong. It’s been a while since we’ve looked at the NuPIC source code. We’ll change that claim if that’s true.

That’s fair. We started off using GPUs and we were fast. However, the data transfer delay over the bus was limiting our maximum processing rate. After several years of optimization and rewriting the code without the GPU, we found the plain CPU implementation to be faster.

We’re not saying that using a GPU or multithreading won’t improve the speed even more, so long as it’s done properly. It’s just that our algorithms are unreasonably fast as they are. It’s probably helpful to understand why they are so fast before trying to parallelize them again.

It still surprises me how fast CPUs are when you treat them right. Great job, and a lot of clever thinking that went into this project.

Might be worth it to sit as a team there and make a list of all the assumptions and streamlines that went into the library, or even what values or rules of thumb people made while writing the code. I know that I’d enjoy reading about it, and the journey of discovery or fixes along the way.

Spent a few hours a couple of days ago writing a tiny spatial pooler system on a microcontroller to attempt anomaly detection on a few streaming data sources… it wasn’t half bad, and I found that the ultimate limit was memory rather than processing (at 16 MHz with 2 KB of SRAM). Stored distal connection maps as uint32s with various functions for bit flipping. By preallocating all the data, I didn’t have to worry about unexpected crashes or other system issues.

:slight_smile: Looking at the README in the scalability experiment folder, if one step needs to pass through all the memory, e.g. last row: 2.5 GB of memory, 0.15 s of processing… that’s 16 GB/s. The second table is even more impressive at 25 GB/s. For reference, simple memory benchmarks for that CPU measure raw memory speed (e.g. memory copy, fill) between 10 and 20 GB/s.

I don’t know what that means; it could be that your algorithm doesn’t need to pass through all allocated memory, or that the bottleneck is indeed the memory bandwidth.
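
The back-of-the-envelope figure can be reproduced in a few lines. This sketch uses only the two numbers quoted above (2.5 GB touched per step, 0.15 s per step) and assumes, as the question does, that every step reads the whole allocation once. The variable names are ours.

```python
# Implied memory bandwidth IF each step passed over the full allocation once.
# Both inputs are the figures quoted in the post above.
model_bytes = 2.5e9   # memory footprint of the model, in bytes
step_seconds = 0.15   # processing time per step, in seconds

implied_gb_per_s = model_bytes / step_seconds / 1e9
print(f"{implied_gb_per_s:.1f} GB/s")  # 16.7 GB/s
```

That lands inside the 10 to 20 GB/s range quoted for raw memory copy and fill, which is why the number looks suspicious: either a step does not touch all of the allocated memory, or it really is running at the memory-bandwidth limit.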

I believe the correct way to read that is that the model and data structures take up 2.5GB of memory. That’s things like neuron activations, permanences, and connectivity information. That memory is for 256 detectors, which is 256 scalar encoders, 256 pattern poolers, and 256 sequence learners. The amount of data being processed is 256 floats per step, which would be 1702 floats per sec, or 6.5kbytes per second.

Believe it or not, it’s really hard to keep that 2.5GB memory footprint down to process that much information. Without some simplifications and assumptions, it would be impossible to run 256 parallel abnormality detectors in a single process, let alone a single machine.

It surprises me too. There’s definitely room to push the limits of just a simple single threaded CPU even further that might reap benefits in multithreaded implementations in the future. Truthfully, I don’t understand how to exploit the hardware using function profilers, the tricks of optimizing the underlying assembly, and exploiting memory caches. It’s truly wizardry.

This would be a worthy thing to do. Our algorithm development was extremely dynamic, and lots of changes were made along the way. The biggest contributions are the functions in bitarray.h/.c that handle bitwise operations and use magic numbers.

This is also my experience with the HTM Spatial Pooler, BB Pattern Pooler, and BB Pattern Classifier. Inference and learning are fast with bitarrays. The slowest and most memory-hungry block is the HTM Temporal Memory / BB Sequence Learner. It’s also the most difficult to understand and implement.

@jacobeverist @ddigiorg thanks for sharing your excellent work. I have just looked at the code and especially your test using C++.
Could you please inform me how to get the predicted value of scalar data? Thanks

I believe we don’t have a prediction example yet. Our code would support it, but it would require some modification. We would essentially need to turn the scikit-learn-style BBClassifier class into a BBRegressor class to get a continuous value. It would then regress the full predictive state of the Sequence Learner to extract the predicted value.

Of course, you can do this with any regression algorithm from scikit-learn to get a similar result. The quality of the predictions would not be as good, but it would probably be serviceable.

I believe we also support keeping a history of activation states so that you can make multi-step predictions, or predict based on a window of data.

@jacobeverist ok, thanks!
Do you have any runnable c++ example for any Classification task, e.g. MNIST classification?

Unfortunately we don’t have a c++ example for classification, but we do have a python version for running on MNIST:

mnist_binarized.py

We have a c++ test that does abnormality detection (brainblocks/tests/cpp/test.cpp) that you can modify if you need a c++ implementation specifically:

c++/test.cpp

You can also use the C PatternClassifier test for reference:

c/test_pattern_classifier.h

edit:

For reference, here are the default pattern classifier parameters found in blocks.py. We replace “neurons” with “statelets” in our convention.

labels=(0,1),       # user-defined labels
num_s=512,          # number of statelets
num_as=8,           # number of active statelets
perm_thr=20,        # receptor permanence threshold
perm_inc=2,         # receptor permanence increment
perm_dec=1,         # receptor permanence decrement
pct_pool=0.8,       # pooling percentage
pct_conn=0.5,       # initially connected percentage
pct_learn=0.25,     # learn percentage
random_state=None): # random state integer

So the easiest place to get started for classification is to use the python example which uses all of our Python helper code:

Also, here is the MNIST example we have:

Sadly, we have not yet made any C or C++ example code, although you can see how to build C/C++ code in the tests folder:

You can modify this file to get the C++ example to work, but you will not be able to use a lot of the helper libraries that are only available in python.

This python file shows the bare bones creation of a network to create the classification task:

You would recreate the same network in C++, which has virtually the same class names.

@jacobeverist thanks, let me have more time for running it!
What about your classification results on MNIST?
In HTM.core we reach now approx. 97%!

The best we got was ~90% on MNIST using our simple “spatial-pooler-like” classifier. In our implementation, each neuron is assigned a label. When a bitarray is fed into the Pattern Classifier, it computes the overlap score of each neuron’s dendritic segment with the input bitarray. It then chooses the k highest-scoring winners and bins the winner neurons’ labels into a probability array. Learning rewards neurons with correct labels and punishes neurons with incorrect labels.

97% is impressive! Is that on binarized image data? What classifier is HTM.core using?

@jacobeverist it uses only SP and SDR classifier in HTM.core of HTM-Community!