It seems that you haven't heard anything about our activity yet.
Here are the links:
This is just awesome, congrats @jacobeverist. Can’t wait to dive into the code.
Hebbian learning makes new synapses between active cells, which in turn causes those cells to activate more often. It’s a positive feedback loop. I don’t see how the mechanism you’ve described prevents this feedback loop from taking over.
You can detect this feedback loop by measuring the duty cycle aka activation frequency of each cell. Then do some statistics to determine if there are cells which are over-active or under-active. In my analysis I typically either plot a histogram, or calculate the binary-entropy as a fraction of the maximum possible entropy of the system.
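For example, here is a quick numpy sketch of that measurement. The function name and the choice of normalizing by the entropy at the target sparsity are my own; this is an illustration, not code from any library:

```python
import numpy as np

def duty_cycle_stats(activations, sparsity):
    """activations: (timesteps, n_cells) binary array of cell activity.

    Returns each cell's activation frequency (duty cycle) and the total
    binary entropy as a fraction of the maximum possible entropy, which
    is reached when every cell activates at exactly the target sparsity.
    """
    freq = activations.mean(axis=0)           # duty cycle per cell
    p = np.clip(freq, 1e-9, 1 - 1e-9)         # avoid log(0)
    entropy = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
    max_entropy = -(sparsity * np.log2(sparsity)
                    + (1 - sparsity) * np.log2(1 - sparsity))
    return freq, entropy.sum() / (max_entropy * len(freq))

# toy example: 1000 steps, 64 cells, ~5% random activity
rng = np.random.default_rng(0)
acts = (rng.random((1000, 64)) < 0.05).astype(int)
freq, ent_frac = duty_cycle_stats(acts, sparsity=0.05)
```

Over-active or under-active cells show up as outliers in `freq`, and an entropy fraction well below 1 means a few cells are hogging the representation.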
So in our Pattern Pooler, we are not considering duty cycle to find over-active and under-active neurons. In fact, we are not interested in these properties at all. Hence, we do not do boosting.
To be honest, we could never get boosting to work and still be stable. The active neurons would thrash back and forth as the boosting would compensate for duty cycle. Calibrating the hyperparameters for boosting turned out to be a huge headache and the optimal parameters were always very specific to the particular network being studied. Thus, we did away with it completely.
In its place, we now allow neurons to be over-active and under-active as they please. Common patterns create a positive feedback loop and their mapping becomes “locked in”. The learning percentage parameter just ensures that this process doesn’t happen too quickly and allows the pattern mapping to stabilize.
Neurons that don’t have strong mappings will be quickly allocated when new patterns emerge. These new patterns will not affect the previously learned patterns represented by the “locked in” neurons.
This approach not only makes the pooled representations stable, but it also allows you to continue learning new patterns without affecting the old ones.
To be honest, we still need to do some research and experiments on the actual value of doing pooling at all. We’ve found that doing sequence learning directly on encoded inputs is often sufficient for applications. We need more empirical studies of how pooling affects the representation and what the benefits are.
Furthermore, our approach to pooling is very different from HTM spatial pooling which is designed to recover from lesions and maximizes the spatial distribution of a representation. We don’t really care about these things with BrainBlocks.
The feature we do care about is being able to build a representation for a new pattern. The WTA algorithm selects the best-fitting neurons and reinforces them. The learning rate ensures that a new pattern won’t steal all the neurons from a very similar pattern, and ensures that the other patterns will be able to “fight back” so that their pooled outputs stay sufficiently dissimilar.
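To make that concrete, here is a toy sketch of winner-take-all selection with a learning rate. The class and parameter names are hypothetical, not the BrainBlocks API:

```python
import numpy as np

class TinyPooler:
    """Minimal winner-take-all pooler sketch (not the BrainBlocks API).

    Each output neuron keeps a weight row over the input bits.  On each
    step the best-matching neurons win and are nudged toward the input
    by learn_rate, so frequent patterns gradually 'lock in' their
    winners while rarely used neurons stay available for new patterns.
    """
    def __init__(self, n_in, n_out, n_active, learn_rate=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.random((n_out, n_in))
        self.n_active = n_active
        self.learn_rate = learn_rate

    def compute(self, x, learn=True):
        scores = self.w @ x                          # overlap with input
        winners = np.argsort(scores)[-self.n_active:]
        if learn:
            # move winners' weights a small step toward the input; the
            # small step is what lets similar patterns "fight back"
            self.w[winners] += self.learn_rate * (x - self.w[winners])
        out = np.zeros(len(self.w), dtype=int)
        out[winners] = 1
        return out

pool = TinyPooler(n_in=32, n_out=64, n_active=4)
x = np.zeros(32)
x[:8] = 1.0
first = pool.compute(x)
second = pool.compute(x)   # same winners: the mapping starts to lock in
```

Since the winners' weights only ever move toward the patterns they win, a repeated pattern keeps the same winners, which is the "locked in" behavior described above.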
I know of an alternative method which is far easier to use than Numenta’s method. For reference this is Numenta’s boosting function:
boost_factor = e ^ (boost_strength * (target_sparsity - activation_frequency))
This is a better function:
boost_factor = log(activation_frequency) / log(target_sparsity)
This function has a zero-crossing at a cell activation frequency of 100% and an asymptote to infinity at an activation frequency of 0%. These properties give it stronger theoretical guarantees than the exponential boosting function. It also has fewer parameters, which makes it easier to use. The only parameter is the period of the exponential moving average which tracks the activation frequency.
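For instance, both boost functions side by side as a pure-Python sketch (function and parameter names are mine):

```python
import math

def boost_numenta(freq, target, strength=2.0):
    # exponential boosting: boost = e^(strength * (target - freq))
    return math.exp(strength * (target - freq))

def boost_log(freq, target):
    # alternative: boost = log(freq) / log(target)
    # = 1 at freq == target, 0 at freq == 1.0, -> infinity as freq -> 0
    return math.log(freq) / math.log(target)
```

Both equal 1 when a cell activates at exactly the target sparsity; the difference is in how hard they push the extremes, and the log form needs no strength parameter.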
Thanks. Where did this equation come from?
We’d have to go back and review the things we tried.
The only drawback of using this for us is that we would have to put the bookkeeping back in to track the duty cycle and store and update that data. Unfortunately, that increases the memory footprint and update time of our Pattern Pooler, since we need to maintain a float for each neuron. We could probably reduce it by using a uint8 instead of a float, which is what we do for our permanences.
Of course, if we do this, this essentially becomes a Spatial Pooler.
One thing that I forgot to mention is that we don’t consider topology at all. There is no spatial relationship between input bits, and thus, no receptive field for neurons. This simplifies the computation, and it also helps to analyze things theoretically.
Finally, my exams are over.
This feature is super interesting. I’ve tried to make a generalized grid cell encoder before, but I didn’t find anything useful after toying with linear algebra. How does it work? And how does it differ from standard grid cells?
Also, I’m curious about some design choices. It seems most of the HTM-related code is in C, but other parts (e.g. HGT, BBClassifer) are in Python. Is there a design reason behind this separation? Or is it mostly because C is faster but more complex?
I came up with it, and also this article has a very similar equation: doi:10.3389/fnsyn.2014.00008
@jacobeverist I am interested in the differences between HGT and standard GCs too!
Could you please explain them to me?
The HGT is in Python because it is much less mature than the other components. There were a lot of numpy operations that were best left in Python until we understood them well enough to optimize them for C/C++.
Basically I generalized the concept of a “grid” as finding a set of subspace basis vectors, mapping a point in N-space to the subspace, applying a modulo along each subspace axis (grid period), and then binning the point into k intervals (bin frequency).
The subspace can be any dimension, but it is usually 1 or 2 dimensions. I haven’t experimented much with higher dimensions, but I have found that having many overlapping hypergrids at the same time represents high dimensions well, so long as their subspace basis vectors, grid periods, and bin frequencies are sufficiently random. How to randomize them and how they should be distributed is still something I’m studying; there are no definite answers here yet.
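As a rough sketch of those steps — project, wrap, bin — in numpy (my own function and variable names, not the actual HGT code):

```python
import numpy as np

def hypergrid_encode(x, grids, n_bins):
    """Hedged sketch of the hypergrid idea.

    For each grid: project the N-dim point onto that grid's subspace
    basis, wrap each subspace coordinate by the grid period (modulo),
    then bin the wrapped coordinate into n_bins intervals and set one
    bit per subspace axis.
    """
    bits = []
    for basis, period in grids:            # basis: (d, N), period: scalar
        coords = basis @ x                 # map the point into the subspace
        wrapped = np.mod(coords, period)   # periodic tiling, like a grid cell
        idx = np.floor(wrapped / period * n_bins).astype(int)
        idx = np.minimum(idx, n_bins - 1)  # guard the period boundary
        for i in idx:                      # one active bit per subspace axis
            onehot = np.zeros(n_bins, dtype=int)
            onehot[i] = 1
            bits.append(onehot)
    return np.concatenate(bits)

# two random 1-D grids over a 3-D input point
rng = np.random.default_rng(1)
grids = [(rng.normal(size=(1, 3)), 0.7), (rng.normal(size=(1, 3)), 1.3)]
code = hypergrid_encode(np.array([0.2, -0.5, 0.9]), grids, n_bins=8)
```

With many such grids stacked, nearby points share many bits and distant points share few, which is the property that makes the binary code useful downstream.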
I’m working on a paper for this which should help explain things and give a more mathematical description.
I will follow up here with some plots to illustrate some HGT concepts.
I have some experience dealing with embedded sub-spaces. I’d be happy to help out if you need someone to bounce ideas off of or to discuss different ways of expressing your algorithms mathematically.
Is there a paper or reference about Distributed Binary Patterns?
@bela.berde working on it right now
Have you had a chance to review the code? Do you have any questions? I’m still working on the documentation, so I will follow-up when it’s ready.
I’d like to be able to leverage your expertise gained from Etaler development to do things like TBB, and maybe templates if they fit somehow. Also, possibly a backend/frontend separation like you did in Etaler, although we currently only have one backend at the moment. We would have to resurrect our OpenCL code to make a GPU backend, with some re-engineering to get it to work at least as fast as the current single-core code.
The backend code is called bbcore. The C++ wrapper is applied first to create the C++ interface. Then the Python wrapper is applied to the generated C++ interface with pybind11. A number of Python modules are created to comprise the generated Python package, which can then be imported in Python.
Below is a glossary of our naming conventions that will help you interpret the code and how it relates to HTM terminology. Where relevant, we provide links to source or example code. The “Headers” refer to the backend ‘.h’ files, which each have a companion ‘.c’ file that is easy to find. “Block Examples” refer to building a network of blocks manually. “Template Examples” show the use of templates that auto-assemble a network of blocks into a common architecture. The “Sklearn-Style Examples” demonstrate our Python classes that emulate the interface of scikit-learn estimators or transformers. This makes it easy to use and compare with scikit-learn’s library of classifiers and tools. Some code is implemented in Python, and we refer to that as “Python Source”.
Block - similar to a region or layer in HTM. A standard interface for all the components of BrainBlocks.
Pattern Pooler - (PP), like HTM Spatial Pooler but with differences. Header
Persistence Encoder - this is a unique BrainBlocks feature where you can represent the passage of time as the same input is received. This creates input changes when you have long sequences like AAAAAAABBBBBBB, which helps with learning, and it also works great if you have missing data. Header
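A minimal sketch of the persistence idea (hypothetical names, not the actual block interface): track how long the current value has persisted, and encode the (value, persistence) pair instead of the value alone.

```python
class PersistenceCounter:
    """Toy sketch of the persistence idea (not the BrainBlocks block).

    Encoding the (value, persistence) pair makes long runs like
    AAAAABBBBB produce changing inputs instead of one frozen pattern.
    """
    def __init__(self, max_steps=10):
        self.last = None
        self.count = 0
        self.max_steps = max_steps   # saturate so codes stay bounded

    def step(self, value):
        if value == self.last:
            self.count = min(self.count + 1, self.max_steps)
        else:
            self.last, self.count = value, 0
        return (value, self.count)
```

Feeding "AAB" yields ("A", 0), ("A", 1), ("B", 0): the repeated A produces distinct codes, which is what gives a sequence learner something to chew on during long runs.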
Pattern Classifier - (PC), our natively distributed classifier. It provides a supervised learning capability to HTM-like architectures. You can assign labels to sets of neurons, and it will train those neurons to activate when the labeled inputs are received. It works quite well in comparison to classic classifier algorithms. Blocks Example, Template Example, Sklearn-Style Example, Header, Python source
BlankBlock - a no-op block that is useful if you want to control the bit encoding directly from your scripts instead of using the backend tools. This is used in conjunction with the Hypergrid Transform. Example, Header
Hypergrid Transform - (HGT), a Python sklearn-style transformer that converts M-dimensional scalar vectors into numpy binary arrays. Can be input into BrainBlocks with the BlankBlock. Example, Python source
Page - these are the input/outputs of the blocks. A page is capable of having parent-child relationships with other pages. The content of the child pages are concatenated to create the content of the parent page. So to connect the output of an encoder to the input of a pooler, you would add the encoder output page as a child of the pooler input page. The pages have both the BitArray and ActArray representation available which are created as needed. Header
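A toy numpy analogy of the parent/child relationship (not the bbcore data structures; class and method names are mine):

```python
import numpy as np

class Page:
    """Toy sketch of the page concept: a parent's content is the
    concatenation of its children's content."""
    def __init__(self, bits=None):
        self.bits = bits if bits is not None else np.array([], dtype=int)
        self.children = []

    def add_child(self, child):
        self.children.append(child)

    def content(self):
        if self.children:
            return np.concatenate([c.content() for c in self.children])
        return self.bits

# connect an encoder output to a pooler input by parent-child linkage
encoder_out = Page(np.array([1, 0, 1, 0]))
other_out = Page(np.array([0, 1, 1, 1]))
pooler_in = Page()
pooler_in.add_child(encoder_out)   # encoder output feeds the pooler input
pooler_in.add_child(other_out)
```

The pooler then just reads its input page's content, with no special-case code for how many sources feed it.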
BitArray - The full bit representation of neuron activity. This is compact and can represent 8 neurons per byte. Header
ActArray - The sparse active neuron representation. An array of addresses that represent the active neurons. Sometimes this is the preferred representation, but often the BitArray outperforms it. Header
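In numpy terms (an analogy, not the bbcore structs), the two representations look like this:

```python
import numpy as np

# dense binary activity for 16 statelets
bits = np.array([0, 1, 0, 0, 0, 0, 0, 1,
                 1, 0, 0, 0, 0, 0, 1, 0], dtype=np.uint8)

# BitArray-style packing: 8 statelets per byte
packed = np.packbits(bits)      # 2 bytes instead of 16 ints

# ActArray-style sparse form: addresses of the active statelets
acts = np.flatnonzero(bits)     # array([1, 7, 8, 14])
```

The BitArray wins when activity is dense or when you want fast bitwise overlap counts; the ActArray wins when activity is very sparse and you only need to touch the active addresses.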
Permanence - same as HTM permanence.
Statelets - these are analogous to neurons but without any implications of biological function. A statelet is either active or not. And, as its name suggests, it represents a fragment of some greater state representation.
Column - equivalent to HTM minicolumn. This is just a convenience referring to the geometry without needing to explain the difference between minicolumns and cortical columns. Again, we’re trying to avoid biological discussion and focus on algorithms.
CoincidenceSet - this is analogous to a dendrite with synapses. A CoincidenceSet is owned by a statelet or shared by a column of statelets (in the sequence learner block). We renamed it to describe its functional role, which is to find statelets whose activations are coincident with the statelet that owns the coincidence set. Header
Receptors - The set of statelets that a CoincidenceSet is using for input (i.e. the potential pool of inputs in HTM parlance). Again, this reflects their functional role of creating a “receptive field” for a particular statelet that owns the CoincidenceSet.
That’s all for now. Let me know if you have more questions and I’ll try to answer them and turn this into a sort of guide.
Brilliant work. I’m doing some exploring at the moment and wondering how I would set reset to signal the start of a new sequence? Could I just reset time?
Looks like we forgot to put that in. We used to have it in our old internal version, but we neglected to put it back in for the current release. We’ll put it back in as soon as we can.
A hack-ish workaround would be to add a reset code, kind of like a newline or EOF character. This would indicate the end and start of a new signal and would nicely break up your sequences and prevent them from stitching together.
This encoding has to be completely different and have no overlap with all other inputs you would receive. That should get you moving forward until we put in the fix. Actually, it would help if you filed this as an issue on our GitHub page.
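For example, one hypothetical layout reserves a dedicated region of bits for the reset code so it can never overlap a normal encoding (the sizes and functions here are illustrative, not the BrainBlocks encoders):

```python
import numpy as np

n_bits, n_active = 128, 8

def encode_reset():
    # the reserved tail region is used only by the reset code
    code = np.zeros(n_bits, dtype=int)
    code[-n_active:] = 1
    return code

def encode_value(i):
    # toy value encoder confined to the non-reserved region
    assert (i + 1) * n_active <= n_bits - n_active
    code = np.zeros(n_bits, dtype=int)
    code[i * n_active:(i + 1) * n_active] = 1
    return code
```

Inserting `encode_reset()` between sequences acts like a newline/EOF marker: it shares no bits with any value encoding, so the learner never stitches the end of one sequence onto the start of the next.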
I wouldn’t use time as an input unless your waveforms are consistently occurring at the same times. Otherwise, use the PersistenceEncoder.
Thanks @jacobeverist. I’ll try it out. Issue created: https://github.com/the-aerospace-corporation/brainblocks/issues/4