Alternative algorithms for the dendrite optimization problem

Here I want to discuss a problem in which the main goal is to find a set of dendrites that optimally covers a data set of high-dimensional SDRs.
“High” means something like tens of thousands of bits or more, instead of the hundreds or couple of thousands that input patterns are typically sized at in HTM.

Why? Because:

  • the search space in HTM-like tools is performance-capped by the size/complexity of the input.
  • the eager strategy of tapping into the immediately available potential patterns within an SDR might not be optimal.

What does “optimal” mean? There are two types of metrics to consider: one at the individual dendrite level, the other at the global, population-of-dendrites level.

  • a dendrite’s sharpness - it is considered “sharp” when all of its synapse inputs are either active or all inactive, and “confused” when only some of its synapses are active. We can call this “consistency” - the dendrite is optimized for a clear micro-pattern (or feature) in the large SDR.
  • at the population level we have a completeness or coverage criterion - coverage is complete when all 1 bits within any given input SDR are covered by signalling dendrites, which means “no bits left unaccounted for” in input SDRs.
  • also at the population level we have two minimalist criteria:
    • at the dataset level, to have a minimum and sufficient number of dendrites necessary to cover any input SDR in the dataset.
    • to minimize redundancy on any individual SDR, which again means having as few active dendrites as possible.

As you can see, optimising for all three criteria above could be a tricky problem.
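To make these criteria concrete, here is a minimal Python sketch of how each might be scored. The representation is an assumption of mine: a dendrite is a set of synapse positions, an SDR is a set of ON bit positions, and the activation threshold is a placeholder.

```python
def sharpness(dendrite, sdr):
    """1.0 when all synapses are ON or all OFF, near 0.5 when 'confused'."""
    frac = len(dendrite & sdr) / len(dendrite)
    return max(frac, 1.0 - frac)

def coverage(dendrites, sdr, threshold=6):
    """Fraction of the SDR's ON bits touched by at least one signalling dendrite."""
    awake = [d for d in dendrites if len(d & sdr) >= threshold]
    covered = set().union(*awake) & sdr if awake else set()
    return len(covered) / len(sdr)

def redundancy(dendrites, sdr, threshold=6):
    """Number of dendrites firing on this SDR; fewer is better at equal coverage."""
    return sum(1 for d in dendrites if len(d & sdr) >= threshold)
```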

This seems related to the tiling problem, in which a dendrite is a tile. The differences are:

  • we can pick any “shape” for a “tile”,
  • there is no limit on the number of tiles, BUT we want to find those particular shapes that allow for a minimum number of tiles.
  • there are many “spaces” to be covered - the ON bits of each input SDR form a different space.

Further motivation:

Biological intelligence manages to somehow not only figure out a way to “recognize” a “large” pattern, but also to home in on a minimum set of micro-patterns that are necessary and sufficient (== representative) for the larger pattern.

The highly praised “few-shot learning” animals are allegedly capable of might involve more than just quickly adding/removing some synapses, as in “here’s a paper, then voilà, I understood (e.g.) Newtonian gravity”. It takes time to dig into finding the significant correlations needed to represent, then understand, any given problem or spatio-temporal context.

There are reasons to believe the high number of (mini)columns is needed not only to record and recognize all patterns we encounter, but also for the large-scale data mining needed to discover a minimalist set of relevant ones.


Is this also partly a problem of a higher order of complexity, by trying to take a moment-in-time sensory snapshot equivalent and feed it into a stateful biological equivalent that only deals with smaller incremental changes to build that larger thousand-bit state?

I.e. we operate on a change basis and not a singular sensory pattern (e.g. saccades, or the surface contact area involved in touch as the finger moves). The couple-thousand-bit SDR is the incremental result of many smaller iterative bit changes, rather than the whole pattern flipping in a single iteration.

When you create the source input SDR, is part of the solution for dendrite thinning actually hiding in that process, rather than in the first layer of the HTM?

Do we view SDRs as static flash moments in time, or are they always temporal constructs?

When the pattern/SDR build process occurs, does this then help with a fracturing/thinning process of the dendrites into smaller groups (reducing the performance cap constraint back to smaller permutation clusters)?

Just some thoughts that may help bump your internal SDR into recognising something that works!


The algorithm is agnostic regarding what its “input” is. Let’s call it a “receptive field”, and it can be:

  • an input layer, whatever that means, e.g. sensor data,
  • output from other cells/dendrites, aka previous “layers”,
  • some (hopefully useful) processing function of the above, e.g. the most active cells during the past 10 seconds,
  • it might even be “full brain size”, meaning all neurons that fired during the last time step.

And anything else encoded as “many lists of bits (== SDRs), all of the same size”.

It simply considers each list of bits as a receptive field of points that can turn either 0 or 1.
The algorithm’s purpose is to generate a population of dendrites that optimally harvests the 1 bits in the field on future samples of the receptive field.

Regarding whether we want to include a previous state or history of the receptive field…

  • it can simply be treated as a larger receptive field - its size multiplies with the number of previous states (see the sketch below),
  • or pass the time series input through some recurrent network - e.g. an RNN - and use its hidden state as the receptive field. Other examples of recurrent networks are the Temporal Memory, or reservoirs.
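For the first option, here is a minimal Python sketch (names are mine) of turning a stream of SDRs into k-times-larger receptive fields:

```python
from collections import deque

def history_field(sdr_stream, k=3):
    """Turn a stream of fixed-size binary SDRs (lists of 0/1 bits) into
    receptive fields k times larger, by concatenating each SDR with its
    k-1 predecessors."""
    window = deque(maxlen=k)
    for sdr in sdr_stream:
        window.append(sdr)
        if len(window) == k:
            # one receptive field = the last k SDRs laid side by side
            yield [bit for past in window for bit in past]
```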

What about a type of progressive attention sub-selection approach on the input pool? I.e. if you have a 100,000-bit space with 1% active, those 1,000 active bits would already have a correlation as part of what they represent (I’m thinking of, say, sensory inputs from an area of the skin - no noise within the input data for training). If the attention goes over the bit pool saccade-style, part of this switching is then passing through some of that desired optimization already.

Within HTM there is an assumption that the SDR contains all of the data and that the actual layout of the bits within the SDR is random (a bit of a conflict with columns and hex grids). Can saccades across windows of the bit pattern infer part of the source correlations and therefore be part of the optimization?

The 100k space could start off under-connected and then be added to - the reverse of the thinning approach. Rather than building weight at the input stage in a one-shot step, could a few-saccades-per-step method instead identify the best connections?

You start off with dominant synapses that are subsequently diluted with learning as they learn more context.

Thinking back to your MNIST experiments: rather than feeding in the whole array at once, you could take a straw approach and scan over the image for several steps to recognise the digit (deeper rather than flatter topography). I can’t remember if you already experimented with that approach; I seem to remember something along those lines on this forum.


Thanks, these ideas are interesting.

Till now I was only fantasising about

  1. using a foveal approach in vision (limited view → movement chain). But without a metric of distance between pixels and available directions, we are limited to random sub-sampling.
  2. That’s why it is important to figure out a metric of “closeness” between bits within a stream of SDRs.
  3. combining 1 & 2 above for enhanced vision :sunglasses:

Regarding this topic’s problem, a solution for 2. is a good beginning, because if two bits are “close” (== fire together often) it makes sense to have them “watched” by the same dendrite.


Ok, I’ll go with the following genetic algorithm, based on cake rewards shared between active dendrites for each input SDR.
I’m writing it here so I won’t get lazy about it.

The main idea is that each input SDR’s active bit (=1) represents a reward of one cake.

Dendrites activate by threshold as usual - e.g. 6 active synapses → the dendrite becomes active, aka awakens.

All awake dendrites share the available cakes as follows:

  • each synapse counts as a “mouth” biting from its input bit-cake.
  • if it bites from a 0 bit, the synapse’s reward is 0.
  • if a synapse bites from a bit that is 1, the corresponding cake is shared between all synapses trying to “eat” it.
  • synapses of dormant dendrites won’t get any cake even if their corresponding input bit is 1. They get a chance to bite only when their own dendrite becomes active.

For each generation, a total score is kept for each synapse over a significantly large subset of input SDRs.
A dendrite’s fitness score is simply the sum of its synapse scores.

Least fit dendrites are discarded.

The most fit ones are used as parents of the new generation.

A child dendrite inherits a mix of the best-performing synapses from its parent(s) plus a few random synapses.

The cake-sharing trick promotes competition and prevents all dendrites from caring only about the most frequent bits/patterns; it forces them to search for less obvious (but still rewarding) sub-patterns in the data.
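To pin the idea down, here is a minimal Python sketch of one generation under these rules. The representation (a dendrite as a list of synapse positions, an SDR as a set of ON bit positions) and all constants are placeholder assumptions, not a settled design:

```python
import random

INPUT_SIZE = 100_000   # assumed receptive-field width
THRESHOLD = 6          # active synapses needed to awaken a dendrite

def evaluate(dendrites, sdrs):
    """Cake-sharing pass: scores[i][s] = total reward of synapse s of dendrite i."""
    scores = [dict.fromkeys(d, 0.0) for d in dendrites]
    for sdr in sdrs:                                  # sdr = set of ON bit positions
        awake = [i for i, d in enumerate(dendrites)
                 if len(set(d) & sdr) >= THRESHOLD]
        eaters = {}                                   # ON bit -> number of biting synapses
        for i in awake:
            for s in dendrites[i]:
                if s in sdr:
                    eaters[s] = eaters.get(s, 0) + 1
        for i in awake:                               # each 1-bit cake is split
            for s in dendrites[i]:                    # between all its eaters
                if s in sdr:
                    scores[i][s] += 1.0 / eaters[s]
    return scores

def next_generation(dendrites, scores, n_random=2):
    """Drop the least fit half; each child mixes its parents' best synapses
    with a few random ones (synapse collisions are ignored in this sketch)."""
    order = sorted(range(len(dendrites)),
                   key=lambda i: sum(scores[i].values()), reverse=True)
    parents = order[: len(dendrites) // 2]
    new_pop = [dendrites[i] for i in parents]
    while len(new_pop) < len(dendrites):
        i, j = random.sample(parents, 2)
        pool = sorted(set(dendrites[i]) | set(dendrites[j]),
                      key=lambda s: scores[i].get(s, 0.0) + scores[j].get(s, 0.0),
                      reverse=True)
        size = len(dendrites[i])
        child = pool[: size - n_random] + random.sample(range(INPUT_SIZE), n_random)
        new_pop.append(child)
    return new_pop
```

A run would simply alternate evaluate() over a large batch of input SDRs with next_generation(), until the population’s total fitness stops improving.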


This might be a stupid question, but you mentioned something I care about a lot:
Do you have any algorithms/ideas for learning the “closeness”/“direction” of 2 inputs, or how the information that these 2 inputs are spatially close would be stored?


Let’s make sure I understand your question…

There are two types of inputs that can be matched for “closeness”:

  1. Two full data frames, e.g. two embedding vectors of an autoencoder. For that there are a few vector distance metrics, e.g. Euclidean, cosine or Hamming distance.

  2. Two scalar positions within the above vector, or two bit positions within an SDR. The question is: how close are bits 75 and 515, semantically, across a whole dataset? Or any two bits within a given representation.

What I recall discussing is pt. 2 above, using binarized & flattened (and possibly permuted) 784-bit-long MNIST images as an example case.
Is that what you are interested in?


Yeah, I think #2, assuming the input comes in completely randomized/scrambled; with an MNIST image example, I want to solve for a situation where we do not know the pixel positions/directions relative to each other, nor do we know the resolution of the original image, nor do we even know the dimensionality of the input or what it is.

More than wanting to know how close 2 bits are, I also want to know how to figure out the direction they lie in relative to each other. Meaning this algorithm might also have to learn which directions are possible.

idk much about vector embedding, but maybe that’s what I want?


Well, there’s a lot of questions you have :slight_smile:

Beware that the underlying 2D space is one thing, but there might be other “hidden” semantic dimensions within the data. Separating the two (if that is what you want to do) might be a tough problem.

For now I’m a bit biased towards NOT separating them because any correlation is potentially useful.

All I managed to do for now was to build a 784x784 co-occurrence table, using the MNIST training set to increment each X,Y cell for every image in which both the X and Y pixels are 1.
It takes a couple of seconds to parse all 60,000 digits and generate the table.
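As a Python/NumPy sketch of that table build (the binarization threshold is an assumption):

```python
import numpy as np

def cooccurrence_table(images, threshold=128):
    """images: the MNIST training set flattened to shape (60000, 784), uint8.
    Returns a 784x784 table where cell [x, y] counts the images in which
    both pixel x and pixel y are ON after binarization."""
    bits = (images >= threshold).astype(np.float32)
    return bits.T @ bits     # the diagonal holds each pixel's ON frequency
```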

This table can be regarded as an inverse-distance graph between pixels - the more the pixels “fire together”, the “closer” they are.
Beware that the MNIST digits have a 2-pixel “white” border on all images, so making spatial assumptions about the border pixels is both useless and impossible. From the table/graph perspective they are simply pixels infinitely far away from every other pixel, in an unknown, irrelevant direction.

I am more interested in generating “useful dendrite segments” now, so I did this by picking a random starting pixel for a segment and adding “nearby” pixels to it, until I get e.g. 16-pixel (or 16-synapse) dendrite segments.
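One plausible greedy reading of “add nearby pixels”, as a sketch reusing the table above (the function name is mine):

```python
import numpy as np

def grow_segment(cooc, n_synapses=16, rng=None):
    """Grow one dendrite segment: start at a random pixel that is ON somewhere
    in the dataset, then repeatedly attach the pixel that co-occurs most
    with the segment grown so far."""
    rng = rng or np.random.default_rng()
    freq = np.diag(cooc)                        # diagonal = each pixel's ON count
    segment = [int(rng.choice(np.flatnonzero(freq > 0)))]
    while len(segment) < n_synapses:
        closeness = cooc[segment].sum(axis=0)   # affinity to the current segment
        closeness[segment] = -1                 # never re-pick a member
        segment.append(int(closeness.argmax()))
    return segment
```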

Visualizing these 16 pixels in the actual 28x28 space, they look mostly like connected blobs of various shapes, some more “close”, some more “slanted”, and some separated into smaller blobs or a blob + 1-2 disconnected pixels.

For now I’m happy with these because they do look “spatially biased”.
Next I will have to test how “good” they are at “seeing” actual digit images, then follow up with a classification task.

PS
They also look “semantically biased”, in the sense that, since the blobs originate from MNIST drawings, I expect them to “fit” these better than arbitrary 2D constructs like convolutions.

Fingers crossed, this method could be a cheap way to make “semantic convolutions” on arbitrary data. But it is premature to claim that.


Another use of a pixel-proximity map could be in dataset augmentation.

When you are aware of e.g. the underlying 2D representation, one can easily rotate/translate/resize a scarce dataset to produce more training data - and it does improve classification results.

But without such information about structure, augmenting data by swapping blobs with “nearby” ones (== a blob with high pixel overlap with the one it replaces) could be a useful means to augment data, either at training or at inference time.
At least it should be better than random noise, which also seems to improve results, but only slightly.
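A rough sketch of that blob swap, reusing the co-occurrence table from earlier (the neighbour-selection rule is an assumption):

```python
import numpy as np

def swap_blob(bits, blob, cooc):
    """Replace `blob` (a list of ON pixel indices in the binary sample `bits`)
    with a "nearby" blob: the pixels that co-occur most with it across the
    dataset. Because the blob's own pixels score high, the replacement
    naturally overlaps the original heavily."""
    closeness = cooc[blob].sum(axis=0)            # affinity of every pixel to the blob
    replacement = np.argsort(closeness)[-len(blob):]
    out = bits.copy()
    out[blob] = 0                                 # erase the original blob...
    out[replacement] = 1                          # ...stamp in its neighbour
    return out
```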
