Thoughts about topology



I wanted to talk a bit about using topology with HTM. Most of this is just me working these thoughts out in my head, so some of it might be confusing or inaccurate. I welcome feedback and discussion.

With no topology (the default today in NuPIC), the input space for the Spatial Pooler contains cells with no relationships to each other. The state of any bit in the input will never be correlated to the state of any other bit in the input. With topology enabled, the Spatial Pooler will calculate “neighborhoods” around SP columns based on Euclidean distance in N-dimensional space. This allows us to create a multidimensional input space and Spatial Pooler, also effectively enabling sequence memory to work on multidimensional input. These “neighborhoods” can be calculated no matter the number of dimensions in the input space.
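To make the neighborhood idea concrete, here is a minimal sketch (my own illustration, not NuPIC's implementation) of computing a column's neighbors by Euclidean distance; the function name and `radius` parameter are assumptions:

```python
# Illustrative sketch of Euclidean neighborhoods; `neighborhood` and
# `radius` are my own names, not NuPIC's actual API.
import itertools
import math

def neighborhood(center, radius, dimensions):
    """Coordinates of all columns within `radius` (Euclidean distance)
    of `center` in a grid with the given `dimensions`."""
    ranges = [range(dim) for dim in dimensions]
    return [coord for coord in itertools.product(*ranges)
            if math.dist(coord, center) <= radius]

# 2D example: the 9 columns within distance 1.5 of the center of a 5x5 grid.
print(len(neighborhood((2, 2), 1.5, (5, 5))))
# The same function works unchanged for 3 or more dimensions:
print(len(neighborhood((2, 2, 2), 1.5, (5, 5, 5))))
```

Note how the cost grows: the 2D call checks 25 cells, the 3D call 125, which is the kind of blow-up discussed below.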

We don’t do it much today

Pretty much all our code examples do not use topology. One major reason is that each additional dimension added to the input space exponentially increases the processing time necessary to make inferences because of the neighborhood calculations involved. With no topology (or “global inhibition”), there is only one neighborhood to calculate. For each additional dimension of topology, that calculation grows by a factor of the number of columns in the SP.

Another reason we don’t use topology much in our examples is because we focused on scalar anomaly detection applications for several years. Those applications did not need topology to generate valuable anomaly indications, so we stepped away from it.


From my experiments with 2D topology, I have been limited to an input space of 64x64 bits, which is not much. Anything larger than that is prohibitively expensive to run on my laptop for visualizations.

Here is an example of topology. You can ignore the boost factors grid on the right.

The input space is a two-dimensional input (50x50 bits). There are 60 steps in the animation, and it repeats forever. The white dots on the grids are the active columns in the spatial pooler. The green / red gradient indicates active duty cycle for each column (on the center grid) and applied boost factors (on the right grid, which you can ignore).

As you can clearly see in the SP, the active columns have spatial relationships to each other. As the active duty cycle gradient shows, the majority of the active columns cluster around the center of the animation, where most of the activity is occurring.

Getting Predictions

I haven’t tried analyzing the predicted cells coming out of the temporal memory algorithm at this point, but I would like to as a part of this discussion. I assume I will need to use a classifier to do this, but I have not done it before. If anyone wants to help guide me in creating a classifier to turn predicted cells into input space bits, let me know.

Beyond the Brain?

The topic of N-dimensional processing brings up an interesting idea. Cortical mini-columns in the brain are topologically restricted to 3 dimensions by the reality of our physical world. But there’s nothing keeping us from creating cortical structures in computers that can process any number of dimensional inputs. Just a thought. :slight_smile:


After discussing with Matt, I think we should try to come up with a nice explanation of the benefits of topology and circumstances under which it should be used. Most current applications lack the input complexity that necessitates topology.

A few comments on your post:

With no topology (the default today in NuPIC), the input space for the Spatial Pooler contains cells with no relationships to each other. The state of any bit in the input will never be correlated to the state of any other bit in the input.

Topology actually limits the relationships that can be learned. Without topology, any two cells could become correlated. With topology, whether two cells can become correlated depends on where they sit relative to each other, as well as on the inputs.

With topology enabled, the Spatial Pooler will calculate “neighborhoods” around SP columns based on Euclidean distance in N-dimensional space. This allows us to create a multidimensional input space and Spatial Pooler, also effectively enabling sequence memory to work on multidimensional input. These “neighborhoods” can be calculated no matter the number of dimensions in the input space.

Without topology, it is better to think of the SP as n-dimensional, where n is the number of columns, and the input as x-dimensional, where x is the number of input bits. Using topology doesn’t really change the dimensionality. The columns can learn features across any of the dimensions (the SP doesn’t know which bits represent a given dimension; it sees them all as distinct, but it will still learn coincidences across whichever bits they occur in). Topology actually limits the bits that columns can learn features across. Applying the wrong topology to data could result in features not being represented in the SP output. But it can also aid learning by limiting columns to a local set of bits when only local features should be learned.
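A toy sketch of the contrast Scott describes, with and without topology; `global_pool`, `local_pool`, and their parameters are my own illustrative names, not NuPIC's API:

```python
# Sketch contrasting a global vs. a local (topological) potential pool.
# `pool_size` and `radius` are illustrative parameters, not NuPIC's API.
import random

def global_pool(num_inputs, pool_size, rng):
    # No topology: potential synapses sampled from the entire input.
    return sorted(rng.sample(range(num_inputs), pool_size))

def local_pool(num_inputs, pool_size, center, radius, rng):
    # Topology: sample only bits near the column's natural center,
    # limiting which coincidences this column can ever learn.
    lo, hi = max(0, center - radius), min(num_inputs, center + radius + 1)
    return sorted(rng.sample(range(lo, hi), pool_size))

rng = random.Random(42)
print(global_pool(1000, 8, rng))          # bits from anywhere in the input
print(local_pool(1000, 8, 500, 16, rng))  # bits only near input bit 500
```

The local column simply has no synapses to most of the input, which is exactly the "limits the bits" point above.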

Pretty much all our code examples do not use topology. One major reason is that each additional dimension added to the input space exponentially increases the processing time necessary to make inferences because of the neighborhood calculations involved.

The main reason we don’t use topology in current applications is that it doesn’t provide much benefit and requires some extra work to define the topology. The inhibition step (choosing winning columns) is slower when you have to compute it with topology but that isn’t a primary reason we avoid topology (AFAIK).

Another reason we don’t use topology much in our examples is because we focused on scalar anomaly detection applications for several years. Those applications did not need topology to generate valuable anomaly indications, so we stepped away from it.

This sums it up pretty well.


I’ve been ruminating lately on the role of dendritic trees in the functional properties of cells. There is a growing body of evidence that suggests that the location within the dendritic tree where a synapse is formed is important in the formation of spikes. If that is the case, there’s potentially a biologically plausible explanation for topology at some level. In HTM theory we limit the constraint to different synaptic integration zones (proximal vs distal, apical vs lateral, etc.), but I believe there’s a case to be made about the importance of dendritic morphology, too. A few areas where this may be relevant for consideration by anyone else interested in topology include:

  1. Tie breakers. All things being equal, two axons firing at the same time are integrated at different times: a spike arriving at a synapse closer to the soma will be detected sooner than one arriving at a synapse further away.
  2. Subsampling. We tend to do random subsampling, but imagine that instead of sampling from a contiguous range of cells, you traversed specific branches of the tree: you could get a very different sample of inputs from potentially different, but also potentially overlapping, regions.

If I were to implement topology, I think those are some important details to consider.


If the foundations of a subject are not clear you should keep an open mind.
The very slow development of AI is reminiscent of how electricity was explored in the 19th century.
And in fact the study of electricity went through a very mathematical phase too before many of the basic practical applications came into use.


I think we may be on the same page.

Jeff Hawkins’ papers have consistently repeated the theme of understanding what the cortex does. The brain goes to considerable effort to maintain topology in the various maps throughout the brain[1]. The patterns that flow through the maps are processed spatially in the brain. I would assume that such an obvious organizational feature should not be casually discarded. This may be a key feature that should be understood.

I have been thinking about this, and an obvious thing that falls out of this organization is that a feature transition in the input space will be sampled by the dendrites of cells around that feature. The cell bodies “around” this transition will learn the feature. The net effect is to distribute memory of the feature in a fuzzy halo of cell activation around the feature transition.

I suggest an efficient method to perform spatial dendrite sampling here:



To me, learning input topology in the spatial pooler has always seemed crucially beneficial, so I’ve always been a bit unclear why others seem to feel so differently. There was actually a thread on Nupic-theory where David Ray and I disagreed on this point specifically (May 2016: Spatial pooling training does not converge on SDR). I can’t tell if there is a theoretical reason for the lack of emphasis on topology, or if it’s more practical. Scott’s response above doesn’t quite help make sense of it to me yet.

If the spatial pooler forms potential synapses to a random set of input bits for each proximal dendrite (no topology), then neighboring HTM columns will activate from completely different, non-overlapping sets of active input bits. So, even though different input patterns are semantically similar because of spatially correlated bits within the columns’ receptive fields, the resulting SDRs of the columns the SP activates will not be similar at all. In other words, significantly different sets of columns will represent semantically similar input patterns.

As an example, imagine the input is a visual pattern of the letter A. If you add just a bit of noise or flip a small number of the input bits, then, with or without topology, the active columns will likely be similar. But if you shift the entire input pattern by even a few positions without topology, the effect on the activated columns will be large. If you occlude one third or one half of this A input pattern, the change in active columns is pretty much undefined since there’s no knowing what columns are connected to those bits, right? If input topology is reflected in the SP, then occluding half of the pattern is straightforward: no activity occurs in the corresponding half of columns.

Maybe that example is not representative of typical HTM sensory input, and that could be the cause for my stumbling.

Assuming that anything I just said made any sense, if I could get a good explanation of why I’m wrong or if I am over-emphasizing something that isn’t a big deal, then I think I would finally get it. Maybe I’m just being dense.


I just wanted to second this question. As @dillon.bender points out, with no topology neighboring SP columns will likely become active from totally different encoding activations, so the column next to you won’t respond to the input any more similarly than the column furthest from you. The input encoding will either overlap enough with the synapses on your proximal dendrite segment to activate you (the column), or it’ll activate another column that may or may not be nearby.

If I have this right, it seems this would be OK, since the only similarity measure for SDRs is their overlap: how many bits they share. Two column SDRs are considered no more similar because their active columns are near each other, only if they overlap. With topology, though, it may be possible to have similarity based not just on column overlap but on column proximity: how near the columns are to each other in the SP.

I’ll try and show what I mean with the following 3 SDR’s (‘a’, ‘b’ and ‘c’):

a) 1000001000001000001000001
b) 0100000100000100000100000
c) 0000000001110000000000000

When these SDRs are generated with no topology, SDR ‘b’ is no more similar to SDR ‘a’ than it is to SDR ‘c’, since it has the same overlap of 0 with both. The fact that b’s active bits are close to those of ‘a’ is a coincidence. With topology, however, the active bits of ‘a’ and ‘b’ being close to each other would imply that the inputs were more similar. Is that correct? Regardless, this similarity wouldn’t be caught by overlap alone, since there is no overlap, only proximity. Capturing similarity by proximity would require another measurement besides overlap.
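To make the overlap-versus-proximity point concrete, here is a small sketch (my own, with ‘c’’s three active bits written out at positions 9-11) that computes both measures for the SDRs above:

```python
# Overlap vs. proximity for the SDRs above (my own sketch).
a = "1000001000001000001000001"
b = "0100000100000100000100000"
c = "0000000001110000000000000"

def overlap(x, y):
    # Number of positions where both SDRs are active.
    return sum(1 for xb, yb in zip(x, y) if xb == "1" and yb == "1")

def min_distance(x, y):
    # Smallest distance between any active bit of x and any of y:
    # a crude "proximity" measure that overlap alone cannot capture.
    xs = [i for i, v in enumerate(x) if v == "1"]
    ys = [i for i, v in enumerate(y) if v == "1"]
    return min(abs(i - j) for i in xs for j in ys)

print(overlap(a, b), overlap(b, c))            # 0 0 -- equally dissimilar by overlap
print(min_distance(a, b), min_distance(b, c))  # 1 2 -- but b sits right next to a
```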

In conclusion I return to @dillon.bender’s question: would there be no practical purpose for incorporating this kind of similarity? It seems that it hasn’t been needed for most of NuPIC’s applications so far, though there must be imaginable scenarios where it would have value.



I am not sure yet of my understanding of topology, but wouldn’t that be the case even with topology enabled? The potential synapses would still be connected to a random set of input bits, the difference being that some columns will win activation over weaker neighboring ones.

So you are implying that local inhibition will inhibit the activation of a potentially useful input pattern just because it is connected to a neighboring column? This makes sense, and I also feel there is something wrong with this, and that topology should also be applied somehow to the input field side, not just the column field side. But @dillon.bender’s point was that topology as implemented in the SP is always useful compared to no topology, so I’m a bit confused.

I don’t think the SP has any way (with or without topology) of manifesting invariance under translation of a pattern over the input field, so maybe you’d need to explain more how the 2D letter example is relevant?
If someone knows anything about pattern invariance properties of SP then it would be really beneficial to post / explain it here :slight_smile:


Assuming I understand this topic correctly, if the SP is learning input topology, then each column has a local receptive field that overlaps with a very specific part of the input space, typically the “natural center” that the column has with the input. I believe this is the definition of topology in the spatial pooler, right? So, the potential synapses are not connected to a random set of input bits, but a specific, localized subset of them. And normally adjacent, neighboring columns have overlapping receptive fields within a local radius, so the inhibition radius is not global.

That is how the SP works. Spatially similar input patterns are pooled into a single set of active HTM columns. Even if the set of active columns is not perfectly the same, the SP will still generate a highly correlated cluster of active columns. In theory, if the spatial pooler initializes the mini-columns with overlapping receptive fields, and you translate an image only a couple pixels left or right, then most of the columns will still retain nearly the same amount of input activity, therefore the same columns will become active. Now, the amount of translation invariance that the SP can handle is not that great, but there is a definite threshold up to which it can pool different, but highly similar, input patterns.

Instead of the letter A, imagine an input pattern that is a simple vertical line only a few pixels wide in the center of the input space. With topology, the horizontally central HTM columns will become active. Move the input pattern 1 pixel to the right, the same columns will almost certainly be active because the input pattern is still in their receptive fields. This will continue until the input pattern starts entering the next columns’ receptive fields.
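Here is a 1D toy version of that vertical-line experiment (my own sketch; the layout of one column per four input bits with a receptive-field radius of 3 is invented for illustration). Shifting the line by one position leaves the strongest column unchanged:

```python
# A 1D toy spatial pooler with local receptive fields (layout invented
# for illustration: 64 input bits, 16 columns, radius 3, 2 winners).
def active_columns(input_bits, num_cols, radius, num_active):
    width = len(input_bits)
    overlaps = []
    for col in range(num_cols):
        center = col * width // num_cols
        # Each column only sees input bits near its natural center.
        field = range(max(0, center - radius), min(width, center + radius + 1))
        overlaps.append((sum(input_bits[i] for i in field), col))
    return {col for _, col in sorted(overlaps, reverse=True)[:num_active]}

width = 64
line = [1 if 30 <= i < 34 else 0 for i in range(width)]     # 4-bit-wide line
shifted = [1 if 31 <= i < 35 else 0 for i in range(width)]  # moved 1 bit right
cols_a = active_columns(line, 16, 3, 2)
cols_b = active_columns(shifted, 16, 3, 2)
print(cols_a, cols_b)  # the dominant column appears in both sets
```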


Oh right, I don’t know what I was thinking; thanks for refreshing my understanding. But as Scott says, SP global inhibition just means topology in n dimensions.

I wouldn’t call that invariance, instead it’s just noise tolerance. Because to the extent that it tolerates small input pattern “shifts” (and generates similar SDR’s), it also loses selectivity. This means that another letter for example that is in the same position as the first one could generate a similar SDR as well, which might not be what we want.

I am wondering if the principles used by convolutional neural networks (which were supposed to recreate the visual cortex) can be somehow applied in SP by “natural” means instead of kernel convolution which is a very artificial mathematical operation.


In my experience topology doesn’t matter that much. The input and output SDRs of a region don’t have to correlate. The semantics of SDRs are only local concerns. The SDRs can be re-encoded many times throughout the layers and regions, but the semantics remain the same throughout the stream. In other words, an ‘AND/OR operation’ only occurs between local layers (i.e. 4 and 2/3).

It could be the case that receptive fields are local for the simple reason that proximal dendrites can only physically grow to a maximum radius.

Spatial pooling is just classifying consistent spatial patterns within an arbitrarily noisy input stream. The consistent patterns can be encoded in a vastly different SDR; the next layer doesn’t care. This is part of the beautiful flexibility and robust nature of the cortex.

It’s probably true that receptive fields help aid some level of spatial invariance, but it would only be on a very small scale. (Each receptive field only represents a small detail like an edge.)


Then why does the brain maintain topology from map to map?

While you are considering that reflect on what happens when you do have a proper stream of connections and the maps are brought back together at some point in processing.

From my take on the connectome project, there are several loops that jump past the “next” map and maintaining topology allows the projected data to be in alignment with the output of the intermediate map.


The ‘maps’ are combined together in layer 6, but the topology can still be arbitrary. You could even just look at the physical structure of dendrites to appreciate that they have incredibly arbitrary topologies.

I’d be interested in any links you could provide me with your line of thinking. Are there articles from the connectome project that relate to this?


Please check out this classic description of the visual topologic map organization:

And this one showing preservation of topology in the subsequent map-to-map connections.

Much of this is highly topologically arranged.

Following up on a basic HTM tenet: the brain uses much the same arrangement everywhere. I have no reason to believe that this basic principle fails as we move from the sensory areas to the association areas. I have been looking to the connectome project to see if the association areas are surrounded by specialty processing areas like V1 and the auditory cortex. (Examples: motion, texture, color, phase delay, …)

I will have to dig through my papers to find the ones that show the topology being preserved going from map-to-map in other sensory streams but for now here are some related links showing how important preservation of topology is in the sensory encoding areas:
(Check the links on the bottom of this wiki page!)


We just discussed topology further at HTM Hackers Hangout, here is the video if anyone wants to watch. We start talking about topology about 11 minutes in.


Awesome. The reasoning behind when and why topology is important makes much more sense to me. I had a feeling in my original comment on this thread that the example input data I was considering was just not representative of that used by most Nupic applications.

So, if you really wanted to, I think you could still get away with initializing the spatial pooler’s proximal connections between the columns and input bits in a localized, topological organization if you just mix the various “sub-representations” of scalars, dates, etc., instead of logically grouping them in a continuous section (“x”, “y”, and “date” from Matt’s drawing on the whiteboard). Then columns would receive input that is locally mixed, and you would achieve the same goal as global inhibition without topology.

I don’t know why exactly you would want to do that, though. Except maybe if you wanted to plan for the future when topology is necessary, as Matt explained.
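For what it's worth, the "mixing" idea might look something like this toy sketch, where encoder outputs are interleaved bit by bit instead of concatenated; the field contents here are arbitrary placeholders, not real encoder output:

```python
# Toy illustration of mixing encoder sub-representations: interleave
# field bits instead of concatenating them, so any local receptive
# field samples every field. Field contents are arbitrary placeholders.
def concatenate(fields):
    return [bit for field in fields for bit in field]

def interleave(fields):
    out = []
    for bits in zip(*fields):  # assumes equal-length fields
        out.extend(bits)
    return out

x = [1, 1, 0, 0]
y = [0, 0, 1, 1]
date = [1, 0, 1, 0]
print(concatenate([x, y, date]))  # fields grouped in contiguous blocks
print(interleave([x, y, date]))   # fields locally mixed together
```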


Thanks for all the links. I think the last link was the most useful.

The paper describes the topographic maps between non-cortical regions that eventually map to an input sensory cortical region, i.e. whiskers, brainstem, thalamus, then cortex. This makes sense, as non-cortical regions do not have pyramidal neurons. But you’re right, the maps remain consistent through a cortical hierarchy (visual at least).

However I think it’s important to realize that the receptive field topology is not necessary for cortical computation - actually it might just be an emergent structure of physical constraints. I believe basal and apical dendrites are arbitrary (meaning they just connect to any axons they find in their target layers), but they seem structured because dendrites can only grow out so far (physically limited within a radius or ‘neighborhood’). If dendrites could grow out to infinite distances then we’d probably find ‘receptive fields’ looking more like the potential pools in NuPIC - spanning across the whole region - connecting to any input cell. Because we use computers we don’t have the same limitations as biology.

I’ve done a few silly simple illustrations to show that it really doesn’t matter if the feedforward connections are local or distributed; it’s the pathway that matters.

Topographic maps in a hierarchy have a many-to-one relationship. However, again - it can be distributed or local - it doesn’t make a difference.

I am happy to learn from any opposing ideas. I am curious about any reason why maps must correlate locally (receptive field) for the cortical hierarchy to work.


The brain repeats and matches up the maps to an almost unbelievable degree. This map organization is preserved in serially connected maps. This related mapping response to input stimulation is so prevalent that it is used to trace the map-to-map connections in research. Much the same is true for all senses.

Local processing (second order maps) for comparing parts of two adjacent maps (left/right eye) is used for special processing like stereo or texture in some maps.

The perceived real world has organization; I think this biological connection programming forces the columns to learn useful spatial relationships about the inputs. There is a spreading of activation as the image is relayed from map to map, but it stays aligned; I think this is in accordance with the original HTM vision model that Jeff Hawkins proposed.

Considering that we are trying to tease apart the function of the cortex algorithm with the HTM model it seems very cavalier to just discard this basic organization until we really do understand how it works.

IMHO - controlling the distribution of receptive fields before training is part of defining the function of the area; I think that it defines what the area(s) will try to learn. I have been thinking of this as a fundamental part of configuring (programming? designing?) a layer. A deeper understanding (and use) of the map-to-map connections should be part of defining how data will be spread through the hierarchy when configuring a system. This configuration should be at about the same level of planning as setting up the encoders.

In the hacker hangout video above, rhyolight mentions that the time field needs to be connected directly to other parts of the map to be part of what is learned about the input data. When we try to do everything with one layer, we end up having to make “unnatural” connections to get it to work at all. A different way of doing a time field is to have a distributed map of time with widespread connections throughout the fields projected into an association area; I think that this leads to a better sensor fusion function.

I suspect that this will allow our tiny models of the brain to do more with less.


I agree with you on the mapping (especially the very nuanced visual cortex). I’ve dug up some old bookmark links illustrating the mapping in the hierarchy (for the sake of clarity).

This allows edges to combine into shapes, which combine into objects, which combine into larger objects. At each level, each feature is represented in local spatial groups.

Like the guys were saying in the video, the distributed/arbitrary topology is used because the input data is not like visual data, so local receptive fields are not needed (and could even be a hindrance). However, distributed topologies are so flexible that they can still adapt to become local receptive fields if the potential pool is big enough. This is possible because of Hebbian reinforcement.

The top figure shows a classic receptive-field-like topology. The second shows part of a distributed topology where the permanence values of this representation have been learned: essentially the same topology as the receptive field.

I’m not sure if NuPIC implements this feature, but as in the cortex there is constant synaptic genesis and pruning. Each cell/dendrite can constantly generate new segments that represent a new spatial feature in the input space. As synapses are pruned away after Hebbian learning and column competition, each segment will come to represent something unique. If visual data were fed into the region, it would naturally form locally grouped connections to the input space that represent edges, shapes, objects, etc. If another type of data were fed in, then the connections would naturally generate the topologies most suitable for that data.


Right now it seems like NuPIC has been used on problems where temporal locality is important, but not spatial locality. If I remember the “Hot Gym” example correctly, NuPIC was given power usage and date-time info and learned to determine how power would be used next. In that case, the added detail of how physically close the representation of date-time is to the representation of the number indicating power usage shouldn’t matter.

However, with more spatial problems, like how to balance a robot limb, how one protein interacts with another protein, or how to find your car keys, the nearby inputs are much more important than the distant inputs. After all, you wouldn’t start looking for your car keys by tracing your steps back all the way back to your very first birthday.

It makes sense that the local algorithms are only useful for a certain set of problems, but they are important problems. However, I believe it would be a good idea to optimize, or at least look for different ways of implementing things, because being limited to a 64x64 image makes it hard to experiment with hierarchies on detailed spatial data. Meanwhile, here’s a highly computational spatial operation done on a GPU:

That’s a 2010 GPU simulating local interactions between a million particles at 2-3 FPS. If NuPIC is parallelizable enough (and I believe it really should be, as long as ‘local’ is defined as a short enough range), then NuPIC should benefit just as much from GPU optimization. Plus, because so much of NuPIC is designed around the neural column, it should be easier to design it using a library for parallel or swarm computing.

For example, if I remember local inhibition correctly, the most activated columns in the spatial pooler inhibit nearby columns, so the most highly activated columns remain active while less activated ones don’t. That reminded me of edge detection, so I messed around with convolution matrices in GIMP and changed an edge detection matrix so that edges were highlighted within brighter regions and the inner parts were dimmed instead of removed:

What’s interesting about the third image is that I can change the average brightness of the image by setting the central value between 24.0 and 25.0, and a pixel that was previously invisible in the source image is now hard to miss. Here’s the central image for comparison:

That inhibition matrix can be made any size, and the central value will be between the number of items in the matrix and one less than the number of items in the matrix. After that matrix is applied across a spatial pooler of columns with, in this example, 24.5 as the central value (half original image brightness), the top 2% of columns could be chosen to maintain sparsity, which could also be optimized with a GPU.

Further research led me to something called an unsharp mask, which doesn’t produce the artifacts of the matrix I made. It highlights the ‘invisible’ pixel without generating any lines, so it should work for local inhibition without generating and detecting textures that aren’t there.
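For anyone curious, here is a rough 1D sketch of using an unsharp mask as local inhibition over column overlap scores; the radius, amount, and k values are arbitrary choices of mine:

```python
# Rough 1D sketch of unsharp-mask-style local inhibition over column
# overlap scores; radius, amount, and k are arbitrary choices.
def box_blur(values, radius):
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - radius), min(len(values), i + radius + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def unsharp_inhibit(overlaps, radius, amount, k):
    blurred = box_blur(overlaps, radius)
    # Boost each column relative to its local average (the unsharp mask),
    # then keep the k strongest columns to enforce sparsity.
    sharpened = [v + amount * (v - b) for v, b in zip(overlaps, blurred)]
    ranked = sorted(range(len(overlaps)), key=lambda i: sharpened[i], reverse=True)
    return set(ranked[:k])

scores = [0, 1, 5, 4, 1, 0, 2, 6, 5, 2, 0, 1]
print(unsharp_inhibit(scores, 2, 1.0, 2))  # the two local peaks win
```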

Oh yeah, speaking of looking at libraries to use, I believe TensorFlow has a function for applying operations to individual numbers in an n-dimensional array and using the GPU to apply them. I think I’ll look into using a 4-dimensional tensor that takes an i,j location and returns a 2-dimensional “local connection strength” tensor centered on the i,j location of a neural column, with the input matrix in a separate 2-dimensional tensor. Then, to activate, I’ll apply a rectified linear function of the input to each column, apply an unsharp mask to the activated columns, choose the top 2% of activated columns (within a certain area?), and then increase the “local connection strength” values for active inputs of activated columns and decrement the values for inactive inputs. Then I need to add boosting by computing active duty cycles over time, comparing them to local columns, and adding a value related to the percent above/below the average, inverted and added to one. That should implement a spatial pooler with localized connections, inhibition, learning, and boosting. (Sorry if this paragraph is extremely confusing. I’m using this to think out loud and keep track of everything I need to do later.)
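The learning and duty-cycle bookkeeping in that plan might look roughly like this plain-Python sketch; every name and constant here is a placeholder of mine, not NuPIC's or TensorFlow's API:

```python
# Placeholder sketch of the learning and boosting bookkeeping described
# above; names and constants are mine, not NuPIC's or TensorFlow's.
def learn(perms, input_bits, active_cols, inc=0.05, dec=0.02):
    # Hebbian-style update: for each winning column, strengthen synapses
    # to active inputs and weaken synapses to inactive inputs.
    for col in active_cols:
        for i, bit in enumerate(input_bits):
            delta = inc if bit else -dec
            perms[col][i] = min(1.0, max(0.0, perms[col][i] + delta))

def update_duty_cycles(duty, active_cols, period=100.0):
    # Moving average of each column's activity, used later for boosting.
    for col in range(len(duty)):
        active = 1.0 if col in active_cols else 0.0
        duty[col] += (active - duty[col]) / period

# Tiny run: 3 columns, 4 input bits, column 0 wins.
perms = [[0.5] * 4 for _ in range(3)]
duty = [0.0] * 3
learn(perms, [1, 0, 1, 0], {0})
update_duty_cycles(duty, {0})
print(perms[0])  # synapses to bits 0 and 2 strengthened, 1 and 3 weakened
print(duty)      # only column 0's duty cycle moved
```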

I’ll try implementing some of that tomorrow. :grin: