Thoughts about topology

I am not yet sure I understand topology, but wouldn’t that be the case even with topology activated? The potential synapses would still be connected to a random set of input bits; the difference is that some columns will be active over weaker neighboring ones.

So you are implying that local inhibition will suppress the activation of a potentially useful input pattern just because it is connected to a neighboring column? This makes sense, yet I also feel there is something wrong with it: topology should somehow be applied on the input-field side, not just the column-field side. But @dillon.bender’s point was that topology as implemented in the SP is always useful compared to no topology, so I’m a bit confused.

I don’t think the SP has any way (with or without topology) of achieving invariance under translation of a pattern across the input field, so maybe you could explain more how the 2D letter example is relevant?
If someone knows anything about pattern invariance properties of SP then it would be really beneficial to post / explain it here :slight_smile:

1 Like

Assuming I understand this topic correctly, if the SP is learning input topology, then each column has a local receptive field that overlaps with a very specific part of the input space, typically the “natural center” that the column has with the input. I believe this is the definition of topology in the spatial pooler, right? So, the potential synapses are not connected to a random set of input bits, but a specific, localized subset of them. And normally adjacent, neighboring columns have overlapping receptive fields within a local radius, so the inhibition radius is not global.

That is how the SP works. Spatially similar input patterns are pooled into a single set of active HTM columns. Even if the set of active columns is not perfectly the same, the SP will still generate a highly correlated cluster of active columns. In theory, if the spatial pooler initializes the mini-columns with overlapping receptive fields, and you translate an image only a couple pixels left or right, then most of the columns will still retain nearly the same amount of input activity, therefore the same columns will become active. Now, the amount of translation invariance that the SP can handle is not that great, but there is a definite threshold up to which it can pool different, but highly similar, input patterns.

Instead of the letter A, imagine an input pattern that is a simple vertical line only a few pixels wide in the center of the input space. With topology, the horizontally central HTM columns will become active. Move the input pattern 1 pixel to the right, the same columns will almost certainly be active because the input pattern is still in their receptive fields. This will continue until the input pattern starts entering the next columns’ receptive fields.
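For concreteness, here is a minimal sketch of that kind of localized potential pool, assuming a 1-D input space and made-up sizes (this is not NuPIC’s actual initialization code):

```python
import numpy as np

rng = np.random.default_rng(42)

num_inputs = 256      # 1-D input space (assumed sizes, for illustration)
num_columns = 128
potential_radius = 8  # how far a column can "see" around its natural center
potential_pct = 0.5   # fraction of bits in the radius that become potential synapses

def potential_pool(col):
    """Potential input bits for one column: a random subset of a local window
    centered on the column's natural center in the input space."""
    center = int(col * num_inputs / num_columns)   # the column's natural center
    lo = max(0, center - potential_radius)
    hi = min(num_inputs, center + potential_radius + 1)
    window = np.arange(lo, hi)
    k = max(1, int(potential_pct * len(window)))
    return rng.choice(window, size=k, replace=False)

# Adjacent columns get overlapping receptive fields:
print(sorted(potential_pool(10)))
print(sorted(potential_pool(11)))
```

With `wrap` or edge handling aside, translating a pattern by a pixel or two keeps it inside most of the same windows, which is the noise-tolerance behavior described above.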

Oh right, I don’t know what I was thinking; thanks for refreshing my understanding. But as Scott says, SP global inhibition is effectively just topology in n dimensions.

I wouldn’t call that invariance; it’s just noise tolerance. To the extent that it tolerates small input-pattern “shifts” (and generates similar SDRs), it also loses selectivity. That means another letter, for example, sitting in the same position as the first one could generate a similar SDR as well, which might not be what we want.

I am wondering if the principles used by convolutional neural networks (which were originally inspired by the visual cortex) can somehow be applied in the SP by “natural” means, instead of kernel convolution, which is a very artificial mathematical operation.

In my experience topology doesn’t matter that much. The input and output SDRs of a region don’t have to correlate. The semantics of SDRs are only a local concern. SDRs can be re-encoded many times throughout the layers and regions, but the semantics remain the same throughout the stream. In other words, an ‘AND/OR operation’ only occurs between local layers (i.e., 4 and 2/3).

It could be the case that receptive fields are local for the simple reason that proximal dendrites can only physically grow to a maximum radius.

Spatial pooling just classifies consistent spatial patterns within an arbitrarily noisy input stream. The consistent patterns can be encoded in a vastly different SDR; the next layer doesn’t care. This is part of the beautiful flexibility and robustness of the cortex.

It’s probably true that receptive fields help provide some level of spatial invariance, but only on a very small scale (each receptive field represents only a small detail, like an edge).

Then why does the brain maintain topology from map to map?

While you are considering that, reflect on what happens when you do have a proper stream of connections and the maps are brought back together at some point in processing.

From my take on the connectome project, there are several loops that jump past the “next” map and maintaining topology allows the projected data to be in alignment with the output of the intermediate map.

The ‘maps’ are combined together in layer 6, but the topology can still be arbitrary. You only have to look at the physical structure of dendrites to appreciate that they have incredibly arbitrary topologies.

I’d be interested in any links you could provide me with your line of thinking. Are there articles from the connectome project that relate to this?

Please check out this classic description of the visual topologic map organization:
http://hubel.med.harvard.edu/book/bcontex.htm

And this one showing preservation of topology in the subsequent map-to-map connections.
http://rspb.royalsocietypublishing.org/content/280/1750/20121372

Much of this is highly topologically arranged.

Following up on a basic HTM tenet: the brain uses much the same arrangement everywhere. I have no reason to believe that this basic principle fails as we move from the sensory areas to the association areas. I have been looking to the connectome project to see whether the association areas are surrounded by specialty processing areas like V1 and the auditory cortex. (Examples: motion, texture, color, phase delay, …)

I will have to dig through my papers to find the ones that show the topology being preserved going from map-to-map in other sensory streams but for now here are some related links showing how important preservation of topology is in the sensory encoding areas:
(Check the links on the bottom of this wiki page!)



http://www.nature.com/nature/journal/v533/n7601/abs/nature17941.html
http://www.nature.com/neuro/journal/v15/n4/fig_tab/nn.3046_F3.html

http://homepages.inf.ed.ac.uk/jbednar/papers/bednar.tn15_accepted.pdf

We just discussed topology further at HTM Hackers Hangout, here is the video if anyone wants to watch. We start talking about topology about 11 minutes in.

2 Likes

Awesome. The reasoning behind when and why topology is important makes much more sense to me now. I had a feeling in my original comment on this thread that the example input data I was considering was just not representative of that used by most NuPIC applications.

So, if you really wanted to, I think you could still get away with initializing the spatial pooler’s proximal connections between the columns and input bits in a localized, topological organization if you just mix the various “sub-representations” of scalars, dates, etc., instead of logically grouping them in a continuous section (“x”, “y”, and “date” from Matt’s drawing on the whiteboard). Then columns would receive input that is locally mixed, and you would achieve the same goal as global inhibition without topology.
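For illustration, a minimal sketch of that bit-mixing idea, with plain numpy arrays standing in for real encoder outputs and a single fixed permutation doing the mixing (the field names and sizes are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for encoder outputs ("x", "y", "date" from the whiteboard example);
# real encoders would produce these bit arrays.
x_bits    = rng.integers(0, 2, 100)
y_bits    = rng.integers(0, 2, 100)
date_bits = rng.integers(0, 2, 200)

concatenated = np.concatenate([x_bits, y_bits, date_bits])

# One fixed permutation, generated once and reused for every input record,
# so the scrambling is consistent and therefore learnable.
permutation = rng.permutation(concatenated.size)

def mix(record_bits):
    return record_bits[permutation]

mixed = mix(concatenated)  # any local patch of `mixed` now samples all three fields
```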

I don’t know why exactly you would want to do that, though. Except maybe if you wanted to plan for the future when topology is necessary, as Matt explained.

Thanks for all the links. I think the last link was the most useful.

The paper describes the topographic maps between non-cortical regions that eventually map to an input sensory cortical region, i.e., whiskers, brainstem, thalamus, then cortex. This makes sense, as non-cortical regions do not have pyramidal neurons. But you’re right, the maps remain consistent through a cortical hierarchy (the visual one at least).

However, I think it’s important to realize that receptive-field topology is not necessary for cortical computation; it might just be an emergent structure of physical constraints. I believe basal and apical dendrites are arbitrary (they simply connect to any axons they find in their target layers), but they seem structured because dendrites can only grow out so far (physically limited within a radius or ‘neighborhood’). If dendrites could grow out to infinite distances, then we’d probably find ‘receptive fields’ looking more like the potential pools in NuPIC: spanning the whole region, connecting to any input cell. Because we use computers, we don’t have the same limitations as biology.

I’ve done a few silly simple illustrations to show that it really doesn’t matter whether the feedforward connections are local or distributed; it’s the pathway that matters.

Topographic maps in a hierarchy have a many-to-one relationship. However, again, whether it is distributed or local makes no difference.

I am happy to learn from any opposing ideas. I am curious about any reason why maps must correlate locally (receptive fields) for the cortical hierarchy to work.

The brain repeats and matches up the maps to an almost unbelievable degree. This map organization is preserved in serially connected maps. This mapped response to input stimulation is so prevalent that it is used to trace the map-to-map connections in research. Much the same is true for all senses.

http://www.scielo.br/scielo.php?script=sci_arttext&pid=S0100-879X2002001200008

Local processing (second-order maps) that compares parts of two adjacent maps (left/right eye) is used for specialized processing like stereo or texture in some maps.

The perceived real world has organization; I think this biological connection programming forces the columns to learn useful spatial relationships about the inputs. There is a spreading of activation as the image is relayed from map to map, but it stays aligned; I think this is in accordance with the original HTM vision model that Jeff Hawkins proposed.

Considering that we are trying to tease apart the function of the cortex algorithm with the HTM model it seems very cavalier to just discard this basic organization until we really do understand how it works.

IMHO, controlling the distribution of receptive fields before training is part of defining the function of the area; I think it defines what the area(s) will try to learn. I have been thinking of this as a fundamental part of configuring (programming? designing?) a layer. A deeper understanding (and use) of the map-to-map connections should be part of defining how data will spread through the hierarchy when configuring a system. This configuration should sit at about the same level of planning as setting up the encoders.

In the HTM Hackers Hangout video above, rhyolight mentions that the time field needs to be connected directly to other parts of the map to be part of what is learned about the input data. When we try to do everything with one layer, we end up having to make “unnatural” connections to get it to work at all. A different way of doing a time field is a distributed map of time with widespread connections throughout the fields projected into an association area; I think this leads to a better sensor-fusion function.

I suspect that this will allow our tiny models of the brain to do more with less.

I agree with you on the mapping (especially the very nuanced visual cortex). I’ve dug up some old bookmarked links illustrating the mapping in the hierarchy (for the sake of clarity).

This allows edges to combine into shapes, which combine into objects, which combine into larger objects. At each level, each feature is represented in local spatial groups.

Like the guys were saying in the video, the distributed/arbitrary topology is used because the input data is not like visual data, so local receptive fields are not needed (and could even be a hindrance). However, distributed topologies are so flexible that they can still adapt to become local receptive fields if the potential pool is big enough. This is possible because of Hebbian reinforcement.

The top figure shows a classic receptive-field-like topology. The second shows part of a distributed topology where the permanence values of the representation have been learned: essentially the same topology as the receptive field.

I’m not sure if NuPIC implements this feature, but as in the cortex, there is constant synaptic genesis and pruning. Each cell/dendrite can constantly generate new segments that represent new spatial features in the input space. As synapses are pruned away after Hebbian learning and column competition, each segment comes to represent something unique. If visual data were fed into the region, it would naturally form locally grouped connections to the input space that represent edges, shapes, objects, etc. If another type of data were fed in, the connections would naturally form the topologies most suitable for that data.
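A toy sketch of that Hebbian-learning-plus-pruning idea for a single segment (the increments and the re-seeding rate are made-up values; I don’t know whether NuPIC does it exactly this way):

```python
import numpy as np

rng = np.random.default_rng(1)

n_inputs = 64
perm = rng.uniform(0.0, 0.3, n_inputs)  # permanences of one segment's potential synapses
INC, DEC = 0.05, 0.02                   # Hebbian increment/decrement (assumed values)

def learn(perm, active_input):
    """One Hebbian step for a segment whose cell won the local competition."""
    perm[active_input] += INC           # strengthen synapses to active input bits
    perm[~active_input] -= DEC          # weaken synapses to inactive bits
    np.clip(perm, 0.0, 1.0, out=perm)

    # Pruning + synaptogenesis: synapses that decay to zero are dead; a few are
    # re-seeded at random so the segment can discover new spatial features.
    dead = np.flatnonzero(perm == 0.0)
    if dead.size:
        reborn = rng.choice(dead, size=max(1, dead.size // 20), replace=False)
        perm[reborn] = rng.uniform(0.05, 0.15, reborn.size)
    return perm

# Feed the same local pattern repeatedly and connections concentrate there:
pattern = np.zeros(n_inputs, dtype=bool)
pattern[20:30] = True
for _ in range(50):
    learn(perm, pattern)
print(np.flatnonzero(perm > 0.2))  # surviving synapses cluster on bits 20..29
```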

3 Likes

Right now it seems like NuPIC has been used on problems where temporal locality is important, but not spatial locality. If I remember the “Hot Gym” example correctly, NuPIC was given power usage and date-time info and learned to predict upcoming power usage. In that case, how physically close the date-time representation is to the power-usage representation shouldn’t matter.

However, with more spatial problems, like how to balance a robot limb, how one protein interacts with another protein, or how to find your car keys, the nearby inputs are much more important than the distant inputs. After all, you wouldn’t start looking for your car keys by tracing your steps back all the way back to your very first birthday.

It makes sense that the local algorithms are only useful for a certain set of problems, but they are important problems. However, I believe it would be a good idea to optimize, or at least look for different ways of implementing things, because only being able to work on a 64x64 image limits experimenting with hierarchies on detailed spatial data. Meanwhile, here’s a computationally heavy spatial operation done on a GPU:

That’s a 2010 GPU simulating local interactions between a million particles at 2-3 FPS. If NuPIC is parallelizable enough (and I believe it really should be, as long as ‘local’ is defined as a short enough range), then NuPIC should receive just as much benefit from GPU optimization. Plus, because so much of NuPIC is designed around the neural column, it should be easier to implement using a library for parallel or swarm computing.

For example, if I remember local inhibition correctly, the most activated columns in the spatial pooler inhibit nearby columns, so the most strongly activated remain active while the less activated don’t. That reminded me of edge detection, so I messed around with convolution matrices in GIMP and changed an edge-detection matrix so that edges were highlighted within brighter regions and the inner parts were dimmed instead of removed:

What’s interesting about the third image is that I can change the average brightness of the image by setting the central value between 24.0 and 25.0, and a pixel that was previously invisible in the source image is now hard to miss. Here’s the central image for comparison:

That inhibition matrix can be made any size; the central value will lie between the number of items in the matrix and one less than that number. After the matrix is applied across a spatial pooler of columns, with 24.5 as the central value in this example (half the original image brightness), the top 2% of columns could be chosen to maintain sparsity, which could also be optimized on a GPU.
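In Python, that inhibition matrix and the top-2% selection might look like this (a sketch with random overlap scores standing in for column activations; the 5x5 size and 24.5 center follow the example above):

```python
import numpy as np
from scipy.ndimage import convolve

def inhibition_kernel(size=5, center=24.5):
    """Surround of -1s with a tunable center. With the center between
    size*size - 1 (24) and size*size (25), the kernel's sum lies between
    0 and 1, which scales the average brightness of the output."""
    k = -np.ones((size, size))
    k[size // 2, size // 2] = center
    return k

overlaps = np.random.default_rng(2).random((64, 64))  # stand-in overlap scores
inhibited = convolve(overlaps, inhibition_kernel(), mode="constant")

# Keep the top 2% of columns to maintain sparsity:
k = int(0.02 * inhibited.size)
threshold = np.partition(inhibited.ravel(), -k)[-k]
active = inhibited >= threshold
```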

Though further research led me to something called an unsharp mask, which doesn’t produce the artifacts of the matrix I made. It highlights the ‘invisible’ pixel without generating any lines, so it should work for local inhibition without generating and detecting textures that aren’t there.
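A minimal sketch of the unsharp-mask variant, using a Gaussian blur as the neighborhood average (the sigma and amount values are guesses to tune):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def unsharp_inhibit(overlaps, sigma=2.0, amount=1.0):
    """Unsharp mask: boost each column's overlap relative to its local
    neighborhood average, without the ringing of the hand-rolled kernel."""
    blurred = gaussian_filter(overlaps, sigma=sigma)
    return overlaps + amount * (overlaps - blurred)
```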

Oh yeah, speaking of looking at libraries to use: I believe TensorFlow has a function for applying an operation to each number in an n-dimensional array, running on the GPU. I think I’ll look into using a 4-dimensional tensor that maps an i,j location to a 2-dimensional “local connection strength” tensor centered on the i,j location of a neural column, with the columns and the input held in separate 2-dimensional tensors. Then, for each step:

1. To activate, apply a rectified-linear function of the input to each column.
2. Apply an unsharp mask to the activated columns.
3. Choose the top 2% of activated columns (within a certain area?).
4. Increase the “local connection strength” values for active inputs of activated columns, and decrement the values for inactive inputs.
5. Add boosting by computing active duty cycles over time, comparing them to local columns, and scaling by a value related to the percent above/below the average, inverted and added to one.

That should implement a spatial pooler with localized connections, inhibition, learning, and boosting. (Sorry if this is still confusing. I’m using this to think out loud and keep track of everything I need to do later.)
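To make that list concrete, here is a rough numpy stand-in for one spatial pooler step (the idea would be to port it to TensorFlow ops later); every name, size, rate, and sigma here is an assumption for illustration, not anyone’s actual implementation:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(3)
H = W = 64                                       # column grid (assumed size)
R = 5                                            # each column's receptive field is RxR

strength = rng.uniform(0.2, 0.4, (H, W, R, R))   # "local connection strength" per column
duty = np.full((H, W), 0.02)                     # active duty cycle per column

def sp_step(inputs, lr=0.02, duty_alpha=0.01):
    pad = R // 2
    padded = np.pad(inputs, pad)
    # each column's RxR input window (ReLU is a no-op here: inputs are 0/1)
    windows = np.lib.stride_tricks.sliding_window_view(padded, (R, R))
    overlap = (windows * strength).sum(axis=(2, 3))

    # boosting: columns firing less than their neighborhood average get a nudge up
    overlap *= 1.0 + (gaussian_filter(duty, sigma=2.0) - duty)

    # local inhibition via unsharp mask, then keep the top 2% of columns
    sharp = overlap + (overlap - gaussian_filter(overlap, sigma=2.0))
    k = int(0.02 * sharp.size)
    thr = np.partition(sharp.ravel(), -k)[-k]
    active = sharp >= thr

    # Hebbian learning on the winning columns only
    delta = np.where(windows > 0, lr, -lr)
    strength[active] = np.clip(strength[active] + delta[active], 0.0, 1.0)

    # update duty cycles
    duty[:] = (1 - duty_alpha) * duty + duty_alpha * active
    return active

inputs = (rng.random((H, W)) < 0.1).astype(float)  # a sparse binary input frame
active = sp_step(inputs)
```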

I’ll try implementing some of that tomorrow. :grin:

1 Like

> It makes sense that the local algorithms are only useful for a certain set of problems, but they are important problems. However, I believe it would be a good idea to optimize, or at least look for different ways of implementing things, because only being able to work on a 64x64 image limits experimenting with hierarchies on detailed spatial data.

There’s a way to optimize an HTM implementation by using a ‘propagation algorithm’. There is very little high-level orchestration; it’s mostly local interactions between Cell and Dendrite/Segment objects. All the cells and dendrites are connected by object-instance references. When a cell gets a feedforward propagation from a dendrite (segment activation), the cell then propagates forward to all the segments that connect to it (via the ‘axon’). When those segments reach their threshold, they forward-propagate to their target cells.

The same general idea is used for local inhibition. When a cell propagates to segments connected to its ‘axon’, it can also propagate negative feedforward values to neighboring cells, which produces sparsity and competition. The structure of columns emerges from the schematic of cell classes/layers and local connectivity.

The benefit of local computation is that the limit on the number of cells and segments is set by memory capacity, not CPU/GPU processing capacity. No matter how many object instances you have (i.e., gigabytes’ worth), the propagation algorithm computes very fast because it only processes the sparse set of active cells. Most cells (and therefore segments) are inactive, so there is very little iteration within each feedforward step. If there were 4096 cells in a region, only ~164 cells (4% sparsity) would need to propagate (be processed) in each step.

I found this works quite well, except that initialization of the region takes some time to construct all the objects. But from there it’s light.
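Here’s a toy Python sketch of that propagation idea, with Cell and Segment objects wired together by instance references; the threshold and wiring are made up, and per-timestep resetting is omitted:

```python
class Segment:
    """A dendrite segment: fires its target cell once enough source cells vote."""
    def __init__(self, target_cell, threshold=2):
        self.target = target_cell
        self.threshold = threshold
        self.votes = 0              # activations received this timestep

    def propagate(self):
        self.votes += 1
        if self.votes == self.threshold:
            self.target.activate()

class Cell:
    def __init__(self):
        self.out_segments = []      # segments reached by this cell's "axon"
        self.active = False

    def activate(self):
        if self.active:
            return                  # already fired this timestep
        self.active = True
        for seg in self.out_segments:
            seg.propagate()         # work is only done from active cells

# Two cells converging on a third through one segment:
a, b, c = Cell(), Cell(), Cell()
seg = Segment(target_cell=c, threshold=2)
a.out_segments.append(seg)
b.out_segments.append(seg)
a.activate()
b.activate()
print(c.active)  # True: the segment reached threshold and fired its target
# Cost scales with the sparse active set, not the total number of objects.
```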

Anyway, Sunday morning blabber. Need to eat breakfast!

1 Like

True, but even then, with a million columns (enough for input from a typical computer monitor), 4% activation gives 40,000 active columns, and a temporal pooler just beginning to learn might activate all 8 cells in many of the activated columns, each of which could strengthen/weaken 4 connections per cell (guessing the value via this post), performing operations on 1,280,000 segments independently. If I choose the method of adding up activation input values for each column, with 100 local input locations per column, that’s 100 million floating-point additions.

For storage, the temporal memory would require a connection strength to be stored for each connection: 8 million cells times about 4 segments per cell gives 32 million strength values, or 128 MBytes of storage if 32-bit floats are used. Meanwhile, the spatial pooler has 1 million columns with, depending on input radius, around 100 connections per column, which requires about 400 MBytes. If all that’s right, it comes to about 500 MBytes of storage.
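A quick sanity check of those figures (pure arithmetic, nothing HTM-specific):

```python
columns  = 1_000_000
cells    = 8 * columns
tm_bytes = cells * 4 * 4          # ~4 segment strengths per cell, 32-bit floats -> 128 MB
sp_bytes = columns * 100 * 4      # ~100 proximal connections per column         -> 400 MB
print((tm_bytes + sp_bytes) / 1e6)   # 528.0 -> "about 500 MBytes"
print(columns * 100 / 1e6)           # 100.0 -> 100 million additions per step
```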

That means my laptop’s GeForce 940MX should be able to store about four of those million-column networks in GPU memory. It should also be able to handle the 100 million floating-point additions per time step, which is way under the tens of gigaflops most GPUs can perform per second. (I think that means it should be able to run the HTM layer at up to 100 FPS, even using an un-optimized column activation method.)

Whew, that was fun! Now I need to eat breakfast too.

Edit: I think this means I could theoretically run a 4-layer, 1 million column per layer, 8 cell per column, HTM network on my laptop.

That makes sense given the memory footprint of the HTM apps I run.

Assuming your figures are correct …

Going further - 10 fps is about what a human does in dealing with the world.

If you can run 100 fps with one area, then you should be able to run 10 layers at 10 fps with a 5 GB footprint, well within the capabilities of modern hardware.

Assuming that not all of your areas need a full 1K-on-a-side frame buffer, you could run more but smaller areas (say 256 x 256) for a very fancy hierarchical system.

BTW: I don’t see any reason you need full floating point; 8-bit synapses and integer math should be enough. Depending on the processing hardware, there may be a significant speed-up to be had.
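A minimal sketch of what 8-bit synapses with saturating integer math could look like (the threshold and increments are placeholders, not values from any existing implementation):

```python
import numpy as np

rng = np.random.default_rng(4)
perm = rng.integers(0, 128, 1000, dtype=np.uint8)  # permanences as single bytes
CONNECTED, INC, DEC = 128, 12, 5                   # placeholder threshold/steps

def update(perm, active_mask):
    p = perm.astype(np.int16)                      # widen so the math can't wrap
    p[active_mask] += INC
    p[~active_mask] -= DEC
    return np.clip(p, 0, 255).astype(np.uint8)     # saturate back into 8 bits

active_mask = rng.random(1000) < 0.1
perm = update(perm, active_mask)
connected = perm >= CONNECTED   # same connectivity test as floats, 4x smaller
```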

1 Like

Don’t worry, they’re probably off by at least an order of magnitude. It was a ballpark estimate. I can hope though.

However, it’ll be a little while before we see how far off I am. Right now, I’m still writing a visualizer to see what I’m doing with any HTMs I try to make:

Working with VTK was harder than I thought it would be, even in Python. I just got animation working, though. Now I need a few functions for setting up planes of points, which shouldn’t be too hard, and then I can work on connecting that with TensorFlow and/or NuPIC… which could be very hard depending on how successfully I can shove the running code from those libraries into a single VTK callback function…

I used floats for simplicity, but yeah, I could imagine 4x or 8x less data usage and a 4x or 8x speed-up if the right types are used. If that could be applied to everything, the number of simulated neurons would probably be pretty impressive.

1 Like

This is very interesting… would you say topology represents something like receptive fields? It would be very useful in visual applications to have parts of the spatial pooler (I’m new here… still getting used to the HTM/NuPIC lingo) detect local features like orientation or intensity and propagate them higher up the hierarchy to more abstract concepts like shapes.

Do you think the boosting concepts can be used to implement on-center off-center receptive fields similar to this?

I can’t wrap my head around how this would work computationally. That’s probably the biggest bottleneck at the moment. How can we use the distributed processing technologies that are maturing these days? Or offload some of the fixed, expensive processing to an FPGA.

Hey,

Interesting discussion. I am wondering whether there are new thoughts on it in light of the new theories and discoveries.

When thinking about a network hierarchy that mainly looks at the input with different scales of receptive fields, with an allocentric location integrated such that every column builds feature and object representations (together with motor movements…), topology seems very important to integrate well into this framework.

I will have to read the newer papers again with a focus on that aspect, but maybe someone can lay out the alignment with the theory more clearly.

Kind regards

PS:
Additionally, I saw in the code, and by running the visualizations, that the SP does not account for columns on the sides/edges having a smaller potential pool due to the edge of a dimension.
We can use the wrap option to avoid unfair competition for these columns (a smaller potential pool means fewer connections), but that seems like a quick fix rather than a real solution for maintaining topology; see the sketch below.
Jeff mentioned in a talk on the tangential theories that, e.g., touch sensors are not uniformly distributed; could a topology account for that? How does it play together with allocentric location on an object?
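For illustration, a tiny 1-D sketch of the difference the wrap option makes at the edges (the function name and sizes are mine, not NuPIC’s):

```python
import numpy as np

def neighborhood(center, radius, size, wrap=True):
    """Indices in a column's 1-D neighborhood. `wrap=True` mirrors the SP's
    wrap-around option, giving edge columns a full-size potential pool."""
    idx = np.arange(center - radius, center + radius + 1)
    if wrap:
        return idx % size                      # wrap past the edges (toroidal)
    return idx[(idx >= 0) & (idx < size)]      # clipped: edge columns see fewer bits

print(len(neighborhood(0, 3, 100, wrap=False)))  # 4 (unfair, smaller pool)
print(len(neighborhood(0, 3, 100, wrap=True)))   # 7 (full pool)
```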