Monotonic grey code representations of numbers

I was thinking of a way to represent color, since all of the neural networks I’ve seen either seem to work with a greyscale image or a black and white image, and the normal sparse ways of representing color seemed… a bit much.

With images, there are already millions of inputs, and, without losing accuracy, if I represented them using an encoder that used about one new column per color value, I’d have about 768 columns per pixel, which would mean 768 million columns for a decent screen size. However, if I used non-sparse encodings like normal binary, then similar numbers, like 7 and 8, would have huge amounts of differences in on/off columns (0111 and 1000: 0% in common). That would mean a learning algorithm predicting upwards counting would have to predict a change in every bit, and if a single bit was changed, the number could change by a large amount, no matter how many bits the representation was.

However, this problem is actually what grey code was created for. (Well, actually, it was made because not all switches would always switch at the same time before calculations were carried out, so 0111 to 1000 would have spurious numbers in between that could be used for calculations.) 7 and 8 in grey code are 1001 and 1000. Any change in a single bit in grey code will result in a much smaller change in the number, since only one bit change is needed to increment or decrement grey code.

There is another problem with normal grey code though: it’s cyclical. 0 is 0000, and 15 is 1000. (Though, 0010 is an equal distance from 1010, which is another problem.) That’s fine for repeating numbers like seconds or minutes, but for things like color, bright red shouldn’t be adjacent to black. Luckily, there are monotonic grey codes, where the number of bits on increases as the number increases, so it will be unlikely for a neural network to confuse a large number with a small number. Also, any n-bit monotonic grey code representation can be created using the Savage-Winkler algorithm, so we don’t have to worry about certain numbers. (Unfortunately, the algorithm seems to produce the entire sequence of numbers instead of mapping from binary to monotonic grey code, so it’ll be necessary to create a mapping function. I was thinking I could use a Karnaugh map for mappings of different representations in amounts of bits to find a general algorithm.)

Anyway, due to the locality of the representations of numbers with respect to nearby numbers, I believe grey codes could be used for neural network friendly large number representations, where the neurons can learn anything related each different number, while using the smallest number of columns possible. If a more redundant representation is needed, some of the bits could be repeated.

[For anyone interested in monotonic grey codes, a good description of them and the algorithm used to produce them can be found here.]

1 Like

I’ve decided, before I invest too much time into making a fast function for converting unsigned binary to monotonic grey code, I should probably test my hypothesis that monotonic grey codes perform better. For now, I’ll just use a file as a table to look up the 16 bit values.

I’ll be using four encoders and running the same test data through. (Hotgym, I guess? As well as whatever else I can find.) The encoders will be 16 bit monotonic grey code, with on-bits becoming on-columns, standard grey code, binary, and the scalar encoder that’s already in nupic.

Now I just have to figure out how to use nupic… :neutral_face:

I’m sure most of the people here have done that before. Any help setting up the tests would be appreciated. I shouldn’t need help making the encoder classes though.

Start here: GitHub - numenta/nupic-legacy: Numenta Platform for Intelligent Computing is an implementation of Hierarchical Temporal Memory (HTM), a theory of intelligence based strictly on the neuroscience of the neocortex.

Let me know how it goes.

Thanks!

I decided to use the example in the main nupic repository that used the temporal memory class directly. It worked fairly well

So far, it looks like monotonic grey code is much easier to predict than binary, but it’s still off by a large amount. However, whenever inputs are predicted, the input columns are correctly predicted fairly often, there’s just a problem of too many columns being predicted. This is seen the most when using pure binary, where nupic’s temporal memory class usually predicts the highest number its seen with all bits on… I wonder if inhibition on prediction would help.

Right now though, I still have to test reflected grey code, but I’m finishing up my visualizer so I can really see what’s going on.

I think you can affect this by lowering the numActiveColumnsPerInhArea spatial pooler parameter. (I looked this up from Spatial Pooling Parameter Descriptions)

Maybe, but doesn’t that limit the number of columns that can be predicted on? If it does that, then the temporal memory class wouldn’t predict numbers larger than a certain amount.

Are you trying to create a new type of scalar encoding or trying to represent color images? (In the cortex, the representation of color and grey scale images is pretty different from scalar RGB values.)

If you are working on new scalar encodings, you might want to take a look at the RandomDistributedScalarEncoder in nupic/encoders/random_distributed_scalar.py Perhaps the grey code idea is a better way to implement the RDSE, which uses a brute force technique.

It sounds like you are using the temporal memory directly, not the spatial pooler. The TM does require a fixed sparsity input representation, so it would not be desirable for the number of ON bits to change significantly. If you feed your code through the SP, it will create a fixed sparsity representation.

(BTW, in the SP I’m not sure I would recommend changing the numActiveColumnsPerInhArea parameter unless you really know what you are doing. A number of TM parameters, such as the thresholds, are dependent on it.)

3 Likes

Yup! I knew it would probably be pretty different, but from what I’ve seen when researching colors before here, it still seems to require a lot of scalars.

I’ll take a look at the RandomDistributedScalarEncoder. It seems like it could be similar to what monotonic grey code could produce.

I’m was using the temporal memory directly, since I didn’t want fixed sparsity. I wanted to see how well the algorithm would predict if just handed raw values. However, if I wanted a relatively fixed sparsity, I could limit the monotonic grey code to produce a certain range of sparsity. That wouldn’t be too bad if I could successfully work with around 50% sparsity. For example, with 16 bits, 16 choose 8 + 16 choose 9 gives me 24310 values to work with, while 2^16 gives me 65536 values. Would 50% sparsity work though?

If you want to deal with noise then 16 bits choose 8 could lead to a lot of errors. It will also be almost impossible for the TM to make multiple simultaneous predictions with 50% sparsity. Something like 200 bits choose 8 will be more robust in those respects and give you a lot more capacity.

In most NuPIC implementations we choose 40 out of 2048 bits. Note that the TM parameters we usually use assume something like 40 bits are on. If you only have 8 bits on, you will need to significantly decrease the thresholds.

2 Likes

Ah.

It might help a little if I allow more than one amount of sparsity, and do a summation, but not by too much for smaller values. Also, that would make working with multiple predictions harder.

Well, I’ll definitely fiddle with things as much as I can.

Edit: I wonder if I could restrict nupic to making only the most likely prediction, or if that would even be a good idea.

1 Like

Actually, you can get the entire probability distribution from the model’s output result, so NuPIC already does this.

1 Like

Just a little more background on what the guys have answered so far - it helps me sometimes to actually look at the code and see what’s going on - though I don’t know if that’s good for you?

Making “multiple simultaneous predictions” is not only a feature of the TM but is unavoidable because the multiple predictions come from classifier buckets representing each normalized input (normalized == similar inputs within a threshold are added to the same bucket).

You can see the statistics over each bucket by looking at the Classifier return value from its infer(...) function, where each bucket is available for examination:

See here for the actual code

retval:     dict containing inference results, one entry for each step in
            self.steps. The key is the number of steps, the value is an
            array containing the relative likelihood for each bucketIdx
            starting from bucketIdx 0.

            for example:

                  {
                   'actualValues': [0.0, 1.0, 2.0, 3.0]
                               1 : [0.1, 0.3, 0.2, 0.7]
                               4 : [0.2, 0.4, 0.3, 0.5]
                  }

Where 1 and 4 are steps ahead, and each entry is the vote or prediction for each of the 4 buckets. Each bucket’s “votes” are a result of statistics done over the number of times bits belonging to each bucket are “active” within the SDRs the Classifier is given.

@rhyolight @subutai @SimLeek Though I tend to phrase my verbiage in terms of absolutes - I do have a question as to what “multiple simultaneous predictions” within NuPIC refers to? Is it like I said, the Classifier field/bucket votes; or is that statement a reference to all the lateral presynaptic cells within the TM which have permanences above a given activation threshold? I believe it’s one or the other?

I assumed it meant getting not only the “best prediction” but a probability distribution of all potential future states.

1 Like

Although the classifier is used to retrieve multiple predictions, the issue is actually more fundamental. The temporal memory itself is always making multiple predictions. It learns all common sequence transitions, and then makes predictions based on that. For example, if it has seen a bunch of coin flips, and you are about to flip a coin, it will always predict both “heads” and “tails”. Multiple predictions are fundamental to the way it works, and a very powerful and useful property of the TM.

The classifier merely maps internal TM states to predictions and probabilities in a form that we can easily read out.

1 Like

@subutai Thanks! I had a suspicion that it was referring to the TM! That’s a question that was unclear for me for a long time, thanks…

1 Like

Hi all, I suddenly went down the “HTM Encoding with Grey Codes?” rabbit hole about a month ago, and ended up here:

thanks.

In computer coding Gray code is good for dealing with sensors where you need to sort between two adjacent codes. It is a "packed code and not really compatible with SDRs.

For SDRs we have the position of the bits representing a value like this:

With rods and cones we have mixed populations of cells that form the SDRs:
Rods%20-n-%20Cones%20to%20SDRs
In this case, the SDRs are composed of red, green and blue sensitive cells.

As far as the original poster’s concern about the number of columns required to code for colors - the brain deals with that problem by coding for color at a much lower spatial resolution than for brightness.

For example - in the old NTSC color television standards the color was added as a second lower resolution stream that was phase coded to indicate what color tint should be added to the luminance information. This is why early apple II text displays tended to look funky on TVs.

4 Likes

1 Like