I have found a way to decode SDRs in some cases.
Suppose you are encoding a scalar, and suppose you use 21 bits to encode each number.
Each successive number's block of bits is shifted over by one position in the encoder, so adjacent numbers are semantically similar (they share most of their bits). You now feed the encoded bits into the spatial pooler and get an SDR. Suppose you would like to be able to do the reverse: get the original encoded value back from the SDR.
One way to do it is this:
1. Create a buffer the size of the encoder output, but instead of bits, make it integers.
2. For every active column, do:
3. For every synapse in that column that has permanence >= permThreshold, do:
4. Look up the input bit that feeds that synapse.
5. Increment the buffer count for that input bit.
The buffer will have many non-zero counts. Too many. Some will be larger than others.
So what you have to do is keep only the counts at or above a threshold. But what threshold
should you use? To find out, you do this:
Go through the buffer looking for the maximum count.
Let's call it "maximumValueOfAnyCountInBuffer". Then:
for percent = 100 down to 1
    threshold = (percent / 100) * maximumValueOfAnyCountInBuffer
    count the number of values >= threshold
    if that count >= 21 (the number of bits per pattern) then
        STOP.
At this point, you set to 1 every bit whose count is >= threshold, and set all other bits to zero.
There are a few noise bits left on in the resulting encoder output, but they are obvious outliers, since this encoding requires a contiguous run of input bits.
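In case it helps, here is a minimal sketch of the above in VB.net. It is untested in this exact form, and the names here (SynapseStub, CountVotes, ThresholdVotes) are placeholders rather than the actual classes in my spatial pooler; the real routines are posted further down in this thread.

' (needs Imports System.Collections.Generic)
' Placeholder synapse record; stand-in for whatever the spatial pooler exposes.
Public Class SynapseStub
    Public SourceInputIndex As Integer
    Public Permanence As Double
End Class

' Steps 1-5: every connected synapse of every active column votes for the
' encoder bit it samples. activeColumnSynapses holds, for each active column,
' the list of its proximal synapses.
Public Function CountVotes(activeColumnSynapses As List(Of List(Of SynapseStub)), inputSize As Integer, permThreshold As Double) As Integer()
    Dim votes(inputSize - 1) As Integer
    For Each colSynapses As List(Of SynapseStub) In activeColumnSynapses
        For Each syn As SynapseStub In colSynapses
            If syn.Permanence >= permThreshold Then
                votes(syn.SourceInputIndex) += 1
            End If
        Next
    Next
    Return votes
End Function

' Threshold search: walk down from 100% of the largest count until at least
' bitsPerPattern inputs clear the threshold, then binarize the buffer.
Public Function ThresholdVotes(votes() As Integer, bitsPerPattern As Integer) As Boolean()
    Dim maxCount As Integer = 0
    For i As Integer = 0 To votes.Length - 1
        If votes(i) > maxCount Then maxCount = votes(i)
    Next

    Dim threshold As Double = maxCount
    For percent As Integer = 100 To 1 Step -1
        threshold = (percent / 100.0) * maxCount
        Dim countAbove As Integer = 0
        For i As Integer = 0 To votes.Length - 1
            If votes(i) >= threshold Then countAbove += 1
        Next
        If countAbove >= bitsPerPattern Then Exit For
    Next

    ' Set to 1 (True) every bit whose vote count clears the threshold.
    Dim bits(votes.Length - 1) As Boolean
    For i As Integer = 0 To votes.Length - 1
        bits(i) = (votes(i) >= threshold)
    Next
    Return bits
End Function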
Some issues:
what happens if there is noise in the encoded SDR that you are trying to decode? (I have not tried that)
what happens when you use other types of encoder? (I have not tried that)
what happens when you use the temporal pooler, which has all sorts of complications, including having unions of several predicted possibilities?
So I'm not sure if this is of any practical use.
To give these values a numerical interpretation you could apply a probability-based activation function on top of this distribution: P(Vi | randomly sampled SDR), i.e. the probability of getting this much overlap by sampling an SDR at random. This would quantify the chance that this is a random match, and you could apply a hypothesis test (P < 0.001 or something) to the lowest such probability (the best match), and you'd have a measure of confidence in your match.
And further, if you have an idea of the noise distribution N your SDRs are subject to, then you could score the possibilities by P(X | Ei, N), i.e. the probability of degrading Ei into X given noise N.
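To make that concrete: assuming for simplicity that both the stored pattern Ei and the decoded SDR have w active bits out of n, the probability of seeing b or more bits of overlap with a randomly drawn SDR is a hypergeometric tail. A sketch in VB.net (untested; the function names are just for illustration):

' log(k!) by direct summation; fine for the buffer sizes involved here.
Private Function LogFactorial(k As Integer) As Double
    Dim s As Double = 0.0
    For i As Integer = 2 To k
        s += Math.Log(i)
    Next
    Return s
End Function

Private Function LogChoose(n As Integer, k As Integer) As Double
    If k < 0 OrElse k > n Then Return Double.NegativeInfinity
    Return LogFactorial(n) - LogFactorial(k) - LogFactorial(n - k)
End Function

' P(overlap >= theta) between a stored pattern with w active bits (out of n total)
' and a second SDR whose w active bits are chosen uniformly at random:
' the hypergeometric tail  sum over b = theta..w of  C(w,b) * C(n-w, w-b) / C(n,w).
Public Function ProbOverlapAtLeast(n As Integer, w As Integer, theta As Integer) As Double
    Dim p As Double = 0.0
    For b As Integer = theta To w
        p += Math.Exp(LogChoose(w, b) + LogChoose(n - w, w - b) - LogChoose(n, w))
    Next
    Return p
End Function

You would then treat the best match as significant only if its chance-overlap probability falls below your cutoff (0.001 or whatever).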
Jake:
One question occurred to me about your method. Suppose there is a tie. Let's take a case where there are 3 bits per pattern. If bits 4,5,6 are on together, we will say that we have encoded the number "55". If bits 5,6,7 are on together, that stands for the number "56". And suppose, after projecting the columns backwards to the encoder, the bits 4,5,6,7 are on. Now we don't know which of the two patterns we have. They both overlap by the same amount (in this case 100%).
My method used "votes". So if 10 columns together say that bit "4" is on, but only 2 columns say that bit "7" is on, then my method would choose "4,5,6" as the pattern. My question is, in cases of "ties" in your method, would it make sense to use "votes" to say one of the two patterns was more probable than the other?
I coded your method, and my method is much better, assuming I got your meaning correct, which is that you just set bits on, without taking any votes, and then try to find what patterns overlap the most. However, I will look into your idea of getting a measure of confidence.
This might be a good alternative to the existing decoding methods that Numenta uses.
It generalizes well, and the average error is low (though I notice when I encode numbers and then decode them back that numbers at the far ends of the range do not decode as well), and you can change many of the active columns and still get the correct answer.
Pretty cool.
@gidmeister I'd like to suggest a small alteration to your method: After step five, instead of dealing with thresholds and percentages, how about simply sorting the inputs in descending order by their buffer-counts and then taking the first 21 of those?
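In code that could be as simple as the following (untested; needs Imports System.Linq, and TopInputs is just an illustrative name):

' Sort the input indices by their vote counts, descending, and keep the
' indices of the first bitsPerPattern of them.
Public Function TopInputs(votes() As Integer, bitsPerPattern As Integer) As Integer()
    Return Enumerable.Range(0, votes.Length).OrderByDescending(Function(idx) votes(idx)).Take(bitsPerPattern).ToArray()
End Function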
Here is the source code to decode SDRs, with your suggested changes, minus the probability distribution, which I still have to look at. It's in VB.net, because I rewrote the spatial pooler in that language. Maybe sometime in the future I'll try it on the temporal pooler. The average error when using 2 buckets per scalar number is 0.4.
' Decode an SDR: collect votes from the active columns, keep the inputs whose
' vote count reaches the bitsPerPattern-th highest count, then search for the
' encoder pattern that best matches the recreated bits.
Sub DecodeSDR(ByRef maxValue As Integer, ByRef minValue As Integer, ByRef overlapscore As Double, ByRef displaystring As String, ByRef recreateBoolBuf() As Boolean,
              ByRef firstbit As Integer, ByRef lastbit As Integer, ByRef bestguess As Double, ByRef percent As Double, ByRef countEntries As Integer)
    Dim ul As Integer
    Dim i As Integer
    Dim startindex As Integer
    Dim votebuffer() As Integer = Nothing
    Dim vote2buffer() As Integer

    ClassConstants.sp.voteEncoder(votebuffer)
    ul = votebuffer.Length - 1

    ' Sort a copy of the vote counts so the cutoff can be read off directly:
    ' the bitsPerPattern-th highest count becomes the threshold (minValue).
    ReDim vote2buffer(ul)
    For i = 0 To ul
        vote2buffer(i) = votebuffer(i)
    Next
    Array.Sort(vote2buffer)
    startindex = vote2buffer.Length - classEncoder.bitsPerPattern
    minValue = vote2buffer(startindex)
    maxValue = vote2buffer(ul)

    ' Rebuild the encoder bits (recreateBoolBuf) from the vote counts, then
    ' find the contiguous pattern that overlaps them the most.
    ClassUtils.DisplayIntegerBufferWithMinCutoff(votebuffer, 30, minValue, displaystring, recreateBoolBuf, countEntries)
    classEncoder.findBestMatch(recreateBoolBuf, bestguess, overlapscore, firstbit, lastbit)
End Sub
' **********************
' Project the active columns back onto the encoder input: every connected
' synapse (permanence >= connectedPerm) of every active column adds one vote
' for the input bit it samples.
Public Sub voteEncoder(ByRef votebuffer() As Integer)
    Dim i As Integer
    Dim j As Integer
    Dim c As classColumn
    Dim s As classSynapse
    Dim ul As Integer = classEncoder.encodedInput.Length - 1

    ReDim votebuffer(ul)
    For i = 0 To ul
        votebuffer(i) = 0
    Next

    For i = 0 To activeColumnIndices.Count - 1
        c = columns(activeColumnIndices(i))
        For j = 0 To c.pDendrite.Count - 1
            s = c.pDendrite(j)
            With s
                If .permanence >= ClassConstants.connectedPerm Then
                    votebuffer(.sourceInputIndex) = votebuffer(.sourceInputIndex) + 1
                End If
            End With
        Next
    Next
End Sub
' **********************
' Render the vote counts as a bracketed grid (wrapPoint entries per row) and,
' as a side effect, build the Boolean buffer encbuf(): True wherever the count
' reaches MinCutoff.
Public Shared Sub DisplayIntegerBufferWithMinCutoff(ByRef intbuf() As Integer, ByVal wrapPoint As Integer, ByVal MinCutoff As Integer,
                                                    ByRef displayString As String, ByRef encbuf() As Boolean, ByRef countEntries As Integer)
    Dim i As Integer
    Dim j As Integer
    Dim sb As New StringBuilder("")
    Dim sb2 As New StringBuilder("")
    Dim sbAll As New StringBuilder("")
    Dim firstwrap As Boolean = True
    Dim stri As String
    Dim lenStrI As Integer
    Dim k As Integer
    Dim strK As String = ""

    ReDim encbuf(intbuf.Length - 1)
    countEntries = 0
    If intbuf.Length = 0 Then
        displayString = "empty"
        Exit Sub
    End If

    j = 0
    For i = 0 To intbuf.Length - 1
        encbuf(i) = False
        If j > 0 Then
            sb.Append(",")
            sb2.Append(",")
        End If
        sb2.Append("(" & i & ")")
        ' Pad each cell so the brackets line up under the index labels.
        stri = i.ToString
        lenStrI = stri.Length
        strK = ""
        For k = 2 To lenStrI
            strK = strK & " "
        Next
        If intbuf(i) >= MinCutoff Then
            countEntries = countEntries + 1
            sb.Append("[*")
            sb.Append(strK)
            sb.Append("]")
            encbuf(i) = True ' key part
        Else
            sb.Append("[ ")
            sb.Append(strK)
            sb.Append("]")
        End If
        j = j + 1
        If j = wrapPoint Then
            If Not firstwrap Then
                sbAll.AppendLine("")
            End If
            j = 0
            firstwrap = False
            sbAll.AppendLine(sb2.ToString & Environment.NewLine & sb.ToString)
            sb2.Clear()
            sb.Clear()
        End If
    Next
    If sb.Length > 0 Then
        sbAll.AppendLine("")
        sbAll.AppendLine(sb2.ToString & Environment.NewLine & sb.ToString)
    End If
    displayString = sbAll.ToString
End Sub
' **********************
' Slide a window of bitsPerPattern contiguous bits across the recreated encoder
' buffer, score each candidate pattern by how many of its bits are on, and
' return the decoded value of the highest-scoring window.
Public Shared Sub findBestMatch(ByRef recreateBoolBuf() As Boolean, ByRef bestguess As Double, ByRef overlapscore As Double, ByRef firstbit As Integer, ByRef lastbit As Integer)
    Dim i As Integer
    Dim j As Integer
    Dim tempoverlap As Double
    Dim testUpToIndex As Integer
    Dim tempindex As Integer
    Dim ul As Integer
    Dim guesses As New List(Of Double)
    Dim scores As New List(Of Double)

    ' Find the first and last bits that are on.
    ul = recreateBoolBuf.GetUpperBound(0)
    For i = 0 To ul
        If recreateBoolBuf(i) Then
            firstbit = i
            Exit For
        End If
    Next
    For i = ul To 0 Step -1
        If recreateBoolBuf(i) Then
            lastbit = i
            Exit For
        End If
    Next
    If firstbit > lastbit Then
        tempindex = lastbit
        lastbit = firstbit
        firstbit = tempindex
    End If

    ' Score every window of bitsPerPattern contiguous bits between them.
    testUpToIndex = lastbit - bitsPerPattern + 1
    For i = firstbit To testUpToIndex
        tempoverlap = 0
        For j = i To i + bitsPerPattern - 1
            If recreateBoolBuf(j) Then
                tempoverlap = tempoverlap + 1
            End If
        Next
        guesses.Add(decodePatternStartingAtIndex(i))
        scores.Add(tempoverlap)
    Next

    ' Keep the guess with the highest overlap score.
    For i = 0 To guesses.Count - 1
        If i = 0 Then
            bestguess = guesses(i)
            overlapscore = scores(i)
            Continue For
        End If
        If overlapscore < scores(i) Then
            bestguess = guesses(i)
            overlapscore = scores(i)
        End If
    Next
End Sub
@gidmeister I haven't coded in BASIC for many years, but if I understand your logic, in DisplayIntegerBufferWithMinCutoff(), you're converting an integer representation, intbuf(), into a Boolean representation, encbuf(), using a threshold.
If you change encbuf() and recreateBoolBuf() from Boolean to Integer, then replace "encbuf(i) = False" with "encbuf(i) = 0" and "encbuf(i) = True" with "encbuf(i) = intbuf(i)", then in findBestMatch() replace "tempoverlap = tempoverlap + 1" with "tempoverlap = tempoverlap + recreateBoolBuf(j)", that might give you a suitable probability distribution and better accuracy.
After this you could also see what happens if you remove the threshold altogether. It would be interesting to see a comparison of results from these different variations.
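The scoring loop would then become a weighted sum over the raw counts, something like this (untested sketch; WeightedOverlap and the votes() parameter are just for illustration):

' Each candidate window of bitsPerPattern inputs is scored by the sum of the
' raw vote counts under it, instead of by the number of bits that cleared a
' threshold.
Public Shared Function WeightedOverlap(votes() As Integer, startIndex As Integer, bitsPerPattern As Integer) As Integer
    Dim score As Integer = 0
    For j As Integer = startIndex To startIndex + bitsPerPattern - 1
        score += votes(j)
    Next
    Return score
End Function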
By the way, that was a nice pun on my username. Lol.
I like this discussion. Here is how NuPIC handles the encode/decode interface. We have an Encoder base class, and subclasses must provide an implementation of encodeIntoArray and (optionally) decode. Here are NuPIC's ScalarEncoder implementations of encodeIntoArray and decode.
You might notice that the CoordinateEncoder does not provide a decode function. That means we can't get predictions of future values. We can still do anomaly detection, however.
Matt: Why is there no decode function for the Coordinate Encoder? Is it impossible to go back from a deterministic hash function?
I'll try out the latest idea from "Entropy" also.
I think it just was not a priority. We wanted to prove out the Coordinate Encoder with anomaly detection, and that worked pretty well. But it was all just an example of a potential HTM application. I'm not entirely familiar with the inner workings of the Coordinate Encoder (although it does not seem all that complex). It could be something anyone could work on.
Looking at the various types of encoders, I realized my method is limited.
First of all, my method is an "SDR classifier", not a "decoder". I used the word "decoder" because I thought that any process going backwards is a decoder:
There are two SDRs involved. The first one is produced by the encoder. That first SDR is fed into the spatial pooler to make the second SDR.
If you encoded a scalar, then going backwards from the encoder output to the scalar could be called "decoding". But going back from the spatial pooler SDR is not called "decoding", at least not by Numenta.
This raises a problem. Some encoders use hash functions. Hash functions are "many to one". So if you have an encoder such as the coordinate encoder, and then try to reverse it, I'm not sure you would be able to.
Suppose we take the coordinate encoder as an example. Using my method you take columns in the spatial pooler as input, and produce some bits in the encoder. But then what? If you can't decode those bits to the original coordinates, then the method is useless.
Furthermore, there would be noise (some bits on in the encoder that should not be, and some off that should be on). I wonder what that would do when trying to decode an encoder that depends on hash functions.
In case anyone has ideas to salvage the method, let me know.
If I encode the SDR classifier in Python, I need access to the synapses and permanences of Nupic in the routine. My classifier can't work just by having a list of columns that are on or off. Would I have access to the various data-structures?
I was thinking about your encoders (date encoders, coordinate encoders, GPS encoders...), and for my purposes, there always has to be a "decode" method. Some of them have it, some of them do not. But in practice, any encoder that uses a hash function cannot have a "decode" method (I think).
So I then started thinking about why people use hash functions, and it seems that the only reason is situations where you have a limited number of bits in the encoder, but too many numbers to represent separately. So you hash those numbers, which is a "many to one" operation, which means you do get collisions sometimes. And that rules out decoders...
So this means that when you make SDR classifiers, you need to either use statistics or a neural net (that you train) to go from the list of active columns directly to the value those columns represent. Since my method doesn't do that (it goes indirectly: first from the list of active columns to the bits in the encoder, and then from those encoder bits to the original value), I had to come up with an alternative to hash functions. And that leads to question 3:
For my alternative to hash functions, my initial idea was to concatenate two encoders (the way you handle dates in the date-encoder). The first encoder might be a scalar encoder from zero to 99, and the second would be a log encoder. The first encoder would use the simple mathematical MOD function, so 101 would become 1. That initial idea had problems, but it did allow for an estimate of magnitude. Then I realized you could do something a little more clever. You could keep the idea of a MOD 100 encoder, but you could also encode the start of the range separately (as a log, base 10).
So for 0 - 99, the start is 0; for 100 - 199, the start is 1; for 200 - 299, the start is 2. So this is a separate encoder, but just like in the date-encoder, you can concatenate encoders. It seems that by this method you can get a big range out of a relatively small number of bits, and not lose accuracy either. I'm not completely sure this works (it seems like getting something for nothing), but I will try it out. A rough sketch is below. Does this seem like a valid way to go?
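Here is roughly what the encoding side would look like (untested; the two encoder delegates are placeholders standing in for an ordinary scalar encoder and a log encoder):

' Encode value Mod 100 with a scalar encoder, encode the "start of the range"
' (value \ 100) with a second, smaller (e.g. log-scaled) encoder, and
' concatenate the two bit arrays, the way the date encoder concatenates its
' sub-encoders.
Public Function EncodeModuloPlusMagnitude(value As Integer, remainderEncoder As Func(Of Integer, Boolean()), magnitudeEncoder As Func(Of Integer, Boolean())) As Boolean()
    Dim remainderBits() As Boolean = remainderEncoder(value Mod 100)   ' e.g. 243 -> 43
    Dim magnitudeBits() As Boolean = magnitudeEncoder(value \ 100)     ' e.g. 243 -> 2
    Dim result(remainderBits.Length + magnitudeBits.Length - 1) As Boolean
    remainderBits.CopyTo(result, 0)
    magnitudeBits.CopyTo(result, remainderBits.Length)
    Return result
End Function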
Thanks
No, I think you'd only have access to predicted cells.
Oh sure, I'm not trying to convince you that you're wrong, just providing more resources. I've always said that encoders are a huge area for innovative thinking in HTM. There are many things that could work, and I encourage you to experiment and continue posting your results for the community. Thanks for what you have come up with so far. I don't have time to test it out (working on more videos), but perhaps some of our community members can try.
To: Entropy
Here is a probability distribution based on your 2 suggestions for improvements. It has an interesting shape. The value encoded was 44, and it was decoded to 43 (but I'm using 1 bucket per bit, which is the least accurate). This is my last post for this thread, but I will hopefully try the algorithm out on more complicated encoders, and if I have the time, on the temporal pooler. Maybe it can be used in practice for anomaly detection, but I certainly can't recode it into nupic.