How to reconstruct the SP algorithm?

Hi every one
I am looking for a code with which I can feed a specific dataset to the SP algorithm and then reconstruct the output of the SP and get the original data. For example, give one of the MNIST digits to SP, reconstruct the sparse output of the SP algorithm, and get the original input again.
Can anyone help me in this regard? Can you send me the code or the article and explain how to reconstruct the SP algorithm?
Thanks a lot

1 Like

reconstruct the output of the SP and get the original data

Sounds like the SDR Classifier is doing what you describe here.
https://nupic.docs.numenta.org/1.0.5.dev0/api/algorithms/classifiers.html#module-nupic.algorithms.sdr_classifier

If I’m not mistaken, the SDR Classifier is only doing classification, not reconstruction. That is it will only recover which digit category, not reconstruct the original image. What you are probably looking for is something like an autoencoder. I am currently unaware of any SDR autoencoders, but it sounds like a fun project.

2 Likes

Any reason not to keep both original digit images and corresponding generated SDRs in the same order?


And yes classifier does tell the category not the original data.


For approximative search… you can either compare your unknown SDR with all known SDRs or use an indexing method or associative memory

The bit-pair maps like diadic memory or sdr-id-map could work to a certain extent - the results depend on solidity of the SDR and how the bits are distributed.

A more reliable method would be to use an ANN (Approximate Nearest Neighbor) indexer accepting a overlap based distance function for SDRs. Like pynndescent

PS How big are your SDRs? (total bits AND % of 1s?)

PS Straight KNN search would work too but it’s quite sluggish.

2 Likes

no, I don’t think the SDR classifier does reconstruction.

I am looking for a code that can generate the original data from the SDR data, as shown below.
reconstructed
I would be grateful if you could send me a code or explain a little more about how to do this.

I hacked together this demo some time ago to demonstrate something similar to what I think you are trying to do. Let me know if you find it interesting or want to discuss it further.

Sparse encoding of MNIST digits using a simple dictionary lookup algorithm.

Atoms in the dictionary are initialized by sub-sampling from a set of random images in the training set. The samples are obtained by splitting the 28x28 pixel images into a 4x4 array of 7x7 pixel patches. Thereafter these atoms are used as an over-complete basis set to encode portions of subsequent images. The encoding selects the best atom by direct projection (dot product of image and basis atom) to obtain a correlation coefficient. The product of this coefficient and the basis atom is subtracted from the image leaving a residual. This residual is then subjected to the same procedure as before to select the next atom that best captures the image features that were not present in the first atom. This continues until the atom limit is reached or the magnitude of the residual falls below a minimum threshold. The reconstructed image is then displayed along with the residual.

NOTE: This demo is not currently learning or adapting the atoms after the initial sampling stage. This simple choice for the basis set yields some fairly impressive results which can best be appreciated by comparing them to the reconstructions that results if you enable the “random atoms” checkbox in the menu. A slightly better implementation would be to add new atoms only if the residual cannot be sufficiently captured by a linear combination of the existing atoms.

2 Likes

To “Reconstruct the SP algorithm” you could use the information of the connected synapses lying in the SP data structure. The first step would be to get the synapses with the function getConnectedSynapses.

Here is an example from the NuPIC Github repo:

#Spatial Pooler example from NuPIC Walkthrough
sp = SpatialPooler(inputDimensions=(15,),
                   columnDimensions=(4,),
                   potentialRadius=15,
                   numActiveColumnsPerInhArea=1,
                   globalInhibition=True,
                   synPermActiveInc=0.03,
                   potentialPct=1.0)

for column in xrange(4):
    connected = numpy.zeros((15,), dtype="int")
    sp.getConnectedSynapses(column, connected)
    print connected

To refer to your MNIST example:
The column parameter in getConnectedSynapses should now get all active “SDR bits” from the middle row of the image you posted. The connected bits from the original input are then in “connected”. With these you can draw conclusions which bits were active in the original.

If you get stuck or are looking for another solution, you can also show us your code

3 Likes

Hello @ CollinsEM,
It was interesting. Are there codes for this implementation? My goal is to have the input vector, the sparse vector, and the reconstructed vector performs multiple operations on them. Are there more explanations of this method on the white paper or website?

Thank you very much for your explanation @ Markus
But I still don’t fully understand how to feed an image to the SP code and get it reconstructed. If possible, can you explain more or send me the codes related to the reconstruction?

I got good enough results for me by just transposing the SP and running it backwards.

2 Likes

What I was proposing was to search the training sample that best matches your test (or query) sample.

E.G. :

  • Considering MNIST’s 60000 training digit images you use spatial pooler to generate 60000 corresponding SDRs.
  • You know/preserve the one-to-one correspondence between each digit image and its corresponding SDR in a let’s call it training table
  • When you come up with an unknown SDR, instead of trying to compute a matching image you search instead the best matching SDR in your training table and return the matching image.

Results might not be perfect but considering “sdr machinery” in general is based on approximation matching, and that a “reverse spatial pooler” would be approximative too, this might be an acceptable replacement of an end-to-end trained autoencoder.


That being said, if you want to test an actual “inverter”, you can try to use a NN like a single or multi layer perceptron (found it in sklearn python module) and train it to generate the original MNIST digits from their corresponding SDRs.

I don’t know how computationally expensive that would be but it should be an interesting experiment.

PS that way you should get an end-to-end autoencoder, encoding part being the spatial pooler and decoding one - the trained MLP

Ok, I’ve tinkered a minimal example here. I use here only a part of the training data (set trainSteps to a higher value to learn with more training data). My SpatialPooler parameters are also just arbitrary guesses. I’m sure you can get more out of it. I also don’t use test data yet. Only this small part of the training data. It’s just to show how it works with the MNIST images and the getConnectedSynapses function.

# MNIST & NuPIC / HTM Spatial Pooler


# Download MNIST dataset from here: http://yann.lecun.com/exdb/mnist/
# This files:
# train-images-idx3-ubyte.gz:  training set images (9912422 bytes)
# train-labels-idx1-ubyte.gz:  training set labels (28881 bytes)
# t10k-images-idx3-ubyte.gz:   test set images (1648877 bytes)
# t10k-labels-idx1-ubyte.gz:   test set labels (4542 bytes) 


# load training set images (60000 images, 28x28 pixel grayscale)
import gzip
f = gzip.open('train-images-idx3-ubyte.gz','r')

image_size = 28
num_images = 60000

import numpy as np
f.read(16)
buf = f.read(image_size * image_size * num_images)
data = np.frombuffer(buf, dtype=np.uint8).astype(np.float32)
data = data.reshape(num_images, image_size, image_size, 1)

# load training set labels (digits corresponding to the order of the training
# set images. The images of the digits are in an unordered sequence)
f = gzip.open('train-labels-idx1-ubyte.gz','r')
f.read(8) # labels begin after the first 8 byte
buf = f.read()
labels = np.frombuffer(buf, dtype=np.uint8).astype(np.int64)

# setup NuPIC SpatialPooler
from nupic.algorithms.spatial_pooler import SpatialPooler

coldim1 = 30
coldim2 = 30

sp = SpatialPooler(inputDimensions=(28,28),
columnDimensions=(coldim1,coldim2),
potentialRadius=5,
potentialPct=0.1,
globalInhibition=False,
localAreaDensity=0.1,
numActiveColumnsPerInhArea=-5.0,
stimulusThreshold=5,
synPermInactiveDec=0.008,
synPermActiveInc=0.05,
synPermConnected=0.1,
minPctOverlapDutyCycle=0.001,
dutyCyclePeriod=1000,
boostStrength=0.9,
seed=-1,
spVerbosity=0,
wrapAround=True
)

outputSDR = np.zeros((coldim1*coldim2,), dtype="int")
trainSteps = 2000
for i in range(trainSteps):
	sp.compute(np.array(np.array(data[i].flatten(), dtype=bool), dtype=int), learn=True, activeArray=outputSDR)
	print("Progress: " + str( float(i) / float(trainSteps) * 100) + "%")

np.array(np.array(data[i].flatten(), dtype=bool), dtype=int)
np.array(data[i].flatten(), dtype=int)

np.array(np.array(data[0].flatten(), dtype=bool), dtype=int)
dat = np.array(np.array(data[14999].flatten(), dtype=bool), dtype=int)


# some lists to store the images
reconstrimages = []
originalimages = []
sdrimages = []

for reconstrIdx in range(10):
	inputDigit = np.array(np.array(data[reconstrIdx].flatten(), dtype=bool), dtype=int)
	originalimages.append(data[reconstrIdx].squeeze())
	outputSDR = np.zeros((coldim1*coldim2,), dtype="int")
	sp.compute(inputDigit, learn=False, activeArray=outputSDR)
	sdrimages.append(outputSDR.reshape(coldim1,coldim2))
	
	reconstr = np.zeros((784,), dtype="int")
	for colindex in xrange(coldim1*coldim2):
		if outputSDR[colindex] == 1:
			connected = np.zeros((784,), dtype="int")
			sp.getConnectedSynapses(colindex, connected)
			for i in range(len(connected)):
				if connected[i] == 1:
					reconstr[i] += 1
	
	arr = np.array(reconstr).reshape(28, 28)
	reconstrimages.append(np.asarray(np.array(arr, dtype=int)).squeeze())

import matplotlib.pyplot as plt
w = 10
h = 10
fig = plt.figure(figsize=(8, 5))
plotcolumns = 10
plotrows = 3
j = 0
for i in range(1, plotcolumns*plotrows +1):
	if i <= 10:
		img = originalimages[i-1]
	elif i > 10 and i <= 20:
		img = sdrimages[i-11]
	else:
		img = reconstrimages[i-21]
	fig.add_subplot(plotrows, plotcolumns, i)
	plt.imshow(img)

plt.show()

I played around a bit with the parameters and now it looks rather similar to the picture in the comment above that was mentioned. I also used global inhibition and hence the training is a lot faster.

# MNIST & NuPIC / HTM Spatial Pooler


# Download MNIST dataset from here: http://yann.lecun.com/exdb/mnist/
# This files:
# train-images-idx3-ubyte.gz:  training set images (9912422 bytes)
# train-labels-idx1-ubyte.gz:  training set labels (28881 bytes)
# t10k-images-idx3-ubyte.gz:   test set images (1648877 bytes)
# t10k-labels-idx1-ubyte.gz:   test set labels (4542 bytes) 


# load training set images (60000 images, 28x28 pixel grayscale)
import gzip
f = gzip.open('train-images-idx3-ubyte.gz','r')

image_size = 28
num_images = 60000

import numpy as np
f.read(16)
buf = f.read(image_size * image_size * num_images)
data = np.frombuffer(buf, dtype=np.uint8).astype(np.float32)
data = data.reshape(num_images, image_size, image_size, 1)

# load training set labels (digits corresponding to the order of the training
# set images. The images of the digits are in an unordered sequence)
f = gzip.open('train-labels-idx1-ubyte.gz','r')
f.read(8) # labels begin after the first 8 byte
buf = f.read()
labels = np.frombuffer(buf, dtype=np.uint8).astype(np.int64)

# setup NuPIC SpatialPooler
from nupic.algorithms.spatial_pooler import SpatialPooler

coldim1 = 37
coldim2 = 37

sp = SpatialPooler(inputDimensions=(28,28),
columnDimensions=(coldim1,coldim2),
potentialRadius=36,
potentialPct=1.0,
globalInhibition=True,
localAreaDensity=-1,
numActiveColumnsPerInhArea=8,
#stimulusThreshold=5,
synPermInactiveDec=0.008,
synPermActiveInc=0.05,
synPermConnected=0.1,
minPctOverlapDutyCycle=0.001,
dutyCyclePeriod=1000,
boostStrength=17,
seed=-1,
spVerbosity=0,
wrapAround=False
)

outputSDR = np.zeros((coldim1*coldim2,), dtype="int")
trainSteps = 100
epochs = 50
for j in range(epochs):
	for i in range(trainSteps):
		sp.compute(np.array(np.array(data[i].flatten(), dtype=bool), dtype=int), learn=True, activeArray=outputSDR)
		print("Progress: " + str( float(j) / float(epochs) * 100) + "%")

np.array(np.array(data[i].flatten(), dtype=bool), dtype=int)
np.array(data[i].flatten(), dtype=int)

np.array(np.array(data[0].flatten(), dtype=bool), dtype=int)
dat = np.array(np.array(data[14999].flatten(), dtype=bool), dtype=int)


# some lists to store the images
reconstrimages = []
originalimages = []
sdrimages = []

for reconstrIdx in range(10):
	inputDigit = np.array(np.array(data[reconstrIdx].flatten(), dtype=bool), dtype=int)
	originalimages.append(data[reconstrIdx].squeeze())
	outputSDR = np.zeros((coldim1*coldim2,), dtype="int")
	sp.compute(inputDigit, learn=False, activeArray=outputSDR)
	sdrimages.append(outputSDR.reshape(coldim1,coldim2))
	
	reconstr = np.zeros((784,), dtype="int")
	for colindex in xrange(coldim1*coldim2):
		if outputSDR[colindex] == 1:
			connected = np.zeros((784,), dtype="int")
			sp.getConnectedSynapses(colindex, connected)
			for i in range(len(connected)):
				if connected[i] == 1:
					reconstr[i] += 1
	
	arr = np.array(reconstr).reshape(28, 28)
	reconstrimages.append(np.asarray(np.array(arr, dtype=int)).squeeze())

import matplotlib.pyplot as plt
w = 10
h = 10
fig = plt.figure(figsize=(8, 5))
plotcolumns = 10
plotrows = 3
j = 0
for i in range(1, plotcolumns*plotrows +1):
	if i <= 10:
		img = originalimages[i-1]
	elif i > 10 and i <= 20:
		img = sdrimages[i-11]
	else:
		img = reconstrimages[i-21]
	fig.add_subplot(plotrows, plotcolumns, i)
	plt.imshow(img)

plt.show()

@katrin : May I ask where the picture mentioned above comes from? Is it from one of the Numenta Papers? I mean the picture with the subtitle “Fig. 8…”