Stacked denoising autoencoders

It’s a public forum, open for anyone to read.
And so there is this sort of denoising autoencoder:!topic/
I actually have to read up on what you can do with denoising autoencoders, but cool.

The HTM paper suggests that HTM doesn’t work so well with high-dimensional data. This aligns with feedback I’ve received on other parts of the forum.

That said - I would be interested to know how bad it is, with some actual numbers.

If you end up comparing autoencoders (or any other DL technique) with HTM, please publish those numbers here along with the data set you used :slight_smile:

This might also give everyone a useful launching point for subsequent research.

I mainly know about the neuron/synapse-related papers they have produced, which are a valuable contribution to science.
The HTM side of things seems to be based on sparse bit vectors, but I could be very wrong about that.
The networks I have are real-valued and high-dimensional. They have been out in the public domain for around 15 years. Somebody else could have seized on them in that time and really pushed forward with them, for example by running the algorithms on application-specific integrated circuits or on field-programmable gate arrays. Who knows?
Anyway, I am sure comparing the two approaches would be like comparing apples and oranges.

As I said, I (personally) have only just figured out that random projection neural nets can be used as denoising autoencoders. That creates a lot of possibilities to investigate, and hopefully I can do some things like colorizing grayscale images.

Cool. I’m very curious: which data sets are you referring to? Do you have download links? I come at autoencoders from the DL side, so I use them for computer vision and acoustic modeling (I can provide those datasets to you if interested).

I’m using 256*256 color images. I believe that if you used Mathematica you could convert the Basic code in a few minutes. The only things you need are the Walsh Hadamard transform (WHT), and (recomputable) random sign flipping and random permutation.
I can get about 5000 65536-point WHTs per second on a CPU. If Mathematica has a GPU-based WHT algorithm then maybe you can get 10 or 50 times that figure. Actually the rate-limiting step is likely to be the random permutation, because of the extremely messy memory accesses hammering the CPU/GPU caches. You can just use random sign flipping on its own in many cases if you have a need for speed.
Anyway, I have some new territory to explore.
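
For anyone who wants to try the recipe above without FreeBASIC or Mathematica, here is a minimal Python sketch of a random projection built from those three pieces. The order of the steps (sign flip, then permutation, then WHT), the function names, and the use of a seed to make the randomness recomputable are assumptions for illustration, not the original Basic code.

```python
import math
import random


def wht(x):
    """In-place Walsh Hadamard transform, scaled by 1/sqrt(n).

    len(x) must be a power of two. Returns x for convenience.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    hs = 1  # half-block size: 1, 2, 4, ...
    while hs < n:
        for start in range(0, n, 2 * hs):
            for i in range(start, start + hs):
                a, b = x[i], x[i + hs]
                x[i], x[i + hs] = a + b, a - b  # butterfly
        hs *= 2
    scale = 1.0 / math.sqrt(n)
    for i in range(n):
        x[i] *= scale
    return x


def random_projection(x, seed, permute=True):
    """Hypothetical helper: sign flip, optional permutation, then WHT.

    The same seed always reproduces the same signs and permutation,
    so nothing random needs to be stored. The input list is not modified.
    """
    rng = random.Random(seed)
    n = len(x)
    # Random sign flipping: negate each element with probability 1/2.
    y = [v if rng.random() < 0.5 else -v for v in x]
    if permute:
        # Random permutation (the cache-unfriendly step; skip it for speed).
        order = list(range(n))
        rng.shuffle(order)
        y = [y[j] for j in order]
    return wht(y)
```

Calling `random_projection(x, seed=42, permute=False)` uses sign flipping on its own, which is the faster variant mentioned above. Because every step is orthogonal, the output has the same Euclidean length as the input.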

So it sounds like the experiment we want to do is to:

Compare the following FT-like feature extractors for performance:

Some other attributes of this study:

  • Performance is measured in two ways: 1) accuracy on a task and 2) speed

  • We could choose a task like image classification (MNIST) or even speaker/voice recognition on a standard voice dataset. We would have to decide on one classifier for all of them (I vote SVM).

  • We could have GPU and non-GPU versions of feature extraction and measure that performance.

  • For extra credit, we should compare some of the state-of-the-art end-to-end recognition pipelines such as Baidu’s speech API or MXNet convolutional nets.
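
The measurement side of the plan above could be sketched roughly like this in Python. Everything here is hypothetical scaffolding: `evaluate` and the helper names are made up for illustration, and a tiny nearest-centroid classifier stands in for the SVM only so the sketch stays dependency-free. Any feature extractor (a WHT-based one, an autoencoder, HTM) would be passed in as the `extract` function.

```python
import time


def nearest_centroid_fit(features, labels):
    """Stand-in classifier (NOT an SVM): one mean vector per class."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(f)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], f)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def nearest_centroid_predict(centroids, f):
    """Return the label whose centroid is closest to feature vector f."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(c, f))
    return min(centroids, key=lambda y: dist2(centroids[y]))


def evaluate(extract, train_x, train_y, test_x, test_y):
    """Return (accuracy, seconds spent in feature extraction only)."""
    t0 = time.perf_counter()
    train_f = [extract(x) for x in train_x]
    test_f = [extract(x) for x in test_x]
    seconds = time.perf_counter() - t0
    # The same classifier is used for every extractor, so only the
    # features differ between runs.
    model = nearest_centroid_fit(train_f, train_y)
    hits = sum(
        nearest_centroid_predict(model, f) == y
        for f, y in zip(test_f, test_y)
    )
    return hits / len(test_y), seconds
```

Running `evaluate` once per extractor on the same train/test split gives the two numbers from the first bullet, accuracy and speed, side by side.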

Good plan!

You can find some WHT code here:

or even:

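'Walsh Hadamard transform.  Array must have 2^n elements (2,4,8,16....)
'Transforms x() in place and scales by 1/sqrt(n), which makes the transform its own inverse.
sub wht(x() as single)
	dim as ulongint hs=1,n=ubound(x)+1
	dim as single sc=1.0/sqr(n)
	if (n and (n-1))<>0 then error(1)	'x must have 2^n elements
	while hs<n	'hs is the half-block size: 1,2,4,...
		var i=0ULL
		while i<n
			var k=i+hs
			while i<k
				var a=x(i)
				var b=x(i+hs)
				x(i)=a+b	'butterfly: sum goes to the lower half
				x(i+hs)=a-b	'difference goes to the upper half
				i+=1
			wend
			i=k+hs	'skip the upper half to reach the next block
		wend
		hs+=hs
	wend
	for i as ulongint =0 to n-1
		x(i)*=sc	'apply the 1/sqrt(n) scaling
	next
end sub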