Deep Generative Models

I was looking more into ML models and came across these slides on deep generative models. My impression is that some members here may not be aware of how different the latest DNNs are from the early CNNs. Even though this is from 2017, it shows how much broader deep learning has become.

If anyone knows of a more recent and/or better presentation on deep generative models then please let us all know.


I’ve been doing a lot with GANs for my job, so I can definitely vouch for their impressive capabilities. I’ll see if I can find some more recent presentations to post here.

This one covers a wide range of topics: a summary of the 2021 Deep Learning 2.0 Summit from one observer's perspective, with links to a lot of references. (It's about deep learning broadly, not only generative networks.)

Here is a relatively recent use case that is one of the more impressive examples (IMO). It basically lets you make up photo-realistic images of people who do not exist. (Not a presentation, just a demonstration of how impressive generative models can be.)

A paper on a more complex generative model for multi-agent behavior (also not a presentation, but again demonstrating impressive capabilities)

*(So far these are not exactly what the OP is asking for, but they support the point that the latest in deep learning is far more capable than early CNNs.)


(This is not exactly about a presentation but an interesting movement in deep learning regarding generative models):
Implicit neural representations are the new thing that's getting a lot of traction as well.
Maybe they're not deep generative models in the sense that they don't interpolate between data points from different objects (or scenes) in the latent space to generate novel samples, but they do have some remarkable rendering capabilities.
These are deep learning models that take coordinates in physical space as input and render 2D or 3D scenes by forming a continuous representation of that physical space.
NeRF, for example, takes positions and view directions along a ray and renders a 3D scene pixel by pixel. Trained only on 2D images from different views (synthetic or real), it successfully renders a 3D scene as a continuous representation, and it's known to do so with less memory than the training images combined.
After the rapid research around this model, NeRF is now not just a single model but rather a new paradigm for rendering 3D or 4D (3D + time) scenes.
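To give a concrete feel for the rendering step, here's a minimal numpy sketch of the discrete volume-rendering equation NeRF uses to turn per-sample densities and colors along a ray into one pixel. (This is a toy illustration with hand-picked values standing in for the network's predictions, not the full model.)

```python
import numpy as np

def render_ray(sigmas, colors, deltas):
    """Discrete volume rendering along one ray.

    sigmas: (S,)   predicted density at each of S samples along the ray
    colors: (S, 3) predicted RGB at each sample
    deltas: (S,)   spacing between adjacent samples
    """
    alpha = 1.0 - np.exp(-sigmas * deltas)  # per-sample opacity
    # Transmittance: how much light survives to reach each sample.
    trans = np.exp(-np.cumsum(np.concatenate([[0.0], sigmas[:-1] * deltas[:-1]])))
    weights = trans * alpha
    return (weights[:, None] * colors).sum(axis=0)

# A ray with one dense red sample in the middle should render (nearly) red.
sigmas = np.array([0.0, 50.0, 0.0])
colors = np.array([[0.0, 0.0, 1.0],   # blue (empty space, no density)
                   [1.0, 0.0, 0.0],   # red  (dense)
                   [0.0, 1.0, 0.0]])  # green (occluded behind the red sample)
deltas = np.array([0.1, 0.1, 0.1])
pixel = render_ray(sigmas, colors, deltas)
```

The real model evaluates an MLP at each sample to get `sigmas` and `colors`; everything after that is just this weighted sum.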
By the way, the presentation on the original NeRF model is here:


One of the most powerful applications of these generative models is data augmentation, IMO. It provides a great way to harness unlabelled image data (of which there is plenty) and use it to increase the accuracy of supervised models.

Just one of the things I am doing is training GANs on unlabelled data to generate additional unlabelled data for self-training a supervised model. I expect an accuracy boost of about 3-4% with respect to the results obtained by researchers. It shows how useful they can be in real-world tasks, as well as their importance in almost all fields of DL.
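To illustrate the shape of that pipeline, here's a minimal sketch with trivial stand-ins: a nearest-centroid classifier where the real supervised model would go, and a Gaussian sampler where the trained GAN would go. The structure (generate → pseudo-label → retrain) is the point, not the toy components.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_unlabeled(n):
    """Stand-in for a trained GAN: samples from two Gaussian blobs."""
    return np.concatenate([rng.normal(-2, 1, (n // 2, 2)),
                           rng.normal(+2, 1, (n - n // 2, 2))])

def fit_centroids(X, y):
    """Stand-in supervised model: one centroid per class."""
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(cents, X):
    d = np.linalg.norm(X[:, None, :] - cents[None, :, :], axis=-1)
    return d.argmin(axis=1)

# 1. A small labelled set trains the initial supervised model.
X_lab = np.array([[-2.0, -2.0], [2.0, 2.0]])
y_lab = np.array([0, 1])
cents = fit_centroids(X_lab, y_lab)

# 2. Self-training: pseudo-label the generated data, then retrain on
#    labelled + pseudo-labelled data combined.
X_gen = generate_unlabeled(200)
pseudo = predict(cents, X_gen)
cents = fit_centroids(np.vstack([X_lab, X_gen]),
                      np.concatenate([y_lab, pseudo]))
```

A real setup would add a confidence threshold before accepting pseudo-labels, but the loop is the same.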


(Yet another paradigm, not exactly about a presentation. Thanks for posting this amazing idea of discussing deep generative models, BTW!)

An interesting fact popped up in my head.

There’s something called VQ-VAE(Vector Quantised-Variational AutoEncoder).
Its key idea is to maintain some sort of code-book and use only the vectors in it for the latent representation.
To this end, the algorithm picks the code-book vector closest to the original encoded vector.
(by argmin… Do you see the resemblance to k-winners-take-all? :thinking:)
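For concreteness, that argmin lookup fits in a few lines of numpy. (A toy sketch of just the quantization step; the full VQ-VAE also needs the straight-through gradient and the code-book/commitment losses.)

```python
import numpy as np

def quantize(z, codebook):
    """Snap each encoded vector in z to its nearest code-book entry.

    z:        (N, D) batch of encoder outputs
    codebook: (K, D) code-book vectors
    Returns the quantized vectors and the chosen indices.
    """
    # Squared Euclidean distance between every z and every code word.
    dists = np.sum((z[:, None, :] - codebook[None, :, :]) ** 2, axis=-1)
    idx = np.argmin(dists, axis=1)  # the argmin: a 1-winner-take-all
    return codebook[idx], idx

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))
z = codebook[2] + 0.01 * rng.normal(size=(3, 4))  # noisy copies of code word 2
zq, idx = quantize(z, codebook)
# All three noisy inputs settle onto code word 2.
```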

I was thinking about it and I couldn’t help but feel something very familiar… then it struck me:
This vector quantization concept is very analogous to SDRs, except it's for deep learning.

They have some very similar properties as a consequence:

  • VQ only allows combinations of the code-book vectors. → SDR only allows specific combinations of sparsely active mini-columns.
    • This forces a very limited (but robust) manifold compared to the original (N-D or N-bit) space.
  • Both representations almost always transform inputs to settle onto a point on these manifolds, which makes them really robust.
  • With well-defined manifolds, it's very efficient to learn something useful really fast.
    • TM can even do few-shot learning!
    • VQ enables fast convergence.
  • The list goes on…

Another interesting fact:
VQ suffers from different code-book vectors representing almost the same thing, and from dead vectors that hardly ever get used.
And this is extremely similar to what SP suffers from as well!
SP alleviates this problem with boosting, and the DL community has come up with its own fixes.
Maybe boosting would also work for VQ? :thinking:
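To make the speculation concrete, here's a toy numpy sketch of what SP-style boosting might look like transplanted into the VQ lookup: scale down the distance of under-used code words so dead entries win more often. This is my own guess at a mechanism, not an established VQ-VAE technique.

```python
import numpy as np

def boosted_quantize(z, codebook, usage, boost_strength=2.0):
    """Nearest-code-word lookup with an SP-style boost (speculative).

    usage: running count of how often each code word has been selected.
    Code words used less than the uniform target get their distances
    shrunk, so they win ties against over-used neighbours.
    """
    dists = np.sum((z[:, None, :] - codebook[None, :, :]) ** 2, axis=-1)
    duty = usage / max(usage.sum(), 1)                # empirical duty cycle
    target = 1.0 / len(codebook)                      # uniform target duty cycle
    boost = np.exp(boost_strength * (target - duty))  # >1 for under-used words
    return np.argmin(dists / boost[None, :], axis=1)

rng = np.random.default_rng(1)
codebook = rng.normal(size=(4, 2))
usage = np.array([100.0, 100.0, 100.0, 0.0])  # code word 3 is "dead"
z = rng.normal(size=(16, 2))
idx = boosted_quantize(z, codebook, usage, boost_strength=5.0)
# With a strong enough boost, the dead code word starts winning again.
```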

I want to hear what the HTM community has to say! :smiley:


VQ is very similar to the Sparse and Redundant Representation algorithms that I have been learning about (MOD, K-SVD). The redundant code-words (atoms, filters) in the code-book are a feature, not a bug. They allow you to capture a wide variety of potential inputs using a linear combination of a very small number of code-words.

While there are ways to minimize the size of the code-book by increasing the distance between code-words (removing redundant filters), you do so at the expense of potentially needing to use additional code-words to accurately capture some of the more subtle perturbations (residuals) from the mean representations in the code-book. In essence you need to generate additional code-words (basis functions) to encode the space around the primary filters.

My demo of the Orthogonal Matching Pursuit algorithm is an initial step towards trying to figure out how to incorporate these encoding schemes into a Temporal Memory implementation.
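For anyone who hasn't seen it, the greedy pursuit at the heart of OMP is short enough to sketch in numpy: pick the atom most correlated with the residual, re-fit all chosen atoms by least squares, repeat. (A toy version for illustration, not the demo implementation mentioned above.)

```python
import numpy as np

def omp(D, x, n_nonzero):
    """Orthogonal Matching Pursuit.

    D: (d, K) dictionary with unit-norm columns (atoms / code-words)
    x: (d,)   signal to encode as a sparse combination of atoms
    """
    residual = x.copy()
    support = []
    for _ in range(n_nonzero):
        # Greedy step: atom most correlated with the current residual.
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        # Orthogonal step: least-squares re-fit over all chosen atoms.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    code = np.zeros(D.shape[1])
    code[support] = coef
    return code

rng = np.random.default_rng(0)
D = rng.normal(size=(64, 32))
D /= np.linalg.norm(D, axis=0)        # unit-norm atoms
x = 2.0 * D[:, 5] - 1.0 * D[:, 20]    # a 2-sparse combination of atoms
code = omp(D, x, n_nonzero=2)
```

With a well-conditioned dictionary the two true atoms are recovered and the reconstruction `D @ code` matches `x`.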