Autoencoders are unsupervised learning models that use the power of neural networks to perform representation learning. In the context of machine learning, representation learning means encoding the components and features of the original data in some low-dimensional structure in order to better understand, visualize, and extract useful information. These low-dimensional vectors can reveal valuable information about our data, such as how close two instances in the data set are to each other, or what structure and patterns the data set contains.
Table of contents
- The current scenario of the industry
- Learning with unlabeled data
- Introduction to Autoencoders
- History of autoencoders in research papers
- Introduction to variational autoencoders
- VAE variants
- Applications of autoencoders
The current scenario of the industry
In this age of big data, where petabytes of data are generated and processed by leading social networking sites and e-commerce giants, we live in a world of data glut. Yet our machine learning algorithms have mainly relied on labeled data sets, which are scarce and expensive. Most of the data being generated is unstructured and unlabeled, so it is time for the machine learning community to focus on unsupervised and not just supervised algorithms, to unlock the true potential of AI and machine learning.
“If intelligence is a cake, then unsupervised learning would be the cake, supervised learning would be the icing on the cake, and reinforcement learning would be the cherry on the cake” – Yann LeCun
So why not eat the whole cake?
Learning with unlabeled data
Data representation is really a mapping. If we have a data point x ∈ X, and we have a function f: X → Z for some space Z, then f is a representation. The new point f(x) = z ∈ Z is sometimes called the representation of x. A good representation makes downstream tasks easier.
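As a toy illustration of this definition, a representation can be as simple as a linear projection onto a lower-dimensional space. The weight vector below is a made-up, hand-picked example, not a learned one:

```python
import numpy as np

# A representation is a mapping f: X -> Z.
# Here X = R^3 and Z = R^1; the direction w is hypothetical,
# chosen only to illustrate the idea.
w = np.array([0.5, 0.3, 0.2])

def f(x: np.ndarray) -> np.ndarray:
    """Map each row of x to its 1-D representation z = x . w."""
    return x @ w

x = np.array([[30.0, 200.0, 25.0]])  # one data point in X
z = f(x)                             # its representation in Z
print(z)                             # -> [80.]
```

A learned f (such as the encoder of an autoencoder) follows exactly this pattern, only with the mapping fitted to the data rather than hand-picked.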
Introduction to Autoencoders
Autoencoders, also known as self-encoders, are networks trained to reproduce their own inputs. They fall under the category of unsupervised learning algorithms; in fact, some researchers describe autoencoders as self-supervised, since for a training example x the target is x itself. In general, though, they are considered unsupervised because no external labels or regression targets are required.
If the autoencoder does this perfectly, the output vector x′ is equal to the input vector x. The autoencoder is designed as a special two-part structure: the encoder and the decoder.
x′ = decoder(encoder(x))
The model is trained with a reconstruction loss that aims to reduce the difference between x and x′. We can define the reconstruction loss as, for example, MSE(x, x′) if the input is real-valued. The dimensionality of z is usually smaller than that of x, which is why autoencoders are also called bottleneck neural networks: we impose a compressed knowledge representation.
The encoder is used to map unseen data to a low-dimensional Z, and a good representation is one where no important information is lost during compression. Autoencoders are quite similar to Principal Component Analysis (PCA), which is itself a dimensionality reduction algorithm; the difference is that PCA is linear in nature, while autoencoders are nonlinear due to their neural-network-based architecture. To better understand the latent space, consider an example where the observed variable x is something like (number of people at the beach, ice cream sales, daily temperature), while the latent z is something like the tilt of the Earth's axis (i.e., the season of the year): using the season alone, we can roughly predict the number of visitors at the beach, ice cream sales, and so on.
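To make the bottleneck idea concrete, here is a minimal NumPy sketch of a linear autoencoder trained with gradient descent on the MSE reconstruction loss. All sizes, learning rates, and variable names are illustrative; the linear case is exactly the setting where the autoencoder recovers a PCA-like subspace:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 points in 5-D that actually live near a 2-D subspace.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
x = latent @ mixing + 0.01 * rng.normal(size=(200, 5))

# Linear autoencoder: encoder W_e (5 -> 2), decoder W_d (2 -> 5).
W_e = rng.normal(scale=0.1, size=(5, 2))
W_d = rng.normal(scale=0.1, size=(2, 5))

lr = 0.01
for _ in range(2000):
    z = x @ W_e                          # bottleneck representation
    x_hat = z @ W_d                      # reconstruction x'
    err = x_hat - x                      # gradient of 0.5*MSE w.r.t. x_hat
    grad_W_d = z.T @ err / len(x)
    grad_W_e = x.T @ (err @ W_d.T) / len(x)
    W_d -= lr * grad_W_d
    W_e -= lr * grad_W_e

mse = np.mean((x - (x @ W_e) @ W_d) ** 2)
```

Because the data is nearly rank-2, squeezing it through the 2-D bottleneck loses almost nothing; with genuinely high-rank data the same bottleneck would force a lossy compression.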
History of autoencoders in research papers
Here are a few of the early research papers that introduced autoencoders to the world of machine learning:
- A Learning Algorithm for Boltzmann Machines, D. H. Ackley, G. E. Hinton, T. J. Sejnowski. Cognitive Science, 1985. Describes a simple neural network trained with self-supervision.
- Learning representations by back-propagating errors, D. E. Rumelhart, Geoffrey E. Hinton, R. J. Williams. Nature, 1986. “We describe a new learning procedure, back-propagation, for networks of neurone-like units.”
- Connectionist Learning Procedures, G. E. Hinton. Machine Learning, 1990. Describes a “self-supervised” neural network.
Introduction to variational autoencoders
Variational autoencoders (VAEs) are autoencoders that add a sampling step and Kullback-Leibler regularization. Variational autoencoders are intended to make the latent space smoother: a small change in x results in a small change in the latent z, and a small change in z results in a small change in x. The latent space has to be smooth, with reasonable points, to be effective and accurate, and that is what the VAE tries to achieve. In a VAE, the encoder does not output z directly, but μ and σ; a sampling step then draws z from these parameters, and the decoder takes z as before.
A good sampling technique provides a good reconstruction of the data points, as well as of the points close to them. The process ensures that every point close to the latent location where the input x was encoded (i.e., z_mean) can be decoded into something similar to x, thus forcing the latent space to be continuously meaningful. Any two close points in the latent space will decode to very similar outputs. Continuity, combined with the low dimensionality of the latent space, forces each direction in the latent space to encode a meaningful axis of variation in the data, making the latent space highly structured and thus well suited to manipulation via concept vectors. Pseudocode for the sampling process is shown below:
z_mean, z_log_variance = encoder(x)
z = z_mean + exp(0.5 * z_log_variance) * epsilon   # std = exp(0.5 * log variance)
x_hat = decoder(z)
model = Model(x, x_hat)
In a VAE, we want the data to be normally distributed in z-space; in particular, to follow a standard multivariate normal, 𝑁(0,1). When using the decoder, we can then be confident that all such points correspond to typical data points. There are no “holes”, because the encoder works hard to compress the data and does not waste space.
“VAE parameters are trained via two loss functions: a reconstruction loss that forces the decoded samples to match the initial inputs, and a regularization loss that helps learn well-formed latent spaces and reduces overfitting to the training data.” – Chollet
The regularization loss pushes the encoder to place the data in a normal distribution in the latent space.
The Kullback-Leibler divergence KL(𝑝 || 𝑞) is a statistical measure of the difference between a pair of distributions 𝑝 and 𝑞. It is a large number when 𝑝 and 𝑞 are very different, and approaches zero when they are alike.
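For the case the VAE actually uses, a diagonal Gaussian encoder output 𝑞 = 𝑁(μ, σ²) against the standard normal prior 𝑝 = 𝑁(0,1), the KL divergence has a well-known closed form, −½ Σ (1 + log σ² − μ² − σ²). A small NumPy check (the function name is illustrative):

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, exp(log_var)) || N(0, 1) ), summed over dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var))

# When the encoder output already matches the prior, the KL term is zero...
print(kl_to_standard_normal(np.zeros(2), np.zeros(2)))                # -> 0.0
# ...and it grows as mu drifts away from 0 (here by mu^2 / 2 per dim).
print(kl_to_standard_normal(np.array([2.0, 0.0]), np.zeros(2)))       # -> 2.0
```

This is exactly the term added to the VAE loss: it is large when the encoder's distribution strays from 𝑁(0,1) and vanishes when they coincide.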
The KL loss is a regularization term in our VAE loss. As always, we can adjust the regularization strength by multiplying the KL term by a scalar. If it is too strong, our model will collapse; if it is too weak, the model is equivalent to a classic AE.
Remember: VAE = AE + sampling + KL loss
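The "AE + sampling + KL loss" recipe can be sketched end to end in NumPy. The encoder and decoder below are stand-in random linear maps, not trained networks, and all names (W_mu, W_lv, beta, etc.) are made up for the sketch; the point is only the shape of one forward pass of the objective:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6))           # a toy batch of 4 inputs in R^6

# Stand-in "networks": fixed random linear maps (a real VAE would learn these).
W_mu = rng.normal(size=(6, 2))        # encoder head for mu
W_lv = rng.normal(scale=0.1, size=(6, 2))  # encoder head for log-variance
W_dec = rng.normal(size=(2, 6))       # decoder

# 1. Encode: the encoder outputs mu and log-variance.
z_mean, z_log_var = x @ W_mu, x @ W_lv

# 2. Sample with the reparameterization trick: z = mu + sigma * eps.
epsilon = rng.normal(size=z_mean.shape)
z = z_mean + np.exp(0.5 * z_log_var) * epsilon

# 3. Decode and compute the two loss terms.
x_hat = z @ W_dec
reconstruction_loss = np.mean((x - x_hat) ** 2)
kl_loss = -0.5 * np.mean(
    np.sum(1 + z_log_var - z_mean ** 2 - np.exp(z_log_var), axis=1)
)

beta = 1.0                            # the scalar regularization weight
total_loss = reconstruction_loss + beta * kl_loss
```

Drawing z through epsilon (rather than sampling z directly) is what keeps the whole expression differentiable with respect to the encoder outputs, so gradients can flow back through the sampling step during training.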
1. Sampling: a procedure for drawing each point z from a multivariate normal 𝑁(𝜇, 𝛴).
2. Regularization: a procedure that adds a loss term pushing the latent distribution toward a standard multivariate normal 𝑁(0,1).
3. Dimensionality: dim(𝑧) is usually small compared to dim(𝑥). How small? As small as possible without unduly increasing the reconstruction error; as always, it depends on the downstream task. If dim(𝑧) = 2, we can visualize the latent space easily, but this is usually too extreme: the training data cannot be reconstructed well. If dim(𝑧) > 2, we can use 𝑡-SNE or UMAP for visualization.
VAE variants
- BetaVAE (stronger regularization, to encourage disentangled representations).
- Contractive AE (aims at smoothness through a different regularizer).
- Conditional VAE (the decoder maps (𝑧, 𝑐) → 𝑥, where 𝑐 is user-chosen; for example, 𝑐 specifies the digit to generate and 𝑧 specifies the style).
Reference: Keras Autoencoder
Applications of autoencoders
- Image noise reduction (Conv AE)
- Time Series Anomaly Detection (1D Conv AE)
- Network intrusion detection via anomaly detection (VAE, encoder only)
- Generating video-game levels and music (Conv VAE, decoder only)
Finally, I feel that autoencoders and variational autoencoders are among the most powerful unsupervised learning techniques, ones that every data scientist should be aware of, even though these models have their own limitations, such as requiring relatively large data sets for training.
The media described in this article is not owned by Analytics Vidhya and is used at the author’s discretion.