Introduction to Autoencoders for Beginners
Autoencoders are unsupervised learning models that use the power of neural networks to perform a representation learning task. In the context of machine learning, representation learning means capturing the components and features of the original data in some low-dimensional structure in order to better understand, visualize, and extract useful information. These low-dimensional vectors can reveal valuable information about our data, such as how close two instances in the dataset are, or the structure and patterns hidden in the dataset.
Table of contents
 The current scenario of the industry
 Learning with unlabeled data
 Introduction to autoencoders
 Autoencoders in the early research papers
 Introduction to variational autoencoders
 VAE variants
 Autoencoder applications
 Conclusion
The current scenario of the industry
In this age of big data, where petabytes of data are generated and processed by leading social networking sites and e-commerce giants, we live in a world of data glut. Yet our machine learning algorithms have mainly exploited rare and expensive labeled datasets. Most of the data generated is unstructured and unlabeled, so it is time for the machine learning community to focus on unsupervised, and not just supervised, learning algorithms to unlock the true potential of AI and machine learning.
“If intelligence is a cake, unsupervised learning would be the cake, supervised learning would be the icing on the cake, and reinforcement learning would be the cherry on the cake.” – Yann LeCun
So why not eat the whole cake?
Learning with unlabeled data
Data representation is really a mapping. If we have a data point x ∈ X and a function f: X → Z for some space Z, then f is a representation. The new point f(x) = z ∈ Z is sometimes called the representation of x. A good representation makes downstream tasks easier.
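As a toy illustration of this mapping view (a sketch with made-up values for W and x, not taken from the article), even a fixed linear projection is already a representation f: X → Z:

```python
import numpy as np

def f(x, W):
    """Map a data point x in R^3 to its representation z = W x in R^2."""
    return W @ x

# Illustrative choice of W: keep only the first two coordinates.
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
x = np.array([2.0, -1.0, 5.0])
z = f(x, W)  # z is "the representation of x"
```

A learned encoder plays the same role as f, except that it is nonlinear and its parameters are trained rather than fixed.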
Introduction to Autoencoders
Autoencoders, also known as auto-associators, are networks trained to reproduce their own inputs. They fall under the category of unsupervised learning algorithms; in fact, some researchers describe autoencoders as self-supervised algorithms, since for a training example x the label is x itself. In general, though, they are considered unsupervised, as there are no explicit labels or regression targets.
If the autoencoder does this perfectly, the output vector x′ is equal to the input vector x. The autoencoder is designed as a special two-part structure: the encoder and the decoder.
x′ = decoder(encoder(x))
We train the model using a reconstruction loss that aims to minimize the difference between x and x′. We can define the reconstruction loss as something like MSE(x, x′) if the input is real-valued. The dimensionality of z is usually smaller than that of x, which is why autoencoders are also called bottleneck neural networks: we impose a compressed knowledge representation.
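To make the bottleneck structure concrete, here is a minimal NumPy sketch of decoder(encoder(x)) and the MSE reconstruction loss. The weights are randomly initialized and purely illustrative; a real autoencoder would train them to drive this loss down:

```python
import numpy as np

rng = np.random.default_rng(0)

# Untrained linear bottleneck autoencoder: R^4 -> R^2 -> R^4.
W_enc = rng.normal(size=(2, 4))   # encoder weights (the bottleneck)
W_dec = rng.normal(size=(4, 2))   # decoder weights

def encoder(x):
    return W_enc @ x

def decoder(z):
    return W_dec @ z

def mse(x, x_hat):
    """Reconstruction loss for real-valued inputs."""
    return np.mean((x - x_hat) ** 2)

x = np.array([1.0, 2.0, 3.0, 4.0])
x_hat = decoder(encoder(x))       # x' = decoder(encoder(x))
loss = mse(x, x_hat)              # the quantity training would minimize
```

Note that dim(z) = 2 is smaller than dim(x) = 4, so the network cannot simply copy its input; it must compress.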
The encoder maps unseen data to the low-dimensional space Z, and a good representation ensures that no important information is lost during compression. Autoencoders are quite similar to Principal Component Analysis (PCA), which is itself a dimensionality reduction algorithm; the difference is that PCA is linear in nature, while autoencoders are nonlinear thanks to their neural-network-based architecture. To better understand the latent space, consider an example where the observed variables x are something like (number of people at the beach, ice cream sales, daily temperature), while the latent variable z is something like the tilt of the Earth's axis (i.e., the season of the year): using the season alone, we can roughly predict the number of visitors at the beach, ice cream sales, and so on.
Autoencoders in the early research papers
Here are a few of the early research papers that introduced AEs to the world of machine learning:
 A Learning Algorithm for Boltzmann Machines, D. H. Ackley, G. E. Hinton, T. J. Sejnowski. Cognitive Science, 1985. Describes a simple neural network trained with self-supervision.
 Learning representations by back-propagating errors, D. E. Rumelhart, Geoffrey E. Hinton, R. J. Williams. Nature, 1986. “We describe a new learning procedure, back-propagation, for networks of neurone-like units.”
 Connectionist Learning Procedures, G. E. Hinton. Machine Learning, 1990. Describes a “self-supervised” neural network.
Introduction to variational autoencoders
Variational autoencoders (VAEs) are autoencoders that exploit sampling and Kullback-Leibler regularization. VAEs are intended to make the latent space smoother: a small change in x should result in a small change in the latent z, and a small change in z should result in a small change in x. The latent space has to be smooth, with plausible points, to be effective and accurate, and that is what a VAE tries to achieve. In a VAE, the encoder does not output z directly but rather mu and sigma; a sampling step then draws z from these parameters, and the decoder takes z as before.
A good sampling technique provides a good reconstruction of the data points, as well as of points close to them. The process ensures that any point close to the latent location where an input x was encoded (i.e., z_mean) can be decoded to something similar to x, forcing the latent space to be continuously meaningful: any two nearby points in the latent space decode to very similar outputs. Continuity, combined with the low dimensionality of the latent space, forces each direction in the latent space to encode a meaningful axis of variation in the data, making the latent space highly structured and thus well suited to manipulation via concept vectors. The sampling pseudocode is shown below:
z_mean, z_log_variance = encoder(x)
z = z_mean + exp(0.5 * z_log_variance) * epsilon
x_hat = decoder(z)
model = Model(x, x_hat)
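The sampling step can be run concretely in NumPy. The z_mean and z_log_var values below are made up for illustration (in a real VAE they come from the encoder network); exp(0.5 * z_log_var) converts the log-variance into a standard deviation:

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up encoder outputs for a 2-D latent space (illustrative only).
z_mean    = np.array([0.5, -1.0])
z_log_var = np.array([0.1,  0.3])

# Reparameterization trick: z = mu + sigma * epsilon, epsilon ~ N(0, I).
# Sampling epsilon externally keeps the path from z_mean/z_log_var to z
# differentiable, which is what lets gradients flow through this step.
epsilon = rng.standard_normal(z_mean.shape)
z = z_mean + np.exp(0.5 * z_log_var) * epsilon
```

Each forward pass draws a fresh epsilon, so the same input x maps to slightly different z values, which is exactly what smooths the latent space.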
In a VAE, we want the data to be normally distributed in 𝑧 space, in particular as a standard multivariate normal 𝑁(0, 1). When using the decoder, we can then be confident that every such point corresponds to a typical point: there are no “holes”, because the encoder works hard to compress the data and so does not waste space.
Regularization
“VAE parameters are trained via two loss functions: a reconstruction loss that forces the decoded samples to match the initial inputs, and a regularization loss that helps learn well-formed latent spaces and reduces overfitting to the training data.” – Chollet
The regularization loss pushes the encoder to place the data in a normal distribution in the latent space.
Kullback-Leibler Divergence
The Kullback-Leibler divergence KL(𝑝 ‖ 𝑞) is a statistical measure of the difference between a pair of distributions 𝑝 and 𝑞. It is a large number when 𝑝 and 𝑞 are dissimilar, and approaches zero as they become alike.
The KL loss is a regularization term in our VAE loss. As always, we can adjust the regularization strength by multiplying the KL term by a scalar. If it is too strong, our model will collapse; if it is too weak, the model will be equivalent to a classic AE.
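For a diagonal Gaussian encoder and a standard-normal prior, this KL term has a well-known closed form, sketched below; the beta weighting scalar and the example mu/log-variance values are hypothetical choices for illustration:

```python
import numpy as np

def kl_to_standard_normal(z_mean, z_log_var):
    """KL( N(mu, sigma^2) || N(0, I) ) for a diagonal Gaussian:
    -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)."""
    return -0.5 * np.sum(1.0 + z_log_var - z_mean**2 - np.exp(z_log_var))

# The KL term is exactly zero when the encoder already outputs N(0, I)...
kl_zero = kl_to_standard_normal(np.zeros(2), np.zeros(2))

# ...and grows as mu and sigma drift away from the prior.
beta = 0.5  # hypothetical scalar controlling regularization strength
kl = beta * kl_to_standard_normal(np.array([0.5, -1.0]),
                                  np.array([0.1, 0.3]))
```

Scaling this term by beta is exactly the knob described above: large beta pushes the latent codes toward the prior, small beta recovers a classic AE.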
Remember: VAE = AE + sampling + KL loss
1. The sampling procedure draws each point z from a multivariate normal 𝑁(𝜇, 𝛴).
2. The regularization procedure adds a loss term that pushes the latent distribution to resemble a standard multivariate normal 𝑁(0, 1).
3. Usually, dim(𝑧) is small compared with dim(𝑥). How small? “As small as possible” without unduly increasing the reconstruction error; as always, it depends on the downstream task. If dim(𝑧) = 2, we can visualize the latent space easily, but this is usually too extreme: the training data cannot be reconstructed well. If dim(𝑧) > 2, we can use 𝑡-SNE or UMAP for visualization.
VAE variables
 Beta-VAE (stronger regularization toward disentangled representations).
 Contractive AE (aims at smoothness via a different regularization).
 Conditional VAE (the decoder maps (𝑧, 𝑐) → 𝑥, where 𝑐 is chosen by the user; for example, 𝑐 specifies the digit to generate and 𝑧 specifies the style).
Reference: Keras Autoencoder
Autoencoder applications
 Image denoising (Conv AE)
 Time-series anomaly detection (1D Conv AE)
 Network intrusion detection via anomaly detection (VAE, encoder only)
 Generating video-game levels and music (Conv VAE, decoder only)
Conclusion
Finally, I feel that autoencoders and variational autoencoders are among the most powerful unsupervised learning techniques, and every data scientist should be aware of them. These models do have their limitations, such as requiring relatively large datasets for training.
The media shown in this article is not owned by Analytics Vidhya and is used at the author’s discretion.