# Feed Forward Neural Networks – Intuition in Forward Propagation

Neural networks can be thought of as functions that map inputs to outputs. In theory, no matter how complex the target function is, a neural network should be able to approximate it. However, most, if not all, of supervised learning is about learning a function that maps **X** to **Y** and then using this function to find the appropriate **Y** for a **new X**. If so, what is the difference between traditional machine learning algorithms and neural networks? The answer is *inductive bias*. The term may sound new, but it is nothing more than the assumptions we make about the relationship between X and Y before fitting a machine learning model to the data.

For example, if we think that the relationship between X and Y is linear, we can use linear regression. The inductive bias of linear regression is that the relationship between X and Y is linear. Hence, it fits a line or hyperplane to the data.

But when the relationship between X and Y is complex and nonlinear, linear regression may not do a great job of predicting Y. In this case, we may need a curve or a multidimensional curve to approximate that relationship. The main advantage of neural networks is that their inductive bias is very weak, and therefore, no matter how complex this relationship or function is, the network is in principle able to approximate it.
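To make the point about inductive bias concrete, here is a minimal sketch that fits a straight line to one linear and one curved dataset and compares the errors. The data, the helper name `line_fit_error`, and the use of NumPy's `polyfit` are illustrative assumptions, not something taken from the original post.

```python
import numpy as np

x = np.linspace(-3, 3, 61)
y_linear = 2.0 * x + 1.0   # a relationship that really is linear
y_curved = x ** 2          # a simple nonlinear relationship

def line_fit_error(x, y):
    # Least-squares fit of a straight line (degree-1 polynomial).
    slope, intercept = np.polyfit(x, y, deg=1)
    pred = slope * x + intercept
    # Mean squared error of the fitted line on the same data.
    return np.mean((y - pred) ** 2)

err_linear = line_fit_error(x, y_linear)   # essentially zero
err_curved = line_fit_error(x, y_curved)   # clearly not zero

print(err_linear, err_curved)
```

The line matches the linear data almost perfectly but cannot follow the parabola, because the "everything is a line" assumption is built into the model.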

However, depending on the complexity of the function, we may have to manually set the number of layers in the network and the number of neurons in each layer. This is usually done through trial and error and experience. Because these values are chosen by us rather than learned from the data, they are called hyperparameters.

Neural networks are nothing but complex machines to fit a curve. – Josh Starmer

## Architecture and working of neural networks

Before we see why neural networks work, it helps to show what neural networks do. And before looking at the architecture of a neural network, we need to look at what a single neuron does.

Each input of an artificial neuron has a weight attached to it. The inputs are first multiplied by their respective weights, the products are summed, and a bias is added to the result. We can call this a weighted sum. The weighted sum then goes through the activation function, which is basically a nonlinear function.
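The computation of a single neuron described above can be sketched in a few lines of NumPy. The post does not fix a particular activation function, so the sigmoid used here is an assumption, and the input, weight, and bias values are purely illustrative.

```python
import numpy as np

def sigmoid(z):
    # One common nonlinear activation (an assumption; the post does not name one).
    return 1.0 / (1.0 + np.exp(-z))

def neuron(inputs, weights, bias):
    # Weighted sum: each input times its own weight, summed, plus the bias.
    z = np.dot(inputs, weights) + bias
    # The weighted sum is then passed through the activation function.
    return sigmoid(z)

x = np.array([0.5, -1.0, 2.0])    # illustrative inputs
w = np.array([0.4, 0.3, -0.2])    # illustrative weights, one per input
b = 0.1                           # illustrative bias

out = neuron(x, w, b)
print(out)
```

Without the call to `sigmoid`, the function would compute exactly what a linear regression model computes for one sample.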

Therefore, an artificial neuron can be considered a simple or multiple linear regression model with an activation function at the end. Having said that, let’s move on to the architecture of the neural network.

A neural network usually has multiple layers with each layer containing multiple neurons, where all neurons from one layer are connected to all neurons in the next layer and so on.

In Figure 1.2, we have 4 layers. The first layer is the input layer, which looks like it contains 6 neurons but is actually just the data given as input to the neural network *(there are 6 neurons because the input data probably has 6 columns)*. The last layer is the output layer. The number of neurons in the first and final layers is predetermined by the data set and the type of problem *(number of output classes etc.)*. The number of hidden layers and the number of neurons in each hidden layer are chosen by trial and error.

A neuron in layer *i* takes the outputs of all neurons in layer *i-1* as input, calculates the weighted sum, adds a bias to it, and finally sends it through the activation function, as seen above in the case of the artificial neuron. The first neuron of the first hidden layer is connected to all the inputs from the previous layer (*the input layer*). Similarly, the second neuron of the first hidden layer is also connected to all the inputs from the previous layer, and so on for all neurons in the first hidden layer. For neurons in the second hidden layer, the outputs of the previous hidden layer are the inputs, and each of these neurons is likewise connected to all neurons in the previous layer.

A layer with m neurons, preceded by a layer with n neurons, will have n * m + m *(including bias)* connections, each carrying a weight. These weights are randomly initialized, and during training they move toward values that minimize the loss function of our choice. We will see how these weights are learned in detail in the next blog.
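As a quick check of the n * m + m formula, here is a small sketch that counts the weights and biases layer by layer. The helper name `count_parameters` is hypothetical, and the layer sizes 6, 4, 3, 1 are the ones used for the worked example later in this post.

```python
def count_parameters(n, m):
    # n * m weights (one per connection) plus m biases (one per neuron).
    return n * m + m

# Layer sizes of the example network: input, two hidden layers, output.
layer_sizes = [6, 4, 3, 1]

# Pair each layer with the one that follows it: (6, 4), (4, 3), (3, 1).
per_layer = [
    count_parameters(n, m)
    for n, m in zip(layer_sizes[:-1], layer_sizes[1:])
]
total = sum(per_layer)

print(per_layer, total)  # [28, 15, 4] 47
```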

## Example of forward propagation

Let’s consider the neural network we have in Figure 1.2 and show how forward propagation works with this network for a better understanding. We can see that there are 6 neurons in the input layer, which means there are 6 inputs.

*Note: For calculation purposes, I do not include the biases. If biases were included, there would simply be an extra input i0 whose value is always 1, and an extra row at the beginning of the weight matrix: w01, w02, …, w04.*
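The note about biases can be verified numerically. The sketch below, with randomly generated values as a stand-in for real data, checks that adding a bias vector explicitly gives the same result as prepending a constant 1 to the input and a bias row to the top of the weight matrix. All variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 6))   # 1 * 6 input
W = rng.normal(size=(6, 4))   # 6 * 4 weight matrix
b = rng.normal(size=(1, 4))   # one bias per neuron in the first hidden layer

# Version 1: add the bias explicitly after the matrix multiplication.
z_explicit = x @ W + b

# Version 2: fold the bias into the matrices.
x_aug = np.hstack([np.ones((1, 1)), x])   # extra input fixed at 1, shape 1 * 7
W_aug = np.vstack([b, W])                 # bias row on top, shape 7 * 4
z_folded = x_aug @ W_aug

print(np.allclose(z_explicit, z_folded))  # True
```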

Let the input be i = [i1, i2, i3, i4, i5, i6]. We can see that the first hidden layer has 4 neurons. Therefore, there will be **6 * 4** connections (without bias) between the input layer and the first hidden layer. These connections are represented in green in the weight matrix below with w_ij values, which represent the weight of the connection between the *i*th neuron of the input layer and the *j*th neuron of the first hidden layer. If we multiply *(matrix multiplication)* the *1 * 6* input matrix by the *6 * 4* weight matrix, we get the output of the first hidden layer, which is *1 * 4*. This makes sense because there are exactly 4 neurons in the first hidden layer.

These four outputs are represented in red in Fig. 2.1. Once we have these values, we send them through the activation function to introduce nonlinearity, and the resulting values are the final output of the first hidden layer.
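The first-layer step described above can be sketched as follows. The random values stand in for real inputs and weights, the sigmoid activation is an assumption, and NumPy indices are 0-based, so `W1[r, c]` plays the role of w_ij with r = i - 1 and c = j - 1.

```python
import numpy as np

def sigmoid(z):
    # Assumed activation function; the post does not specify one.
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)
inputs = rng.normal(size=(1, 6))   # 1 * 6 input matrix
W1 = rng.normal(size=(6, 4))       # 6 * 4 weight matrix, no bias

z1 = inputs @ W1    # weighted sums, shape 1 * 4
h1 = sigmoid(z1)    # output of the first hidden layer, shape 1 * 4

# Column c of W1 holds all the weights going into hidden neuron c,
# so one entry of z1 is just one neuron's weighted sum.
first_neuron_sum = np.dot(inputs[0], W1[:, 0])

print(z1.shape, h1.shape)
```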

Now, we repeat the same steps for the second hidden layer, which has its own weight matrix.

Here i1, i2, etc. are simply the outputs of the previous layer; I am using the same variable names for ease of understanding. As before, the 1 * 4 input matrix is multiplied by the 4 * 3 weight matrix *(because the second hidden layer has 3 neurons)*, which produces a 1 * 3 matrix. The activations of the individual elements of that matrix will be the input for the next layer.

Take a guess as to what the weight matrix will look like for the final layer.

Since the final layer contains only 1 neuron and the previous layer has 3 outputs, the weight matrix will be of size **3 * 1**. This marks the end of forward propagation in a simple feed-forward neural network.
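Putting the three steps together, a minimal sketch of the whole forward pass for the 6, 4, 3, 1 network might look like this. Biases are left out, as in the walkthrough. The sigmoid activation, its use on the output layer, and the random weights are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    # Assumed activation function; the post does not specify one.
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights):
    # Repeat the same step for every layer: matrix multiply, then activate.
    a = x
    for W in weights:
        a = sigmoid(a @ W)
    return a

rng = np.random.default_rng(0)
layer_sizes = [6, 4, 3, 1]

# One randomly initialized weight matrix per pair of adjacent layers:
# 6 * 4, then 4 * 3, then 3 * 1.
weights = [
    rng.normal(size=(n, m))
    for n, m in zip(layer_sizes[:-1], layer_sizes[1:])
]

x = rng.normal(size=(1, 6))   # a single 1 * 6 input
y = forward(x, weights)       # final output, shape 1 * 1

print([W.shape for W in weights], y.shape)
```

Whether the last layer should apply an activation, and which one, depends on the problem; it is applied here only to keep the loop uniform.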

## Why does this approach work?

We have already seen that what each neuron in the network does is not much different from linear regression. In addition, the neuron applies an activation function at the end, and each neuron has a different weight vector. But why does this work?

We have now seen how the calculation works. But my main goal with this blog is to shed some light on why this approach works. In theory, neural networks should be able to approximate any continuous function, however complex and nonlinear it may be. I will do my best to convince you, and myself, that with the right parameters *(weights and biases)*, the network should be able to learn anything in the way we saw above.

## The importance of nonlinearity

Before we go any further, we need to understand the power of nonlinearity. When we add two or more linear objects such as lines, planes, or hyperplanes, the output is also a linear object: a line, plane, or hyperplane respectively. No matter in what proportions we add these linear objects, we still get a linear object.
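This claim is easy to check numerically. The sketch below mixes two lines in arbitrary proportions and confirms that the result is again a line whose slope and intercept are the same mix of the original slopes and intercepts. The specific slopes, intercepts, and proportions are illustrative assumptions.

```python
import numpy as np

x = np.linspace(-5, 5, 11)
line1 = 2.0 * x + 1.0     # slope 2, intercept 1
line2 = -0.5 * x + 3.0    # slope -0.5, intercept 3

# Add the two lines in arbitrary proportions a and b.
a, b = 0.7, 1.8
combined = a * line1 + b * line2

# The result should be the line with the mixed slope and intercept.
slope = a * 2.0 + b * (-0.5)
intercept = a * 1.0 + b * 3.0
same_line = np.allclose(combined, slope * x + intercept)

print(same_line)  # True
```

Changing `a` and `b` only changes the slope and intercept; it never bends the line.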

But this is not the case for addition between nonlinear objects. When we add two different curves, we’ll likely get a more complex curve. If we can add different parts of these nonlinear curves in different proportions, we should somehow be able to influence the shape of the resulting curve.
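By contrast, here is a small sketch of what happens when two nonlinear curves are combined. It uses two shifted sigmoid curves, which is an assumed choice for illustration, and treats subtraction as adding with a negative proportion. Each curve on its own only rises, yet their combination forms a bump, and changing the proportions changes the shape.

```python
import numpy as np

def sigmoid(z):
    # Assumed nonlinear curve for this illustration.
    return 1.0 / (1.0 + np.exp(-z))

x = np.linspace(-8, 8, 161)
curve1 = sigmoid(x + 2)   # an S-shaped curve centered at x = -2
curve2 = sigmoid(x - 2)   # the same curve shifted to x = 2

# Proportions (1, -1): the result rises and then falls, forming a bump.
bump = 1.0 * curve1 - 1.0 * curve2

# Proportions (1, -0.5): the right side no longer returns to zero.
skewed = 1.0 * curve1 - 0.5 * curve2

print(x[np.argmax(bump)], bump.max())
```

Plotting `bump` and `skewed` against `x` shows two shapes that neither sigmoid could produce alone.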