Facial emotion detection using CNN – News Couple

Facial emotion detection using CNN

This article was published as part of the Data Science Blogathon.

A comprehensive guide to model training and deployment for detecting facial emotions using a webcam.

In our previous article, we covered how to detect emotions in text, which is useful for many use cases; you can read that article here. While emotion detection from text is valuable, the industry is now also focusing on another area: facial emotion detection. Detecting emotions from images is useful for applications such as detecting driver drowsiness or monitoring student behavior.

In this article, we are going to cover this interesting application of computer vision. Computer vision is progressing rapidly, and big tech companies are building models that behave in more human-like ways. To do that, machines must be able to detect your feelings and respond accordingly.

This article shows you how to build a model with TensorFlow that can tell your emotion from a photo or a live webcam feed.

The topics we will discuss in this article are:

  • Getting the data
  • Data preparation
  • Image augmentation
  • Building and training the model
  • Using the webcam for detection

Getting started with facial emotion detection

So let’s dive into the implementation part of facial emotion detection.

Getting the data

We will use the FER-2013 dataset, which is publicly available on Kaggle. It contains 48x48 px images of faces with their emotion labels.

This dataset contains 7 emotions: (0 = angry, 1 = disgust, 2 = fear, 3 = happy, 4 = sad, 5 = surprised, 6 = neutral)

Start by importing pandas and some basic libraries, and then load the dataset.

import matplotlib.pyplot as plt
import numpy as np
import scipy
import pandas as pd

df = pd.read_csv('../input/facial-expression-recognitionferchallenge/fer2013/fer2013/fer2013.csv')

This dataset contains 3 columns: emotion, pixels, and Usage. The emotion column contains the emotion encoded as an integer, the pixels column contains the pixel values as a string separated by spaces, and the Usage column indicates whether the row is intended for training or testing.
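To make the column layout concrete, here is a minimal sketch that builds one synthetic row in the same three-column layout and decodes its pixel string into a 48x48 array. The pixel values are random rather than taken from FER-2013, and the helper name `decode_pixels` is hypothetical.

```python
import numpy as np

def decode_pixels(pixel_string, size=48):
    """Turn a space-separated pixel string into a (size, size) uint8 image."""
    values = np.array([int(v) for v in pixel_string.split(" ")], dtype=np.uint8)
    return values.reshape(size, size)

# Synthetic row mimicking the three columns (random pixels, for illustration only)
rng = np.random.default_rng(0)
fake_pixels = rng.integers(0, 256, size=48 * 48)
row = {
    "emotion": 3,
    "pixels": " ".join(str(v) for v in fake_pixels),
    "Usage": "Training",
}

image = decode_pixels(row["pixels"])
print(image.shape, image.dtype, row["emotion"], row["Usage"])
```

Each row of the real CSV could be decoded the same way before it is appended to the training or test list.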

Data preparation

As you can see, the data is not yet in a usable format, so we need to pre-process it. Here X_train and X_test contain the pixels, while y_train and y_test contain the emotions.

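X_train = []
y_train = []
X_test = []
y_test = []
for index, row in df.iterrows():
    # split the space-separated pixel string into a list of values
    k = row['pixels'].split(" ")
    if row['Usage'] == 'Training':
        X_train.append(np.array(k))
        y_train.append(row['emotion'])
    elif row['Usage'] == 'PublicTest':
        X_test.append(np.array(k))
        y_test.append(row['emotion'])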

At this stage, X_train and X_test hold the pixel values as strings. Converting them to numbers is easy: we just need to typecast.

X_train = np.array(X_train, dtype="uint8")
y_train = np.array(y_train, dtype="uint8")
X_test = np.array(X_test, dtype="uint8")
y_test = np.array(y_test, dtype="uint8")

y_test and y_train contain 1D integer-encoded labels, which we need to convert to categorical (one-hot) data for effective training.

import keras
from keras.utils import to_categorical
y_train= to_categorical(y_train, num_classes=7)
y_test = to_categorical(y_test, num_classes=7)

num_classes = 7 indicates that we have 7 classes to classify.
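As an illustration of what one-hot encoding produces, the following NumPy sketch builds the same kind of matrix by hand. It is conceptually equivalent to the conversion above, not the Keras implementation, and the helper name `one_hot` is hypothetical.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float32 matrix with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

sample_labels = [0, 3, 6]  # angry, happy, neutral
encoded = one_hot(sample_labels, num_classes=7)
print(encoded)
```

Each row has a 1 in the column of its class and 0 elsewhere, which is the target format that categorical cross-entropy expects.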

Reshaping the data

You need to reshape the data into a four-dimensional tensor of shape (row_num, height, width, channel) for training.

X_train = X_train.reshape(X_train.shape[0], 48, 48, 1)
X_test = X_test.reshape(X_test.shape[0], 48, 48, 1)

The 1 here tells us that the training data is in grayscale. At this point, we have successfully preprocessed our data into X_train, X_test, y_train, and y_test.
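A small sketch of the same reshape on dummy data may help to check the resulting layout. The array below is made up (5 blank images) and only illustrates how flat rows of 2304 values map onto 48x48x1 tensors.

```python
import numpy as np

num_images = 5  # illustrative batch, not real data
flat = np.zeros((num_images, 48 * 48), dtype="uint8")
flat[0, :48] = 255  # mark the first 48 values of the first image

tensor = flat.reshape(flat.shape[0], 48, 48, 1)
print(tensor.shape)
# Row-major reshape: the first 48 values become the top row of the image
print(tensor[0, 0, :, 0].min(), tensor[0, 1, :, 0].max())
```

The first print shows (5, 48, 48, 1); the second shows 255 and 0, confirming that only the top pixel row of the first image was marked.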

Image augmentation for facial emotion detection

Image data augmentation is used to improve the performance and generalization ability of the model. It is generally a good idea to apply some data augmentation before passing the data to the model, which can be done using the ImageDataGenerator provided by Keras.

from keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(
    rescale = 1./255,
    rotation_range = 10,
    horizontal_flip = True,
    fill_mode = 'nearest')  # 'nearest' is the Keras default fill mode
testgen = ImageDataGenerator(rescale=1./255)
batch_size = 64
  • rescale: normalizes the pixel values by dividing them by 255.
  • horizontal_flip: randomly flips the image horizontally.
  • fill_mode: fills in the pixels that become empty after a transformation such as a rotation.
  • rotation_range: randomly rotates the image by up to the given number of degrees (10 here).

When testing the data, we will only apply rescaling (normalization).
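To show what rescaling and horizontal flipping do to a single image, here is a plain NumPy sketch. It is not how Keras implements augmentation internally; the image is synthetic and the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic 48x48 grayscale image with values in [0, 255]
image = rng.integers(0, 256, size=(48, 48, 1)).astype("float32")

def rescale(img, factor=1.0 / 255):
    """Map pixel values from [0, 255] to [0, 1]."""
    return img * factor

def horizontal_flip(img):
    """Mirror the image left to right (axis 1 is the width axis)."""
    return img[:, ::-1, :]

augmented = horizontal_flip(rescale(image))
print(augmented.shape, float(augmented.min()) >= 0.0, float(augmented.max()) <= 1.0)
```

Flipping a face horizontally does not change its emotion label, which is why this transformation is a safe choice for this task.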

Fitting the generator to our data

We will use a batch size of 64: after fitting our data to the image generators, the data will be produced in batches of 64. Using a data generator is a good way to train on a large amount of data.

train_flow = datagen.flow(X_train, y_train, batch_size=batch_size) 
test_flow = testgen.flow(X_test, y_test, batch_size=batch_size)

train_flow contains X_train and y_train, while test_flow contains X_test and y_test.
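The idea behind a batch flow can be sketched with a plain Python generator. This is a simplified stand-in without augmentation, not the Keras iterator itself; `batch_flow` and the demo arrays are hypothetical.

```python
import numpy as np

def batch_flow(X, y, batch_size, seed=0):
    """Yield shuffled (X_batch, y_batch) pairs forever, one epoch after another."""
    rng = np.random.default_rng(seed)
    n = len(X)
    while True:
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            yield X[idx], y[idx]

X_demo = np.zeros((150, 48, 48, 1), dtype="float32")
y_demo = np.zeros((150, 7), dtype="float32")
flow = batch_flow(X_demo, y_demo, batch_size=64)
sizes = [next(flow)[0].shape[0] for _ in range(3)]
print(sizes)  # the last batch of an epoch may be smaller than batch_size
```

Because batches are produced on demand, only one batch of images has to be transformed and held in memory at a time.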

Building a facial emotion detection model using CNN

We design the CNN model for emotion detection using the functional API. We create blocks using Conv2D, BatchNormalization, MaxPooling2D, and Dropout layers, stack them together, and finish with a Flatten layer followed by a Dense layer for the output. You can read more on how to design CNN models.

Building the model using a functional API gives more flexibility.

from keras.utils import plot_model
from keras.models import Model
from keras.layers import Input, Dense, Flatten, Dropout, BatchNormalization
from keras.layers import Conv2D, MaxPooling2D, concatenate
from keras.optimizers import Adam, SGD
from keras.regularizers import l1, l2
from matplotlib import pyplot as plt
from sklearn.metrics import confusion_matrix

FER_Model takes the input shape and returns the model, ready for training. Now let’s define the structure of the model.

def FER_Model(input_shape=(48,48,1)):
    # first input model
    visible = Input(shape=input_shape, name="input")
    num_classes = 7
    #the 1-st block
    conv1_1 = Conv2D(64, kernel_size=3, activation='relu', padding='same', name="conv1_1")(visible)
    conv1_1 = BatchNormalization()(conv1_1)
    conv1_2 = Conv2D(64, kernel_size=3, activation='relu', padding='same', name="conv1_2")(conv1_1)
    conv1_2 = BatchNormalization()(conv1_2)
    pool1_1 = MaxPooling2D(pool_size=(2,2), name="pool1_1")(conv1_2)
    drop1_1 = Dropout(0.3, name="drop1_1")(pool1_1)
    #the 2-nd block
    conv2_1 = Conv2D(128, kernel_size=3, activation='relu', padding='same', name="conv2_1")(drop1_1)
    conv2_1 = BatchNormalization()(conv2_1)
    conv2_2 = Conv2D(128, kernel_size=3, activation='relu', padding='same', name="conv2_2")(conv2_1)
    conv2_2 = BatchNormalization()(conv2_2)
    conv2_3 = Conv2D(128, kernel_size=3, activation='relu', padding='same', name="conv2_3")(conv2_2)
    conv2_3 = BatchNormalization()(conv2_3)
    pool2_1 = MaxPooling2D(pool_size=(2,2), name="pool2_1")(conv2_3)
    drop2_1 = Dropout(0.3, name="drop2_1")(pool2_1)
    #the 3-rd block
    conv3_1 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name="conv3_1")(drop2_1)
    conv3_1 = BatchNormalization()(conv3_1)
    conv3_2 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name="conv3_2")(conv3_1)
    conv3_2 = BatchNormalization()(conv3_2)
    conv3_3 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name="conv3_3")(conv3_2)
    conv3_3 = BatchNormalization()(conv3_3)
    conv3_4 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name="conv3_4")(conv3_3)
    conv3_4 = BatchNormalization()(conv3_4)
    pool3_1 = MaxPooling2D(pool_size=(2,2), name="pool3_1")(conv3_4)
    drop3_1 = Dropout(0.3, name="drop3_1")(pool3_1)
    #the 4-th block
    conv4_1 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name="conv4_1")(drop3_1)
    conv4_1 = BatchNormalization()(conv4_1)
    conv4_2 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name="conv4_2")(conv4_1)
    conv4_2 = BatchNormalization()(conv4_2)
    conv4_3 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name="conv4_3")(conv4_2)
    conv4_3 = BatchNormalization()(conv4_3)
    conv4_4 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name="conv4_4")(conv4_3)
    conv4_4 = BatchNormalization()(conv4_4)
    pool4_1 = MaxPooling2D(pool_size=(2,2), name="pool4_1")(conv4_4)
    drop4_1 = Dropout(0.3, name="drop4_1")(pool4_1)
    #the 5-th block
    conv5_1 = Conv2D(512, kernel_size=3, activation='relu', padding='same', name="conv5_1")(drop4_1)
    conv5_1 = BatchNormalization()(conv5_1)
    conv5_2 = Conv2D(512, kernel_size=3, activation='relu', padding='same', name="conv5_2")(conv5_1)
    conv5_2 = BatchNormalization()(conv5_2)
    conv5_3 = Conv2D(512, kernel_size=3, activation='relu', padding='same', name="conv5_3")(conv5_2)
    conv5_3 = BatchNormalization()(conv5_3)
    conv5_4 = Conv2D(512, kernel_size=3, activation='relu', padding='same', name="conv5_4")(conv5_3)
    conv5_4 = BatchNormalization()(conv5_4)
    pool5_1 = MaxPooling2D(pool_size=(2,2), name="pool5_1")(conv5_4)
    drop5_1 = Dropout(0.3, name="drop5_1")(pool5_1)
    #Flatten and output
    flatten = Flatten(name="flatten")(drop5_1)
    output = Dense(num_classes, activation='softmax', name="output")(flatten)
    # create model
    model = Model(inputs=visible, outputs=output)
    # summary layers
    model.summary()
    return model
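To see why the Flatten layer ends up small, it helps to trace the spatial size of the feature maps through the five blocks. The sketch below is plain arithmetic rather than Keras code; it assumes 'same' padding for the convolutions and the default non-overlapping 2x2 max pooling, and the helper name is hypothetical.

```python
def trace_feature_maps(input_size=48, num_blocks=5, pool=2):
    """Spatial size after each block: 'same' convs keep it, each 2x2 pool halves it."""
    sizes = [input_size]
    for _ in range(num_blocks):
        sizes.append(sizes[-1] // pool)
    return sizes

sizes = trace_feature_maps()
flatten_units = sizes[-1] * sizes[-1] * 512  # 512 filters in the last block
dense_params = flatten_units * 7 + 7         # weights plus biases of the output layer
print(sizes, flatten_units, dense_params)
```

The sizes go 48, 24, 12, 6, 3, 1, so the flattened vector has 512 values and the output layer has 3591 parameters.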

Compiling the facial emotion detection model

We compile the model using the Adam optimizer with lr=0.0001. The decay factor gradually lowers the learning rate as training progresses, which helps once the accuracy of the model stops improving.

model = FER_Model()
opt = Adam(lr=0.0001, decay=1e-6)
model.compile(loss="categorical_crossentropy", optimizer=opt, metrics=['accuracy'])
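As a rough illustration of the decay argument, the sketch below assumes the time-based rule used by the legacy Keras optimizers, where the rate shrinks with the number of batch updates. Newer Keras releases handle decay through learning-rate schedules instead, so treat this as an assumption about the behavior rather than a description of your installed version. The function name is hypothetical.

```python
def decayed_lr(initial_lr, decay, iterations):
    """Time-based decay: lr shrinks as 1 / (1 + decay * iterations)."""
    return initial_lr / (1.0 + decay * iterations)

initial_lr, decay = 1e-4, 1e-6
for step in (0, 10_000, 50_000):
    print(step, decayed_lr(initial_lr, decay, step))
```

With a decay this small, the learning rate drops only by a few percent over tens of thousands of updates, so it acts as a gentle slowdown rather than a sharp schedule.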

Facial emotion detection model training

To train the model, you need to write the following line of code.

num_epochs = 100
history = model.fit_generator(train_flow,
                    steps_per_epoch=len(X_train) / batch_size,
                    epochs=num_epochs,
                    validation_data=test_flow,
                    validation_steps=len(X_test) / batch_size)

  • steps_per_epoch = TotalTrainingSamples / TrainingBatchSize
  • validation_steps = TotalValidationSamples / ValidationBatchSize
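The two formulas above can be sketched as a small helper. Rounding up with `math.ceil` is a choice made here so that a partial last batch is still counted and the step count is an integer; the sample counts are hypothetical and only illustrate the arithmetic.

```python
import math

def steps_for(num_samples, batch_size):
    """Number of batches needed to cover every sample once."""
    return math.ceil(num_samples / batch_size)

batch_size = 64
# Hypothetical sample counts, used only to illustrate the arithmetic
print(steps_for(1000, batch_size))  # 16 batches: 15 full ones plus a partial one
print(steps_for(640, batch_size))   # exactly 10 full batches
```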

Training takes at least 20 minutes for 100 epochs.


Save the model

Save the structure of our model as JSON and the model weights as an .h5 file.

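model_json = model.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)
# save the weights to HDF5 (the file name here is illustrative)
model.save_weights("model.h5")
print("Saved model to disk")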

Download the saved model and weights into a directory.

Test the model with a webcam feed

In this part, we will test our model in real time using face detection.

Load the saved model

Let’s start by loading the trained model structure and weights so that they can be used further to make predictions.

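from tensorflow.keras.models import model_from_json
model = model_from_json(open("model_arch.json", "r").read())
# load the trained weights (use the file name you saved them under; this one is illustrative)
model.load_weights("model.h5")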

Download the Haar cascade for face detection

We use a Haar cascade to find the position of the faces in the frame; once we have the position, we crop the faces.

haarcascade_frontalface_default.xml can be downloaded using the link.

import cv2
face_haar_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
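Once the detector returns a bounding box, the crop itself is plain array slicing. A box near the frame border can push a padded crop out of bounds, so the sketch below clamps the slice to the frame. The function name `crop_face`, the margin value, and the synthetic frame are assumptions for illustration.

```python
import numpy as np

def crop_face(gray_image, x, y, w, h, margin=5):
    """Crop a face box with a margin, clamped so slices never go out of bounds."""
    height, width = gray_image.shape[:2]
    top, bottom = max(y - margin, 0), min(y + h + margin, height)
    left, right = max(x - margin, 0), min(x + w + margin, width)
    return gray_image[top:bottom, left:right]

frame_gray = np.zeros((100, 120), dtype="uint8")  # synthetic grayscale frame
inner = crop_face(frame_gray, x=30, y=20, w=40, h=40)
edge = crop_face(frame_gray, x=2, y=1, w=40, h=40)
print(inner.shape, edge.shape)
```

The face in the middle of the frame gets the full 50x50 padded crop, while the one near the corner is trimmed to 46x47 instead of producing an empty or wrapped-around slice.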

Read frames and apply preprocessing with OpenCV

Use OpenCV to read frames and manipulate images.

from tensorflow.keras.preprocessing.image import img_to_array
cap = cv2.VideoCapture(0)
while cap.isOpened():
    res, frame = cap.read()
    if not res:  # stop when no frame could be read
        break
    height, width, channel = frame.shape
    gray_image = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_haar_cascade.detectMultiScale(gray_image)
    try:
        for (x, y, w, h) in faces:
            cv2.rectangle(frame, pt1=(x, y), pt2=(x+w, y+h), color=(255, 0, 0), thickness=2)
            roi_gray = gray_image[y-5:y+h+5, x-5:x+w+5]
            # the model expects 48x48 inputs
            roi_gray = cv2.resize(roi_gray, (48, 48))
            image_pixels = img_to_array(roi_gray)
            image_pixels = np.expand_dims(image_pixels, axis=0)
            image_pixels /= 255
            predictions = model.predict(image_pixels)
            max_index = np.argmax(predictions[0])
            emotion_detection = ('angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral')
            emotion_prediction = emotion_detection[max_index]
    except cv2.error:
        # skip faces whose padded crop falls outside the frame
        pass
  • emotion_prediction holds the predicted emotion label.
  • The test images are normalized by dividing them by 255.
  • np.expand_dims converts the three-dimensional array into a four-dimensional tensor.
  • (x, y, w, h) are the coordinates of each face in the input frame.
  • The Haar cascade only takes grayscale images.
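The step from the model's output vector to a readable label can be sketched on its own. The probabilities below are made up rather than produced by the trained model, and the helper name `decode_prediction` is hypothetical.

```python
import numpy as np

EMOTIONS = ('angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral')

def decode_prediction(probabilities):
    """Return (label, confidence in percent) for one softmax output vector."""
    probabilities = np.asarray(probabilities, dtype=float)
    index = int(np.argmax(probabilities))
    return EMOTIONS[index], round(float(probabilities[index]) * 100, 1)

# Made-up softmax output for a single face, not a real model prediction
fake_output = [0.05, 0.01, 0.04, 0.70, 0.10, 0.05, 0.05]
label, confidence = decode_prediction(fake_output)
print(label, confidence)
```

The order of the tuple must match the integer encoding of the dataset (0 = angry up to 6 = neutral), otherwise the labels shown on screen will be wrong.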

Adding an overlay

Adding an overlay on the output window and displaying the prediction with confidence gives a better look.

cap = cv2.VideoCapture(1)  # camera index; 0 is usually the default webcam
while cap.isOpened():
    res, frame = cap.read()
    if not res:  # stop when no frame could be read
        break
    height, width, channel = frame.shape
    #---------------------------------------------------------------------------
    # Creating an overlay window to write prediction and confidence
    sub_img = frame[0:int(height/6), 0:int(width)]
    black_rect = np.ones(sub_img.shape, dtype=np.uint8)*0
    res = cv2.addWeighted(sub_img, 0.77, black_rect, 0.23, 0)
    FONT = cv2.FONT_HERSHEY_SIMPLEX  # assumed font; the original definition is missing
    FONT_SCALE = 0.8
    FONT_THICKNESS = 2  # assumed thickness; the original definition is missing
    lable_color = (10, 10, 255)
    lable = "Emotion Detection made by Abhishek"
    lable_dimension = cv2.getTextSize(lable, FONT, FONT_SCALE, FONT_THICKNESS)[0]
    textX = int((res.shape[1] - lable_dimension[0]) / 2)
    textY = int((res.shape[0] + lable_dimension[1]) / 2)
    cv2.putText(res, lable, (textX, textY), FONT, FONT_SCALE, (0, 0, 0), FONT_THICKNESS)
    # prediction part --------------------------------------------------------------------------
    gray_image = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_haar_cascade.detectMultiScale(gray_image)
    try:
        for (x, y, w, h) in faces:
            cv2.rectangle(frame, pt1=(x, y), pt2=(x+w, y+h), color=(255, 0, 0), thickness=2)
            roi_gray = gray_image[y-5:y+h+5, x-5:x+w+5]
            # the model expects 48x48 inputs
            roi_gray = cv2.resize(roi_gray, (48, 48))
            image_pixels = img_to_array(roi_gray)
            image_pixels = np.expand_dims(image_pixels, axis=0)
            image_pixels /= 255
            predictions = model.predict(image_pixels)
            max_index = np.argmax(predictions[0])
            emotion_detection = ('angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral')
            emotion_prediction = emotion_detection[max_index]
            cv2.putText(res, "Sentiment: {}".format(emotion_prediction), (0, textY+22+5), FONT, 0.7, lable_color, 2)
            lable_violation = 'Confidence: {}'.format(str(np.round(np.max(predictions[0])*100, 1)) + "%")
            violation_text_dimension = cv2.getTextSize(lable_violation, FONT, FONT_SCALE, FONT_THICKNESS)[0]
            violation_x_axis = int(res.shape[1] - violation_text_dimension[0])
            cv2.putText(res, lable_violation, (violation_x_axis, textY+22+5), FONT, 0.7, lable_color, 2)
    except cv2.error:
        # skip faces whose padded crop falls outside the frame
        pass
    frame[0:int(height/6), 0:int(width)] = res
    cv2.imshow('frame', frame)
    # press q to quit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
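The overlay relies on two small pieces of arithmetic: a weighted blend of the top strip with a black rectangle, and centering the title text. The NumPy sketch below mirrors that arithmetic conceptually; it is not the OpenCV implementation, and the function names and the synthetic strip are hypothetical.

```python
import numpy as np

def blend(src1, alpha, src2, beta, gamma=0.0):
    """Weighted sum of two uint8 images, clipped back into the 0-255 range."""
    mixed = src1.astype("float32") * alpha + src2.astype("float32") * beta + gamma
    return np.clip(np.rint(mixed), 0, 255).astype("uint8")

def centered_origin(canvas_shape, text_size):
    """Bottom-left text origin that centres a (width, height) text box on the canvas."""
    text_w, text_h = text_size
    return (canvas_shape[1] - text_w) // 2, (canvas_shape[0] + text_h) // 2

strip = np.full((80, 640, 3), 200, dtype="uint8")  # synthetic top strip of a frame
black = np.zeros_like(strip)
overlay = blend(strip, 0.77, black, 0.23)
print(overlay[0, 0], centered_origin(overlay.shape, (300, 20)))
```

Blending with black at these weights simply darkens the strip to 77% of its brightness, which gives the text a more readable background without hiding the video entirely.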

Now run it!



In this article, we have seen how to preprocess the data, design a network capable of classifying emotions, and then use OpenCV to detect faces and pass them to the model for prediction.

You can improve accuracy further by:

  • Using pre-trained models such as VGG-16, Resnet, etc.
  • Using a stacked (ensemble) model
  • Making further improvements to the model

Download the source code from here.

Thanks for reading the article, please share it if you liked this article!

The media described in this article is not owned by Analytics Vidhya and is used at the author’s discretion.
