Pose detection is an active area of study in the field of computer vision. There are literally hundreds of research papers and many models that try to solve the problem of pose detection. Many machine learning enthusiasts are drawn to pose estimation because of its versatility and usefulness. In this article, we will cover one application of pose detection and estimation using machine learning and some very useful Python libraries.
What is Pose Estimation?
Pose estimation is a computer vision technique for tracking the movements of a person or an object. This is usually done by finding the locations of key points of the given object. Based on these key points, we can compare different movements and postures and draw insights. Pose estimation is actively used in augmented reality, animation, gaming, and robotics.
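To make the idea of comparing postures through key points a little more concrete, here is a minimal sketch that computes the angle at a joint from three key points. The function name joint_angle and the sample coordinates are illustrative assumptions, not part of any library; a real application would feed in key points predicted by a pose model.

```python
import math


def joint_angle(a, b, c):
    """Return the angle at point b (in degrees) between the segments b->a and b->c."""
    # Vectors from the joint b to its two neighbouring key points
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("neighbouring key points must not coincide with the joint")
    # Clamp to [-1, 1] to guard against floating-point drift before acos
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))


# Hypothetical (x, y) key points for a shoulder, an elbow and a wrist
shoulder, elbow, wrist = (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)
elbow_angle = joint_angle(shoulder, elbow, wrist)
print(round(elbow_angle, 1))  # 90.0
```

Two postures can then be compared by looking at how the same joint angle differs between them, which is one simple way to turn key points into an interpretable feature.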
Many models exist today for pose estimation. Some of the commonly used ones are:
- OpenPose
- PoseNet
- BlazePose
- DeepPose
- DensePose
- DeepCut
The choice of any model over another may depend entirely on the application. Also, factors such as runtime, model size, and ease of implementation can be different reasons for choosing a particular model. Therefore, it is better to know your requirements from the beginning and choose the model accordingly.
In this article, we will use BlazePose to detect the human pose and extract key points. The model can be easily implemented through a very useful library known as MediaPipe.
MediaPipe
MediaPipe is an open-source, cross-platform framework for building multimodal machine learning pipelines. It can be used to implement advanced models such as human face detection, multi-hand tracking, hair segmentation, object detection and tracking, and so on.
Where most pose detectors rely on the COCO topology of 17 key points, the BlazePose detector predicts 33 human key points covering the torso, arms, legs, and face. Including more key points is essential for domain-specific applications of pose estimation, such as those focusing on hands, face, and feet. Each key point is predicted with three degrees of freedom along with a visibility score. BlazePose is a lightweight model that can be used for real-time applications with accuracy better than most existing models. The model is available in two versions, BlazePose Lite and BlazePose Full, to provide a balance between speed and accuracy.
BlazePose enables many applications, including fitness and yoga trackers. These applications can be implemented using an additional classifier such as the one we are going to build in this article.
You can learn more about the BlazePose detector here.
2D vs 3D Pose Estimation
Pose estimation can be done in either 2D or 3D. 2D pose estimation predicts the key points in the image in terms of pixel coordinates, whereas 3D pose estimation predicts the three-dimensional spatial arrangement of the key points as its output.
Preparing the Dataset for Pose Estimation
We learned in the previous section that the key points of a human pose can be used to compare different postures. In this section, we will prepare the dataset using the MediaPipe library itself. We will take images of two yoga poses, extract the key points from them, and store them in a CSV file.
You can download the dataset from Kaggle through this link. The dataset consists of 5 yoga poses; however, in this article, I cover only two. You can use all of them if you like; the procedure remains the same.
import os

import cv2
import mediapipe as mp
import numpy as np
import pandas as pd

mpPose = mp.solutions.pose
pose = mpPose.Pose(static_image_mode=True)  # treat every image independently
mpDraw = mp.solutions.drawing_utils  # For drawing keypoints
points = mpPose.PoseLandmark  # Landmarks
path = "DATASET/TRAIN/plank"  # enter dataset path

# Build the column names: four attributes for each of the 33 landmarks
data = []
for p in points:
    x = p.name  # landmark name, e.g. "NOSE"
    data.append(x + "_x")
    data.append(x + "_y")
    data.append(x + "_z")
    data.append(x + "_vis")
data = pd.DataFrame(columns=data)  # Empty dataset
In the above code snippet, we first import the libraries that will help build the dataset. Then, in the next four lines, we initialize the MediaPipe modules needed to extract the key points, along with the drawing utilities. Next, we create an empty pandas DataFrame and set its columns. The columns cover the thirty-three key points detected by the BlazePose detector. Each key point has four attributes: the x and y coordinates of the key point (normalized to the range 0 to 1 by the image width and height), the z coordinate, which represents the landmark depth with the midpoint of the hips as the origin and uses roughly the same scale as x, and finally the visibility score. The visibility score is the probability that the landmark is visible in the image.
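Because the x and y values are normalized, they have to be scaled by the image size before they can be used as pixel positions, for example when drawing by hand. The helper below is a minimal sketch of that conversion; the name to_pixel and the sample values are assumptions made for illustration and are not part of MediaPipe.

```python
def to_pixel(x_norm, y_norm, image_width, image_height):
    """Map normalized (x, y) coordinates to integer pixel coordinates."""
    # Clamp so that landmarks predicted slightly outside the frame stay in bounds
    px = min(max(int(x_norm * image_width), 0), image_width - 1)
    py = min(max(int(y_norm * image_height), 0), image_height - 1)
    return px, py


# Hypothetical landmark at the horizontal centre of a 640x480 image
nose_px = to_pixel(0.5, 0.25, 640, 480)
print(nose_px)  # (320, 120)
```

Note that x is scaled by the width and y by the height, which is easy to mix up because an OpenCV image shape lists the height first.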
count = 0
for fileName in os.listdir(path):
    temp = []
    img = cv2.imread(path + "/" + fileName)
    if img is None:  # skip files that OpenCV cannot read as images
        continue
    imageHeight, imageWidth = img.shape[:2]  # shape is (rows, columns, channels)
    imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    blackie = np.zeros(img.shape, dtype=np.uint8)  # Blank image
    results = pose.process(imgRGB)
    if results.pose_landmarks:
        # mpDraw.draw_landmarks(img, results.pose_landmarks, mpPose.POSE_CONNECTIONS)  # draw landmarks on image
        mpDraw.draw_landmarks(blackie, results.pose_landmarks, mpPose.POSE_CONNECTIONS)  # draw landmarks on blackie
        landmarks = results.pose_landmarks.landmark
        for i, j in zip(points, landmarks):
            temp = temp + [j.x, j.y, j.z, j.visibility]
        data.loc[count] = temp  # one row of 132 values per image
        count += 1
    cv2.imshow("Image", img)
    cv2.imshow("blackie", blackie)
    cv2.waitKey(100)
data.to_csv("dataset3.csv")  # save the data as a csv file
In the above code, we iterate through the pose images one by one, extract the key points using the BlazePose model, and store them in a temporary list 'temp'. After each image is processed, we append this temporary list as a new record in our dataset. You can also visualize the landmarks by using the drawing utilities that ship with MediaPipe. In the above code, the landmarks are drawn on a blank image 'blackie' to focus on the results of the BlazePose model only; uncomment the first draw_landmarks call to draw them on the original image as well. The blank image 'blackie' has the same shape as the given image. One thing to note is that the BlazePose model takes RGB images, whereas OpenCV reads images as BGR.
After getting the key points of all the images, we have to add a target value that will act as the label for our machine learning model. You can set the target value to 0 for the first pose and 1 for the other. After that, we can save this data to a CSV file, which we will use to create a machine learning model in the later steps.
You can see what the dataset looks like in the above image.
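The labelling step described above could look roughly like the following sketch. The two small DataFrames are hypothetical stand-ins for the key point tables of the two poses (real tables would have 132 feature columns), and the variable names are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the key point tables of the two poses;
# real tables would have 132 feature columns instead of two.
plank = pd.DataFrame({"NOSE_x": [0.51, 0.48], "NOSE_y": [0.30, 0.33]})
goddess = pd.DataFrame({"NOSE_x": [0.50], "NOSE_y": [0.12]})

# Label every row of the first pose 0 and every row of the second pose 1
plank["target"] = 0
goddess["target"] = 1

# Stack both tables into one dataset with a fresh 0..n-1 index
dataset = pd.concat([plank, goddess], ignore_index=True)
print(dataset["target"].tolist())  # [0, 0, 1]
```

The combined table can then be written out with to_csv, so that the feature columns and the target column end up in the same file.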
Creating the Pose Estimation model
Now that we have created our dataset, we just have to pick a machine learning algorithm to classify the poses. In this step, we will take an image, run the BlazePose model (the one we used earlier to create the dataset) to get the key points of the person in that image, and run our classifier on that test case. The classifier is expected to predict the correct pose. In this article, I am going to use the SVC (Support Vector Classifier) from the sklearn library to perform the classification task.
import cv2
import mediapipe as mp
import pandas as pd
from sklearn.svm import SVC

# The first CSV column is the index that to_csv writes by default
data = pd.read_csv("dataset3.csv", index_col=0)
X, Y = data.iloc[:, :132], data["target"]  # 33 key points x 4 attributes
model = SVC(kernel="poly")
model.fit(X.values, Y)

mpPose = mp.solutions.pose
pose = mpPose.Pose(static_image_mode=True)
mpDraw = mp.solutions.drawing_utils

path = "enter image path"
img = cv2.imread(path)
imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # BlazePose expects RGB input
results = pose.process(imgRGB)
if results.pose_landmarks:
    landmarks = results.pose_landmarks.landmark
    temp = []
    for j in landmarks:
        temp = temp + [j.x, j.y, j.z, j.visibility]
    y = model.predict([temp])
    if y[0] == 0:
        asan = "plank"
    else:
        asan = "goddess"
    print(asan)
    cv2.putText(img, asan, (50, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 0), 3)
    cv2.imshow("image", img)
    cv2.waitKey(0)  # keep the window open until a key is pressed
In the above lines of code, we first import the SVC (Support Vector Classifier) from the sklearn library. We train the SVC on the dataset we created earlier, with the target variable as the label Y. Then we read the input image and extract the key points, the same way we did when creating the dataset. Finally, we pass the temporary list to the model to make the prediction. The pose can then be reported using a simple if-else condition.
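The classifier above is trained on the whole dataset and tried on a single image. If you want a rough idea of how well it generalizes, one common option is to hold out part of the data for testing. The sketch below illustrates this with scikit-learn's train_test_split; the data is synthetic (two well-separated random clusters with 132 features) and only stands in for the real key point dataset, so the numbers it prints say nothing about real yoga images.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the key point dataset: 40 samples x 132 features,
# two well-separated clusters playing the role of the two poses.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.3, 0.02, (20, 132)), rng.normal(0.7, 0.02, (20, 132))])
y = np.array([0] * 20 + [1] * 20)

# Keep 25% of the samples aside; stratify keeps both poses in each split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = SVC(kernel="poly")
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)  # fraction of held-out samples classified correctly
print(f"held-out accuracy: {accuracy:.2f}")
```

With the real dataset, X and y would come from the feature columns and the target column of the CSV file instead of the random generator.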
From the pictures above, you can see that the model has correctly classified the pose. You can also see the pose detected by the BlazePose model on the right side. In the first picture, if you observe closely, some key points are not visible, yet the pose is still classified correctly. This may be due to the visibility score that the BlazePose model provides for each key point.
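To check which key points the model considers hidden in a given image, the visibility scores can be inspected directly. The following is a small sketch under stated assumptions: the helper name low_visibility_indices, the threshold of 0.5, and the sample tuples are all illustrative choices and do not come from MediaPipe.

```python
def low_visibility_indices(landmarks, threshold=0.5):
    """Return the indices of key points whose visibility score is below threshold."""
    return [i for i, (_x, _y, _z, vis) in enumerate(landmarks) if vis < threshold]


# Hypothetical (x, y, z, visibility) tuples for three key points
sample = [(0.5, 0.2, -0.1, 0.99), (0.4, 0.6, 0.0, 0.12), (0.6, 0.6, 0.0, 0.80)]
hidden = low_visibility_indices(sample)
print(hidden)  # [1]
```

With real results, the tuples would be built from the x, y, z and visibility fields of each detected landmark.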
Pose detection is an active area of research in machine learning and offers many real-world applications. In this article, we worked on one of these applications and got our hands dirty with pose detection. We learned about pose detection and several models that can be used for it. We chose the BlazePose model for our purpose and got to know its pros and cons compared with the other models. In the end, we built a classifier for yoga poses using a support vector classifier from the sklearn library. We also built our own dataset for this purpose, which can easily be extended with more images.
Thank you. I hope you enjoyed reading the article.
Also check out the rest of my articles at https://www.analyticsvidhya.com/blog/author/ayush417/
Contact me on LinkedIn https://www.linkedin.com/in/ayush-gupta-5b9091174/