This article was published as a part of the Data Science Blogathon.
Introduction to Tensorflow 2.x
Suppose we want to focus on a particular part of an image than what concept will help us. Yes, it is a saliency map. It mainly focuses on specific pixels of images while noting others.
The saliency map of an image represents the most prominent and focused pixel of an image. Sometimes, the brighter pixels of an image tell us about the salient of the pixels. This means the brightness of the pixel is directly proportional to the saliency of an image.
Suppose we want to give attention to a particular part of an image, like wanting to focus on the image of a bird rather than the other parts like the sky, nest etc. Then by computing the saliency map, we will achieve this. It will crucially assist in reducing the computational cost. It is usually a grayscale image but can be converted into another format of a colored image depending upon our visual comfortability. Saliency maps are also termed “heat maps” since the hotness/brightness of the image has an impactful effect on identifying the class of the object. The saliency map aims to determine the areas in the fovea that are salient or observable at every place and to influence the decision of attentive regions based on the spatial pattern of saliency. It is employed in a variety of Visual Attention models. The “ITTI and Koch” Computational Framework of Visual Attention is built on the notion of a saliency map.
How to Compute Saliency Map with Tensorflow?
The saliency map can be calculated by taking the derivatives of the class probability Pk with respect to the input image X.
saliency_map = dpk/dX
Hang on a minute! That seems quite familiar! Yes, it’s the same backpropagation which we use for training the model. We only need to take one more step: the gradient does not stop at the first layer of our network. Rather, we must return it to the input image X.
As a result, saliency maps provide a suitable characterization for each input pixel in accordance with a specific class prediction Pi. Pixels that are significant for flower prediction should cluster around flower pixels. Otherwise, something really strange is going on with the trained model.
The advantage of saliency maps is that, since they depend exclusively on gradient computations, many commonly used deep learning models can provide us with saliency maps for free. We don’t need to modify the network architectures at all; we simply need to tweak the gradient calculations slightly.
Different Types of the Saliency Map
1. Static Saliency: It focuses on every static pixel of the image to calculate the important area of interest for saliency map analysis.
2. Motion Saliency: It focuses on the dynamic features of video data. The saliency map in a video is computed by calculating the optical flow of a video. The moving entities/objects are considered salient objects.
We will perform step-by-step investigations of the ResNet50 architecture, which has been pre-trained on ImageNet. But you can take other pretrained deep learning models or bring your own trained model. We will illustrate how to develop a basic saliency map utilizing the most famous DL model in TensorFlow 2.x. We used the Wikimedia image as a test image during the tutorials.
We begin by creating a ResNet50 with ImageNet weights. With the simple helper functions, we import the image on the disc and prepare it for feeding to the ResNet50.
# Import necessary packages import tensorflow as tf import numpy as np import matplotlib.pyplot as plt def input_img(path): image = tf.image.decode_png(tf.io.read_file(path)) image = tf.expand_dims(image, axis=0) image = tf.cast(image, tf.float32) image = tf.image.resize(image, [224,224]) return image def normalize_image(img): grads_norm = img[:,:,0]+ img[:,:,1]+ img[:,:,2] grads_norm = (grads_norm - tf.reduce_min(grads_norm))/ (tf.reduce_max(grads_norm)- tf.reduce_min(grads_norm)) return grads_norm def get_image(): import urllib.request filename="image.jpg" img_url = r"https://upload.wikimedia.org/wikipedia/commons/d/d7/White_stork_%28Ciconia_ciconia%29_on_nest.jpg" urll)
test_model = tf.keras.applications.resnet50.ResNet50() #test_model.summary() get_image() img_path = "image.jpg" input_img = input_img(img_path) input_img = tf.keras.applications.densenet.preprocess_input(input_img) plt.imshow(normalize_image(input_img), cmap = "ocean")
result = test_model(input_img) max_idx = tf.argmax(result,axis = 1) tf.keras.applications.imagenet_utils.decode_predictions(result.numpy())
with tf.GradientTape() as tape: tape.watch(input_img) result = test_model(input_img) max_score = result[0,max_idx] grads = tape.gradient(max_score, input_img) plot_maps(normalize_image(grads), normalize_image(input_img))
fig2: (1) Saliency_map, (2) input_image, (3) overlayed_image
In this blog, we have defined the saliency map in different aspects. We have added a pictorial representation to understand the term “saliency map” deeply. Also, we have understood it by implementing it in python using TensorFlow API. The results seem promising and can be easily understandable.
In this article, you have learned about the saliency map for an image with tensorflow.
A python code is also implemented on an image t
The mathematical background of the saliency map is also covered in this article.
That’s it! Congratulations, you just computed a saliency map.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.