This article was published as part of the Data Science Blogathon.
Brief introduction to transfer learning
The most pervasive problem in machine learning relates to data: it may be insufficient, of low quality, or both. The obvious solution is to get more and better data, but quantity and quality rarely come together; we usually have to sacrifice one for the other. Fortunately, there is a smarter alternative: transfer learning.
Transfer learning is a way of reusing an already trained model for another task. The original training step is called pre-training. The general idea is that pre-training “teaches” the model general-purpose features, while the final training phase fine-tunes those features on our (limited) data.
Transfer learning is particularly useful in areas such as medicine, where lack of data remains a perennial problem. Several CNN models pre-trained on ImageNet data have been shown to be successful in various medical tasks. All it takes is a few lines of code to transfer such a model to medical data.
In this article, we will learn how to do that using TensorFlow, the world’s most used deep learning platform (as of 2021). Before we delve into the code, let’s have a quick summary of TensorFlow and the Keras API that powers it.
TensorFlow and the Keras API
TensorFlow is a comprehensive platform for building and publishing ML models. We are only interested in building models, not publishing them, and for that we will use Keras. Keras is an API designed for “humans, not machines,” as its developers put it. This means that Keras is designed for programmers like us who want to build custom models. Its simple, easy-to-remember syntax makes it almost addictive.
While the Keras API is available as a standalone Python library, it also ships as part of the TensorFlow library. Using tensorflow.keras rather than standalone Keras is recommended, as it is maintained by the TensorFlow team, ensuring consistency with other TensorFlow modules.
Case Study: Binary Image Classification
As a first example, we will tackle binary image classification. Our dataset will be Hot Dog – Not Hot Dog from Kaggle, and we will try to predict – you guessed it – whether the presented image is a hot dog or not.
For this, we will use the ResNet50 model pre-trained on the ImageNet dataset. ResNet refers to a family of architectures that use residual connections to solve the degradation problem – the drop in accuracy that occurs as plain networks grow deeper.
The figure above depicts a residual connection. The connection skips one (or more) layers and adds the identity to the block’s output: F(x) + x. This slight modification of the network architecture has been hugely successful against the degradation problem. As a result, ResNet architectures can reach depths of 1,000 layers. Our specific choice, ResNet50, is a relatively shallow member of the family. You can see its overall structure in the following figure:
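To make the idea concrete, here is a minimal NumPy sketch (not part of the article’s own code) of what a residual connection computes: the block’s transformation F(x) is added to its unmodified input x. The toy function f below is a made-up stand-in for the block’s layers.

```python
import numpy as np

def f(x, w):
    # Stand-in for the layers inside a residual block
    # (in a real ResNet this would be convolutions + batch norm).
    return np.maximum(0, x @ w)  # linear map followed by ReLU

x = np.array([1.0, -2.0, 3.0])
w = np.eye(3) * 0.5

residual_output = f(x, w) + x  # the skip connection adds the identity
print(residual_output)  # F(x) + x = [1.5, -2.0, 4.5]
```

Even if f learns to output zeros, the block reduces to the identity mapping, which is one intuition for why very deep residual stacks remain trainable.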
There are alternatives to the ResNet family: MobileNet, Inception, and others have also proven successful in image classification. You can pick one of these, or an entirely different network, and do transfer learning on that.
I’ll be working on Google Colab, which I’d recommend to anyone whose computer isn’t up to the job, although it’s not a strict requirement. You can run the code in any environment of your choice, including Jupyter Notebook or PyCharm.
Let’s go through the process step by step.
Note: This step may vary depending on your preferred environment.
```python
# Upload the kaggle API key
from google.colab import files
files.upload()
```

```python
! mkdir ~/.kaggle
! cp kaggle.json ~/.kaggle/
! chmod 600 ~/.kaggle/kaggle.json
```

```python
# Install the kaggle package
! pip install -q kaggle
```

```python
# Download the dataset from Kaggle
! kaggle datasets download -d dansbecker/hot-dog-not-hot-dog
```

```python
# Import the necessary packages
import tensorflow as tf
from tensorflow import keras
from PIL import Image
import os
import numpy as np
```
Load data for transfer learning with TensorFlow
```python
# Unzip the downloaded zip file
!unzip /content/hot-dog-not-hot-dog.zip
```
```python
# Let's check the sizes of the images
for image in os.listdir("/content/train/not_hot_dog"):
    a = Image.open(f"/content/train/not_hot_dog/{image}")
    print(np.asarray(a).shape)
```
This is only part of the output, but we can already see that the image sizes are not fixed. ImageDataGenerator deals with this kind of problem, among many other things.
Image data is basically a set of numbers. A colour image is represented by three two-dimensional arrays, each holding values between 0 and 255 (this may vary). The three values at a given position (one from each array) combine to give the colour of a pixel. In our case, the images have the shape (512, 512, 3): 512 * 512 = 262,144 pixels and 3 channels. (As we said earlier, not all of them come in 512 x 512 size, but we will deal with that.)
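As a small illustration (using a synthetic array, not an image from the dataset), a colour image is just a (height, width, 3) array of 0–255 values, and one pixel’s colour is the triple taken across the three channels:

```python
import numpy as np

# A synthetic 512 x 512 RGB "image": three channels of 8-bit values
img = np.random.randint(0, 256, size=(512, 512, 3), dtype=np.uint8)

print(img.shape)                     # (512, 512, 3)
print(img.shape[0] * img.shape[1])   # 262144 pixels
print(img[0, 0])                     # the (R, G, B) triple of the top-left pixel
```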
```python
# Create ImageDataGenerator objects
train_datagen = tf.keras.preprocessing.image.ImageDataGenerator()
test_datagen = tf.keras.preprocessing.image.ImageDataGenerator()

# Assign the image directories to them
train_data_generator = train_datagen.flow_from_directory(
    "/content/train",
    target_size=(512, 512)
)

test_data_generator = test_datagen.flow_from_directory(
    "/content/test",
    target_size=(512, 512)
)
```
The ImageDataGenerator objects feed data in batches to our model when needed. This lets us work directly with data stored on the hard drive, without overloading the RAM. train_data_generator and test_data_generator will be passed as arguments to the x and validation_data parameters, respectively. Since ImageDataGenerator infers the classes from the folder names, we don’t need the y parameter. (If you try to pass an argument to y, Python will raise an error.)
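Conceptually, a generator like ImageDataGenerator hands the model one batch at a time instead of the whole dataset at once. A simplified pure-Python sketch of that idea (the data here is a made-up stand-in for images loaded from disk):

```python
import numpy as np

def batch_generator(samples, batch_size):
    """Yield successive batches; only one batch lives in memory at a time."""
    for start in range(0, len(samples), batch_size):
        yield np.asarray(samples[start:start + batch_size])

data = list(range(10))            # stand-ins for images on disk
batches = list(batch_generator(data, batch_size=4))
print([len(b) for b in batches])  # [4, 4, 2]
```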
Now that we have our train and test data set, we can build and train our model.
First, we will load the Keras application for the ResNet50 model.
```python
resnet_50 = tf.keras.applications.resnet50.ResNet50(include_top=False, weights="imagenet")
resnet_50.trainable = False
```
include_top=False ensures that the final layer of the ResNet50 model is not loaded. weights="imagenet" loads the ImageNet weights. If we set weights=None instead, the weights are randomly initialized (in which case we would not be performing transfer learning). By setting the trainable attribute to False, we guarantee that the model’s original (ImageNet) weights remain frozen.
We need a binary classifier, but ResNet50’s final layers have more than two nodes. This means that we have to add the final layer manually. I used the Functional API, which can be tricky if you are a novice TensorFlow user. (In that case, I would suggest the Sequential API, which has a more straightforward syntax.)
```python
inputs = keras.Input(shape=(512, 512, 3))
x = resnet_50(inputs)
x = keras.layers.GlobalAveragePooling2D()(x)
outputs = keras.layers.Dense(2, activation="softmax")(x)

model = keras.Model(inputs=inputs, outputs=outputs, name="my_model")
model.compile(optimizer="Adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```
In these lines, we define our input, pass it to the resnet_50 model we created earlier, pass its output to a global average pooling layer, and then pass that output to a dense layer with two nodes (one per class). The activation function should be softmax in this case. The values of a softmax output vector always sum to 1: with two nodes (each representing a class), we have x1 + x2 = 1, where x1 and x2 are the class probabilities. (Alternatively, we could use a single node with the sigmoid activation function.) After all this, we compile the model by choosing an optimizer and a loss function. We can also add metrics to be measured during training. Finally, we can train our model.
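The equivalence between the two output heads is easy to verify numerically: for two classes, the softmax probability of the first class equals the sigmoid of the difference between the two logits, so a two-node softmax head and a one-node sigmoid head carry the same information. A quick NumPy check (with arbitrary example logits):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # shift for numerical stability
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

logits = np.array([2.0, -1.0])   # raw scores for the two classes
probs = softmax(logits)

print(np.isclose(probs.sum(), 1.0))                          # True: probabilities sum to 1
print(np.isclose(probs[0], sigmoid(logits[0] - logits[1])))  # True: softmax = sigmoid of logit gap
```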
```python
model.fit(train_data_generator, validation_data=test_data_generator, epochs=5)
```
That completes the transfer learning part. Optionally, you can fine-tune the model for better results.
In this article, we have learned how to implement transfer learning with the help of TensorFlow. Transfer learning is a powerful approach that lets us overcome a lack of data. However, it is not a silver bullet: there are cases where working with whatever data we have makes more sense and yields better results. It also has alternatives, data augmentation being a common one. Of course, the two are not mutually exclusive; different approaches can (and often should) be combined to solve a data problem.
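As a small taste of the data augmentation mentioned above, a horizontal flip is one of the simplest augmentations. A NumPy sketch on a tiny synthetic array (Keras’s ImageDataGenerator offers this and much more, e.g. via its horizontal_flip argument):

```python
import numpy as np

# Synthetic 2 x 2 RGB "image"
img = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)

flipped = img[:, ::-1, :]  # reverse the width axis: a horizontal flip

print(img[0, 0], flipped[0, 1])  # → [0 1 2] [0 1 2]: the same pixel, mirrored
```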
1 – https://www.toptal.com/machine-learning/tensorflow-machine-learning-tutorial
2 – https://github.com/tensorflow/tensorflow
3 – https://netbasequid.com/blog/social-analytics-hotdog/
4 – https://neurohive.io/en/popular-networks/resnet/
5 – https://www.researchgate.net/figure/Left-ResNet50-architecture-Blocks-with-dotted-line-represent-modules-that-might-be_fig3_331364877
6 – https://colab.research.google.com/drive/1pYVZtULa3pKncA7C2umg9LA5tqOCCos3?usp=sharing