It can be expensive and time-consuming to train a deep neural network from scratch. Even experienced data scientists have to try out many different model architectures and hyperparameters in order to generate a model with the right accuracy/cost trade-offs for the problem at hand. Therefore, it is common in domains such as computer vision and natural language processing to take pre-trained models developed for one setting and apply them to a different, but related setting. This is called transfer learning.
For example, we may have an existing model that can identify dogs and cats, and use it as the basis for training a new model that identifies different animals, say, cows and horses. This exploits the fact that different classes of images may share the same low-level attributes, like edges, shapes, and variations in shade and lighting.
Convolutional Neural Network Example
CNNs are a special kind of neural network that excels at image classification. Like other neural networks, they consist of input and output layers, with several hidden layers in between. Convolutional layers are dedicated to learning features of the data (edges, shapes, etc.), and subsequent layers are designed for classification (Figure 1).
Figure 1: Example of a CNN Showing Feature Layers and Classification Layers 
The idea behind transfer learning for CNNs is that the feature layers could be shared between different settings, with only the classification layers needing to be re-trained for the new setting. That is, the weights for the feature layers are frozen and do not need to be re-trained. This approach can produce accurate models while saving on training time.
Our simple transfer learning example uses the MNIST dataset, a well-known database of handwritten digits with a training set of 60,000 examples and a test set of 10,000 examples. Following the Keras MNIST transfer CNN example, we start by training a simple CNN to classify the first 5 digits (0, 1, 2, 3, 4). Then we freeze the convolutional feature layers and fine-tune the dense layers to classify the last 5 digits (5, 6, 7, 8, 9).
We use the Apache MADlib open source project, which supports deep learning with Keras and TensorFlow on Greenplum Database. The Jupyter notebook for this example is available in the Apache MADlib community artifacts repository.
Mini-batch gradient descent can perform better than stochastic gradient descent because it uses more than one training example at a time, typically resulting in faster and smoother convergence. After we load the training data into a table with one image per row, we call the MADlib image preprocessor to pack multiple images into each row so that the Keras optimizer can work on mini-batches. For example, for the training examples for the first 5 digits (0, 1, 2, 3, 4) the SQL is:
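A sketch of that preprocessor call, assuming the images for digits 0–4 have been loaded into a table named train_mnist_0to4 with an image-array column x and a label column y (all table and column names here are illustrative, not necessarily those used in the notebook):

```sql
-- Pack multiple images per row into mini-batches for the fit function.
SELECT madlib.training_preprocessor_dl(
    'train_mnist_0to4',         -- source table (one image per row)
    'train_mnist_0to4_packed',  -- output table of packed mini-batches
    'y',                        -- dependent variable: digit label
    'x',                        -- independent variable: image array
    NULL,                       -- buffer size (NULL = let MADlib choose)
    255                         -- normalizing constant to scale pixel values
);
```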
We define two groups of layers in the CNN: feature and classification, which are both trainable at this point. The Keras code in Python to create the model is:
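A sketch of the model definition, following the layer sizes used in the Keras MNIST transfer CNN example (32 filters, 3x3 kernels, a 128-unit dense layer); the exact values in the notebook may differ:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Activation, Conv2D, Dense, Dropout,
                                     Flatten, MaxPooling2D)

num_classes = 5            # digits 0-4 for the first task
input_shape = (28, 28, 1)  # MNIST images are 28x28 grayscale

# Feature layers: learn edges, shapes, and other low-level attributes.
feature_layers = [
    Conv2D(32, (3, 3), padding='valid', input_shape=input_shape),
    Activation('relu'),
    Conv2D(32, (3, 3)),
    Activation('relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),
    Flatten(),
]

# Classification layers: map the learned features to digit classes.
classification_layers = [
    Dense(128),
    Activation('relu'),
    Dropout(0.5),
    Dense(num_classes),
    Activation('softmax'),
]

model = Sequential(feature_layers + classification_layers)
model.summary()
```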
The resulting model consists of the feature layers followed by the classification layers, all of which are trainable at this point.
Next we freeze the feature layers to create the transfer model and load it into the model architecture table:
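A self-contained sketch of the freezing step (layer sizes are illustrative): the feature layers are marked non-trainable and the architecture is serialized to JSON, which can then be loaded into the model architecture table with madlib.load_keras_model:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Activation, Conv2D, Dense, Dropout,
                                     Flatten, MaxPooling2D)

feature_layers = [
    Conv2D(32, (3, 3), padding='valid', input_shape=(28, 28, 1)),
    Activation('relu'),
    Conv2D(32, (3, 3)),
    Activation('relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),
    Flatten(),
]
classification_layers = [
    Dense(128),
    Activation('relu'),
    Dropout(0.5),
    Dense(5),
    Activation('softmax'),
]
model = Sequential(feature_layers + classification_layers)

# Freeze the feature layers: their weights will not be updated
# when the transfer model is trained on the new digits.
for layer in feature_layers:
    layer.trainable = False

# JSON architecture to load into the model architecture table
# with madlib.load_keras_model().
transfer_model_json = model.to_json()

print([layer.trainable for layer in model.layers])
```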
Now the model architecture table contains the two models that we need: one fully trainable, and another with only the classification layers trainable.
Train the model for 5-digit classification (0,1,2,3,4) for 5 iterations using model_id=1:
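A sketch of that fit call, assuming the packed training table from the preprocessing step and an architecture table named model_arch_library; the output table name and the compile/fit parameters here are illustrative:

```sql
SELECT madlib.madlib_keras_fit(
    'train_mnist_0to4_packed',  -- packed training data for digits 0-4
    'mnist_model',              -- output table for the trained model
    'model_arch_library',       -- model architecture table
    1,                          -- model_id: fully trainable model
    $$ loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'] $$,  -- compile params
    $$ batch_size=128, epochs=1 $$,  -- fit params
    5                           -- number of iterations
);
```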
Now that we have the weights, we can train the dense layers for new classification task for 5 iterations using model_id=2:
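Assuming the trained weights from the first model have been copied into the architecture table row for model_id=2 (for example with madlib.load_keras_model), and that the images for digits 5–9 have been packed into a table named train_mnist_5to9_packed, the fine-tuning call might look like (table names and parameters are illustrative):

```sql
SELECT madlib.madlib_keras_fit(
    'train_mnist_5to9_packed',  -- packed training data for digits 5-9
    'mnist_transfer_model',     -- output table for the fine-tuned model
    'model_arch_library',       -- model architecture table
    2,                          -- model_id: only classification layers trainable
    $$ loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'] $$,  -- compile params
    $$ batch_size=128, epochs=1 $$,  -- fit params
    5                           -- number of iterations
);
```

Because the feature-layer weights are frozen, only the dense classification layers are updated, which is what makes this fine-tuning pass much cheaper than training from scratch.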
References

Goodfellow, Bengio and Courville, Deep Learning, p. 526.
Le Cun, Denker, Henderson, Howard, Hubbard and Jackel, "Handwritten digit recognition with a back-propagation network," in Proceedings of the Advances in Neural Information Processing Systems (NIPS), 1989, pp. 396–404.
The MNIST database of handwritten digits, http://yann.lecun.com/exdb/mnist/
MNIST transfer CNN, https://keras.io/examples/mnist_transfer_cnn/
GPU-Accelerated Deep Learning on Greenplum Database, https://content.pivotal.io/engineers/gpu-accelerated-deep-learning-on-greenplum-database
Apache MADlib community artifacts, https://github.com/apache/madlib-site/tree/asf-site/community-artifacts
Geoffrey Hinton with Nitish Srivastava and Kevin Swersky, Neural Networks for Machine Learning, Lectures 6a and 6b on mini-batch gradient descent, http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
Ready to take the next step? Learn more about Apache MADlib and Pivotal Greenplum:
Ready-to-use deep learning Jupyter notebooks with Apache MADlib.
The Greenplum Database YouTube channel has a wealth of educational and use case videos, including many on machine learning.
Watch Frank’s recording at FOSDEM on deep learning.
Get our new eBook “Data Warehousing with Greenplum, Second Edition.” It’s free!
Let’s talk about your needs. Contact us via the web, or email@example.com
About the Author: Frank McQuillan