This follows Section 3.5 of https://www.manning.com/books/deep-learning-with-python
Let's take a look at a multiclass classification task using Keras. We will use the famous Reuters newswire dataset: https://archive.ics.uci.edu/ml/datasets/Reuters-21578+Text+Categorization+Collection. Examples are short texts labeled by topic. There are 46 different topics, and we will build a topic classifier that, given a newswire's text, predicts its topic.
The Reuters dataset is a classic toy dataset for text classification and is included with Keras. Let's load the train and test data as provided within Keras.
from keras.datasets import reuters
(train_data, train_labels), (test_data, test_labels) = reuters.load_data(num_words=10000)
print(train_data.shape)  # (8982,) -- 8,982 training articles
print(test_data.shape)   # (2246,) -- 2,246 test articles
Each article is encoded as a list of integers, each integer being the index of a word in a dictionary.
train_data[0:2]
We will use a bag-of-words encoding for the text: each article is represented by a vector of 0s and 1s, where a 1 at position j indicates that word j occurs in the article.
import numpy as np
def vectorize_articles(articles, dimension=10000):
    # One row per article; entry (i, j) is 1 if word j occurs in article i
    result = np.zeros((len(articles), dimension))
    for i, article in enumerate(articles):
        result[i, article] = 1.  # set all word indices present in this article to 1
    return result
x_train = vectorize_articles(train_data)
x_test = vectorize_articles(test_data)
x_train[0,:]
Labels are currently given as integers indicating the topic; we will encode them using one-hot encoding.
from keras.utils.np_utils import to_categorical
one_hot_train_labels = to_categorical(train_labels)
one_hot_test_labels = to_categorical(test_labels)
one_hot_train_labels[1,:]
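For intuition, here is what to_categorical does on a toy label list (passing num_classes explicitly here; by default it is inferred from the largest label):
to_categorical([0, 2, 1], num_classes=3)
# array([[1., 0., 0.],
#        [0., 0., 1.],
#        [0., 1., 0.]])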
Let's build a network with two hidden layers of 64 units each, using relu activation functions, and an output layer of 46 units with a softmax activation function, since we are doing a multiclass classification task. We use 46 output units because that is the number of topics (categories) in the dataset.
from keras import models
from keras import layers
model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(46, activation='softmax'))
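To sanity-check the architecture, we can ask Keras for a layer-by-layer summary (layer names in the printout will vary by session; the parameter counts shown in the comments are easy to verify by hand):
model.summary()
# Dense 64: 10000 * 64 + 64 = 640,064 parameters
# Dense 64:    64 * 64 + 64 =   4,160 parameters
# Dense 46:    64 * 46 + 46 =   2,990 parameters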
With the model in place, we next need to decide how to train it: which loss function to use, which optimization method to use, and which other metrics to track during training.
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
rmsprop
: an extension of gradient descent that seeks to improve convergence by adaptively scaling gradient updates. Each update is divided by the square root of a running average of squared gradient magnitudes over recent iterations.
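Concretely, a standard formulation of rmsprop (with decay rate $\rho$, learning rate $\eta$, and small stability constant $\epsilon$ as assumed hyperparameter symbols) keeps a running average of squared gradients, $v_t = \rho v_{t-1} + (1-\rho) g_t^2$, and updates parameters as $\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{v_t} + \epsilon} g_t$, where $g_t$ is the current gradient.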
categorical_crossentropy
: a loss function used for multiclass classification tasks. Recall that the softmax activation function for each of the 46 output nodes is given by $f_k = \frac{e^{s_k}}{\sum_l e^{s_l}}$. We can interpret $f_k$ as the predicted probability that an example has label $k$. Cross entropy measures the deviation between the probability distribution given by the output layer and the observed distribution, which is 0 everywhere except at the observed label, where it is 1. So, letting $y_k$ be the one-hot encoding of the observed class, the loss function (for each example) is $-\sum_k y_k\log{f_k}$.
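To make these formulas concrete, here is a minimal NumPy sketch for a hypothetical 3-class example (the scores and label below are made up for illustration):
import numpy as np

s = np.array([2.0, 1.0, 0.1])    # raw output scores s_k for 3 classes
f = np.exp(s) / np.exp(s).sum()  # softmax: f_k = e^{s_k} / sum_l e^{s_l}
y = np.array([1., 0., 0.])       # one-hot label: the observed class is 0
loss = -np.sum(y * np.log(f))    # categorical crossentropy: -sum_k y_k log f_k
print(f, loss)                   # higher probability on the true class => lower loss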
Finally, by listing 'accuracy' in metrics, we indicate that we want to track prediction accuracy as we train the network.
Now we are ready to train the network. First, we set aside a validation set of 1,000 examples to measure validation loss and accuracy; this will help us decide how many epochs to train for while avoiding overfitting.
x_val = x_train[:1000]
partial_train_x = x_train[1000:]
y_val = one_hot_train_labels[:1000]
partial_train_y = one_hot_train_labels[1000:]
Now train the model
history = model.fit(partial_train_x, partial_train_y,
                    epochs=20, batch_size=512,
                    validation_data=(x_val, y_val))
Let's take a look at the loss and accuracy curves across epochs. First, the loss:
%matplotlib inline
import matplotlib.pyplot as plt
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(loss) + 1)
plt.plot(epochs, loss, 'bo', label="Training loss")
plt.plot(epochs, val_loss, 'b', label="Validation loss")
plt.title("Training and validation loss")
plt.xlabel('Epochs')
plt.ylabel('X-entropy loss')
plt.legend()
plt.show()
acc = history.history['acc']
val_acc = history.history['val_acc']
plt.plot(epochs, acc, 'bo', label="Accuracy")
plt.plot(epochs, val_acc, 'b', label="Validation accuracy")
plt.title("Training and validation accuracy")
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
Let's retrain the model for up to 8 epochs, since it starts overfitting after that point.
model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(46, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(partial_train_x, partial_train_y,
                    epochs=8, batch_size=512,
                    validation_data=(x_val, y_val))
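Picking the epoch count by eye works, but as a sketch of a more automatic alternative (using the standard keras.callbacks.EarlyStopping API; the patience value below is an arbitrary choice), we could let Keras stop training once validation loss stops improving:
from keras.callbacks import EarlyStopping

# Stop training once val_loss has failed to improve for 2 consecutive epochs
stopper = EarlyStopping(monitor='val_loss', patience=2)
model.fit(partial_train_x, partial_train_y,
          epochs=20, batch_size=512,
          validation_data=(x_val, y_val),
          callbacks=[stopper])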
Let's evaluate the results on the test set. evaluate returns the loss followed by the metrics we specified in compile (here, accuracy).
model.evaluate(x_test, one_hot_test_labels)
Since we are using a softmax activation function, we can generate "class probabilities" as predictions (as we also could with decision trees and ensemble methods).
predictions = model.predict(x_test)
predictions[0]
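Each row of predictions is a probability distribution over the 46 topics: its entries sum to (approximately) 1, and the index of the largest entry is the predicted class.
import numpy as np
print(predictions[0].sum())       # ~1.0, since softmax outputs a distribution
print(np.argmax(predictions[0]))  # index of the most probable topic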
Let's see what that article is
word_index = reuters.get_word_index()
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])
# Word indices are offset by 3 because 0, 1, and 2 are reserved
# for "padding", "start of sequence", and "unknown"
' '.join([reverse_word_index.get(i-3, '?') for i in test_data[0]])
Finally, let's see what happens when we shrink the second hidden layer to 16 units, creating an information bottleneck between the first 64-unit layer and the 46-way output.
model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dense(46, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(partial_train_x, partial_train_y,
                    epochs=20, batch_size=512,
                    validation_data=(x_val, y_val))
acc = history.history['acc']
val_acc = history.history['val_acc']
plt.plot(epochs, acc, 'bo', label="Accuracy")
plt.plot(epochs, val_acc, 'b', label="Validation accuracy")
plt.title("Training and validation accuracy")
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()