Part 5: Sentiment Classification with Deep Learning using Keras
In this final notebook, we implement sentiment analysis with modern deep learning techniques. These models often require longer training times, but they can produce better classification results when the right architecture and features are used.
This notebook covers the following:
Dense Layer Neural Network for Sentiment Classification
- 1. Implementation of Dense Layer in Keras
- 2. Activation and Optimizer Setup
- 3. Model Metric Specification
Recurrent Neural Networks for Sentiment Classification
- 1. Embedding layers with Keras
- 2. Recurrent Neural Network Architecture
- 3. Keras Model Implementation and Parameter Setting
0. Data Preparation
Before we implement any of the above, we need to perform the following operations on the review dataset.
- 1. Text Processing - Cleaning and Removing Punctuation and Numeric Values
- 2. Building a Word2Vec Model - Using Word Vectors for Feature Generation
- 3. Averaging Word Vectors across Each Review
Let's begin
import re
import numpy as np
import pandas as pd
from keras.preprocessing.text import text_to_word_sequence
from gensim.models import Word2Vec
0.1. Preprocessing Text
The function below implements basic preprocessing steps to clean the text ahead of developing word vectors.
def processText(text):
    """ Cleaning function: keep letters only, lowercase, and tokenize """
    text = re.sub('[^a-zA-Z]', ' ', text)   # drop punctuation and numeric characters
    words = text_to_word_sequence(text)     # Keras helper: lowercases and splits into tokens
    return " ".join(words)
review_data = pd.read_csv('restaurant_reviews.tsv', delimiter='\t')
review_data['clean_review'] = review_data.Review.apply( lambda x: processText(str(x)) )
review_data.head()
(table: first five rows of review_data, showing the Review, Liked, and new clean_review columns)
0.2. Building a Word2Vec Model: CBOW
The implementation below trains a CBOW word2vec model with a vector size of 500 and a window size of 150 (sg=0 selects the CBOW architecture).
vector_size = 500
window_size = 150
corpus = [text_to_word_sequence(review) for review in review_data.clean_review.values]
cbow_model = Word2Vec( sentences= corpus, vector_size = vector_size, window = window_size, sg=0, min_count = 2, sample=.000001 )
0.3. Averaging Word Vectors
The last step is to average the word vectors within each review so that every review is summarized by a single fixed-length feature vector.
def avg_words_vectors(words, model, vocabulary, num_features):
    """ Average the word vectors of all in-vocabulary words in a single review """
    feature_vector = np.zeros((num_features,), dtype='float64')
    word_count = 0
    for word in words:
        if word in vocabulary:
            word_count += 1
            feature_vector = np.add(feature_vector, model.wv.get_vector(word))
    if word_count:
        feature_vector = np.divide(feature_vector, word_count)
    return feature_vector
def word_vectorizer(corpus, model, num_features):
    """ Build the feature matrix: one averaged word vector per review """
    vocabulary = set(model.wv.index_to_key)   # set membership checks are faster than a list
    features = [avg_words_vectors(sentence, model, vocabulary, num_features) for sentence in corpus]
    return np.array(features)
text_features = word_vectorizer(corpus=corpus, model=cbow_model, num_features=500)
text_features.shape
(1000, 500)
1. Classification with Dense Neural Network
In this example, we implement a simple fully connected neural network for classification. The network has three Dense (fully connected) hidden layers with dropout layers in between to reduce overfitting, and a two-unit softmax output layer for the binary sentiment classes.
Initial parameters for the model:
- 1. batch_size = 150
- 2. Training Epochs = 50
- 3. Input Size/Vector Size = 500
from keras.models import Sequential
from keras.layers import Dense, Dropout, BatchNormalization
from keras.utils import to_categorical
# Dense Layer Architecture
fully_connected_nn = Sequential()
fully_connected_nn.add(Dense(200, activation='relu', input_shape=(vector_size, )))
fully_connected_nn.add(Dropout(.5))
fully_connected_nn.add(Dense(200, activation='relu'))
fully_connected_nn.add(Dropout(.5))
fully_connected_nn.add(Dense(200, activation='relu'))
fully_connected_nn.add(Dropout(.5))
fully_connected_nn.add(Dense(2, activation='softmax'))
fully_connected_nn.summary()
Model: "sequential" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= dense (Dense) (None, 200) 100200 dropout (Dropout) (None, 200) 0 dense_1 (Dense) (None, 200) 40200 dropout_1 (Dropout) (None, 200) 0 dense_2 (Dense) (None, 200) 40200 dropout_2 (Dropout) (None, 200) 0 dense_3 (Dense) (None, 2) 402 ================================================================= Total params: 181,002 Trainable params: 181,002 Non-trainable params: 0 _________________________________________________________________
The next step is to compile the model before training it on the dataset.
fully_connected_nn.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
1.1. Training/Fitting the Model
To train the model, we pass the features and targets, a validation split (the fraction of data held out for validation), the number of training epochs, and the batch size.
target = to_categorical(review_data.Liked)
fully_connected_nn.fit(text_features, target , validation_split=.3, shuffle=True, epochs=50, batch_size=150, verbose=1)
Epoch 31/50
5/5 [==============================] - 0s 18ms/step - loss: 0.3265 - accuracy: 0.8686 - val_loss: 0.6063 - val_accuracy: 0.7633
Epoch 32/50
5/5 [==============================] - 0s 21ms/step - loss: 0.3042 - accuracy: 0.8771 - val_loss: 0.7175 - val_accuracy: 0.7067
Epoch 33/50
5/5 [==============================] - 0s 18ms/step - loss: 0.2987 - accuracy: 0.8643 - val_loss: 0.6350 - val_accuracy: 0.7567
Epoch 34/50
5/5 [==============================] - 0s 21ms/step - loss: 0.2803 - accuracy: 0.8843 - val_loss: 0.6719 - val_accuracy: 0.7167
Epoch 35/50
5/5 [==============================] - 0s 23ms/step - loss: 0.2785 - accuracy: 0.8914 - val_loss: 0.7174 - val_accuracy: 0.7200
Epoch 36/50
5/5 [==============================] - 0s 23ms/step - loss: 0.2864 - accuracy: 0.8771 - val_loss: 0.5400 - val_accuracy: 0.7833
Epoch 37/50
5/5 [==============================] - 0s 20ms/step - loss: 0.3090 - accuracy: 0.8657 - val_loss: 0.6064 - val_accuracy: 0.7600
Epoch 38/50
5/5 [==============================] - 0s 20ms/step - loss: 0.2826 - accuracy: 0.8843 - val_loss: 0.6501 - val_accuracy: 0.7533
Epoch 39/50
5/5 [==============================] - 0s 20ms/step - loss: 0.2749 - accuracy: 0.8943 - val_loss: 0.6804 - val_accuracy: 0.7333
Epoch 40/50
5/5 [==============================] - 0s 21ms/step - loss: 0.2699 - accuracy: 0.8943 - val_loss: 0.5571 - val_accuracy: 0.7800
Epoch 41/50
5/5 [==============================] - 0s 23ms/step - loss: 0.2895 - accuracy: 0.8829 - val_loss: 0.5829 - val_accuracy: 0.7833
Epoch 42/50
5/5 [==============================] - 0s 21ms/step - loss: 0.2884 - accuracy: 0.8729 - val_loss: 0.6943 - val_accuracy: 0.7400
Epoch 43/50
5/5 [==============================] - 0s 24ms/step - loss: 0.2527 - accuracy: 0.9043 - val_loss: 0.8703 - val_accuracy: 0.7000
Epoch 44/50
5/5 [==============================] - 0s 19ms/step - loss: 0.2572 - accuracy: 0.8957 - val_loss: 0.7495 - val_accuracy: 0.7067
Epoch 45/50
5/5 [==============================] - 0s 19ms/step - loss: 0.2776 - accuracy: 0.8786 - val_loss: 0.6321 - val_accuracy: 0.7533
Epoch 46/50
5/5 [==============================] - 0s 20ms/step - loss: 0.2858 - accuracy: 0.8843 - val_loss: 0.5431 - val_accuracy: 0.7833
Epoch 47/50
5/5 [==============================] - 0s 20ms/step - loss: 0.2991 - accuracy: 0.8657 - val_loss: 0.6092 - val_accuracy: 0.7633
Epoch 48/50
5/5 [==============================] - 0s 20ms/step - loss: 0.2524 - accuracy: 0.8986 - val_loss: 0.8246 - val_accuracy: 0.7033
Epoch 49/50
5/5 [==============================] - 0s 21ms/step - loss: 0.2712 - accuracy: 0.8914 - val_loss: 0.8334 - val_accuracy: 0.6967
Epoch 50/50
5/5 [==============================] - 0s 22ms/step - loss: 0.2577 - accuracy: 0.8957 - val_loss: 0.6313 - val_accuracy: 0.7633
1.2. Model Performance and Improvements
The trained model reaches roughly 89% accuracy on the training set and about 76% on the validation set, which is respectable performance for a simple dense model.
Notice, however, that the validation loss trends upward while the training loss keeps decreasing, which suggests overfitting. To improve performance we could add more regularization or change the architecture; one possible variation is sketched below. For our purposes this is sufficient, and after that we move on to RNNs.
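As a rough illustration only (not part of the original run), the sketch below adds L2 weight penalties and an EarlyStopping callback to the same dense setup; the names regularized_nn and early_stop are introduced here for the example, and the exact penalty and patience values are arbitrary starting points.
from keras.regularizers import l2
from keras.callbacks import EarlyStopping

# Same dense architecture, with an L2 penalty on each hidden layer's weights
regularized_nn = Sequential()
regularized_nn.add(Dense(200, activation='relu', kernel_regularizer=l2(0.001), input_shape=(vector_size,)))
regularized_nn.add(Dropout(.5))
regularized_nn.add(Dense(200, activation='relu', kernel_regularizer=l2(0.001)))
regularized_nn.add(Dropout(.5))
regularized_nn.add(Dense(2, activation='softmax'))
regularized_nn.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Stop when the validation loss stops improving and roll back to the best weights
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
regularized_nn.fit(text_features, target, validation_split=.3, shuffle=True, epochs=50, batch_size=150, callbacks=[early_stop], verbose=0)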
2. Embedding Layers and Recurrent Neural Networks
Word embeddings work a little differently from the averaged word-vector features we used in the dense network: instead of feeding precomputed vectors, we map every word in the corpus to an integer index, and the embedding layer learns a vector for each index while the model trains.
Let's demonstrate this in practice. We first count the word frequencies below and use the counter to determine the number of unique words in the corpus.
from collections import Counter
token_counter = Counter([token for review in corpus for token in review])
We can use the token counter to create a dictionary that maps every word to a unique integer index.
vocab_map = { item[0]:index+1 for index,item in enumerate(dict(token_counter).items()) } # Index all the words starting at 1
max_index = np.max(list(vocab_map.values()))
When indexing new text, we may encounter words that are not in the vocabulary map. In the code below we add two special entries: a padding token mapped to index 0 and a not-found token mapped to the last index, to cover those two scenarios.
vocab_map['PAD_INDEX'] = 0
vocab_map['NOT_FOUND_INDEX'] = max_index + 1
vocab_size = len(vocab_map)
We also need the maximum review length. The following list comprehension returns the length of the longest review.
max_len = np.max([len(review) for review in corpus])
max_len
32
2.1. Padding Text Sequences
We need to pad the text sequences so that all reviews have the same input length. To do this, we use the Keras pad_sequences utility.
from tensorflow.keras.preprocessing.sequence import pad_sequences
padding_reviews = [[vocab_map[token] for token in review] for review in corpus]
input_features = pad_sequences(padding_reviews, max_len)
input_features
array([[   0,    0,    0, ...,    2,    3,    4],
       [   0,    0,    0, ...,    6,    7,    8],
       [   0,    0,    0, ...,   13,   14,   15],
       ...,
       [   0,    0,    0, ...,    7,   76,   77],
       [   0,    0,    0, ...,  516,  512,   63],
       [   0,    0,    0, ..., 1323,   11,  528]], dtype=int32)
2.2. Building a Recurrent Neural Network
The RNN will be a relatively simple network with an Embedding layer, a dropout layer for regularization, and a single LSTM layer feeding a softmax output. See the architecture below.
from keras.models import Sequential
from keras.layers import Dense, Embedding, Dropout, Flatten
from keras.layers import LSTM
Embedding_dim = 128
LSTM_DIM = 64
rnn_model = Sequential()
rnn_model.add(Embedding(input_dim=vocab_size, output_dim=Embedding_dim, input_length=max_len))
rnn_model.add(Dropout(.2))
rnn_model.add(LSTM(LSTM_DIM, dropout=.2, recurrent_dropout=.2))
rnn_model.add(Dense(2, activation='softmax'))
# Note: with a two-unit softmax and one-hot targets, categorical_crossentropy is the more conventional loss;
# binary_crossentropy still trains here because the two outputs sum to one.
rnn_model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
2.3. Fitting the RNN Classifier
In fitting our RNN we will use the following hyperparameters:
- 1. batch_size = 100
- 2. Epochs = 20
rnn_model.fit(input_features, target, epochs=20, batch_size=100, shuffle=True, validation_split=.3, verbose=1)
Epoch 1/20
7/7 [==============================] - 5s 307ms/step - loss: 0.6892 - accuracy: 0.5429 - val_loss: 0.7084 - val_accuracy: 0.3600
Epoch 2/20
7/7 [==============================] - 1s 191ms/step - loss: 0.6766 - accuracy: 0.5600 - val_loss: 0.7546 - val_accuracy: 0.3600
Epoch 3/20
7/7 [==============================] - 1s 118ms/step - loss: 0.6650 - accuracy: 0.5614 - val_loss: 0.7268 - val_accuracy: 0.3567
Epoch 4/20
7/7 [==============================] - 1s 115ms/step - loss: 0.6424 - accuracy: 0.5900 - val_loss: 0.7420 - val_accuracy: 0.3833
Epoch 5/20
7/7 [==============================] - 1s 114ms/step - loss: 0.5954 - accuracy: 0.6814 - val_loss: 0.7123 - val_accuracy: 0.4467
Epoch 6/20
7/7 [==============================] - 1s 113ms/step - loss: 0.5170 - accuracy: 0.7843 - val_loss: 0.7047 - val_accuracy: 0.5100
Epoch 7/20
7/7 [==============================] - 1s 114ms/step - loss: 0.4079 - accuracy: 0.8686 - val_loss: 0.6066 - val_accuracy: 0.7033
Epoch 8/20
7/7 [==============================] - 1s 112ms/step - loss: 0.3092 - accuracy: 0.9071 - val_loss: 0.6246 - val_accuracy: 0.7200
Epoch 9/20
7/7 [==============================] - 1s 110ms/step - loss: 0.2387 - accuracy: 0.9371 - val_loss: 0.6720 - val_accuracy: 0.7133
Epoch 10/20
7/7 [==============================] - 1s 120ms/step - loss: 0.1791 - accuracy: 0.9529 - val_loss: 0.6168 - val_accuracy: 0.7367
Epoch 11/20
7/7 [==============================] - 1s 114ms/step - loss: 0.1392 - accuracy: 0.9686 - val_loss: 0.8643 - val_accuracy: 0.6867
Epoch 12/20
7/7 [==============================] - 1s 116ms/step - loss: 0.1215 - accuracy: 0.9757 - val_loss: 0.6693 - val_accuracy: 0.7300
Epoch 13/20
7/7 [==============================] - 1s 108ms/step - loss: 0.0876 - accuracy: 0.9786 - val_loss: 0.6404 - val_accuracy: 0.7300
Epoch 14/20
7/7 [==============================] - 1s 114ms/step - loss: 0.0720 - accuracy: 0.9943 - val_loss: 0.6354 - val_accuracy: 0.7167
Epoch 15/20
7/7 [==============================] - 1s 155ms/step - loss: 0.0574 - accuracy: 0.9900 - val_loss: 0.7458 - val_accuracy: 0.7367
Epoch 16/20
7/7 [==============================] - 1s 210ms/step - loss: 0.0464 - accuracy: 0.9900 - val_loss: 0.8075 - val_accuracy: 0.7200
Epoch 17/20
7/7 [==============================] - 1s 199ms/step - loss: 0.0388 - accuracy: 0.9929 - val_loss: 0.7612 - val_accuracy: 0.7500
Epoch 18/20
7/7 [==============================] - 1s 206ms/step - loss: 0.0343 - accuracy: 0.9943 - val_loss: 0.9616 - val_accuracy: 0.7233
Epoch 19/20
7/7 [==============================] - 1s 115ms/step - loss: 0.0279 - accuracy: 0.9971 - val_loss: 0.7260 - val_accuracy: 0.7467
Epoch 20/20
7/7 [==============================] - 1s 109ms/step - loss: 0.0284 - accuracy: 0.9986 - val_loss: 0.8207 - val_accuracy: 0.7533
We see that over the training run the training accuracy climbs to over 99% while the validation loss first decreases and then rises, again indicating overfitting. It is also worth noting that the dataset is quite small, with only about 1,000 reviews, which makes it easy for a neural network to overfit. A quick check of the model on the held-out slice is sketched below.
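As a quick sanity check (a sketch added here, not output from the original run): Keras takes the last 30% of the rows as the validation slice when validation_split=.3, so we can score that slice directly. The names val_cut, val_preds, and val_true are introduced just for this example.
# validation_split=.3 holds out the last 30% of rows (Keras slices before shuffling)
val_cut = int(len(input_features) * 0.7)
val_probs = rnn_model.predict(input_features[val_cut:])
val_preds = np.argmax(val_probs, axis=1)              # pick the class with the higher softmax score
val_true = review_data.Liked.values[val_cut:]
print("held-out accuracy:", np.mean(val_preds == val_true))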
3. Predicting the Sentiment of New Reviews
We have built our model and now want to use it on new reviews to see how it performs. The new input needs to be preprocessed in the same way as the training data. Let's see this in action below.
new_text = "the food was great!"
new_text = [[vocab_map.get(token, vocab_map['NOT_FOUND_INDEX']) for token in text_to_word_sequence(new_text)]]  # unseen words fall back to NOT_FOUND_INDEX
padded_text_input = pad_sequences(new_text, max_len)
padded_text_input
array([[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 124, 13, 31]], dtype=int32)
rnn_model.predict(padded_text_input)
1/1 [==============================] - 1s 1s/step
array([[0.00378104, 0.996219  ]], dtype=float32)
Notice that our model predicts a 99.6% probability that the review is positive and well under 1% probability that it is negative. Not bad for a test prediction. A small helper that wraps these steps for arbitrary new reviews is sketched below.
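For convenience, a helper along the following lines can bundle the cleaning, indexing, padding, and prediction steps. This is a sketch: predict_sentiment is a name introduced here, and it simply reuses processText, vocab_map, max_len, and the trained rnn_model from above.
def predict_sentiment(text, model=rnn_model):
    """ Clean, index, pad, and score a new review; returns (label, positive-class probability) """
    tokens = text_to_word_sequence(processText(text))
    # Unseen words fall back to NOT_FOUND_INDEX so the lookup never fails
    indexed = [[vocab_map.get(token, vocab_map['NOT_FOUND_INDEX']) for token in tokens]]
    padded = pad_sequences(indexed, max_len)
    probs = model.predict(padded)[0]
    label = 'positive' if probs[1] >= probs[0] else 'negative'
    return label, float(probs[1])

predict_sentiment("the service was painfully slow and the food was cold")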
Conclusion
In this series, we covered a number of NLP techniques, from traditional feature extraction methods like TF-IDF and Bag of Words to modern approaches like Word2Vec and embeddings. We have learned how to use word vectors and word embeddings to build classification and sentiment analysis models. Try using the above model architectures and techniques to build a model for the Amazon product reviews data and see how the results compare. Deep learning techniques offer a lot of opportunities to improve model performance with additional layers, regularizers, and hyperparameter changes!