Models

Each model’s code, thoroughly commented and with a summary of results, can be viewed on its specific page, accessed by clicking on the title.

Baseline model 1: Plurality Prediction

This model simply predicted the class with the most data points in the train set, i.e. the class that forms a plurality. This happened to be class 1, though all classes were roughly the same size. We chose this baseline because there is no majority class; the roughly balanced classes are also a good thing when we later train the neural networks. Because the model is so simple, the procedure is likewise simple: the model accepts X_train, a list of data points that can be in any format since the model doesn’t actually look at the features or pixels, and simply predicts the plurality class for every point, returning the predictions in an array. An example usage would be “predict_plurality(x_train)”.
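The baseline described above can be sketched in a few lines; the default of class 1 matches the plurality class reported here, and the function name follows the usage example:

```python
import numpy as np

def predict_plurality(x, plurality_class=1):
    """Predict the plurality class for every input.

    The features are never inspected, so `x` can be in any format;
    only its length matters. Class 1 is the plurality class in our
    train set, hence the default.
    """
    return np.full(len(x), plurality_class)
```

For example, `predict_plurality(x_train)` returns an array of 1s with one entry per training point.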

Baseline Model 2: Random Breed Prediction

This model predicted a class (an integer in [0, 8]) uniformly at random. Again, because the model is so simple, the procedure is likewise simple: the model accepts X_train, a list of data points that can be in any format since the model doesn’t actually look at the features or pixels, and simply predicts a uniformly random class for each point, returning the predictions in an array. An example usage would be “predict_random(x_train)”.
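A minimal sketch of this baseline (the function name and the `seed` parameter are my additions for reproducibility):

```python
import numpy as np

def predict_random(x, n_classes=9, seed=None):
    """Predict a class label uniformly at random from [0, n_classes - 1].

    As with the plurality baseline, the features of `x` are ignored;
    only its length matters.
    """
    rng = np.random.default_rng(seed)
    return rng.integers(0, n_classes, size=len(x))
```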

Logistic Regression

Preprocessing

I preprocess the data by resizing each image so its smaller side is 224 pixels and then cropping the longer side to 224, yielding 224×224 images. This can be done by running python preprocessing.py, which resizes and crops the images for logistic regression.
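A sketch of the resize-then-crop step using Pillow (the function name is mine; the actual preprocessing.py may differ in details such as the crop position):

```python
from PIL import Image

def resize_and_center_crop(img, size=224):
    """Resize so the shorter side equals `size`, then center-crop to size x size."""
    w, h = img.size
    scale = size / min(w, h)
    img = img.resize((round(w * scale), round(h * scale)))
    w, h = img.size
    left, top = (w - size) // 2, (h - size) // 2
    return img.crop((left, top, left + size, top + size))
```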

Model

This model uses logistic regression to predict the superbreed. Initially, we tried logistic regression using sklearn, but the number of files caused our kernels to crash. At the suggestion of our TF Camilo, we instead implemented logistic regression using Keras.

With Keras, we were able to implement logistic regression by adding a single Dense layer to our model. Afterwards, we compiled the model and used the Keras SGD optimizer. We experimented with a few different hyperparameter settings, but ended up choosing a learning rate of 0.001 and 50 epochs.
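The setup above can be sketched as follows. This is a minimal version, assuming nine classes (matching the [0, 8] labels of the random baseline) and one-hot labels; the function name is mine:

```python
import numpy as np
from tensorflow import keras

def build_logreg(input_shape=(224, 224, 3), n_classes=9):
    """Multinomial logistic regression as a Keras model: a single Dense
    softmax layer over the flattened pixels, trained with SGD."""
    model = keras.Sequential([
        keras.layers.Input(shape=input_shape),
        keras.layers.Flatten(),
        keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=keras.optimizers.SGD(learning_rate=0.001),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Training would then be a call like `model.fit(x_train, y_train, epochs=50)` with one-hot encoded labels.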

Results

The logistic regression model took 19.55 minutes to train. We got a training accuracy of 28.60% and a test accuracy of 20.26%. Overall, logistic regression did better than both of our baseline models, but did not do nearly as well as our various CNN models.

Convolutional Neural Networks

Preprocessing

I preprocess the data by resizing the smaller side to 224 and cropping the second side to 224. The cropping is random for the training set and a center crop for the test set. While originally the resizing happened beforehand and the cropping happened during training, I’ve now restructured the code so that sizes are uniform before the images are zipped and uploaded to Colab.

Reorganizing

I reorganize the file structure to work with ImageDataGenerator and flow_from_directory from Keras, which require a file structure significantly different from the original data set. This is similarly done prior to upload, and the code below expects that the test and train image directories are zipped and uploaded to Colab. I display images from both sets to verify that resizing, cropping, and reorganizing didn’t break anything.

I experimented a bit with JupyterHub and AWS, but it seems like Colab is the best option for getting access to a GPU. That said, even when running a light model like MobileNetV2 I get a warning that I’m running out of GPU memory.

I use ImageDataGenerator to facilitate model creation, making sure to shuffle the training data and to set a seed for reproducibility. Importantly, I set class_mode to categorical so that the labels are one-hot encoded to match our categorical loss.
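A sketch of the generator setup described above (the function name, batch size default, and seed value are mine; flow_from_directory expects one subfolder per class, e.g. train/&lt;breed&gt;/image.jpg):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def make_train_generator(train_dir, batch_size=71, seed=209):
    """Build a shuffled, seeded training generator over a directory that
    contains one subfolder per class."""
    return ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
        train_dir,
        target_size=(224, 224),
        batch_size=batch_size,
        class_mode="categorical",  # one-hot labels for categorical_crossentropy
        shuffle=True,
        seed=seed,
    )
```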

| Name | Learning Rate | Batch Size | Epochs | Loss | Accuracy | Time |
| --- | --- | --- | --- | --- | --- | --- |
| From Scratch | 0.001 | 71 | 10 | 5.8862 | 22.54% | 17.8 mins |
| With Data Augmentation | 0.001 | 71 | 10 | 2.5033 | 22.88% | 18.5 mins |
| MobileNetV2 | 0.001 | 71 | 10 | 0.7060 | 79.51% | 20.8 mins |
| ResNet50 | 0.001 | 71 | 10 | 8.37 * 10^-5 | 83.28% | 50.3 mins |

Convolutional Neural Network: From Scratch

The from-scratch model follows a somewhat arbitrary structure (based on the 209a homework):

- Convolutional 2D 5x5, relu
- Convolutional 2D 3x3, relu
- Max Pooling 2D 2x2
- Convolutional 2D 3x3, relu
- Convolutional 2D 3x3, relu
- Max Pooling 2D 2x2
- Convolutional 2D 3x3, relu
- Convolutional 2D 3x3, relu
- Max Pooling 2D 2x2
- Flatten
- Dense, relu
- Dense, softmax
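A minimal Keras sketch of this architecture; the filter counts and the width of the hidden Dense layer are my guesses, since the write-up only specifies kernel sizes and activations:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_scratch_cnn(input_shape=(224, 224, 3), n_classes=9):
    """From-scratch CNN: three conv/conv/pool blocks, then two Dense layers."""
    return keras.Sequential([
        keras.layers.Input(shape=input_shape),
        layers.Conv2D(32, 5, activation="relu"),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, activation="relu"),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(128, 3, activation="relu"),
        layers.Conv2D(128, 3, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])
```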

The model took about 17 minutes to train and had a training accuracy of 99% and a test accuracy of 22.5% (overfit).

Convolutional Neural Network: From Scratch, Data Augmentation

For the next CNN variation, we added on-the-fly data augmentation. Specifically, we added horizontal flipping, a random 20% colour channel shift, and 15% dropout.
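The generator-side augmentation can be sketched as below. Note that ImageDataGenerator’s channel_shift_range is specified in raw pixel-intensity units, so I encode the 20% shift as 0.2 * 255; this encoding is my assumption about the original run:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# On-the-fly augmentation: horizontal flips plus a random colour-channel shift.
augmented_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    horizontal_flip=True,
    channel_shift_range=0.2 * 255,  # assumed encoding of the "20% shift"
)
# The 15% dropout is a model change rather than a generator option, e.g. a
# layers.Dropout(0.15) inserted before the Dense layers of the CNN.
```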

The CNN model with augmentation took about 18.5 minutes to train and had a training accuracy of 77.2% and a test accuracy of 22.9%. Without augmentation, the training accuracy was 99.3% and the test accuracy was 22.5%.

The test categorical loss with augmentation was 2.503, compared to 5.886 without augmentation.

Compared to the original CNN we built from scratch, data augmentation makes the model less overfit to the training set and yields slightly better performance on the test set. In particular, the categorical loss improves substantially with on-the-fly data augmentation, even though the accuracy score improves only marginally.

Convolutional Neural Network: MobileNetV2

I chose MobileNetV2 at Camilo’s suggestion, knowing that I had already been running out of GPU memory and so needed something lightweight and fast. I remove the top of the network and add output layers to fine-tune the model to our superbreed classes.

I again use ImageDataGenerator. I first import the model with include_top=True to inspect the structure of the top layers, and then with include_top=False to fine-tune the model on our training data with our superbreeds as classes. The layers I add are a GlobalAveragePooling2D layer and a Dense output layer with softmax activation. I use the Adam optimizer with a learning rate of 0.001, categorical_crossentropy loss, and ten epochs. Ten epochs gave a good test accuracy, but one could argue for fewer epochs, since the training accuracy above suggests we are overfitting.
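The fine-tuning setup can be sketched as follows; the function name is mine, and I again assume nine classes to match the baseline’s [0, 8] labels:

```python
import numpy as np
from tensorflow import keras

def build_mobilenet_classifier(n_classes=9, weights="imagenet"):
    """MobileNetV2 backbone without its top, plus a pooling + softmax head."""
    base = keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False, weights=weights)
    model = keras.Sequential([
        base,
        keras.layers.GlobalAveragePooling2D(),
        keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=0.001),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```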

The pretrained model took about 21 minutes to train and had a training accuracy of 100% and a test accuracy of 79.5%.

Convolutional Neural Network: ResNet50

We also decided to try the classic ResNet50 model pretrained on ImageNet. To better compare this pretrained model’s performance with MobileNetV2, we again used the Adam optimizer with a learning rate of 0.001, categorical_crossentropy loss, and ten epochs of training.
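Since the head and training settings mirror the MobileNetV2 run, the sketch differs only in the backbone (function name and the nine-class assumption are mine):

```python
import numpy as np
from tensorflow import keras

def build_resnet_classifier(n_classes=9, weights="imagenet"):
    """ResNet50 backbone without its top, plus the same pooling + softmax head."""
    base = keras.applications.ResNet50(
        input_shape=(224, 224, 3), include_top=False, weights=weights)
    model = keras.Sequential([
        base,
        keras.layers.GlobalAveragePooling2D(),
        keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=0.001),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```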

ResNet50 took about 50 minutes to train and reached a training accuracy of 100% and a test accuracy of 83.3%. Compared to MobileNetV2, ResNet50 took significantly longer to train while yielding only about a 3-percentage-point improvement in test accuracy. Both the longer training time and the accuracy improvement can be attributed to ResNet50’s more complex structure.