The rotation model in our two-model approach used an architecture similar to that of landmark model V1. To achieve more stable results, the inner-epoch evaluation averaged the results over three different augmentation instances of the validation set.
For the final evaluation of the validation and test sets, we used 10 different augmentation instances. These augmentation instances ensured a high level of generalization to different frog orientations.
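The averaging described above can be sketched as follows. This is a minimal illustration, not the original code: `metric_fn`, `model_predict`, and `augment` are hypothetical callables standing in for the actual metric, model, and augmentation pipeline.

```python
import numpy as np

def evaluate_with_tta(metric_fn, model_predict, images, labels, augment,
                      n_instances=3, seed=0):
    """Average a validation metric over several random augmentation
    instances of the same set, yielding a more stable estimate.

    Hypothetical interfaces: augment(image, rng) returns an augmented
    copy, model_predict(image) returns a prediction, and
    metric_fn(prediction, label) returns a scalar score.
    """
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_instances):
        preds = [model_predict(augment(img, rng)) for img in images]
        scores.append(np.mean([metric_fn(p, y) for p, y in zip(preds, labels)]))
    # Final score is the mean over all augmentation instances.
    return float(np.mean(scores))
```

The same routine serves both settings: `n_instances=3` for the inner-epoch evaluation and `n_instances=10` for the final evaluation.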
To reduce training time and hardware requirements, and since rotation classification did not require as much model complexity as landmark detection, images were downscaled to 128×128 pixels. The rotation model's metrics were similar to those used for the landmark detection models.
We needed a new loss function to account for the non-linear nature of the rotation variable: its values ranged between 0 and 360 degrees, yet an angle of 350 is closer to 0 than an angle of 20 is.
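The wrap-around issue can be made concrete with a small helper (illustrative only, not part of the original code) that computes the true angular distance between two angles in degrees:

```python
def angular_error(pred_deg, true_deg):
    """Smallest absolute difference between two angles in degrees,
    accounting for wrap-around at 360."""
    diff = (pred_deg - true_deg) % 360.0
    return min(diff, 360.0 - diff)

angular_error(350, 0)  # -> 10.0 (closer to 0 than 20 is)
angular_error(20, 0)   # -> 20.0
```

A plain absolute difference would report 350 instead of 10 for the first case, which is exactly the non-linearity the loss function must handle.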
A straightforward method would have been to use sines and cosines, which automatically account for how angles wrap around. However, the derivative of such a loss function behaves poorly: sine's derivative is cosine, which is 90 degrees out of phase with sine, so at a value difference of 0 the derivative is at its maximum of 1. This makes sine and cosine tricky to use, as finding a combination that yields well-behaved derivatives and easy convergence is difficult.
Another problem is that neither function is convex, which introduces a further computational obstacle. In view of these two problems, we took a different approach: predicting the vector from the vent to the snout, and then evaluating the rotation angle of that vector.
The “vent to snout vector” approach allows the loss function to correctly account for how angles work, while still having an informative derivative. Yet this loss function had one drawback: if a very small vector was predicted, a very small change in either the x or y coordinate would cause a relatively large change in the angle. To mitigate this drawback, another term was added that strives to keep the vector's size close to that of the ground truth. Keeping the vector large makes its angle stable under small variations in the x and y coordinates. The final loss function is described below.
The loss function we used to account for the non-linearity of rotation angles.
where g is the ground-truth vector, p is the predicted vector, and α is a term that controls the loss function's attention between the angle and the size.
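Since the exact equation appears only as a figure, the following is one plausible form consistent with the description: an angle term between the predicted and ground-truth vectors, plus α times a term that keeps the predicted vector's size close to the ground truth's. The cosine-similarity angle term is an assumption for illustration, not necessarily the authors' exact formulation.

```python
import numpy as np

def rotation_loss(p, g, alpha=0.5):
    """Sketch of a vent-to-snout vector loss: an angle term plus
    alpha times a size term, with alpha balancing attention
    between the two (assumed form, see lead-in)."""
    p = np.asarray(p, dtype=float)
    g = np.asarray(g, dtype=float)
    eps = 1e-8  # avoid division by zero for degenerate vectors
    cos_sim = np.dot(p, g) / (np.linalg.norm(p) * np.linalg.norm(g) + eps)
    angle_term = 1.0 - cos_sim  # 0 when the vectors point the same way
    size_term = (np.linalg.norm(p) - np.linalg.norm(g)) ** 2
    return angle_term + alpha * size_term
```

The size term penalizes shrinking the predicted vector, which is what keeps the angle numerically stable under small coordinate perturbations.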
While examining the results of the rotation models (table below), we found that all images classified more than 40 degrees away from the ground truth were mislabeled images. This amounted to one misclassified image in the test set and four in the validation set. Within a 30-degree range, almost all images were classified correctly, so we safely assumed a 30-degree augmentation for the landmark detection model. In addition, the standard deviation was relatively small, showing the stability of the predictions.
Rotation Model’s results on validation and test sets.