For our Landmark Detection model, we chose six of the most important feature points for measuring a frog's size. These landmarks let us morph and crop each image to standardise, as much as possible, the images before we run the Frog Identification and Size Estimation models. The six feature points we chose were: 1) the tip of the frog's snout, 2) left front leg, 3) right front leg, 4) left eye, 5) right eye, and 6) vent.
A labelled image of an Archey’s frog; the green dots are the landmarks to find.
To label the data, we created a Zooniverse project in which volunteers manually labelled the landmarks in as many images as they wished, with no need to register or download any special software. This brought the total to 1,642 labelled images, of which 170 were labelled twice.
We used images that were labelled by at least two different volunteers to gauge how accurately users labelled the photos manually. As the figure below shows, most double-labelled images agreed closely, apart from a few outliers caused mostly by program errors and misunderstandings of which points to label.
Difference in labels between images labelled at least twice, where the distance is normalised by image size.
The most commonly mislabelled points were the front legs: a couple of volunteers misunderstood the assignment and labelled the knees instead. Excluding outliers, the mean difference was 1.4% of the image size, with a standard deviation of 0.7%. Given the small overall number of images, we set the cutoff at two standard deviations above the mean, i.e. 2.8% of the image size.
For images labelled at least twice, we took the mean of the labels as the ground truth, provided the difference between them (as a percentage of image size) was below the cutoff; otherwise, the image was discarded.
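The aggregation rule above can be sketched in a few lines of Python. This is a minimal illustration, not our exact pipeline: the function names are hypothetical, and we assume here that the inter-volunteer distance is normalised by the image diagonal (the post only says "normalised by image size").

```python
import math

def normalised_distance(labels_a, labels_b, width, height):
    """Mean distance between two volunteers' labelings of the same image,
    as a fraction of the image diagonal (normalisation is an assumption)."""
    diag = math.hypot(width, height)
    dists = [math.hypot(ax - bx, ay - by)
             for (ax, ay), (bx, by) in zip(labels_a, labels_b)]
    return sum(dists) / len(dists) / diag

def aggregate_labels(labels_a, labels_b, width, height, cutoff=0.028):
    """Return the mean of two labelings as the ground truth, or None
    (image discarded) if they disagree by more than the cutoff
    (mean + 2 standard deviations, i.e. 2.8% of image size in our data)."""
    if normalised_distance(labels_a, labels_b, width, height) > cutoff:
        return None
    return [((ax + bx) / 2, (ay + by) / 2)
            for (ax, ay), (bx, by) in zip(labels_a, labels_b)]
```

With more than two labelings per image, the same idea extends to averaging all of them after dropping any pair that exceeds the cutoff.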
After aggregating the labels from multiple volunteers and handling the outliers, we ended up with a total of 1,633 relevant images.
Examples of inaccurately labelled frogs.
Probably the outcome of a program malfunction (left) and of a user incorrectly labelling the “knee” instead of the leg joint (right).
The data was split into train, validation, and test sets. To avoid data leakage, the images were split by frog ID, so that no images of the same frog appeared in more than one set. The split was as follows: 1,124 images (70%) to train, 257 images (15%) to validate, and 252 images (15%) to test the models.
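A split that groups by frog ID can be done in a few lines. The sketch below uses only the standard library and a hypothetical list of `(image_path, frog_id)` pairs; the 70/15/15 proportions match the post, but everything else (names, seed) is illustrative.

```python
import random

def split_by_frog_id(image_records, seed=0):
    """Split images into train/val/test so that all images of a given
    frog land in exactly one set, avoiding data leakage.
    `image_records` is a list of (image_path, frog_id) pairs."""
    frog_ids = sorted({fid for _, fid in image_records})
    rng = random.Random(seed)
    rng.shuffle(frog_ids)

    n = len(frog_ids)
    n_train = int(0.70 * n)
    n_val = int(0.15 * n)
    train_ids = set(frog_ids[:n_train])
    val_ids = set(frog_ids[n_train:n_train + n_val])

    splits = {"train": [], "val": [], "test": []}
    for path, fid in image_records:
        if fid in train_ids:
            splits["train"].append(path)
        elif fid in val_ids:
            splits["val"].append(path)
        else:
            splits["test"].append(path)
    return splits
```

Note that splitting by frog ID means the image-level proportions only approximate 70/15/15, since frogs have different numbers of photos.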
For computer vision problems in general, and Landmark Detection problems in particular, 1,633 is a relatively small number of images. Capturing all the image variation and the complexity of the task with such a limited dataset is a big challenge. Moreover, unlike Image Classification, image resolution is crucial for correct predictions, especially when the prediction must be extrapolated back to the original image size. This makes the feature space extremely large, and a model trained on only 1,633 images is unlikely to generalise well.
To overcome the scarcity of training examples, we created a new Python generator using the image-augmentation library imgaug. The generator replicated all the variation seen in the original data: rotation, scaling, shear, and different lighting conditions.
In the next post, I will explain the models we used, the results we got and discuss the next steps for our Pepeketua (frog) identification project.