As a nature and wildlife enthusiast, and as someone who spent many years in Eilat, I have always regretted the condition of the coral reef there. What used to be a vibrant home to many species, some of them unique to the Red Sea, has been terribly damaged by human activity, much like many other reefs around the globe.
When I started my computer science and statistics studies, I didn’t think that what I would learn in class could be useful for the conservation of species and marine life. Volunteering for Wildlife.ai was an amazing opportunity to use my skills for a good cause and to have a real impact on the efforts to conserve marine species.
Today, conservation scientists analyze and assess marine life in the ocean around New Zealand by manually counting the fish in short videos taken underwater. What could help this research is a system that counts the fish automatically, but devising such a system turned out to be very complicated.
Short underwater video recorded in Parininihi Marine Reserve, Aotearoa New Zealand.
Baited underwater video recorded by the New Zealand Department of Conservation.
The most developed technology in the area of object detection and recognition is convolutional neural networks (CNNs), which require a lot of annotated data to be trained properly, as well as massive computational resources. However, when we tried to train a model to detect species in New Zealand’s waters, we had neither. The annotated data we had was small, and we could not enlarge it with data from other datasets because of the distortion caused by our camera’s fisheye lens and the poor visibility of fish in the muddy water. These conditions made our dataset very distinct from other available datasets. Moreover, we were short of computational resources, so we had to find an efficient way to create such a model.
Short underwater video recorded in Kāpiti Marine Reserve, Aotearoa New Zealand.
Baited underwater video recorded by the Guardians of the Kāpiti Marine Reserve.
The basic system that we thought would serve the scientists best is a short pipeline that receives video clips and counts the individuals of each species in each video frame. Other features, like visual verification of the results and some statistics, are also crucial for the system’s success in maintaining high accuracy. While the pipeline itself was simple to plan and put together, acquiring enough annotated data and finding a model that would yield satisfying results wasn’t as easy. Experiments with “easy” solutions, for example, using open-source models to detect fish in our data, showed poor results. Therefore, we had to start from scratch and create a new model for the unique data we had.
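The per-frame counting step of such a pipeline can be sketched roughly as follows. This is a minimal illustration, not the project’s actual code: `detect` is a hypothetical stand-in for the trained model, assumed to return species labels and confidence scores for one frame.

```python
def count_fish(frames, detect, score_threshold=0.5):
    """Count detections per species in each frame of a video clip.

    `frames` is any iterable of decoded video frames. `detect` is a
    hypothetical stand-in for the trained detector: given one frame, it
    returns two parallel lists, (labels, scores).
    """
    counts = []  # one dict per frame: species label -> count
    for frame in frames:
        labels, scores = detect(frame)
        frame_counts = {}
        for label, score in zip(labels, scores):
            # Keep only confident detections.
            if score >= score_threshold:
                frame_counts[label] = frame_counts.get(label, 0) + 1
        counts.append(frame_counts)
    return counts
```

In practice the frames would come from a video decoder and the detector would be the trained network, but the counting logic stays this simple.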
Example of a model identifying fish in underwater footage similar to the baited underwater video.
Here is some detailed information on our work: While many articles suggest ways to improve object detection in general and fish detection in particular, it was challenging to choose an approach that fit our needs. Eventually, we chose to use a ResNet-50 pre-trained on ImageNet as the CNN’s backbone and to fine-tune a Faster R-CNN. Within the network, it is possible to freeze some of the layers during training. In some experiments we unfroze only the ROI heads and the RPN; in others we unfroze the FPN layers as well.
In addition, we tried to optimize the selection of hyperparameters for the training. For example, we used Optuna, an automated hyperparameter search framework that takes a Bayesian approach, and more specifically its TPESampler algorithm, to make efficient use of the resources available for the task. As for the data itself, we first found some other datasets that looked similar to ours, and then tried training with and without them. Moreover, we tried many augmentations, such as vertical and horizontal flips, saturation adjustments, and Gaussian blur. We considered which augmentations would best serve the different images from the different datasets we had.
A manually labelled frame of baited underwater video.
Managing such a project is a complicated task. For example, it requires many experiments, such as model selection and hyperparameter optimization. When we experimented with different ways to enlarge our data, we had to be very careful, because getting good results does not necessarily mean the model has improved. We constantly monitored how changing the data we “feed” into the model affects its results, while trying to figure out the best hyperparameters for that dataset. These challenges were only made harder by our limited computational resources.
During the work process, we learned about ClearML, a machine learning framework that provides an experiment-manager platform, which proved very helpful to us along the way.
First, it helped us share results across the team and monitor them on the fly. The training we had to perform was a very heavy computational task that we had to run over and over again, so getting real-time results during training in an organized way was a genuine life-saver. For example, being able to compare results fairly easily and keep track of the experiments we conducted enabled us to work more efficiently and in a more orderly way.
Representation of the model outputs in ClearML.
Second, the platform is very flexible. It allowed us to perform all the tasks we needed, such as hyperparameter optimization (with excellent Optuna integration), training and validation, plotting graphs, and displaying images to inspect the model visually.
Third, it allowed us to compare our results to those of experiments run by others, making it easier to plan ahead and spot problems in our experiments. Like any other platform, ClearML is not problem-free, but its huge advantage is that whenever we encountered a problem, we could simply reach out to the amazing Slack community and quickly get a solution.
We significantly improved our basic model. It turned out that we got the best results when we unfroze the FPN layers during training. In addition, adding data augmentations helped improve the validation loss. In particular, we noticed that it lowered the false-positive rate.
Example of automatic fish identification in baited underwater video.
There is a lot to learn and to do before one can reach a full-scale operating model that can be used by the scientific community of New Zealand. As we learned along the way, one of the main challenges is the lack of annotated images from the domain to which we were trying to fit the model.
One of the magical things about Wildlife.ai, and particularly this project, is that anyone can easily contribute to the effort. For example, spending some time tagging fish images in the Spyfish Aotearoa Zooniverse project, a task that almost anyone can do, can have an extensive impact on projects like ours. I’m grateful for the opportunity I received from Wildlife.ai to be a part of this project. I have learned much from working with this wonderful organization. In particular, I have learned how communities and scientists can collaborate and work together to protect wildlife.
Homepage of the Spyfish Aotearoa community science webpage.
I want to thank Victor Anton, Bar Vinograd, Professor Matan Gavish, and Nitzan Barzilay for their dedicated work and help along the way. In addition, I want to thank ClearML for their aid in this project and for providing us with computational resources, and especially Erez Schneider, whose kind help was much valued. I very much appreciate the guidance and wise advice of Gal Hyams and Eran Paz, who were involved in the work process all along the way. A special thank-you goes to Matan Atzmony, who worked alongside me on this project and made our vision a reality.