Through key collaborations with various fisheries and dedicated effort by our in-house annotation team, this project benefited from a training set of almost 2 million annotations of fish, humans, and gear in longline fishing contexts. Collecting this data provided valuable lessons in its own right and exposed the algorithms to a wide variety of cameras, viewpoints, weather, lighting, and vessel conditions in real-world imagery.
We experimented with various model sizes and several state-of-the-art deep learning object detectors, and examined the trade-offs between them. Our best model achieved an average precision of 91%, while some smaller models also produced good results (e.g. 85%) at much lower processing cost. We noted an 11% performance gap between "seen" and "unseen" vessels. Counting proved challenging with conventional approaches, achieving only 62% accuracy. Classification of target species was generally reliable, but performance fell off on unseen vessels, and classification of less common species and bycatch proved difficult. The AI faced some of the same challenges humans face during tuna classification, such as reliably differentiating bigeye tuna from juvenile yellowfin tuna.
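To make the precision trade-off concrete, the sketch below shows a simplified way to score a detector's boxes against ground-truth annotations at a fixed IoU threshold. It is illustrative only: it is not the project's evaluation code, it computes precision at a single threshold rather than the full average-precision metric reported above, and the box coordinates and threshold are assumed values.

```python
# Minimal sketch: precision of predicted boxes against ground truth at one
# IoU threshold, the kind of quick check used when weighing a large, accurate
# detector against a smaller, cheaper one. Values are illustrative.

def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def precision_at_iou(predictions, ground_truth, threshold=0.5):
    """Fraction of predicted boxes that match an unclaimed ground-truth box."""
    matched = set()
    true_positives = 0
    for pred in predictions:
        best_idx, best_iou = None, 0.0
        for i, gt in enumerate(ground_truth):
            if i in matched:
                continue
            overlap = iou(pred, gt)
            if overlap > best_iou:
                best_idx, best_iou = i, overlap
        if best_idx is not None and best_iou >= threshold:
            matched.add(best_idx)
            true_positives += 1
    return true_positives / len(predictions) if predictions else 0.0

# Example: one frame's detections from a hypothetical small model.
gt_boxes = [[10, 10, 60, 60], [100, 40, 180, 120]]
pred_boxes = [[12, 8, 58, 62], [150, 150, 200, 200]]
print(precision_at_iou(pred_boxes, gt_boxes))  # 0.5: one match, one false positive
```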

Figure: Example of the challenge of identifying yellowfin vs bigeye tunas, from the Western and Central Pacific Fisheries Commission Handbook for the Identification of Yellowfin and Bigeye Tunas.
We demonstrated the feasibility of using computer vision and machine learning techniques to analyze electronic monitoring video with human-level accuracy, achieving less than 15% error in our initial target market.
In our novel algorithm development we explored three areas we felt would improve overall AI performance. The first was an approach to handling object occlusion, for example when a fish passes behind a fisherman during tracking. Baseline models would lose a fish when it was obscured by another object, resulting in double counting. This is a key gap between computers and human reviewers, who understand that the fish will re-appear momentarily. The second challenge was a strategy to help the algorithm maintain attention on a fish track for longer. A human reviewer, for example, can see a fish come onboard, watch a considerable amount of handling of that fish in subsequent video, and remember that it is still the same fish; our aim was to give the AI a similar capability. Finally, we examined integrating sensor data from the EM system alongside visual inputs as a mechanism to improve the analysis. Human reviewers sometimes consider factors such as vessel speed and the timing of gear entering and exiting the water to find locations in the video to begin their review, and we examined applying similar techniques as a video pre-processing step.

The addition of occlusion handling and long-range track data had a significant impact on our ability to count fish accurately, moving counting accuracy to 88% from our baseline of 62%. Incorporating sensor data into our approach identified set and haul periods correctly 90% of the time, suggesting that this data, when available, could drive a highly efficient pre-processing step in the pipeline.
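As a rough illustration of the occlusion idea, the sketch below keeps each fish track alive for a grace period of missed frames, so a fish that disappears behind a crew member is re-attached to its existing track rather than counted a second time. This is not the project's tracker: matching here is naive centroid distance, and the max_missed and max_distance values are assumptions chosen for illustration.

```python
# Minimal sketch of occlusion-tolerant counting: tracks survive a grace
# period of missed frames before being dropped, avoiding double counts when
# a fish is briefly hidden. Matching is greedy nearest-centroid.

import math
from dataclasses import dataclass

@dataclass
class Track:
    track_id: int
    centroid: tuple          # (x, y) of the last matched detection
    missed: int = 0          # consecutive frames with no matching detection

class OcclusionTolerantCounter:
    def __init__(self, max_missed=30, max_distance=80.0):
        self.max_missed = max_missed        # frames a track may stay "hidden"
        self.max_distance = max_distance    # pixels allowed between frames
        self.tracks = []
        self.next_id = 0
        self.total_count = 0

    def update(self, detections):
        """detections: list of (x, y) centroids for the current frame."""
        unmatched = list(detections)
        for track in self.tracks:
            # Greedily attach the nearest detection to each live track.
            best = min(unmatched, default=None,
                       key=lambda d: math.dist(d, track.centroid))
            if best is not None and math.dist(best, track.centroid) <= self.max_distance:
                track.centroid = best
                track.missed = 0
                unmatched.remove(best)
            else:
                track.missed += 1   # occluded or briefly out of view
        # Drop tracks only after the grace period expires.
        self.tracks = [t for t in self.tracks if t.missed <= self.max_missed]
        # Any detection left unmatched starts a new track and a new count.
        for det in unmatched:
            self.tracks.append(Track(self.next_id, det))
            self.next_id += 1
            self.total_count += 1
        return self.total_count
```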
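The sensor-based pre-processing step can also be sketched simply. Under the assumption that hauls happen at characteristically low vessel speeds, the code below flags candidate haul windows in a vessel speed log so that review (or automated analysis) can begin there; the speed band and minimum duration are illustrative assumptions, not the thresholds used in the project.

```python
# Minimal sketch of sensor-driven pre-processing: flag stretches of low
# vessel speed as candidate haul windows to narrow the video to review.

def candidate_haul_windows(speeds, timestamps, low=1.0, high=4.0, min_minutes=20):
    """speeds in knots, timestamps in minutes; returns (start, end) windows."""
    windows, start = [], None
    for t, s in zip(timestamps, speeds):
        if low <= s <= high:
            start = t if start is None else start
        else:
            if start is not None and t - start >= min_minutes:
                windows.append((start, t))
            start = None
    if start is not None and timestamps[-1] - start >= min_minutes:
        windows.append((start, timestamps[-1]))
    return windows

# Hypothetical track: steaming at 8 kn, then a slow haul around 2.5 kn.
times = list(range(0, 300, 10))
speeds = [8.0] * 12 + [2.5] * 12 + [8.0] * 6
print(candidate_haul_windows(speeds, times))  # [(120, 240)]
```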


Remaining challenges to be addressed following this project include improving classification performance for bycatch species and increasing processing speed.