Interpreting Electronic Monitoring System Videos of Fish Catch through DL

Authors: Jie-Yan Lu, Hong-Yang Lin, Mao-Hsiang Huang, Yan-Fu Kuo, Department of Bio-mechatronics Engineering, National Taiwan University

Fig. 1. Flow chart of the processing of an EMS video from recording to information retrieval

Global climate change and population growth have pushed countries to confront shortages of food resources, including marine resources. Many fisheries management organizations have begun to manage and monitor fishing activities on the high seas to keep marine resources sustainable. Accurate fish catch information is critical to managing these resources well. This information includes the fish species, body length, time of catch, and the latitude and longitude of the fishing location. Taiwan is a member of the regional fisheries management organization and has strong capabilities in offshore fisheries, so it is obliged to collect and report correct catch information. However, because of a shortage of human resources in Taiwan, only a few marine observers are available to record the catch information. Otherwise, the catch information is declared by the fishers themselves. To address these problems, electronic monitoring systems (EMSs) have become essential to ocean fisheries in recent years. Some countries have installed EMSs on vessels to assist in recording fishing information, which reduces the human resources required and simplifies the process. Among these systems, video-type EMSs record the fishing procedures on deck and provide a complete video record.

Nevertheless, these videos must be analyzed manually. For example, to calculate the body length of a fish, observers must manually mark the positions of its head and tail. The videos often run to thousands of hours or more, which makes manual interpretation extraordinarily labor-intensive and time-consuming. A system that interprets the EMS videos of deep-sea fishing vessels automatically is therefore essential.

An EMS includes a waterproof camera on the deck, a GPS device, and a computer. The camera records the fishing videos, which are stored on the computer on board the vessel. The videos are then sent to an onshore data processing center, where the team retrieved them and applied deep learning methods to interpret them automatically. Fig. 1 illustrates the processing of an EMS video from recording to information retrieval.

EMS video interpretation system

Fig. 2. Flow chart of the EMS video interpretation system

The EMS video interpretation system uses two deep learning models, as shown in Fig. 2. Model 1 is MobileNetV2, and model 2 is a mask region-based convolutional neural network (Mask R-CNN). EMS videos are often several days long, and only a small portion of the footage contains fish catches. The system therefore first uses model 1 to find the clips that contain catching activity, and then uses model 2 to process those clips, identifying the fish and collecting their statistics.
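The two-stage idea can be sketched in a few lines. This is a minimal illustration and not the team's implementation: `interpret_video`, `has_catch`, and `analyze_clip` are hypothetical names, and the two callables merely stand in for the trained models 1 and 2.

```python
from typing import Callable, Iterable, List, TypeVar

Clip = TypeVar("Clip")
Result = TypeVar("Result")


def interpret_video(
    clips: Iterable[Clip],
    has_catch: Callable[[Clip], bool],       # stand-in for model 1 (clip filter)
    analyze_clip: Callable[[Clip], Result],  # stand-in for model 2 (fish analysis)
) -> List[Result]:
    """Run the costly analysis only on clips the cheap filter accepts."""
    results: List[Result] = []
    for clip in clips:
        if has_catch(clip):                  # stage 1: skip clips without a catch
            results.append(analyze_clip(clip))  # stage 2: analyze the rest
    return results


# Toy usage: strings play the role of video clips.
toy_clips = ["empty", "catch:tuna", "empty", "catch:shark"]
toy_results = interpret_video(
    toy_clips,
    has_catch=lambda clip: clip.startswith("catch"),
    analyze_clip=lambda clip: clip.split(":")[1],
)
```

The point of this arrangement is that the lightweight filter sees every clip, while the heavier segmentation model sees only the small share of footage that contains a catch.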

For model 1, the team trained MobileNetV2 on images with and without catching activity. The trained model identified whether a video clip contains fish catching activity with an accuracy of 99.17%. For model 2, the team trained a CNN model on a variety of fish images captured by EMSs so that it detects the body of a fish and identifies its type automatically. The CNN model used, Mask R-CNN, differs from a general object detection model: it not only places a bounding box around each target found in an image but also generates a color mask over the target. This capability helps in measuring the fish body length.
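The article does not describe how a length is derived from the mask. One plausible approach, sketched here purely as an assumption, is to take the extent of the mask along its principal axis and convert pixels to centimeters with a calibration factor. The function name `body_length_from_mask` and the parameter `cm_per_pixel` are hypothetical, and a real deck camera would also need perspective and lens correction.

```python
import math
from typing import Sequence


def body_length_from_mask(mask: Sequence[Sequence[int]], cm_per_pixel: float) -> float:
    """Estimate length as the mask's extent along its principal axis.

    mask is a 2D grid of 0/1 values; cm_per_pixel is an assumed calibration.
    The extent is measured between pixel centers.
    """
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if len(pts) < 2:
        return 0.0
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    # Second-order central moments of the mask pixels.
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    syy = sum((y - my) ** 2 for _, y in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    # Orientation of the major axis of the 2x2 moment matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    # Project every pixel onto the major axis and take the spread.
    proj = [(x - mx) * ux + (y - my) * uy for x, y in pts]
    return (max(proj) - min(proj)) * cm_per_pixel


# Toy usage: a 30 x 4 pixel horizontal "fish" in a 40 x 20 image.
toy_mask = [[1 if 8 <= y < 12 and 5 <= x < 35 else 0 for x in range(40)]
            for y in range(20)]
toy_length_cm = body_length_from_mask(toy_mask, cm_per_pixel=0.5)
```

Using the mask rather than the bounding box keeps the estimate from shrinking or stretching when the fish lies diagonally in the frame.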

Fig. 3. Flow chart of fish detection and statistics acquisition

Analyzing the videos with model 1 yielded the video clips that contain fish catching activity. The subsequent process of fish identification and statistics acquisition is shown in Fig. 3. First, each clip was converted into a sequence of images in temporal order and fed into the trained Mask R-CNN model to detect fish. When a fish was detected, the model covered its body with a mask in the image (the red mask in Fig. 3). Next, a time threshold algorithm removed false-positive detections. A distance threshold algorithm was then applied to count the fish without double counting. The results in Fig. 3 show that, in a ten-minute video, four fish were detected in the following order: shark, tuna, tuna, and marlin. Their body lengths are also marked in Fig. 3. At present, the model can detect several types of fish, such as tuna, marlin, and shark. On 200 test videos, the accuracy of fish identification was 98.06%, and the accuracy of the fish count was 77.31%.
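The article names the time threshold and distance threshold algorithms but does not define them. The sketch below is one possible reading, offered only as an assumption: detections in consecutive frames are linked into one track when their centroids lie within `max_dist` pixels (the distance threshold), and tracks shorter than `min_frames` frames are discarded as false positives (the time threshold). The function `count_fish` and all of its parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Detection = Tuple[int, float, float]  # (frame index, centroid x, centroid y)


def count_fish(
    detections: Sequence[Detection],
    max_dist: float,   # assumed distance threshold, in pixels
    min_frames: int,   # assumed time threshold, in frames
    max_gap: int = 1,  # largest frame gap allowed inside one track
) -> int:
    """Count fish by linking nearby detections and dropping short-lived tracks."""
    tracks: List[List[Detection]] = []
    for det in sorted(detections):
        frame, x, y = det
        best, best_d = None, max_dist
        for track in tracks:
            last_frame, lx, ly = track[-1]
            gap = frame - last_frame
            if gap < 1 or gap > max_gap:
                continue  # already updated this frame, or too old to continue
            d = math.hypot(x - lx, y - ly)
            if d <= best_d:
                best, best_d = track, d
        if best is None:
            tracks.append([det])  # far from every open track: a new fish
        else:
            best.append(det)      # close to an open track: the same fish
    # Time threshold: a track must persist long enough to count.
    return sum(1 for track in tracks if len(track) >= min_frames)


# Toy usage: two fish plus a one-frame spurious detection.
toy_detections = (
    [(f, 100.0 + 10 * f, 50.0) for f in range(5)]            # fish A, frames 0-4
    + [(2, 400.0, 300.0)]                                    # spurious blip
    + [(f, 100.0 + 10 * (f - 10), 50.0) for f in range(10, 14)]  # fish B
)
toy_count = count_fish(toy_detections, max_dist=20.0, min_frames=3)
```

In this toy run the one-frame detection is filtered out and the two persistent tracks are counted once each, which is the behavior the two thresholds are meant to provide.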

The team has developed a deep learning application system that analyzes EMS videos and collects statistics on the harvested fish (fish type, body length, location, and quantity). The system can efficiently extract the fishing data of vessels at sea, reduce labor costs, and support related fishing analysis. In the future, it will be applied to longer EMS videos and further improved to enhance its overall accuracy and stability. This advancement in Taiwan's fishery technology is expected to contribute to the management of marine resources.

Copyright © All Rights Reserved.
This website is operated by China Productivity Center