An affordable and easy-to-use tool for automatic fish length and weight estimation in mariculture


Farming site characteristics

The study was carried out at the “Maricoltura e Ricerca Società Cooperativa” fish farm (43°03′34.0″N 9°50′19.4″E), approximately 0.17 nautical miles off Capraia Island (Tuscany Region, Northern Tyrrhenian Sea, Italy). The farming site is characterized by a rocky bottom, a water depth of about 35–40 m, a dissolved oxygen concentration of 5.95 ± 0.30 mg/L (mean ± SD), and an annual surface water temperature of 24.13 ± 0.53 °C (mean ± SD). The facility consists of ten circular sea cages: eight 2400 m³ cages dedicated exclusively to farming gilthead seabream (Sparus aurata L.) and European sea bass (Dicentrarchus labrax L.), and two 900 m³ cages also used for experimental trials.

Smart buoy and stereoscopic camera characteristics

The smart buoy consisted of a 1.2 m × 0.2 m stainless steel cylinder fixed to a 0.6 m wide float (Fig. 4a). The device was equipped with a lithium battery pack, a 4G network router, a multiparametric probe (measuring temperature, pH, and dissolved oxygen), and a stereo camera (Fig. 4c). The buoy was anchored inside a commercial-scale farming cage for underwater image recording and tied by ropes to the floating collar of the cage. The integrated stereo camera was placed at a depth of about 0.7 m and sealed in a waterproof housing (a plexiglass cylinder). The camera was an 8 MP Arducam synchronized stereo camera consisting of two 8 MP IMX219 camera modules that capture images simultaneously through a shared connection to a Raspberry Pi (Table 2). The two camera lenses were spaced 8 cm apart on the vertical axis, and the device was oriented towards the cage net (Fig. 4a–d). The smart buoy transmitted over a mobile network to a cloud-based site where images and data were stored; the images could then be downloaded to a personal computer. During the trial period (3 months, from 1 July to 30 September 2021), the fish cage hosted about 4000 gilthead seabreams (mean weight 606 ± 103 g). Daily feeding and all routine farm procedures were performed by farm operators throughout the trial. At the end of the experiment, 200 fish were collected, and standard length (SL) and weight (W) were recorded to determine the length–weight relationship curves and to compare the results of the image analysis with the actual size of the fish.
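As an illustrative sketch of the acquisition step (not the authors' code), the snippet below captures one frame from a synchronized stereo camera on a Raspberry Pi and splits it into the two views. It assumes the Arducam board exposes both IMX219 sensors as a single combined frame stacked along the vertical axis; the actual frame layout depends on the specific HAT.

```python
# Illustrative sketch only: capture and split a synchronized stereo frame.
# Assumes the stereo HAT presents both IMX219 sensors to the Raspberry Pi
# as one combined frame, stacked top/bottom (an assumed layout).
import numpy as np
from picamera2 import Picamera2  # standard Raspberry Pi camera library

picam2 = Picamera2()
picam2.configure(picam2.create_still_configuration())
picam2.start()

frame = picam2.capture_array()                 # H x W x 3 array, both views
h = frame.shape[0] // 2
top_view, bottom_view = frame[:h], frame[h:]   # one image per lens

np.save("stereo_top.npy", top_view)            # stored for upload/analysis
np.save("stereo_bottom.npy", bottom_view)
```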

Figure 4

(a,b) Photos and images of buoy positioning inside the sea cage (photos by N. Tonachella and M. Martinoli). (c) Close-ups of the stereo camera and plexiglass waterproof housing (photo by F. Capoccioni), and (d) diagrams of the stereo camera’s fields of view.

Table 2 Technical characteristics of the stereo camera.

Image data calibration and analysis

There are a variety of approaches for geometric camera calibration23. Many of them have in common the use of markers or patterns, which represent visible and distinguishable object points. These object points and their corresponding image points are then used as observations to determine the parameters of the camera model(s) and the relative orientation(s)24. In the underwater environment, the calibration must also model and compensate for the refractive effects of the lenses, the housing port, and the water medium10. Computer vision approaches often use 2D test fields in the form of chessboard targets: typically, a planar calibration pattern of alternating black and white squares is used to determine the intrinsic and extrinsic parameters of the camera25.
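As a minimal sketch of this standard technique (using OpenCV as a plausible tool, not the authors' specific tooling), the snippet below detects chessboard corners in a set of calibration images and estimates the intrinsic matrix and distortion coefficients; the inner-corner pattern size and file paths are placeholders.

```python
# Minimal chessboard calibration sketch with OpenCV (illustrative only).
import glob
import cv2
import numpy as np

SQUARE_MM = 27.0      # square size of the printed pattern, in mm
PATTERN = (9, 6)      # inner corners per row/column (placeholder values)

# 3D object points of the planar pattern (z = 0), in world units (mm)
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE_MM

obj_points, img_points = [], []
for path in glob.glob("calibration/*.png"):    # underwater chessboard shots
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, PATTERN)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(objp)
        img_points.append(corners)

# K holds the intrinsic parameters of Eq. (1) (OpenCV uses the transposed
# layout); dist holds the radial/tangential distortion coefficients.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("reprojection RMS:", rms)
```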

Camera calibration and model fitting

In this study, several replicate calibrations were performed in the water and at various orientations using a chessboard (270 × 190 mm) as a planar calibration pattern (Fig. 5). The corners of 15 squares were manually marked, each square measuring 27 × 27 mm. In this phase, the chessboard images were used to estimate the camera’s radial distortion parameters (Eq. 1, distortion matrix26) and to correct the refraction caused by the propagation of light through different media27. Estimating the radial distortion coefficients helped to remove the barrel and pincushion effects introduced by the camera and the housing.

$$\begin{bmatrix} f_{x} & 0 & 0 \\ s & f_{y} & 0 \\ c_{x} & c_{y} & 1 \end{bmatrix}$$

where $\left[c_{x}\; c_{y}\right]$ is the optical center (the principal point) in pixels; $\left(f_{x},\, f_{y}\right)$ is the focal length in pixels, with $f_{x} = F/p_{x}$ and $f_{y} = F/p_{y}$; $F$ is the focal length in world units, expressed in millimeters; $\left(p_{x},\, p_{y}\right)$ is the size of the pixels in world units; and $s = f_{x}\tan\alpha$ is the skew coefficient, which is non-zero if the image axes are not perpendicular.

(1)
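To show how such parameters are applied, the sketch below (a generic OpenCV step, not necessarily the authors' exact procedure) undistorts a raw frame using the intrinsic matrix `K` and distortion coefficients `dist` estimated in the previous sketch; the file paths are placeholders.

```python
# Generic undistortion step: remove barrel/pincushion effects from a frame,
# using K and dist from the calibration sketch above (illustrative only).
import cv2

img = cv2.imread("raw_frame.png")          # placeholder path
h, w = img.shape[:2]

# Refine the camera matrix for this frame size, then undistort
new_K, roi = cv2.getOptimalNewCameraMatrix(K, dist, (w, h), 0)
undistorted = cv2.undistort(img, K, dist, None, new_K)
cv2.imwrite("undistorted_frame.png", undistorted)
```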

Figure 5

(a) Underwater image of the chessboard pattern used for calibration of real pixel size. (b) Example of fish silhouette validation and error estimation in the cage. (c) Bounding boxes of automatically annotated fish with confidence thresholds. (d) Cropped images of single fish with landmark placement. (e) Translation example of fish silhouettes comparing the two stereo camera images (red dots represent the same landmark seen from the two lenses).

In a second phase, the stereo images (one pair of images per photo shoot) were marked with 4 pairs of reference points (landmarks) at different positions on the board (its corners) to estimate the translation of each pixel of the board between the two stereo images; the relative translation of the target between the two stereo images is directly related to the distance between the camera and the target itself (Fig. 1). The closer the target is to the camera, the greater its translation between the two stereo images. This was key information for correctly estimating the actual target size in pixels. A total of 52 images and 208 single measurements from the calibration chessboard were used to compute the relationship.
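A minimal sketch of this step follows (all coordinates and the landmark spacing are invented for illustration, not the study's data): given the pixel coordinates of the same chessboard landmarks in both views, it computes each landmark's translation in pixels and pairs it with the known cm-per-pixel ratio of the board at that distance, yielding the observation points used later for model fitting.

```python
# Pair landmark translation (pixels) with the board's cm-per-pixel ratio.
# All coordinates below are illustrative; they would come from manual marking.
import numpy as np

# The same 4 landmarks (board corners) seen by the two lenses, pixels (x, y)
pts_cam1 = np.array([[412, 230], [655, 228], [414, 396], [658, 399]], float)
pts_cam2 = np.array([[412, 318], [655, 316], [414, 484], [658, 487]], float)

# Translation of each landmark between the two stereo images (pixels)
translation_px = np.linalg.norm(pts_cam2 - pts_cam1, axis=1)

# Known physical distance between the first two landmarks (cm, assumed value)
KNOWN_CM = 21.6
length_px = np.linalg.norm(pts_cam1[1] - pts_cam1[0])
cm_per_px = KNOWN_CM / length_px

# Each landmark contributes one (translation, cm-per-pixel) observation
observations = [(float(t), cm_per_px) for t in translation_px]
print(observations)
```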

Measurement error estimation

In order to estimate the measurement error (mean absolute percentage error, MAPE), 17 photos of plastic fish silhouettes of four known sizes (standard lengths of 22.0, 24.2, 29.0, and 33.6 cm; Fig. 5) were taken and processed. The images were captured by placing the targets in front of the camera at increasing distances. The known lengths of the silhouettes were compared with the lengths estimated by the AI to compute the error both in cm and as a percentage of fish body length.
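For clarity, MAPE can be computed as in the sketch below (a generic formulation; the estimated values shown are placeholders, not the study's measurements).

```python
# Mean absolute percentage error between known and estimated lengths.
import numpy as np

known_cm = np.array([22.0, 24.2, 29.0, 33.6])        # silhouette lengths
estimated_cm = np.array([21.4, 24.9, 28.1, 34.5])    # placeholder AI output

abs_err_cm = np.abs(estimated_cm - known_cm)         # error in cm
mape = np.mean(abs_err_cm / known_cm) * 100.0        # error in % of body length
print(f"MAPE = {mape:.2f}%")
```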

AI automatic fish recognition, landmarks positioning, and fish length measurements

During the image acquisition stage, the fish swam freely within the cage, without being oriented along the x–y axes of the camera plane. For fish body length estimation, a complex AI pipeline was designed (Fig. 6). The pipeline was split into smaller packages to break the final task down into its components, making the analysis simpler and easier to manage.

Figure 6

The overall process of the proposed method: automatic AI fish recognition and size estimation.

The raw stereo images were fed to an improved Convolutional Neural Network (CNN) called You Only Look Once (YOLO) v428. As an excellent one-stage detection algorithm, the YOLO family offers high detection accuracy and fast detection speed and is widely used in target detection tasks. Several studies have applied it for detection purposes: Tian et al.29 used an improved YOLOv3 to detect apples at different growth stages; Shi et al.30 proposed a YOLO network pruning method that can serve as a lightweight mango detection model for mobile devices; and Cai et al.31 proposed an improved YOLOv3 with a MobileNetv1 backbone to detect fish. In this study, the YOLOv4 CNN was trained with 1400 properly annotated images (training: n = 1120; validation: n = 280), collected from the Open Images dataset (Open Images Dataset V6 – storage.googleapis.com) and from the field, to locate individual fish within the image using bounding boxes. The training was carried out for 6000 iterations and reached a CIoU loss32 of 1.5 and an mAP33 of 87%.

In a second step, each bounding box was used to crop the individual image of the fish, which was then passed to a well-known CNN optimized for image recognition, RESNET-101 (RES101)34. The training of RES101 was carried out in PyTorch35 using the transfer learning technique proposed by Monkman et al.36. As with individual fish localization, automatic landmark detection was achieved with RESNET-101, with the last layer modified to detect two landmarks (the snout tip and the base of the middle caudal rays) on the fish shape. The training (n = 8960) and test (n = 3840) datasets were obtained from 200 field pictures in which each relevant individual fish was extracted and manually annotated with the required landmarks. Each image was then fed into an augmentation algorithm that generated 64 augmented images with different levels of scale, noise, rotation, translation, and brightness, yielding a final dataset of 12,800 pictures. The training was carried out for 100 epochs and reached an MSE (mean square error between predicted and true landmark positions) of 0.23. This automatic landmark positioning allowed the algorithm to measure fish length in pixels by counting the pixels between the two points.

Finally, the length unit was converted from pixels to centimeters using the translation information derived from the chessboard target images during the calibration phase: the further a target is placed from the cameras, the less it translates between a pair of stereo images, and vice versa. The extent of this translation can be measured either as an angle (the parallax angle) or as a distance in pixels between the same point in the two stereo images (Fig. 5e); the same technique is used in the parallax method for estimating the distance of stars37. Since the size of the chessboard was known, the translation in pixels of the chessboard’s landmarks was plotted against the corresponding ratio between length in cm and length in pixels; the fitted model became more accurate as more stereo images were tested at different positions within the camera’s field of view.
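As an illustration of the landmark regression step, the sketch below loads a pretrained ResNet-101 in PyTorch and replaces its last fully connected layer so that it regresses the (x, y) coordinates of the two landmarks, i.e., four outputs. This is a minimal sketch consistent with the description above, not the authors' released code: the frozen backbone, optimizer, learning rate, input size, and dummy batch are all assumptions.

```python
# Transfer learning sketch: ResNet-101 regressing two landmarks (4 values).
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet101(weights=models.ResNet101_Weights.IMAGENET1K_V1)

# Freeze the pretrained backbone (one common transfer learning choice)
for param in model.parameters():
    param.requires_grad = False

# Replace the last layer: (x, y) for snout tip + base of middle caudal rays
model.fc = nn.Linear(model.fc.in_features, 4)

criterion = nn.MSELoss()                 # MSE between predicted/true landmarks
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)

# One illustrative training step on a dummy batch (placeholder tensors)
images = torch.randn(8, 3, 224, 224)     # cropped fish images
targets = torch.rand(8, 4)               # normalized landmark coordinates
optimizer.zero_grad()
loss = criterion(model(images), targets)
loss.backward()
optimizer.step()
```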

The observation points obtained by the described computation were then entered into a polynomial Ridge Regression algorithm, which produced a function minimizing the residual mean square error38. This model was finally used to estimate the length-per-pixel conversion factor of a given fish, through which the final length in cm is obtained. The distribution of standard length values obtained by the AI algorithm (n = 124, called “AI estimated”) was compared to that directly measured on a subsample of 190 fish (called “Sampled”) collected from the same cage where the stereo images were taken. The fish employed in this study were harvested by the farm staff and destined for sale in large retailers, as they belonged to the fish farm. A random sub-sample of all catches from the cage was given to the researchers for comparative analysis; therefore, only fish that had already been sacrificed were handled, in line with current national legislation for farmed animals. The length comparison was carried out using a quantile–quantile (q–q) plot, i.e., a graphical technique for determining whether two data sets come from populations with a common distribution (Fig. 3). If this assumption is true, the points in the scatterplot should fall approximately along the 45° reference line plotted in the graph. The greater the deviation from this line, the stronger the evidence that the two data sets come from populations with different distributions. The Shapiro–Wilk test (alpha = 0.05) was performed to test each data set for normality, whereas Levene’s test (alpha = 0.05) was used to assess the homogeneity of variance between the two distributions. The Welch F-test (alpha = 0.05) was then used to compare the distribution means in case the homogeneity of variances was violated.
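A compact sketch of these two steps follows, using scikit-learn's polynomial ridge pipeline and SciPy's tests as plausible stand-ins; the polynomial degree, regularization strength, and all numeric data below are placeholders, not the study's values. Note that for two groups, Welch's t-test is the two-sample equivalent of the Welch F-test.

```python
# Polynomial ridge fit of translation -> cm/pixel, plus the distribution tests.
import numpy as np
from scipy import stats
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Calibration observations: landmark translation (px) vs cm-per-pixel ratio
translation_px = np.array([[40.0], [62.0], [85.0], [110.0], [140.0]])
cm_per_px = np.array([0.081, 0.102, 0.131, 0.158, 0.197])

model = make_pipeline(PolynomialFeatures(degree=2), Ridge(alpha=1.0))
model.fit(translation_px, cm_per_px)

# Convert a fish's pixel length to cm using its own landmark translation
fish_len_px, fish_translation_px = 310.0, 95.0
fish_len_cm = fish_len_px * model.predict([[fish_translation_px]])[0]
print(f"estimated length: {fish_len_cm:.1f} cm")

# Distribution comparison (AI-estimated vs sampled lengths, placeholder data)
ai_cm = np.random.normal(28.0, 2.0, 124)
sampled_cm = np.random.normal(28.2, 1.9, 190)
print(stats.shapiro(ai_cm))                                  # normality
print(stats.levene(ai_cm, sampled_cm))                       # equal variances
print(stats.ttest_ind(ai_cm, sampled_cm, equal_var=False))   # Welch's test
```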

Finally, a length–weight relationship (LWR)39 (Eq. 2) was determined using body weight (g) and standard length (cm) measurements from the sampled fish (n = 198):

$$W = aL^{b}$$

(2)

where W is the body weight of the fish, a is the intercept linked to body shape, L is the standard length, and b is the exponent related to variations in body shape. The obtained LWR was then used to calculate fish weight from the length values derived from the images processed by the AI40 (the fish were collected on the same day the photos were taken by the stereo camera).
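Because the LWR is linear on a log–log scale (log W = log a + b log L), one standard way to estimate a and b is ordinary least squares on log-transformed data, as in the sketch below (all numbers are placeholders, not the study's measurements):

```python
# Fit W = a * L**b by linear regression on log-transformed data.
import numpy as np

length_cm = np.array([24.1, 25.8, 27.3, 28.9, 30.4])   # placeholder SL values
weight_g = np.array([510.0, 598.0, 706.0, 824.0, 958.0])  # placeholder W values

# log W = log a + b * log L  ->  slope b, intercept log a
b, log_a = np.polyfit(np.log(length_cm), np.log(weight_g), 1)
a = np.exp(log_a)

# Predict weight from an AI-estimated length
ai_length_cm = 27.0
predicted_weight_g = a * ai_length_cm ** b
print(f"W = {a:.4f} * L^{b:.3f}; {ai_length_cm} cm -> {predicted_weight_g:.0f} g")
```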

The experimental activities involving animals conducted in this study, including their ethical aspects, were approved by the Animal Welfare Body of the CREA Research Centre for Animal Production and Aquaculture (authorization no. 86670 of 23/09/2021). No human experiments were performed, nor were human tissue samples used. All the people depicted in the images are authors of the study, shown during the logistical organization of the calibration tests. Informed consent was obtained from all individual participants, both for participation in the study and for the publication of identifying information/images in an online open-access publication.
