A Pacific Albacore Tuna Fishing-Ground Prediction Model Based on Multi-Channel Single Regression, with an Interpretability Study

PREDICTION OF PACIFIC THUNNUS ALALUNGA FISHERY BASED ON MULTIPLE CHANNEL SINGLE REGRESSION MODULE WITH EXPLAINABILITY

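  • Abstract: To improve the accuracy of fishing-ground forecasts for Pacific albacore tuna (Thunnus alalunga) and to explore the use of deep learning and interpretability methods in this field, we used longline fishery data for albacore in the Pacific (120°E to 80°W, 45°S to 45°N) from 2000 to 2021 and selected 16 raw features, including month, longitude and latitude, ocean temperature, ocean salinity, chlorophyll concentration, and mixed layer depth. The features span scalars, vectors, matrices, and tensors. By combining convolutional neural networks with residual structures and fully connected neural networks, we built a new multi-channel single-regression deep learning model. The model can use data of different sizes and resolutions at the same time, relying on the adaptability of convolution to extract latent features from each environmental factor and fuse them for prediction. We also introduced the SHAP (SHapley Additive exPlanations) interpretability method to analyze the contribution of each input of a sample, accumulating SHAP values along different dimensions to uncover the relationships between environmental factors and catch, which gives a practical analysis tool for scientific study. The results show that the model correctly learns, from massive heterogeneous data, the correlations between the feature factors and both the location of albacore fishing grounds and the catch. Compared with other fishing-ground prediction models (random forest, XGBOOST, generalized additive model, support vector machine, long short-term memory network, and BP model), it had the lowest mean squared error, root mean squared error, and mean absolute error, at 0.00322, 0.0567, and 0.0272 respectively, 3.9% to 82.6% lower than the other models. The model adapts effectively to heterogeneous inputs and performs end-to-end learning. The multi-dimensional analysis of SHAP values also confirmed that catch distribution is highly correlated with many environmental factors, including ocean temperature, ocean salinity, mixed layer depth, and sea surface height anomaly. In addition, the interpretability method revealed a correlation between dissolved iron and albacore catch. Interpretable deep learning can serve as a new way to study feature factors in research on the correlation between environmental factors and biological habits, offering a new line of inquiry for traditional studies of biological characteristics.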
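The idea of accumulating SHAP values along a dimension can be illustrated with a minimal sketch. It does not use the SHAP library or the paper's model: it computes exact Shapley values by brute force for a hypothetical four-input toy function, where two inputs stand for grid cells of a "temperature" factor and two for a "salinity" factor, and then sums the per-input attributions by factor. The names `toy_model`, `factor_of_input`, and all numbers are assumptions made for illustration only.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of each input of x relative to a baseline input."""
    n = len(x)

    def value(subset):
        # Inputs in `subset` take their real value, the rest take the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # k = size of the coalition without input i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = set(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical stand-in for a catch predictor, NOT the paper's network.
    t0, t1, s0, s1 = z
    return 0.5 * (t0 + t1) + 0.2 * s0 * s1


# Hypothetical mapping from each flat input to the environmental factor it belongs to.
factor_of_input = ["temperature", "temperature", "salinity", "salinity"]
x = [1.0, 3.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0, 0.0]

phi = shapley_values(toy_model, x, baseline)

# Accumulate the per-input attributions along the "factor" dimension.
per_factor = {}
for name, p in zip(factor_of_input, phi):
    per_factor[name] = per_factor.get(name, 0.0) + p

print(phi)
print(per_factor)
```

Because Shapley values are additive, the per-factor sums still add up to the difference between the prediction for the sample and the prediction for the baseline, which is what makes this kind of aggregation meaningful.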

     

    Abstract: Thunnus alalunga (albacore) is a highly migratory oceanic fish, widely distributed in the Pacific, Atlantic, and Indian Oceans and the Mediterranean Sea. More accurate prediction of Pacific albacore fishing-ground locations and catches not only improves the efficiency of deep-sea fishing but also helps gauge how saturated the albacore fishery is, providing theoretical support for sustainable fishery development. Based on longline fishing data for albacore in the Pacific (120°E to 80°W, 45°S to 45°N) from 2000 to 2021, 16 input features were selected: month, longitude, latitude, sea water potential temperature (ST), sea water salinity (SS), chlorophyll a, dissolved iron (Fe), primary production (PP), dissolved molecular oxygen (DO), pH, surface partial pressure of carbon dioxide (SPCO2), phytoplankton expressed as carbon (PHYC), ocean mixed layer thickness (MLD), sea surface height (SSH), eastward wind (EW), and northward wind (NW). These inputs vary in shape and size. We propose a novel Multiple Channels Single Regression (MCSR) module built from stacked convolutional operators with residual structures and fully connected layers. The module has three components: the "root" component derives feature maps from the environmental factors, processing each factor in the sample separately; the "bulk" component concatenates the feature maps from the root and extracts features of the whole sample; and the "head" component computes the likelihood of a T. alalunga fishery at a location and the expected catch. To interpret the module's predictions, we employed SHAP to calculate the contribution of each environmental factor, using the additive contributions of the factors to reveal the relationships between the factors and the catch. 
Compared with traditional fishery forecasting models, including Random Forest, XGBOOST, Generalized Additive Model, SVM, Long Short-Term Memory network, and BP, our module achieved the best performance, with MSE, RMSE, and MAE values of 0.00322, 0.0567, and 0.0272, respectively, which are 3.9% to 82.6% lower than those of the other models. The MCSR module thus outperformed the statistical, ensemble learning, and deep learning models alike. The SHAP summary, aggregated across factors and locations, showed that the module effectively learns latent relationships between the factors and fishery catches while minimizing redundancy and noise. From a modeling perspective, the approach adapts well to heterogeneous data inputs, enabling end-to-end learning and accurate forecasting of fishery locations and catches from a variety of environmental factors. From a biological perspective, this study introduces a method that combines deep learning modules with explainability approaches and can be used to study relationships between species and environmental factors.
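For readers less familiar with the three error metrics quoted above, the following minimal sketch shows how MSE, RMSE, and MAE are computed and how a relative reduction against a baseline model can be expressed as a percentage. The sample values and the helper name `relative_reduction` are hypothetical and are not taken from the study's data. The last line only checks that the reported RMSE is the square root of the reported MSE.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(baseline_error, model_error):
    """Percentage by which model_error is lower than baseline_error."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# Hypothetical normalized catch values, for illustration only.
y_true = [0.10, 0.00, 0.25, 0.40]
y_pred = [0.12, 0.05, 0.20, 0.35]

print("MSE :", mse(y_true, y_pred))
print("RMSE:", rmse(y_true, y_pred))
print("MAE :", mae(y_true, y_pred))

# Consistency check on the reported figures: sqrt(0.00322) rounds to 0.0567.
print(round(math.sqrt(0.00322), 4))
```

On any data set the MAE cannot exceed the RMSE, so a reported MAE of 0.0272 alongside an RMSE of 0.0567 is internally consistent.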

     
