A SINGLE-VIEW 3D MODEL RECONSTRUCTION METHOD FOR THE YANGTZE FINLESS PORPOISE
Abstract: In the field of 3D reconstruction of the Yangtze finless porpoise, significant challenges remain: color-cast distortion in underwater images, a shortage of porpoise datasets, and the difficulty of capturing multi-view images of the animals, while emerging reconstruction methods have yet to be applied to this species. To tackle these challenges, this paper proposes a single-view 3D reconstruction method for the Yangtze finless porpoise that combines a diffusion model with a neural radiance field. First, an improved underwater image enhancement technique effectively corrects the underwater color cast. Second, a custom multi-view image dataset of Yangtze finless porpoises is built and used to fine-tune a view-conditioned diffusion model, enabling the synthesis of multi-view images from a single view and providing a new route to reconstructing a porpoise from a single image. Finally, a neural radiance field reconstructs the 3D model of the porpoise. The reconstructions were evaluated with the average chamfer distance (ACD) and normal consistency (NC): the proposed method achieves a lower ACD and a higher NC than existing methods, demonstrating that it reconstructs 3D models that faithfully capture the coloration and morphology of the Yangtze finless porpoise. The synthesized novel views reach PSNR, SSIM, and LPIPS values of 38.968, 0.972, and 0.294, respectively, outperforming existing methods, and reconstruction from enhanced underwater images yields the best ACD of 0.428 and the best NC of 0.882.
Table 1 Key attribute statistics

Property               Value
Number of edges        2187315
Number of faces        1458210
Number of vertices     729107
Random render color    (0.94, 0.27, 0.73)
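The mesh statistics in Table 1 can be read directly off the exported model. Below is a minimal sketch using the trimesh library and a hypothetical file name; neither is from the paper, which does not state its tooling.

```python
# Minimal sketch: recovering the Table 1 statistics from a reconstructed mesh.
# trimesh and the file name are assumptions, not the authors' actual code.
import trimesh

mesh = trimesh.load("porpoise_reconstruction.obj")  # hypothetical output file
print("vertices:", len(mesh.vertices))       # 729107 in Table 1
print("faces:", len(mesh.faces))             # 1458210
print("edges:", len(mesh.edges_unique))      # 2187315
```

As a consistency check, the reported counts satisfy Euler's formula V - E + F = 729107 - 2187315 + 1458210 = 2 as well as E = 3F/2, as expected for a closed, watertight triangle mesh.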
Table 2 Comparison of novel view synthesis metrics

Metric   RealFusion   One-2-3-45   Ours
PSNR↑    37.784       38.132       38.968
SSIM↑    0.943        0.955        0.972
LPIPS↓   0.305        0.298        0.294
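The three metrics in Table 2 compare each synthesized view against a reference image of the same porpoise from the same pose; higher PSNR and SSIM and lower LPIPS are better. A minimal sketch of how they can be computed is given below, assuming scikit-image and the lpips package; the paper does not specify its evaluation code.

```python
# Hedged sketch: per-view PSNR, SSIM, and LPIPS as reported in Table 2.
# scikit-image and the lpips package are assumed dependencies.
import numpy as np
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def novel_view_metrics(pred: np.ndarray, gt: np.ndarray):
    """pred, gt: HxWx3 uint8 images of the same view."""
    psnr = peak_signal_noise_ratio(gt, pred, data_range=255)
    ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=255)

    # LPIPS expects NCHW float tensors scaled to [-1, 1].
    def to_tensor(img):
        return torch.from_numpy(img).permute(2, 0, 1)[None].float() / 127.5 - 1.0

    loss_fn = lpips.LPIPS(net='alex')  # AlexNet backbone, as in Zhang et al. [27]
    lpips_val = loss_fn(to_tensor(pred), to_tensor(gt)).item()
    return psnr, ssim, lpips_val
```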
Table 3 Evaluation of average chamfer distance of the mesh models

Finless porpoise   RealFusion   One-2-3-45   Ours (without enhancement)   Ours (with enhancement)
Porpoise 1         1.146        2.352        0.689                        0.503
Porpoise 2         0.871        2.341        0.649                        0.554
Porpoise 3         1.217        1.761        0.733                        0.535
Porpoise 4         1.473        3.576        0.675                        0.583
Porpoise 5         1.591        2.287        0.874                        0.759
Porpoise 6         1.748        1.237        0.462                        0.428
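The average chamfer distance (ACD) in Table 3 measures the symmetric average distance between points sampled on the reconstructed and reference surfaces; lower is better. A minimal sketch under the mean-of-means convention is shown below; the paper does not publish its exact evaluation script, so the convention and sampling density are assumptions.

```python
# Hedged sketch: symmetric average chamfer distance as in Table 3.
import numpy as np
from scipy.spatial import cKDTree

def average_chamfer_distance(pts_a: np.ndarray, pts_b: np.ndarray) -> float:
    """pts_a, pts_b: (N, 3) points sampled from the two mesh surfaces,
    expressed in the same scale and coordinate frame."""
    d_ab, _ = cKDTree(pts_b).query(pts_a)  # nearest reference point per predicted point
    d_ba, _ = cKDTree(pts_a).query(pts_b)  # nearest predicted point per reference point
    return 0.5 * (d_ab.mean() + d_ba.mean())
```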
Table 4 Evaluation of normal vector consistency

Finless porpoise   RealFusion   One-2-3-45   Ours (without enhancement)   Ours (with enhancement)
Porpoise 1         0.624        0.602        0.853                        0.866
Porpoise 2         0.705        0.483        0.824                        0.841
Porpoise 3         0.718        0.472        0.819                        0.837
Porpoise 4         0.531        0.415        0.807                        0.849
Porpoise 5         0.683        0.585        0.763                        0.774
Porpoise 6         0.524        0.673        0.876                        0.882
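Normal consistency (NC, Table 4) scores how well the surface orientation of the reconstruction matches the reference: each sample on one surface is matched to a sample on the other, and the absolute cosine between the matched unit normals is averaged, so values closer to 1 are better. The nearest-neighbour matching rule in the sketch below is an assumed convention, not the authors' published code.

```python
# Hedged sketch: symmetric normal consistency as in Table 4.
import numpy as np
from scipy.spatial import cKDTree

def normal_consistency(pts_a, nrm_a, pts_b, nrm_b) -> float:
    """pts_*: (N, 3) surface samples; nrm_*: (N, 3) unit normals at those samples."""
    def one_way(p_src, n_src, p_dst, n_dst):
        _, idx = cKDTree(p_dst).query(p_src)  # match each source sample to the other surface
        return np.abs(np.sum(n_src * n_dst[idx], axis=1)).mean()  # mean |cos angle|
    return 0.5 * (one_way(pts_a, nrm_a, pts_b, nrm_b) +
                  one_way(pts_b, nrm_b, pts_a, nrm_a))
```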
References

[1] Cheng Z L, Li Y T, Zuo T, et al. Threats and conservation strategies of the East Asian finless porpoises in China [J]. Journal of Applied Oceanography, 2024, 43(3): 597-606. doi: 10.3969/J.ISSN.2095-4972.20230601002
[2] Wang K W, Zhou K Y, Chen M M, et al. Beware of several problems in ex-situ protection of Yangtze finless porpoise [J]. Journal of Nanjing Normal University (Natural Science Edition), 2024, 47(2): 91-98.
[3] Hao Y J, Tang B, Mei Z G, et al. Further suggestions on conservation of the Yangtze finless porpoise based on retrospective analysis of the current progress [J]. Acta Hydrobiologica Sinica, 2024, 48(6): 1065-1072. doi: 10.7541/2024.2024.0020
[4] Zuffi S, Kanazawa A, Jacobs D W, et al. 3D Menagerie: Modeling the 3D Shape and Pose of Animals [C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). July 21-26, 2017, Honolulu, HI, USA. IEEE, 2017: 5524-5532.
[5] Rüegg N, Zuffi S, Schindler K, et al. BARC: Learning to Regress 3D Dog Shape from Images by Exploiting Breed Information [C]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). June 18-24, 2022, New Orleans, LA, USA. IEEE, 2022: 3866-3874.
[6] Ho J, Jain A, Abbeel P. Denoising diffusion probabilistic models [J]. Advances in Neural Information Processing Systems, 2020(33): 6840-6851.
[7] Chan E R, Nagano K, Chan M A, et al. Generative Novel View Synthesis with 3D-Aware Diffusion Models [C]. 2023 IEEE/CVF International Conference on Computer Vision (ICCV). October 1-6, 2023, Paris, France. IEEE, 2023: 4217-4229.
[8] Watson D, Chan W, Martin-Brualla R, et al. Novel view synthesis with diffusion models [EB/OL]. arXiv: 2210.04628, 2022. https://arxiv.org/abs/2210.04628v1.
[9] Melas-Kyriazi L, Laina I, Rupprecht C, et al. RealFusion: 360° Reconstruction of Any Object from a Single Image [C]. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). June 17-24, 2023, Vancouver, BC, Canada. IEEE, 2023: 8446-8455.
[10] Liu R, Wu R, Van Hoorick B, et al. Zero-1-to-3: Zero-shot One Image to 3D Object [C]. 2023 IEEE/CVF International Conference on Computer Vision (ICCV). October 1-6, 2023, Paris, France. IEEE, 2023: 9264-9275.
[11] Deitke M, Schwenk D, Salvador J, et al. Objaverse: A Universe of Annotated 3D Objects [C]. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). June 17-24, 2023, Vancouver, BC, Canada. IEEE, 2023: 13142-13153.
[12] Arampatzakis V, Pavlidis G, Mitianoudis N, et al. Monocular depth estimation: a thorough review [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46(4): 2396-2414. doi: 10.1109/TPAMI.2023.3330944
[13] Hu K, Weng C, Zhang Y, et al. An overview of underwater vision enhancement: from traditional methods to recent deep learning [J]. Journal of Marine Science and Engineering, 2022, 10(2): 241. doi: 10.3390/jmse10020241
[14] Jaffe J S. Computer modeling and the design of optimal underwater imaging systems [J]. IEEE Journal of Oceanic Engineering, 1990, 15(2): 101-111. doi: 10.1109/48.50695
[15] Mobley C D. Light and Water: Radiative Transfer in Natural Waters [M]. Academic Press, 1994.
[16] Anwar S, Li C, Porikli F. Deep underwater image enhancement [EB/OL]. arXiv: 1807.03528, 2018. https://arxiv.org/abs/1807.03528.
[17] Fu Z, Wang W, Huang Y, et al. Uncertainty Inspired Underwater Image Enhancement [C]. 2022 European Conference on Computer Vision (ECCV). October 23-27, 2022, Tel Aviv, Israel. Cham: Springer Nature Switzerland, 2022: 465-482.
[18] Wang Q, Wu B, Zhu P, et al. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks [C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). June 13-19, 2020, Seattle, WA, USA. IEEE, 2020: 11531-11539.
[19] Ranftl R, Bochkovskiy A, Koltun V. Vision Transformers for Dense Prediction [C]. 2021 IEEE/CVF International Conference on Computer Vision (ICCV). October 10-17, 2021, Montreal, QC, Canada. IEEE, 2021: 12159-12168.
[20] Saharia C, Chan W, Saxena S, et al. Photorealistic text-to-image diffusion models with deep language understanding [J]. Advances in Neural Information Processing Systems, 2022(35): 36479-36494.
[21] Rombach R, Blattmann A, Lorenz D, et al. High-Resolution Image Synthesis with Latent Diffusion Models [C]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). June 18-24, 2022, New Orleans, LA, USA. IEEE, 2022: 10674-10685.
[22] Schuhmann C, Beaumont R, Vencu R, et al. Laion-5b: An open large-scale dataset for training next generation image-text models [J]. Advances in Neural Information Processing Systems, 2022(35): 25278-25294.
[23] Radford A, Kim J W, Hallacy C, et al. Learning Transferable Visual Models from Natural Language Supervision [C]. Proceedings of the 38th International Conference on Machine Learning (ICML). July 18-24, 2021, Virtual Event. PMLR, 2021, 139: 8748-8763.
[24] Mildenhall B, Srinivasan P P, Tancik M, et al. NeRF: representing scenes as neural radiance fields for view synthesis [J]. Communications of the ACM, 2021, 65(1): 99-106.
[25] Ranftl R, Lasinger K, Hafner D, et al. Towards robust monocular depth estimation: mixing datasets for zero-shot cross-dataset transfer [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(3): 1623-1637.
[26] Shen T, Gao J, Yin K, et al. Deep marching tetrahedra: a hybrid representation for high-resolution 3d shape synthesis [J]. Advances in Neural Information Processing Systems, 2021(34): 6087-6101.
[27] Zhang R, Isola P, Efros A A, et al. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric [C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). June 18-23, 2018, Salt Lake City, UT, USA. IEEE, 2018: 586-595.
[28] Wang Z, Bovik A C, Sheikh H R, et al. Image quality assessment: from error visibility to structural similarity [J]. IEEE Transactions on Image Processing, 2004, 13(4): 600-612. doi: 10.1109/TIP.2003.819861
[29] Liu M, Xu C, Jin H, et al. One-2-3-45: Any single image to 3D mesh in 45 seconds without per-shape optimization [J]. Advances in Neural Information Processing Systems, 2023(36).