
Segmentation of Tomato Fruits at Different Ripeness Levels in a Greenhouse Environment Based on Improved Mask R-CNN

摘要 (Abstract): Fruit recognition and segmentation based on deep neural networks are key steps for the successful operation of picking robots. However, the large number of network parameters and the heavy computation lead to long training times, and when such models are deployed on a picking robot they run slowly and recognize fruits with low accuracy. To address these problems, this study proposed an improved Mask R-CNN method for segmenting tomato fruits at different ripeness levels in a greenhouse environment. The Cross Stage Partial Network (CSPNet) was fused with the Residual Network (ResNet) in Mask R-CNN; through cross-stage splitting and cascading strategies, duplicated feature information in the back-propagation process was reduced, lowering the computational cost of the network while improving accuracy. Experiments on a tomato fruit test set showed that the improved Mask R-CNN model with a 50-layer cross stage partial residual network (Cross Stage Partial ResNet50, CSP-ResNet50) as the backbone achieved a mean average precision of 95.45%, an F1-score of 91.2%, and a segmentation time of 0.658 s per image for green, half-ripe, and ripe tomato fruits. Compared with the Pyramid Scene Parsing Network (PSPNet), DeepLab v3+, and Mask R-CNN with a ResNet50 backbone, the mean average precision increased by 16.44, 14.95, and 2.29 percentage points, respectively, and the segmentation time was 1.98% shorter than that of Mask R-CNN with a ResNet50 backbone. Finally, the improved Mask R-CNN model with the CSP-ResNet50 backbone was deployed on a picking robot, and recognition trials on tomato fruits of different ripeness were conducted in a large glass greenhouse, where the model reached a recognition accuracy of 90%. The proposed method shows good recognition performance for tomato fruits of different ripeness in greenhouse environments and can provide a basis for the precise operation of tomato picking robots.
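The core architectural change is fusing CSPNet's cross-stage split-and-concatenate design with the ResNet stages of the Mask R-CNN backbone: part of the feature map bypasses the residual blocks and is rejoined afterwards, so duplicated gradient information is reduced. The following is a minimal sketch of one such CSP-style residual stage, assuming a PyTorch implementation; the class names (CSPStage, Bottleneck), channel widths, block counts, and the absence of downsampling are illustrative assumptions, not the paper's exact CSP-ResNet50 configuration.

import torch
import torch.nn as nn


class Bottleneck(nn.Module):
    """Standard ResNet bottleneck (1x1 -> 3x3 -> 1x1) with an identity shortcut."""

    def __init__(self, channels: int):
        super().__init__()
        mid = channels // 4
        self.body = nn.Sequential(
            nn.Conv2d(channels, mid, 1, bias=False), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, padding=1, bias=False), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, 1, bias=False), nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.body(x))


class CSPStage(nn.Module):
    """Cross Stage Partial stage: 1x1 convs split the input into two partial paths,
    only one path passes through the residual blocks, and the two paths are
    concatenated before a transition conv, reducing duplicated feature information."""

    def __init__(self, in_channels: int, out_channels: int, num_blocks: int):
        super().__init__()
        part = out_channels // 2
        self.split_main = nn.Conv2d(in_channels, part, 1, bias=False)    # path through the blocks
        self.split_bypass = nn.Conv2d(in_channels, part, 1, bias=False)  # cross-stage shortcut path
        self.blocks = nn.Sequential(*[Bottleneck(part) for _ in range(num_blocks)])
        self.transition = nn.Sequential(
            nn.Conv2d(part * 2, out_channels, 1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        main = self.blocks(self.split_main(x))
        bypass = self.split_bypass(x)
        return self.transition(torch.cat([main, bypass], dim=1))


if __name__ == "__main__":
    stage = CSPStage(in_channels=256, out_channels=512, num_blocks=4)
    y = stage(torch.randn(1, 256, 56, 56))
    print(y.shape)  # torch.Size([1, 512, 56, 56])

Because only half of the stage's channels flow through the residual blocks, the per-stage computation drops roughly in proportion, which is consistent with the reported reduction in segmentation time relative to the plain ResNet50 backbone.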

Abstract: Fruit recognition and segmentation using deep neural networks are key to the operation of picking robots in modern agriculture. However, most current models suffer from low recognition accuracy and slow running speed, mainly because of their large numbers of parameters and heavy computation. In this study, a segmentation method based on an improved Mask R-CNN was proposed for tomato fruits at different ripeness levels in a greenhouse environment. Firstly, a Cross Stage Partial Network (CSPNet) was fused with the Residual Network (ResNet) backbone of the Mask R-CNN model. Cross-stage splitting and cascading strategies reduced the duplicated feature information in the back-propagation process, improving accuracy while lowering the number of network calculations. Secondly, a cross-entropy loss function with a weight factor was used to compute the mask loss, improving segmentation under sample imbalance. Experiments were performed on a test set of tomato fruits at three ripeness levels. The results showed that the improved Mask R-CNN model with CSP-ResNet50 as the backbone achieved a mean average precision of 95.45%, a precision of 95.25%, a recall of 87.43%, an F1-score of 0.912, and an average segmentation time of 0.658 s per image. The mean average precision increased by 16.44, 14.95, and 2.29 percentage points, respectively, compared with the Pyramid Scene Parsing Network (PSPNet), DeepLab v3+, and Mask R-CNN with a ResNet50 backbone. However, its average segmentation time was 14.83% and 27.52% longer than those of PSPNet and DeepLab v3+, respectively. More importantly, the average segmentation time of the improved Mask R-CNN with the CSP-ResNet50 backbone was 1.98% shorter than that of Mask R-CNN with a ResNet50 backbone. The new model also performed well in segmenting green and half-ripe tomato fruits under different light intensities, especially under low light, compared with PSPNet and DeepLab v3+. Finally, the improved Mask R-CNN model with the CSP-ResNet50 backbone was deployed on a picking robot to verify its recognition and segmentation performance on tomato fruits of different ripeness in a large glass greenhouse. When the overlap rate of tomato fruits was low, the number of fruits identified by the model was consistent with manual counting, with an accuracy above 90%. When the occlusion or overlap rate exceeded 70%, particularly for distant targets, the accuracy of the improved Mask R-CNN model dropped to 66.67%, leaving a large gap from manual detection. This was mainly because blurred pixels left few usable features, making it difficult to extract the shape and color of the tomato fruits; low light further increased the recognition difficulty. Correspondingly, the more severe the overlap, the harder picking became for the robot and the lower its picking success rate, while the success rate improved markedly as occlusion decreased. Consequently, integrating multiple technologies (image acquisition equipment, model performance, end-effector design of the robotic arm, and automatic mechanization) can be expected to effectively improve the picking rate of ripe tomatoes in complex greenhouse environments.
The new model also demonstrated strong robustness and applicability for the precise operation of tomato-picking robots in various complex environments.
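Two technical points in the abstract can be made concrete. First, as a consistency check on the reported metrics, F1 = 2PR/(P + R) = 2 x 0.9525 x 0.8743 / (0.9525 + 0.8743) ≈ 0.912, matching the stated F1-score. Second, the weighted cross-entropy mask loss can be sketched as below, assuming a PyTorch setting; the function name weighted_mask_loss and the pos_weight value of 3.0 are hypothetical choices for illustration and are not taken from the paper.

import torch
import torch.nn.functional as F


def weighted_mask_loss(mask_logits: torch.Tensor,
                       mask_targets: torch.Tensor,
                       pos_weight: float = 3.0) -> torch.Tensor:
    """Per-pixel binary cross-entropy where foreground (fruit) pixels are up-weighted.

    mask_logits:  raw mask-head predictions of shape (N, H, W)
    mask_targets: binary ground-truth masks of the same shape
    pos_weight:   weight factor for foreground pixels (assumed value, not the paper's)
    """
    return F.binary_cross_entropy_with_logits(
        mask_logits, mask_targets,
        pos_weight=torch.tensor(pos_weight, device=mask_logits.device),
    )


if __name__ == "__main__":
    logits = torch.randn(8, 28, 28)                  # 28x28 mask-head outputs, as in Mask R-CNN
    targets = (torch.rand(8, 28, 28) > 0.8).float()  # sparse foreground simulating class imbalance
    print(weighted_mask_loss(logits, targets).item())

Up-weighting the scarce foreground pixels keeps the mask head from being dominated by the background class, which is the imbalance problem the weight factor is meant to address.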


