Academic Editor
Jian Lan, Xi'an Jiaotong University, China

Chinese Journal of Information Fusion, Volume 2, Issue 1, 2025: 38-58

Open Access | Research Article | 22 March 2025
A Deep-Learning Detector via Optimized YOLOv7-bw Architecture for Dense Small Remote-Sensing Targets in Harsh Food Supply Applications
1 School of Computer and Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, China
* Corresponding Author: Jianlei Kong, [email protected]
Received: 03 March 2025, Accepted: 19 March 2025, Published: 22 March 2025  
Abstract
With the progressive advancement of remote sensing imaging technology, its application in the agricultural domain is becoming increasingly prevalent. Both cultivation and transportation processes can benefit greatly from remote sensing images to ensure an adequate food supply. However, the targets in such images are often small, densely distributed, and embedded in harsh, cluttered scenes with many gaps, which poses major challenges to traditional target detection methods. Frequent missed detections and inaccurate bounding boxes severely constrain the further analysis and application of remote sensing images within the agricultural sector. This study presents an enhanced version of the YOLO algorithm, specifically tailored for high-efficiency detection of densely distributed small targets in remote sensing images. We replace the 3×3 convolutions in the last two ELAN modules with Deformable ConvNets v2 (DCNv2), so that the backbone can better extract features from objects of varying shape and scale. The proposed detector also introduces a Bi-level Routing Attention module into the SPPCSPC pooled-pyramid network of YOLOv7, intensifying attention on areas where targets concentrate and, through effective feature fusion, strengthening the network's capacity to extract features of dense small targets. Additionally, our approach adopts the dynamic, non-monotonic WIoU v3 as the bounding-box regression loss, which allocates the most appropriate gradient gain at each training step and sharpens the network's focus on locating targets accurately. Finally, in comparative experiments on the DIOR remote sensing image dataset, the proposed YOLOv7-bw achieves an mAP@0.5 of 85.63% and an mAP@0.5:0.95 of 65.93%, surpassing the YOLOv7 baseline by 1.93% and 2.03%, respectively, substantiating the effectiveness of our approach.
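To make the two main modifications concrete, below are two minimal PyTorch sketches. They are illustrative reconstructions under stated assumptions, not the authors' released code.

First, a sketch of the dynamic, non-monotonic WIoU v3 bounding-box loss (Tong et al., 2023) used in place of YOLOv7's default regression loss. The function name, the (x1, y1, x2, y2) box convention, and the hyperparameter defaults alpha, delta, and momentum are assumptions.

```python
import torch

def wiou_v3_loss(pred, target, iou_mean, alpha=1.9, delta=3.0, momentum=0.01):
    """Sketch of the dynamic non-monotonic WIoU v3 loss (Tong et al., 2023).

    pred, target: [N, 4] boxes in (x1, y1, x2, y2) format.
    iou_mean: running mean of the IoU loss carried across batches
              (initialize to e.g. 1.0).
    Returns (loss, updated iou_mean).
    """
    eps = 1e-7
    # Plain IoU loss.
    inter_w = (torch.min(pred[:, 2], target[:, 2]) - torch.max(pred[:, 0], target[:, 0])).clamp(0)
    inter_h = (torch.min(pred[:, 3], target[:, 3]) - torch.max(pred[:, 1], target[:, 1])).clamp(0)
    inter = inter_w * inter_h
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    l_iou = 1.0 - inter / (area_p + area_t - inter + eps)

    # WIoU v1: distance-based attention term; the enclosing-box size is
    # detached so it scales the loss without adding adverse gradients.
    cxp, cyp = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cxt, cyt = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    wg = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    hg = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    r_wiou = torch.exp(((cxp - cxt) ** 2 + (cyp - cyt) ** 2) / (wg ** 2 + hg ** 2).detach())

    # WIoU v3: the outlier degree beta gates a non-monotonic focusing
    # coefficient r, assigning small gradient gain to both very easy and
    # very low-quality (hard) samples.
    beta = l_iou.detach() / (iou_mean + eps)
    r = beta / (delta * alpha ** (beta - delta))
    loss = (r * r_wiou * l_iou).mean()

    # Exponential moving average of the IoU loss for the next step.
    iou_mean = (1.0 - momentum) * iou_mean + momentum * l_iou.detach().mean().item()
    return loss, iou_mean
```

Second, a sketch of the backbone change: replacing a 3×3 convolution inside an ELAN module with modulated deformable convolution (DCNv2, Zhu et al., 2019) via torchvision.ops.DeformConv2d. The offset/mask head, the BatchNorm + SiLU pairing, and the class name are assumptions rather than the paper's exact implementation.

```python
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DCNv2Block(nn.Module):
    """Illustrative drop-in for a 3x3 convolution inside an ELAN module."""

    def __init__(self, c_in, c_out, k=3, stride=1):
        super().__init__()
        pad = k // 2
        # Per kernel tap: 2 offset channels (x, y) plus 1 modulation scalar.
        self.offset_mask = nn.Conv2d(c_in, 3 * k * k, k, stride, pad)
        self.dcn = DeformConv2d(c_in, c_out, k, stride, pad)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()
        self.o_ch = 2 * k * k  # number of offset channels

    def forward(self, x):
        om = self.offset_mask(x)
        offset, mask = om[:, :self.o_ch], om[:, self.o_ch:].sigmoid()
        return self.act(self.bn(self.dcn(x, offset, mask)))
```

In a YOLOv7-style backbone, such a block would stand in for the 3×3 convolutions of the final two ELAN modules, leaving the rest of the architecture unchanged.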

Keywords
remote sensing image
small object detection
harsh food supply management
deep learning
YOLOv7 architecture

Data Availability Statement
Data will be made available on request.

Funding
This work was supported in part by the National Natural Science Foundation of China under Grants 62173007, 62203020, 62473008, 62433002, and 62476014; in part by the Beijing Nova Program under Grant 20240484710; in part by the Humanities and Social Sciences Project of the Ministry of Education of China (MOE) under Grant 22YJCZH006; in part by the Beijing Scholars Program under Grant 099; in part by the Project of the All-China Federation of Supply and Marketing Cooperatives under Grant 202407; and in part by the Beijing Municipal University Teacher Team Construction Support Plan under Grant BPHR20220104.

Conflicts of Interest
The authors declare no conflicts of interest.

Ethical Approval and Consent to Participate
Not applicable.

Cite This Article
APA Style
Jin, X., Fu, H., Kong, J., Ma, H., Bai, Y., & Su, T. (2025). A Deep-Learning Detector via Optimized YOLOv7-bw Architecture for Dense Small Remote-Sensing Targets in Harsh Food Supply Applications. Chinese Journal of Information Fusion, 2(1), 38–58. https://doi.org/10.62762/CJIF.2025.919344

Publisher's Note
IECE stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions
Copyright © 2025 by the Author(s). Published by the Institute of Emerging and Computer Engineers. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Chinese Journal of Information Fusion

ISSN: 2998-3371 (Online) | ISSN: 2998-3363 (Print)

Email: [email protected]

All published articles are permanently preserved in Portico: https://www.portico.org/publishers/iece/

Copyright © 2025 Institute of Emerging and Computer Engineers Inc.