Improved Object Detection Algorithm Based on Multi-scale and Variability Convolutional Neural Networks

Jiaxun Yang; Yilihamujiang Gapar

doi:10.62762/TETAI.2024.115892

Volume 1, Issue 1, IECE Transactions on Emerging Topics in Artificial Intelligence

Academic Editor

Teerath Kumar

National College of Ireland, Ireland

IECE Transactions on Emerging Topics in Artificial Intelligence, 2024, Volume 1, Issue 1: 31-43

Free Access | Research Article | Feature Paper | 21 May 2024 | Cited: 4

Jiaxun Yang 1 *

Yilihamujiang Gapar 2

1 Lyceum of the Philippines University, Batangas 4200, Philippines

2 Department of Student Affairs, Northwest Minzu University, Lanzhou 730030, China

* Corresponding author: Jiaxun Yang

Received: 01 December 2023, Accepted: 16 May 2024, Published: 21 May 2024

Abstract

This paper proposes an improved object detection algorithm based on a dynamically deformable convolutional network (D-DCN) to address the multi-scale and variability challenges in object detection. We first review traditional object detection methods and the current state of research on improved methods based on multi-scale and variability convolutional neural networks. We then describe the proposed algorithm in detail. It comprises an improved feature pyramid network and a dynamically deformable network. In the improved feature pyramid network, a multi-scale feature fusion mechanism captures target information at different scales. In the dynamically deformable network, dynamic offset calculation and dynamic convolution operations adapt to the shape and pose of the target. To guide model training, we design a combined loss function that accounts for both localization error and classification error. We validate the method experimentally on the KITTI and Caltech datasets. The results show that the improved algorithm achieves significant performance gains in object detection, with higher accuracy and robustness than traditional methods. The work offers an effective way to handle the multi-scale and variability challenges in object detection and has high practical value and prospects for general application.
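As a rough illustration of a loss that combines a localization term with a classification term, the sketch below sums a softmax cross-entropy and a smooth L1 box-regression loss. The abstract does not specify the terms or their weighting, so the choice of smooth L1 and cross-entropy, the weight `lam`, the threshold `beta`, and all function names are assumptions made for illustration only, not the authors' formulation.

```python
import math


def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss averaged over box coordinates (assumed localization term)."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        # Quadratic near zero, linear for large errors.
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example (assumed classification term)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def detection_loss(box_pred, box_true, logits, label, lam=1.0):
    """Classification loss plus lam times localization loss (hypothetical weighting)."""
    return cross_entropy(logits, label) + lam * smooth_l1(box_pred, box_true)


if __name__ == "__main__":
    # One predicted box (x, y, w, h) against its ground truth, three classes.
    loss = detection_loss([0.5, 0.5, 2.0, 2.0], [0.4, 0.6, 2.5, 1.0],
                          logits=[2.0, 0.1, -1.0], label=0, lam=1.0)
    print(f"combined loss: {loss:.4f}")
```

Raising `lam` in this sketch shifts the training signal toward box accuracy, and lowering it favors class accuracy. How the paper balances the two terms is not stated in the abstract.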

Keywords

object detection

feature pyramid network

multi-scale fusion

dynamic convolution

KITTI

Caltech

Cite This Article

APA Style

Yang, J., & Gapar, Y. (2024). Improved Object Detection Algorithm Based on Multi-scale and Variability Convolutional Neural Networks. IECE Transactions on Emerging Topics in Artificial Intelligence, 1(1), 31-43. https://doi.org/10.62762/TETAI.2024.115892
