IECE Transactions on Sensing, Communication, and Control, 2024, Volume 1, Issue 1: 3-29

Free Access | Review Article | 12 October 2024

Innovations in 3D Object Detection: A Comprehensive Review of Methods, Sensor Fusion, and Future Directions

1 Interdisciplinary Research Centre for Aviation and Space Exploration (IRCASE), King Fahd University of Petroleum and Minerals (KFUPM), Dhahran 31261, Kingdom of Saudi Arabia
2 Electronic Engineering Department, Maynooth International Engineering College (MIEC), Maynooth University, Maynooth, Co. Kildare, Ireland
3 Department of Telecommunication Engineering, Mehran University of Engineering and Technology (MUET), Pakistan
* Corresponding author: Ghulam E Mustafa Abro, email: [email protected]
Received: 13 September 2024, Accepted: 22 September 2024, Published: 12 October 2024  

Abstract
This review paper offers a thorough assessment of three-dimensional (3D) object detection methods, an essential element of the perception frameworks used by autonomous systems. The analysis emphasises the fusion of LiDAR and camera sensors, contrasting it with more economical alternatives such as camera-only or camera-radar combinations. The study objectively evaluates detection performance alongside practical implementation issues, such as cost and operational efficiency, thereby highlighting the limitations of existing systems and proposing avenues for further research. The insights provided make it a useful resource for advancing 3D object detection and autonomy in intelligent systems.
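As a minimal illustration of the LiDAR-camera fusion that the review surveys, the sketch below projects a LiDAR point cloud into a camera image plane using a pinhole model, which is the common first step before associating points with image detections. The function name, calibration values, and NumPy-based implementation are illustrative assumptions for this sketch and are not taken from the paper.

import numpy as np

def project_lidar_to_image(points_lidar, T_cam_lidar, K):
    """Project LiDAR points (N, 3) into pixel coordinates.

    points_lidar : (N, 3) array of x, y, z in the LiDAR frame.
    T_cam_lidar  : (4, 4) extrinsic transform from the LiDAR frame to the camera frame.
    K            : (3, 3) camera intrinsic matrix.
    """
    # Homogenise the points and move them into the camera frame.
    pts_h = np.hstack([points_lidar, np.ones((points_lidar.shape[0], 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]

    # Keep only points in front of the camera (positive depth).
    in_front = pts_cam[:, 2] > 0.1
    pts_cam = pts_cam[in_front]

    # Perspective projection: pixel = K @ [X/Z, Y/Z, 1].
    uv = (K @ (pts_cam / pts_cam[:, 2:3]).T).T[:, :2]
    return uv, pts_cam[:, 2], in_front

if __name__ == "__main__":
    # Toy calibration values, purely illustrative (not from any real sensor setup).
    K = np.array([[721.5, 0.0, 609.6],
                  [0.0, 721.5, 172.9],
                  [0.0, 0.0, 1.0]])
    T = np.eye(4)  # identity extrinsics for the sketch
    cloud = np.random.uniform([-10.0, -5.0, 2.0], [10.0, 5.0, 40.0], size=(1000, 3))
    uv, depth, mask = project_lidar_to_image(cloud, T, K)
    print(uv.shape, depth.min(), depth.max())

In practice, the projected pixel coordinates and depths would be fused with image features or 2D detections, which is the pattern used by many of the LiDAR-camera fusion methods reviewed in the article.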

Graphical Abstract
Innovations in 3D Object Detection: A Comprehensive Review of Methods, Sensor Fusion, and Future Directions

Keywords
autonomous systems
camera
fusion methods
LiDAR
object detection
radar
three-dimensional

Cite This Article
APA Style
Abro, G. E. M., Ali, Z. A., & Rajput, S. (2024). Innovations in 3D Object Detection: A Comprehensive Review of Methods, Sensor Fusion, and Future Directions. IECE Transactions on Sensing, Communication, and Control, 1(1), 3–29. https://doi.org/10.62762/TSCC.2024.989358

Article Metrics
Citations: Crossref: 0 | Scopus: 0 | Web of Science: 0
Article Access Statistics: Views: 845 | PDF Downloads: 107

Publisher's Note
IECE stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions
IECE or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
IECE Transactions on Sensing, Communication, and Control

ISSN: 3065-7431 (Online) | ISSN: 3065-7423 (Print)

Email: [email protected]

Portico

All published articles are preserved here permanently:
https://www.portico.org/publishers/iece/

Copyright © 2024 Institute of Emerging and Computer Engineers Inc.