IECE Transactions on Intelligent Systematics, 2024, Volume 1, Issue 3: 161-175

Free Access | Research Article | 09 November 2024
1 School of Software, Nanjing University of Information Science and Technology, Nanjing 210044, China
2 School of Computer Science, Wuhan University, Wuhan 430070, China
3 Department of Information Technology, The University of Haripur, 22620, Pakistan
4 School of Business, Nanjing University of Information Science and Technology, Nanjing 210044, China
5 Department of Computer Science, Technische Hochschule Würzburg-Schweinfurt (THWS), Würzburg, Germany
6 Department of Computer Science, IQRA National University, Swat Campus, Pakistan
7 Coventry University, Priory Street, Coventry CV1 5FB, England, UK
* Corresponding author: Danish Ali, email: [email protected]
Received: 03 October 2024, Accepted: 27 October 2024, Published: 09 November 2024  

Abstract
Sentiment analysis is the process of identifying and categorizing the opinions expressed in a piece of text. It has been studied extensively for languages such as English and Chinese but remains underexplored for languages such as Urdu and Hindi. This paper presents an in-depth sentiment analysis of Urdu text using state-of-the-art supervised learning techniques and a transformer-based model. We collected a dataset from various Urdu blog websites, then manually annotated and preprocessed it to categorize sentiment into positive, neutral, and negative classes. We evaluate five models: four machine learning classifiers, namely Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Naive Bayes, and Multinomial Logistic Regression (MLR), and the transformer-based multilingual BERT (mBERT) model. mBERT, which was pre-trained on 104 languages, was fine-tuned on our dataset to capture deep contextual embeddings specific to Urdu sentiment classification. Our results demonstrate that mBERT significantly outperforms the traditional classifiers, achieving an accuracy of 96.5% on the test set. The study highlights the effectiveness of transfer learning via mBERT for low-resource languages such as Urdu, making it a highly promising approach for sentiment analysis.
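The abstract names Naive Bayes as one of the four classical baselines but does not describe its features or implementation. As a rough illustration of how such a baseline assigns one of the three sentiment labels, here is a minimal sketch of multinomial Naive Bayes with add-one (Laplace) smoothing over whitespace-tokenized bag-of-words counts, using only the Python standard library. The tokenizer, the names `NaiveBayesSentiment`, `tokenize`, and `accuracy`, and the nine toy sentences are all hypothetical. They are not taken from the authors' dataset or code. A real pipeline would add Urdu-specific normalization and the other preprocessing steps the paper mentions.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: plain whitespace splitting. Urdu separates most
    # words with spaces, but real pipelines also normalize characters.
    return text.split()


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.vocab = set()
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        doc_counts = Counter(labels)
        # log P(class): fraction of training documents carrying that label
        self.log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in self.classes}
        # total token count per class, used in the smoothed likelihood denominator
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_one(self, text):
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c in self.classes:
            score = self.log_prior[c]
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # skip words never seen during training
                # smoothed log P(token | class)
                score += math.log(
                    (self.word_counts[c][tok] + self.alpha)
                    / (self.totals[c] + self.alpha * v)
                )
            if score > best_score:
                best_label, best_score = c, score
        return best_label

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


def accuracy(y_true, y_pred):
    # fraction of predictions that match the gold labels
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data (not from the paper's dataset).
train_texts = [
    "یہ فلم بہت اچھی ہے",   # "This film is very good"
    "کھانا بہت مزیدار تھا",  # "The food was very tasty"
    "یہ کتاب بہت بری ہے",   # "This book is very bad"
    "سروس بہت خراب تھی",    # "The service was very poor"
    "آج موسم ابر آلود ہے",   # "The weather is cloudy today"
    "میں کل لاہور جاؤں گا",  # "I will go to Lahore tomorrow"
]
train_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

test_texts = ["یہ فلم اچھی ہے", "یہ کتاب خراب ہے", "آج لاہور میں موسم ابر آلود ہے"]
test_labels = ["positive", "negative", "neutral"]

model = NaiveBayesSentiment().fit(train_texts, train_labels)
predictions = model.predict(test_texts)
print(predictions, accuracy(test_labels, predictions))
```

Scoring in log space keeps the product of many small word probabilities from underflowing, and smoothing prevents a word that never appeared with one class from driving that class's probability to zero.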

Graphical Abstract
In-depth Urdu Sentiment Analysis Through Multilingual BERT and Supervised Learning Approaches

Keywords
machine learning
sentiment analysis
Urdu language
natural language processing (NLP)
computational linguistics

Cite This Article
APA Style
Saeed, M., Ahmed, N., Ali, D., Ramzan, M., Mohib, M., Bagga, K., Rahman, A. U., & Khan, I. M. (2024). In-depth Urdu Sentiment Analysis Through Multilingual BERT and Supervised Learning Approaches. IECE Transactions on Intelligent Systematics, 1(3), 161-175. https://doi.org/10.62762/TIS.2024.585616

Publisher's Note
IECE stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions
IECE or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
IECE Transactions on Intelligent Systematics

ISSN: 2998-3355 (Online) | ISSN: 2998-3320 (Print)

Email: [email protected]

Portico

All published articles are preserved here permanently:
https://www.portico.org/publishers/iece/

Copyright © 2024 Institute of Emerging and Computer Engineers Inc.