Volume 1, Issue 3, IECE Transactions on Intelligent Systematics
Academic Editor
Habib Khan
Gachon University, Republic of Korea
IECE Transactions on Intelligent Systematics, Volume 1, Issue 3, 2024: 161-175

Free to Read | Research Article | 09 November 2024
In-depth Urdu Sentiment Analysis Through Multilingual BERT and Supervised Learning Approaches
1 School of Software, Nanjing University of Information Science and Technology, Nanjing 210044, China
2 School of Computer Science, Wuhan University, Wuhan 430070, China
3 Department of Information Technology, The University of Haripur, 22620, Pakistan
4 School of Business, Nanjing University of Information Science and Technology, Nanjing 210044, China
5 Department of Computer Science, Technical Hochschule (THWS) Wuerzburg, Germany
6 Department of Computer Science, IQRA National University, Swat Campus, Pakistan
7 Coventry University, Priory St, Coventry CV1 5FB, England, UK
* Corresponding Authors: Naeem Ahmed, [email protected] ; Danish Ali, [email protected]
Received: 03 October 2024, Accepted: 27 October 2024, Published: 09 November 2024  
Cited by: 1  (Source: Google Scholar)
Abstract
Sentiment analysis is the process of identifying and categorizing opinions expressed in a piece of text. It has been studied extensively for languages such as English and Chinese but remains underexplored for languages such as Urdu and Hindi. This paper presents an in-depth analysis of Urdu text using state-of-the-art supervised learning techniques and a transformer-based technique. We manually annotated and preprocessed a dataset drawn from various Urdu blog websites to categorize sentiments into positive, neutral, and negative classes. We evaluate five models: four classical machine learning classifiers, namely Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Naive Bayes, and Multinomial Logistic Regression (MLR), and the transformer-based multilingual BERT (mBERT) model. The mBERT model, pre-trained on 104 languages, was fine-tuned on the dataset to capture deep contextual embeddings specific to Urdu text and to optimize it for Urdu sentiment classification. Our results show that mBERT significantly outperformed the traditional classifiers, achieving an accuracy of 96.5% on the test set. The study highlights the effectiveness of transfer learning via mBERT for low-resource languages such as Urdu, making it a highly promising approach for sentiment analysis.
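
To make the three-class setup concrete, the following is a minimal sketch of one of the classical baselines named in the abstract, a multinomial Naive Bayes classifier with Laplace smoothing, written with the Python standard library only. It is not the authors' implementation. The function names (`train_naive_bayes`, `predict`, `accuracy`), the whitespace tokenizer, and the six toy documents (English placeholder tokens standing in for Urdu text) are assumptions made purely for illustration; they are not taken from the paper's dataset or pipeline.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes on whitespace-tokenized documents.

    alpha is the Laplace smoothing constant (hypothetical default of 1.0).
    """
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # per-class token frequencies
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.split()
        word_counts[label].update(tokens)
        vocab.update(tokens)

    total_docs = len(labels)
    log_prior = {c: math.log(n / total_docs) for c, n in class_counts.items()}

    log_likelihood = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab
        }
    return log_prior, log_likelihood, vocab


def predict(model, text):
    """Return the class with the highest log-posterior score."""
    log_prior, log_likelihood, vocab = model
    scores = {}
    for c in log_prior:
        # Tokens never seen in training are ignored.
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in text.split() if w in vocab
        )
    return max(scores, key=scores.get)


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the given label."""
    correct = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return correct / len(labels)


# Toy, invented data: placeholder tokens, NOT samples from the paper's dataset.
train_docs = [
    "good great film",
    "great lovely day",
    "bad awful film",
    "awful terrible day",
    "film released today",
    "day report released",
]
train_labels = [
    "positive", "positive",
    "negative", "negative",
    "neutral", "neutral",
]

model = train_naive_bayes(train_docs, train_labels)
print(predict(model, "great lovely film"))
print(predict(model, "awful terrible"))
print(predict(model, "report released today"))
```

On real Urdu text, the whitespace split would need to be replaced by a proper tokenizer, and accuracy would be measured on a held-out test set rather than on the training documents.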


Keywords
machine learning
sentiment analysis
Urdu language
natural language processing (NLP)
computational linguistics

Funding
This work received no funding.

Cite This Article
APA Style
Saeed, M., Ahmed, N., Ali, D., Ramzan, M., Mohib, M., Bagga, K., Rahman, A. U., & Khan, I. M. (2024). In-depth Urdu Sentiment Analysis Through Multilingual BERT and Supervised Learning Approaches. IECE Transactions on Intelligent Systematics, 1(3), 161-175. https://doi.org/10.62762/TIS.2024.585616

Publisher's Note
IECE stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions
Institute of Emerging and Computer Engineers (IECE) or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
IECE Transactions on Intelligent Systematics

ISSN: 2998-3355 (Online) | ISSN: 2998-3320 (Print)

Email: [email protected]

Portico

All published articles are preserved here permanently:
https://www.portico.org/publishers/iece/

Copyright © 2024 Institute of Emerging and Computer Engineers Inc.