Abstract
Sentiment analysis is the process of identifying and categorizing opinions expressed in a piece of text. It has been studied extensively for languages such as English and Chinese but remains underexplored for languages such as Urdu and Hindi. This paper presents an in-depth analysis of Urdu text using state-of-the-art supervised learning techniques and a transformer-based technique. We manually annotated and preprocessed a dataset drawn from various Urdu blog websites, labeling each text as positive, neutral, or negative. We evaluate five classifiers: four traditional machine learning models, namely Support Vector Machine (SVM), K-nearest neighbor (KNN), Naive Bayes, and Multinomial Logistic Regression (MLR), and the transformer-based multilingual BERT (mBERT) model. The mBERT model, which is pre-trained on 104 languages, was fine-tuned on our dataset to capture deep contextual embeddings specific to Urdu text and to adapt it to Urdu sentiment classification. Our results show that the mBERT model significantly outperformed the traditional classifiers, achieving an accuracy of 96.5% on the test set. The study highlights the effectiveness of transfer learning via mBERT for low-resource languages such as Urdu, making it a highly promising approach for sentiment analysis.
Keywords
machine learning
sentiment analysis
Urdu language
natural language processing (NLP)
computational linguistics
Funding
This work received no funding.
Cite This Article
APA Style
Saeed, M., Ahmed, N., Ali, D., Ramzan, M., Mohib, M., Bagga, K., Rahman, A. U., & Khan, I. M. (2024). In-depth Urdu Sentiment Analysis Through Multilingual BERT and Supervised Learning Approaches. IECE Transactions on Intelligent Systematics, 1(3), 161-175. https://doi.org/10.62762/TIS.2024.585616
Publisher's Note
IECE stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Institute of Emerging and Computer Engineers (IECE) or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.