IECE Transactions on Intelligent Systematics, 2024, Volume 2, Issue 1: 27-37

Free Access | Research Article | 31 December 2024
Feature Fusion for Performance Enhancement of Text Independent Speaker Identification
1 School of Electronics Engineering, Kyungpook National University, Daegu 41566, South Korea
2 Sensify Inc., New York, NY 10016, USA
3 Department of Electronic Engineering, Maynooth University, W23 A3HY, Ireland
* Corresponding Author: Zahra Shah, [email protected]
Received: 16 October 2024, Accepted: 09 December 2024, Published: 31 December 2024  

Abstract
Speaker identification systems have gained significant attention due to their potential applications in security and personalized systems. This study evaluates the performance of various time- and frequency-domain physical features for text-independent speaker identification. Specifically, four key features (pitch, intensity, spectral flux, and spectral slope) were examined along with their statistical variations (minimum, maximum, and average values). These features were fused with log power spectral features, and the fused representation was used to train a Convolutional Neural Network (CNN). The goal was to identify the most effective feature combinations for improving speaker identification accuracy. The experimental results show that the proposed feature fusion method outperformed the baseline system by 8%, achieving an accuracy of 87.18%.
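
To make the pipeline concrete, the sketch below is a minimal, hypothetical illustration of this kind of feature fusion, not the authors' implementation. It assumes 16 kHz audio and uses NumPy/SciPy for the features and PyTorch for a toy CNN; the autocorrelation pitch estimate stands in for a proper pitch tracker (e.g., Praat), and the way the twelve utterance-level statistics are tiled onto the log power spectrogram is only one plausible fusion layout, since the exact arrangement is not described in the abstract.

```python
# Hypothetical sketch of a feature-fusion pipeline of the kind described above.
# All parameter choices (frame size, hop, CNN layout) are illustrative assumptions.
import numpy as np
from scipy.signal import stft
import torch.nn as nn

def frame_features(x, fs=16000, nperseg=512, hop=256):
    """Per-frame log power spectrum, pitch, intensity, spectral flux and spectral slope."""
    f, _, Z = stft(x, fs=fs, nperseg=nperseg, noverlap=nperseg - hop)
    mag = np.abs(Z)                                     # (freq, frames)
    log_power = np.log(mag ** 2 + 1e-10)

    # Intensity: frame energy in dB.
    intensity = 10 * np.log10(np.sum(mag ** 2, axis=0) + 1e-10)

    # Spectral flux: positive change of the magnitude spectrum between frames.
    flux = np.sum(np.maximum(np.diff(mag, axis=1), 0), axis=0)
    flux = np.concatenate([[0.0], flux])

    # Spectral slope: least-squares slope of magnitude vs. frequency per frame.
    slope = np.polyfit(f, mag, 1)[0]

    # Pitch: crude autocorrelation peak per frame (a placeholder for a real
    # pitch tracker, used only to keep the sketch self-contained).
    frames = np.lib.stride_tricks.sliding_window_view(x, nperseg)[::hop]
    pitch = []
    for fr in frames:
        ac = np.correlate(fr, fr, mode="full")[len(fr) - 1:]
        lo, hi = fs // 400, fs // 60                    # search 60-400 Hz
        lag = lo + np.argmax(ac[lo:hi])
        pitch.append(fs / lag)

    return log_power, np.array(pitch), intensity, flux, slope

def stats(v):
    """Min / max / mean summary used for each physical feature."""
    return np.array([v.min(), v.max(), v.mean()])

def fuse(x, fs=16000):
    """Stack the 12 utterance-level statistics onto the log power spectrogram."""
    log_power, pitch, intensity, flux, slope = frame_features(x, fs)
    phys = np.concatenate([stats(pitch), stats(intensity), stats(flux), stats(slope)])
    tiled = np.tile(phys[:, None], (1, log_power.shape[1]))  # broadcast over frames
    return np.vstack([log_power, tiled]).astype(np.float32)

class SmallCNN(nn.Module):
    """Toy CNN speaker classifier over the fused time-frequency map (illustrative only)."""
    def __init__(self, n_speakers):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, n_speakers))

    def forward(self, x):            # x: (batch, 1, freq + 12, frames)
        return self.net(x)
```

In practice the fused maps would be batched as (batch, 1, freq + 12, frames) tensors and the network trained with a standard cross-entropy objective over speaker labels.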

Graphical Abstract
Feature Fusion for Performance Enhancement of Text Independent Speaker Identification

Keywords
speaker identification
prosodic features
physical features
CNN
feature fusion

Funding
This work received no funding.

Cite This Article
APA Style
Shah, Z., Jang, G., & Farooq, A. (2024). Feature Fusion for Performance Enhancement of Text Independent Speaker Identification. IECE Transactions on Intelligent Systematics, 2(1), 27–37. https://doi.org/10.62762/TIS.2024.649374


Article Metrics
Citations: Crossref 0 | Scopus 0 | Web of Science 0
Article Access Statistics: Views 139 | PDF Downloads 10

Publisher's Note
IECE stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions
IECE or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
IECE Transactions on Intelligent Systematics

ISSN: 2998-3355 (Online) | ISSN: 2998-3320 (Print)

Email: [email protected]

Portico

All published articles are preserved here permanently:
https://www.portico.org/publishers/iece/

Copyright © 2024 Institute of Emerging and Computer Engineers Inc.