Academic Editor
Aytuğ Onan, İzmir Katip Celebi University, Turkey
IECE Transactions on Emerging Topics in Artificial Intelligence, Volume 2, Issue 2, 2025: 57-67

Open Access | Research Article | 15 April 2025
Graph-Driven Multimodal Feature Learning Framework for Apparent Personality Assessment
1 School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
2 Homesite Group Inc, GA 30043, United States
3 Payment Department, Chewy Inc, MA 02210, United States
4 School of Information, Yunnan University of Finance and Economics, Yunnan 650000, China
5 School of Information Science and Technology, Yunnan University, Yunnan 650000, China
† These authors contributed equally to this work
* Corresponding Author: Chengwei Ye, [email protected]
Received: 09 March 2025, Accepted: 11 April 2025, Published: 15 April 2025  
Abstract
Automatically predicting personality traits has emerged as a challenging problem in computer vision. This paper introduces a multimodal feature learning framework for personality analysis in short video clips. For visual processing, we construct a facial graph and design a geometry-based two-stream network with an attention mechanism, leveraging both Graph Convolutional Networks (GCN) and Convolutional Neural Networks (CNN) to capture static facial expressions. ResNet18 and VGGFace networks are additionally employed to extract global scene and facial appearance features at the frame level. To capture dynamic temporal information, we integrate a BiGRU with a temporal attention module that extracts salient frame representations. To further enhance robustness, we incorporate the VGGish CNN for audio features and XLM-RoBERTa for text features. Finally, a multimodal channel attention mechanism integrates the different modalities, and a Multi-Layer Perceptron (MLP) regression model predicts the personality traits. Experimental results show that the proposed framework outperforms existing state-of-the-art approaches.
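The fusion stage described above (a multimodal channel attention mechanism followed by an MLP regressor) can be made concrete with a short sketch. The following PyTorch code is a minimal, illustrative reading of that stage, not the authors' released implementation: the squeeze-and-excitation-style gate, the embedding dimension of 256, the four stacked modalities (visual, temporal, audio, text), and all layer sizes are our assumptions.

```python
# Minimal sketch of channel-attention fusion + MLP trait regression.
# All module names and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class ChannelAttentionFusion(nn.Module):
    """Learns one gate weight per modality and fuses the reweighted features."""
    def __init__(self, num_modalities: int, dim: int):
        super().__init__()
        # Squeeze-and-excitation-style gating over the modality axis (assumed design).
        self.gate = nn.Sequential(
            nn.Linear(num_modalities * dim, num_modalities),
            nn.Sigmoid(),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, num_modalities, dim)
        b, m, d = feats.shape
        weights = self.gate(feats.reshape(b, m * d))          # (batch, m)
        return (feats * weights.unsqueeze(-1)).reshape(b, m * d)

class TraitRegressor(nn.Module):
    """Channel-attention fusion followed by an MLP predicting the Big Five traits."""
    def __init__(self, num_modalities: int = 4, dim: int = 256, num_traits: int = 5):
        super().__init__()
        self.fusion = ChannelAttentionFusion(num_modalities, dim)
        self.mlp = nn.Sequential(
            nn.Linear(num_modalities * dim, 512),
            nn.ReLU(),
            nn.Linear(512, num_traits),
            nn.Sigmoid(),  # first-impression trait labels are scaled to [0, 1]
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.mlp(self.fusion(feats))

if __name__ == "__main__":
    model = TraitRegressor()
    dummy = torch.randn(8, 4, 256)   # batch of 8, four modality embeddings of dim 256
    print(model(dummy).shape)        # torch.Size([8, 5])
```

A per-modality gate of this kind is one common way to realize channel attention over heterogeneous features; the paper's exact mechanism may differ.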

Graphical Abstract
[Figure: overview of the proposed graph-driven multimodal feature learning framework]

Keywords
personality prediction
facial graph
graph convolutional network (GCN)
convolutional neural network (CNN)
attention mechanism
geometric and appearance features

Data Availability Statement
Data will be made available on request.

Funding
This work received no external funding.

Conflicts of Interest
Chengwei Ye is an employee of Homesite Group Inc, GA 30043, United States, and Huanzhen Zhang is an employee of the Payment Department, Chewy Inc, MA 02210, United States.

Ethical Approval and Consent to Participate
Not applicable.

Cite This Article
APA Style
Wang, K., Ye, C., Zhang, H., Xu, L., & Liu, S. (2025). Graph-Driven Multimodal Feature Learning Framework for Apparent Personality Assessment. IECE Transactions on Emerging Topics in Artificial Intelligence, 2(2), 57–67. https://doi.org/10.62762/TETAI.2025.279350

Article Metrics
Citations: Crossref: 0 | Scopus: 0 | Web of Science: 0
Article Access Statistics:
Views: 173
PDF Downloads: 21

Publisher's Note
IECE stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions
CC BY Copyright © 2025 by the Author(s). Published by Institute of Emerging and Computer Engineers. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
IECE Transactions on Emerging Topics in Artificial Intelligence
ISSN: 3066-1676 (Online) | ISSN: 3066-1668 (Print)
Email: [email protected]
Portico
All published articles are preserved here permanently:
https://www.portico.org/publishers/iece/
Copyright © 2025 Institute of Emerging and Computer Engineers Inc.