Academic Editor
Aytuğ Onan, İzmir Katip Celebi University, Turkey
IECE Transactions on Emerging Topics in Artificial Intelligence, Volume 2, Issue 2, 2025: 57-67

Open Access | Research Article | 15 April 2025
Graph-Driven Multimodal Feature Learning Framework for Apparent Personality Assessment
1 School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
2 Homesite Group Inc, GA 30043, United States
3 Payment Department, Chewy Inc, MA 02210, United States
4 School of Information, Yunnan University of Finance and Economics, Yunnan 650000, China
5 School of Information Science and Technology, Yunnan University, Yunnan 650000, China
† These authors contributed equally to this work
* Corresponding Author: Chengwei Ye, [email protected]
Received: 09 March 2025, Accepted: 11 April 2025, Published: 15 April 2025  
Abstract
Automatically predicting personality traits has emerged as a challenging problem in computer vision. This paper introduces a multimodal feature learning framework for personality analysis in short video clips. For visual processing, we construct a facial graph and design a geometry-based two-stream network with an attention mechanism, leveraging both Graph Convolutional Networks (GCN) and Convolutional Neural Networks (CNN) to capture static facial expressions. ResNet18 and VGGFace networks are additionally employed to extract global scene and facial appearance features at the frame level. To capture dynamic temporal information, we integrate a BiGRU with a temporal attention module that extracts salient frame representations. To further enhance robustness, we incorporate the VGGish CNN for audio features and XLM-RoBERTa for text features. Finally, a multimodal channel attention mechanism integrates the different modalities, and a Multi-Layer Perceptron (MLP) regression model predicts the personality traits. Experimental results show that the proposed framework outperforms existing state-of-the-art approaches.
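The fusion stage described above (a multimodal channel attention mechanism followed by an MLP regressor) can be made concrete with a short sketch. The following PyTorch code is a minimal, illustrative reading of that stage, not the authors' released implementation: the squeeze-and-excitation-style gate, the embedding dimension of 256, the four stacked modalities (visual, temporal, audio, text), and all layer sizes are our assumptions.

```python
# Minimal sketch of channel-attention fusion + MLP trait regression.
# All module names and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class ChannelAttentionFusion(nn.Module):
    """Learns one gate weight per modality and fuses the reweighted features."""
    def __init__(self, num_modalities: int, dim: int):
        super().__init__()
        # Squeeze-and-excitation-style gating over the modality axis (assumed design).
        self.gate = nn.Sequential(
            nn.Linear(num_modalities * dim, num_modalities),
            nn.Sigmoid(),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, num_modalities, dim)
        b, m, d = feats.shape
        weights = self.gate(feats.reshape(b, m * d))          # (batch, m)
        return (feats * weights.unsqueeze(-1)).reshape(b, m * d)

class TraitRegressor(nn.Module):
    """Channel-attention fusion followed by an MLP predicting the Big Five traits."""
    def __init__(self, num_modalities: int = 4, dim: int = 256, num_traits: int = 5):
        super().__init__()
        self.fusion = ChannelAttentionFusion(num_modalities, dim)
        self.mlp = nn.Sequential(
            nn.Linear(num_modalities * dim, 512),
            nn.ReLU(),
            nn.Linear(512, num_traits),
            nn.Sigmoid(),  # first-impression trait labels are scaled to [0, 1]
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.mlp(self.fusion(feats))

if __name__ == "__main__":
    model = TraitRegressor()
    dummy = torch.randn(8, 4, 256)   # batch of 8, four modality embeddings of dim 256
    print(model(dummy).shape)        # torch.Size([8, 5])
```

A per-modality gate of this kind is one common way to realize channel attention over heterogeneous features; the paper's exact mechanism may differ.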

Graphical Abstract
[Figure: overview of the proposed graph-driven multimodal feature learning framework]

Keywords
personality prediction
facial graph
graph convolutional network (GCN)
convolutional neural network (CNN)
attention mechanism
geometric and appearance features

Data Availability Statement
Data will be made available on request.

Funding
This work received no external funding.

Conflicts of Interest
Chengwei Ye is an employee of Homesite Group Inc, GA 30043, United States, and Huanzhen Zhang is an employee of the Payment Department, Chewy Inc, MA 02210, United States.

Ethical Approval and Consent to Participate
Not applicable.

Cite This Article
APA Style
Wang, K., Ye, C., Zhang, H., Xu, L., & Liu, S. (2025). Graph-Driven Multimodal Feature Learning Framework for Apparent Personality Assessment. IECE Transactions on Emerging Topics in Artificial Intelligence, 2(2), 57–67. https://doi.org/10.62762/TETAI.2025.279350

Article Metrics
Citations: Crossref: 0 | Scopus: 0 | Web of Science: 0
Article Access Statistics:
Views: 173
PDF Downloads: 21

Publisher's Note
IECE stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions
CC BY Copyright © 2025 by the Author(s). Published by Institute of Emerging and Computer Engineers. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
IECE Transactions on Emerging Topics in Artificial Intelligence
ISSN: 3066-1676 (Online) | ISSN: 3066-1668 (Print)
Email: [email protected]
Portico
All published articles are preserved here permanently:
https://www.portico.org/publishers/iece/
Copyright © 2025 Institute of Emerging and Computer Engineers Inc.