Academic Editor
Yulong Huang
Harbin Engineering University, China
Chinese Journal of Information Fusion, Volume 2, Issue 2, 2025: 112-126

Open Access | Research Article | 26 April 2025
Using Psycholinguistic Clues to Index Deep Semantic Evidences: Personality Detection in Social Media Texts
1 School of Cyber Security, University of Chinese Academy of Sciences, Beijing 101408, China
2 Australian Institute for Machine Learning, The University of Adelaide, Adelaide 5005, Australia
3 Institute of Information Fusion, Naval Aviation University, Yantai 264001, China
4 Computer Network Information Center, Chinese Academy of Sciences, Beijing 100085, China
5 School of Computer Science, Shenyang Aerospace University, Shenyang 110136, China
6 Institute for Network Science and Cyberspace, Tsinghua University, Beijing 100084, China
* Corresponding Authors: Xinlong Pan, [email protected] ; Yihua Du, [email protected]
Received: 04 March 2025, Accepted: 15 April 2025, Published: 26 April 2025  
Abstract
Detecting personality from social media content is an important application of personality psychology. Most early studies performed personality detection on a single coherent piece of writing; today, the challenge is to identify dominant personality traits from a series of short, noisy social media posts. To this end, recent studies encode the deep semantics of each post individually, often with attention-based methods, and then either relate the posts to one another or assemble them directly into graph structures. However, because social media content is inherently disjointed and noisy, constructing meaningful connections remains challenging: these methods rely on well-defined relationships between posts, and capturing such relationships in fragmented, sparse content is non-trivial, particularly under limited supervision or noisy input. To tackle this, we draw inspiration from the scanning reading technique, which is commonly recommended for processing large volumes of information efficiently, and propose an index attention mechanism. This mechanism uses prior psycholinguistic knowledge as an “index” to guide attention, enabling more effective information fusion across scattered semantic signals. Building on this idea, we introduce the Index Attention Network (IAN), a novel framework that infers personality labels by performing targeted information fusion over the deep semantic representations of individual posts. In our experiments, IAN achieved state-of-the-art performance on the Kaggle dataset and performance comparable to graph convolutional networks (GCN) on the Pandora dataset. Notably, IAN delivered an average improvement of 13% in macro-F1 score on the Kaggle dataset. The code for IAN is available on GitHub: https://github.com/Once2gain/IAN.
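
To make the idea of using psycholinguistic knowledge as an “index” over deep semantics more concrete, the following is a minimal, illustrative sketch and not the IAN implementation (for that, see the linked repository). It assumes, purely for illustration, that each post carries two vectors: a psycholinguistic feature vector that serves as the attention key, and a deep semantic embedding that serves as the attention value. A hypothetical query vector scores the keys, and the resulting weights fuse the semantic embeddings. The names `index_attention`, `query`, `index_keys`, and `semantic_values`, as well as the scaled dot-product scoring, are assumptions of this sketch.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def index_attention(query, index_keys, semantic_values):
    """Fuse per-post semantic vectors with weights derived from index keys.

    query           -- hypothetical query vector (assumption of this sketch)
    index_keys      -- one psycholinguistic feature vector per post
    semantic_values -- one deep semantic embedding per post
    """
    if len(index_keys) != len(semantic_values):
        raise ValueError("each post needs one index key and one semantic value")

    # Scaled dot-product score between the query and each post's index key.
    scale = math.sqrt(len(query))
    scores = [
        sum(q * k for q, k in zip(query, key)) / scale for key in index_keys
    ]
    weights = softmax(scores)

    # Weighted sum of the semantic embeddings: the fused user representation.
    dim = len(semantic_values[0])
    fused = [
        sum(w * value[i] for w, value in zip(weights, semantic_values))
        for i in range(dim)
    ]
    return weights, fused


# Toy data: three posts, 2-d index keys, 3-d semantic embeddings.
query = [1.0, 0.0]
index_keys = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
semantic_values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

weights, fused = index_attention(query, index_keys, semantic_values)
print(weights)
print(fused)
```

In this toy setup the first post receives the largest weight because its index key aligns with the query, so its semantic embedding dominates the fused vector. In a trained model the query and the projections of keys and values would be learned rather than fixed by hand.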

Keywords
personality detection
attention mechanism
social media text mining
information fusion

Data Availability Statement
The source code used in this study is publicly available on GitHub at the following link: https://github.com/Once2gain/IAN.

Funding
This work was supported by the Natural Science Foundation of Shandong Province under Grant ZR2020MF154.

Conflicts of Interest
The authors declare no conflicts of interest.

Ethical Approval and Consent to Participate
This study uses an anonymized, publicly available dataset that contains no personally identifiable information. Because the dataset is fully anonymized and no personal data were collected from individuals, ethical approval is not required for this research.

Cite This Article
APA Style
Tang, Q., Jiang, W., Pan, X., Lin, L., Zhu, J., Du, Y., & Sun, D. (2025). Using Psycholinguistic Clues to Index Deep Semantic Evidences: Personality Detection in Social Media Texts. Chinese Journal of Information Fusion, 2(2), 112–126. https://doi.org/10.62762/CJIF.2025.820998

Publisher's Note
IECE stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions
CC BY Copyright © 2025 by the Author(s). Published by Institute of Emerging and Computer Engineers. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
Chinese Journal of Information Fusion

ISSN: 2998-3371 (Online) | ISSN: 2998-3363 (Print)

Email: [email protected]

Portico

All published articles are preserved here permanently:
https://www.portico.org/publishers/iece/

Copyright © 2025 Institute of Emerging and Computer Engineers Inc.