Academic Editor
Arbi Haza Nasution
Universitas Islam Riau, Pekanbaru, Indonesia
IECE Transactions on Emerging Topics in Artificial Intelligence, Volume 2, Issue 1, 2025: 43-56

Open Access | Research Article | 28 March 2025
NLP and AI for Public Health Intelligence: Automating Disease Surveillance from Unstructured Data
Vijayalaxmi Methuku 1,*
1 CYNOSOFT SOLUTIONS INC, Austin, TX 78750, United States
* Corresponding Author: Vijayalaxmi Methuku, [email protected]
Received: 02 March 2025, Accepted: 25 March 2025, Published: 28 March 2025  
Abstract
Public health surveillance is crucial for early disease detection, outbreak prediction, and epidemic response. However, traditional surveillance systems primarily rely on structured clinical data, limiting their capacity to capture emerging health threats from diverse and unstructured sources. This study explores the integration of Natural Language Processing (NLP) and Artificial Intelligence (AI) to automate disease surveillance by analyzing unstructured data, including electronic health records (EHRs), social media posts, news reports, and online health forums. Leveraging state-of-the-art NLP techniques—such as transformer-based language models, named entity recognition (NER), sentiment analysis, and topic modeling—an AI-driven surveillance framework is proposed to process, classify, and extract epidemiological insights from vast unstructured text streams in real time. The framework integrates multilingual data processing, anomaly detection, and geospatial trend analysis to enhance early warning capabilities for healthcare authorities. Its effectiveness is evaluated using benchmark datasets, such as the BioCaster Global Health Monitor, and real-world case studies on infectious disease outbreaks, demonstrating significant improvements in detection speed and accuracy. The findings highlight the transformative role of NLP and AI in advancing public health intelligence, improving disease surveillance scalability, and enabling proactive intervention strategies.
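
As a rough illustration of the kind of pipeline the abstract describes (extract disease mentions from free text, aggregate them over time, and flag unusual spikes), the following is a minimal sketch and not the framework evaluated in the article. It makes several simplifying assumptions: a small hypothetical keyword lexicon (`DISEASE_TERMS`) stands in for a trained NER model, a rolling z-score stands in for the anomaly detection component, and all function names, the window size, the threshold, and the `min_std` floor are illustrative choices that do not come from the source.

```python
import re
from collections import Counter
from statistics import mean, stdev

# Hypothetical lexicon; a real system would use a trained NER model instead.
DISEASE_TERMS = {"influenza", "flu", "dengue", "cholera", "measles"}


def extract_mentions(text):
    """Return the lexicon terms found in a piece of free text, in order."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t in DISEASE_TERMS]


def daily_counts(posts, term):
    """Count mentions of `term` per day from an iterable of (day, text) pairs."""
    counts = Counter()
    for day, text in posts:
        counts[day] += extract_mentions(text).count(term)
    return counts


def flag_anomalies(series, window=7, threshold=3.0, min_std=1.0):
    """Return indices whose value is more than `threshold` standard
    deviations above the mean of the preceding `window` values.

    `min_std` is an assumed floor on the standard deviation so that a
    flat history does not turn every small increase into an alert.
    """
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        sigma = max(stdev(history), min_std)
        if (series[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged
```

For example, `flag_anomalies([2, 3, 2, 4, 3, 2, 3, 20, 3])` flags only index 7, the day on which mentions jump from a baseline of about 3 to 20. A production system would also need the multilingual handling and geospatial analysis mentioned in the abstract, which this sketch omits.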

Keywords
natural language processing
artificial intelligence
public health surveillance
disease monitoring
unstructured data
social media analysis
electronic health records
epidemiological intelligence

Data Availability Statement
Data will be made available on request.

Funding
This work received no funding.

Conflicts of Interest
Vijayalaxmi Methuku is an employee of CYNOSOFT SOLUTIONS INC, Austin, TX 78750, United States.

Ethical Approval and Consent to Participate
Not applicable.

Cite This Article
APA Style
Methuku, V. (2025). NLP and AI for Public Health Intelligence: Automating Disease Surveillance from Unstructured Data. IECE Transactions on Emerging Topics in Artificial Intelligence, 2(1), 43–56. https://doi.org/10.62762/TETAI.2025.222799

Publisher's Note
IECE stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions
Copyright © 2025 by the Author(s). Published by Institute of Emerging and Computer Engineers. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
IECE Transactions on Emerging Topics in Artificial Intelligence

ISSN: 3066-1676 (Online) | ISSN: 3066-1668 (Print)
