IECE Transactions on Intelligent Systematics, 2024, Volume 2, Issue 1: 1-13

Free to Read | Research Article | 22 December 2024
1 Georgia Institute of Technology, Atlanta, GA 30332, United States
2 Faculty of Management, McGill University, Montreal, QC H3B 0C7, Canada
3 Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, United States
4 School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97333, United States
5 University of Pennsylvania, Philadelphia, PA 19104, United States
6 College of Engineering, Northeastern University, Boston, MA 02115, United States
7 Department of Electrical and Computer Engineering, University of California, San Diego, CA 92037, United States
† Huadong Pang and Li Zhou contributed equally to this work
* Corresponding Author: Huadong Pang
Received: 17 October 2024, Accepted: 05 December 2024, Published: 22 December 2024  
Abstract
In healthcare, deep learning has transformed how clinical data are analyzed and how disease is forecast. Diabetes is a clear example: in-depth analysis of Electronic Health Records (EHR) opens new opportunities for early detection and effective intervention. We present a model that combines a Bidirectional Long Short-Term Memory network with a Conditional Random Field layer (BiLSTM-CRF) and a fusion of XGBoost and Logistic Regression, with the goal of improving the accuracy of diabetes risk prediction from EHR data. In the first phase, BiLSTM-CRF extracts the temporal characteristics and latent patterns in EHR data, uncovering diabetes progression trends that are often hidden in the complex structure of medical records. In the second phase, XGBoost and Logistic Regression jointly classify the extracted features and assess the associated risk. This two-stage design produces more nuanced and precise diabetes predictions than traditional models, particularly on multifaceted, nonlinear medical datasets. The results highlight the value of data-driven strategies in clinical decision-making: the model gives healthcare professionals precise tools for early detection and intervention, supports personalized treatment and timely care, and may improve outcomes for diabetes and other chronic conditions.
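BiLSTM-CRF is a standard sequence-labeling architecture: the BiLSTM produces a score for every candidate tag at every position, and the CRF layer adds learned tag-to-tag transition scores, so decoding selects the best whole tag sequence rather than the best tag at each position independently. The abstract does not describe the paper's label set, scores, or implementation, so the following is only a minimal sketch of this standard Viterbi decoding step under stated assumptions: the function name `viterbi_decode`, the BIO-style tags `O`, `B-DIAB`, and `I-DIAB`, and all score values are hypothetical stand-ins for trained BiLSTM outputs and CRF transitions.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Sequence[float],
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag sequence of a linear-chain CRF and its score.

    emissions[t][j]   -- score of tag j at position t (in a BiLSTM-CRF, the BiLSTM output)
    transitions[i][j] -- score of moving from tag i to tag j (learned by the CRF layer)
    start[j]          -- score of starting the sequence in tag j
    """
    n_tags = len(start)
    # best[j]: score of the best path that ends in tag j at the current position
    best = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best: List[float] = []
        pointers: List[int] = []
        for j in range(n_tags):
            # Choose the previous tag i that maximises best[i] + transition(i -> j)
            prev = max(range(n_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Trace the best path backwards from the highest-scoring final tag
    last = max(range(n_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Hypothetical BIO tags and hand-picked scores (illustrative only, not from the paper)
TAGS = ["O", "B-DIAB", "I-DIAB"]
START = [0.0, 0.0, -10.0]          # a sequence should not begin inside an entity
TRANSITIONS = [
    [0.0, 0.0, -10.0],             # O -> I-DIAB is effectively forbidden
    [0.0, -1.0, 1.0],              # B-DIAB -> I-DIAB is encouraged
    [0.0, -1.0, 1.0],              # I-DIAB -> I-DIAB is encouraged
]
EMISSIONS = [
    [2.0, 1.5, 0.0],
    [0.5, 0.0, 1.5],
    [2.0, 0.0, 0.5],
]

if __name__ == "__main__":
    best_path, best_score = viterbi_decode(EMISSIONS, TRANSITIONS, START)
    print([TAGS[k] for k in best_path], best_score)
```

With these scores, picking the highest emission at each position independently would give `O, I-DIAB, O`, which violates the BIO convention that `I-DIAB` must follow `B-DIAB` or `I-DIAB`. The transition penalties let the decoder return `B-DIAB, I-DIAB, O` with a total score of 6.0. Scores are added rather than multiplied because they are log-space values, which avoids numerical underflow on long sequences.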

Graphical Abstract
Electronic Health Records-Based Data-Driven Diabetes Knowledge Unveiling and Risk Prognosis

Keywords
deep learning
electronic health records
BiLSTM-CRF
XGBoost
healthcare analytics

Funding
This work received no external funding.

Cite This Article
APA Style
Pang, H., Zhou, L., Dong, Y., Chen, P., Gu, D., Lyu, T., & Zhang, H. (2024). Electronic Health Records-Based Data-Driven Diabetes Knowledge Unveiling and Risk Prognosis. IECE Transactions on Intelligent Systematics, 2(1), 1–13. https://doi.org/10.62762/TIS.2025.367320

Publisher's Note
IECE stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions
IECE or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
IECE Transactions on Intelligent Systematics

ISSN: 2998-3355 (Online) | ISSN: 2998-3320 (Print)

Portico

All published articles are preserved here permanently:
https://www.portico.org/publishers/iece/

Copyright © 2024 Institute of Emerging and Computer Engineers Inc.