Academic Editor
Jawad Khan
Gachon University, South Korea
IECE Transactions on Emerging Topics in Artificial Intelligence, 2025, Volume 2, Issue 1: 16-25

Free to Read | Research Article | 26 February 2025
NMRGen: A Generative Modeling Framework for Molecular Structure Prediction from NMR Spectra
1 Datalink Research and Technology Lab, Islamkot 69240, Sindh, Pakistan
* Corresponding Author: Raja Vavekanand, [email protected]
Received: 29 November 2024, Accepted: 14 February 2025, Published: 26 February 2025  
Abstract
Interpreting NMR spectra to accurately predict molecular structures remains a significant challenge in chemistry due to the complexity of spectral data and the need for precise structural elucidation. This study introduces NMRGen, a generative modeling framework that predicts molecular structures from NMR spectra and molecular formulas. The framework combines a SMILES autoencoder (GRU-based encoder-decoder) and an NMR encoder (CNN and DNN layers) to map spectral data to molecular representations. The SMILES autoencoder compresses and reconstructs SMILES strings, while the NMR encoder processes NMR spectra to generate latent vectors aligned with those from the SMILES encoder. Experiments were conducted using NMR spectra and SMILES datasets. The model was trained in three stages: (1) training the SMILES autoencoder, (2) aligning latent vectors from the NMR encoder, and (3) simultaneous training of both components. Results revealed that while the SMILES autoencoder performed adequately, the NMR encoder struggled to map spectral data effectively. Most generated SMILES strings were invalid, with valid ones primarily consisting of carbon chains (e.g., CCC...C). The Tanimoto coefficient between generated and target molecules ranged from 0.1 to 0.2, indicating low similarity. Despite these limitations, NMRGen demonstrates the potential of generative models for molecular structure prediction. Future work will focus on improving performance through larger datasets, advanced loss functions, and enhanced architectures.
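
The Tanimoto coefficient used above to compare generated and target molecules can be illustrated with a minimal sketch. It assumes each molecular fingerprint is represented as the set of its "on" bit indices. The abstract does not state which fingerprint was used, so the bit sets and the function name `tanimoto` below are hypothetical and serve only to show how a score in the reported 0.1 to 0.2 range arises.

```python
from __future__ import annotations


def tanimoto(bits_a: set[int], bits_b: set[int]) -> float:
    """Tanimoto (Jaccard) coefficient between two sets of 'on' fingerprint bits."""
    union = bits_a | bits_b
    if not union:
        # Both fingerprints are empty; this sketch returns 0.0 by convention.
        return 0.0
    return len(bits_a & bits_b) / len(union)


# Hypothetical fingerprints (indices of set bits), not data from the article.
target = {1, 4, 7, 9, 12, 15, 20, 23}
generated = {1, 4, 30, 31}

# 2 shared bits out of 10 distinct bits overall.
score = tanimoto(generated, target)
print(f"{score:.2f}")
```

A score of 1.0 means the two fingerprints are identical and 0.0 means they share no bits, so values near 0.1 to 0.2 indicate that only a small fraction of structural features overlap.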

Keywords
generative modeling
molecular structure
NMR
AI in chemistry

Funding
This work received no funding.

Cite This Article
APA Style
Vavekanand, R. (2025). NMRGen: A Generative Modeling Framework for Molecular Structure Prediction from NMR Spectra. IECE Transactions on Emerging Topics in Artificial Intelligence, 2(1), 16–25. https://doi.org/10.62762/TETAI.2024.277656

Publisher's Note
IECE stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions
Institute of Emerging and Computer Engineers (IECE) or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
IECE Transactions on Emerging Topics in Artificial Intelligence

ISSN: 3066-1676 (Online) | ISSN: 3066-1668 (Print)

Email: [email protected]

Portico

All published articles are preserved here permanently:
https://www.portico.org/publishers/iece/

Copyright © 2025 Institute of Emerging and Computer Engineers Inc.