Article: A Biomarker-Driven and Interpretable Machine Learning Model for Diagnosing Diabetes Mellitus

Title: A Biomarker-Driven and Interpretable Machine Learning Model for Diagnosing Diabetes Mellitus
Authors: Xiao, Zhihui; Wang, Mingfu; Zhao, Yueliang; Wang, Hui
Keywords: biomarker-driven; diabetes mellitus; interpretable; machine learning; prediction model
Issue Date: 30-Apr-2025
Publisher: Wiley Open Access
Citation: Food Science & Nutrition, 2025, v. 13, n. 5
Abstract: Diabetes is one of the leading causes of death and disability worldwide. Developing earlier and more accurate diagnostic methods is crucial for the clinical prevention and treatment of diabetes. Here, data on biochemical indicators and physiological characteristics of 4335 participants were collected from the National Health and Nutrition Examination Survey (NHANES) database for 2017 to 2020. After data preprocessing, the dataset was randomly divided into a training set (70%) and a test set (30%), and the Boruta algorithm was used to screen feature indicators on the training set. Next, three machine learning algorithms, Random Forest (RF), Multi-Layer Perceptron (MLP), and Extreme Gradient Boosting (XGBoost), were employed to build predictive models through 10-fold cross-validation on the training set, followed by performance evaluation on the test set. The RF model exhibited the best performance, with an area under the curve (AUC) of 0.958 (95% CI: 0.943–0.973), a recall of 0.897, a specificity of 0.916, an F1 score of 0.747, and an overall accuracy of 0.913. Moreover, SHapley Additive exPlanations (SHAP) and Partial Dependency Plots (PDP) were applied to interpret the RF model and analyze the risk factors for diabetes. Glycohemoglobin, glucose, fasting glucose, age, cholesterol, osmolality, BMI, blood urea nitrogen, and insulin were found to exert the greatest influence on the prevalence of diabetes. Collectively, the RF model has considerable application prospects for the diagnosis of diabetes and can serve as a valuable supplementary tool for clinical diagnosis and risk assessment.
Persistent Identifier: http://hdl.handle.net/10722/367318
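
The abstract reports recall, specificity, F1 score, and accuracy for the test set. As a reading aid, the following minimal sketch shows how these four quantities are conventionally derived from a binary confusion matrix. It is not the authors' code: the names `ConfusionCounts` and `confusion_counts` are hypothetical helpers, and the toy labels are invented for illustration and are not NHANES data.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary classifier, with 1 = diabetes and 0 = no diabetes."""
    tp: int  # true 1, predicted 1
    fp: int  # true 0, predicted 1
    tn: int  # true 0, predicted 0
    fn: int  # true 1, predicted 0


def confusion_counts(y_true, y_pred):
    """Tally the four confusion-matrix cells from paired 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def recall(c):
    """Share of true positives that were detected (sensitivity)."""
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c):
    """Share of true negatives that were correctly ruled out."""
    return _ratio(c.tn, c.tn + c.fp)


def f1_score(c):
    """Harmonic mean of precision and recall, written in count form."""
    return _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn)


def accuracy(c):
    """Share of all predictions that were correct."""
    return _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)


if __name__ == "__main__":
    # Invented toy labels: 3 positives and 7 negatives.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
    counts = confusion_counts(y_true, y_pred)
    print(counts)
    print("recall:", round(recall(counts), 3))
    print("specificity:", round(specificity(counts), 3))
    print("F1:", round(f1_score(counts), 3))
    print("accuracy:", round(accuracy(counts), 3))
```

Because F1 depends on precision, it can sit well below recall and specificity when positive cases are a minority of the test set, which is one plausible reading of the gap between the reported F1 score and the other metrics.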

 

DC Field | Value | Language
dc.contributor.author | Xiao, Zhihui | -
dc.contributor.author | Wang, Mingfu | -
dc.contributor.author | Zhao, Yueliang | -
dc.contributor.author | Wang, Hui | -
dc.date.accessioned | 2025-12-10T08:06:31Z | -
dc.date.available | 2025-12-10T08:06:31Z | -
dc.date.issued | 2025-04-30 | -
dc.identifier.citation | Food Science & Nutrition, 2025, v. 13, n. 5 | -
dc.identifier.uri | http://hdl.handle.net/10722/367318 | -
dc.language | eng | -
dc.publisher | Wiley Open Access | -
dc.relation.ispartof | Food Science & Nutrition | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject | biomarker-driven | -
dc.subject | diabetes mellitus | -
dc.subject | interpretable | -
dc.subject | machine learning | -
dc.subject | prediction model | -
dc.title | A Biomarker-Driven and Interpretable Machine Learning Model for Diagnosing Diabetes Mellitus | -
dc.type | Article | -
dc.description.nature | published_or_final_version | -
dc.identifier.doi | 10.1002/fsn3.70234 | -
dc.identifier.scopus | eid_2-s2.0-105004209467 | -
dc.identifier.volume | 13 | -
dc.identifier.issue | 5 | -
dc.identifier.eissn | 2048-7177 | -
dc.identifier.issnl | 2048-7177 | -
