Article: Smooth Bayesian network model for the prediction of future high-cost patients with COPD

Title: Smooth Bayesian network model for the prediction of future high-cost patients with COPD
Authors: Lin, Shaochong; Zhang, Qingpeng; Chen, Frank; Luo, Li; Chen, Lei; Zhang, Wei
Keywords: Bayesian network; Complex causal relationships; COPD; Cost prediction; Data sparsity; Graphical representation; Health informatics; Machine learning; Smoothing; Temporal data
Issue Date: 2019
Citation: International Journal of Medical Informatics, 2019, v. 126, p. 147-155
Abstract

Introduction: The clinical course of chronic obstructive pulmonary disease (COPD) is marked by acute exacerbation events that increase hospitalization rates and healthcare spending. The early identification of future high-cost patients with COPD may decrease healthcare spending by informing individualized interventions that prevent exacerbation events and decelerate disease progression. Existing studies of cost prediction of other chronic diseases have applied regression and machine-learning methods that cannot capture the complex causal relationships between COPD factors. Thus, the exploration of these factors through nonlinear, high-dimensional but explainable modeling is greatly needed.

Objectives: We aimed to develop a machine-learning model to identify future high-cost patients with COPD. Such a model should incorporate expert knowledge about causal relationships, and the method for estimating the model could provide more accurate predictions than other machine learning methods.

Methods: We used the 2011–2013 medical insurance data of patients with COPD in a large city. The data set included demographic information and admission records. Leveraging on developments in graphical modeling methods, we proposed a smooth Bayesian network (SBN) model for the prediction of high-cost individuals using medical insurance data. The modeling method incorporated some expert knowledge about causal relationships (i.e., about the Bayesian network structure). We employed a smoothing kernel based on the weighted nearest neighborhood method in the SBN model to address overfitting, case-mix effect, and data sparsity (i.e., using data about “similar patients”).

Results: The proposed SBN achieved the area under curve (AUC) of 0.80 and showed considerable improvement over the baseline machine-learning methods. Besides confirming the known factors from the literature, we found “region” (i.e., a suburban or urban area) to be a significant factor, and that in a 3-tier system with primary, secondary and tertiary hospitals, COPD patients who had been admitted to primary hospitals were more likely to develop into future high-cost patients than patients who had been admitted to tertiary hospitals.

Conclusion: The proposed SBN model not only obtained higher prediction accuracy and stronger generalizability than a number of benchmark machine-learning methods, but also used the Bayesian network to capture the complex causal relationships between different predictors by incorporating expert knowledge. Furthermore, a framework was developed to establish the relationships between exposure to historical trajectory and future outcome, which can also be applied to other temporal data to model different trajectory information and predict other outcomes.
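
The abstract describes smoothing a Bayesian network's probability estimates with a weighted nearest-neighbor kernel so that sparse parent configurations borrow information from "similar patients". The record does not give the kernel or the distance measure, so the following is only a minimal sketch of the general idea, not the authors' SBN implementation. The Hamming distance, the exponential weights, the `k` and `bandwidth` parameters, the function names, and the toy records are all assumptions made for illustration.

```python
from math import exp


def hamming(a, b):
    """Number of positions at which two equal-length tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def smoothed_probability(query, records, k=3, bandwidth=1.0):
    """Estimate P(outcome = 1 | parents = query) from the k nearest records.

    records: list of (parent_values, outcome) pairs with outcome in {0, 1}.
    Each neighbor is weighted by exp(-distance / bandwidth), so exact
    matches count most and less similar patients count less.
    This kernel is an assumed stand-in, not the one used in the paper.
    """
    if not records:
        raise ValueError("records must not be empty")
    # sorted() is stable, so ties keep their original order.
    nearest = sorted(records, key=lambda r: hamming(query, r[0]))[:k]
    weights = [exp(-hamming(query, parents) / bandwidth) for parents, _ in nearest]
    weighted_hits = sum(w * outcome for w, (_, outcome) in zip(weights, nearest))
    return weighted_hits / sum(weights)


# Hypothetical toy data: (region, hospital tier) -> high-cost flag.
toy_records = [
    (("urban", "primary"), 1),
    (("urban", "primary"), 0),
    (("suburban", "primary"), 1),
    (("urban", "tertiary"), 0),
]

# A configuration that never occurs in the toy data still gets an estimate,
# because neighbors at distance 1 and 2 contribute with reduced weight.
unseen = smoothed_probability(("suburban", "tertiary"), toy_records, k=4)
```

With `k` equal to the number of exact matches the estimate reduces to the plain relative frequency; larger `k` or `bandwidth` pulls in more dissimilar records, trading variance for bias.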
Persistent Identifier: http://hdl.handle.net/10722/328756
ISSN: 1386-5056
2023 Impact Factor: 3.7
2023 SCImago Journal Rankings: 1.110
ISI Accession Number ID: WOS:000465414600018

DC Field: Value
dc.contributor.author: Lin, Shaochong
dc.contributor.author: Zhang, Qingpeng
dc.contributor.author: Chen, Frank
dc.contributor.author: Luo, Li
dc.contributor.author: Chen, Lei
dc.contributor.author: Zhang, Wei
dc.date.accessioned: 2023-07-22T06:23:40Z
dc.date.available: 2023-07-22T06:23:40Z
dc.date.issued: 2019
dc.identifier.citation: International Journal of Medical Informatics, 2019, v. 126, p. 147-155
dc.identifier.issn: 1386-5056
dc.identifier.uri: http://hdl.handle.net/10722/328756
dc.language: eng
dc.relation.ispartof: International Journal of Medical Informatics
dc.subject: Bayesian network
dc.subject: Complex causal relationships
dc.subject: COPD
dc.subject: Cost prediction
dc.subject: Data sparsity
dc.subject: Graphical representation
dc.subject: Health informatics
dc.subject: Machine learning
dc.subject: Smoothing
dc.subject: Temporal data
dc.title: Smooth Bayesian network model for the prediction of future high-cost patients with COPD
dc.type: Article
dc.description.nature: link_to_subscribed_fulltext
dc.identifier.doi: 10.1016/j.ijmedinf.2019.03.017
dc.identifier.pmid: 31029256
dc.identifier.scopus: eid_2-s2.0-85064082254
dc.identifier.volume: 126
dc.identifier.spage: 147
dc.identifier.epage: 155
dc.identifier.eissn: 1872-8243
dc.identifier.isi: WOS:000465414600018
