Article: Effect of machine learning re-sampling techniques for imbalanced datasets in 18F-FDG PET-based radiomics model on prognostication performance in cohorts of head and neck cancer patients

Title: Effect of machine learning re-sampling techniques for imbalanced datasets in 18F-FDG PET-based radiomics model on prognostication performance in cohorts of head and neck cancer patients
Authors: XIE, C; Du, R; Ho, JWK; Pang, HH; Chiu, KWH; Lee, EYP; Vardhanabhuti, V
Keywords: 18F-FDG PET; Radiomics; Re-sampling techniques; Imbalanced datasets; Head and neck cancer
Issue Date: 2020
Publisher: Springer Verlag. The Journal's web site is located at http://link.springer.de/link/service/journals/00259/index.htm
Citation: European Journal of Nuclear Medicine and Molecular Imaging, 2020, v. 47, p. 2826-2835
Abstract:
Purpose: Biomedical data are frequently imbalanced, which makes it challenging to achieve good predictive performance with data-driven machine learning approaches. In this study, we investigated the impact of re-sampling techniques for imbalanced datasets on a PET radiomics-based prognostication model in head and neck cancer (HNC) patients.
Methods: Radiomics analysis was performed in two cohorts of patients: 166 patients newly diagnosed with nasopharyngeal carcinoma (NPC) at our centre and 182 HNC patients from an open database. Conventional PET parameters and robust radiomics features were extracted for correlation analysis with overall survival (OS) and disease progression-free survival (DFS). We investigated a cross-combination of 10 re-sampling methods (oversampling, undersampling, and hybrid sampling) with 4 machine learning classifiers for survival prediction. Diagnostic performance was assessed in hold-out test sets. Statistical differences were analysed using Monte Carlo cross-validation with post hoc Nemenyi analysis.
Results: Oversampling techniques such as ADASYN and SMOTE could improve prediction performance in terms of G-mean and F-measure in the minority class, without significant loss of F-measure in the majority class. We identified an optimal PET radiomics-based prediction model of OS (AUC of 0.82, G-mean of 0.77) for our NPC cohort. Similar improvements from oversampling techniques were seen when the approach was tested on an external dataset, indicating generalisability.
Conclusion: Our study showed that applying re-sampling techniques had a significant positive impact on prediction performance in imbalanced datasets. We have created an open-source solution for automated calculation and comparison of multiple re-sampling techniques and machine learning classifiers for easy replication in future studies.
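The abstract reports results in terms of G-mean and per-class F-measure after re-sampling. The following is a minimal, standard-library-only sketch of those two metrics together with plain random oversampling. It is not the authors' open-source tool and it does not implement ADASYN or SMOTE; random duplication of minority-class rows is used here only as the simplest stand-in for an oversampling step. The function names (`random_oversample`, `f_measure`, `g_mean`) and the toy labels are hypothetical.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class matches the size of the largest one (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def f_measure(y_true, y_pred, positive):
    """F1 score, treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Toy example: 2 minority-class (1) and 8 majority-class (0) cases.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(g_mean(y_true, y_pred))        # sqrt(0.5 * 0.875)
print(f_measure(y_true, y_pred, 1))  # minority-class F-measure
print(f_measure(y_true, y_pred, 0))  # majority-class F-measure
```

In a pipeline of this kind, re-sampling is normally applied to the training split only, so that a hold-out test set keeps its original class distribution.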
Persistent Identifier: http://hdl.handle.net/10722/281982
ISSN: 1619-7070
2021 Impact Factor: 10.057
2020 SCImago Journal Rankings: 2.313
ISI Accession Number ID: WOS:000524374000001

 

DC Field | Value | Language
dc.contributor.author | XIE, C | -
dc.contributor.author | Du, R | -
dc.contributor.author | Ho, JWK | -
dc.contributor.author | Pang, HH | -
dc.contributor.author | Chiu, KWH | -
dc.contributor.author | Lee, EYP | -
dc.contributor.author | Vardhanabhuti, V | -
dc.date.accessioned | 2020-04-19T03:33:43Z | -
dc.date.available | 2020-04-19T03:33:43Z | -
dc.date.issued | 2020 | -
dc.identifier.citation | European Journal of Nuclear Medicine and Molecular Imaging, 2020, v. 47, p. 2826-2835 | -
dc.identifier.issn | 1619-7070 | -
dc.identifier.uri | http://hdl.handle.net/10722/281982 | -
dc.language | eng | -
dc.publisher | Springer Verlag. The Journal's web site is located at http://link.springer.de/link/service/journals/00259/index.htm | -
dc.relation.ispartof | European Journal of Nuclear Medicine and Molecular Imaging | -
dc.rights | This is a post-peer-review, pre-copyedit version of an article published in [insert journal title]. The final authenticated version is available online at: http://dx.doi.org/[insert DOI] | -
dc.subject | 18F-FDG PET | -
dc.subject | Radiomics | -
dc.subject | Re-sampling techniques | -
dc.subject | Imbalanced datasets | -
dc.subject | Head and neck cancer | -
dc.title | Effect of machine learning re-sampling techniques for imbalanced datasets in 18F-FDG PET-based radiomics model on prognostication performance in cohorts of head and neck cancer patients | -
dc.type | Article | -
dc.identifier.email | Ho, JWK: jwkho@hku.hk | -
dc.identifier.email | Pang, HH: herbpang@hku.hk | -
dc.identifier.email | Chiu, KWH: kwhchiu@hku.hk | -
dc.identifier.email | Lee, EYP: eyplee77@hku.hk | -
dc.identifier.email | Vardhanabhuti, V: varv@hku.hk | -
dc.identifier.authority | Ho, JWK=rp02436 | -
dc.identifier.authority | Pang, HH=rp01857 | -
dc.identifier.authority | Chiu, KWH=rp02074 | -
dc.identifier.authority | Lee, EYP=rp01456 | -
dc.identifier.authority | Vardhanabhuti, V=rp01900 | -
dc.description.nature | link_to_subscribed_fulltext | -
dc.identifier.doi | 10.1007/s00259-020-04756-4 | -
dc.identifier.scopus | eid_2-s2.0-85083363317 | -
dc.identifier.hkuros | 309738 | -
dc.identifier.volume | 47 | -
dc.identifier.spage | 2826 | -
dc.identifier.epage | 2835 | -
dc.identifier.isi | WOS:000524374000001 | -
dc.publisher.place | Germany | -
dc.identifier.issnl | 1619-7070 | -
