Data-centric artificial intelligence in oncology: a systematic review assessing data quality in machine learning models for head and neck cancer

Adeoye, J; Hui, LL; Su, YX

Publisher Website: 10.1186/s40537-023-00703-w
Scopus: eid_2-s2.0-85149912660
WOS: WOS:000943314200001
Citations:
- Scopus: 0
- Web of Science: 0
Appears in Collections:
- Faculty of Dentistry: Journal/Magazine Articles

Title	Data-centric artificial intelligence in oncology: a systematic review assessing data quality in machine learning models for head and neck cancer
Authors	Adeoye, J; Hui, LL; Su, YX
Keywords	Artificial intelligence; Data quality; Data-centric AI; Head and neck cancer; Machine learning; Review
Issue Date	4-Mar-2023
Publisher	SpringerOpen
Citation	Journal of Big Data, 2023, v. 10, n. 1. DOI: http://dx.doi.org/10.1186/s40537-023-00703-w
Abstract	Machine learning models have been increasingly considered to model head and neck cancer outcomes for improved screening, diagnosis, treatment, and prognostication of the disease. As the concept of data-centric artificial intelligence is still incipient in healthcare systems, little is known about the data quality of the models proposed for clinical utility. This is important as it supports the generalizability of the models and data standardization. Therefore, this study overviews the quality of structured and unstructured data used for machine learning model construction in head and neck cancer. Relevant studies reporting on the use of machine learning models based on structured and unstructured custom datasets between January 2016 and June 2022 were sourced from PubMed, EMBASE, Scopus, and Web of Science electronic databases. The Prediction model Risk of Bias Assessment Tool (PROBAST) was used to assess the quality of individual studies before comprehensive data quality parameters were assessed according to the type of dataset used for model construction. A total of 159 studies were included in the review; 106 utilized structured datasets while 53 utilized unstructured datasets. Data quality assessments were deliberately performed for 14.2% of structured datasets and 11.3% of unstructured datasets before model construction. Class imbalance and data fairness were the most common limitations in data quality for both types of datasets, while outlier detection and lack of representative outcome classes were common in structured and unstructured datasets, respectively. Furthermore, this review found that class imbalance reduced the discriminatory performance for models based on structured datasets, while higher image resolution and good class overlap resulted in better model performance using unstructured datasets during internal validation.
Overall, data quality was infrequently assessed before the construction of ML models in head and neck cancer irrespective of the use of structured or unstructured datasets. To improve model generalizability, the assessments discussed in this study should be introduced during model construction to achieve data-centric intelligent systems for head and neck cancer management.
Persistent Identifier	http://hdl.handle.net/10722/337591
ISSN	2196-1115 (2023 Impact Factor: 8.6; 2023 SCImago Journal Rankings: 2.068)
ISI Accession Number ID	WOS:000943314200001

DC Field	Value	Language
dc.contributor.author	Adeoye, J	-
dc.contributor.author	Hui, LL	-
dc.contributor.author	Su, YX	-
dc.date.accessioned	2024-03-11T10:22:19Z	-
dc.date.available	2024-03-11T10:22:19Z	-
dc.date.issued	2023-03-04	-
dc.identifier.citation	Journal of Big Data, 2023, v. 10, n. 1	-
dc.identifier.issn	2196-1115	-
dc.identifier.uri	http://hdl.handle.net/10722/337591	-
dc.description.abstract	Machine learning models have been increasingly considered to model head and neck cancer outcomes for improved screening, diagnosis, treatment, and prognostication of the disease. As the concept of data-centric artificial intelligence is still incipient in healthcare systems, little is known about the data quality of the models proposed for clinical utility. This is important as it supports the generalizability of the models and data standardization. Therefore, this study overviews the quality of structured and unstructured data used for machine learning model construction in head and neck cancer. Relevant studies reporting on the use of machine learning models based on structured and unstructured custom datasets between January 2016 and June 2022 were sourced from PubMed, EMBASE, Scopus, and Web of Science electronic databases. The Prediction model Risk of Bias Assessment Tool (PROBAST) was used to assess the quality of individual studies before comprehensive data quality parameters were assessed according to the type of dataset used for model construction. A total of 159 studies were included in the review; 106 utilized structured datasets while 53 utilized unstructured datasets. Data quality assessments were deliberately performed for 14.2% of structured datasets and 11.3% of unstructured datasets before model construction. Class imbalance and data fairness were the most common limitations in data quality for both types of datasets, while outlier detection and lack of representative outcome classes were common in structured and unstructured datasets, respectively. Furthermore, this review found that class imbalance reduced the discriminatory performance for models based on structured datasets, while higher image resolution and good class overlap resulted in better model performance using unstructured datasets during internal validation.
Overall, data quality was infrequently assessed before the construction of ML models in head and neck cancer irrespective of the use of structured or unstructured datasets. To improve model generalizability, the assessments discussed in this study should be introduced during model construction to achieve data-centric intelligent systems for head and neck cancer management.	-
dc.language	eng	-
dc.publisher	SpringerOpen	-
dc.relation.ispartof	Journal of Big Data	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.subject	Artificial intelligence	-
dc.subject	Data quality	-
dc.subject	Data-centric AI	-
dc.subject	Head and neck cancer	-
dc.subject	Machine learning	-
dc.subject	Review	-
dc.title	Data-centric artificial intelligence in oncology: a systematic review assessing data quality in machine learning models for head and neck cancer	-
dc.type	Article	-
dc.identifier.doi	10.1186/s40537-023-00703-w	-
dc.identifier.scopus	eid_2-s2.0-85149912660	-
dc.identifier.volume	10	-
dc.identifier.issue	1	-
dc.identifier.eissn	2196-1115	-
dc.identifier.isi	WOS:000943314200001	-
dc.publisher.place	LONDON	-
dc.identifier.issnl	2196-1115	-
