Postgraduate thesis: Natural language processing algorithms for randomized trials

Title: Natural language processing algorithms for randomized trials
Authors: Wang, Fan (王帆)
Advisor(s): Pang, HMH; Wu, JTK
Issue Date: 2020
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Wang, F. [王帆]. (2020). Natural language processing algorithms for randomized trials. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: The randomized controlled trial is the ‘gold standard’ study design in clinical and health research. Adequate reporting of trials can maximize the value of their findings to clinical and health care. Failure to comply with the CONSORT reporting guideline for randomized trials could complicate study interpretation and potentially impact subsequent research. The objectives of this thesis are 1) to develop an automated reporting checklist generation tool using natural language processing, 2) to apply machine learning-based prediction algorithms to check for compliance with CONSORT reporting item 18, and 3) to use machine learning to predict whether a published study is positive or negative using only the content of the study’s abstract. This thesis explores natural language processing algorithms for randomized trials based on rules and machine learning approaches. The first study used published journal articles as training, testing, and validation sets to develop, refine, and evaluate our rule-based tool. A total of 158 articles reporting randomized controlled trials were selected from high impact factor journals in the following categories: 1) Medicine, general and internal; 2) Oncology; and 3) Cardiac and cardiovascular systems. A graphical user interface for the tool was built using Java. To evaluate the performance of our method, we calculated an accuracy metric defined as the number of correct assessments divided by the number of all assessments. Two case studies of randomized trials are provided to illustrate the tool. Of the 30 fully implemented items, 28 (93%) had more than 90% accuracy on the validation set, showing that the tool performed well on the fully implemented reporting items. The second study used machine learning methods to 1) detect reporting of CONSORT Item 18 for ancillary analysis, and 2) use the study abstract to predict whether a study is positive.
We used three levels of feature extraction engines: word vectors, TF-IDF vectors, and word embeddings. Results of several prediction classifiers, including naïve Bayes, linear logistic regression, support vector machine, random forests, gradient boosting, and convolutional neural network, were compared. The results of the two sub-studies showed that the performance of feature extraction engines and prediction models varied across tasks. A workflow for the text classification task was also proposed. Given this framework, our work is broadly applicable to articles in medical and health categories other than the three that we focused on. The results from the two studies showed that natural language processing algorithms could be applied to assist in the reporting of randomized trials and to make better use of the information they contain. The use of natural language processing could help users save substantial time when generating the CONSORT checklist, narrow the search scope, and reduce the manual effort of screening for appropriate articles. The findings of the thesis also provide guidance for future applications of artificial intelligence and machine learning techniques to other medical and health literature.
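The accuracy metric described in the abstract (correct assessments divided by all assessments) can be sketched as follows. This is a minimal illustration in Python, not the thesis's Java tool: the function names, the item labels, and the toy assessment data are hypothetical, and only the 28-of-30 figure comes from the abstract.

```python
from typing import Dict, List, Tuple

# Each assessment pairs the tool's judgement with a reference judgement
# (True = "item reported", False = "item not reported").
Assessment = Tuple[bool, bool]


def item_accuracy(assessments: List[Assessment]) -> float:
    """Accuracy = number of correct assessments / number of all assessments."""
    if not assessments:
        raise ValueError("at least one assessment is required")
    correct = sum(1 for predicted, reference in assessments if predicted == reference)
    return correct / len(assessments)


def share_above(accuracies: Dict[str, float], threshold: float = 0.90) -> Tuple[int, int, float]:
    """Count the items whose accuracy is strictly greater than the threshold."""
    n_above = sum(1 for value in accuracies.values() if value > threshold)
    total = len(accuracies)
    return n_above, total, n_above / total


# Hypothetical toy data for two checklist items (labels are illustrative only).
toy = {
    "item_1a": [(True, True), (False, False), (True, True), (True, True)],
    "item_8a": [(True, True), (True, False), (False, False), (True, True)],
}

accuracies = {name: item_accuracy(pairs) for name, pairs in toy.items()}
n_above, total, share = share_above(accuracies)
print(accuracies, n_above, total, share)

# The figure reported in the abstract: 28 of 30 items exceed 90% accuracy.
reported_share = 28 / 30
print(f"{reported_share:.0%}")
```

The strict comparison in `share_above` mirrors the abstract's wording ("more than 90% accuracy"); a different cut-off rule would be an equally reasonable choice for other uses.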
Degree: Master of Philosophy
Subject: Natural language processing (Computer science); Clinical trials; Machine learning; Artificial intelligence - Medical applications
Dept/Program: Public Health
Persistent Identifier: http://hdl.handle.net/10722/302556

 

DC Field | Value | Language
dc.contributor.advisor | Pang, HMH | -
dc.contributor.advisor | Wu, JTK | -
dc.contributor.author | Wang, Fan | -
dc.contributor.author | 王帆 | -
dc.date.accessioned | 2021-09-07T03:41:27Z | -
dc.date.available | 2021-09-07T03:41:27Z | -
dc.date.issued | 2020 | -
dc.identifier.citation | Wang, F. [王帆]. (2020). Natural language processing algorithms for randomized trials. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/302556 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Natural language processing (Computer science) | -
dc.subject.lcsh | Clinical trials | -
dc.subject.lcsh | Machine learning | -
dc.subject.lcsh | Artificial intelligence - Medical applications | -
dc.title | Natural language processing algorithms for randomized trials | -
dc.type | PG_Thesis | -
dc.description.thesisname | Master of Philosophy | -
dc.description.thesislevel | Master | -
dc.description.thesisdiscipline | Public Health | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2020 | -
dc.identifier.mmsid | 991044291215103414 | -
