A restricted Boltzmann machine based method for efficient processing of large biomedical datasets

Lu, Jianliang; 鲁建亮

Appears in Collections:
- HKU Theses Online
- Computer Science: Theses

postgraduate thesis: A restricted Boltzmann machine based method for efficient processing of large biomedical datasets

Title	A restricted Boltzmann machine based method for efficient processing of large biomedical datasets
Authors	Lu, Jianliang 鲁建亮
Advisors	Lam, TW; Luo, R
Issue Date	2021
Publisher	The University of Hong Kong (Pokfulam, Hong Kong)
Citation	Lu, J. [鲁建亮]. (2021). A restricted Boltzmann machine based method for efficient processing of large biomedical datasets. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract	In biomedicine, processing and analyzing big data is widely recognized as a fundamental but challenging task. Because missing values are inevitable in biomedical studies and clinical practice, imputing them is critical for producing datasets fit for analysis, and incorrect imputation can reduce the accuracy of data analysis and outcome prediction. A number of algorithms and tools have been used to impute missing data, but most focus on datasets with little interaction between variables or with a small number of samples or variables. These limitations restrict the application of existing methods to more complicated biomedical studies. In addition, risk prediction models are increasingly used in the diagnosis and prognosis of diseases, in clinical interventions, and so forth, and the quality of missing-data imputation affects their performance. In this study, I developed a Restricted Boltzmann Machine (RBM)-based methodology that uses biomedical datasets to impute missing values and predict disease risk. The new RBM algorithm can process continuous and categorical data simultaneously. The RBM approach is more effective than six existing imputation algorithms at imputing missing values in six disease-related datasets. In particular, it takes much less time than five other algorithms to impute datasets containing a large number of samples or variables. I also built a Deep Belief Network based on the modified RBM algorithm to predict the risk of human disease. On the same datasets, this model makes better predictions. The RBM-based method was applied to predict pregnancy and live birth for embryos after in-vitro fertilization. The results show the efficiency of the algorithm in missing data imputation and risk prediction, and its potential as an alternative method of selecting embryos for transfer in in-vitro fertilization. Thus, this study provides guidance for biomedical programs containing complex data structures with a large diversity of data types, patients and other variables.
Degree	Master of Philosophy
Subject	Missing observations (Statistics); Multiple imputation (Statistics); Neural networks (Computer science)
Dept/Program	Computer Science
Persistent Identifier	http://hdl.handle.net/10722/335986
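
Illustrative note: the abstract describes using a Restricted Boltzmann Machine to impute missing values. The sketch below is not the thesis's algorithm (which handles continuous and categorical data together and is not specified in this record). It is a minimal binary-only RBM, assuming NumPy, trained with one-step contrastive divergence on complete rows, with missing entries then filled by repeated reconstruction while observed entries stay clamped. The class name `BernoulliRBM`, the function `impute`, and the toy data are all hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class BernoulliRBM:
    """Tiny binary RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_visible, n_hidden))
        self.b = np.zeros(n_visible)  # visible biases
        self.c = np.zeros(n_hidden)   # hidden biases

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.c)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b)

    def fit(self, data, epochs=200, lr=0.1):
        n = data.shape[0]
        for _ in range(epochs):
            # Positive phase: hidden activations driven by the data.
            h0 = self.hidden_probs(data)
            h_sample = (self.rng.random(h0.shape) < h0).astype(float)
            # Negative phase: one reconstruction step.
            v1 = self.visible_probs(h_sample)
            h1 = self.hidden_probs(v1)
            # CD-1 parameter updates.
            self.W += lr * (data.T @ h0 - v1.T @ h1) / n
            self.b += lr * (data - v1).mean(axis=0)
            self.c += lr * (h0 - h1).mean(axis=0)
        return self


def impute(rbm, x, n_steps=20):
    """Fill NaN entries of x with reconstructed probabilities.

    Observed entries are clamped to their original values at every step.
    """
    missing = np.isnan(x)
    v = np.where(missing, 0.5, x)  # neutral starting guess for missing cells
    for _ in range(n_steps):
        recon = rbm.visible_probs(rbm.hidden_probs(v))
        v = np.where(missing, recon, x)
    return v


# Toy data (assumption): two binary patterns, fully observed for training.
train = np.array([[1, 1, 0, 0], [0, 0, 1, 1]] * 50, dtype=float)
rbm = BernoulliRBM(n_visible=4, n_hidden=3).fit(train)

incomplete = np.array([[1.0, np.nan, 0.0, np.nan]])
filled = impute(rbm, incomplete)
print(filled)
```

The filled cells are probabilities in [0, 1]; thresholding them (for example at 0.5) would give binary imputations. Training on complete rows only is a simplification made for this sketch.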

DC Field	Value	Language
dc.contributor.advisor	Lam, TW	-
dc.contributor.advisor	Luo, R	-
dc.contributor.author	Lu, Jianliang	-
dc.contributor.author	鲁建亮	-
dc.date.accessioned	2023-12-29T04:05:25Z	-
dc.date.available	2023-12-29T04:05:25Z	-
dc.date.issued	2021	-
dc.identifier.citation	Lu, J. [鲁建亮]. (2021). A restricted Boltzmann machine based method for efficient processing of large biomedical datasets. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.	-
dc.identifier.uri	http://hdl.handle.net/10722/335986	-
dc.language	eng	-
dc.publisher	The University of Hong Kong (Pokfulam, Hong Kong)	-
dc.relation.ispartof	HKU Theses Online (HKUTO)	-
dc.rights	The author retains all proprietary rights (such as patent rights) and the right to use in future works.	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.subject.lcsh	Missing observations (Statistics)	-
dc.subject.lcsh	Multiple imputation (Statistics)	-
dc.subject.lcsh	Neural networks (Computer science)	-
dc.title	A restricted Boltzmann machine based method for efficient processing of large biomedical datasets	-
dc.type	PG_Thesis	-
dc.description.thesisname	Master of Philosophy	-
dc.description.thesislevel	Master	-
dc.description.thesisdiscipline	Computer Science	-
dc.description.nature	published_or_final_version	-
dc.date.hkucongregation	2022	-
dc.identifier.mmsid	991044494007303414	-
