
Postgraduate thesis: Integration and processing of large-scale biomedical data

Title: Integration and processing of large-scale biomedical data
Authors: Zhang, Wenhua [张闻华]
Advisors: Pan, J; Wang, WP
Issue Date: 2023
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Zhang, W. [张闻华]. (2023). Integration and processing of large-scale biomedical data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: With the improvements in data collection methods, high-quality data abounds in medical imaging and other fields. While many algorithms have emerged to detect, segment, or classify these data, few methods have been proposed to re-organize or mine them. It is therefore important to develop approaches that dive into the data and fully exploit them. This thesis tackles three integration and processing problems of large-scale biomedical data: labeled dataset merging, self-supervised training on unlabeled datasets, and scalable mesh generation for volumetric data. The first part of this thesis addresses the problem of integrating inconsistent datasets. A large amount of labeled data is required to train effective nucleus classification models. However, labeling a large-scale nucleus classification dataset is challenging, since high-quality labeling requires specific domain knowledge and tremendous effort. In addition, existing public datasets are often inconsistently labeled. Because of this inconsistency, conventional models tend to work independently to infer their classification results, limiting classification performance. To fully utilize all annotated datasets, we propose a method to integrate all the available annotated datasets. Specifically, we formulate the task as a multi-label problem with missing labels, which lets us use all the datasets in a unified framework. Besides the substantial improvement over other methods, the resulting dataset also has a uniform format, which can support future research on nucleus classification. The second part of this thesis addresses representation learning for nucleus instance classification. Unlike annotated data, whose scale is limited, unlabeled data is usually available at large scale. We therefore design a self-supervised method for representation learning on unlabeled datasets to alleviate the burden of data annotation. Moreover, previous methods often downplay the contextual information that is critical for classification. To provide this information explicitly, we design a new structured input consisting of a content-rich image patch and a target instance mask. Benefiting from this structured input format, we propose Structured Triplet, a triplet learning framework on unlabeled nucleus instances with customized sampling strategies. We also add two auxiliary branches to further improve performance. Results show that our model reduces the burden of extensive labeling by fully exploiting large-scale unlabeled data. The third part of this thesis considers scalable mesh generation for volumetric data with multiple materials. With improved imaging quality and increased resolution, volumetric datasets are becoming so large that existing tools are inadequate for processing and analyzing them. We consider the problem of computing tetrahedral meshes to represent these large volumetric datasets. We propose a novel approach, called Marching Windows, that uses a moving window and a disk-swap strategy to reduce the run-time memory footprint. We also devise a new scheme that is guaranteed to preserve the topological structure of the original dataset, and adopt an error-guided optimization technique to reduce the geometric approximation error and improve mesh quality. Extensive experiments show that our method can process very large volumetric datasets beyond the capability of existing methods and produce high-quality tetrahedral meshes.
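The "multi-label problem with missing labels" formulation from the first part can be illustrated with a masked loss: each sample carries a label vector plus a mask marking which classes its source dataset actually annotates, and unannotated entries contribute no gradient. The following is a minimal NumPy sketch of that idea; the function name, shapes, and toy data are illustrative assumptions, not the thesis implementation:

```python
import numpy as np

def masked_bce_loss(logits, labels, mask):
    """Binary cross-entropy averaged over only the annotated entries.

    logits, labels, mask: arrays of shape (batch, num_classes).
    mask[i, c] == 1 means the source dataset of sample i annotates class c;
    entries with mask == 0 are missing labels and are excluded.
    """
    probs = 1.0 / (1.0 + np.exp(-logits))  # sigmoid
    eps = 1e-12
    per_entry = -(labels * np.log(probs + eps)
                  + (1 - labels) * np.log(1 - probs + eps))
    # Missing labels contribute nothing to the loss or its gradient.
    return (per_entry * mask).sum() / max(mask.sum(), 1)

# Two samples, three nucleus classes; the second sample's dataset
# annotates only class 0 and class 2.
logits = np.array([[2.0, -1.0, 0.5], [0.3, 4.0, -2.0]])
labels = np.array([[1.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
mask = np.array([[1.0, 1.0, 1.0], [1.0, 0.0, 1.0]])
loss = masked_bce_loss(logits, labels, mask)
```

Because the mask enters multiplicatively, datasets with different label vocabularies can be pooled into one training set without forcing guesses for the classes they never annotated.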
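The second part's Structured Triplet builds on standard triplet learning: an embedding is trained so an anchor instance lies closer to a positive (e.g. another view of the same nucleus) than to a sampled negative, by at least a margin. Below is only the generic triplet margin loss that such a framework rests on; the structured input (image patch plus instance mask), the customized sampling strategies, and the auxiliary branches are specific to the thesis and not reproduced here:

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss: pull anchor toward positive, push it from negative.

    Each argument is a (dim,) embedding vector.
    """
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

a = np.array([0.0, 0.0])   # anchor embedding
p = np.array([0.1, 0.0])   # positive: e.g. an augmented view of the anchor
n = np.array([2.0, 2.0])   # negative: a different sampled instance
loss = triplet_margin_loss(a, p, n)  # already satisfies the margin -> 0.0
```

The loss is zero once the positive is closer than the negative by the margin, so training effort concentrates on the hard triplets that the sampling strategy surfaces.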
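The memory-bounding idea behind the third part's moving window can be sketched as visiting a large volume in overlapping blocks so that only one block needs to be resident at a time. This toy traversal shows only that access pattern; the disk-swap strategy, topology-preserving scheme, and stitching of per-window tetrahedral meshes in Marching Windows are well beyond it:

```python
import numpy as np

def process_in_windows(volume, window=64, overlap=1):
    """Visit a 3D volume in overlapping blocks (a moving window).

    Overlap lets adjacent windows share a boundary layer so per-block
    results can later be stitched consistently. Returns the shape of
    each visited block; a real pipeline would mesh the block instead.
    """
    shapes = []
    nz, ny, nx = volume.shape
    step = window - overlap
    for z in range(0, nz, step):
        for y in range(0, ny, step):
            for x in range(0, nx, step):
                block = volume[z:z + window, y:y + window, x:x + window]
                shapes.append(block.shape)  # process/mesh the block here
    return shapes

# A 10^3 toy volume split into 8 overlapping 8^3-bounded blocks.
shapes = process_in_windows(np.zeros((10, 10, 10)), window=8, overlap=1)
```

Peak memory then scales with the window size rather than the full dataset, which is what lets volumes beyond RAM capacity be processed at all.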
Degree: Doctor of Philosophy
Subject: Medical informatics - Data processing; Biomedical engineering - Data processing
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/328917

 

DC Field: Value
dc.contributor.advisor: Pan, J
dc.contributor.advisor: Wang, WP
dc.contributor.author: Zhang, Wenhua
dc.contributor.author: 张闻华
dc.date.accessioned: 2023-08-01T06:48:14Z
dc.date.available: 2023-08-01T06:48:14Z
dc.date.issued: 2023
dc.identifier.citation: Zhang, W. [张闻华]. (2023). Integration and processing of large-scale biomedical data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
dc.identifier.uri: http://hdl.handle.net/10722/328917
dc.description.abstract: With the improvements in data collection methods, high-quality data abounds in medical imaging and other fields. While many algorithms have emerged to detect, segment, or classify these data, few methods have been proposed to re-organize or mine them. It is therefore important to develop approaches that dive into the data and fully exploit them. This thesis tackles three integration and processing problems of large-scale biomedical data: labeled dataset merging, self-supervised training on unlabeled datasets, and scalable mesh generation for volumetric data. The first part of this thesis addresses the problem of integrating inconsistent datasets. A large amount of labeled data is required to train effective nucleus classification models. However, labeling a large-scale nucleus classification dataset is challenging, since high-quality labeling requires specific domain knowledge and tremendous effort. In addition, existing public datasets are often inconsistently labeled. Because of this inconsistency, conventional models tend to work independently to infer their classification results, limiting classification performance. To fully utilize all annotated datasets, we propose a method to integrate all the available annotated datasets. Specifically, we formulate the task as a multi-label problem with missing labels, which lets us use all the datasets in a unified framework. Besides the substantial improvement over other methods, the resulting dataset also has a uniform format, which can support future research on nucleus classification. The second part of this thesis addresses representation learning for nucleus instance classification. Unlike annotated data, whose scale is limited, unlabeled data is usually available at large scale. We therefore design a self-supervised method for representation learning on unlabeled datasets to alleviate the burden of data annotation. Moreover, previous methods often downplay the contextual information that is critical for classification. To provide this information explicitly, we design a new structured input consisting of a content-rich image patch and a target instance mask. Benefiting from this structured input format, we propose Structured Triplet, a triplet learning framework on unlabeled nucleus instances with customized sampling strategies. We also add two auxiliary branches to further improve performance. Results show that our model reduces the burden of extensive labeling by fully exploiting large-scale unlabeled data. The third part of this thesis considers scalable mesh generation for volumetric data with multiple materials. With improved imaging quality and increased resolution, volumetric datasets are becoming so large that existing tools are inadequate for processing and analyzing them. We consider the problem of computing tetrahedral meshes to represent these large volumetric datasets. We propose a novel approach, called Marching Windows, that uses a moving window and a disk-swap strategy to reduce the run-time memory footprint. We also devise a new scheme that is guaranteed to preserve the topological structure of the original dataset, and adopt an error-guided optimization technique to reduce the geometric approximation error and improve mesh quality. Extensive experiments show that our method can process very large volumetric datasets beyond the capability of existing methods and produce high-quality tetrahedral meshes.
dc.language: eng
dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof: HKU Theses Online (HKUTO)
dc.rights: The author retains all proprietary rights (such as patent rights) and the right to use in future works.
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject.lcsh: Medical informatics - Data processing
dc.subject.lcsh: Biomedical engineering - Data processing
dc.title: Integration and processing of large-scale biomedical data
dc.type: PG_Thesis
dc.description.thesisname: Doctor of Philosophy
dc.description.thesislevel: Doctoral
dc.description.thesisdiscipline: Computer Science
dc.description.nature: published_or_final_version
dc.date.hkucongregation: 2023
dc.identifier.mmsid: 991044705906303414
