
postgraduate thesis: Interpretable deep learning methods for biological sequencing data

Title: Interpretable deep learning methods for biological sequencing data
Author(s): Zheng, Weizhong (鄭煒忠)
Advisor(s): Ho, JWK; Huang, Y
Issue Date: 2024
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Zheng, W. [鄭煒忠]. (2024). Interpretable deep learning methods for biological sequencing data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: In recent years, advances in sequencing techniques have revolutionized our ability to assess the activity and heterogeneity of genetic sequences and to profile the state of individual cells at unprecedented resolution. Deep learning can harness patterns in data without explicit feature extraction, which allows accurate predictors to be built from large and complex data sets, including sequencing data from a variety of molecular and cell biology studies. Nonetheless, the prediction mechanisms encoded in these deep learning models are not transparent, making it challenging to interpret the underlying biological regulatory mechanisms. This thesis presents interpretable deep learning methods that we developed for a wide range of biological sequencing data, emphasizing the interpretation of biological insights over prediction alone. We showcase three applications of these methods:

1. RNA translation rate prediction. We introduce a multi-task translation rate predictor, MTtrans. This method aims to identify genuine regulatory motifs in the 5’ untranslated regions (UTRs) that generalize across different techniques. Learning patterns from synthetic and endogenous 5’ UTR sequences measured by 3 different techniques, MTtrans discovered translation regulatory motifs whose effect could be experimentally validated. We also demonstrated how our GRU interpretation method could uncover the reasoning process and pinpoint local regulatory signals.

2. CRISPR editing outcome prediction. We introduce a flexible template-free CRISPR editing outcome prediction tool, inDecay. This deep learning model considers the activity of DNA repair pathways to capture the cell-type variability of the repair profile. With highly informative per-indel features and a parameter-efficient multi-stage architecture, inDecay demonstrated superior performance across different experimental settings, which should substantially broaden the use of this method.

3. Prediction of transcriptional response to perturbation. We highlight our multi-conditional generative model for studying the induced differentiation of stem cells. Our model takes a single-cell expression profile, embeds the influence of induction conditions in the latent space, and predicts the short-term expression changes. By treating prediction as a continuous shift in the cell-cell connectivity graph, our model becomes a navigational system for the differentiation landscape, directing stem cells to a particular differentiated state given a perturbation combination. We demonstrated the use of this model in identifying new transcription factors (TFs) for the differentiation of hematopoietic and neural cells.

Together, the research findings contribute to the development of interpretable deep learning methods for modeling sequence and expression data. By using interpretable models, we gain insight into previously hidden regulatory patterns and processes, opening up new ways to uncover and understand these biological mechanisms. Furthermore, our methods could have practical applications in real-world experiments, such as optimizing 5’ UTR sequences for protein products, designing guide RNAs for precise genome editing, and screening transcription factors for differentiating stem cells into specific target cell types.
Degree: Doctor of Philosophy
Subject: Bioinformatics; Deep learning (Machine learning)
Dept/Program: Biomedical Sciences
Persistent Identifier: http://hdl.handle.net/10722/342901

 

DC Field | Value | Language
dc.contributor.advisor | Ho, JWK | -
dc.contributor.advisor | Huang, Y | -
dc.contributor.author | Zheng, Weizhong | -
dc.contributor.author | 鄭煒忠 | -
dc.date.accessioned | 2024-05-07T01:22:18Z | -
dc.date.available | 2024-05-07T01:22:18Z | -
dc.date.issued | 2024 | -
dc.identifier.citation | Zheng, W. [鄭煒忠]. (2024). Interpretable deep learning methods for biological sequencing data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/342901 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Bioinformatics | -
dc.subject.lcsh | Deep learning (Machine learning) | -
dc.title | Interpretable deep learning methods for biological sequencing data | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Biomedical Sciences | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2024 | -
dc.identifier.mmsid | 991044791815403414 | -
