
postgraduate thesis: Genomics and deep learning guided discovery of ribosomal peptide biosynthetic genes : from mining to design

Title: Genomics and deep learning guided discovery of ribosomal peptide biosynthetic genes : from mining to design
Authors: Zhong, Zheng (钟铮)
Advisor(s): Li, YP; Li, XC
Issue Date: 2023
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Zhong, Z. [钟铮]. (2023). Genomics and deep learning guided discovery of ribosomal peptide biosynthetic genes : from mining to design. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Ribosomally synthesized and post-translationally modified peptides (RiPPs) are a noteworthy class of natural products with remarkable chemical diversity and bioactivity. The biosynthesis of RiPPs begins with the ribosomal translation of precursor peptides, followed by post-translational modifications (PTMs) carried out by PTM enzymes, and finally, cleavage of precursor peptides by proteases. Numerous genomic tools have been developed to uncover novel RiPPs with unique chemical structures, primarily focusing on identifying PTM enzymes and biosynthetic gene clusters (BGCs). However, these approaches overlook two major challenges: the identification of unclustered biosynthetic genes, such as protease genes, and the prediction of RiPPs in fragmented metagenomes, where complete BGCs or even individual PTM enzymes are difficult to identify. To tackle the issue of unclustered genes, a correlational network method was proposed, which integrates genomics and transcriptomics to establish correlations between different biosynthetic genes. This approach was exemplified through the identification of unclustered RiPP protease genes. Network predictions unveiled a previously undiscovered protease responsible for the maturation of paenilan. Furthermore, our approach identified the widely distributed bacterial M16B metallopeptidases as a new family of class III lanthipeptide proteases. These findings demonstrate the strength of the correlational network approach in discovering hidden lanthipeptide proteases and potentially other missing enzymes involved in natural product biosynthesis. To identify RiPPs in fragmented metagenomes, a deep learning-based approach called TrRiPP was proposed. TrRiPP exhibited higher accuracy in classifying RiPP precursors encoded in fragmented metagenomes than previous methods. By applying TrRiPP to globally distributed marine microbiomes, the diversity, abundance, and distribution of RiPPs in the ocean were investigated. These results underscore the power of TrRiPP in identifying RiPPs without PTM enzyme annotation, thereby paving the way to study RiPPs in environmental or symbiotic sources, such as exploring the ecological functions of RiPPs in the ocean. In addition to naturally occurring sources, generating novel RiPPs through the artificial design of post-translational modification enzymes and/or precursors is possible. However, this area of research is currently underdeveloped and requires further exploration. Fortunately, the rise of artificial intelligence provides new opportunities to explore RiPP design from a new perspective. To facilitate RiPP design, deep learning models were developed to generate new PTM enzymes and their corresponding precursors. These models, trained on radical S-adenosylmethionine (rSAM) enzymes and their precursors, can produce diverse artificial rSAM enzymes and accurately predict the corresponding precursors. Experiments confirmed that the predicted precursors could be modified by natural rSAM enzymes, resulting in RiPPs with more modification sites than their natural counterparts. This study demonstrates the successful application of protein language models in the design of new RiPPs, thereby opening the door to enzyme-substrate design through deep learning. This thesis delves into the intricacies of RiPP biosynthesis, with a particular focus on proteases, precursors, and PTM enzymes. The research addresses three significant challenges in bioinformatics related to RiPPs. By developing innovative methods such as biosynthetic gene correlations, deep learning-aided precursor prediction, and BGC design, this thesis aims to advance our knowledge of RiPP biosynthesis and facilitate the identification of novel RiPPs.
Degree: Doctor of Philosophy
Subject: Deep learning (Machine learning); Peptides - Synthesis
Dept/Program: Chemistry
Persistent Identifier: http://hdl.handle.net/10722/335970

 

DC Field | Value | Language
dc.contributor.advisor | Li, YP | -
dc.contributor.advisor | Li, XC | -
dc.contributor.author | Zhong, Zheng | -
dc.contributor.author | 钟铮 | -
dc.date.accessioned | 2023-12-29T04:05:17Z | -
dc.date.available | 2023-12-29T04:05:17Z | -
dc.date.issued | 2023 | -
dc.identifier.citation | Zhong, Z. [钟铮]. (2023). Genomics and deep learning guided discovery of ribosomal peptide biosynthetic genes : from mining to design. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/335970 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Deep learning (Machine learning) | -
dc.subject.lcsh | Peptides - Synthesis | -
dc.title | Genomics and deep learning guided discovery of ribosomal peptide biosynthetic genes : from mining to design | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Chemistry | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2024 | -
dc.identifier.mmsid | 991044751041303414 | -
