postgraduate thesis: Chinese population analyses based on next-generation sequencing data
Title | Chinese population analyses based on next-generation sequencing data |
---|---|
Authors | Ou, Min [区敏] |
Issue Date | 2024 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Ou, M. [区敏]. (2024). Chinese population analyses based on next-generation sequencing data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | Population analysis studies the characteristics of a population to investigate its diversity and guide clinical management. Next-generation sequencing (NGS) technologies allow researchers to study populations through genomic variation more efficiently and cost-effectively. However, analysis of Chinese populations remains limited by the lack of publicly available variant data from Chinese samples. This thesis presents three population analyses based on NGS data generated from Chinese samples with different levels of accompanying demographic data.
The first analysis examined the variant distribution, population diversity, and variant severity in whole-genome sequencing data from 707 high-coverage samples in the Chinese Genome Database (CGDB). Disease- or drug-related variants that are common in CGDB but rare in other populations were investigated to improve diagnostic accuracy and guide precision medicine. Moreover, to fill the gap left by the absence of a pathogenicity score that considers genic and non-genic regions together, a genome-wide pathogenicity score, the Phred-scaled immutability (Pim), was developed from 486 healthy Chinese samples in CGDB. Pim showed significant associations with pathogenic variants, genes, and regulatory features.
The second analysis investigated the genomic composition of Chinese subgroups using whole-exome sequencing data from 205 Hong Kong Cantonese individuals (HKG). HKG is the first publicly available variant database for this population. Unlike CGDB, which lacks geographic information, HKG consists entirely of Cantonese individuals from Hong Kong, enabling comparisons among Chinese subpopulations. Principal component analysis demonstrated the distinct position of HKG among other Chinese populations. Identity-by-descent (IBD) analysis revealed that the population admixture and inbreeding level of HKG are closest to those of southern Chinese populations. The proportions of the HKG, EAS (East Asian), and SAS (South Asian) ancestral components change gradually with latitude, consistent with geographic origin. HKG variants classified as pathogenic in ClinVar or as drug-related in CIViC and PharmGKB were investigated to support diagnosis and surveillance. Variant distributions by impact and allele frequency highlighted the importance of sample collection. Adding HKG data to the imputation reference improved variant imputation, showing that Hong Kong Cantonese data can support future investigations requiring southern Chinese genetic data.
The last analysis focused on a combined dataset of 1,582 neurofibromatosis type 1 patients with detailed clinical features, comprising 736 cases recruited in Hong Kong and 846 cases reported in 12 studies; by contrast, CGDB and HKG samples are either healthy or lack phenotype data. Neurofibromatosis type 1 is a neurocutaneous disorder caused by genetic alterations in the NF1 gene; it exhibits nearly complete penetrance and affects multiple body systems. Whereas previous analyses focused on either one clinical feature across multiple domains or one variant type within a single domain, this analysis covered eight protein domains, two variant types (truncating and non-truncating), and 32 clinical features. It identified 133 significant associations among clinical features, age, variant types, and protein domains, 131 of which are novel. These new insights into genotype-phenotype associations can support better clinical management. |
Degree | Doctor of Philosophy |
Subject | Human population genetics - China - Data processing; Nucleotide sequence - Data processing |
Dept/Program | Computer Science |
Persistent Identifier | http://hdl.handle.net/10722/353400 |
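
The abstract gives no formula for Pim beyond calling it Phred-scaled. For readers unfamiliar with the term, Phred scaling maps a probability or fraction p to -10 log10(p), so each increase of 10 corresponds to a tenfold smaller p. One widely used genome-wide form, used for CADD's PHRED-like scores, applies it to a site's rank: a site ranked r among n sites scores -10 log10(r/n), which places the top 10% of sites at 10 or more and the top 1% at 20 or more. The Python sketch below illustrates only this generic rank-based scaling, under the assumption that a higher raw score means a more constrained site; the function name `phred_scale_by_rank` is hypothetical, and the raw immutability score that Pim actually scales is not described in this record.

```python
import math


def phred_scale_by_rank(scores):
    """Rank-based Phred scaling of raw per-site scores (illustrative sketch).

    Assumption: a higher raw score means "more constrained", so the highest
    raw score receives rank 1 and the largest scaled value. A site ranked r
    among n sites receives -10 * log10(r / n), computed here as the
    equivalent 10 * log10(n / r) so the lowest-ranked site is exactly 0.0.
    """
    n = len(scores)
    # Indices sorted from highest to lowest raw score; sorted() is stable,
    # so tied scores keep their input order (a simplification).
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    scaled = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scaled[idx] = 10 * math.log10(n / rank)
    return scaled


if __name__ == "__main__":
    raw = [0.1, 0.9, 0.5, 0.3]
    for r, s in zip(raw, phred_scale_by_rank(raw)):
        print(f"raw={r:.1f}  phred={s:.2f}")
```

With these four sites, the highest raw score (0.9) maps to 10 log10(4), about 6.02, and the lowest maps to 0. Scaling by rank rather than by raw value puts scores with very different raw distributions on a common, interpretable scale.
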
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Ou, Min | - |
dc.contributor.author | 区敏 | - |
dc.date.accessioned | 2025-01-17T09:46:20Z | - |
dc.date.available | 2025-01-17T09:46:20Z | - |
dc.date.issued | 2024 | - |
dc.identifier.citation | Ou, M. [区敏]. (2024). Chinese population analyses based on next-generation sequencing data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/353400 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Human population genetics - China - Data processing | - |
dc.subject.lcsh | Nucleotide sequence - Data processing | - |
dc.title | Chinese population analyses based on next-generation sequencing data | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Computer Science | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2025 | - |
dc.identifier.mmsid | 991044897477003414 | - |
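
The abstract reports 133 significant associations among clinical features, age, variant types, and protein domains in the neurofibromatosis type 1 cohort but does not name the statistical test used. As an illustrative assumption only, a single pairing such as variant type (truncating versus non-truncating) against the presence of one clinical feature can be tested with a two-sided Fisher's exact test on a 2x2 table, and the resulting set of p-values can be filtered with the Benjamini-Hochberg false discovery rate procedure. The standard-library Python sketch below implements both; the function names `fisher_exact_two_sided` and `benjamini_hochberg` are hypothetical, and the thesis may have used different tests or corrections.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Hypothetical layout: rows = truncating / non-truncating variants,
    columns = clinical feature present / absent.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def table_prob(x):
        # Hypergeometric probability of the table whose top-left cell is x,
        # holding all row and column totals fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed
    # one; the small relative tolerance guards against floating-point ties.
    p = sum(
        p_x
        for p_x in (table_prob(x) for x in range(lo, hi + 1))
        if p_x <= p_obs * (1 + 1e-7)
    )
    return min(1.0, p)


def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices of hypotheses rejected at false discovery rate alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Track the largest rank k whose p-value satisfies p_(k) <= k * alpha / m.
        if pvalues[idx] <= alpha * rank / m:
            k_max = rank
    # Reject the k_max smallest p-values.
    return sorted(order[:k_max])


if __name__ == "__main__":
    print(fisher_exact_two_sided(3, 1, 1, 3))  # 34/70, about 0.486
    print(benjamini_hochberg([0.01, 0.02, 0.03, 0.5]))  # [0, 1, 2]
```

The exact test avoids large-sample approximations, which matters when a rare clinical feature is cross-tabulated against a single protein domain and some cells hold only a few patients.
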