Appears in Collections: postgraduate thesis: Identification and prioritization of single nucleotide variation for Mendelian disorders from whole exome sequencing data
Title | Identification and prioritization of single nucleotide variation for Mendelian disorders from whole exome sequencing data |
---|---|
Authors | Zhang, Lu (张璐) |
Advisors | Yang, W; Lau, YL |
Issue Date | 2012 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Zhang, L. [张璐]. (2012). Identification and prioritization of single nucleotide variation for Mendelian disorders from whole exome sequencing data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b4852190 |
Abstract | With the completion of the human genome sequencing project and the rapid development of sequencing technologies, our capacity to tackle the genetic and genomic changes that underlie human diseases has never been greater. Recent successes in identifying disease-causing single nucleotide variations (SNVs) for Mendelian disorders using whole exome sequencing may bring us one step closer to understanding the pathogenesis of Mendelian diseases. However, many hurdles need to be overcome before this promise can become a widespread reality.
In this study, we investigated various strategies for SNV identification and designed a toolkit named PriSNV for SNV prioritization. The SNV identification pipeline, comprising read alignment, PCR duplicate removal, indel realignment, base quality score recalibration, and SNV and genotype calling, was examined using simulated and real sequencing data. When sequencing errors and small indels are incorporated into the reads, most read alignment software achieves satisfactory results. Nonetheless, reads with medium-sized and large indels are prone to being wrongly mapped to the reference genome because of the limitations of the gap-opening strategies in available read alignment software. In addition, although mapping quality reflects the mapping error rate only partially, it is still important to use it to filter out obvious read alignment errors. PCR duplicate removal, indel realignment, and base quality score recalibration proved to be necessary and can substantially reduce false positive SNV calls. Under the same quality criterion, VarScan is the most sensitive software for SNV calling, but its results are also enriched in false positive calls.
In order to prioritize the small subset of functionally important variants from the tens of thousands of variants in the whole human exome, we developed PriSNV, a systematic prioritization pipeline that makes use of information on variant quality, gene candidacy based on the number of novel nonsynonymous mutations in a gene, gene functional annotation, known involvement in the disease or relevant pathways, and location in linkage regions. Prediction of the functional impact of coding variants is also used to aid the search for causal mutations in Mendelian disorders. For a patient affected by Crohn's disease, the candidate genes can be reduced from 9615 to 3 by the gene selection strategies implemented in PriSNV.
In general, our results for SNV identification can help biologists recognize the limitations of available software and shed light on the development of new strategies for accurately identifying SNVs in the future. PriSNV, the software we developed for SNV prioritization, can help biologists prioritize SNV calls in a systematic way and reduce the search space for further analysis and experimental verification. |
Degree | Master of Philosophy |
Subject | Genetic disorders. |
Dept/Program | Paediatrics and Adolescent Medicine |
Persistent Identifier | http://hdl.handle.net/10722/179999 |
HKU Library Item ID | b4852190 |
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Yang, W | - |
dc.contributor.advisor | Lau, YL | - |
dc.contributor.author | Zhang, Lu | - |
dc.contributor.author | 张璐 | - |
dc.date.issued | 2012 | - |
dc.identifier.citation | Zhang, L. [张璐]. (2012). Identification and prioritization of single nucleotide variation for Mendelian disorders from whole exome sequencing data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b4852190 | - |
dc.identifier.uri | http://hdl.handle.net/10722/179999 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.source.uri | http://hub.hku.hk/bib/B48521905 | - |
dc.subject.lcsh | Genetic disorders. | - |
dc.title | Identification and prioritization of single nucleotide variation for Mendelian disorders from whole exome sequencing data | - |
dc.type | PG_Thesis | - |
dc.identifier.hkul | b4852190 | - |
dc.description.thesisname | Master of Philosophy | - |
dc.description.thesislevel | Master | - |
dc.description.thesisdiscipline | Paediatrics and Adolescent Medicine | - |
dc.description.nature | published_or_final_version | - |
dc.identifier.doi | 10.5353/th_b4852190 | - |
dc.date.hkucongregation | 2012 | - |
dc.identifier.mmsid | 991033921709703414 | - |