
postgraduate thesis: Identification and prioritization of single nucleotide variation for Mendelian disorders from whole exome sequencing data

Title: Identification and prioritization of single nucleotide variation for Mendelian disorders from whole exome sequencing data
Author(s): Zhang, Lu (张璐)
Advisor(s): Yang, W; Lau, YL
Issue Date: 2012
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Zhang, L. [张璐]. (2012). Identification and prioritization of single nucleotide variation for Mendelian disorders from whole exome sequencing data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b4852190
Abstract: With the completion of the human genome sequencing project and the rapid development of sequencing technologies, our capacity to tackle the genetic and genomic changes that underlie human diseases has never been greater. Recent successes in identifying disease-causing single nucleotide variations (SNVs) for Mendelian disorders by whole exome sequencing may bring us one step closer to understanding the pathogenesis of Mendelian diseases. However, many hurdles must be overcome before this promise becomes a widespread reality. In this study, we investigated various strategies for SNV identification and designed a toolkit named PriSNV for SNV prioritization. The SNV identification pipeline, comprising read alignment, PCR duplicate removal, indel realignment, base quality score recalibration, and SNV and genotype calling, was evaluated on both simulated and real sequencing data. When reads contain sequencing errors and small indels, most read alignment software achieves satisfactory results. Nonetheless, reads with medium-sized and large indels are prone to being mapped incorrectly to the reference genome because of the limited gap-opening strategies of available read alignment software. In addition, although mapping quality only partially reflects the mapping error rate, it remains important for filtering out obvious read alignment errors. PCR duplicate removal, indel realignment and base quality score recalibration proved necessary and substantially reduced false-positive SNV calls. Under the same quality criterion, VarScan was the most sensitive SNV caller, but false-positive calls were also enriched in its results.
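Two of the filtering steps named above, mapping-quality filtering and PCR duplicate removal, can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the `Read` record, the MAPQ threshold of 20 and the duplicate key (chromosome, leftmost position, strand) are simplifying assumptions. Production tools operate on SAM/BAM records, use the unclipped 5' position of each read and, for paired-end data, also consider the mate position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    # Hypothetical minimal read record; real pipelines use SAM/BAM records.
    name: str
    chrom: str
    pos: int            # leftmost aligned position (simplification of the 5' end)
    reverse: bool       # True if aligned to the reverse strand
    mapq: int           # Phred-scaled mapping quality
    base_qual_sum: int  # sum of base qualities, used to choose among duplicates


def filter_by_mapq(reads, min_mapq=20):
    """Drop reads whose mapping quality is below an assumed threshold."""
    return [r for r in reads if r.mapq >= min_mapq]


def remove_pcr_duplicates(reads):
    """Keep one read per (chrom, pos, strand) key, preferring higher base quality.

    Simplified single-end rule; duplicate markers for paired-end data also
    use the mate's position.
    """
    best = {}
    for r in reads:
        key = (r.chrom, r.pos, r.reverse)
        if key not in best or r.base_qual_sum > best[key].base_qual_sum:
            best[key] = r
    # Sort so the output order is deterministic.
    return sorted(best.values(), key=lambda r: (r.chrom, r.pos, r.reverse))
```

Applying the mapping-quality filter before duplicate removal means a poorly mapped read can never be chosen as the representative of a duplicate group.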
To prioritize the small subset of functionally important variants among the tens of thousands of variants in a whole human exome, we developed PriSNV, a systematic prioritization pipeline that uses information on variant quality, gene candidacy based on the number of novel nonsynonymous mutations in a gene, gene functional annotation, known involvement in the disease or relevant pathways, and location in linkage regions. Predictions of the functional impact of coding variants are also used to aid the search for causal mutations in Mendelian disorders. For the patient affected by Crohn's disease, the gene selection strategies implemented in PriSNV reduced the number of candidate genes from 9,615 to 3. Overall, our results on SNV identification can help biologists recognize the limitations of available software and may inform the development of new strategies for accurately identifying SNVs in the future. PriSNV can help biologists prioritize SNV calls systematically and narrow the search space for further analysis and experimental verification.
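The gene-level filtering described for PriSNV can be sketched as follows. This is a hypothetical illustration, not PriSNV's code: the `Variant` fields, the quality threshold of 30, and the parameters `candidate_genes` (standing in for genes with known disease or pathway involvement) and `linkage_regions` are assumptions. The sketch keeps high-quality novel nonsynonymous variants, optionally restricts them to linkage intervals, counts them per gene and ranks the genes by that count.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    # Hypothetical annotated variant; field names are illustrative only.
    gene: str
    chrom: str
    pos: int
    qual: float          # variant call quality
    nonsynonymous: bool  # True if the variant changes the encoded amino acid
    novel: bool          # True if absent from the reference variant databases used


def in_regions(chrom, pos, regions):
    """True if (chrom, pos) lies in any (chrom, start, end) interval, inclusive."""
    return any(c == chrom and start <= pos <= end for c, start, end in regions)


def prioritize_genes(variants, min_qual=30.0, min_hits=1,
                     candidate_genes=None, linkage_regions=None):
    """Rank genes by their number of passing novel nonsynonymous variants.

    A filter whose argument is None is skipped.
    """
    counts = Counter()
    for v in variants:
        if v.qual < min_qual or not (v.nonsynonymous and v.novel):
            continue
        if linkage_regions is not None and not in_regions(v.chrom, v.pos, linkage_regions):
            continue
        counts[v.gene] += 1
    genes = [(g, n) for g, n in counts.items() if n >= min_hits]
    if candidate_genes is not None:
        genes = [(g, n) for g, n in genes if g in candidate_genes]
    # Most hits first; ties broken alphabetically for a stable order.
    return sorted(genes, key=lambda gn: (-gn[1], gn[0]))
```

Because each criterion is optional, the filters can be switched on one at a time to see how far each step narrows the candidate list.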
Degree: Master of Philosophy
Subject: Genetic disorders.
Dept/Program: Paediatrics and Adolescent Medicine

 

DC Field | Value | Language
dc.contributor.advisor | Yang, W | -
dc.contributor.advisor | Lau, YL | -
dc.contributor.author | Zhang, Lu | -
dc.contributor.author | 张璐 | -
dc.date.issued | 2012 | -
dc.identifier.citation | Zhang, L. [张璐]. (2012). Identification and prioritization of single nucleotide variation for Mendelian disorders from whole exome sequencing data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b4852190 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.rights | Creative Commons: Attribution 3.0 Hong Kong License | -
dc.source.uri | http://hub.hku.hk/bib/B48521905 | -
dc.subject.lcsh | Genetic disorders. | -
dc.title | Identification and prioritization of single nucleotide variation for Mendelian disorders from whole exome sequencing data | -
dc.type | PG_Thesis | -
dc.identifier.hkul | b4852190 | -
dc.description.thesisname | Master of Philosophy | -
dc.description.thesislevel | Master | -
dc.description.thesisdiscipline | Paediatrics and Adolescent Medicine | -
dc.description.nature | published_or_final_version | -
dc.identifier.doi | 10.5353/th_b4852190 | -
dc.date.hkucongregation | 2012 | -
