
postgraduate thesis: Learning local and global context from sequence and matrix inputs

Title: Learning local and global context from sequence and matrix inputs
Authors: Li, Zhen (李鎮)
Advisor(s): Yu, Y
Issue Date: 2018
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Li, Z. [李鎮]. (2018). Learning local and global context from sequence and matrix inputs. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Learning local and global context from sequence and matrix inputs plays a significant role in bioinformatics and computer vision problems. Taking advantage of big data and data-driven methods, this thesis proposes novel pipelines for learning local context, global context, and integrated local-global context.

First, a novel deep learning pipeline is proposed for protein secondary structure prediction. Specifically, we propose an end-to-end deep network that predicts protein secondary structures from integrated local and global contextual features. The architecture uses convolutional neural networks with different kernel sizes to extract multiscale local contextual features. In addition, to account for long-range dependencies in amino acid sequences, we build a bidirectional recurrent neural network of gated recurrent units (GRUs) to capture global contextual features. Furthermore, multi-task learning is used to predict secondary structure labels and amino acid solvent accessibility simultaneously. The proposed network achieves state-of-the-art performance.

Inspired by the success of this sequence context learning, we present a new base-caller, WaveNano, to improve basecalling for the Oxford Nanopore MinION. We further show that indel (insertion and deletion) errors, the main cause of the high error rate, can be significantly reduced by accurately predicting nucleotide and move labels directly from the raw signal. Our bidirectional WaveNet model with residual blocks and skip connections captures the extremely long-range dependencies in the raw sequential signal. Using the predicted moves as segmentation guidance, we apply Viterbi decoding to the smoothed nucleotide probability matrix to obtain the final basecalls. WaveNano achieves state-of-the-art performance on real MinION sequencing data from Lambda phage.
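The multiscale local-context step of the secondary-structure network can be illustrated with a toy sketch: convolution windows of several widths slide over a one-hot encoded protein sequence, and their per-residue outputs are concatenated into one feature vector per residue. This is not the thesis's implementation. It assumes one hand-set output channel per window width (the hypothetical `counting_kernel`, which simply counts a chosen residue inside the window) in place of learned weights, zero "same" padding so that every residue keeps exactly one output, and it leaves out the bidirectional GRU and the multi-task heads entirely.

```python
from typing import List, Sequence

# The 20 standard amino acids; each residue becomes a one-hot vector over this alphabet.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

Matrix = List[List[float]]


def encode(sequence: str) -> Matrix:
    """One-hot encode a protein sequence as an L x 20 matrix."""
    return [[1.0 if a == res else 0.0 for a in ALPHABET] for res in sequence]


def counting_kernel(size: int, residue: str) -> Matrix:
    """Hypothetical hand-set kernel (size x 20) that counts `residue` in the window.

    A trained network would learn these weights instead of fixing them by hand.
    """
    return [[1.0 if a == residue else 0.0 for a in ALPHABET] for _ in range(size)]


def conv1d_same(x: Matrix, kernel: Matrix) -> List[float]:
    """One-channel 1D convolution with zero 'same' padding, followed by ReLU.

    The kernel size is assumed odd so that each window is centred on a residue.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for t, weights in enumerate(kernel):
            j = i + t - half  # residue index seen by kernel tap t
            if 0 <= j < len(x):  # positions outside the sequence act as zero padding
                acc += sum(w * v for w, v in zip(weights, x[j]))
        out.append(max(acc, 0.0))  # ReLU
    return out


def multiscale_features(x: Matrix, kernels: Sequence[Matrix]) -> Matrix:
    """Run every kernel over the sequence and concatenate the outputs per residue."""
    branches = [conv1d_same(x, k) for k in kernels]
    return [[branch[i] for branch in branches] for i in range(len(x))]


if __name__ == "__main__":
    x = encode("GLLLGGGGL")
    feats = multiscale_features(x, [counting_kernel(3, "L"), counting_kernel(7, "L")])
    for pos, f in enumerate(feats):
        print(pos, f)
```

For the toy sequence `GLLLGGGGL`, the width-3 window centred on position 5 contains no leucine while the width-7 window contains three, so the two branches give the same residue different context ([0.0, 3.0]). A real model would stack many learned channels per width and pass the concatenated features on to the recurrent layers.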
Although protein contacts contain key information for understanding protein structure, contacts predicted by existing methods that learn context from matrix inputs are still of low quality, especially for membrane proteins (MPs), which lack sufficient solved structures. We propose a low-cost, high-throughput deep transfer learning method that first predicts MP contacts by learning from non-membrane proteins (non-MPs), using integrated local and global context from sequential amino acid features and co-evolutionary matrix features, and then predicts 3D structure models using the predicted contacts as distance restraints. Tested on 510 non-redundant MPs, our method achieves much higher contact prediction accuracy than existing methods. A rigorous blind test in CAMEO and a test on human multi-pass MPs further confirm the superiority of our method.

Finally, we address the RGB-D scene labeling problem, which generates pixel-wise, fine-grained label maps from simultaneously sensed photometric (RGB) and depth channels. Our pipeline solves this problem by i) developing a novel Long Short-Term Memorized Context Fusion (LSTM-CF) model that captures and fuses contextual information from the photometric and depth channels, and ii) incorporating this model into deep convolutional neural networks (CNNs) for end-to-end training. The fused contextual representation is then concatenated with local convolutional features extracted from the photometric channels to improve the accuracy of fine-scale semantic labeling. Our model sets a new state of the art on three major datasets.
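The membrane-protein result above is stated as "much higher contact prediction accuracy" without naming the metric. A common convention in contact-prediction benchmarks such as CASP is the precision of the top L/k predicted long-range contacts, where L is the sequence length, long-range means a sequence separation of at least 24 residues, and a native contact is usually a C-beta pair closer than 8 angstroms. The sketch below computes that quantity under this assumed convention; the function name, default parameters, and the toy 30-residue matrix are illustrative assumptions, not the thesis's evaluation code or data.

```python
from typing import List, Set, Tuple


def top_l_over_k_precision(
    probs: List[List[float]],
    native: Set[Tuple[int, int]],
    k: int = 5,
    min_sep: int = 24,
) -> float:
    """Precision of the top L/k long-range contact predictions.

    probs:   L x L matrix of predicted contact probabilities (only i < j is read).
    native:  pairs (i, j), i < j, that are in contact in the solved structure.
    k:       evaluate the L/k most confident pairs (L/5, L/2 and L are common).
    min_sep: minimum sequence separation j - i for a pair to count as long-range.
    """
    L = len(probs)
    candidates = [
        (probs[i][j], i, j) for i in range(L) for j in range(i + min_sep, L)
    ]
    candidates.sort(key=lambda c: c[0], reverse=True)  # most confident first
    top = candidates[: max(1, L // k)]
    if not top:
        return 0.0  # sequence too short to contain any long-range pair
    hits = sum(1 for _, i, j in top if (i, j) in native)
    return hits / len(top)


def demo() -> float:
    """Toy 30-residue example with seven non-zero predictions."""
    L = 30
    probs = [[0.0] * L for _ in range(L)]
    scores = {(0, 29): 0.9, (1, 28): 0.8, (2, 27): 0.7, (0, 25): 0.6,
              (3, 28): 0.5, (4, 29): 0.4, (5, 29): 0.3}
    for (i, j), p in scores.items():
        probs[i][j] = probs[j][i] = p  # keep the matrix symmetric
    native = {(0, 29), (2, 27), (3, 28), (5, 29)}
    return top_l_over_k_precision(probs, native)


if __name__ == "__main__":
    print(demo())  # 3 of the top 30 // 5 = 6 pairs are native -> 0.5
```

Dividing by the number of evaluated pairs rather than by exactly L/k only matters for very short chains with fewer long-range candidates than L/k; published benchmarks may choose either convention.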
Degree: Doctor of Philosophy
Subject: Biometric identification; Nucleotide sequence
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/265387

 

DC Field | Value | Language
dc.contributor.advisor | Yu, Y | -
dc.contributor.author | Li, Zhen | -
dc.contributor.author | 李鎮 | -
dc.date.accessioned | 2018-11-29T06:22:32Z | -
dc.date.available | 2018-11-29T06:22:32Z | -
dc.date.issued | 2018 | -
dc.identifier.citation | Li, Z. [李鎮]. (2018). Learning local and global context from sequence and matrix inputs. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/265387 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Biometric identification | -
dc.subject.lcsh | Nucleotide sequence | -
dc.title | Learning local and global context from sequence and matrix inputs | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Computer Science | -
dc.description.nature | published_or_final_version | -
dc.identifier.doi | 10.5353/th_991044058178003414 | -
dc.date.hkucongregation | 2018 | -
dc.identifier.mmsid | 991044058178003414 | -
