postgraduate thesis: Efficient neural machine translation

Title: Efficient neural machine translation
Authors: Gu, Jiatao [顾佳涛]
Advisors: Li, VOK
Issue Date: 2018
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Gu, J. [顾佳涛]. (2018). Efficient neural machine translation. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: The dream of automatic translation, which builds a bridge of communication between people from different civilizations, dates back thousands of years. Over the past decades, researchers have devoted themselves to proposing practical approaches, from rule-based machine translation to statistical machine translation. In recent years, with the general success of artificial intelligence (AI) and the emergence of neural network models, a.k.a. deep learning, neural machine translation (NMT), the new generation of machine translation framework based on sequence-to-sequence learning, has achieved state-of-the-art and even human-level translation performance on a variety of languages. The impressive achievements of NMT are mainly due to its deep neural network structures with massive numbers of parameters, which can be efficiently tuned on vast volumes of parallel data, on the order of tens or hundreds of millions of sentences. Unfortunately, in spite of their success, neural systems also bring new challenges to machine translation, one of the central ones being efficiency. The efficiency issue involves two aspects: (1) NMT is data-hungry because of its vast number of parameters, which makes training a reasonable model difficult in practice for low-resource cases. For instance, most human languages do not have enough parallel data with other languages to learn an NMT model. Moreover, documents in specialized domains such as law or medicine usually contain a great many professional translations, which makes them less efficient for NMT to learn from; (2) NMT is slow in computation compared to conventional methods because of its deep structure and the limitations of its decoding algorithms. In particular, low efficiency at inference time profoundly affects real-life applications and the smoothness of communication. In some cases, such as video conferencing, we also want the neural system to translate in real time, which is difficult for existing NMT models.
This dissertation attempts to tackle these two challenges. Its contributions are twofold: (1) We address the data-efficiency challenges presented by existing NMT models and introduce insights based on the characteristics of the data, which include (a) developing a copy mechanism that targets rote memorization in translation and general sequence-to-sequence learning; (b) using a non-parametric search engine to guide the NMT system to perform well in special domains; (c) inventing a universal NMT system for extremely low-resource languages; (d) extending the universal NMT system so that it adapts efficiently to new languages by combining it with meta-learning. (2) For the decoding-efficiency challenges, we develop novel structures and learning algorithms, including (a) recasting NMT decoding in a trainable manner to achieve state-of-the-art performance in less time; (b) inventing a non-autoregressive NMT system that enables translation in parallel; (c) developing an NMT model that learns to translate in real time using reinforcement learning.
Degree: Doctor of Philosophy
Subject: Machine translating
Dept/Program: Electrical and Electronic Engineering
Persistent Identifier: http://hdl.handle.net/10722/265405

 

DC Field | Value | Language
dc.contributor.advisor | Li, VOK | -
dc.contributor.author | Gu, Jiatao | -
dc.contributor.author | 顾佳涛 | -
dc.date.accessioned | 2018-11-29T06:22:36Z | -
dc.date.available | 2018-11-29T06:22:36Z | -
dc.date.issued | 2018 | -
dc.identifier.citation | Gu, J. [顾佳涛]. (2018). Efficient neural machine translation. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/265405 | -
dc.description.abstract | The dream of automatic translation, which builds a bridge of communication between people from different civilizations, dates back thousands of years. Over the past decades, researchers have devoted themselves to proposing practical approaches, from rule-based machine translation to statistical machine translation. In recent years, with the general success of artificial intelligence (AI) and the emergence of neural network models, a.k.a. deep learning, neural machine translation (NMT), the new generation of machine translation framework based on sequence-to-sequence learning, has achieved state-of-the-art and even human-level translation performance on a variety of languages. The impressive achievements of NMT are mainly due to its deep neural network structures with massive numbers of parameters, which can be efficiently tuned on vast volumes of parallel data, on the order of tens or hundreds of millions of sentences. Unfortunately, in spite of their success, neural systems also bring new challenges to machine translation, one of the central ones being efficiency. The efficiency issue involves two aspects: (1) NMT is data-hungry because of its vast number of parameters, which makes training a reasonable model difficult in practice for low-resource cases. For instance, most human languages do not have enough parallel data with other languages to learn an NMT model. Moreover, documents in specialized domains such as law or medicine usually contain a great many professional translations, which makes them less efficient for NMT to learn from; (2) NMT is slow in computation compared to conventional methods because of its deep structure and the limitations of its decoding algorithms. In particular, low efficiency at inference time profoundly affects real-life applications and the smoothness of communication. In some cases, such as video conferencing, we also want the neural system to translate in real time, which is difficult for existing NMT models. This dissertation attempts to tackle these two challenges. Its contributions are twofold: (1) We address the data-efficiency challenges presented by existing NMT models and introduce insights based on the characteristics of the data, which include (a) developing a copy mechanism that targets rote memorization in translation and general sequence-to-sequence learning; (b) using a non-parametric search engine to guide the NMT system to perform well in special domains; (c) inventing a universal NMT system for extremely low-resource languages; (d) extending the universal NMT system so that it adapts efficiently to new languages by combining it with meta-learning. (2) For the decoding-efficiency challenges, we develop novel structures and learning algorithms, including (a) recasting NMT decoding in a trainable manner to achieve state-of-the-art performance in less time; (b) inventing a non-autoregressive NMT system that enables translation in parallel; (c) developing an NMT model that learns to translate in real time using reinforcement learning. | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Machine translating | -
dc.title | Efficient neural machine translation | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Electrical and Electronic Engineering | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2018 | -
dc.identifier.mmsid | 991044058182403414 | -
