postgraduate thesis: Knowledge transfer for improved neural machine translation
Title | Knowledge transfer for improved neural machine translation |
---|---|
Authors | Chen, Yun (陈云) |
Advisors | Li, VOK |
Issue Date | 2018 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Chen, Y. [陈云]. (2018). Knowledge transfer for improved neural machine translation. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | Being able to communicate seamlessly across human languages has long been associated with the general success of artificial intelligence. Despite great success in the field of neural machine translation (NMT) since its invention in 2014 (Sutskever et al., 2014; Bahdanau et al., 2015), translation quality and speed have not yet satisfied users, especially for resource-scarce language pairs and on-device real-time applications (Koehn and Knowles, 2017).
Standard NMT builds a translation model from source to target with maximum likelihood estimation (MLE) on parallel source-target corpora and then uses approximate decoding algorithms, such as greedy decoding or beam search, to translate source sentences at inference time. Training and decoding proceed independently, without interaction with other NMT models.
This thesis proposes new training and decoding strategies that interact with other NMT models through knowledge transfer to improve neural machine translation, covering the following topics:
• Improving zero-resource neural machine translation (ZNMT): We propose two pivot-based approaches to tackle this problem: a) a teacher-student framework which transfers the knowledge from a high-resource model (teacher) to a zero-resource model (student) by training the student under the supervision of the teacher; b) a multi-agent communication game in which the zero-resource model learns by playing the game with the high-resource model.
• Improving decoding efficiency for NMT: We propose a novel strategy for training an actor-augmented decoder to optimize greedy decoding by transferring knowledge from a trained translation model.
• Improving training strategy for NMT: We propose Born Again Networks (BANs) for training NMT. In a manner reminiscent of Minsky’s Sequence of Teaching Selves (Minsky, 1991), we train a sequence of models of identical capacity, each improving in performance through knowledge transferred from its predecessor.
We compare our approaches with the state-of-the-art NMT systems on different datasets, such as IWSLT, Europarl and WMT, and language pairs, such as German-English, Finnish-English, French-English, and Spanish-English. Extensive experiments demonstrate that all of our proposed methods can achieve their individual goals by effectively transferring the knowledge from auxiliary models or tasks to the target NMT model.
(341 words) |
Degree | Doctor of Philosophy |
Subject | Machine translating |
Dept/Program | Electrical and Electronic Engineering |
Persistent Identifier | http://hdl.handle.net/10722/265389 |
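
The teacher-student idea named in the abstract (training a student under the supervision of a teacher) is commonly realized as a word-level distillation loss. The following is a minimal sketch under that assumption, not the thesis's implementation: the function names `softmax`, `distillation_loss`, and `sequence_distillation_loss` are hypothetical, and the logits are toy values standing in for the per-token outputs of real NMT models.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits):
    """Cross-entropy of the student against the teacher's soft distribution
    at a single target position (hypothetical helper, for illustration)."""
    teacher_probs = softmax(teacher_logits)
    student_probs = softmax(student_logits)
    return -sum(p * math.log(q) for p, q in zip(teacher_probs, student_probs))


def sequence_distillation_loss(teacher_steps, student_steps):
    """Sum the per-position losses over one target sentence."""
    return sum(
        distillation_loss(t, s) for t, s in zip(teacher_steps, student_steps)
    )


# Toy vocabulary of three words, two target positions.
teacher = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
agreeing_student = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
disagreeing_student = [[0.0, 2.0, 0.0], [1.0, 0.0, 0.0]]

print(sequence_distillation_loss(teacher, agreeing_student))
print(sequence_distillation_loss(teacher, disagreeing_student))
```

The loss is smallest when the student reproduces the teacher's distribution, so minimizing it pushes the student toward the teacher's predictions without requiring parallel data for the student's own language pair.
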
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Li, VOK | - |
dc.contributor.author | Chen, Yun | - |
dc.contributor.author | 陈云 | - |
dc.date.accessioned | 2018-11-29T06:22:32Z | - |
dc.date.available | 2018-11-29T06:22:32Z | - |
dc.date.issued | 2018 | - |
dc.identifier.citation | Chen, Y. [陈云]. (2018). Knowledge transfer for improved neural machine translation. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/265389 | - |
dc.description.abstract | Being able to communicate seamlessly across human languages has long been associated with the general success of artificial intelligence. Despite great success in the field of neural machine translation (NMT) since its invention in 2014 (Sutskever et al., 2014; Bahdanau et al., 2015), translation quality and speed have not yet satisfied users, especially for resource-scarce language pairs and on-device real-time applications (Koehn and Knowles, 2017). Standard NMT builds a translation model from source to target with maximum likelihood estimation (MLE) on parallel source-target corpora and then uses approximate decoding algorithms, such as greedy decoding or beam search, to translate source sentences at inference time. Training and decoding proceed independently, without interaction with other NMT models. This thesis proposes new training and decoding strategies that interact with other NMT models through knowledge transfer to improve neural machine translation, covering the following topics: • Improving zero-resource neural machine translation (ZNMT): We propose two pivot-based approaches to tackle this problem: a) a teacher-student framework which transfers the knowledge from a high-resource model (teacher) to a zero-resource model (student) by training the student under the supervision of the teacher; b) a multi-agent communication game in which the zero-resource model learns by playing the game with the high-resource model. • Improving decoding efficiency for NMT: We propose a novel strategy for training an actor-augmented decoder to optimize greedy decoding by transferring knowledge from a trained translation model. • Improving training strategy for NMT: We propose Born Again Networks (BANs) for training NMT. In a manner reminiscent of Minsky’s Sequence of Teaching Selves (Minsky, 1991), we train a sequence of models of identical capacity, each improving in performance through knowledge transferred from its predecessor. We compare our approaches with the state-of-the-art NMT systems on different datasets, such as IWSLT, Europarl and WMT, and language pairs, such as German-English, Finnish-English, French-English, and Spanish-English. Extensive experiments demonstrate that all of our proposed methods can achieve their individual goals by effectively transferring the knowledge from auxiliary models or tasks to the target NMT model. (341 words) | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Machine translating | - |
dc.title | Knowledge transfer for improved neural machine translation | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Electrical and Electronic Engineering | - |
dc.description.nature | published_or_final_version | - |
dc.identifier.doi | 10.5353/th_991044058178103414 | - |
dc.date.hkucongregation | 2018 | - |
dc.identifier.mmsid | 991044058178103414 | - |