Conference Paper: Universal Neural Machine Translation for Extremely Low Resource Languages
Title | Universal Neural Machine Translation for Extremely Low Resource Languages |
---|---|
Authors | Gu, J; Hassan, H; Devlin, J; Li, VOK |
Issue Date | 2018 |
Publisher | Association for Computational Linguistics |
Citation | The 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2018), New Orleans, Louisiana, 1-6 June 2018. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), p. 344-354 |
Abstract | In this paper, we propose a new universal machine translation approach focusing on languages with a limited amount of parallel data. Our proposed approach uses transfer learning to share lexical and sentence-level representations across multiple source languages into one target language. The lexical part is shared through a Universal Lexical Representation to support multi-lingual word-level sharing. The sentence-level sharing is represented by a model of experts from all source languages that share the source encoders with all other languages. This enables the low-resource language to utilize the lexical and sentence representations of the higher-resource languages. Our approach is able to achieve 23 BLEU on Romanian-English WMT2016 using a tiny parallel corpus of 6k sentences, compared to the 18 BLEU of a strong baseline system which uses multi-lingual training and back-translation. Furthermore, we show that the proposed approach can achieve almost 20 BLEU on the same dataset through fine-tuning a pre-trained multi-lingual system in a zero-shot setting. |
Description | Oral: Machine Translation 1 |
Persistent Identifier | http://hdl.handle.net/10722/261951 |
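The abstract describes word-level sharing through a Universal Lexical Representation only at a high level. As a rough illustration, the Python sketch below assumes (this record does not specify it) that each source word is mapped to a query vector, that the query is scored against a shared inventory of universal token embeddings, and that the word's representation is the softmax-weighted mixture of those embeddings. The function and parameter names (`universal_lexical_embedding`, `universal_keys`, `universal_values`, `temperature`) are hypothetical, and the sketch is not presented as the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float], temperature: float = 1.0) -> Vector:
    """Temperature-scaled softmax; subtracts the max score for numerical stability."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def universal_lexical_embedding(
    query: Sequence[float],
    universal_keys: Sequence[Sequence[float]],
    universal_values: Sequence[Sequence[float]],
    temperature: float = 1.0,
) -> Tuple[Vector, Vector]:
    """Hypothetical sketch: mix shared universal token embeddings for one word.

    Assumption: the word's query vector is scored against every universal key
    by dot product, the scores are normalised with a softmax, and the result
    is the weighted sum of the universal value embeddings.
    Returns (mixed_embedding, attention_weights).
    """
    weights = softmax([dot(query, k) for k in universal_keys], temperature)
    dim = len(universal_values[0])
    mixed = [0.0] * dim
    for w, value in zip(weights, universal_values):
        for j in range(dim):
            mixed[j] += w * value[j]
    return mixed, weights


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0]]
    values = [[1.0, 0.0], [0.0, 1.0]]
    emb, w = universal_lexical_embedding([1.0, 0.0], keys, values)
    print("weights:", w)
    print("embedding:", emb)
```

With `temperature=1.0`, the query `[1.0, 0.0]` puts about 73% of its weight on the first universal token; lowering the temperature (for example to `0.05`) makes the mixture nearly one-hot, so the word collapses onto its closest universal token. A soft mixture, rather than a hard nearest-neighbour lookup, keeps every universal token reachable by gradients, which is one plausible reason to prefer it for sharing vocabulary across languages.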
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Gu, J | - |
dc.contributor.author | Hassan, H | - |
dc.contributor.author | Devlin, J | - |
dc.contributor.author | Li, VOK | - |
dc.date.accessioned | 2018-09-28T04:50:51Z | - |
dc.date.available | 2018-09-28T04:50:51Z | - |
dc.date.issued | 2018 | - |
dc.identifier.citation | The 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2018), New Orleans, Louisiana, 1-6 June 2018. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), p. 344-354 | - |
dc.identifier.uri | http://hdl.handle.net/10722/261951 | - |
dc.description | Oral: Machine Translation 1 | - |
dc.description.abstract | In this paper, we propose a new universal machine translation approach focusing on languages with a limited amount of parallel data. Our proposed approach uses transfer learning to share lexical and sentence-level representations across multiple source languages into one target language. The lexical part is shared through a Universal Lexical Representation to support multi-lingual word-level sharing. The sentence-level sharing is represented by a model of experts from all source languages that share the source encoders with all other languages. This enables the low-resource language to utilize the lexical and sentence representations of the higher-resource languages. Our approach is able to achieve 23 BLEU on Romanian-English WMT2016 using a tiny parallel corpus of 6k sentences, compared to the 18 BLEU of a strong baseline system which uses multi-lingual training and back-translation. Furthermore, we show that the proposed approach can achieve almost 20 BLEU on the same dataset through fine-tuning a pre-trained multi-lingual system in a zero-shot setting. | - |
dc.language | eng | - |
dc.publisher | Association for Computational Linguistics | - |
dc.relation.ispartof | Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) | - |
dc.title | Universal Neural Machine Translation for Extremely Low Resource Languages | - |
dc.type | Conference_Paper | - |
dc.identifier.email | Li, VOK: vli@eee.hku.hk | - |
dc.identifier.authority | Li, VOK=rp00150 | - |
dc.description.nature | link_to_OA_fulltext | - |
dc.identifier.doi | 10.18653/v1/N18-1032 | - |
dc.identifier.hkuros | 292168 | - |
dc.identifier.hkuros | 306542 | - |
dc.identifier.volume | 1 | - |
dc.identifier.spage | 344 | - |
dc.identifier.epage | 354 | - |
dc.publisher.place | New Orleans, Louisiana | - |