File Download
Links for fulltext (may require subscription):
- Publisher Website: 10.18653/v1/d16-1180
- Scopus: eid_2-s2.0-85021649927
Citations:
- Scopus: 0
Conference Paper: Distilling an ensemble of greedy dependency parsers into one MST parser
Field | Value
---|---
Title | Distilling an ensemble of greedy dependency parsers into one MST parser
Authors | Kuncoro, Adhiguna; Ballesteros, Miguel; Kong, Lingpeng; Dyer, Chris; Smith, Noah A.
Issue Date | 2016
Citation | 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, 1-5 November 2016. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016, p. 1744-1753
Abstract | © 2016 Association for Computational Linguistics. We introduce two first-order graph-based dependency parsers achieving a new state of the art. The first is a consensus parser built from an ensemble of independently trained greedy LSTM transition-based parsers with different random initializations. We cast this approach as minimum Bayes risk decoding (under the Hamming cost) and argue that weaker consensus within the ensemble is a useful signal of difficulty or ambiguity. The second parser is a “distillation” of the ensemble into a single model. We train the distillation parser using a structured hinge loss objective with a novel cost that incorporates ensemble uncertainty estimates for each possible attachment, thereby avoiding the intractable cross-entropy computations required by applying standard distillation objectives to problems with structured outputs. The first-order distillation parser matches or surpasses the state of the art on English, Chinese, and German.
Persistent Identifier | http://hdl.handle.net/10722/296149
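The consensus parser described in the abstract lends itself to a compact illustration: each ensemble member votes for a head per word, the vote shares serve as arc scores, and minimum Bayes risk decoding under the Hamming cost reduces to finding the maximum spanning arborescence over those scores. The sketch below is not the authors' code; the `consensus_parse` helper is hypothetical, and it leans on networkx's Chu-Liu/Edmonds implementation.

```python
# Minimal sketch (not the authors' code) of the ensemble-consensus idea:
# each greedy parser votes for a head per word, the vote shares act as
# arc scores, and the MBR-under-Hamming-cost tree is the maximum
# spanning arborescence over those scores.
from collections import Counter
import networkx as nx

def consensus_parse(head_predictions):
    """head_predictions: K head lists, one per ensemble member.
    Each list gives, for words 1..n, the predicted head index (0 = ROOT).
    Returns the consensus head list."""
    K = len(head_predictions)
    n = len(head_predictions[0])           # number of words, excluding ROOT
    votes = [Counter() for _ in range(n + 1)]
    for heads in head_predictions:
        for m, h in enumerate(heads, start=1):
            votes[m][h] += 1
    G = nx.DiGraph()
    G.add_nodes_from(range(n + 1))         # node 0 is ROOT
    for m in range(1, n + 1):
        for h, c in votes[m].items():
            G.add_edge(h, m, weight=c / K)  # q(h, m): ensemble vote share
    # ROOT has no incoming edges, so the arborescence is rooted at node 0.
    tree = nx.maximum_spanning_arborescence(G)
    head_of = {m: h for h, m in tree.edges()}
    return [head_of[m] for m in range(1, n + 1)]

# Toy 3-member ensemble voting on heads for a 3-word sentence:
print(consensus_parse([[0, 1, 2], [0, 1, 1], [2, 1, 2]]))  # -> [0, 1, 2]
```

Low vote shares q(h, m) correspond to the "weaker consensus" that the abstract flags as a signal of difficulty or ambiguity.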
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Kuncoro, Adhiguna | - |
dc.contributor.author | Ballesteros, Miguel | - |
dc.contributor.author | Kong, Lingpeng | - |
dc.contributor.author | Dyer, Chris | - |
dc.contributor.author | Smith, Noah A. | - |
dc.date.accessioned | 2021-02-11T04:52:56Z | - |
dc.date.available | 2021-02-11T04:52:56Z | - |
dc.date.issued | 2016 | - |
dc.identifier.citation | 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, 1-5 November 2016. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016, p. 1744-1753 | - |
dc.identifier.uri | http://hdl.handle.net/10722/296149 | - |
dc.description.abstract | © 2016 Association for Computational Linguistics. We introduce two first-order graph-based dependency parsers achieving a new state of the art. The first is a consensus parser built from an ensemble of independently trained greedy LSTM transition-based parsers with different random initializations. We cast this approach as minimum Bayes risk decoding (under the Hamming cost) and argue that weaker consensus within the ensemble is a useful signal of difficulty or ambiguity. The second parser is a “distillation” of the ensemble into a single model. We train the distillation parser using a structured hinge loss objective with a novel cost that incorporates ensemble uncertainty estimates for each possible attachment, thereby avoiding the intractable cross-entropy computations required by applying standard distillation objectives to problems with structured outputs. The first-order distillation parser matches or surpasses the state of the art on English, Chinese, and German. | - |
dc.language | eng | - |
dc.relation.ispartof | Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.title | Distilling an ensemble of greedy dependency parsers into one MST parser | - |
dc.type | Conference_Paper | - |
dc.description.nature | published_or_final_version | - |
dc.identifier.doi | 10.18653/v1/d16-1180 | - |
dc.identifier.scopus | eid_2-s2.0-85021649927 | - |
dc.identifier.spage | 1744 | - |
dc.identifier.epage | 1753 | - |
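For readers of the abstract, the distillation objective it mentions is a cost-augmented structured hinge loss. A generic form is sketched below, with an illustrative per-arc cost built from the ensemble vote shares q(h, m); this particular cost expression is an assumption for illustration, and the paper defines the exact cost:

$$
\mathcal{L}(x, y^{*}) = \max_{y \in \mathcal{Y}(x)} \Big[ S(x, y) + \mathrm{cost}(y) \Big] - S(x, y^{*}),
\qquad
\mathrm{cost}(y) = \sum_{m=1}^{n} \big( 1 - q(h_{y}(m), m) \big),
$$

where \(S\) is the parser's arc-factored score, \(y^{*}\) is the reference tree, and \(h_{y}(m)\) is the head of word \(m\) under tree \(y\). Because a cost of this shape decomposes over attachments, cost-augmented inference remains a first-order MST problem, which is how the approach avoids the intractable cross-entropy computation over all trees that a standard distillation objective would require.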