Conference Paper: Augmenting Multi-Turn Text-to-SQL Datasets with Self-Play

Title: Augmenting Multi-Turn Text-to-SQL Datasets with Self-Play
Authors: Liu, Qi; Ye, Zihuiwen; Yu, Tao; Blunsom, Phil; Song, Linfeng
Issue Date: 1-Oct-2022
Abstract

The task of context-dependent text-to-SQL aims to convert multi-turn user utterances into formal SQL queries. The task is challenging both because training data from which to learn complex contextual dependencies is scarce and because models must generalize to unseen databases. In this paper we explore augmenting the training datasets using self-play, which leverages contextual information to synthesize new interactions and adapt the model to new databases. We first design a SQL-to-text model conditioned on a sampled goal query, which represents the user's intent; this model then converses with a text-to-SQL semantic parser to generate new interactions. We then filter the synthesized interactions and retrain the models on the augmented data. We find that self-play improves the accuracy of a strong baseline on SParC and CoSQL, two widely used cross-domain text-to-SQL datasets. Our analysis shows that self-play simulates various conversational thematic relations, enhances cross-domain generalization, and improves beam search.
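
The self-play loop described in the abstract can be illustrated roughly as follows. This is a minimal sketch under stated assumptions: the goal-query sampler, the callable interfaces for the SQL-to-text user simulator and the text-to-SQL parser, and the exact-match filter are hypothetical stand-ins for illustration, not the authors' released implementation.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Turn:
    utterance: str  # simulated user utterance for this turn
    sql: str        # SQL predicted by the parser for this turn

def self_play_augment(
    databases: List[str],
    sample_goal_query: Callable[[str], str],              # samples a goal SQL for a database
    sql_to_text: Callable[[str, List[Turn], str], str],   # (goal SQL, history, db) -> utterance
    text_to_sql: Callable[[str, List[Turn], str], str],   # (utterance, history, db) -> SQL
    max_turns: int = 3,
) -> List[List[Turn]]:
    """Synthesize multi-turn interactions via self-play and keep only those
    whose final prediction agrees with the sampled goal query."""
    synthesized: List[List[Turn]] = []
    for db in databases:
        # The sampled goal query stands in for the simulated user's intent.
        goal_sql = sample_goal_query(db)
        history: List[Turn] = []
        for _ in range(max_turns):
            # User simulator: produce the next utterance toward the goal.
            utterance = sql_to_text(goal_sql, history, db)
            # Semantic parser: predict SQL for the utterance in context.
            predicted = text_to_sql(utterance, history, db)
            history.append(Turn(utterance, predicted))
        # Agreement filter (one plausible criterion): keep the interaction
        # only if the final predicted SQL matches the goal query.
        if history and history[-1].sql.strip().lower() == goal_sql.strip().lower():
            synthesized.append(history)
    return synthesized
```

The retained interactions would then be mixed with the original SParC or CoSQL training data and both models retrained on the combined set; the simple string-match filter above is only one plausible criterion, and an execution-based check on the target database would be another option.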


Persistent Identifier: http://hdl.handle.net/10722/333714

 

DC Field: Value
dc.contributor.author: Liu, Qi
dc.contributor.author: Ye, Zihuiwen
dc.contributor.author: Yu, Tao
dc.contributor.author: Blunsom, Phil
dc.contributor.author: Song, Linfeng
dc.date.accessioned: 2023-10-06T08:38:29Z
dc.date.available: 2023-10-06T08:38:29Z
dc.date.issued: 2022-10-01
dc.identifier.uri: http://hdl.handle.net/10722/333714
dc.description.abstract: The task of context-dependent text-to-SQL aims to convert multi-turn user utterances into formal SQL queries. The task is challenging both because training data from which to learn complex contextual dependencies is scarce and because models must generalize to unseen databases. In this paper we explore augmenting the training datasets using self-play, which leverages contextual information to synthesize new interactions and adapt the model to new databases. We first design a SQL-to-text model conditioned on a sampled goal query, which represents the user's intent; this model then converses with a text-to-SQL semantic parser to generate new interactions. We then filter the synthesized interactions and retrain the models on the augmented data. We find that self-play improves the accuracy of a strong baseline on SParC and CoSQL, two widely used cross-domain text-to-SQL datasets. Our analysis shows that self-play simulates various conversational thematic relations, enhances cross-domain generalization, and improves beam search.
dc.language: eng
dc.relation.ispartof: The 2022 Conference on Empirical Methods in Natural Language Processing (07/12/2022-11/12/2022, Abu Dhabi)
dc.title: Augmenting Multi-Turn Text-to-SQL Datasets with Self-Play
dc.type: Conference_Paper
dc.identifier.doi: 10.48550/arXiv.2210.12096
dc.identifier.spage: 5608
dc.identifier.epage: 5620
