
Conference Paper: Multi-task self-supervised learning for disfluency detection

Title: Multi-task self-supervised learning for disfluency detection
Authors: Wang, S; Che, W; Liu, Q; Qin, P; Liu, T; Wang, WY
Issue Date: 2020
Publisher: AAAI Press
Citation: 34th AAAI Conference on Artificial Intelligence (AAAI 2020), New York, 7-12 February 2020. In AAAI 2020 - 34th AAAI Conference on Artificial Intelligence, 2020, p. 9193-9200
Abstract: Most existing approaches to disfluency detection rely heavily on human-annotated data, which is expensive to obtain in practice. To tackle the training-data bottleneck, we investigate methods for combining multiple self-supervised tasks, i.e., supervised tasks where data can be collected without manual labeling. First, we construct large-scale pseudo training data by randomly adding or deleting words from unlabeled news data, and propose two self-supervised pre-training tasks: (i) a tagging task to detect the added noisy words, and (ii) a sentence-classification task to distinguish original sentences from grammatically incorrect sentences. We then combine these two tasks to jointly train a network. The pre-trained network is then fine-tuned using human-annotated disfluency detection training data. Experimental results on the commonly used English Switchboard test set show that our approach achieves performance competitive with previous systems (trained on the full dataset) while using less than 1% (1,000 sentences) of the training data. Trained on the full dataset, our method significantly outperforms previous methods, reducing the error by 21% on English Switchboard.
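
The pseudo-data construction described in the abstract can be sketched in a few lines. The sketch below is a minimal illustration under assumptions: the function name, corruption probabilities, and exact labeling scheme are illustrative, not the authors' released implementation. From one fluent sentence it produces the token-level tags for the tagging task and the sentence-level label for the classification task.

    import random

    # Minimal sketch (assumed names and probabilities) of the paper's
    # pseudo-data construction: corrupt fluent text by randomly adding
    # or deleting words.
    def make_pseudo_example(sentence, vocab, p_add=0.15, p_del=0.15):
        """Return (tokens, tags, is_corrupted) for one sentence.

        tokens       -- possibly corrupted word sequence
        tags         -- 1 for an inserted noise word, 0 for an original word
                        (labels for the tagging pre-training task)
        is_corrupted -- sentence-level label for the classification task
                        (0 = original, 1 = grammatically incorrect)
        """
        tokens, tags = [], []
        corrupted = False
        for word in sentence.split():
            if random.random() < p_add:        # randomly insert a noise word
                tokens.append(random.choice(vocab))
                tags.append(1)
                corrupted = True
            if random.random() < p_del:        # randomly delete this word
                corrupted = True
                continue
            tokens.append(word)
            tags.append(0)
        return tokens, tags, int(corrupted)

    # Usage: turn one unlabeled news sentence into a pseudo-labeled example.
    tokens, tags, label = make_pseudo_example(
        "stocks rose sharply in early trading", ["uh", "well", "market"])
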
Persistent Identifier: http://hdl.handle.net/10722/322791
ISBN: 9781577358350
ISI Accession Number ID: WOS:000668126801078

 

DC Field                    Value
dc.contributor.author       Wang, S
dc.contributor.author       Che, W
dc.contributor.author       Liu, Q
dc.contributor.author       Qin, P
dc.contributor.author       Liu, T
dc.contributor.author       Wang, WY
dc.date.accessioned         2022-11-16T06:31:36Z
dc.date.available           2022-11-16T06:31:36Z
dc.date.issued              2020
dc.identifier.citation      34th AAAI Conference on Artificial Intelligence (AAAI 2020), New York, 7-12 February 2020. In AAAI 2020 - 34th AAAI Conference on Artificial Intelligence, 2020, p. 9193-9200
dc.identifier.isbn          9781577358350
dc.identifier.uri           http://hdl.handle.net/10722/322791
dc.description.abstract     Most existing approaches to disfluency detection rely heavily on human-annotated data, which is expensive to obtain in practice. To tackle the training-data bottleneck, we investigate methods for combining multiple self-supervised tasks, i.e., supervised tasks where data can be collected without manual labeling. First, we construct large-scale pseudo training data by randomly adding or deleting words from unlabeled news data, and propose two self-supervised pre-training tasks: (i) a tagging task to detect the added noisy words, and (ii) a sentence-classification task to distinguish original sentences from grammatically incorrect sentences. We then combine these two tasks to jointly train a network. The pre-trained network is then fine-tuned using human-annotated disfluency detection training data. Experimental results on the commonly used English Switchboard test set show that our approach achieves performance competitive with previous systems (trained on the full dataset) while using less than 1% (1,000 sentences) of the training data. Trained on the full dataset, our method significantly outperforms previous methods, reducing the error by 21% on English Switchboard.
dc.language                 eng
dc.publisher                AAAI Press
dc.relation.ispartof        AAAI 2020 - 34th AAAI Conference on Artificial Intelligence
dc.title                    Multi-task self-supervised learning for disfluency detection
dc.type                     Conference_Paper
dc.description.nature       link_to_OA_fulltext
dc.identifier.doi           10.1609/aaai.v34i05.6456
dc.identifier.scopus        eid_2-s2.0-85106599788
dc.identifier.hkuros        700004137
dc.identifier.spage         9193
dc.identifier.epage         9200
dc.identifier.isi           WOS:000668126801078
dc.publisher.place          Washington, D.C.
