Multitask learning with CTC and segmental CRF for speech recognition

Lu, Liang; Kong, Lingpeng; Dyer, Chris; Smith, Noah A.

Publisher Website: 10.21437/Interspeech.2017-71
Scopus: eid_2-s2.0-85039158481
WOS: WOS:000457505000201

Citations:
- Scopus: 0
- Web of Science: 0
Appears in Collections:
- Computer Science: Conference papers

Conference Paper: Multitask learning with CTC and segmental CRF for speech recognition

Title	Multitask learning with CTC and segmental CRF for speech recognition
Authors	Lu, Liang; Kong, Lingpeng; Dyer, Chris; Smith, Noah A.
Keywords	Speech recognition; End-to-end training; CTC; Segmental RNN
Issue Date	2017
Citation	INTERSPEECH 2017, Stockholm, Sweden, 20-24 August 2017. In Proceedings of the 18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017), 2017, p. 954-958. DOI: http://dx.doi.org/10.21437/Interspeech.2017-71
Abstract	Copyright © 2017 ISCA. Segmental conditional random fields (SCRFs) and connectionist temporal classification (CTC) are two sequence labeling methods used for end-to-end training of speech recognition models. Both models define a transcription probability by marginalizing decisions about latent segmentation alternatives to derive a sequence probability: the former uses a globally normalized joint model of segment labels and durations, and the latter classifies each frame as either an output symbol or a "continuation" of the previous label. In this paper, we train a recognition model by optimizing an interpolation between the SCRF and CTC losses, where the same recurrent neural network (RNN) encoder is used for feature extraction for both outputs. We find that this multitask objective improves recognition accuracy when decoding with either the SCRF or CTC models. Additionally, we show that CTC can also be used to pretrain the RNN encoder, which improves the convergence rate when learning the joint model.
Persistent Identifier	http://hdl.handle.net/10722/296159
ISSN	2308-457X
SCImago Journal Rankings (2020)	0.689
ISI Accession Number ID	WOS:000457505000201
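
The abstract describes the training objective as an interpolation between the SCRF and CTC losses. A minimal sketch of such an interpolation is given below. The function name `interpolated_loss`, the weight `lam`, its default of 0.5, and the choice of which loss receives `lam` are all illustrative assumptions; this record does not state the paper's notation or the value it used.

```python
def interpolated_loss(ctc_loss: float, scrf_loss: float, lam: float = 0.5) -> float:
    """Convex combination of two scalar loss values.

    Hypothetical helper: `lam` weights the CTC loss and (1 - lam)
    weights the SCRF loss. Both the name and the convention are
    assumptions made for illustration only.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * ctc_loss + (1.0 - lam) * scrf_loss


# Toy scalar values standing in for per-utterance losses that a shared
# RNN encoder with two output heads would produce.
combined = interpolated_loss(ctc_loss=2.0, scrf_loss=4.0, lam=0.5)
print(combined)
```

Under this convention, `lam = 1.0` reduces to training on the CTC loss alone and `lam = 0.0` to the SCRF loss alone; intermediate values train both objectives on the shared encoder.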

DC Field	Value	Language
dc.contributor.author	Lu, Liang	-
dc.contributor.author	Kong, Lingpeng	-
dc.contributor.author	Dyer, Chris	-
dc.contributor.author	Smith, Noah A.	-
dc.date.accessioned	2021-02-11T04:52:58Z	-
dc.date.available	2021-02-11T04:52:58Z	-
dc.date.issued	2017	-
dc.identifier.citation	INTERSPEECH 2017, Stockholm, Sweden, 20-24 August 2017. In Proceedings of the 18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017), 2017, p. 954-958	-
dc.identifier.issn	2308-457X	-
dc.identifier.uri	http://hdl.handle.net/10722/296159	-
dc.description.abstract	Copyright © 2017 ISCA. Segmental conditional random fields (SCRFs) and connectionist temporal classification (CTC) are two sequence labeling methods used for end-to-end training of speech recognition models. Both models define a transcription probability by marginalizing decisions about latent segmentation alternatives to derive a sequence probability: the former uses a globally normalized joint model of segment labels and durations, and the latter classifies each frame as either an output symbol or a "continuation" of the previous label. In this paper, we train a recognition model by optimizing an interpolation between the SCRF and CTC losses, where the same recurrent neural network (RNN) encoder is used for feature extraction for both outputs. We find that this multitask objective improves recognition accuracy when decoding with either the SCRF or CTC models. Additionally, we show that CTC can also be used to pretrain the RNN encoder, which improves the convergence rate when learning the joint model.	-
dc.language	eng	-
dc.relation.ispartof	Proceedings of the 18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017)	-
dc.subject	Speech recognition	-
dc.subject	End-to-end training	-
dc.subject	CTC	-
dc.subject	Segmental RNN	-
dc.title	Multitask learning with CTC and segmental CRF for speech recognition	-
dc.type	Conference_Paper	-
dc.description.nature	link_to_OA_fulltext	-
dc.identifier.doi	10.21437/Interspeech.2017-71	-
dc.identifier.scopus	eid_2-s2.0-85039158481	-
dc.identifier.spage	954	-
dc.identifier.epage	958	-
dc.identifier.eissn	1990-9772	-
dc.identifier.isi	WOS:000457505000201	-
