Conference Paper: Multitask learning with CTC and segmental CRF for speech recognition

Title: Multitask learning with CTC and segmental CRF for speech recognition
Authors: Lu, Liang; Kong, Lingpeng; Dyer, Chris; Smith, Noah A.
Keywords: Speech recognition; End-to-end training; CTC; Segmental RNN
Issue Date: 2017
Citation: INTERSPEECH 2017, Stockholm, Sweden, 20-24 August 2017. In Proceedings of the 18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017), 2017, p. 954-958
Abstract: Copyright © 2017 ISCA. Segmental conditional random fields (SCRFs) and connectionist temporal classification (CTC) are two sequence labeling methods used for end-to-end training of speech recognition models. Both models define a transcription probability by marginalizing decisions about latent segmentation alternatives to derive a sequence probability: the former uses a globally normalized joint model of segment labels and durations, and the latter classifies each frame as either an output symbol or a "continuation" of the previous label. In this paper, we train a recognition model by optimizing an interpolation between the SCRF and CTC losses, where the same recurrent neural network (RNN) encoder is used for feature extraction for both outputs. We find that this multitask objective improves recognition accuracy when decoding with either the SCRF or CTC models. Additionally, we show that CTC can also be used to pretrain the RNN encoder, which improves the convergence rate when learning the joint model.
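The multitask objective described in the abstract can be sketched as a convex interpolation of the two losses computed over features from a shared encoder. This is a minimal illustrative sketch, not the paper's implementation: `encode`, `ctc_loss`, and `scrf_loss` are hypothetical placeholders standing in for the real RNN encoder and the two loss computations.

```python
def multitask_loss(ctc_loss_value, scrf_loss_value, lam):
    """Interpolated multitask objective from the paper's setup:
    lam * L_CTC + (1 - lam) * L_SCRF, with both losses computed
    on the output of the same shared encoder.
    """
    assert 0.0 <= lam <= 1.0, "interpolation weight must lie in [0, 1]"
    return lam * ctc_loss_value + (1.0 - lam) * scrf_loss_value


# Placeholder losses over a shared "encoding" (hypothetical values,
# standing in for CTC and SCRF losses on the same encoder output).
encoding = [0.1, 0.4, 0.5]          # stand-in for shared RNN encoder features
ctc = sum(encoding) * 2.0            # pretend CTC loss  -> 2.0
scrf = sum(encoding) * 4.0           # pretend SCRF loss -> 4.0

print(multitask_loss(ctc, scrf, 0.5))  # equal weighting -> 3.0
```

Setting `lam = 1.0` recovers pure CTC training (the configuration the paper also uses to pretrain the encoder), while `lam = 0.0` recovers pure SCRF training.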
Persistent Identifier: http://hdl.handle.net/10722/296159
ISSN: 2308-457X
2020 SCImago Journal Rankings: 0.689
ISI Accession Number ID: WOS:000457505000201

 

DC Field | Value | Language
dc.contributor.author | Lu, Liang | -
dc.contributor.author | Kong, Lingpeng | -
dc.contributor.author | Dyer, Chris | -
dc.contributor.author | Smith, Noah A. | -
dc.date.accessioned | 2021-02-11T04:52:58Z | -
dc.date.available | 2021-02-11T04:52:58Z | -
dc.date.issued | 2017 | -
dc.identifier.citation | INTERSPEECH 2017, Stockholm, Sweden, 20-24 August 2017. In Proceedings of the 18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017), 2017, p. 954-958 | -
dc.identifier.issn | 2308-457X | -
dc.identifier.uri | http://hdl.handle.net/10722/296159 | -
dc.description.abstract | Copyright © 2017 ISCA. Segmental conditional random fields (SCRFs) and connectionist temporal classification (CTC) are two sequence labeling methods used for end-to-end training of speech recognition models. Both models define a transcription probability by marginalizing decisions about latent segmentation alternatives to derive a sequence probability: the former uses a globally normalized joint model of segment labels and durations, and the latter classifies each frame as either an output symbol or a "continuation" of the previous label. In this paper, we train a recognition model by optimizing an interpolation between the SCRF and CTC losses, where the same recurrent neural network (RNN) encoder is used for feature extraction for both outputs. We find that this multitask objective improves recognition accuracy when decoding with either the SCRF or CTC models. Additionally, we show that CTC can also be used to pretrain the RNN encoder, which improves the convergence rate when learning the joint model. | -
dc.language | eng | -
dc.relation.ispartof | Proceedings of the 18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017) | -
dc.subject | Speech recognition | -
dc.subject | End-to-end training | -
dc.subject | CTC | -
dc.subject | Segmental RNN | -
dc.title | Multitask learning with CTC and segmental CRF for speech recognition | -
dc.type | Conference_Paper | -
dc.description.nature | link_to_OA_fulltext | -
dc.identifier.doi | 10.21437/Interspeech.2017-71 | -
dc.identifier.scopus | eid_2-s2.0-85039158481 | -
dc.identifier.spage | 954 | -
dc.identifier.epage | 958 | -
dc.identifier.eissn | 1990-9772 | -
dc.identifier.isi | WOS:000457505000201 | -
