Links for fulltext (may require subscription):
- Publisher Website (DOI): 10.1109/JSTSP.2017.2752462
- Scopus: eid_2-s2.0-85030313902
- Web of Science: WOS:000416226000003
Article: End-to-End Neural Segmental Models for Speech Recognition
Title | End-to-End Neural Segmental Models for Speech Recognition |
---|---|
Authors | Tang, Hao; Lu, Liang; Kong, Lingpeng; Gimpel, Kevin; Livescu, Karen; Dyer, Chris; Smith, Noah A.; Renals, Steve |
Keywords | Connectionist temporal classification; end-to-end training; segmental models; multitask training |
Issue Date | 2017 |
Citation | IEEE Journal on Selected Topics in Signal Processing, 2017, v. 11, n. 8, p. 1254-1264 |
Abstract | Segmental models are an alternative to frame-based models for sequence prediction, where hypothesized path weights are based on entire segment scores rather than a single frame at a time. Neural segmental models are segmental models that use neural network-based weight functions. Neural segmental models have achieved competitive results for speech recognition, and their end-to-end training has been explored in several studies. In this work, we review neural segmental models, which can be viewed as consisting of a neural network-based acoustic encoder and a finite-state transducer decoder. We study end-to-end segmental models with different weight functions, including ones based on frame-level neural classifiers and on segmental recurrent neural networks. We study how reducing the search space size impacts performance under different weight functions. We also compare several loss functions for end-to-end training. Finally, we explore training approaches, including multistage versus end-to-end training and multitask training that combines segmental and frame-level losses. |
Persistent Identifier | http://hdl.handle.net/10722/296158 |
ISSN | 1932-4553 (2023 Impact Factor: 8.7; 2023 SCImago Journal Rankings: 3.818) |
ISI Accession Number ID | WOS:000416226000003 |
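The abstract contrasts segmental models, which assign one weight to an entire hypothesized segment, with frame-based models that score one frame at a time. As a minimal illustrative sketch (not the authors' implementation), the decoding idea can be shown as a dynamic program over segmentations; the toy `segment_score`, the label set, and the `max_dur` cap are all assumptions for illustration, whereas the paper uses learned neural weight functions and a finite-state transducer decoder.

```python
def segment_score(frames, start, end, label):
    # Toy segment weight: sum of per-frame evidence for `label` over the
    # span [start, end). A neural segmental model would instead compute
    # a single learned score for the whole segment.
    return sum(f[label] for f in frames[start:end])

def segmental_viterbi(frames, labels, max_dur):
    # best[t] holds (score, backpointer) for the best segmentation of
    # frames[:t]; the backpointer records (segment start, segment label).
    T = len(frames)
    best = [(0.0, None)] + [(float("-inf"), None)] * T
    for t in range(1, T + 1):
        for d in range(1, min(max_dur, t) + 1):  # candidate segment durations
            for lab in labels:
                s = best[t - d][0] + segment_score(frames, t - d, t, lab)
                if s >= best[t][0]:  # ties broken in favor of longer segments
                    best[t] = (s, (t - d, lab))
    # Recover the best segmentation by following backpointers.
    segs, t = [], T
    while t > 0:
        start, lab = best[t][1]
        segs.append((start, t, lab))
        t = start
    return list(reversed(segs))

# Five toy frames: three favoring label "a", then two favoring "b".
frames = [{"a": 1.0, "b": 0.0}] * 3 + [{"a": 0.0, "b": 1.0}] * 2
print(segmental_viterbi(frames, ["a", "b"], max_dur=5))
# -> [(0, 3, 'a'), (3, 5, 'b')]
```

The key contrast with a frame-based model is that the inner loop enumerates whole segments of duration `d` and scores each span once, which is also why the abstract discusses shrinking the search space: the duration loop multiplies the cost of decoding.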
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Tang, Hao | - |
dc.contributor.author | Lu, Liang | - |
dc.contributor.author | Kong, Lingpeng | - |
dc.contributor.author | Gimpel, Kevin | - |
dc.contributor.author | Livescu, Karen | - |
dc.contributor.author | Dyer, Chris | - |
dc.contributor.author | Smith, Noah A. | - |
dc.contributor.author | Renals, Steve | - |
dc.date.accessioned | 2021-02-11T04:52:57Z | - |
dc.date.available | 2021-02-11T04:52:57Z | - |
dc.date.issued | 2017 | - |
dc.identifier.citation | IEEE Journal on Selected Topics in Signal Processing, 2017, v. 11, n. 8, p. 1254-1264 | - |
dc.identifier.issn | 1932-4553 | - |
dc.identifier.uri | http://hdl.handle.net/10722/296158 | - |
dc.description.abstract | Segmental models are an alternative to frame-based models for sequence prediction, where hypothesized path weights are based on entire segment scores rather than a single frame at a time. Neural segmental models are segmental models that use neural network-based weight functions. Neural segmental models have achieved competitive results for speech recognition, and their end-to-end training has been explored in several studies. In this work, we review neural segmental models, which can be viewed as consisting of a neural network-based acoustic encoder and a finite-state transducer decoder. We study end-to-end segmental models with different weight functions, including ones based on frame-level neural classifiers and on segmental recurrent neural networks. We study how reducing the search space size impacts performance under different weight functions. We also compare several loss functions for end-to-end training. Finally, we explore training approaches, including multistage versus end-to-end training and multitask training that combines segmental and frame-level losses. | - |
dc.language | eng | - |
dc.relation.ispartof | IEEE Journal on Selected Topics in Signal Processing | - |
dc.subject | Connectionist temporal classification | - |
dc.subject | end-to-end training | - |
dc.subject | segmental models | - |
dc.subject | multitask training | - |
dc.title | End-to-End Neural Segmental Models for Speech Recognition | - |
dc.type | Article | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1109/JSTSP.2017.2752462 | - |
dc.identifier.scopus | eid_2-s2.0-85030313902 | - |
dc.identifier.volume | 11 | - |
dc.identifier.issue | 8 | - |
dc.identifier.spage | 1254 | - |
dc.identifier.epage | 1264 | - |
dc.identifier.isi | WOS:000416226000003 | - |
dc.identifier.issnl | 1932-4553 | - |