Links for fulltext
(May Require Subscription)
- Publisher Website: 10.1109/ICCV.2019.00037
- Scopus: eid_2-s2.0-85081934248
- WOS: WOS:000531438100029
Conference Paper: Vision-infused deep audio inpainting
Title | Vision-infused deep audio inpainting |
---|---|
Authors | Zhou, H; Liu, Z; Xu, X; Luo, P; Wang, X |
Issue Date | 2019 |
Publisher | Institute of Electrical and Electronics Engineers. The Journal's web site is located at http://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000149 |
Citation | Proceedings of IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October - 2 November 2019, p. 283-292 |
Abstract | Multi-modality perception is essential to develop interactive intelligence. In this work, we consider a new task of visual information-infused audio inpainting, i.e. synthesizing missing audio segments that correspond to their accompanying videos. We identify two key aspects for a successful inpainter: (1) It is desirable to operate on spectrograms instead of raw audios. Recent advances in deep semantic image inpainting could be leveraged to go beyond the limitations of traditional audio inpainting. (2) To synthesize visually indicated audio, a visual-audio joint feature space needs to be learned with synchronization of audio and video. To facilitate a large-scale study, we collect a new multi-modality instrument-playing dataset called MUSIC-Extra-Solo (MUSICES) by enriching MUSIC dataset [51]. Extensive experiments demonstrate that our framework is capable of inpainting realistic and varying audio segments with or without visual contexts. More importantly, our synthesized audio segments are coherent with their video counterparts, showing the effectiveness of our proposed Vision-Infused Audio Inpainter (VIAI). Code, models, dataset and video results are available at https://github.com/Hangz-nju-cuhk/Vision-Infused-Audio-Inpainter-VIAI. |
Persistent Identifier | http://hdl.handle.net/10722/284154 |
ISSN | 1550-5499 (2020 SCImago Journal Rankings: 4.133) |
ISI Accession Number ID | WOS:000531438100029 |
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Zhou, H | - |
dc.contributor.author | Liu, Z | - |
dc.contributor.author | Xu, X | - |
dc.contributor.author | Luo, P | - |
dc.contributor.author | Wang, X | - |
dc.date.accessioned | 2020-07-20T05:56:31Z | - |
dc.date.available | 2020-07-20T05:56:31Z | - |
dc.date.issued | 2019 | - |
dc.identifier.citation | Proceedings of IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October - 2 November 2019, p. 283-292 | - |
dc.identifier.issn | 1550-5499 | - |
dc.identifier.uri | http://hdl.handle.net/10722/284154 | - |
dc.description.abstract | Multi-modality perception is essential to develop interactive intelligence. In this work, we consider a new task of visual information-infused audio inpainting, i.e. synthesizing missing audio segments that correspond to their accompanying videos. We identify two key aspects for a successful inpainter: (1) It is desirable to operate on spectrograms instead of raw audios. Recent advances in deep semantic image inpainting could be leveraged to go beyond the limitations of traditional audio inpainting. (2) To synthesize visually indicated audio, a visual-audio joint feature space needs to be learned with synchronization of audio and video. To facilitate a large-scale study, we collect a new multi-modality instrument-playing dataset called MUSIC-Extra-Solo (MUSICES) by enriching MUSIC dataset [51]. Extensive experiments demonstrate that our framework is capable of inpainting realistic and varying audio segments with or without visual contexts. More importantly, our synthesized audio segments are coherent with their video counterparts, showing the effectiveness of our proposed Vision-Infused Audio Inpainter (VIAI). Code, models, dataset and video results are available at https://github.com/Hangz-nju-cuhk/Vision-Infused-Audio-Inpainter-VIAI. | - |
dc.language | eng | - |
dc.publisher | Institute of Electrical and Electronics Engineers. The Journal's web site is located at http://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000149 | - |
dc.relation.ispartof | IEEE International Conference on Computer Vision (ICCV) Proceedings | - |
dc.rights | IEEE International Conference on Computer Vision (ICCV) Proceedings. Copyright © Institute of Electrical and Electronics Engineers. | - |
dc.rights | ©2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. | - |
dc.title | Vision-infused deep audio inpainting | - |
dc.type | Conference_Paper | - |
dc.identifier.email | Luo, P: pluo@hku.hk | - |
dc.identifier.authority | Luo, P=rp02575 | - |
dc.identifier.doi | 10.1109/ICCV.2019.00037 | - |
dc.identifier.scopus | eid_2-s2.0-85081934248 | - |
dc.identifier.hkuros | 311013 | - |
dc.identifier.spage | 283 | - |
dc.identifier.epage | 292 | - |
dc.identifier.isi | WOS:000531438100029 | - |
dc.publisher.place | United States | - |
dc.identifier.issnl | 1550-5499 | - |