Vision-infused deep audio inpainting

Zhou, H; Liu, Z; Xu, X; Luo, P; Wang, X

Links for fulltext

(May Require Subscription)

Publisher Website: 10.1109/ICCV.2019.00037
Scopus: eid_2-s2.0-85081934248
WOS: WOS:000531438100029

Supplementary

Citations:
- Scopus: 0
- Web of Science: 0
Appears in Collections:
- Computer Science: Conference papers

Conference Paper: Vision-infused deep audio inpainting

Title	Vision-infused deep audio inpainting
Authors	Zhou, H; Liu, Z; Xu, X; Luo, P; Wang, X
Issue Date	2019
Publisher	Institute of Electrical and Electronics Engineers. The Journal's web site is located at http://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000149
Citation	Proceedings of IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October - 2 November 2019, p. 283-292. DOI: http://dx.doi.org/10.1109/ICCV.2019.00037
Abstract	Multi-modality perception is essential to develop interactive intelligence. In this work, we consider a new task of visual information-infused audio inpainting, i.e. synthesizing missing audio segments that correspond to their accompanying videos. We identify two key aspects for a successful inpainter: (1) It is desirable to operate on spectrograms instead of raw audios. Recent advances in deep semantic image inpainting could be leveraged to go beyond the limitations of traditional audio inpainting. (2) To synthesize visually indicated audio, a visual-audio joint feature space needs to be learned with synchronization of audio and video. To facilitate a large-scale study, we collect a new multi-modality instrument-playing dataset called MUSIC-Extra-Solo (MUSICES) by enriching MUSIC dataset [51]. Extensive experiments demonstrate that our framework is capable of inpainting realistic and varying audio segments with or without visual contexts. More importantly, our synthesized audio segments are coherent with their video counterparts, showing the effectiveness of our proposed Vision-Infused Audio Inpainter (VIAI). Code, models, dataset and video results are available at https://github.com/Hangz-nju-cuhk/Vision-Infused-Audio-Inpainter-VIAI.
Persistent Identifier	http://hdl.handle.net/10722/284154
ISSN	1550-5499 (2020 SCImago Journal Rankings: 4.133)
ISI Accession Number ID	WOS:000531438100029

DC Field	Value	Language
dc.contributor.author	Zhou, H	-
dc.contributor.author	Liu, Z	-
dc.contributor.author	Xu, X	-
dc.contributor.author	Luo, P	-
dc.contributor.author	Wang, X	-
dc.date.accessioned	2020-07-20T05:56:31Z	-
dc.date.available	2020-07-20T05:56:31Z	-
dc.date.issued	2019	-
dc.identifier.citation	Proceedings of IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October - 2 November 2019, p. 283-292	-
dc.identifier.issn	1550-5499	-
dc.identifier.uri	http://hdl.handle.net/10722/284154	-
dc.description.abstract	Multi-modality perception is essential to develop interactive intelligence. In this work, we consider a new task of visual information-infused audio inpainting, i.e. synthesizing missing audio segments that correspond to their accompanying videos. We identify two key aspects for a successful inpainter: (1) It is desirable to operate on spectrograms instead of raw audios. Recent advances in deep semantic image inpainting could be leveraged to go beyond the limitations of traditional audio inpainting. (2) To synthesize visually indicated audio, a visual-audio joint feature space needs to be learned with synchronization of audio and video. To facilitate a large-scale study, we collect a new multi-modality instrument-playing dataset called MUSIC-Extra-Solo (MUSICES) by enriching MUSIC dataset [51]. Extensive experiments demonstrate that our framework is capable of inpainting realistic and varying audio segments with or without visual contexts. More importantly, our synthesized audio segments are coherent with their video counterparts, showing the effectiveness of our proposed Vision-Infused Audio Inpainter (VIAI). Code, models, dataset and video results are available at https://github.com/Hangz-nju-cuhk/Vision-Infused-Audio-Inpainter-VIAI.	-
dc.language	eng	-
dc.publisher	Institute of Electrical and Electronics Engineers. The Journal's web site is located at http://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000149	-
dc.relation.ispartof	IEEE International Conference on Computer Vision (ICCV) Proceedings	-
dc.rights	IEEE International Conference on Computer Vision (ICCV) Proceedings. Copyright © Institute of Electrical and Electronics Engineers.	-
dc.rights	©2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.	-
dc.title	Vision-infused deep audio inpainting	-
dc.type	Conference_Paper	-
dc.identifier.email	Luo, P: pluo@hku.hk	-
dc.identifier.authority	Luo, P=rp02575	-
dc.identifier.doi	10.1109/ICCV.2019.00037	-
dc.identifier.scopus	eid_2-s2.0-85081934248	-
dc.identifier.hkuros	311013	-
dc.identifier.spage	283	-
dc.identifier.epage	292	-
dc.identifier.isi	WOS:000531438100029	-
dc.publisher.place	United States	-
dc.identifier.issnl	1550-5499	-
