Conference Paper: Visually informed binaural audio generation without binaural audios

Title: Visually informed binaural audio generation without binaural audios
Authors: Xu, Xudong; Zhou, Hang; Liu, Ziwei; Dai, Bo; Wang, Xiaogang; Lin, Dahua
Issue Date: 2021
Citation: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2021, p. 15480-15489
Abstract: Stereophonic audio, especially binaural audio, plays an essential role in immersive viewing environments. Recent research has explored generating visually guided stereophonic audios supervised by multi-channel audio collections. However, due to the requirement of professional recording devices, existing datasets are limited in scale and variety, which impedes the generalization of supervised methods in real-world scenarios. In this work, we propose PseudoBinaural, an effective pipeline that is free of binaural recordings. The key insight is to carefully build pseudo visual-stereo pairs with mono data for training. Specifically, we leverage spherical harmonic decomposition and head-related impulse response (HRIR) to identify the relationship between spatial locations and received binaural audios. Then in the visual modality, corresponding visual cues of the mono data are manually placed at sound source positions to form the pairs. Compared to fully-supervised paradigms, our binaural-recording-free pipeline shows great stability in cross-dataset evaluation and achieves comparable performance under subjective preference. Moreover, combined with binaural recordings, our method is able to further boost the performance of binaural audio generation under supervised settings.
Persistent Identifier: http://hdl.handle.net/10722/352266
ISSN: 1063-6919
2023 SCImago Journal Rankings: 10.331

 

| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Xu, Xudong | - |
| dc.contributor.author | Zhou, Hang | - |
| dc.contributor.author | Liu, Ziwei | - |
| dc.contributor.author | Dai, Bo | - |
| dc.contributor.author | Wang, Xiaogang | - |
| dc.contributor.author | Lin, Dahua | - |
| dc.date.accessioned | 2024-12-16T03:57:41Z | - |
| dc.date.available | 2024-12-16T03:57:41Z | - |
| dc.date.issued | 2021 | - |
| dc.identifier.citation | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2021, p. 15480-15489 | - |
| dc.identifier.issn | 1063-6919 | - |
| dc.identifier.uri | http://hdl.handle.net/10722/352266 | - |
| dc.description.abstract | Stereophonic audio, especially binaural audio, plays an essential role in immersive viewing environments. Recent research has explored generating visually guided stereophonic audios supervised by multi-channel audio collections. However, due to the requirement of professional recording devices, existing datasets are limited in scale and variety, which impedes the generalization of supervised methods in real-world scenarios. In this work, we propose PseudoBinaural, an effective pipeline that is free of binaural recordings. The key insight is to carefully build pseudo visual-stereo pairs with mono data for training. Specifically, we leverage spherical harmonic decomposition and head-related impulse response (HRIR) to identify the relationship between spatial locations and received binaural audios. Then in the visual modality, corresponding visual cues of the mono data are manually placed at sound source positions to form the pairs. Compared to fully-supervised paradigms, our binaural-recording-free pipeline shows great stability in cross-dataset evaluation and achieves comparable performance under subjective preference. Moreover, combined with binaural recordings, our method is able to further boost the performance of binaural audio generation under supervised settings. | - |
| dc.language | eng | - |
| dc.relation.ispartof | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition | - |
| dc.title | Visually informed binaural audio generation without binaural audios | - |
| dc.type | Conference_Paper | - |
| dc.description.nature | link_to_subscribed_fulltext | - |
| dc.identifier.doi | 10.1109/CVPR46437.2021.01523 | - |
| dc.identifier.scopus | eid_2-s2.0-85123178071 | - |
| dc.identifier.spage | 15480 | - |
| dc.identifier.epage | 15489 | - |
