Spherical DNNs and Their Applications in 360° Images and Videos

Xu, Yanyu; Zhang, Ziheng; Gao, Shenghua

Links for fulltext

(May Require Subscription)

Publisher Website: 10.1109/TPAMI.2021.3100259
Scopus: eid_2-s2.0-85138447189
PMID: 34314354

Citations:
- Scopus: 0
- PubMed Central: 0
Appears in Collections:
- Computer Science: Journal/Magazine Articles

Article: Spherical DNNs and Their Applications in 360° Images and Videos

Title	Spherical DNNs and Their Applications in 360° Images and Videos
Authors	Xu, Yanyu; Zhang, Ziheng; Gao, Shenghua
Keywords	360° videos; gaze prediction; saliency detection; Spherical deep neural networks
Issue Date	2022
Citation	IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, v. 44, n. 10, p. 7235-7252. DOI: http://dx.doi.org/10.1109/TPAMI.2021.3100259
Abstract	Spherical images or videos, as typical non-Euclidean data, are usually stored in the form of 2D panoramas obtained through an equirectangular projection, which is neither equal-area nor conformal. The distortion caused by the projection limits the performance of vanilla Deep Neural Networks (DNNs) designed for traditional Euclidean data. In this paper, we design a novel Spherical Deep Neural Network (DNN) to deal with the distortion caused by the equirectangular projection. Specifically, we customize a set of components, including a spherical convolution, a spherical pooling, a spherical ConvLSTM cell and a spherical MSE loss, as replacements for their counterparts in vanilla DNNs for spherical data. The core idea is to change the identical behavior of the conventional operations in vanilla DNNs across different feature patches so that they are adjusted to the distortion caused by the variance of the sampling rate among different feature patches. We demonstrate the effectiveness of our Spherical DNNs for saliency detection and gaze estimation in 360° videos. For saliency detection, we take the temporal coherence of an observer's viewing process into consideration and propose to use a Spherical U-Net and a Spherical ConvLSTM to predict the saliency maps for each frame sequentially. For gaze prediction, we propose to leverage a Spherical Encoder Module to extract spatial panoramic features, and then combine them with the gaze trajectory feature extracted by an LSTM for future gaze prediction. To facilitate the study of 360° video saliency detection, we further construct a large-scale 360° video saliency detection dataset that consists of 104 360° videos viewed by 20+ human subjects. Comprehensive experiments validate the effectiveness of our proposed Spherical DNNs for 360° handwritten digit classification and sport classification, and for saliency detection and gaze tracking in 360° videos. We also visualize the regions contributing to the classification decisions in our proposed Spherical DNNs via the Grad-CAM technique in the classification task, and the results show that our Spherical DNNs consistently leverage reasonable and important regions for decision making, regardless of the large distortions. All code and the dataset are available at https://github.com/svip-lab/SphericalDNNs.
Persistent Identifier	http://hdl.handle.net/10722/345276
ISSN	0162-8828 (2023 Impact Factor: 20.8; 2023 SCImago Journal Rankings: 6.158)
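
Illustrative note: the abstract attributes the distortion to the varying sampling rate of the equirectangular projection and lists a spherical MSE loss among the customized components. This record does not give the paper's formulation, so the following is only a minimal sketch of one common way to account for that sampling variance. It assumes each pixel row is weighted by the cosine of its latitude, which is proportional to the solid angle an equirectangular pixel covers. The function names `latitude_weights` and `area_weighted_mse` are hypothetical and do not come from the authors' repository.

```python
import math


def latitude_weights(height):
    """Per-row weights proportional to the solid angle of an equirectangular pixel.

    Assumption for illustration: row i is centred at latitude
    phi_i = pi * ((i + 0.5) / height - 0.5), and its weight is cos(phi_i).
    """
    raw = [math.cos(math.pi * ((i + 0.5) / height - 0.5)) for i in range(height)]
    mean = sum(raw) / height
    # Normalise so that the weights average to 1 over the rows.
    return [w / mean for w in raw]


def area_weighted_mse(pred, target):
    """Mean squared error with each row weighted by its latitude weight.

    pred and target are equally sized lists of rows (height x width).
    This is a generic area-weighted loss, not necessarily the paper's
    spherical MSE loss.
    """
    height = len(pred)
    width = len(pred[0])
    weights = latitude_weights(height)
    total = 0.0
    for w, p_row, t_row in zip(weights, pred, target):
        total += w * sum((p - t) ** 2 for p, t in zip(p_row, t_row))
    return total / (height * width)
```

Under this weighting, an error near the poles, where the panorama oversamples the sphere, contributes less than the same error at the equator, while a uniform error yields the same value as an unweighted MSE.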

DC Field	Value	Language
dc.contributor.author	Xu, Yanyu	-
dc.contributor.author	Zhang, Ziheng	-
dc.contributor.author	Gao, Shenghua	-
dc.date.accessioned	2024-08-15T09:26:20Z	-
dc.date.available	2024-08-15T09:26:20Z	-
dc.date.issued	2022	-
dc.identifier.citation	IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, v. 44, n. 10, p. 7235-7252	-
dc.identifier.issn	0162-8828	-
dc.identifier.uri	http://hdl.handle.net/10722/345276	-
dc.description.abstract	Spherical images or videos, as typical non-Euclidean data, are usually stored in the form of 2D panoramas obtained through an equirectangular projection, which is neither equal-area nor conformal. The distortion caused by the projection limits the performance of vanilla Deep Neural Networks (DNNs) designed for traditional Euclidean data. In this paper, we design a novel Spherical Deep Neural Network (DNN) to deal with the distortion caused by the equirectangular projection. Specifically, we customize a set of components, including a spherical convolution, a spherical pooling, a spherical ConvLSTM cell and a spherical MSE loss, as replacements for their counterparts in vanilla DNNs for spherical data. The core idea is to change the identical behavior of the conventional operations in vanilla DNNs across different feature patches so that they are adjusted to the distortion caused by the variance of the sampling rate among different feature patches. We demonstrate the effectiveness of our Spherical DNNs for saliency detection and gaze estimation in 360° videos. For saliency detection, we take the temporal coherence of an observer's viewing process into consideration and propose to use a Spherical U-Net and a Spherical ConvLSTM to predict the saliency maps for each frame sequentially. For gaze prediction, we propose to leverage a Spherical Encoder Module to extract spatial panoramic features, and then combine them with the gaze trajectory feature extracted by an LSTM for future gaze prediction. To facilitate the study of 360° video saliency detection, we further construct a large-scale 360° video saliency detection dataset that consists of 104 360° videos viewed by 20+ human subjects. Comprehensive experiments validate the effectiveness of our proposed Spherical DNNs for 360° handwritten digit classification and sport classification, and for saliency detection and gaze tracking in 360° videos. We also visualize the regions contributing to the classification decisions in our proposed Spherical DNNs via the Grad-CAM technique in the classification task, and the results show that our Spherical DNNs consistently leverage reasonable and important regions for decision making, regardless of the large distortions. All code and the dataset are available at https://github.com/svip-lab/SphericalDNNs.	-
dc.language	eng	-
dc.relation.ispartof	IEEE Transactions on Pattern Analysis and Machine Intelligence	-
dc.subject	360° videos	-
dc.subject	gaze prediction	-
dc.subject	saliency detection	-
dc.subject	Spherical deep neural networks	-
dc.title	Spherical DNNs and Their Applications in 360° Images and Videos	-
dc.type	Article	-
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.doi	10.1109/TPAMI.2021.3100259	-
dc.identifier.pmid	34314354	-
dc.identifier.scopus	eid_2-s2.0-85138447189	-
dc.identifier.volume	44	-
dc.identifier.issue	10	-
dc.identifier.spage	7235	-
dc.identifier.epage	7252	-
dc.identifier.eissn	1939-3539	-
