PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer

Yu, Zitong; Shen, Yuming; Shi, Jingang; Zhao, Hengshuang; Torr, Philip; Zhao, Guoying

Links for fulltext

(May Require Subscription)

Publisher Website: 10.1109/CVPR52688.2022.00415
Scopus: eid_2-s2.0-85136131218
WOS: WOS:000867754204043

Supplementary

Citations:
- Scopus: 0
- Web of Science: 0
Appears in Collections:
- Computer Science: Conference papers

Conference Paper: PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer

Title	PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer
Authors	Yu, Zitong; Shen, Yuming; Shi, Jingang; Zhao, Hengshuang; Torr, Philip; Zhao, Guoying
Keywords	Biometrics; Face and gestures; Video analysis and understanding
Issue Date	2022
Citation	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2022, v. 2022-June, p. 4176-4186. DOI: http://dx.doi.org/10.1109/CVPR52688.2022.00415
Abstract	Remote photoplethysmography (rPPG), which aims at measuring heart activity and physiological signals from facial video without any contact, has great potential in many applications. Recent deep learning approaches focus on mining subtle rPPG clues using convolutional neural networks with limited spatio-temporal receptive fields, which neglect the long-range spatio-temporal perception and interaction needed for rPPG modeling. In this paper, we propose PhysFormer, an end-to-end video-transformer-based architecture, to adaptively aggregate both local and global spatio-temporal features for rPPG representation enhancement. As key modules in PhysFormer, the temporal difference transformers first enhance the quasi-periodic rPPG features with temporal-difference-guided global attention, and then refine the local spatio-temporal representation against interference. Furthermore, we also propose label distribution learning and a curriculum-learning-inspired dynamic constraint in the frequency domain, which provide elaborate supervision for PhysFormer and alleviate overfitting. Comprehensive experiments are performed on four benchmark datasets to show our superior performance in both intra- and cross-dataset testing. One highlight is that, unlike most transformer networks, which need pretraining on large-scale datasets, the proposed PhysFormer can be easily trained from scratch on rPPG datasets, which makes it promising as a novel transformer baseline for the rPPG community. The code is available at https://github.com/ZitongYu/PhysFormer.
Persistent Identifier	http://hdl.handle.net/10722/333549
ISSN	1063-6919 (2023 SCImago Journal Rankings: 10.331)
ISI Accession Number ID	WOS:000867754204043

DC Field	Value	Language
dc.contributor.author	Yu, Zitong	-
dc.contributor.author	Shen, Yuming	-
dc.contributor.author	Shi, Jingang	-
dc.contributor.author	Zhao, Hengshuang	-
dc.contributor.author	Torr, Philip	-
dc.contributor.author	Zhao, Guoying	-
dc.date.accessioned	2023-10-06T05:20:24Z	-
dc.date.available	2023-10-06T05:20:24Z	-
dc.date.issued	2022	-
dc.identifier.citation	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2022, v. 2022-June, p. 4176-4186	-
dc.identifier.issn	1063-6919	-
dc.identifier.uri	http://hdl.handle.net/10722/333549	-
dc.description.abstract	Remote photoplethysmography (rPPG), which aims at measuring heart activity and physiological signals from facial video without any contact, has great potential in many applications. Recent deep learning approaches focus on mining subtle rPPG clues using convolutional neural networks with limited spatio-temporal receptive fields, which neglect the long-range spatio-temporal perception and interaction needed for rPPG modeling. In this paper, we propose PhysFormer, an end-to-end video-transformer-based architecture, to adaptively aggregate both local and global spatio-temporal features for rPPG representation enhancement. As key modules in PhysFormer, the temporal difference transformers first enhance the quasi-periodic rPPG features with temporal-difference-guided global attention, and then refine the local spatio-temporal representation against interference. Furthermore, we also propose label distribution learning and a curriculum-learning-inspired dynamic constraint in the frequency domain, which provide elaborate supervision for PhysFormer and alleviate overfitting. Comprehensive experiments are performed on four benchmark datasets to show our superior performance in both intra- and cross-dataset testing. One highlight is that, unlike most transformer networks, which need pretraining on large-scale datasets, the proposed PhysFormer can be easily trained from scratch on rPPG datasets, which makes it promising as a novel transformer baseline for the rPPG community. The code is available at https://github.com/ZitongYu/PhysFormer.	-
dc.language	eng	-
dc.relation.ispartof	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition	-
dc.subject	Biometrics	-
dc.subject	Face and gestures	-
dc.subject	Video analysis and understanding	-
dc.title	PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer	-
dc.type	Conference_Paper	-
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.doi	10.1109/CVPR52688.2022.00415	-
dc.identifier.scopus	eid_2-s2.0-85136131218	-
dc.identifier.volume	2022-June	-
dc.identifier.spage	4176	-
dc.identifier.epage	4186	-
dc.identifier.isi	WOS:000867754204043	-
