Conference Paper: PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer

Title: PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer
Authors: Yu, Zitong; Shen, Yuming; Shi, Jingang; Zhao, Hengshuang; Torr, Philip; Zhao, Guoying
Keywords: Biometrics; Face and gestures; Video analysis and understanding
Issue Date: 2022
Citation: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2022, v. 2022-June, p. 4176-4186
Abstract: Remote photoplethysmography (rPPG), which aims at measuring heart activity and physiological signals from facial video without any contact, has great potential in many applications. Recent deep learning approaches focus on mining subtle rPPG clues using convolutional neural networks with limited spatio-temporal receptive fields, which neglect long-range spatio-temporal perception and interaction in rPPG modeling. In this paper, we propose PhysFormer, an end-to-end video-transformer-based architecture that adaptively aggregates both local and global spatio-temporal features for rPPG representation enhancement. As the key modules in PhysFormer, the temporal difference transformers first enhance the quasi-periodic rPPG features with temporal-difference-guided global attention, and then refine the local spatio-temporal representation against interference. Furthermore, we propose label distribution learning and a curriculum-learning-inspired dynamic constraint in the frequency domain, which provide elaborate supervision for PhysFormer and alleviate overfitting. Comprehensive experiments on four benchmark datasets show superior performance in both intra- and cross-dataset testing. One highlight is that, unlike most transformer networks, which need pretraining on large-scale datasets, the proposed PhysFormer can easily be trained from scratch on rPPG datasets, which makes it promising as a novel transformer baseline for the rPPG community. The code is available at https://github.com/ZitongYu/PhysFormer.
Persistent Identifier: http://hdl.handle.net/10722/333549
ISSN: 1063-6919
2023 SCImago Journal Rankings: 10.331
ISI Accession Number: WOS:000867754204043
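
The abstract highlights two technical ingredients that lend themselves to a short illustration: a temporal difference operation that biases the transformer's query/key projections toward frame-to-frame changes, and label distribution learning over discrete heart-rate bins in the frequency domain. The PyTorch sketch below is a minimal illustration under stated assumptions, not the authors' implementation (the official code lives in the repository linked above); the class and function names, the theta blending factor, and the Gaussian width sigma are all illustrative choices.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TemporalDifferenceConv(nn.Module):
        # Hypothetical sketch: a 3x1x1 temporal convolution blended with a
        # frame-difference response, in the spirit of the temporal-difference
        # guidance the abstract describes for the attention's Q/K projections.
        def __init__(self, channels, theta=0.7):
            super().__init__()
            self.theta = theta  # 0.0 = plain conv; larger = more change-sensitive
            self.conv = nn.Conv3d(channels, channels, kernel_size=(3, 1, 1),
                                  padding=(1, 0, 0), bias=False)

        def forward(self, x):
            # x: (batch, channels, time, height, width)
            out = self.conv(x)
            # Subtracting the response of the temporally summed kernel makes the
            # effective kernel's temporal taps sum to (1 - theta) * original sum,
            # so the layer reacts to inter-frame change, not static appearance.
            kernel_sum = self.conv.weight.sum(dim=2, keepdim=True)
            return out - self.theta * F.conv3d(x, kernel_sum)

    def label_distribution_loss(hr_logits, hr_bin, sigma=1.0):
        # Hypothetical sketch of label distribution learning: soften the one-hot
        # ground-truth heart-rate bin into a Gaussian over neighbouring bins and
        # match the predicted distribution with a KL divergence.
        # hr_logits: (batch, num_bins) scores; hr_bin: (batch,) true bin index.
        bins = torch.arange(hr_logits.size(1), device=hr_logits.device).float()
        target = torch.exp(-(bins[None, :] - hr_bin[:, None].float()) ** 2
                           / (2.0 * sigma ** 2))
        target = target / target.sum(dim=1, keepdim=True)
        return F.kl_div(F.log_softmax(hr_logits, dim=1), target,
                        reduction='batchmean')

The curriculum-style dynamic constraint mentioned in the abstract could then be approximated by ramping the weight of this frequency-domain loss up over training, so the network first learns coarse rPPG signals before the sharper spectral supervision dominates.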

 

Dublin Core metadata

dc.contributor.author: Yu, Zitong
dc.contributor.author: Shen, Yuming
dc.contributor.author: Shi, Jingang
dc.contributor.author: Zhao, Hengshuang
dc.contributor.author: Torr, Philip
dc.contributor.author: Zhao, Guoying
dc.date.accessioned: 2023-10-06T05:20:24Z
dc.date.available: 2023-10-06T05:20:24Z
dc.date.issued: 2022
dc.identifier.citation: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2022, v. 2022-June, p. 4176-4186
dc.identifier.issn: 1063-6919
dc.identifier.uri: http://hdl.handle.net/10722/333549
dc.description.abstract: (identical to the Abstract above)
dc.language: eng
dc.relation.ispartof: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
dc.subject: Biometrics
dc.subject: Face and gestures
dc.subject: Video analysis and understanding
dc.title: PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer
dc.type: Conference_Paper
dc.description.nature: link_to_subscribed_fulltext
dc.identifier.doi: 10.1109/CVPR52688.2022.00415
dc.identifier.scopus: eid_2-s2.0-85136131218
dc.identifier.volume: 2022-June
dc.identifier.spage: 4176
dc.identifier.epage: 4186
dc.identifier.isi: WOS:000867754204043
