Links for fulltext
(May Require Subscription)
- Publisher Website: https://doi.org/10.1007/s11263-023-01758-1
- Scopus: eid_2-s2.0-85148065784
- WOS: WOS:000933348800002
Article: PhysFormer++: Facial Video-Based Physiological Measurement with SlowFast Temporal Difference Transformer
Title | PhysFormer++: Facial Video-Based Physiological Measurement with SlowFast Temporal Difference Transformer |
---|---|
Authors | Yu, ZT; Shen, YM; Shi, JA; Zhao, HS; Cui, YW; Zhang, JH; Torr, P; Zhao, GY |
Keywords | Cross-attention; Periodic-attention; RPPG; SlowFast; Temporal difference transformer |
Issue Date | 15-Feb-2023 |
Publisher | Springer |
Citation | International Journal of Computer Vision, 2023, v. 131, n. 6, p. 1307-1330 |
Abstract | Remote photoplethysmography (rPPG), which aims at measuring heart activities and physiological signals from facial video without any contact, has great potential in many applications (e.g., remote healthcare and affective computing). Recent deep learning approaches focus on mining subtle rPPG clues using convolutional neural networks with limited spatio-temporal receptive fields, which neglect the long-range spatio-temporal perception and interaction needed for rPPG modeling. In this paper, we propose two end-to-end video-transformer-based architectures, namely PhysFormer and PhysFormer++, to adaptively aggregate both local and global spatio-temporal features for rPPG representation enhancement. As key modules in PhysFormer, the temporal difference transformers first enhance the quasi-periodic rPPG features with temporal-difference-guided global attention, and then refine the local spatio-temporal representation against interference. To better exploit temporal contextual and periodic rPPG clues, we also extend PhysFormer to the two-pathway SlowFast-based PhysFormer++ with temporal difference periodic- and cross-attention transformers. Furthermore, we propose label distribution learning and a curriculum-learning-inspired dynamic constraint in the frequency domain, which provide elaborate supervision for PhysFormer and PhysFormer++ and alleviate overfitting. Comprehensive experiments are performed on four benchmark datasets to show our superior performance on both intra- and cross-dataset testing. Unlike most transformer networks, which need pretraining on large-scale datasets, the proposed PhysFormer family can easily be trained from scratch on rPPG datasets, which makes it promising as a novel transformer baseline for the rPPG community. |
Persistent Identifier | http://hdl.handle.net/10722/331841 |
ISSN | 0920-5691 (2023 Impact Factor: 11.6; 2023 SCImago Journal Rankings: 6.668) |
ISI Accession Number ID | WOS:000933348800002 |
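To make the temporal-difference-guided attention described in the abstract concrete, here is a minimal PyTorch sketch. It is an illustration under assumptions, not the authors' released code: the class name `TemporalDifferenceConv` and the blend weight `theta` are hypothetical, and the operator simply mixes a plain 3D convolution with the same convolution applied to frame-to-frame differences, the kind of motion-sensitive projection PhysFormer uses when forming attention queries and keys.

```python
import torch
import torch.nn as nn

class TemporalDifferenceConv(nn.Module):
    """Simplified temporal difference convolution (illustrative, not the
    paper's exact formulation): blend a vanilla 3D convolution with the
    same convolution applied to frame-to-frame differences, so the output
    is sensitive to the subtle temporal changes that carry rPPG signal."""

    def __init__(self, channels: int, theta: float = 0.7):
        super().__init__()
        self.conv = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.theta = theta  # 0 -> plain conv, 1 -> purely difference-driven

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time, height, width)
        diff = x - torch.roll(x, shifts=1, dims=2)  # frame t minus frame t-1
        diff[:, :, 0] = 0.0  # the first frame has no predecessor
        return self.conv(x) + self.theta * self.conv(diff)
```

In a transformer block, such an operator would stand in for the usual linear query/key projections, so that attention weights are driven by subtle quasi-periodic motion rather than by static facial appearance.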
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Yu, ZT | - |
dc.contributor.author | Shen, YM | - |
dc.contributor.author | Shi, JA | - |
dc.contributor.author | Zhao, HS | - |
dc.contributor.author | Cui, YW | - |
dc.contributor.author | Zhang, JH | - |
dc.contributor.author | Torr, P | - |
dc.contributor.author | Zhao, GY | - |
dc.date.accessioned | 2023-09-21T06:59:23Z | - |
dc.date.available | 2023-09-21T06:59:23Z | - |
dc.date.issued | 2023-02-15 | - |
dc.identifier.citation | International Journal of Computer Vision, 2023, v. 131, n. 6, p. 1307-1330 | - |
dc.identifier.issn | 0920-5691 | - |
dc.identifier.uri | http://hdl.handle.net/10722/331841 | - |
dc.description.abstract | Remote photoplethysmography (rPPG), which aims at measuring heart activities and physiological signals from facial video without any contact, has great potential in many applications (e.g., remote healthcare and affective computing). Recent deep learning approaches focus on mining subtle rPPG clues using convolutional neural networks with limited spatio-temporal receptive fields, which neglect the long-range spatio-temporal perception and interaction needed for rPPG modeling. In this paper, we propose two end-to-end video-transformer-based architectures, namely PhysFormer and PhysFormer++, to adaptively aggregate both local and global spatio-temporal features for rPPG representation enhancement. As key modules in PhysFormer, the temporal difference transformers first enhance the quasi-periodic rPPG features with temporal-difference-guided global attention, and then refine the local spatio-temporal representation against interference. To better exploit temporal contextual and periodic rPPG clues, we also extend PhysFormer to the two-pathway SlowFast-based PhysFormer++ with temporal difference periodic- and cross-attention transformers. Furthermore, we propose label distribution learning and a curriculum-learning-inspired dynamic constraint in the frequency domain, which provide elaborate supervision for PhysFormer and PhysFormer++ and alleviate overfitting. Comprehensive experiments are performed on four benchmark datasets to show our superior performance on both intra- and cross-dataset testing. Unlike most transformer networks, which need pretraining on large-scale datasets, the proposed PhysFormer family can easily be trained from scratch on rPPG datasets, which makes it promising as a novel transformer baseline for the rPPG community. | -
dc.language | eng | - |
dc.publisher | Springer | - |
dc.relation.ispartof | International Journal of Computer Vision | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject | Cross-attention | - |
dc.subject | Periodic-attention | - |
dc.subject | RPPG | - |
dc.subject | SlowFast | - |
dc.subject | Temporal difference transformer | - |
dc.title | PhysFormer++: Facial Video-Based Physiological Measurement with SlowFast Temporal Difference Transformer | -
dc.type | Article | - |
dc.identifier.doi | 10.1007/s11263-023-01758-1 | - |
dc.identifier.scopus | eid_2-s2.0-85148065784 | - |
dc.identifier.volume | 131 | - |
dc.identifier.issue | 6 | - |
dc.identifier.spage | 1307 | - |
dc.identifier.epage | 1330 | - |
dc.identifier.eissn | 1573-1405 | - |
dc.identifier.isi | WOS:000933348800002 | - |
dc.publisher.place | DORDRECHT | - |
dc.identifier.issnl | 0920-5691 | - |
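As a rough sketch of the label distribution learning mentioned in the abstract (the function name `label_distribution_loss`, the 40-180 bpm bin range, and `sigma` are assumptions for illustration, not details taken from the paper), the scalar ground-truth heart rate can be softened into a Gaussian distribution over heart-rate bins and matched against the model's predicted frequency distribution with KL divergence:

```python
import torch
import torch.nn.functional as F

def label_distribution_loss(pred_logits: torch.Tensor,
                            hr_gt: float,
                            hr_min: int = 40,
                            hr_max: int = 180,
                            sigma: float = 1.0) -> torch.Tensor:
    """Illustrative label distribution loss (assumed details, not the
    authors' exact code): soften the scalar ground-truth heart rate into
    a Gaussian over HR bins, then penalize the KL divergence from the
    model's predicted distribution over the same bins."""
    bins = torch.arange(hr_min, hr_max + 1, dtype=torch.float32)
    target = torch.exp(-(bins - hr_gt) ** 2 / (2 * sigma ** 2))
    target = target / target.sum()                 # normalize to a distribution
    log_pred = F.log_softmax(pred_logits, dim=-1)  # pred_logits: (hr_max - hr_min + 1,)
    return F.kl_div(log_pred, target, reduction='sum')
```

The curriculum-learning-inspired dynamic constraint described in the abstract would then, plausibly, ramp up the weight of this frequency-domain term over training, so the network first learns coarse temporal features before being pushed toward sharp spectral peaks.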