Article: RGB-D-based gaze point estimation via multi-column CNNs and facial landmarks global optimization
Title | RGB-D-based gaze point estimation via multi-column CNNs and facial landmarks global optimization |
---|---|
Authors | Zhang, Ziheng; Lian, Dongze; Gao, Shenghua |
Keywords | Gaze tracking; Human–computer interaction; Multi-column CNNs |
Issue Date | 2021 |
Citation | Visual Computer, 2021, v. 37, n. 7, p. 1731-1741 |
Abstract | In this work, we utilize a multi-column CNNs framework to estimate the gaze point of a person sitting in front of a display from an RGB-D image of the person. Given that gaze points are determined by head poses, eyeball poses, and 3D eye positions, we propose to infer the three components separately and then integrate them for gaze point estimation. The captured depth images, however, usually contain noise and black holes, which prevent us from acquiring reliable head pose and 3D eye position estimates. Therefore, we propose to refine the raw depths of 68 facial keypoints by first estimating their relative depths from RGB face images, which, along with the captured raw depths, are then used to solve for the absolute depths of all facial keypoints through global optimization. The refined depths provide reliable estimates of both head pose and 3D eye position. Given that existing publicly available RGB-D gaze tracking datasets are small, we also build a new dataset for training and validating our method. To the best of our knowledge, it is the largest RGB-D gaze tracking dataset in terms of the number of participants. Comprehensive experiments demonstrate that our method outperforms existing methods by a large margin on both our dataset and the Eyediap dataset. |
Persistent Identifier | http://hdl.handle.net/10722/345016 |
ISSN | 0178-2789 (2023 Impact Factor: 3.0; 2023 SCImago Journal Rankings: 0.778) |
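The depth-refinement step described in the abstract — predicting relative depths of the 68 facial keypoints from the RGB image, then jointly solving for their absolute depths using the noisy raw sensor depths — can be sketched as a linear least-squares problem. The formulation below, including the reference-keypoint parameterization, the simulated "black holes", and the weight `lam`, is an illustrative assumption, not the paper's exact objective.

```python
import numpy as np

n = 68  # number of facial keypoints
rng = np.random.default_rng(0)
z_true = 600 + 10 * rng.standard_normal(n)  # hypothetical true depths (mm)

# Relative depths predicted from RGB, here taken w.r.t. keypoint 0 (assumption)
rel = z_true - z_true[0]

# Raw sensor depths: noisy, with missing readings ("black holes") at some keypoints
z_raw = z_true + 2.0 * rng.standard_normal(n)
valid = np.ones(n, dtype=bool)
valid[[5, 17, 40]] = False  # simulated holes

# Global optimization over absolute depths z:
#   term 1: z_i - z_0 should match the predicted relative depths
#   term 2: z_i should match the raw depth where the sensor reading is valid
A_rel = np.eye(n)
A_rel[:, 0] -= 1.0          # row i encodes z_i - z_0
lam = 1.0                   # weight balancing the two terms (assumption)
A_raw = lam * np.eye(n)[valid]
A = np.vstack([A_rel, A_raw])
b = np.concatenate([rel, lam * z_raw[valid]])

# Solve the stacked linear system in the least-squares sense
z_refined, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.abs(z_refined - z_true).max())  # refined depths track the true depths
```

The relative-depth constraints fix the shape of the keypoint configuration while the valid raw readings anchor its absolute scale, so the refined depths stay close to the truth even where the sensor returned nothing.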
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Zhang, Ziheng | - |
dc.contributor.author | Lian, Dongze | - |
dc.contributor.author | Gao, Shenghua | - |
dc.date.accessioned | 2024-08-15T09:24:41Z | - |
dc.date.available | 2024-08-15T09:24:41Z | - |
dc.date.issued | 2021 | - |
dc.identifier.citation | Visual Computer, 2021, v. 37, n. 7, p. 1731-1741 | - |
dc.identifier.issn | 0178-2789 | - |
dc.identifier.uri | http://hdl.handle.net/10722/345016 | - |
dc.description.abstract | In this work, we utilize a multi-column CNNs framework to estimate the gaze point of a person sitting in front of a display from an RGB-D image of the person. Given that gaze points are determined by head poses, eyeball poses, and 3D eye positions, we propose to infer the three components separately and then integrate them for gaze point estimation. The captured depth images, however, usually contain noise and black holes, which prevent us from acquiring reliable head pose and 3D eye position estimates. Therefore, we propose to refine the raw depths of 68 facial keypoints by first estimating their relative depths from RGB face images, which, along with the captured raw depths, are then used to solve for the absolute depths of all facial keypoints through global optimization. The refined depths provide reliable estimates of both head pose and 3D eye position. Given that existing publicly available RGB-D gaze tracking datasets are small, we also build a new dataset for training and validating our method. To the best of our knowledge, it is the largest RGB-D gaze tracking dataset in terms of the number of participants. Comprehensive experiments demonstrate that our method outperforms existing methods by a large margin on both our dataset and the Eyediap dataset. | -
dc.language | eng | - |
dc.relation.ispartof | Visual Computer | - |
dc.subject | Gaze tracking | - |
dc.subject | Human–computer interaction | - |
dc.subject | Multi-column CNNs | - |
dc.title | RGB-D-based gaze point estimation via multi-column CNNs and facial landmarks global optimization | - |
dc.type | Article | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1007/s00371-020-01934-1 | - |
dc.identifier.scopus | eid_2-s2.0-85094638954 | - |
dc.identifier.volume | 37 | - |
dc.identifier.issue | 7 | - |
dc.identifier.spage | 1731 | - |
dc.identifier.epage | 1741 | - |