
Postgraduate thesis: Facial expression : from recognition to animation

Title: Facial expression : from recognition to animation
Authors: Fan, Yingruo (樊應若)
Advisors: Komura, T; Luo, P; Wang, WP
Issue Date: 2022
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Fan, Y. [樊應若]. (2022). Facial expression : from recognition to animation. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Facial expression is one of the most important human behaviors, alongside body gesture, gaze and voice, for conveying emotional states. Facial expression recognition and animation systems are designed to automatically recognize and synthesize facial expressions of emotion, and they admit a variety of applications in human-computer interaction, healthcare, virtual assistants, telepresence and computer games. This thesis focuses on the recognition and animation of facial expressions, two important research directions in computer vision and graphics.

Firstly, we propose a facial action unit (AU) intensity estimation system capable of learning AU co-occurrence relationships automatically. AUs are subtle facial muscle movements; they are the fundamental elements of the facial expression of emotion and can be used for both the recognition and the synthesis of facial expressions. Most existing AU intensity estimation algorithms pre-define the AU co-occurrence relationships by hand, whereas our model captures them automatically via dynamic graph convolution. We empirically show that the proposed algorithm performs favorably against previous methods.

Secondly, we investigate knowledge distillation (KD) for AU intensity estimation and develop a lightweight model. Recent deep learning-based AU intensity estimation methods remain limited by their high computational requirements. To address this limitation, we distill deep structural facial relationships, including region-wise and channel-wise relationships, within a heatmap regression framework. The resulting model achieves performance comparable to state-of-the-art methods while requiring substantially fewer parameters and much less computation.

Thirdly, we develop a speech-driven facial expression animation system based on audio and text signals. Synthesizing realistic expressions for the whole face from speech has received little attention. To improve the expressiveness of the upper face, we exploit contextual text representations extracted from a powerful pre-trained language model. Our model is shown to generate more expressive facial expressions than other baselines.

Lastly, we propose a Transformer-based model for speech-driven facial expression animation that achieves highly realistic facial expressions as well as accurate lip synchronization. To tackle limited context and data scarcity, we design a Transformer-based architecture that encodes long-term audio context and utilizes self-supervised speech representations. Comprehensive experiments and analyses on two 3D audio-visual datasets demonstrate that the proposed method achieves state-of-the-art performance in terms of realism and lip synchronization.
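The first contribution above names dynamic graph convolution as the mechanism that lets the model learn AU co-occurrence rather than hard-coding it. The sketch below is only a minimal illustration of that general idea, not the thesis's actual architecture: the class name, feature dimensions, and the similarity-based adjacency are assumptions made for this example. It builds a data-dependent adjacency matrix over per-AU node features and applies one graph-convolution step before regressing an intensity per AU.

```python
# Minimal, illustrative sketch of dynamic graph convolution over facial
# action unit (AU) nodes. Names and dimensions are hypothetical; this is
# not the architecture proposed in the thesis.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicGraphAUHead(nn.Module):
    def __init__(self, num_aus: int = 12, feat_dim: int = 64):
        super().__init__()
        # Projections used to build a data-dependent (dynamic) adjacency matrix.
        self.query = nn.Linear(feat_dim, feat_dim)
        self.key = nn.Linear(feat_dim, feat_dim)
        # Graph-convolution transform and the final per-AU intensity regressor.
        self.gc_weight = nn.Linear(feat_dim, feat_dim)
        self.regressor = nn.Linear(feat_dim, 1)

    def forward(self, au_feats: torch.Tensor) -> torch.Tensor:
        # au_feats: (batch, num_aus, feat_dim), one feature vector per AU node.
        q, k = self.query(au_feats), self.key(au_feats)
        # Dynamic adjacency: pairwise affinities inferred from the input itself,
        # so AU co-occurrence need not be pre-defined by hand.
        adj = F.softmax(q @ k.transpose(1, 2) / q.shape[-1] ** 0.5, dim=-1)
        # One graph-convolution step: aggregate neighbor features, then transform,
        # with a residual connection back to the original node features.
        refined = F.relu(self.gc_weight(adj @ au_feats)) + au_feats
        # Per-AU intensity prediction (e.g. a continuous score per action unit).
        return self.regressor(refined).squeeze(-1)  # (batch, num_aus)


if __name__ == "__main__":
    feats = torch.randn(2, 12, 64)            # dummy per-AU features
    print(DynamicGraphAUHead()(feats).shape)  # torch.Size([2, 12])
```

Because the adjacency is computed from the input features rather than fixed in advance, the co-occurrence structure can differ from sample to sample, which is the property the abstract contrasts with manually pre-defined AU relationships.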
Degree: Doctor of Philosophy
Subjects: Facial expression - Computer simulation; Computer animation; Facial expression in art
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/322850

 

Full Dublin Core record (DC field: value)

dc.contributor.advisor: Komura, T
dc.contributor.advisor: Luo, P
dc.contributor.advisor: Wang, WP
dc.contributor.author: Fan, Yingruo
dc.contributor.author: 樊應若
dc.date.accessioned: 2022-11-18T10:41:06Z
dc.date.available: 2022-11-18T10:41:06Z
dc.date.issued: 2022
dc.identifier.citation: Fan, Y. [樊應若]. (2022). Facial expression : from recognition to animation. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
dc.identifier.uri: http://hdl.handle.net/10722/322850
dc.description.abstract: (abstract as given above)
dc.language: eng
dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof: HKU Theses Online (HKUTO)
dc.rights: The author retains all proprietary rights (such as patent rights) and the right to use in future works.
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject.lcsh: Facial expression - Computer simulation
dc.subject.lcsh: Computer animation
dc.subject.lcsh: Facial expression in art
dc.title: Facial expression : from recognition to animation
dc.type: PG_Thesis
dc.description.thesisname: Doctor of Philosophy
dc.description.thesislevel: Doctoral
dc.description.thesisdiscipline: Computer Science
dc.description.nature: published_or_final_version
dc.date.hkucongregation: 2022
dc.identifier.mmsid: 991044609105703414
