Title | Facial expression : from recognition to animation |
---|---|
Authors | Fan, Yingruo (樊應若) |
Advisors | Komura, T; Luo, P; Wang, WP |
Issue Date | 2022 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Fan, Y. [樊應若]. (2022). Facial expression : from recognition to animation. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | Facial expression is one of the most important human behaviors, alongside body gesture, gaze, and voice, for conveying a person's emotional state. Facial expression recognition and animation systems are designed to automatically recognize and synthesize facial expressions of emotion. They have a wide range of applications in human-computer interaction, healthcare, virtual assistants, telepresence, and computer games. In this thesis, we focus on the recognition and the animation of facial expressions, two important research directions in computer vision and graphics.
Firstly, we propose a facial action unit (AU) intensity estimation system capable of learning AU co-occurrence relationships automatically. AUs are subtle facial muscle movements; they are fundamental elements of the facial expression of emotion and can be used for both the recognition and the synthesis of facial expressions. Most existing AU intensity estimation algorithms pre-define the AU co-occurrence relationships manually. In contrast, our proposed model captures these relationships automatically via dynamic graph convolution. We show empirically that the proposed algorithm performs favorably against previous methods.
Secondly, we investigate knowledge distillation (KD) for the AU intensity estimation task and develop a lightweight model. Recent deep learning-based AU intensity estimation methods are still limited by their high computational resource requirements. To address this limitation, we propose to distill deep structural facial relationships, including region-wise and channel-wise relationships, within a heatmap regression framework. The proposed model achieves performance comparable to state-of-the-art methods while requiring substantially fewer parameters and much less computation.
Thirdly, we develop a speech-driven facial expression animation system based on audio and text signals. Synthesizing realistic facial expressions for the whole face from speech remains largely unexplored. To improve the expressiveness of the upper face, we exploit contextual text representations extracted from a powerful pre-trained language model. Our model is shown to generate more expressive facial expressions than the baselines.
Lastly, we propose a Transformer-based model for speech-driven facial expression animation that achieves highly realistic facial expressions as well as accurate lip synchronization. To tackle the problems of limited context and data scarcity, we design a Transformer-based architecture that encodes long-term audio context and utilizes self-supervised speech representations. Comprehensive experiments and analyses on two 3D audio-visual datasets demonstrate that the proposed method achieves state-of-the-art performance in terms of realism and lip synchronization. |
Degree | Doctor of Philosophy |
Subject | Facial expression - Computer simulation; Computer animation; Facial expression in art |
Dept/Program | Computer Science |
Persistent Identifier | http://hdl.handle.net/10722/322850 |
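The first contribution in the abstract above estimates AU intensities while learning AU co-occurrence automatically via dynamic graph convolution. The following is a minimal PyTorch sketch of that general idea, where the adjacency over AU nodes is computed from the features themselves rather than pre-defined; the layer sizes, the affinity function, and the number of AUs are illustrative assumptions, not the thesis's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicGraphConv(nn.Module):
    """Graph convolution whose adjacency is computed from the input features,
    so AU co-occurrence does not need to be pre-defined by hand."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.query = nn.Linear(in_dim, in_dim)
        self.key = nn.Linear(in_dim, in_dim)
        self.value = nn.Linear(in_dim, out_dim)

    def forward(self, x):
        # x: (batch, num_aus, in_dim) -- one feature vector per facial action unit
        affinity = torch.bmm(self.query(x), self.key(x).transpose(1, 2))
        adj = F.softmax(affinity / x.size(-1) ** 0.5, dim=-1)  # data-dependent adjacency
        return F.relu(torch.bmm(adj, self.value(x)))           # propagate features along the graph


# Hypothetical usage: 12 AUs, 64-d features, one intensity value per AU.
feats = torch.randn(8, 12, 64)                 # (batch, num_aus, feat_dim)
gcn = DynamicGraphConv(64, 64)
head = nn.Linear(64, 1)
intensities = head(gcn(feats)).squeeze(-1)     # (8, 12) predicted AU intensities
```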
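The second contribution distills region-wise and channel-wise structural relationships from a large heatmap-regression teacher into a lightweight student. The sketch below shows one plausible form of such a relation-matching loss; it assumes the teacher and student feature maps have already been brought to the same shape (e.g., with a 1x1 projection), and the exact relation definitions used in the thesis may differ.

```python
import torch
import torch.nn.functional as F


def relation_distillation_loss(student_feat, teacher_feat):
    """Match region-wise (spatial) and channel-wise relation matrices between a
    large teacher and a lightweight student instead of raw activations.
    Both feature maps are assumed to share the shape (batch, C, H, W)."""

    def relations(feat):
        b, c, h, w = feat.shape
        flat = feat.view(b, c, h * w)
        pos = F.normalize(flat, dim=1)                  # unit-norm feature per spatial position
        chn = F.normalize(flat, dim=2)                  # unit-norm response map per channel
        region = torch.bmm(pos.transpose(1, 2), pos)    # (b, HW, HW) region-wise cosine similarities
        channel = torch.bmm(chn, chn.transpose(1, 2))   # (b, C, C) channel-wise cosine similarities
        return region, channel

    s_region, s_channel = relations(student_feat)
    t_region, t_channel = relations(teacher_feat)
    return F.mse_loss(s_region, t_region) + F.mse_loss(s_channel, t_channel)


# Hypothetical usage with already-aligned 128-channel, 16x16 feature maps.
student = torch.randn(4, 128, 16, 16)
teacher = torch.randn(4, 128, 16, 16)
loss = relation_distillation_loss(student, teacher.detach())
```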
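The third contribution fuses audio features with contextual text representations from a pre-trained language model to animate the whole face, including the upper face. The sketch below illustrates one simple fusion architecture consistent with that description; the feature dimensions, the GRU decoder, and the per-frame alignment of text embeddings are hypothetical choices for illustration only, not the system described in the thesis.

```python
import torch
import torch.nn as nn


class AudioTextFusion(nn.Module):
    """Fuses per-frame audio features with contextual text embeddings
    (e.g., taken from a pre-trained language model) to predict per-frame
    facial expression parameters for the whole face."""

    def __init__(self, audio_dim=128, text_dim=768, hidden=256, n_params=64):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, hidden)
        self.text_proj = nn.Linear(text_dim, hidden)
        self.decoder = nn.GRU(hidden * 2, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_params)

    def forward(self, audio_feats, text_feats):
        # audio_feats: (batch, frames, audio_dim); text_feats: (batch, frames, text_dim),
        # with text embeddings assumed to be repeated/aligned to the video frame rate.
        fused = torch.cat([self.audio_proj(audio_feats), self.text_proj(text_feats)], dim=-1)
        out, _ = self.decoder(fused)
        return self.head(out)  # (batch, frames, n_params) expression parameters


# Hypothetical usage: 100 frames of paired audio and text features.
model = AudioTextFusion()
params = model(torch.randn(2, 100, 128), torch.randn(2, 100, 768))
```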
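The last contribution encodes long-term audio context with a Transformer and leverages self-supervised speech representations. Below is a much-simplified, non-autoregressive sketch of that idea, assuming pre-extracted speech features (e.g., wav2vec 2.0 hidden states) and a flattened per-frame vertex-offset output; the thesis's actual architecture, attention design, and output parameterization are likely to differ.

```python
import torch
import torch.nn as nn


class SpeechToFaceTransformer(nn.Module):
    """Encodes long-range audio context with a Transformer encoder and regresses
    per-frame 3D face motion. The input is assumed to be self-supervised speech
    features already resampled to the animation frame rate."""

    def __init__(self, speech_dim=768, d_model=256, n_vertices=5023):
        super().__init__()
        self.in_proj = nn.Linear(speech_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.out_proj = nn.Linear(d_model, n_vertices * 3)

    def forward(self, speech_feats):
        # speech_feats: (batch, frames, speech_dim)
        ctx = self.encoder(self.in_proj(speech_feats))  # long-term temporal context
        return self.out_proj(ctx)                       # (batch, frames, n_vertices * 3) vertex offsets


# Hypothetical usage: 120 frames of speech features driving a 5023-vertex face mesh.
model = SpeechToFaceTransformer()
offsets = model(torch.randn(1, 120, 768))
```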
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Komura, T | - |
dc.contributor.advisor | Luo, P | - |
dc.contributor.advisor | Wang, WP | - |
dc.contributor.author | Fan, Yingruo | - |
dc.contributor.author | 樊應若 | - |
dc.date.accessioned | 2022-11-18T10:41:06Z | - |
dc.date.available | 2022-11-18T10:41:06Z | - |
dc.date.issued | 2022 | - |
dc.identifier.citation | Fan, Y. [樊應若]. (2022). Facial expression : from recognition to animation. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/322850 | - |
dc.description.abstract | Facial expression is one of the most important human behaviors, alongside body gesture, gaze, and voice, for conveying a person's emotional state. Facial expression recognition and animation systems are designed to automatically recognize and synthesize facial expressions of emotion. They have a wide range of applications in human-computer interaction, healthcare, virtual assistants, telepresence, and computer games. In this thesis, we focus on the recognition and the animation of facial expressions, two important research directions in computer vision and graphics. Firstly, we propose a facial action unit (AU) intensity estimation system capable of learning AU co-occurrence relationships automatically. AUs are subtle facial muscle movements; they are fundamental elements of the facial expression of emotion and can be used for both the recognition and the synthesis of facial expressions. Most existing AU intensity estimation algorithms pre-define the AU co-occurrence relationships manually. In contrast, our proposed model captures these relationships automatically via dynamic graph convolution. We show empirically that the proposed algorithm performs favorably against previous methods. Secondly, we investigate knowledge distillation (KD) for the AU intensity estimation task and develop a lightweight model. Recent deep learning-based AU intensity estimation methods are still limited by their high computational resource requirements. To address this limitation, we propose to distill deep structural facial relationships, including region-wise and channel-wise relationships, within a heatmap regression framework. The proposed model achieves performance comparable to state-of-the-art methods while requiring substantially fewer parameters and much less computation. Thirdly, we develop a speech-driven facial expression animation system based on audio and text signals. Synthesizing realistic facial expressions for the whole face from speech remains largely unexplored. To improve the expressiveness of the upper face, we exploit contextual text representations extracted from a powerful pre-trained language model. Our model is shown to generate more expressive facial expressions than the baselines. Lastly, we propose a Transformer-based model for speech-driven facial expression animation that achieves highly realistic facial expressions as well as accurate lip synchronization. To tackle the problems of limited context and data scarcity, we design a Transformer-based architecture that encodes long-term audio context and utilizes self-supervised speech representations. Comprehensive experiments and analyses on two 3D audio-visual datasets demonstrate that the proposed method achieves state-of-the-art performance in terms of realism and lip synchronization. | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Facial expression - Computer simulation | - |
dc.subject.lcsh | Computer animation | - |
dc.subject.lcsh | Facial expression in art | - |
dc.title | Facial expression : from recognition to animation | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Computer Science | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2022 | - |
dc.identifier.mmsid | 991044609105703414 | - |