Conference Paper: Visual Intelligence Based on Deep Learning
Field | Value
---|---
Title | Visual Intelligence Based on Deep Learning
Authors | Yu, Y
Issue Date | 2016
Citation | The International Conference on Artificial Intelligence and Robots (AIR2016), Sanya, China, 28-30 November 2016
Abstract | Deep learning is a powerful machine learning paradigm that involves deep neural network architectures and is capable of extracting high-level representations from multi-dimensional sensory data. Such high-level representations are essential for many intelligence-related tasks, including visual recognition, speech perception, and language understanding. In this talk, I first give an overview of deep learning and its applications in computer vision and visual perception. Then I present one of the deep learning projects for visual intelligence carried out in my research group. This project addresses scene labeling, also known as semantic scene segmentation. It is one of the most fundamental problems in computer vision and refers to associating every pixel in an image with a semantic object category label, such as 'building', 'car', and 'table'. High-quality scene labeling can benefit many intelligent tasks, including robot task planning, pose estimation, context-based image retrieval, and automatic photo adjustment. Our project focuses on semantic labeling of RGB-D scenes and generates pixel-wise, fine-grained label maps from simultaneously sensed photometric (RGB) and depth channels. Specifically, we tackle this problem by i) developing a novel Long Short-Term Memorized Context Fusion (LSTM-CF) model that captures image contexts from a global perspective and deeply fuses contextual information from multiple sources (i.e., photometric and depth channels), and ii) incorporating this model into deep convolutional neural networks (CNNs) for end-to-end training. It has been demonstrated on the large-scale SUNRGBD benchmark and the canonical NYUDv2 benchmark that our method outperforms existing state-of-the-art methods. In addition, our scene labeling results can be leveraged to improve the ground-truth annotations of newly captured RGB-D images in the SUNRGBD dataset.
Description | Invited Speech
Persistent Identifier | http://hdl.handle.net/10722/254117
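The core idea in the abstract, fusing per-pixel photometric and depth features and then assigning every pixel a class label, can be sketched minimally in NumPy. This is a toy illustration with hypothetical shapes and random features, not the authors' LSTM-CF model; the fusion step here is plain concatenation followed by a linear classifier, standing in for the deep context fusion described above.

```python
import numpy as np

rng = np.random.default_rng(0)

H, W, C = 4, 4, 3  # toy image size and number of semantic classes (hypothetical)

# Hypothetical per-pixel features from the two sensed channels.
rgb_feat = rng.standard_normal((H, W, 8))    # photometric (RGB) features
depth_feat = rng.standard_normal((H, W, 8))  # depth features

# Stand-in for deep fusion: concatenate the two context sources per pixel.
fused = np.concatenate([rgb_feat, depth_feat], axis=-1)  # shape (H, W, 16)

# A linear classifier maps fused features to per-class scores...
W_cls = rng.standard_normal((16, C))
scores = fused @ W_cls  # shape (H, W, C)

# ...and scene labeling assigns every pixel its highest-scoring class,
# yielding the pixel-wise label map the abstract refers to.
label_map = scores.argmax(axis=-1)  # shape (H, W), integer class ids in [0, C)
print(label_map.shape)
```

In the actual LSTM-CF approach, the fusion is learned end-to-end inside a CNN and captures global image context, rather than being a fixed per-pixel concatenation as in this sketch.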
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Yu, Y | - |
dc.date.accessioned | 2018-06-06T08:54:50Z | - |
dc.date.available | 2018-06-06T08:54:50Z | - |
dc.date.issued | 2016 | - |
dc.identifier.citation | The International Conference on Artificial Intelligence and Robots (AIR2016), Sanya, China, 28-30 November 2016 | - |
dc.identifier.uri | http://hdl.handle.net/10722/254117 | - |
dc.description | Invited Speech | - |
dc.description.abstract | Deep learning is a powerful machine learning paradigm that involves deep neural network architectures and is capable of extracting high-level representations from multi-dimensional sensory data. Such high-level representations are essential for many intelligence-related tasks, including visual recognition, speech perception, and language understanding. In this talk, I first give an overview of deep learning and its applications in computer vision and visual perception. Then I present one of the deep learning projects for visual intelligence carried out in my research group. This project addresses scene labeling, also known as semantic scene segmentation. It is one of the most fundamental problems in computer vision and refers to associating every pixel in an image with a semantic object category label, such as 'building', 'car', and 'table'. High-quality scene labeling can benefit many intelligent tasks, including robot task planning, pose estimation, context-based image retrieval, and automatic photo adjustment. Our project focuses on semantic labeling of RGB-D scenes and generates pixel-wise, fine-grained label maps from simultaneously sensed photometric (RGB) and depth channels. Specifically, we tackle this problem by i) developing a novel Long Short-Term Memorized Context Fusion (LSTM-CF) model that captures image contexts from a global perspective and deeply fuses contextual information from multiple sources (i.e., photometric and depth channels), and ii) incorporating this model into deep convolutional neural networks (CNNs) for end-to-end training. It has been demonstrated on the large-scale SUNRGBD benchmark and the canonical NYUDv2 benchmark that our method outperforms existing state-of-the-art methods. In addition, our scene labeling results can be leveraged to improve the ground-truth annotations of newly captured RGB-D images in the SUNRGBD dataset. | -
dc.language | eng | - |
dc.relation.ispartof | International Conference on Artificial Intelligence and Robots (AIR 2016) | - |
dc.title | Visual Intelligence Based on Deep Learning | - |
dc.type | Conference_Paper | - |
dc.identifier.email | Yu, Y: yzyu@cs.hku.hk | - |
dc.identifier.authority | Yu, Y=rp01415 | - |
dc.identifier.hkuros | 276550 | - |