Conference Paper: Visual Intelligence Based on Deep Learning

Title: Visual Intelligence Based on Deep Learning
Authors: Yu, Y
Issue Date: 2016
Citation: The International Conference on Artificial Intelligence and Robots (AIR2016), Sanya, China, 28-30 November 2016
Abstract: Deep learning is a powerful machine learning paradigm that involves deep neural network architectures and is capable of extracting high-level representations from multi-dimensional sensory data. Such high-level representations are essential for many intelligence-related tasks, including visual recognition, speech perception, and language understanding. In this talk, I first give an overview of deep learning and its applications in computer vision and visual perception. Then I present one of the deep learning projects for visual intelligence carried out in my research group. This project addresses scene labeling, also known as semantic scene segmentation. It is one of the most fundamental problems in computer vision and refers to associating every pixel in an image with a semantic object category label, such as 'building', 'car', and 'table'. High-quality scene labeling can benefit many intelligent tasks, including robot task planning, pose estimation, context-based image retrieval, and automatic photo adjustment. Our project focuses on semantic labeling of RGB-D scenes and generates pixel-wise, fine-grained label maps from simultaneously sensed photometric (RGB) and depth channels. Specifically, we tackle this problem by i) developing a novel Long Short-Term Memorized Context Fusion (LSTM-CF) model that captures image contexts from a global perspective and deeply fuses contextual information from multiple sources (i.e., photometric and depth channels), and ii) incorporating this model into deep convolutional neural networks (CNNs) for end-to-end training. Experiments on the large-scale SUNRGBD benchmark and the canonical NYUDv2 benchmark demonstrate that our method outperforms existing state-of-the-art methods. In addition, our scene labeling results can be leveraged to improve the ground-truth annotations of newly captured RGB-D images in the SUNRGBD dataset.
Description: Invited Speech
Persistent Identifier: http://hdl.handle.net/10722/254117

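The abstract describes the LSTM-CF design only at a high level: CNN branches encode the photometric and depth channels, bidirectional LSTM layers sweep over the resulting feature maps to capture global context, and the two context streams are deeply fused before pixel-wise classification. The following is a minimal PyTorch sketch of that scheme, not the authors' implementation; the encoder depths, the row-wise LSTM sweep, the concatenation-based fusion, and the class count are all illustrative assumptions.

```python
# Minimal sketch of an LSTM-CF-style context fusion model (hypothetical,
# not the authors' code): two CNN branches encode RGB and depth, row-wise
# bidirectional LSTMs capture context in each branch, and the fused
# context features drive per-pixel classification.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RowContextLSTM(nn.Module):
    """Runs a bidirectional LSTM along each row of a feature map."""

    def __init__(self, channels, hidden):
        super().__init__()
        self.lstm = nn.LSTM(channels, hidden, batch_first=True,
                            bidirectional=True)

    def forward(self, x):                                  # x: (B, C, H, W)
        b, c, h, w = x.shape
        rows = x.permute(0, 2, 3, 1).reshape(b * h, w, c)  # each row is a sequence
        ctx, _ = self.lstm(rows)                           # (B*H, W, 2*hidden)
        return ctx.reshape(b, h, w, -1).permute(0, 3, 1, 2)


class LSTMCFSketch(nn.Module):
    def __init__(self, num_classes, feat=64, hidden=32):
        super().__init__()
        # Separate shallow CNN encoders for the photometric and depth inputs.
        self.rgb_enc = nn.Sequential(
            nn.Conv2d(3, feat, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, stride=2, padding=1), nn.ReLU())
        self.depth_enc = nn.Sequential(
            nn.Conv2d(1, feat, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, stride=2, padding=1), nn.ReLU())
        # Per-modality context layers, then fusion of both context streams.
        self.rgb_ctx = RowContextLSTM(feat, hidden)
        self.depth_ctx = RowContextLSTM(feat, hidden)
        self.classify = nn.Conv2d(4 * hidden, num_classes, 1)

    def forward(self, rgb, depth):
        f_rgb = self.rgb_ctx(self.rgb_enc(rgb))
        f_d = self.depth_ctx(self.depth_enc(depth))
        fused = torch.cat([f_rgb, f_d], dim=1)   # fusion by concatenation
        logits = self.classify(fused)
        # Upsample back to input resolution for a pixel-wise label map.
        return F.interpolate(logits, size=rgb.shape[2:], mode='bilinear',
                             align_corners=False)


if __name__ == "__main__":
    model = LSTMCFSketch(num_classes=37)         # e.g. the 37 SUNRGBD categories
    rgb = torch.randn(2, 3, 64, 64)
    depth = torch.randn(2, 1, 64, 64)
    print(model(rgb, depth).shape)               # torch.Size([2, 37, 64, 64])
```

The published LSTM-CF work models and fuses context more richly than this; the sketch only illustrates how LSTM-based context layers can be dropped into a two-branch CNN and trained end to end.
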
DC Field                   Value
dc.contributor.author      Yu, Y
dc.date.accessioned        2018-06-06T08:54:50Z
dc.date.available          2018-06-06T08:54:50Z
dc.date.issued             2016
dc.identifier.citation     The International Conference on Artificial Intelligence and Robots (AIR2016), Sanya, China, 28-30 November 2016
dc.identifier.uri          http://hdl.handle.net/10722/254117
dc.description             Invited Speech
dc.language                eng
dc.relation.ispartof       International Conference on Artificial Intelligence and Robots (AIR 2016)
dc.title                   Visual Intelligence Based on Deep Learning
dc.type                    Conference_Paper
dc.identifier.email        Yu, Y: yzyu@cs.hku.hk
dc.identifier.authority    Yu, Y=rp01415
dc.identifier.hkuros       276550

Export via OAI-PMH Interface in XML Formats
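The record above can be harvested programmatically through this interface. Below is a minimal sketch, assuming a DSpace-style endpoint at https://hub.hku.hk/oai/request and the OAI identifier oai:hub.hku.hk:10722/254117; both values are assumptions derived from the handle, and only the GetRecord verb and the oai_dc metadata prefix are standard OAI-PMH.

```python
# Hypothetical sketch: fetch this record's Dublin Core metadata over OAI-PMH.
# The endpoint and identifier are assumed DSpace-style values, not confirmed;
# only the OAI-PMH verb and the oai_dc prefix are part of the standard.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ENDPOINT = "https://hub.hku.hk/oai/request"    # assumption
IDENTIFIER = "oai:hub.hku.hk:10722/254117"     # assumption

params = urllib.parse.urlencode({
    "verb": "GetRecord",             # standard OAI-PMH verb
    "metadataPrefix": "oai_dc",      # unqualified Dublin Core
    "identifier": IDENTIFIER,
})

with urllib.request.urlopen(f"{ENDPOINT}?{params}") as resp:
    root = ET.fromstring(resp.read())

# Print every Dublin Core element (dc:title, dc:creator, ...) in the record.
DC = "{http://purl.org/dc/elements/1.1/}"
for elem in root.iter():
    if elem.tag.startswith(DC):
        print(elem.tag[len(DC):], ":", (elem.text or "").strip())
```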

