Conference Paper: Visual Intelligence Based on Deep Learning

Title: Visual Intelligence Based on Deep Learning
Authors: Yu, Y
Issue Date: 2016
Citation: The International Conference on Artificial Intelligence and Robots (AIR2016), Sanya, China, 28-30 November 2016
Abstract: Deep learning is a powerful machine learning paradigm that involves deep neural network architectures and is capable of extracting high-level representations from multi-dimensional sensory data. Such high-level representations are essential for many intelligence-related tasks, including visual recognition, speech perception, and language understanding. In this talk, I first give an overview of deep learning and its applications in computer vision and visual perception. Then I present one of the deep learning projects for visual intelligence carried out in my research group. This project addresses scene labeling, also known as semantic scene segmentation. It is one of the most fundamental problems in computer vision and refers to associating every pixel in an image with a semantic object category label, such as 'building', 'car', and 'table'. High-quality scene labeling can benefit many intelligent tasks, including robot task planning, pose estimation, context-based image retrieval, and automatic photo adjustment. Our project focuses on semantic labeling of RGB-D scenes and generates pixel-wise, fine-grained label maps from simultaneously sensed photometric (RGB) and depth channels. Specifically, we tackle this problem by i) developing a novel Long Short-Term Memorized Context Fusion (LSTM-CF) model that captures image contexts from a global perspective and deeply fuses contextual information from multiple sources (i.e., photometric and depth channels), and ii) incorporating this model into deep convolutional neural networks (CNNs) for end-to-end training. Experiments on the large-scale SUNRGBD benchmark and the canonical NYUDv2 benchmark demonstrate that our method outperforms existing state-of-the-art methods. In addition, our scene labeling results can be leveraged to improve the ground-truth annotations of newly captured RGB-D images in the SUNRGBD dataset.
Description: Invited Speech
Persistent Identifier: http://hdl.handle.net/10722/254117

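The abstract describes the LSTM-CF design only at a high level: CNN branches encode the photometric and depth channels, bidirectional LSTM layers sweep over the resulting feature maps to capture global context, and the two context streams are deeply fused before pixel-wise classification. The following is a minimal PyTorch sketch of that scheme, not the authors' implementation; the encoder depths, the row-wise LSTM sweep, the concatenation-based fusion, and the class count are all illustrative assumptions.

```python
# Minimal sketch of an LSTM-CF-style context fusion model (hypothetical,
# not the authors' code): two CNN branches encode RGB and depth, row-wise
# bidirectional LSTMs capture context in each branch, and the fused
# context features drive per-pixel classification.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RowContextLSTM(nn.Module):
    """Runs a bidirectional LSTM along each row of a feature map."""

    def __init__(self, channels, hidden):
        super().__init__()
        self.lstm = nn.LSTM(channels, hidden, batch_first=True,
                            bidirectional=True)

    def forward(self, x):                                  # x: (B, C, H, W)
        b, c, h, w = x.shape
        rows = x.permute(0, 2, 3, 1).reshape(b * h, w, c)  # each row is a sequence
        ctx, _ = self.lstm(rows)                           # (B*H, W, 2*hidden)
        return ctx.reshape(b, h, w, -1).permute(0, 3, 1, 2)


class LSTMCFSketch(nn.Module):
    def __init__(self, num_classes, feat=64, hidden=32):
        super().__init__()
        # Separate shallow CNN encoders for the photometric and depth inputs.
        self.rgb_enc = nn.Sequential(
            nn.Conv2d(3, feat, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, stride=2, padding=1), nn.ReLU())
        self.depth_enc = nn.Sequential(
            nn.Conv2d(1, feat, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, stride=2, padding=1), nn.ReLU())
        # Per-modality context layers, then fusion of both context streams.
        self.rgb_ctx = RowContextLSTM(feat, hidden)
        self.depth_ctx = RowContextLSTM(feat, hidden)
        self.classify = nn.Conv2d(4 * hidden, num_classes, 1)

    def forward(self, rgb, depth):
        f_rgb = self.rgb_ctx(self.rgb_enc(rgb))
        f_d = self.depth_ctx(self.depth_enc(depth))
        fused = torch.cat([f_rgb, f_d], dim=1)   # fusion by concatenation
        logits = self.classify(fused)
        # Upsample back to input resolution for a pixel-wise label map.
        return F.interpolate(logits, size=rgb.shape[2:], mode='bilinear',
                             align_corners=False)


if __name__ == "__main__":
    model = LSTMCFSketch(num_classes=37)         # e.g. the 37 SUNRGBD categories
    rgb = torch.randn(2, 3, 64, 64)
    depth = torch.randn(2, 1, 64, 64)
    print(model(rgb, depth).shape)               # torch.Size([2, 37, 64, 64])
```

The published LSTM-CF work models and fuses context more richly than this; the sketch only illustrates how LSTM-based context layers can be dropped into a two-branch CNN and trained end to end.
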
DC Field                   Value
dc.contributor.author      Yu, Y
dc.date.accessioned        2018-06-06T08:54:50Z
dc.date.available          2018-06-06T08:54:50Z
dc.date.issued             2016
dc.identifier.citation     The International Conference on Artificial Intelligence and Robots (AIR2016), Sanya, China, 28-30 November 2016
dc.identifier.uri          http://hdl.handle.net/10722/254117
dc.description             Invited Speech
dc.language                eng
dc.relation.ispartof       International Conference on Artificial Intelligence and Robots (AIR 2016)
dc.title                   Visual Intelligence Based on Deep Learning
dc.type                    Conference_Paper
dc.identifier.email        Yu, Y: yzyu@cs.hku.hk
dc.identifier.authority    Yu, Y=rp01415
dc.identifier.hkuros       276550

Export via OAI-PMH Interface in XML Formats
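The record above can be harvested programmatically through this interface. Below is a minimal sketch, assuming a DSpace-style endpoint at https://hub.hku.hk/oai/request and the OAI identifier oai:hub.hku.hk:10722/254117; both values are assumptions derived from the handle, and only the GetRecord verb and the oai_dc metadata prefix are standard OAI-PMH.

```python
# Hypothetical sketch: fetch this record's Dublin Core metadata over OAI-PMH.
# The endpoint and identifier are assumed DSpace-style values, not confirmed;
# only the OAI-PMH verb and the oai_dc prefix are part of the standard.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ENDPOINT = "https://hub.hku.hk/oai/request"    # assumption
IDENTIFIER = "oai:hub.hku.hk:10722/254117"     # assumption

params = urllib.parse.urlencode({
    "verb": "GetRecord",             # standard OAI-PMH verb
    "metadataPrefix": "oai_dc",      # unqualified Dublin Core
    "identifier": IDENTIFIER,
})

with urllib.request.urlopen(f"{ENDPOINT}?{params}") as resp:
    root = ET.fromstring(resp.read())

# Print every Dublin Core element (dc:title, dc:creator, ...) in the record.
DC = "{http://purl.org/dc/elements/1.1/}"
for elem in root.iter():
    if elem.tag.startswith(DC):
        print(elem.tag[len(DC):], ":", (elem.text or "").strip())
```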

