Conference Paper: LSTM-CF: Unifying Context Modeling and Fusion with LSTMs for RGB-D Scene Labeling

Title: LSTM-CF: Unifying Context Modeling and Fusion with LSTMs for RGB-D Scene Labeling
Authors: Li, Z; Gan, Y; Liang, X; Yu, Y; Cheng, H; Lin, L
Keywords: Depth and photometric data fusion; Image context modeling; Long short-term memory; RGB-D scene labeling
Issue Date: 2016
Publisher: Springer
Citation: 14th European Conference on Computer Vision (ECCV 2016), Amsterdam, The Netherlands, 11-14 October 2016, Proceedings, Part II, p. 541-557
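
For reference managers, the metadata in this record assembles into a BibTeX entry along the following lines. The citation key is an arbitrary placeholder, and the author names use the initials given in this record rather than the full names printed in the proceedings.

```bibtex
@inproceedings{li2016lstmcf,
  author    = {Li, Z. and Gan, Y. and Liang, X. and Yu, Y. and Cheng, H. and Lin, L.},
  title     = {{LSTM-CF}: Unifying Context Modeling and Fusion with {LSTM}s for {RGB-D} Scene Labeling},
  booktitle = {Computer Vision -- ECCV 2016, Part II},
  series    = {Lecture Notes in Computer Science},
  volume    = {9906},
  pages     = {541--557},
  publisher = {Springer},
  address   = {Cham},
  year      = {2016},
  doi       = {10.1007/978-3-319-46475-6_34}
}
```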
Abstract: Semantic labeling of RGB-D scenes is crucial to many intelligent applications including perceptual robotics. It generates pixelwise and fine-grained label maps from simultaneously sensed photometric (RGB) and depth channels. This paper addresses this problem by (i) developing a novel Long Short-Term Memorized Context Fusion (LSTM-CF) model that captures and fuses contextual information from multiple channels of photometric and depth data, and (ii) incorporating this model into deep convolutional neural networks (CNNs) for end-to-end training. Specifically, contexts in photometric and depth channels are, respectively, captured by stacking several convolutional layers and a long short-term memory layer; the memory layer encodes both short-range and long-range spatial dependencies in an image along the vertical direction. Another long short-term memorized fusion layer is set up to integrate the contexts along the vertical direction from different channels, and perform bi-directional propagation of the fused vertical contexts along the horizontal direction to obtain true 2D global contexts. Finally, the fused contextual representation is concatenated with the convolutional features extracted from the photometric channels in order to improve the accuracy of fine-scale semantic labeling. Our proposed model has set a new state of the art, i.e., 48.1% and 49.4% average class accuracy over 37 categories (2.2% and 5.4% improvement) on the large-scale SUNRGBD dataset and the NYUDv2 dataset, respectively.
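
The data flow the abstract describes (per-channel convolutional features, a vertical LSTM capturing column-wise context in each channel, a bi-directional horizontal LSTM fusing both contexts into a 2D global context, and concatenation with the RGB convolutional features before pixelwise classification) can be made concrete with a small sketch. The PyTorch code below is an illustrative reading of that pipeline, not the authors' implementation: the two tiny CNN stems, the layer widths, treating depth as a single input channel, concatenating the two vertical contexts before fusion, and the class name LSTMCFSketch are all assumptions.

```python
# A minimal sketch of the LSTM-CF data flow described in the abstract.
# All layer sizes are illustrative assumptions, not the paper's configuration.
import torch
import torch.nn as nn

class LSTMCFSketch(nn.Module):
    def __init__(self, num_classes=37, feat=64, hidden=64):
        super().__init__()
        # Stand-ins for the stacked convolutional layers of the
        # photometric (RGB) and depth branches.
        self.rgb_cnn = nn.Sequential(
            nn.Conv2d(3, feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU())
        self.depth_cnn = nn.Sequential(
            nn.Conv2d(1, feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU())
        # Memorized context layers: an LSTM sweeps each column top-to-bottom,
        # encoding short- and long-range vertical dependencies per channel.
        self.rgb_vlstm = nn.LSTM(feat, hidden)
        self.depth_vlstm = nn.LSTM(feat, hidden)
        # Memorized fusion layer: a bi-directional LSTM sweeps each row,
        # propagating the fused vertical contexts horizontally.
        self.fusion_hlstm = nn.LSTM(2 * hidden, hidden, bidirectional=True)
        # Pixelwise classifier on fused context + RGB convolutional features.
        self.classifier = nn.Conv2d(2 * hidden + feat, num_classes, 1)

    @staticmethod
    def _vertical(lstm, x):
        # x: (B, C, H, W) -> one length-H sequence per (batch, column).
        b, c, h, w = x.shape
        seq = x.permute(2, 0, 3, 1).reshape(h, b * w, c)
        out, _ = lstm(seq)                                   # (H, B*W, hidden)
        return out.reshape(h, b, w, -1).permute(1, 3, 0, 2)  # (B, hidden, H, W)

    @staticmethod
    def _horizontal(lstm, x):
        # x: (B, C, H, W) -> one length-W sequence per (batch, row).
        b, c, h, w = x.shape
        seq = x.permute(3, 0, 2, 1).reshape(w, b * h, c)
        out, _ = lstm(seq)                                   # (W, B*H, 2*hidden)
        return out.reshape(w, b, h, -1).permute(1, 3, 2, 0)  # (B, 2*hidden, H, W)

    def forward(self, rgb, depth):
        f_rgb, f_depth = self.rgb_cnn(rgb), self.depth_cnn(depth)
        v_rgb = self._vertical(self.rgb_vlstm, f_rgb)        # vertical RGB context
        v_depth = self._vertical(self.depth_vlstm, f_depth)  # vertical depth context
        fused = self._horizontal(self.fusion_hlstm,
                                 torch.cat([v_rgb, v_depth], dim=1))
        return self.classifier(torch.cat([fused, f_rgb], dim=1))

model = LSTMCFSketch()
logits = model(torch.randn(2, 3, 32, 48), torch.randn(2, 1, 32, 48))
print(logits.shape)  # torch.Size([2, 37, 32, 48]): one score map per category
```

Running the fusion LSTM bi-directionally along rows after the vertical sweeps is what lets every pixel's representation depend on the whole image, which is the "true 2D global contexts" the abstract refers to.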
Description: Lecture Notes in Computer Science, vol. 9906
Persistent Identifier: http://hdl.handle.net/10722/243233
ISBN: 9783319464749
ISSN: 0302-9743
2023 SCImago Journal Rankings: 0.606
ISI Accession Number ID: WOS:000389383900034

DC Field: Value
dc.contributor.author: Li, Z
dc.contributor.author: Gan, Y
dc.contributor.author: Liang, X
dc.contributor.author: Yu, Y
dc.contributor.author: Cheng, H
dc.contributor.author: Lin, L
dc.date.accessioned: 2017-08-25T02:51:59Z
dc.date.available: 2017-08-25T02:51:59Z
dc.date.issued: 2016
dc.identifier.citation: 14th European Conference on Computer Vision (ECCV 2016), Amsterdam, The Netherlands, 11-14 October 2016, Proceedings, Part II, p. 541-557
dc.identifier.isbn: 9783319464749
dc.identifier.issn: 0302-9743
dc.identifier.uri: http://hdl.handle.net/10722/243233
dc.description: Lecture Notes in Computer Science, vol. 9906
dc.language: eng
dc.publisher: Springer
dc.relation.ispartof: European Conference on Computer Vision (ECCV)
dc.subject: Depth and photometric data fusion
dc.subject: Image context modeling
dc.subject: Long short-term memory
dc.subject: RGB-D scene labeling
dc.title: LSTM-CF: Unifying Context Modeling and Fusion with LSTMs for RGB-D Scene Labeling
dc.type: Conference_Paper
dc.identifier.email: Yu, Y: yzyu@cs.hku.hk
dc.identifier.authority: Yu, Y=rp01415
dc.identifier.doi: 10.1007/978-3-319-46475-6_34
dc.identifier.scopus: eid_2-s2.0-84990849812
dc.identifier.hkuros: 273676
dc.identifier.volume: 2
dc.identifier.spage: 541
dc.identifier.epage: 557
dc.identifier.eissn: 1611-3349
dc.identifier.isi: WOS:000389383900034
dc.publisher.place: Cham
dc.identifier.issnl: 0302-9743
