Conference Paper: LSTM-CF: Unifying Context Modeling and Fusion with LSTMs for RGB-D Scene Labeling

Title: LSTM-CF: Unifying Context Modeling and Fusion with LSTMs for RGB-D Scene Labeling
Authors: Li, Z; Gan, Y; Liang, X; Yu, Y; Cheng, H; Lin, L
Keywords: Depth and photometric data fusion; Image context modeling; Long short-term memory; RGB-D scene labeling
Issue Date: 2016
Publisher: Springer
Citation: 14th European Conference on Computer Vision (ECCV 2016), Amsterdam, The Netherlands, 11-14 October 2016, Proceedings, Part II, p. 541-557
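
For reference managers, the metadata in this record assembles into a BibTeX entry along the following lines. The citation key is an arbitrary placeholder, and the author names use the initials given in this record rather than the full names printed in the proceedings.

```bibtex
@inproceedings{li2016lstmcf,
  author    = {Li, Z. and Gan, Y. and Liang, X. and Yu, Y. and Cheng, H. and Lin, L.},
  title     = {{LSTM-CF}: Unifying Context Modeling and Fusion with {LSTM}s for {RGB-D} Scene Labeling},
  booktitle = {Computer Vision -- ECCV 2016, Part II},
  series    = {Lecture Notes in Computer Science},
  volume    = {9906},
  pages     = {541--557},
  publisher = {Springer},
  address   = {Cham},
  year      = {2016},
  doi       = {10.1007/978-3-319-46475-6_34}
}
```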
Abstract: Semantic labeling of RGB-D scenes is crucial to many intelligent applications including perceptual robotics. It generates pixelwise and fine-grained label maps from simultaneously sensed photometric (RGB) and depth channels. This paper addresses this problem by (i) developing a novel Long Short-Term Memorized Context Fusion (LSTM-CF) model that captures and fuses contextual information from multiple channels of photometric and depth data, and (ii) incorporating this model into deep convolutional neural networks (CNNs) for end-to-end training. Specifically, contexts in photometric and depth channels are, respectively, captured by stacking several convolutional layers and a long short-term memory layer; the memory layer encodes both short-range and long-range spatial dependencies in an image along the vertical direction. Another long short-term memorized fusion layer is set up to integrate the contexts along the vertical direction from different channels, and perform bi-directional propagation of the fused vertical contexts along the horizontal direction to obtain true 2D global contexts. Finally, the fused contextual representation is concatenated with the convolutional features extracted from the photometric channels in order to improve the accuracy of fine-scale semantic labeling. Our proposed model has set a new state of the art, i.e., 48.1% and 49.4% average class accuracy over 37 categories (2.2% and 5.4% improvement) on the large-scale SUNRGBD dataset and the NYUDv2 dataset, respectively.
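
The data flow the abstract describes (per-channel convolutional features, a vertical LSTM capturing column-wise context in each channel, a bi-directional horizontal LSTM fusing both contexts into a 2D global context, and concatenation with the RGB convolutional features before pixelwise classification) can be made concrete with a small sketch. The PyTorch code below is an illustrative reading of that pipeline, not the authors' implementation: the two tiny CNN stems, the layer widths, treating depth as a single input channel, concatenating the two vertical contexts before fusion, and the class name LSTMCFSketch are all assumptions.

```python
# A minimal sketch of the LSTM-CF data flow described in the abstract.
# All layer sizes are illustrative assumptions, not the paper's configuration.
import torch
import torch.nn as nn

class LSTMCFSketch(nn.Module):
    def __init__(self, num_classes=37, feat=64, hidden=64):
        super().__init__()
        # Stand-ins for the stacked convolutional layers of the
        # photometric (RGB) and depth branches.
        self.rgb_cnn = nn.Sequential(
            nn.Conv2d(3, feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU())
        self.depth_cnn = nn.Sequential(
            nn.Conv2d(1, feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU())
        # Memorized context layers: an LSTM sweeps each column top-to-bottom,
        # encoding short- and long-range vertical dependencies per channel.
        self.rgb_vlstm = nn.LSTM(feat, hidden)
        self.depth_vlstm = nn.LSTM(feat, hidden)
        # Memorized fusion layer: a bi-directional LSTM sweeps each row,
        # propagating the fused vertical contexts horizontally.
        self.fusion_hlstm = nn.LSTM(2 * hidden, hidden, bidirectional=True)
        # Pixelwise classifier on fused context + RGB convolutional features.
        self.classifier = nn.Conv2d(2 * hidden + feat, num_classes, 1)

    @staticmethod
    def _vertical(lstm, x):
        # x: (B, C, H, W) -> one length-H sequence per (batch, column).
        b, c, h, w = x.shape
        seq = x.permute(2, 0, 3, 1).reshape(h, b * w, c)
        out, _ = lstm(seq)                                   # (H, B*W, hidden)
        return out.reshape(h, b, w, -1).permute(1, 3, 0, 2)  # (B, hidden, H, W)

    @staticmethod
    def _horizontal(lstm, x):
        # x: (B, C, H, W) -> one length-W sequence per (batch, row).
        b, c, h, w = x.shape
        seq = x.permute(3, 0, 2, 1).reshape(w, b * h, c)
        out, _ = lstm(seq)                                   # (W, B*H, 2*hidden)
        return out.reshape(w, b, h, -1).permute(1, 3, 2, 0)  # (B, 2*hidden, H, W)

    def forward(self, rgb, depth):
        f_rgb, f_depth = self.rgb_cnn(rgb), self.depth_cnn(depth)
        v_rgb = self._vertical(self.rgb_vlstm, f_rgb)        # vertical RGB context
        v_depth = self._vertical(self.depth_vlstm, f_depth)  # vertical depth context
        fused = self._horizontal(self.fusion_hlstm,
                                 torch.cat([v_rgb, v_depth], dim=1))
        return self.classifier(torch.cat([fused, f_rgb], dim=1))

model = LSTMCFSketch()
logits = model(torch.randn(2, 3, 32, 48), torch.randn(2, 1, 32, 48))
print(logits.shape)  # torch.Size([2, 37, 32, 48]): one score map per category
```

Running the fusion LSTM bi-directionally along rows after the vertical sweeps is what lets every pixel's representation depend on the whole image, which is the "true 2D global contexts" the abstract refers to.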
Description: Lecture Notes in Computer Science, vol. 9906
Persistent Identifier: http://hdl.handle.net/10722/243233
ISBN: 9783319464749
ISSN: 0302-9743
2023 SCImago Journal Rankings: 0.606
ISI Accession Number ID: WOS:000389383900034

DC Field: Value
dc.contributor.author: Li, Z
dc.contributor.author: Gan, Y
dc.contributor.author: Liang, X
dc.contributor.author: Yu, Y
dc.contributor.author: Cheng, H
dc.contributor.author: Lin, L
dc.date.accessioned: 2017-08-25T02:51:59Z
dc.date.available: 2017-08-25T02:51:59Z
dc.date.issued: 2016
dc.identifier.citation: 14th European Conference on Computer Vision (ECCV 2016), Amsterdam, The Netherlands, 11-14 October 2016, Proceedings, Part II, p. 541-557
dc.identifier.isbn: 9783319464749
dc.identifier.issn: 0302-9743
dc.identifier.uri: http://hdl.handle.net/10722/243233
dc.description: Lecture Notes in Computer Science, vol. 9906
dc.language: eng
dc.publisher: Springer
dc.relation.ispartof: European Conference on Computer Vision (ECCV)
dc.subject: Depth and photometric data fusion
dc.subject: Image context modeling
dc.subject: Long short-term memory
dc.subject: RGB-D scene labeling
dc.title: LSTM-CF: Unifying Context Modeling and Fusion with LSTMs for RGB-D Scene Labeling
dc.type: Conference_Paper
dc.identifier.email: Yu, Y: yzyu@cs.hku.hk
dc.identifier.authority: Yu, Y=rp01415
dc.identifier.doi: 10.1007/978-3-319-46475-6_34
dc.identifier.scopus: eid_2-s2.0-84990849812
dc.identifier.hkuros: 273676
dc.identifier.volume: 2
dc.identifier.spage: 541
dc.identifier.epage: 557
dc.identifier.eissn: 1611-3349
dc.identifier.isi: WOS:000389383900034
dc.publisher.place: Cham
dc.identifier.issnl: 0302-9743
