Managing uncertainty of XML schema matching

Cheng, R; Gong, J; Cheung, DW

Publisher Website: 10.1109/ICDE.2010.5447868
Scopus: eid_2-s2.0-77952781219
WOS: WOS:000286933100031

Citations:
- Scopus: 0
- Web of Science: 0
Appears in Collections:
- Computer Science: Conference papers

Conference Paper: Managing uncertainty of XML schema matching

Title	Managing uncertainty of XML schema matching
Authors	Cheng, R; Gong, J; Cheung, DW
Issue Date	2010
Publisher	IEEE, Computer Society.
Citation	The IEEE 26th International Conference on Data Engineering (ICDE 2010), Long Beach, CA., 1-6 March 2010. In International Conference on Data Engineering. Proceedings, 2010, p. 297-308. DOI: http://dx.doi.org/10.1109/ICDE.2010.5447868
Abstract	Despite advances in machine learning technologies, a schema matching result between two database schemas (e.g., those derived from COMA++) is likely to be imprecise. In particular, numerous instances of "possible mappings" between the schemas may be derived from the matching result. In this paper, we study the problem of managing possible mappings between two heterogeneous XML schemas. We observe that for XML schemas, their possible mappings have a high degree of overlap. We hence propose a novel data structure, called the block tree, to capture the commonalities among possible mappings. The block tree is useful for representing the possible mappings in a compact manner, and can be generated efficiently. Moreover, it supports the evaluation of the probabilistic twig query (PTQ), which returns the probability of portions of an XML document that match the query pattern. For users who are interested only in answers with the k highest probabilities, we also propose the top-k PTQ, and present an efficient solution for it. The second challenge we have tackled is to efficiently generate possible mappings for a given schema matching. While this problem can be solved by existing algorithms, we show how to improve the performance of the solution by using a divide-and-conquer approach. An extensive evaluation on realistic datasets shows that our approaches significantly improve the efficiency of generating, storing, and querying possible mappings. © 2010 IEEE.
Persistent Identifier	http://hdl.handle.net/10722/144828
ISSN	1084-4627
SCImago Journal Rankings (2023)	1.306
ISI Accession Number ID	WOS:000286933100031
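
To make the abstract's notions of "possible mappings" and the top-k PTQ concrete, here is a minimal Python sketch. It is not the block tree or the authors' algorithm; it is the naive approach of enumerating every possible mapping, which the abstract suggests the block tree is meant to improve on. It assumes a possible-worlds reading in which an answer's probability is the sum of the probabilities of the mappings that produce it. That reading, the toy element names, and the helpers `answer_probabilities`, `top_k`, and `evaluate` are all hypothetical and not taken from the record.

```python
from collections import defaultdict
import heapq


def answer_probabilities(possible_mappings, evaluate):
    """Sum, for each answer, the probabilities of the mappings that yield it.

    possible_mappings: iterable of (mapping, probability) pairs.
    evaluate: function returning the set of answers under one mapping.
    """
    probs = defaultdict(float)
    for mapping, p in possible_mappings:
        for answer in evaluate(mapping):
            probs[answer] += p
    return dict(probs)


def top_k(probs, k):
    """Return the k (answer, probability) pairs with the highest probability."""
    return heapq.nlargest(k, probs.items(), key=lambda item: item[1])


# Hypothetical toy data: three possible mappings from source elements to
# target elements, with probabilities that sum to 1.
possible_mappings = [
    ({"author": "writer", "title": "name"}, 0.5),
    ({"author": "editor", "title": "name"}, 0.3),
    ({"author": "writer", "title": "label"}, 0.2),
]


def evaluate(mapping):
    # Toy stand-in for twig matching: the answer is simply the target
    # element that the query node "author" maps to.
    return {mapping["author"]}


probs = answer_probabilities(possible_mappings, evaluate)
best = top_k(probs, 1)
```

Two of the three toy mappings share the correspondence from `author` to `writer`. Overlap of this kind between possible mappings is what the abstract says the block tree captures, whereas this sketch re-evaluates the query once per mapping.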

DC Field	Value	Language
dc.contributor.author	Cheng, R	en_HK
dc.contributor.author	Gong, J	en_HK
dc.contributor.author	Cheung, DW	en_HK
dc.date.accessioned	2012-02-07T08:23:19Z	-
dc.date.available	2012-02-07T08:23:19Z	-
dc.date.issued	2010	en_HK
dc.identifier.citation	The IEEE 26th International Conference on Data Engineering (ICDE 2010), Long Beach, CA., 1-6 March 2010. In International Conference on Data Engineering. Proceedings, 2010, p. 297-308	en_HK
dc.identifier.issn	1084-4627	en_HK
dc.identifier.uri	http://hdl.handle.net/10722/144828	-
dc.description.abstract	Despite advances in machine learning technologies, a schema matching result between two database schemas (e.g., those derived from COMA++) is likely to be imprecise. In particular, numerous instances of "possible mappings" between the schemas may be derived from the matching result. In this paper, we study the problem of managing possible mappings between two heterogeneous XML schemas. We observe that for XML schemas, their possible mappings have a high degree of overlap. We hence propose a novel data structure, called the block tree, to capture the commonalities among possible mappings. The block tree is useful for representing the possible mappings in a compact manner, and can be generated efficiently. Moreover, it supports the evaluation of the probabilistic twig query (PTQ), which returns the probability of portions of an XML document that match the query pattern. For users who are interested only in answers with the k highest probabilities, we also propose the top-k PTQ, and present an efficient solution for it. The second challenge we have tackled is to efficiently generate possible mappings for a given schema matching. While this problem can be solved by existing algorithms, we show how to improve the performance of the solution by using a divide-and-conquer approach. An extensive evaluation on realistic datasets shows that our approaches significantly improve the efficiency of generating, storing, and querying possible mappings. © 2010 IEEE.	en_HK
dc.language	eng	-
dc.publisher	IEEE, Computer Society.	-
dc.relation.ispartof	International Conference on Data Engineering. Proceedings	en_HK
dc.rights	©2010 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.	-
dc.title	Managing uncertainty of XML schema matching	en_HK
dc.type	Conference_Paper	en_HK
dc.identifier.email	Cheng, R:ckcheng@cs.hku.hk	en_HK
dc.identifier.email	Cheung, DW:dcheung@cs.hku.hk	en_HK
dc.identifier.authority	Cheng, R=rp00074	en_HK
dc.identifier.authority	Cheung, DW=rp00101	en_HK
dc.description.nature	published_or_final_version	-
dc.identifier.doi	10.1109/ICDE.2010.5447868	en_HK
dc.identifier.scopus	eid_2-s2.0-77952781219	en_HK
dc.identifier.hkuros	176463	-
dc.relation.references	http://www.scopus.com/mlt/select.url?eid=2-s2.0-77952781219&selection=ref&src=s&origin=recordpage	en_HK
dc.identifier.spage	297	en_HK
dc.identifier.epage	308	en_HK
dc.identifier.isi	WOS:000286933100031	-
dc.publisher.place	United States	en_HK
dc.description.other	The IEEE 26th International Conference on Data Engineering (ICDE 2010), Long Beach, CA., 1-6 March 2010. In International Conference on Data Engineering. Proceedings, 2010, p. 297-308	-
dc.identifier.scopusauthorid	Cheng, R=7201955416	en_HK
dc.identifier.scopusauthorid	Gong, J=47961908400	en_HK
dc.identifier.scopusauthorid	Cheung, DW=34567902600	en_HK
dc.identifier.issnl	1084-4627	-
