Conference Paper: Scaling Similarity Joins over Tree-Structured Data

Title: Scaling Similarity Joins over Tree-Structured Data
Authors: Tang, Y; Cai, Y; Mamoulis, N
Issue Date: 2015
Publisher: Very Large Data Base (VLDB) Endowment Inc. The Journal's web site is located at http://vldb.org/pvldb/index.html
Citation: Proceedings of the 41st International Conference on Very Large Data Bases, Kohala Coast, Hawaii, 31 August-4th September 2015. In Proceedings of the VLDB Endowment, 2015, v. 8 n. 11, p. 1130-1141
Abstract: Given a large collection of tree-structured objects (e.g., XML documents), the similarity join finds the pairs of objects that are similar to each other, based on a similarity threshold and a tree edit distance measure. The state-of-the-art similarity join methods compare simpler approximations of the objects (e.g., strings), in order to prune pairs that cannot be part of the similarity join result based on distance bounds derived by the approximations. In this paper, we propose a novel similarity join approach, which is based on the dynamic decomposition of the tree objects into subgraphs, according to the similarity threshold. Our technique avoids computing the exact distance between two tree objects, if the objects do not share at least one common subgraph. In order to scale up the join, the computed subgraphs are managed in a two-layer index. Our experimental results on real and synthetic data collections show that our approach outperforms the state-of-the-art methods by up to an order of magnitude.
Persistent Identifier: http://hdl.handle.net/10722/213720
ISSN: 2150-8097

 

DC Field | Value | Language
dc.contributor.author | Tang, Y | -
dc.contributor.author | Cai, Y | -
dc.contributor.author | Mamoulis, N | -
dc.date.accessioned | 2015-08-13T01:43:40Z | -
dc.date.available | 2015-08-13T01:43:40Z | -
dc.date.issued | 2015 | -
dc.identifier.citation | Proceedings of the 41st International Conference on Very Large Data Bases, Kohala Coast, Hawaii, 31 August-4th September 2015. In Proceedings of the VLDB Endowment, 2015, v. 8 n. 11, p. 1130-1141 | -
dc.identifier.issn | 2150-8097 | -
dc.identifier.uri | http://hdl.handle.net/10722/213720 | -
dc.description.abstract | Given a large collection of tree-structured objects (e.g., XML documents), the similarity join finds the pairs of objects that are similar to each other, based on a similarity threshold and a tree edit distance measure. The state-of-the-art similarity join methods compare simpler approximations of the objects (e.g., strings), in order to prune pairs that cannot be part of the similarity join result based on distance bounds derived by the approximations. In this paper, we propose a novel similarity join approach, which is based on the dynamic decomposition of the tree objects into subgraphs, according to the similarity threshold. Our technique avoids computing the exact distance between two tree objects, if the objects do not share at least one common subgraph. In order to scale up the join, the computed subgraphs are managed in a two-layer index. Our experimental results on real and synthetic data collections show that our approach outperforms the state-of-the-art methods by up to an order of magnitude. | -
dc.language | eng | -
dc.publisher | Very Large Data Base (VLDB) Endowment Inc. The Journal's web site is located at http://vldb.org/pvldb/index.html | -
dc.relation.ispartof | Proceedings of the VLDB Endowment | -
dc.rights | Creative Commons: Attribution 3.0 Hong Kong License | -
dc.title | Scaling Similarity Joins over Tree-Structured Data | -
dc.type | Conference_Paper | -
dc.identifier.email | Tang, Y: ytang@cs.hku.hk | -
dc.identifier.email | Cai, Y: ylcai@cs.hku.hk | -
dc.identifier.email | Mamoulis, N: nikos@cs.hku.hk | -
dc.identifier.authority | Mamoulis, N=rp00155 | -
dc.description.nature | published_or_final_version | -
dc.identifier.doi | 10.14778/2809974.2809976 | -
dc.identifier.hkuros | 246267 | -
dc.identifier.volume | 8 | -
dc.identifier.issue | 11 | -
dc.identifier.spage | 1130 | -
dc.identifier.epage | 1141 | -
dc.publisher.place | United States | -
