postgraduate thesis: Ranking and similarity queries on complex data types

Title: Ranking and similarity queries on complex data types
Authors: Cai, Yilun [蔡奕倫]
Issue Date: 2014
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Cai, Y. [蔡奕倫]. (2014). Ranking and similarity queries on complex data types. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5435640
Abstract: Ranking queries and similarity queries are elementary operations with many important applications. Over the past few decades, a large body of research has investigated the efficient evaluation of various ranking and similarity queries in databases. In this thesis, ranking and similarity queries on three complex data types are studied, namely the multidimensional cube, the object summary, and the tree. Efficient and effective solutions are proposed for their related applications.

First, the evaluation of ranking queries on multidimensional cubes is studied. In exploratory data analysis, a relation can be considered as a multidimensional cube in order to investigate the relationships among its attributes. Given a relation with records that can be ranked, an interesting problem is to identify selection conditions for the relation that yield sub-relations in which an input record qualifies and ranks as high as possible among the qualifying tuples. The ranking of the input record in a sub-relation measures the quality of the corresponding multidimensional cube. In this thesis, a standing maximization problem, which aims to identify a multidimensional cube of high quality, is extensively studied. Because the problem is NP-hard, three greedy methods are proposed that explore the search space only partially while striving to identify sub-optimal solutions of high quality.

Next, the efficient evaluation of ranking queries on object summaries is investigated. An object summary is a tree structure of tuples that summarizes the context of a particular data subject tuple. The object summary has been used as a model of keyword search in relational databases: given a set of keywords, the objective is to identify the data subject tuples relevant to the keywords and their corresponding object summaries. However, a keyword search may return a large number of object summaries, which raises the issue of ranking them effectively and efficiently so that only the most important ones are presented to the user. In this thesis, a model that ranks object summaries according to their relevance to a set of input thematic keywords is introduced. Efficient algorithms are proposed to answer the resulting thematic ranking query.

Finally, the similarity join query on tree-structured data is studied. Tree-structured data are ubiquitous nowadays, and a number of applications require efficient management of such data. Given a large collection of tree-structured objects (e.g., XML documents), the similarity join finds the pairs of objects that are similar to each other, based on a similarity threshold and a tree edit distance measure. The state-of-the-art similarity join methods compare simpler approximations of the objects (e.g., strings) in order to prune pairs that cannot be part of the join result, using distance bounds derived from the approximations. In this thesis, we propose a novel similarity join approach based on the dynamic decomposition of the tree objects into subgraphs according to the similarity threshold. Our technique avoids computing the exact distance between two tree objects if the objects do not share at least one common subgraph.
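The filter-and-verify pattern that the abstract attributes to existing tree similarity joins can be illustrated with a minimal Python sketch. This is not the subgraph-decomposition method proposed in the thesis. It assumes ordered, labeled trees encoded as `(label, children)` tuples, unit-cost edit operations, and a simple size-difference lower bound as the filter; the function names `size`, `forest_distance`, `tree_edit_distance`, and `similarity_join` are hypothetical and chosen only for this illustration.

```python
from functools import lru_cache
from itertools import combinations

# Assumed encoding: a tree is (label, children), children being a tuple of trees.


def size(tree):
    """Number of nodes in a tree."""
    _, children = tree
    return 1 + sum(size(child) for child in children)


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests (tuples of trees)."""
    if not f and not g:
        return 0
    if not f:
        return sum(size(t) for t in g)  # insert every node of g
    if not g:
        return sum(size(t) for t in f)  # delete every node of f
    (v_label, v_children), (w_label, w_children) = f[-1], g[-1]
    return min(
        # Delete the rightmost root of f; its children take its place.
        forest_distance(f[:-1] + v_children, g) + 1,
        # Insert the rightmost root of g.
        forest_distance(f, g[:-1] + w_children) + 1,
        # Map the two rightmost roots onto each other (rename if labels differ).
        forest_distance(v_children, w_children)
        + forest_distance(f[:-1], g[:-1])
        + (v_label != w_label),
    )


def tree_edit_distance(t1, t2):
    """Exact tree edit distance, used here as the verification step."""
    return forest_distance((t1,), (t2,))


def similarity_join(trees, threshold):
    """Return index pairs (i, j), i < j, whose edit distance is <= threshold."""
    result = []
    for (i, a), (j, b) in combinations(enumerate(trees), 2):
        # Filter: each insert or delete changes the node count by one, so the
        # size difference is a lower bound on the edit distance.
        if abs(size(a) - size(b)) > threshold:
            continue
        # Verify: compute the exact distance only for surviving pairs.
        if tree_edit_distance(a, b) <= threshold:
            result.append((i, j))
    return result
```

The size filter is deliberately crude. It only shows where a cheap bound sits relative to the expensive exact computation, which is the step that the thesis aims to avoid for pairs sharing no common subgraph.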
Degree: Doctor of Philosophy
Subject: Database management
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/209507

 

DC Field | Value | Language
dc.contributor.author | Cai, Yilun | -
dc.contributor.author | 蔡奕倫 | -
dc.date.accessioned | 2015-04-23T23:10:55Z | -
dc.date.available | 2015-04-23T23:10:55Z | -
dc.date.issued | 2014 | -
dc.identifier.citation | Cai, Y. [蔡奕倫]. (2014). Ranking and similarity queries on complex data types. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5435640 | -
dc.identifier.uri | http://hdl.handle.net/10722/209507 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | Creative Commons: Attribution 3.0 Hong Kong License | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.subject.lcsh | Database management | -
dc.title | Ranking and similarity queries on complex data types | -
dc.type | PG_Thesis | -
dc.identifier.hkul | b5435640 | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Computer Science | -
dc.description.nature | published_or_final_version | -
dc.identifier.doi | 10.5353/th_b5435640 | -
