File Download
  Links for fulltext
     (May Require Subscription)
Supplementary

postgraduate thesis: Evaluating nearest neighbor queries over uncertain databases

TitleEvaluating nearest neighbor queries over uncertain databases
Authors
Advisors
Advisor(s):Cheng, CK
Issue Date2012
PublisherThe University of Hong Kong (Pokfulam, Hong Kong)
Citation
Xie, X. [谢希科]. (2012). Evaluating nearest neighbor queries over uncertain databases. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b4784954
AbstractNearest Neighbor (NN in short) queries are important in emerging applications, such as wireless networks, location-based services, and data stream applications, where the data obtained are often imprecise. The imprecision or imperfection of the data sources is modeled by uncertain data in recent research works. Handling uncertainty is important because this issue affects the quality of query answers. Although queries on uncertain data are useful, evaluating the queries on them can be costly, in terms of I/O or computational efficiency. In this thesis, we study how to efficiently evaluate NN queries on uncertain data. Given a query point q and a set of uncertain objects O, the possible nearest neighbor query returns a set of candidates which have non-zero probabilities to be the query answer. It is also interesting to ask \which region has the same set of possible nearest neighbors", and \which region has one specific object as its possible nearest neighbor". To reveal the relationship between the query space and nearest neighbor answers, we propose the UV-diagram, where the query space is split into disjoint partitions, such that each partition is associated with a set of objects. If a query point is located inside the partition, its possible nearest neighbors could be directly retrieved. However, the number of such partitions is exponential and the construction effort can be expensive. To tackle this problem, we propose an alternative concept, called UV-cell, and efficient algorithms for constructing it. The UV-cell has an irregular shape, which incurs difficulties in storage, maintenance, and query evaluation. We design an index structure, called UV-index, which is an approximated version of the UV-diagram. Extensive experiments show that the UV-index could efficiently answer different variants of NN queries, such as Probabilistic Nearest Neighbor Queries, Continuous Probabilistic Nearest Neighbor Queries. Another problem studied in this thesis is the trajectory nearest neighbor query. Here the query point is restricted to a pre-known trajectory. In applications (e.g. monitoring potential threats along a flight/vessel's trajectory), it is useful to derive nearest neighbors for all points on the query trajectory. Simple solutions, such as sampling or approximating the locations of uncertain objects as points, fails to achieve a good query quality. To handle this problem, we design efficient algorithms and optimization methods for this query. Experiments show that our solution can efficiently and accurately answer this query. Our solution is also scalable to large datasets and long trajectories.
DegreeDoctor of Philosophy
SubjectNearest neighbor analysis (Statistics)
Uncertainty (Information theory)
Dept/ProgramComputer Science
Persistent Identifierhttp://hdl.handle.net/10722/174512
HKU Library Item IDb4784954

 

DC FieldValueLanguage
dc.contributor.advisorCheng, CK-
dc.contributor.authorXie, Xike.-
dc.contributor.author谢希科.-
dc.date.issued2012-
dc.identifier.citationXie, X. [谢希科]. (2012). Evaluating nearest neighbor queries over uncertain databases. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b4784954-
dc.identifier.urihttp://hdl.handle.net/10722/174512-
dc.description.abstractNearest Neighbor (NN in short) queries are important in emerging applications, such as wireless networks, location-based services, and data stream applications, where the data obtained are often imprecise. The imprecision or imperfection of the data sources is modeled by uncertain data in recent research works. Handling uncertainty is important because this issue affects the quality of query answers. Although queries on uncertain data are useful, evaluating the queries on them can be costly, in terms of I/O or computational efficiency. In this thesis, we study how to efficiently evaluate NN queries on uncertain data. Given a query point q and a set of uncertain objects O, the possible nearest neighbor query returns a set of candidates which have non-zero probabilities to be the query answer. It is also interesting to ask \which region has the same set of possible nearest neighbors", and \which region has one specific object as its possible nearest neighbor". To reveal the relationship between the query space and nearest neighbor answers, we propose the UV-diagram, where the query space is split into disjoint partitions, such that each partition is associated with a set of objects. If a query point is located inside the partition, its possible nearest neighbors could be directly retrieved. However, the number of such partitions is exponential and the construction effort can be expensive. To tackle this problem, we propose an alternative concept, called UV-cell, and efficient algorithms for constructing it. The UV-cell has an irregular shape, which incurs difficulties in storage, maintenance, and query evaluation. We design an index structure, called UV-index, which is an approximated version of the UV-diagram. Extensive experiments show that the UV-index could efficiently answer different variants of NN queries, such as Probabilistic Nearest Neighbor Queries, Continuous Probabilistic Nearest Neighbor Queries. Another problem studied in this thesis is the trajectory nearest neighbor query. Here the query point is restricted to a pre-known trajectory. In applications (e.g. monitoring potential threats along a flight/vessel's trajectory), it is useful to derive nearest neighbors for all points on the query trajectory. Simple solutions, such as sampling or approximating the locations of uncertain objects as points, fails to achieve a good query quality. To handle this problem, we design efficient algorithms and optimization methods for this query. Experiments show that our solution can efficiently and accurately answer this query. Our solution is also scalable to large datasets and long trajectories.-
dc.languageeng-
dc.publisherThe University of Hong Kong (Pokfulam, Hong Kong)-
dc.relation.ispartofHKU Theses Online (HKUTO)-
dc.rightsThe author retains all proprietary rights, (such as patent rights) and the right to use in future works.-
dc.rightsThis work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.-
dc.source.urihttp://hub.hku.hk/bib/B4784954X-
dc.subject.lcshNearest neighbor analysis (Statistics)-
dc.subject.lcshUncertainty (Information theory)-
dc.titleEvaluating nearest neighbor queries over uncertain databases-
dc.typePG_Thesis-
dc.identifier.hkulb4784954-
dc.description.thesisnameDoctor of Philosophy-
dc.description.thesislevelDoctoral-
dc.description.thesisdisciplineComputer Science-
dc.description.naturepublished_or_final_version-
dc.identifier.doi10.5353/th_b4784954-
dc.date.hkucongregation2012-
dc.identifier.mmsid991033485619703414-

Export via OAI-PMH Interface in XML Formats


OR


Export to Other Non-XML Formats