Evaluating nearest neighbor queries over uncertain databases

Xie, Xike.; 谢希科.

DOI: 10.5353/th_b4784954

Appears in Collections:
- Computer Science & Information Systems: Theses
- HKU Theses Online

postgraduate thesis: Evaluating nearest neighbor queries over uncertain databases

Title	Evaluating nearest neighbor queries over uncertain databases
Authors	Xie, Xike.; 谢希科.
Advisors	Cheng, CK
Issue Date	2012
Publisher	The University of Hong Kong (Pokfulam, Hong Kong)
Citation	Xie, X. [谢希科]. (2012). Evaluating nearest neighbor queries over uncertain databases. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b4784954
Abstract	Nearest Neighbor (NN for short) queries are important in emerging applications, such as wireless networks, location-based services, and data stream applications, where the data obtained are often imprecise. Recent research models the imprecision or imperfection of the data sources as uncertain data. Handling uncertainty is important because it affects the quality of query answers. Although queries on uncertain data are useful, evaluating them can be costly in terms of I/O or computation. In this thesis, we study how to efficiently evaluate NN queries on uncertain data. Given a query point q and a set of uncertain objects O, the possible nearest neighbor query returns the set of candidates that have a non-zero probability of being the query answer. It is also interesting to ask "which region has the same set of possible nearest neighbors" and "which region has one specific object as its possible nearest neighbor". To reveal the relationship between the query space and nearest neighbor answers, we propose the UV-diagram, where the query space is split into disjoint partitions, such that each partition is associated with a set of objects. If a query point is located inside a partition, its possible nearest neighbors can be retrieved directly. However, the number of such partitions is exponential and the construction effort can be expensive. To tackle this problem, we propose an alternative concept, called the UV-cell, and efficient algorithms for constructing it. The UV-cell has an irregular shape, which makes storage, maintenance, and query evaluation difficult. We therefore design an index structure, called the UV-index, which is an approximate version of the UV-diagram. Extensive experiments show that the UV-index can efficiently answer different variants of NN queries, such as Probabilistic Nearest Neighbor Queries and Continuous Probabilistic Nearest Neighbor Queries.
Another problem studied in this thesis is the trajectory nearest neighbor query. Here the query point is restricted to a pre-known trajectory. In applications (e.g. monitoring potential threats along the trajectory of a flight or vessel), it is useful to derive nearest neighbors for all points on the query trajectory. Simple solutions, such as sampling or approximating the locations of uncertain objects as points, fail to achieve good query quality. To handle this problem, we design efficient algorithms and optimization methods for this query. Experiments show that our solution can efficiently and accurately answer this query. Our solution also scales to large datasets and long trajectories.
Degree	Doctor of Philosophy
Subject	Nearest neighbor analysis (Statistics) Uncertainty (Information theory)
Dept/Program	Computer Science
Persistent Identifier	http://hdl.handle.net/10722/174512
HKU Library Item ID	b4784954
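
The possible nearest neighbor query defined in the abstract can be illustrated with a brute-force sketch. This is not the thesis's UV-diagram or UV-index method; it only shows which objects the query should return. It assumes, purely for illustration, that each uncertain object lies somewhere inside a disc, and the names `UncertainObject`, `min_dist`, `max_dist`, and `possible_nearest_neighbors` are hypothetical. Under that assumption, an object can be the nearest neighbor of q only if its minimum possible distance to q does not exceed the smallest maximum possible distance of any object.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class UncertainObject:
    """Hypothetical model: the true location lies somewhere inside a disc."""
    name: str
    x: float
    y: float
    radius: float


def min_dist(q, obj):
    """Smallest possible distance from query point q to the object."""
    return max(0.0, math.hypot(q[0] - obj.x, q[1] - obj.y) - obj.radius)


def max_dist(q, obj):
    """Largest possible distance from query point q to the object."""
    return math.hypot(q[0] - obj.x, q[1] - obj.y) + obj.radius


def possible_nearest_neighbors(q, objects):
    """Return names of objects that could be the nearest neighbor of q.

    Brute-force scan over all objects (no index). An object is kept if
    its minimum distance is within the smallest maximum distance.
    """
    if not objects:
        return []
    bound = min(max_dist(q, obj) for obj in objects)
    return [obj.name for obj in objects if min_dist(q, obj) <= bound]


if __name__ == "__main__":
    objs = [
        UncertainObject("A", 0.0, 0.0, 1.0),
        UncertainObject("B", 3.0, 0.0, 1.0),
        UncertainObject("C", 10.0, 0.0, 1.0),
    ]
    print(possible_nearest_neighbors((1.0, 0.0), objs))
```

This scan examines every object for every query point, which is the per-query cost that the partition-based approach described in the abstract aims to avoid.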

DC Field	Value	Language
dc.contributor.advisor	Cheng, CK	-
dc.contributor.author	Xie, Xike.	-
dc.contributor.author	谢希科.	-
dc.date.issued	2012	-
dc.identifier.citation	Xie, X. [谢希科]. (2012). Evaluating nearest neighbor queries over uncertain databases. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b4784954	-
dc.identifier.uri	http://hdl.handle.net/10722/174512	-
dc.language	eng	-
dc.publisher	The University of Hong Kong (Pokfulam, Hong Kong)	-
dc.relation.ispartof	HKU Theses Online (HKUTO)	-
dc.rights	The author retains all proprietary rights, (such as patent rights) and the right to use in future works.	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.source.uri	http://hub.hku.hk/bib/B4784954X	-
dc.subject.lcsh	Nearest neighbor analysis (Statistics)	-
dc.subject.lcsh	Uncertainty (Information theory)	-
dc.title	Evaluating nearest neighbor queries over uncertain databases	-
dc.type	PG_Thesis	-
dc.identifier.hkul	b4784954	-
dc.description.thesisname	Doctor of Philosophy	-
dc.description.thesislevel	Doctoral	-
dc.description.thesisdiscipline	Computer Science	-
dc.description.nature	published_or_final_version	-
dc.identifier.doi	10.5353/th_b4784954	-
dc.date.hkucongregation	2012	-
dc.identifier.mmsid	991033485619703414	-
