Effective algorithms for processing crowdsourced data

Shan, Caihua; 单才华

Appears in Collections:
- HKU Theses Online
- Computer Science: Theses

postgraduate thesis: Effective algorithms for processing crowdsourced data

Title	Effective algorithms for processing crowdsourced data
Authors	Shan, Caihua 单才华
Advisors	Cheng, CKR; Mamoulis, N
Issue Date	2019
Publisher	The University of Hong Kong (Pokfulam, Hong Kong)
Citation	Shan, C. [单才华]. (2019). Effective algorithms for processing crowdsourced data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract	Crowdsourcing is an effective way to harness human effort to address computer-hard problems. These problems, such as entity resolution, sentiment analysis, data collection/cleaning and object ranking, cannot be handled by machines automatically but can be solved more effectively with human cognitive ability. Typically, requesters first upload the problems to be solved to a crowdsourcing platform. The platform then breaks each problem into small tasks and invites workers (the crowd) to complete the tasks in return for monetary incentives. In this thesis, we address three challenging issues in crowdsourcing related to its applications and platform design. In the first problem, we study the use of crowdsourcing for data collection. In particular, we focus on crowdsourcing tabular data, i.e., a collection of related items structured in tabular form. Each row corresponds to an entity, and each column represents a particular attribute of the entities. Existing work often treats related attributes independently, leading to suboptimal performance. We therefore present T-Crowd, which integrates each worker’s answers on different attributes to effectively learn the worker’s trustworthiness and the true data values. The attribute relationship information is also used to guide task allocation to workers. In the second problem, we design a general early-stopping criterion for the crowdsourced ranking problem (i.e., inferring an order over a set of objects based on crowd opinions). In a general ranking process, a set of tasks is generated from the objects (e.g., pairwise comparisons between two objects) and then passed to workers to answer. After collection, all the answers are aggregated to infer the ranking. Intuitively, the more answers are collected, the more accurate the final ranking is. However, it is often hard to decide how many answers are required (i.e., the budget); if a very large budget is used, much of the ranking effort will be wasted. To terminate a ranking process early while still achieving a high-quality ranking result, we use a set of statistical tools that estimate the quality of the ranking result at any stage of the crowdsourcing process, and we terminate as soon as the desired quality is achieved. In the third problem, we study task assignment in commercial crowdsourcing platforms (i.e., how to sort tasks for an arriving worker). Previous work performs personalized recommendation of tasks to workers via supervised learning methods. However, these methods cannot handle the dynamics of the environment and may produce suboptimal results. To address this issue, we use Deep Q-Network (DQN), a reinforcement learning method that uses a neural network to estimate the expected long-term return of recommending a task. DQN inherently considers immediate and future rewards simultaneously and can be updated in real time to deal with evolving data and dynamic changes. We further design two DQNs that capture the benefits of both workers and requesters and maximize the profit of the platform. To learn value functions in DQN effectively, we also propose novel state representations, carefully design the computation of Q values, and predict transition probabilities and future states.
Degree	Doctor of Philosophy
Subject	Database management Crowdsourcing
Dept/Program	Computer Science
Persistent Identifier	http://hdl.handle.net/10722/282065

DC Field	Value	Language
dc.contributor.advisor	Cheng, CKR	-
dc.contributor.advisor	Mamoulis, N	-
dc.contributor.author	Shan, Caihua	-
dc.contributor.author	单才华	-
dc.date.accessioned	2020-04-26T03:00:55Z	-
dc.date.available	2020-04-26T03:00:55Z	-
dc.date.issued	2019	-
dc.identifier.citation	Shan, C. [单才华]. (2019). Effective algorithms for processing crowdsourced data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.	-
dc.identifier.uri	http://hdl.handle.net/10722/282065	-
dc.language	eng	-
dc.publisher	The University of Hong Kong (Pokfulam, Hong Kong)	-
dc.relation.ispartof	HKU Theses Online (HKUTO)	-
dc.rights	The author retains all proprietary rights (such as patent rights) and the right to use in future works.	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.subject.lcsh	Database management	-
dc.subject.lcsh	Crowdsourcing	-
dc.title	Effective algorithms for processing crowdsourced data	-
dc.type	PG_Thesis	-
dc.description.thesisname	Doctor of Philosophy	-
dc.description.thesislevel	Doctoral	-
dc.description.thesisdiscipline	Computer Science	-
dc.description.nature	published_or_final_version	-
dc.date.hkucongregation	2020	-
dc.identifier.mmsid	991044220086303414	-
