
postgraduate thesis: Effective algorithms for processing crowdsourced data

Title: Effective algorithms for processing crowdsourced data
Authors: Shan, Caihua (单才华)
Advisors: Cheng, CKR; Mamoulis, N
Issue Date: 2019
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Shan, C. [单才华]. (2019). Effective algorithms for processing crowdsourced data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Crowdsourcing is an effective way to harness human effort to address computer-hard problems. These problems, such as entity resolution, sentiment analysis, data collection/cleaning and object ranking, cannot be handled by machines alone and benefit from human cognitive ability. Typically, requesters first upload the problems to be solved to a crowdsourcing platform. The platform then transforms each problem into small tasks and invites workers (the crowd) to complete them in exchange for monetary incentives. In this thesis, we address three challenging issues in crowdsourcing related to its applications and platform design.
In the first problem, we study the use of crowdsourcing for data collection. In particular, we focus on crowdsourcing tabular data, i.e., a collection of related items that are structured in a tabular form. Each row corresponds to an entity, and each column represents a particular attribute of entities. Existing work often treats related attributes independently, leading to suboptimal performance. Therefore, we present T-Crowd, which integrates each worker’s answers on different attributes to effectively learn the worker’s trustworthiness and the true data values. The attribute relationship information is also used to guide task allocation to workers.
In the second problem, we design a general early-stopping criterion for the crowdsourced ranking problem (i.e., inferring an ordering of a set of objects from crowd opinions). In a general ranking process, a set of tasks is generated from the objects (e.g., pairwise comparisons between two objects) and then passed to workers to answer. After collection, all the answers are aggregated to infer the ranking. Intuitively, the more answers are collected, the more accurate the final ranking. However, it is often hard to decide how many answers to collect (i.e., the budget); if a very large budget is used, much of the ranking effort is wasted. To terminate a ranking process early while still achieving a high-quality result, we use a set of statistical tools that estimate the quality of the ranking result at any stage of the crowdsourcing process, and terminate as soon as the desired quality is achieved.
In the third problem, we study task assignment in commercial crowdsourcing platforms (i.e., how to order tasks for an arriving worker). Previous work recommends tasks to workers in a personalized way using supervised learning methods. However, these methods cannot handle the dynamics of the environment and may produce suboptimal results. To address this issue, we use a Deep Q-Network (DQN), a reinforcement learning method that uses a neural network to estimate the expected long-term return of recommending a task. DQN inherently considers immediate and future rewards simultaneously and can be updated in real time to deal with evolving data and dynamic changes. We further design two DQNs that capture the benefits of both workers and requesters and maximize the profit of the platform. To learn value functions in DQN effectively, we also propose novel state representations, carefully design the computation of Q values, and predict transition probabilities and future states.
Degree: Doctor of Philosophy
Subject: Database management; Crowdsourcing
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/282065

 

DC Field | Value | Language
dc.contributor.advisor | Cheng, CKR | -
dc.contributor.advisor | Mamoulis, N | -
dc.contributor.author | Shan, Caihua | -
dc.contributor.author | 单才华 | -
dc.date.accessioned | 2020-04-26T03:00:55Z | -
dc.date.available | 2020-04-26T03:00:55Z | -
dc.date.issued | 2019 | -
dc.identifier.citation | Shan, C. [单才华]. (2019). Effective algorithms for processing crowdsourced data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/282065 | -
dc.description.abstract | (same text as the Abstract field above) | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Database management | -
dc.subject.lcsh | Crowdsourcing | -
dc.title | Effective algorithms for processing crowdsourced data | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Computer Science | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2020 | -
dc.identifier.mmsid | 991044220086303414 | -
