
postgraduate thesis: Mining multi-faceted data

Title: Mining multi-faceted data
Authors: Wan, Chang (萬暢)
Advisors: Cheung, DWL; Kao, CM
Issue Date: 2013
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Wan, C. [萬暢]. (2013). Mining multi-faceted data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5194751
Abstract: Multi-faceted data contain objects of different types and the relationships between them. With the rapid growth of web-based services, multi-faceted data (e.g., from Flickr, Yago, and IMDB) are becoming increasingly abundant, offering richer information for inferring users’ preferences and providing them with better services. In this study, we look at two types of multi-faceted data, social tagging systems and heterogeneous information networks, and at how to improve services such as resource retrieval and classification on them.

In social tagging systems, resources such as images and videos are annotated with descriptive words called tags. Tag-based resource search and retrieval has been shown to be much more effective than content-based retrieval. With advances in mobile technology, many resources are also geo-tagged with location information. We observe that an ordinary tag (word) can carry different semantics at different locations, and we study how location information can help distinguish the different semantics of a resource’s tags and thereby improve retrieval accuracy. Given a search query, we propose a location-partitioning method that divides all locations into regions such that the query carries distinct semantics in each region. Based on the identified regions, we use location information to estimate the ranking scores of resources for the given query; these scores are learned with the Bayesian Personalized Ranking (BPR) framework. We propose two algorithms, LTD and LPITF, which model the ranking-score tensor with Tucker Decomposition and Pairwise Interaction Tensor Factorization, respectively. Experiments on real datasets show that LTD and LPITF outperform other tag-based resource retrieval methods.

A heterogeneous information network (HIN) models objects of different types and the relationships among them. Meta-paths, which are sequences of object types, represent complex relationships between objects beyond what links in a homogeneous network capture. We study the problem of classifying objects in an HIN. We propose class-level meta-paths and study how they can be used to (1) build more accurate classifiers and (2) improve active learning by identifying the objects for which training labels should be obtained. We show that class-level meta-paths and object classification exhibit an interesting synergy. Our experimental results show that using class-level meta-paths leads to very effective active learning and good classification performance in HINs.
Degree: Master of Philosophy
Subject: Data mining
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/197527
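The retrieval half of the abstract combines two named techniques: ranking scores learned with Bayesian Personalized Ranking (BPR) and a tensor model such as Pairwise Interaction Tensor Factorization (PITF). The sketch below illustrates that combination under stated assumptions; it is not the thesis's LPITF algorithm and it omits the location-partitioning step entirely. It assumes regions have already been computed, treats each observation as a hypothetical (region, resource, tag) triple, and scores a resource for a fixed (region, tag) query as a sum of region-resource and resource-tag factor interactions (a region-tag term would be constant for such a query, so it is dropped). The class, method, and parameter names are hypothetical.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


class PairwiseScorer:
    """Hypothetical PITF-style scorer trained with BPR (illustrative, not LPITF).

    score(region, resource, tag) = <region, resource_R> + <tag, resource_T>.
    The region-tag interaction is omitted because it is constant when
    resources are ranked for a fixed (region, tag) query.
    """

    def __init__(self, n_regions, n_resources, n_tags, k=8, seed=0):
        self.rng = random.Random(seed)
        self.n_resources = n_resources
        self.region = self._init(n_regions, k)
        self.res_r = self._init(n_resources, k)  # resource factors paired with regions
        self.res_t = self._init(n_resources, k)  # resource factors paired with tags
        self.tag = self._init(n_tags, k)

    def _init(self, n, k):
        return [[self.rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n)]

    def score(self, r, i, t):
        return dot(self.region[r], self.res_r[i]) + dot(self.tag[t], self.res_t[i])

    def bpr_step(self, r, t, pos, neg, lr=0.05, reg=0.01):
        # One SGD ascent step on ln sigmoid(x) - (reg / 2) * ||touched params||^2,
        # where x = score(pos) - score(neg) for the same (region, tag) context.
        x = self.score(r, pos, t) - self.score(r, neg, t)
        g = 1.0 - sigmoid(x)  # derivative of ln sigmoid(x) with respect to x
        R, T = self.region[r], self.tag[t]
        Rp, Rn = self.res_r[pos], self.res_r[neg]
        Tp, Tn = self.res_t[pos], self.res_t[neg]
        for f in range(len(R)):
            # Read all old values first so every update uses the same snapshot.
            r_f, t_f, rp, rn, tp, tn = R[f], T[f], Rp[f], Rn[f], Tp[f], Tn[f]
            R[f] += lr * (g * (rp - rn) - reg * r_f)
            T[f] += lr * (g * (tp - tn) - reg * t_f)
            Rp[f] += lr * (g * r_f - reg * rp)
            Rn[f] += lr * (-g * r_f - reg * rn)
            Tp[f] += lr * (g * t_f - reg * tp)
            Tn[f] += lr * (-g * t_f - reg * tn)

    def fit(self, observed, epochs=300):
        # observed: set of (region, resource, tag) triples. Each epoch draws one
        # uniformly random candidate negative per triple and skips the draw if
        # that resource is also observed in the same (region, tag) context.
        for _ in range(epochs):
            for r, i, t in sorted(observed):
                j = self.rng.randrange(self.n_resources)
                if (r, j, t) not in observed:
                    self.bpr_step(r, t, i, j)

    def rank(self, r, t):
        # Resources ordered by descending score for the (region, tag) query.
        return sorted(range(self.n_resources), key=lambda i: -self.score(r, i, t))
```

On a toy dataset where the same tag 0 points to resources 0 and 1 in region 0 but to resources 2 and 3 in region 1, training with `model.fit({(0, 0, 0), (0, 1, 0), (1, 2, 0), (1, 3, 0)})` should place the region-appropriate resources at the top of `model.rank(region, 0)`, which is the kind of location-dependent tag meaning the abstract describes. Uniform negative sampling is the simplest BPR sampling scheme; the thesis may use a different one.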

 

DC Field                          Value
dc.contributor.advisor            Cheung, DWL
dc.contributor.advisor            Kao, CM
dc.contributor.author             Wan, Chang
dc.contributor.author             萬暢
dc.date.accessioned               2014-05-27T23:16:41Z
dc.date.available                 2014-05-27T23:16:41Z
dc.date.issued                    2013
dc.identifier.citation            Wan, C. [萬暢]. (2013). Mining multi-faceted data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5194751
dc.identifier.uri                 http://hdl.handle.net/10722/197527
dc.language                       eng
dc.publisher                      The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof              HKU Theses Online (HKUTO)
dc.rights                         The author retains all proprietary rights, (such as patent rights) and the right to use in future works.
dc.rights                         Creative Commons: Attribution 3.0 Hong Kong License
dc.subject.lcsh                   Data mining
dc.title                          Mining multi-faceted data
dc.type                           PG_Thesis
dc.identifier.hkul                b5194751
dc.description.thesisname         Master of Philosophy
dc.description.thesislevel        Master
dc.description.thesisdiscipline   Computer Science
dc.description.nature             published_or_final_version
dc.identifier.doi                 10.5353/th_b5194751
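The HIN half of the abstract defines meta-paths as sequences of object types. One common way to make that concrete, sketched below as an illustration rather than the thesis's class-level meta-paths, is to count the path instances between two objects by multiplying the adjacency matrices along the meta-path, then optionally normalize the counts with a PathSim-style similarity. The author/paper/venue schema ("A", "P", "V") and all function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain integer matrix product; entry [x][y] counts joined path instances.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def meta_path_counts(meta_path: Sequence[str],
                     adj: Dict[Tuple[str, str], Matrix]) -> Matrix:
    """Count instances of a meta-path such as ("A", "P", "V", "P", "A").

    adj[(S, T)] is the 0/1 adjacency matrix from objects of type S to type T;
    if only (T, S) is stored, the S -> T step uses its transpose.
    """
    def step(s: str, t: str) -> Matrix:
        if (s, t) in adj:
            return adj[(s, t)]
        return transpose(adj[(t, s)])

    result = step(meta_path[0], meta_path[1])
    for s, t in zip(meta_path[1:], meta_path[2:]):
        result = matmul(result, step(s, t))
    return result


def pathsim(m: Matrix, x: int, y: int) -> float:
    # PathSim-style normalization for a symmetric meta-path count matrix.
    denom = m[x][x] + m[y][y]
    return 2.0 * m[x][y] / denom if denom else 0.0
```

For a toy schema where author 0 wrote papers 0 and 1 (published in venues 0 and 1), author 1 wrote paper 2 (venue 0), and author 2 wrote paper 3 (venue 1), the A-P-V-P-A count matrix is [[2, 1, 1], [1, 1, 0], [1, 0, 1]]: author 0 shares a venue-level path with both other authors, while authors 1 and 2 share none. Counts like these are a typical starting point for meta-path-based features in HIN classification.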
