Mining heterogeneous information networks

Huang, Zhipeng; 黄智鹏


DOI: 10.5353/th_991044146571903414

Appears in Collections:
- HKU Theses Online
- Computer Science: Theses

postgraduate thesis: Mining heterogeneous information networks

Title	Mining heterogeneous information networks
Authors	Huang, Zhipeng 黄智鹏
Advisors	Kao, CM; Cheng, CK; Mamoulis, N
Issue Date	2019
Publisher	The University of Hong Kong (Pokfulam, Hong Kong)
Citation	Huang, Z. [黄智鹏]. (2019). Mining heterogeneous information networks. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract	Heterogeneous information networks (HINs), such as DBLP, YAGO, DBpedia and Freebase, have recently received a lot of attention. These graph data sources contain a vast number of inter-related facts, and they are used to facilitate the discovery of interesting knowledge. In this thesis, we address three challenging problems in mining HINs: (i) relevance search, (ii) entity embedding, and (iii) query recommendation with HINs. First, we study relevance search on large-scale HINs. We propose a model named meta structure, which is essentially an extension of meta path, to capture the relationship between two entities in an HIN. For example, a researcher may want to find pairs of authors who have published papers in the same venue and have also mentioned the same topic. The researcher can specify this query using our meta structure to efficiently retrieve such entity pairs in a large HIN. We also propose a data structure named i-LTable to boost the efficiency of query evaluation. Our experiments on real HINs show that meta structure is more effective than meta path in various tasks, such as classification, clustering and kNN search. Next, we study entity embedding on HINs. Our goal is to represent each entity of an HIN as a vector, such that the proximity in the original HIN is preserved. Specifically, we propose an objective function that aims to minimize the distance between two probability distributions, one modeling the meta path-based proximities and the other modeling the proximities in the embedded vector space. We also investigate the use of negative sampling to accelerate the optimization process. As shown in our extensive experimental evaluation, our method creates embeddings of high quality and has superior performance in several data mining tasks compared to state-of-the-art network embedding methods.
Finally, we study how to use a knowledge base modeled as an HIN to improve the quality of query recommendation for search engines. Specifically, we examine two information sources: (1) a knowledge base HIN, such as YAGO or Freebase; and (2) a query log from a search engine. We study how to use these sources to find new entities useful for query recommendation. We further study a hybrid framework that integrates different query recommendation methods effectively. As shown in the experiments, our proposed approaches provide better recommendations than existing solutions for long-tail queries. In addition, our implemented system needs less than 100 ms to generate query recommendations. Thus, our solution is suitable for providing online query recommendation services for search engines.
Degree	Doctor of Philosophy
Subject	Computer networks; Data mining
Dept/Program	Computer Science
Persistent Identifier	http://hdl.handle.net/10722/278410
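
The author-pair query described in the abstract (same venue and same topic) can be illustrated with a minimal sketch. This is not the thesis's algorithm and does not use i-LTable; the toy `papers` data, the function name `related_authors`, and its flags are all hypothetical. It only shows the distinction the abstract draws: a meta path constrains one intermediate type (for example, venue), while the combined query requires the same pair of papers to share both a venue and a topic.

```python
from itertools import combinations

# Toy HIN (hypothetical data): each paper links to authors, one venue, and topics.
papers = {
    "p1": {"authors": ["alice"], "venue": "KDD", "topics": ["embedding"]},
    "p2": {"authors": ["bob"], "venue": "KDD", "topics": ["embedding", "search"]},
    "p3": {"authors": ["carol"], "venue": "KDD", "topics": ["search"]},
    "p4": {"authors": ["dave"], "venue": "VLDB", "topics": ["embedding"]},
}


def related_authors(papers, need_venue, need_topic):
    """Return author pairs connected through two distinct papers.

    need_venue only  -> Author-Paper-Venue-Paper-Author (a meta path)
    need_venue+topic -> the two papers must share a venue AND a topic
                        (the combined query from the abstract)
    """
    pairs = set()
    for (_, p), (_, q) in combinations(papers.items(), 2):
        same_venue = p["venue"] == q["venue"]
        same_topic = bool(set(p["topics"]) & set(q["topics"]))
        if (same_venue or not need_venue) and (same_topic or not need_topic):
            for a in p["authors"]:
                for b in q["authors"]:
                    if a != b:
                        pairs.add(frozenset((a, b)))
    return pairs


venue_only = related_authors(papers, need_venue=True, need_topic=False)
venue_and_topic = related_authors(papers, need_venue=True, need_topic=True)
print(sorted(sorted(pair) for pair in venue_only))
print(sorted(sorted(pair) for pair in venue_and_topic))
```

In this toy data, alice and carol are related under the venue-only path but drop out of the combined query, because their papers share a venue but no topic.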

DC Field	Value	Language
dc.contributor.advisor	Kao, CM	-
dc.contributor.advisor	Cheng, CK	-
dc.contributor.advisor	Mamoulis, N	-
dc.contributor.author	Huang, Zhipeng	-
dc.contributor.author	黄智鹏	-
dc.date.accessioned	2019-10-09T01:17:36Z	-
dc.date.available	2019-10-09T01:17:36Z	-
dc.date.issued	2019	-
dc.identifier.citation	Huang, Z. [黄智鹏]. (2019). Mining heterogeneous information networks. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.	-
dc.identifier.uri	http://hdl.handle.net/10722/278410	-
dc.language	eng	-
dc.publisher	The University of Hong Kong (Pokfulam, Hong Kong)	-
dc.relation.ispartof	HKU Theses Online (HKUTO)	-
dc.rights	The author retains all proprietary rights (such as patent rights) and the right to use in future works.	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.subject.lcsh	Computer networks	-
dc.subject.lcsh	Data mining	-
dc.title	Mining heterogeneous information networks	-
dc.type	PG_Thesis	-
dc.description.thesisname	Doctor of Philosophy	-
dc.description.thesislevel	Doctoral	-
dc.description.thesisdiscipline	Computer Science	-
dc.description.nature	published_or_final_version	-
dc.identifier.doi	10.5353/th_991044146571903414	-
dc.date.hkucongregation	2019	-
dc.identifier.mmsid	991044146571903414	-
