postgraduate thesis: Mining heterogeneous information networks
Title | Mining heterogeneous information networks |
---|---|
Authors | Huang, Zhipeng (黄智鹏) |
Advisors | Kao, CM; Cheng, CK; Mamoulis, N |
Issue Date | 2019 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Huang, Z. [黄智鹏]. (2019). Mining heterogeneous information networks. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | Heterogeneous information networks (HINs), such as DBLP, YAGO, DBpedia and Freebase, have recently received considerable attention. These graph data sources contain a vast number of inter-related facts and can facilitate the discovery of interesting knowledge. In this thesis, we address three challenging problems in mining HINs: (i) relevance search, (ii) entity embedding, and (iii) query recommendation using HINs. First, we study relevance search on large-scale HINs. We propose a model named meta structure, which is essentially an extension of the meta path, to capture the relationship between two entities in an HIN. For example, a researcher may want to find pairs of authors who have published papers in the same venue and have also mentioned the same topic. The researcher can then specify this query as a meta structure and efficiently retrieve such entity pairs from a large HIN. We also propose a data structure named i-LTable to speed up query evaluation. Our experiments on real HINs show that meta structure is more effective than meta path in various tasks, such as classification, clustering and kNN search. Next, we study entity embedding on HINs. Our goal is to represent each entity of an HIN as a vector such that the proximity in the original HIN is preserved. Specifically, we propose an objective function that minimizes the distance between two probability distributions: one modeling the meta path-based proximities, the other modeling the proximities in the embedded vector space. We also investigate the use of negative sampling to accelerate the optimization process. As shown in our extensive experimental evaluation, our method creates high-quality embeddings and outperforms state-of-the-art network embedding methods in several data mining tasks. Finally, we study how to use a knowledge base modeled as an HIN to improve the quality of query recommendation for search engines. Specifically, we examine two information sources: (1) a knowledge base HIN, such as YAGO or Freebase; and (2) a query log from a search engine. We study how to use these sources to find new entities that are useful for query recommendation. We further study a hybrid framework that effectively integrates different query recommendation methods. As shown in the experiments, our proposed approaches provide better recommendations than existing solutions for long-tail queries. In addition, our implemented system generates query recommendations in less than 100 ms, which makes our solution suitable for providing online query recommendation services for search engines. |
Degree | Doctor of Philosophy |
Subject | Computer networks; Data mining |
Dept/Program | Computer Science |
Persistent Identifier | http://hdl.handle.net/10722/278410 |
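
The difference between a meta path and a meta structure can be illustrated by counting instances of each on a tiny HIN. The Python sketch below is a minimal illustration under stated assumptions: the toy authors, papers, venues and topics, and the helper names `count_meta_path_apvpa` and `count_meta_structure`, are hypothetical, and plain instance counting stands in for the thesis's actual relevance measure and its i-LTable index.

```python
from itertools import product

# Toy HIN (hypothetical data). Each dict maps a node to its neighbours of one type.
writes = {"alice": {"p1"}, "bob": {"p2"}, "carol": {"p3"}}          # Author -> Paper
venue_of = {"p1": {"KDD"}, "p2": {"KDD"}, "p3": {"KDD"}}            # Paper -> Venue
topics_of = {"p1": {"graphs"}, "p2": {"graphs"}, "p3": {"vision"}}  # Paper -> Topic


def count_meta_path_apvpa(a, b):
    """Instances of the meta path Author-Paper-Venue-Paper-Author from a to b."""
    return sum(len(venue_of[p] & venue_of[q]) for p, q in product(writes[a], writes[b]))


def count_meta_structure(a, b):
    """Instances of the meta structure Author-Paper-(Venue, Topic)-Paper-Author:
    the two papers must share a venue AND a topic. For a fixed paper pair, every
    (shared venue, shared topic) combination is a distinct instance."""
    return sum(
        len(venue_of[p] & venue_of[q]) * len(topics_of[p] & topics_of[q])
        for p, q in product(writes[a], writes[b])
    )


for other in ("bob", "carol"):
    print(other, count_meta_path_apvpa("alice", other), count_meta_structure("alice", other))
# bob 1 1
# carol 1 0
```

Under the meta path, Bob and Carol look equally related to Alice because all three published at the same venue; the meta structure additionally requires a shared topic, so it separates them.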
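
The embedding objective can be sketched as a KL divergence between two distributions. The Python sketch below assumes a LINE-style formulation that may differ from the thesis: for each source node u, meta path instance counts are normalised into an empirical distribution over target nodes, the model distribution is a softmax over inner products of embedding vectors, and the standard skip-gram negative-sampling loss is shown as the cheaper surrogate. The proximity counts, embeddings and function names are hypothetical.

```python
import math

# Assumption: p_hat(v | u) comes from meta path counts, p(v | u) is a softmax
# over inner products x_u . x_v, and the objective is sum_u KL(p_hat || p).


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def log_softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def kl_objective(proximity, emb):
    """proximity[u][v]: meta path instance counts; emb[u]: embedding of node u."""
    total = 0.0
    for u, row in enumerate(proximity):
        s = sum(row)
        if s == 0:
            continue  # u reaches no node via the meta path
        p_hat = [w / s for w in row]
        log_p = log_softmax([dot(emb[u], x_v) for x_v in emb])
        total += sum(ph * (math.log(ph) - lp) for ph, lp in zip(p_hat, log_p) if ph > 0)
    return total


def log_sigmoid(x):
    # Numerically stable log(1 / (1 + exp(-x))).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def neg_sampling_loss(emb, u, v, negatives):
    """Negative sampling for one observed pair (u, v): costs O(k) for k sampled
    negatives instead of the O(|V|) normalisation of the full softmax."""
    loss = -log_sigmoid(dot(emb[u], emb[v]))
    for n in negatives:
        loss -= log_sigmoid(-dot(emb[u], emb[n]))
    return loss


proximity = [[0, 2, 1], [2, 0, 0], [1, 0, 0]]  # made-up counts for three nodes
emb = [[0.0, 0.0] for _ in range(3)]           # all-zero vectors give a uniform p(v | u)
print(round(kl_objective(proximity, emb), 4))       # 2.6593
print(round(neg_sampling_loss(emb, 0, 1, [2]), 4))  # 1.3863
```

Training would adjust `emb` to lower either quantity, for example by gradient descent; the optimiser is omitted here to keep the sketch short.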
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Kao, CM | - |
dc.contributor.advisor | Cheng, CK | - |
dc.contributor.advisor | Mamoulis, N | - |
dc.contributor.author | Huang, Zhipeng | - |
dc.contributor.author | 黄智鹏 | - |
dc.date.accessioned | 2019-10-09T01:17:36Z | - |
dc.date.available | 2019-10-09T01:17:36Z | - |
dc.date.issued | 2019 | - |
dc.identifier.citation | Huang, Z. [黄智鹏]. (2019). Mining heterogeneous information networks. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/278410 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Computer networks | - |
dc.subject.lcsh | Data mining | - |
dc.title | Mining heterogeneous information networks | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Computer Science | - |
dc.description.nature | published_or_final_version | - |
dc.identifier.doi | 10.5353/th_991044146571903414 | - |
dc.date.hkucongregation | 2019 | - |
dc.identifier.mmsid | 991044146571903414 | - |