Keyword and fact identification and annotation : machine learning approaches

Liang, Yuzhi; 梁予之

DOI: 10.5353/th_991044014362103414

Appears in Collections:
- HKU Theses Online
- Computer Science: Theses

postgraduate thesis: Keyword and fact identification and annotation : machine learning approaches

Title	Keyword and fact identification and annotation : machine learning approaches
Authors	Liang, Yuzhi; 梁予之
Advisors	Hui, CK; Yiu, SM
Issue Date	2017
Publisher	The University of Hong Kong (Pokfulam, Hong Kong)
Citation	Liang, Y. [梁予之]. (2017). Keyword and fact identification and annotation : machine learning approaches. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract	In the big data era, automatically retrieving useful information from diverse documents is an active research topic in artificial intelligence. Because the popularity of social media overwhelms people with a massive amount of information, techniques are needed to identify useful information efficiently. To better analyze this problem, we divide documents into two types: traditional documents and social media documents. Traditional documents, such as news reports and articles, use formal language structures and strictly obey grammatical rules. Social media documents, such as posts on Facebook or Twitter, are often short and informal. In this dissertation, we investigate keyword and key sentence identification and annotation in various types of documents, using machine learning in the information retrieval process. For social media documents, we first propose two novel frameworks for new word detection in Chinese tweets. We then design a word annotation mechanism that interprets Tweet-born words by automatically tagging them with text labels. In addition, we introduce a hierarchical clustering algorithm to cluster relevant words. For traditional documents, we propose a relevant sentence selection method that can improve the performance of question-answering systems.
Degree	Doctor of Philosophy
Subject	Data mining; Machine learning
Dept/Program	Computer Science
Persistent Identifier	http://hdl.handle.net/10722/255035

DC Field	Value	Language
dc.contributor.advisor	Hui, CK	-
dc.contributor.advisor	Yiu, SM	-
dc.contributor.author	Liang, Yuzhi	-
dc.contributor.author	梁予之	-
dc.date.accessioned	2018-06-21T03:42:00Z	-
dc.date.available	2018-06-21T03:42:00Z	-
dc.date.issued	2017	-
dc.identifier.citation	Liang, Y. [梁予之]. (2017). Keyword and fact identification and annotation : machine learning approaches. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.	-
dc.identifier.uri	http://hdl.handle.net/10722/255035	-
dc.description.abstract	In the big data era, automatically retrieving useful information from diverse documents is an active research topic in artificial intelligence. Because the popularity of social media overwhelms people with a massive amount of information, techniques are needed to identify useful information efficiently. To better analyze this problem, we divide documents into two types: traditional documents and social media documents. Traditional documents, such as news reports and articles, use formal language structures and strictly obey grammatical rules. Social media documents, such as posts on Facebook or Twitter, are often short and informal. In this dissertation, we investigate keyword and key sentence identification and annotation in various types of documents, using machine learning in the information retrieval process. For social media documents, we first propose two novel frameworks for new word detection in Chinese tweets. We then design a word annotation mechanism that interprets Tweet-born words by automatically tagging them with text labels. In addition, we introduce a hierarchical clustering algorithm to cluster relevant words. For traditional documents, we propose a relevant sentence selection method that can improve the performance of question-answering systems.	-
dc.language	eng	-
dc.publisher	The University of Hong Kong (Pokfulam, Hong Kong)	-
dc.relation.ispartof	HKU Theses Online (HKUTO)	-
dc.rights	The author retains all proprietary rights, (such as patent rights) and the right to use in future works.	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.subject.lcsh	Data mining	-
dc.subject.lcsh	Machine learning	-
dc.title	Keyword and fact identification and annotation : machine learning approaches	-
dc.type	PG_Thesis	-
dc.description.thesisname	Doctor of Philosophy	-
dc.description.thesislevel	Doctoral	-
dc.description.thesisdiscipline	Computer Science	-
dc.description.nature	published_or_final_version	-
dc.identifier.doi	10.5353/th_991044014362103414	-
dc.date.hkucongregation	2018	-
dc.identifier.mmsid	991044014362103414	-
