Conference Paper: Keyword extraction and headline generation using novel word features

Title: Keyword extraction and headline generation using novel word features
Authors: Xu, S; Yang, S; Lau, FCM
Issue Date: 2010
Citation: Proceedings of the National Conference on Artificial Intelligence, 2010, v. 3, p. 1461-1466
Abstract: We introduce several novel word features for keyword extraction and headline generation. These new word features are derived according to the background knowledge of a document as supplied by Wikipedia. Given a document, to acquire its background knowledge from Wikipedia, we first generate a query for searching the Wikipedia corpus based on the key facts present in the document. We then use the query to find articles in the Wikipedia corpus that are closely related to the contents of the document. With the Wikipedia search result article set, we extract the inlink, outlink, category and infobox information in each article to derive a set of novel word features which reflect the document's background knowledge. These newly introduced word features offer valuable indications on individual words' importance in the input document. They serve as nice complements to the traditional word features derivable from explicit information of a document. In addition, we also introduce a word-document fitness feature to characterize the influence of a document's genre on the keyword extraction and headline generation process. We study the effectiveness of these novel word features for keyword extraction and headline generation by experiments and have obtained very encouraging results. Copyright © 2010, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Persistent Identifier: http://hdl.handle.net/10722/151980
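
The abstract describes deriving word features from the inlink, outlink, category and infobox information of retrieved Wikipedia articles, but this record does not give the feature definitions. The following is only a minimal sketch of the general idea, under stated assumptions: the `Article` class, the `background_features` function, and the choice of plain occurrence counts are all hypothetical illustrations, not the authors' method, and the query generation and Wikipedia search steps are replaced here by a hand-written article list.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Article:
    """Hypothetical stand-in for one retrieved Wikipedia article."""
    title: str
    inlinks: list = field(default_factory=list)     # titles of pages linking here
    outlinks: list = field(default_factory=list)    # titles of pages linked from here
    categories: list = field(default_factory=list)  # category names
    infobox: dict = field(default_factory=dict)     # infobox key -> value


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", str(text).lower())


def background_features(document, articles):
    """Count, per document word, its occurrences in each background source.

    Assumption: raw counts stand in for whatever weighting the paper uses.
    """
    sources = {name: Counter() for name in
               ("inlink", "outlink", "category", "infobox")}
    for art in articles:
        for title in art.inlinks:
            sources["inlink"].update(tokenize(title))
        for title in art.outlinks:
            sources["outlink"].update(tokenize(title))
        for cat in art.categories:
            sources["category"].update(tokenize(cat))
        for value in art.infobox.values():
            sources["infobox"].update(tokenize(value))
    return {
        word: {name: counts[word] for name, counts in sources.items()}
        for word in set(tokenize(document))
    }


# Toy data made up for illustration; a real system would obtain the
# article set by querying a Wikipedia index.
articles = [
    Article(
        title="IPhone",
        inlinks=["Apple Inc.", "Smartphone"],
        outlinks=["Apple Inc.", "Mobile phone"],
        categories=["Apple Inc. mobile phones"],
        infobox={"manufacturer": "Apple Inc."},
    )
]
features = background_features("Apple unveils a new phone", articles)
print(features["apple"])
```

In this toy setup a word such as "apple", which recurs across link titles, categories and infobox values, receives non-zero values in every source, while a word absent from the background articles receives zeros. Such vectors could then be combined with traditional features (for example, term frequency) in a downstream ranking model.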
DC Field | Value | Language
dc.contributor.author | Xu, S | en_US
dc.contributor.author | Yang, S | en_US
dc.contributor.author | Lau, FCM | en_US
dc.date.accessioned | 2012-06-26T06:31:51Z | -
dc.date.available | 2012-06-26T06:31:51Z | -
dc.date.issued | 2010 | en_US
dc.identifier.citation | Proceedings Of The National Conference On Artificial Intelligence, 2010, v. 3, p. 1461-1466 | en_US
dc.identifier.uri | http://hdl.handle.net/10722/151980 | -
dc.description.abstract | We introduce several novel word features for keyword extraction and headline generation. These new word features are derived according to the background knowledge of a document as supplied by Wikipedia. Given a document, to acquire its background knowledge from Wikipedia, we first generate a query for searching the Wikipedia corpus based on the key facts present in the document. We then use the query to find articles in the Wikipedia corpus that are closely related to the contents of the document. With the Wikipedia search result article set, we extract the inlink, outlink, category and infobox information in each article to derive a set of novel word features which reflect the document's background knowledge. These newly introduced word features offer valuable indications on individual words' importance in the input document. They serve as nice complements to the traditional word features derivable from explicit information of a document. In addition, we also introduce a word-document fitness feature to characterize the influence of a document's genre on the keyword extraction and headline generation process. We study the effectiveness of these novel word features for keyword extraction and headline generation by experiments and have obtained very encouraging results. Copyright © 2010, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. | en_US
dc.language | eng | en_US
dc.relation.ispartof | Proceedings of the National Conference on Artificial Intelligence | en_US
dc.title | Keyword extraction and headline generation using novel word features | en_US
dc.type | Conference_Paper | en_US
dc.identifier.email | Lau, FCM: fcmlau@cs.hku.hk | en_US
dc.identifier.authority | Lau, FCM=rp00221 | en_US
dc.description.nature | link_to_subscribed_fulltext | en_US
dc.identifier.scopus | eid_2-s2.0-77958586107 | en_US
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-77958586107&selection=ref&src=s&origin=recordpage | en_US
dc.identifier.volume | 3 | en_US
dc.identifier.spage | 1461 | en_US
dc.identifier.epage | 1466 | en_US
dc.identifier.scopusauthorid | Xu, S=7404439278 | en_US
dc.identifier.scopusauthorid | Yang, S=36620658700 | en_US
dc.identifier.scopusauthorid | Lau, FCM=7102749723 | en_US
