
Conference Paper: Interesting-phrase mining for ad-hoc text analytics

Title: Interesting-phrase mining for ad-hoc text analytics
Authors: Bedathur, S; Berberich, K; Dittrich, J; Mamoulis, N; Weikum, G
Keywords: Business-intelligence; Indexing methods; Multi-word; Named entities; New York Time
Issue Date: 2010
Publisher: Very Large Data Base (VLDB) Endowment Inc. The Journal's web site is located at http://vldb.org/pvldb/index.html
Citation: The 36th International Conference on Very Large Data Bases, Singapore, 13-17 September 2010. In Proceedings of the VLDB Endowment, 2010, v. 3 n. 1-2, p. 1348-1357
Abstract: Large text corpora with news, customer mail and reports, or Web 2.0 contributions offer a great potential for enhancing business-intelligence applications. We propose a framework for performing text analytics on such data in a versatile, efficient, and scalable manner. While much of the prior literature has emphasized mining keywords or tags in blogs or social-tagging communities, we emphasize the analysis of interesting phrases. These include named entities, important quotations, market slogans, and other multi-word phrases that are prominent in a dynamically derived ad-hoc subset of the corpus, e.g., being frequent in the subset but relatively infrequent in the overall corpus. We develop preprocessing and indexing methods for phrases, paired with new search techniques for the top-k most interesting phrases in ad-hoc subsets of the corpus. Our framework is evaluated using a large-scale real-world corpus of New York Times news articles. © 2010 VLDB Endowment.
Description: Research Session 41: Data Mining, Copy Detection and Data Publication
Persistent Identifier: http://hdl.handle.net/10722/129557
ISSN: 2150-8097
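
The abstract characterizes an interesting phrase as one that is frequent in an ad-hoc subset of the corpus but relatively infrequent in the corpus overall. The sketch below illustrates that idea only. It assumes a simple score (subset document frequency divided by corpus document frequency) and a brute-force scan; neither is confirmed by this record as the paper's measure, and the paper's phrase indexes and top-k search techniques are not reproduced. The names `phrases`, `top_k_interesting`, and `min_subset_df` are hypothetical.

```python
from collections import Counter
import heapq


def phrases(text, max_len=3):
    """Return the distinct word n-grams (2..max_len words) of one document."""
    words = text.lower().split()
    grams = set()
    for n in range(2, max_len + 1):
        for i in range(len(words) - n + 1):
            grams.add(" ".join(words[i:i + n]))
    return grams


def top_k_interesting(corpus, subset_ids, k=3, max_len=3, min_subset_df=2):
    """Rank phrases by an ASSUMED score: subset doc frequency / corpus doc frequency.

    corpus      -- dict mapping document id to text
    subset_ids  -- ids of the ad-hoc subset (e.g. the result of a keyword query)
    """
    corpus_df = Counter()  # number of documents in the whole corpus containing a phrase
    subset_df = Counter()  # number of documents in the subset containing a phrase
    for doc_id, text in corpus.items():
        grams = phrases(text, max_len)
        corpus_df.update(grams)
        if doc_id in subset_ids:
            subset_df.update(grams)

    # Skip phrases that are rare even inside the subset, then keep the k best.
    scored = (
        (subset_df[p] / corpus_df[p], subset_df[p], p)
        for p in subset_df
        if subset_df[p] >= min_subset_df
    )
    return [(p, score) for score, _, p in heapq.nlargest(k, scored)]
```

Recomputing document frequencies for every query, as this sketch does, scans the whole corpus each time; the preprocessing and indexing methods mentioned in the abstract are aimed at exactly that cost.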

 

DC Field | Value | Language
dc.contributor.author | Bedathur, S | en_US
dc.contributor.author | Berberich, K | en_US
dc.contributor.author | Dittrich, J | en_US
dc.contributor.author | Mamoulis, N | en_US
dc.contributor.author | Weikum, G | en_US
dc.date.accessioned | 2010-12-23T08:39:16Z | -
dc.date.available | 2010-12-23T08:39:16Z | -
dc.date.issued | 2010 | en_US
dc.identifier.citation | The 36th International Conference on Very Large Data Bases, Singapore, 13-17 September 2010. In Proceedings of the VLDB Endowment, 2010, v. 3 n. 1-2, p. 1348-1357 | en_US
dc.identifier.issn | 2150-8097 | -
dc.identifier.uri | http://hdl.handle.net/10722/129557 | -
dc.description | Research Session 41: Data Mining, Copy Detection and Data Publication | -
dc.description.abstract | Large text corpora with news, customer mail and reports, or Web 2.0 contributions offer a great potential for enhancing business-intelligence applications. We propose a framework for performing text analytics on such data in a versatile, efficient, and scalable manner. While much of the prior literature has emphasized mining keywords or tags in blogs or social-tagging communities, we emphasize the analysis of interesting phrases. These include named entities, important quotations, market slogans, and other multi-word phrases that are prominent in a dynamically derived ad-hoc subset of the corpus, e.g., being frequent in the subset but relatively infrequent in the overall corpus. We develop preprocessing and indexing methods for phrases, paired with new search techniques for the top-k most interesting phrases in ad-hoc subsets of the corpus. Our framework is evaluated using a large-scale real-world corpus of New York Times news articles. © 2010 VLDB Endowment. | -
dc.language | eng | en_US
dc.publisher | Very Large Data Base (VLDB) Endowment Inc. The Journal's web site is located at http://vldb.org/pvldb/index.html | -
dc.relation.ispartof | Proceedings of the VLDB Endowment | -
dc.subject | Business-intelligence | -
dc.subject | Indexing methods | -
dc.subject | Multi-word | -
dc.subject | Named entities | -
dc.subject | New York Time | -
dc.title | Interesting-phrase mining for ad-hoc text analytics | en_US
dc.type | Conference_Paper | en_US
dc.identifier.email | Bedathur, S: bedathur@mpi-inf.mpg.de | en_US
dc.identifier.email | Berberich, K: kberberi@mpi-inf.mpg.de | -
dc.identifier.email | Dittrich, J: jens.dittrich@cs.uni-saarland.de | -
dc.identifier.email | Mamoulis, N: nikos@cs.hku.hk | -
dc.identifier.email | Weikum, G: weikum@mpi-inf.mpg.de | -
dc.description.nature | link_to_OA_fulltext | -
dc.identifier.scopus | eid_2-s2.0-84859240198 | -
dc.identifier.hkuros | 176424 | en_US
dc.identifier.volume | 3 | -
dc.identifier.issue | 1-2 | -
dc.identifier.spage | 1348 | -
dc.identifier.epage | 1357 | -
dc.publisher.place | United States | -
dc.description.other | The 36th International Conference on Very Large Data Bases, Singapore, 13-17 September 2010. In Proceedings of the VLDB Endowment, 2010, v. 3 n. 1-2, p. 1348-1357 | -
