Conference Paper: Interesting-phrase mining for ad-hoc text analytics
Title | Interesting-phrase mining for ad-hoc text analytics |
---|---|
Authors | Bedathur, S; Berberich, K; Dittrich, J; Mamoulis, N; Weikum, G |
Keywords | Business-intelligence; Indexing methods; Multi-word; Named entities; New York Time |
Issue Date | 2010 |
Publisher | Very Large Data Base (VLDB) Endowment Inc. The journal's web site is located at http://vldb.org/pvldb/index.html |
Citation | The 36th International Conference on Very Large Data Bases, Singapore, 13-17 September 2010. In Proceedings of the VLDB Endowment, 2010, v. 3 n. 1-2, p. 1348-1357 |
Abstract | Large text corpora with news, customer mail and reports, or Web 2.0 contributions offer a great potential for enhancing business-intelligence applications. We propose a framework for performing text analytics on such data in a versatile, efficient, and scalable manner. While much of the prior literature has emphasized mining keywords or tags in blogs or social-tagging communities, we emphasize the analysis of interesting phrases. These include named entities, important quotations, market slogans, and other multi-word phrases that are prominent in a dynamically derived ad-hoc subset of the corpus, e.g., being frequent in the subset but relatively infrequent in the overall corpus. We develop preprocessing and indexing methods for phrases, paired with new search techniques for the top-k most interesting phrases in ad-hoc subsets of the corpus. Our framework is evaluated using a large-scale real-world corpus of New York Times news articles. © 2010 VLDB Endowment. |
Description | Research Session 41: Data Mining, Copy Detection and Data Publication |
Persistent Identifier | http://hdl.handle.net/10722/129557 |
ISSN | 2150-8097 (2023 Impact Factor: 2.6; 2023 SCImago Journal Rankings: 2.666) |
ISI Accession Number ID | WOS:000219665100123 |
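
The abstract characterizes an interesting phrase as one that is frequent in an ad-hoc subset of the corpus but relatively infrequent in the corpus overall. The sketch below illustrates that idea only; it is not the paper's indexing or search technique. It assumes a brute-force scan, whitespace tokenization, document-frequency counts, and a simple ratio score, and the names `top_k_interesting` and `min_subset_df` are hypothetical.

```python
from collections import Counter


def ngrams(tokens, max_len):
    """Yield every multi-word phrase (2..max_len tokens) in a token list."""
    for n in range(2, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def doc_frequencies(docs, max_len):
    """Count, for each phrase, the number of documents that contain it."""
    df = Counter()
    for text in docs:
        df.update(set(ngrams(text.lower().split(), max_len)))
    return df


def top_k_interesting(corpus, subset_ids, k=3, max_len=2, min_subset_df=2):
    """Rank phrases by (relative frequency in subset) / (relative frequency in corpus).

    Assumed scoring, chosen for illustration: a score above 1.0 means the
    phrase is over-represented in the subset compared with the whole corpus.
    """
    corpus_df = doc_frequencies(corpus, max_len)
    subset_df = doc_frequencies([corpus[i] for i in subset_ids], max_len)
    scored = [
        (phrase, (df / len(subset_ids)) / (corpus_df[phrase] / len(corpus)))
        for phrase, df in subset_df.items()
        if df >= min_subset_df  # ignore phrases that are rare in the subset
    ]
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scored, key=lambda item: (-item[1], item[0]))[:k]


corpus = [
    "according to analysts the central bank raised rates",
    "according to officials the central bank cut rates",
    "according to fans the team won",
    "according to critics the team lost",
]
# The ad-hoc subset here is the two documents about the central bank.
result = top_k_interesting(corpus, subset_ids=[0, 1], k=3)
```

On this toy corpus, "central bank" and "the central" outrank "according to", because the latter occurs in every document and so is no more typical of the subset than of the corpus. A scan like this recomputes all counts per query, which is the cost that preprocessing and indexing of phrases is meant to avoid.
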
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Bedathur, S | en_US |
dc.contributor.author | Berberich, K | en_US |
dc.contributor.author | Dittrich, J | en_US |
dc.contributor.author | Mamoulis, N | en_US |
dc.contributor.author | Weikum, G | en_US |
dc.date.accessioned | 2010-12-23T08:39:16Z | - |
dc.date.available | 2010-12-23T08:39:16Z | - |
dc.date.issued | 2010 | en_US |
dc.identifier.citation | The 36th International Conference on Very Large Data Bases, Singapore, 13-17 September 2010. In Proceedings of the VLDB Endowment, 2010, v. 3 n. 1-2, p. 1348-1357 | en_US |
dc.identifier.issn | 2150-8097 | - |
dc.identifier.uri | http://hdl.handle.net/10722/129557 | - |
dc.description | Research Session 41: Data Mining, Copy Detection and Data Publication | - |
dc.description.abstract | Large text corpora with news, customer mail and reports, or Web 2.0 contributions offer a great potential for enhancing business-intelligence applications. We propose a framework for performing text analytics on such data in a versatile, efficient, and scalable manner. While much of the prior literature has emphasized mining keywords or tags in blogs or social-tagging communities, we emphasize the analysis of interesting phrases. These include named entities, important quotations, market slogans, and other multi-word phrases that are prominent in a dynamically derived ad-hoc subset of the corpus, e.g., being frequent in the subset but relatively infrequent in the overall corpus. We develop preprocessing and indexing methods for phrases, paired with new search techniques for the top-k most interesting phrases in ad-hoc subsets of the corpus. Our framework is evaluated using a large-scale real-world corpus of New York Times news articles. © 2010 VLDB Endowment. | - |
dc.language | eng | en_US |
dc.publisher | Very Large Data Base (VLDB) Endowment Inc. The journal's web site is located at http://vldb.org/pvldb/index.html | - |
dc.relation.ispartof | Proceedings of the VLDB Endowment | - |
dc.subject | Business-intelligence | - |
dc.subject | Indexing methods | - |
dc.subject | Multi-word | - |
dc.subject | Named entities | - |
dc.subject | New York Time | - |
dc.title | Interesting-phrase mining for ad-hoc text analytics | en_US |
dc.type | Conference_Paper | en_US |
dc.identifier.email | Bedathur, S: bedathur@mpi-inf.mpg.de | en_US |
dc.identifier.email | Berberich, K: kberberi@mpi-inf.mpg.de | - |
dc.identifier.email | Dittrich, J: jens.dittrich@cs.uni-saarland.de | - |
dc.identifier.email | Mamoulis, N: nikos@cs.hku.hk | - |
dc.identifier.email | Weikum, G: weikum@mpi-inf.mpg.de | - |
dc.description.nature | link_to_OA_fulltext | - |
dc.identifier.scopus | eid_2-s2.0-84859240198 | - |
dc.identifier.hkuros | 176424 | en_US |
dc.identifier.volume | 3 | - |
dc.identifier.issue | 1-2 | - |
dc.identifier.spage | 1348 | - |
dc.identifier.epage | 1357 | - |
dc.identifier.isi | WOS:000219665100123 | - |
dc.publisher.place | United States | - |
dc.description.other | The 36th International Conference on Very Large Data Bases, Singapore, 13-17 September 2010. In Proceedings of the VLDB Endowment, 2010, v. 3 n. 1-2, p. 1348-1357 | - |
dc.identifier.issnl | 2150-8097 | - |