Interesting-phrase mining for ad-hoc text analytics

Bedathur, S; Berberich, K; Dittrich, J; Mamoulis, N; Weikum, G

Scopus: eid_2-s2.0-84859240198
WOS: WOS:000219665100123
Citations:
- Scopus: 0
- Web of Science: 0
Appears in Collections:
- Computer Science: Conference papers

Conference Paper: Interesting-phrase mining for ad-hoc text analytics

Title	Interesting-phrase mining for ad-hoc text analytics
Authors	Bedathur, S; Berberich, K; Dittrich, J; Mamoulis, N; Weikum, G
Keywords	Business-intelligence; Indexing methods; Multi-word; Named entities; New York Time
Issue Date	2010
Publisher	Very Large Data Base (VLDB) Endowment Inc. The journal's website is located at http://vldb.org/pvldb/index.html
Citation	The 36th International Conference on Very Large Data Bases, Singapore, 13-17 September 2010. In Proceedings of the VLDB Endowment, 2010, v. 3 n. 1-2, p. 1348-1357
Abstract	Large text corpora with news, customer mail and reports, or Web 2.0 contributions offer a great potential for enhancing business-intelligence applications. We propose a framework for performing text analytics on such data in a versatile, efficient, and scalable manner. While much of the prior literature has emphasized mining keywords or tags in blogs or social-tagging communities, we emphasize the analysis of interesting phrases. These include named entities, important quotations, market slogans, and other multi-word phrases that are prominent in a dynamically derived ad-hoc subset of the corpus, e.g., being frequent in the subset but relatively infrequent in the overall corpus. We develop preprocessing and indexing methods for phrases, paired with new search techniques for the top-k most interesting phrases in ad-hoc subsets of the corpus. Our framework is evaluated using a large-scale real-world corpus of New York Times news articles. © 2010 VLDB Endowment.
Description	Research Session 41: Data Mining, Copy Detection and Data Publication
Persistent Identifier	http://hdl.handle.net/10722/129557
ISSN	2150-8097
2023 Impact Factor	2.6
2023 SCImago Journal Rankings	2.666
ISI Accession Number ID	WOS:000219665100123
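
The abstract characterizes an interesting phrase as one that is frequent in an ad-hoc subset of the corpus but relatively infrequent in the corpus overall. The following is a minimal brute-force sketch of that idea, not the indexing or search method developed in the paper. The scoring function (subset document frequency divided by corpus document frequency), the helper names, and the parameters `max_len` and `min_subset_df` are all assumptions made for illustration.

```python
from collections import Counter
import heapq


def ngrams(tokens, max_len):
    """Yield every multi-word phrase (2..max_len tokens) as a tuple."""
    for n in range(2, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def doc_frequencies(docs, max_len):
    """Count, for each phrase, the number of documents containing it."""
    df = Counter()
    for doc in docs:
        # A set ensures each phrase is counted once per document.
        df.update(set(ngrams(doc.lower().split(), max_len)))
    return df


def top_k_interesting(corpus, subset_ids, k=3, max_len=3, min_subset_df=2):
    """Return up to k (score, subset_df, phrase) triples, best first.

    Assumed score: the share of a phrase's corpus documents that fall
    inside the subset. min_subset_df drops phrases that are rare in the
    subset itself.
    """
    corpus_df = doc_frequencies(corpus, max_len)
    subset_df = doc_frequencies([corpus[i] for i in subset_ids], max_len)
    scored = [
        (subset_df[p] / corpus_df[p], subset_df[p], " ".join(p))
        for p in subset_df
        if subset_df[p] >= min_subset_df
    ]
    return heapq.nlargest(k, scored)


# Hypothetical toy corpus; the subset is selected by document position.
corpus = [
    "the new york times reported record profits",
    "the new york times reported a merger",
    "record profits were reported by the bank",
    "the bank announced a merger",
]
result = top_k_interesting(corpus, subset_ids=[2, 3])
```

This sketch rescans every document for each query, which is only workable for small inputs. Avoiding that cost is what the preprocessing and indexing methods mentioned in the abstract are for.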

DC Field	Value	Language
dc.contributor.author	Bedathur, S	en_US
dc.contributor.author	Berberich, K	en_US
dc.contributor.author	Dittrich, J	en_US
dc.contributor.author	Mamoulis, N	en_US
dc.contributor.author	Weikum, G	en_US
dc.date.accessioned	2010-12-23T08:39:16Z	-
dc.date.available	2010-12-23T08:39:16Z	-
dc.date.issued	2010	en_US
dc.identifier.citation	The 36th International Conference on Very Large Data Bases, Singapore, 13-17 September 2010. In Proceedings of the VLDB Endowment, 2010, v. 3 n. 1-2, p. 1348-1357	en_US
dc.identifier.issn	2150-8097	-
dc.identifier.uri	http://hdl.handle.net/10722/129557	-
dc.description	Research Session 41: Data Mining, Copy Detection and Data Publication	-
dc.description.abstract	Large text corpora with news, customer mail and reports, or Web 2.0 contributions offer a great potential for enhancing business-intelligence applications. We propose a framework for performing text analytics on such data in a versatile, efficient, and scalable manner. While much of the prior literature has emphasized mining keywords or tags in blogs or social-tagging communities, we emphasize the analysis of interesting phrases. These include named entities, important quotations, market slogans, and other multi-word phrases that are prominent in a dynamically derived ad-hoc subset of the corpus, e.g., being frequent in the subset but relatively infrequent in the overall corpus. We develop preprocessing and indexing methods for phrases, paired with new search techniques for the top-k most interesting phrases in ad-hoc subsets of the corpus. Our framework is evaluated using a large-scale real-world corpus of New York Times news articles. © 2010 VLDB Endowment.	-
dc.language	eng	en_US
dc.publisher	Very Large Data Base (VLDB) Endowment Inc. The journal's website is located at http://vldb.org/pvldb/index.html	-
dc.relation.ispartof	Proceedings of the VLDB Endowment	-
dc.subject	Business-intelligence	-
dc.subject	Indexing methods	-
dc.subject	Multi-word	-
dc.subject	Named entities	-
dc.subject	New York Time	-
dc.title	Interesting-phrase mining for ad-hoc text analytics	en_US
dc.type	Conference_Paper	en_US
dc.identifier.email	Bedathur, S: bedathur@mpi-inf.mpg.de	en_US
dc.identifier.email	Berberich, K: kberberi@mpi-inf.mpg.de	-
dc.identifier.email	Dittrich, J: jens.dittrich@cs.uni-saarland.de	-
dc.identifier.email	Mamoulis, N: nikos@cs.hku.hk	-
dc.identifier.email	Weikum, G: weikum@mpi-inf.mpg.de	-
dc.description.nature	link_to_OA_fulltext	-
dc.identifier.scopus	eid_2-s2.0-84859240198	-
dc.identifier.hkuros	176424	en_US
dc.identifier.volume	3	-
dc.identifier.issue	1-2	-
dc.identifier.spage	1348	-
dc.identifier.epage	1357	-
dc.identifier.isi	WOS:000219665100123	-
dc.publisher.place	United States	-
dc.description.other	The 36th International Conference on Very Large Data Bases, Singapore, 13-17 September 2010. In Proceedings of the VLDB Endowment, 2010, v. 3 n. 1-2, p. 1348-1357	-
dc.identifier.issnl	2150-8097	-
