Conference Paper: Mining correlated bursty topic patterns from coordinated text streams

Title: Mining correlated bursty topic patterns from coordinated text streams
Authors: Wang, X; Zhai, C; Hu, X; Sproat, R
Keywords: Clustering; Coordinated streams; Correlated bursty patterns; Reinforcement; Data sets; Probabilistic algorithms; Text mining; Clustering algorithms; Correlation methods; Database systems; Probabilistic logics; Problem solving; Set theory; Data mining
Issue Date: 2007
Citation: The 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, CA, 12-15 August 2007. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2007, pp. 784-793
Abstract: Previous work on text mining has almost exclusively focused on a single stream. However, we often have available multiple text streams indexed by the same set of time points (called coordinated text streams), which offer new opportunities for text mining. For example, when a major event happens, all the news articles published by different agencies in different languages tend to cover the same event for a certain period, exhibiting a correlated bursty topic pattern in all the news article streams. In general, mining correlated bursty topic patterns from coordinated text streams can reveal interesting latent associations or events behind these streams. In this paper, we define and study this novel text mining problem. We propose a general probabilistic algorithm which can effectively discover correlated bursty patterns and their bursty periods across text streams even if the streams have completely different vocabularies (e.g., English vs. Chinese). Evaluation of the proposed method on a news data set and a literature data set shows that it can effectively discover quite meaningful topic patterns from both data sets: the patterns discovered from the news data set accurately reveal the major common events covered in the two streams of news articles (in English and Chinese, respectively), while the patterns discovered from two database publication streams match well with the major research paradigm shifts in database research. Since the proposed method is general and does not require the streams to share vocabulary, it can be applied to any coordinated text streams to discover correlated topic patterns that burst in multiple streams in the same period. © 2007 ACM.
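To make the problem definition concrete: coordinated streams share one time index, and a correlated bursty pattern is a topic that bursts in every stream during the same period. The sketch below illustrates only that definition on invented toy data; it is not the paper's probabilistic algorithm. It assumes each stream's topic has already been reduced to a per-time-point frequency series (measured with that stream's own vocabulary, so English and Chinese streams need not share words), and it uses a simple hypothetical burst rule (value above the series mean plus `k` population standard deviations). The names `bursty_mask`, `correlated_bursts`, and `k` are illustrative assumptions.

```python
from statistics import mean, pstdev


def bursty_mask(series, k=1.0):
    """Hypothetical burst rule: flag time points above mean + k * population std dev."""
    mu = mean(series)
    sigma = pstdev(series)
    return [x > mu + k * sigma for x in series]


def correlated_bursts(streams, k=1.0):
    """Return the time indices at which every stream is bursty.

    `streams` maps a stream name to a topic-frequency series. All series must
    be indexed by the same time points, i.e. the streams are coordinated.
    """
    lengths = {len(s) for s in streams.values()}
    if len(lengths) != 1:
        raise ValueError("coordinated streams must share the same time points")
    masks = [bursty_mask(s, k) for s in streams.values()]
    # A time point counts only if the topic bursts in all streams at once.
    return [t for t, flags in enumerate(zip(*masks)) if all(flags)]


# Invented toy counts for one topic in two coordinated streams.
streams = {
    "english": [2, 3, 2, 15, 18, 3, 2, 2],
    "chinese": [1, 1, 2, 12, 14, 2, 13, 1],
}
print(correlated_bursts(streams))  # [3, 4]
```

Here the Chinese series also bursts at time point 6, but because the English series does not, that point is not reported as a correlated burst.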
Persistent Identifier: http://hdl.handle.net/10722/180712
ISBN: 1595936092; 9781595936097

 

DC Field | Value | Language
dc.contributor.author | Wang, X | en_US
dc.contributor.author | Zhai, C | en_US
dc.contributor.author | Hu, X | en_US
dc.contributor.author | Sproat, R | en_US
dc.date.accessioned | 2013-01-28T01:41:33Z | -
dc.date.available | 2013-01-28T01:41:33Z | -
dc.date.issued | 2007 | en_US
dc.identifier.citation | The 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, CA., 12-15 August 2007. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2007, p. 784-793 | en_US
dc.identifier.isbn | 1595936092 | en_US
dc.identifier.isbn | 9781595936097 | en_US
dc.identifier.uri | http://hdl.handle.net/10722/180712 | -
dc.language | eng | en_US
dc.relation.ispartof | Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining | -
dc.subject | Clustering | en_US
dc.subject | Coordinated streams | en_US
dc.subject | Correlated bursty patterns | en_US
dc.subject | Reinforcement | en_US
dc.subject | Data sets | en_US
dc.subject | Probabilistic algorithms | en_US
dc.subject | Text mining | en_US
dc.subject | Clustering algorithms | en_US
dc.subject | Correlation methods | en_US
dc.subject | Database systems | en_US
dc.subject | Probabilistic logics | en_US
dc.subject | Problem solving | en_US
dc.subject | Set theory | en_US
dc.subject | Data mining | en_US
dc.title | Mining correlated bursty topic patterns from coordinated text streams | en_US
dc.type | Conference_Paper | en_US
dc.identifier.email | Hu, X: xiaoxhu@hku.hk | en_US
dc.identifier.authority | Hu, X=rp01711 | en_US
dc.description.nature | link_to_subscribed_fulltext | en_US
dc.identifier.doi | 10.1145/1281192.1281276 | en_US
dc.identifier.scopus | eid_2-s2.0-36849036336 | -
dc.identifier.spage | 784 | en_US
dc.identifier.epage | 793 | en_US
dc.customcontrol.immutable | sml 160129 - amend | -
