
Article: Adaptive Bayesian HMM for fully unsupervised Chinese part-of-speech induction

Title: Adaptive Bayesian HMM for fully unsupervised Chinese part-of-speech induction
Authors: Zhang, L; Chan, KP
Keywords: Bayesian HMM; Chinese language model; Dirichlet distribution; Part-of-speech induction; Variational inference
Issue Date: 2012
Publisher: Association for Computing Machinery, Inc. The journal's web site is located at http://talip.acm.org
Citation: ACM Transactions on Asian Language Information Processing, 2012, v. 11 n. 3, article no. 9
Abstract: We propose an adaptive Bayesian hidden Markov model for fully unsupervised part-of-speech (POS) induction. The proposed model and its inference algorithm extend the first-order Bayesian HMM with Dirichlet priors in two ways. First, our algorithm infers the optimal number of hidden states from the training corpus rather than fixing the dimensionality of the state space beforehand. Second, a Chinese unknown-word processing module measures similarity using both morphological properties and context distribution. Experimental results showed that both extensions help to find the optimal categories for Chinese, in terms of both unsupervised clustering metrics and grammar induction accuracy on the Chinese Treebank. © 2012 ACM.
Persistent Identifier: http://hdl.handle.net/10722/165866
ISSN: 1530-0226

 

DC Field | Value | Language
dc.contributor.author | Zhang, L | en_US
dc.contributor.author | Chan, KP | en_US
dc.date.accessioned | 2012-09-20T08:24:38Z | -
dc.date.available | 2012-09-20T08:24:38Z | -
dc.date.issued | 2012 | en_US
dc.identifier.citation | ACM Transactions on Asian Language Information Processing, 2012, v. 11 n. 3, article no. 9 | en_US
dc.identifier.issn | 1530-0226 | -
dc.identifier.uri | http://hdl.handle.net/10722/165866 | -
dc.description.abstract | We propose an adaptive Bayesian hidden Markov model for fully unsupervised part-of-speech (POS) induction. The proposed model with its inference algorithm has two extensions to the first-order Bayesian HMM with Dirichlet priors. First our algorithm infers the optimal number of hidden states from the training corpus rather than fixes the dimensionality of state space beforehand. The second extension studies the Chinese unknown word processing module which measures similarities from both morphological properties and context distribution. Experimental results showed that both of these two extensions can help to find the optimal categories for Chinese in terms of both unsupervised clustering metrics and grammar induction accuracies on the Chinese Treebank. © 2012 ACM. | -
dc.language | eng | en_US
dc.publisher | Association for Computing Machinery, Inc. The Journal's web site is located at http://talip.acm.org | -
dc.relation.ispartof | ACM Transactions on Asian Language Information Processing | en_US
dc.rights | ACM Transactions on Asian Language Information Processing. Copyright © Association for Computing Machinery, Inc. | -
dc.subject | Bayesian HMM | -
dc.subject | Chinese language model | -
dc.subject | Dirichlet distribution | -
dc.subject | Part-of-speech induction | -
dc.subject | Variational inference | -
dc.title | Adaptive bayesian HMM for fully unsupervised chinese part-of-speech induction | en_US
dc.type | Article | en_US
dc.identifier.email | Zhang, L: lzhang@cs.hku.hk | en_US
dc.identifier.email | Chan, KP: kpchan@cs.hku.hk | -
dc.identifier.authority | Chan, KP=rp00092 | en_US
dc.description.nature | link_to_OA_fulltext | -
dc.identifier.doi | 10.1145/2334801.2334803 | -
dc.identifier.scopus | eid_2-s2.0-84866491555 | -
dc.identifier.hkuros | 210965 | en_US
dc.identifier.volume | 11 | en_US
dc.identifier.issue | 3 | -
dc.identifier.eissn | 1558-3430 | -
dc.publisher.place | United States | -
dc.identifier.issnl | 1530-0226 | -
