Adaptive Bayesian HMM for fully unsupervised Chinese part-of-speech induction

Zhang, L; Chan, KP

Publisher Website: 10.1145/2334801.2334803
Scopus: eid_2-s2.0-84866491555

Citations:
- Scopus: 0
Appears in Collections:
- Computer Science: Journal/Magazine Articles

Article: Adaptive Bayesian HMM for fully unsupervised Chinese part-of-speech induction

Title	Adaptive Bayesian HMM for fully unsupervised Chinese part-of-speech induction
Authors	Zhang, L; Chan, KP
Keywords	Bayesian HMM; Chinese language model; Dirichlet distribution; Part-of-speech induction; Variational inference
Issue Date	2012
Publisher	Association for Computing Machinery, Inc. The Journal's web site is located at http://talip.acm.org
Citation	ACM Transactions on Asian Language Information Processing, 2012, v. 11 n. 3, article no. 9. DOI: http://dx.doi.org/10.1145/2334801.2334803
Abstract	We propose an adaptive Bayesian hidden Markov model for fully unsupervised part-of-speech (POS) induction. The proposed model and its inference algorithm extend the first-order Bayesian HMM with Dirichlet priors in two ways. First, our algorithm infers the optimal number of hidden states from the training corpus rather than fixing the dimensionality of the state space beforehand. Second, a Chinese unknown-word processing module measures similarities using both morphological properties and context distributions. Experimental results showed that both extensions help to find the optimal categories for Chinese, in terms of both unsupervised clustering metrics and grammar induction accuracies on the Chinese Treebank. © 2012 ACM.
Persistent Identifier	http://hdl.handle.net/10722/165866
ISSN	1530-0226

DC Field	Value	Language
dc.contributor.author	Zhang, L	en_US
dc.contributor.author	Chan, KP	en_US
dc.date.accessioned	2012-09-20T08:24:38Z	-
dc.date.available	2012-09-20T08:24:38Z	-
dc.date.issued	2012	en_US
dc.identifier.citation	ACM Transactions on Asian Language Information Processing, 2012, v. 11 n. 3, article no. 9	en_US
dc.identifier.issn	1530-0226	-
dc.identifier.uri	http://hdl.handle.net/10722/165866	-
dc.description.abstract	We propose an adaptive Bayesian hidden Markov model for fully unsupervised part-of-speech (POS) induction. The proposed model and its inference algorithm extend the first-order Bayesian HMM with Dirichlet priors in two ways. First, our algorithm infers the optimal number of hidden states from the training corpus rather than fixing the dimensionality of the state space beforehand. Second, a Chinese unknown-word processing module measures similarities using both morphological properties and context distributions. Experimental results showed that both extensions help to find the optimal categories for Chinese, in terms of both unsupervised clustering metrics and grammar induction accuracies on the Chinese Treebank. © 2012 ACM.	-
dc.language	eng	en_US
dc.publisher	Association for Computing Machinery, Inc. The Journal's web site is located at http://talip.acm.org	-
dc.relation.ispartof	ACM Transactions on Asian Language Information Processing	en_US
dc.rights	ACM Transactions on Asian Language Information Processing. Copyright © Association for Computing Machinery, Inc.	-
dc.subject	Bayesian HMM	-
dc.subject	Chinese language model	-
dc.subject	Dirichlet distribution	-
dc.subject	Part-of-speech induction	-
dc.subject	Variational inference	-
dc.title	Adaptive Bayesian HMM for fully unsupervised Chinese part-of-speech induction	en_US
dc.type	Article	en_US
dc.identifier.email	Zhang, L: lzhang@cs.hku.hk	en_US
dc.identifier.email	Chan, KP: kpchan@cs.hku.hk	-
dc.identifier.authority	Chan, KP=rp00092	en_US
dc.description.nature	link_to_OA_fulltext	-
dc.identifier.doi	10.1145/2334801.2334803	-
dc.identifier.scopus	eid_2-s2.0-84866491555	-
dc.identifier.hkuros	210965	en_US
dc.identifier.volume	11	en_US
dc.identifier.issue	3	-
dc.identifier.eissn	1558-3430	-
dc.publisher.place	United States	-
dc.identifier.issnl	1530-0226	-
