
postgraduate thesis: Selection of the number of components in mixture regression models by instability

Title: Selection of the number of components in mixture regression models by instability
Authors: Huang, Beixue (黄北雪)
Issue Date: 2015
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Huang, B. [黄北雪]. (2015). Selection of the number of components in mixture regression models by instability. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5576790
Abstract: Mixture regression models have proven to be a useful tool for studying heterogeneous populations arising in many fields. An important question in model building is the selection of the number of components, a problem that is specific to mixture models and especially challenging. Many studies have therefore been devoted to this topic, and various approaches have been discussed. However, most of them emphasize goodness of fit and do not differentiate this problem from the diagnosis of other aspects of a mixture model. This thesis proposes a method, originating from cluster analysis, that offers a different perspective on the problem. The classification probability instability method, based on the concept of instability, is introduced to select the number of components in mixture regression models. Using cross-validation, it estimates the instability of the posterior probabilities on the validation set under each candidate number of components and selects the candidate with the smallest instability. Consistency is established following the theorem in Wang (2010). To cater for possible multilevel structure among the components, a stage selection scheme is devised that detects major structures first and then minor structures. The output of this scheme can be represented by a tree diagram. Simulations show that the method achieves high accuracy compared with several information criterion methods in a number of settings. Smaller samples are also considered, and two variations on the original method are examined. One considers only the top part of the tree diagram produced by the original method. The other samples with replacement so that the instability method is applicable in smaller samples. Simulations show that the two variations can effectively complement the original method, and the different features of these methods are illustrated by examples.
A real data set on plasma beta-carotene concentration is analyzed using the proposed methods.
Degree: Master of Philosophy
Subject: Regression analysis
Dept/Program: Statistics and Actuarial Science
Persistent Identifier: http://hdl.handle.net/10722/221109
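
The selection rule summarized in the abstract (estimate instability on a validation set for each candidate number of components, then pick the smallest) can be illustrated with a rough sketch. This is not the thesis's algorithm: the function names `fit_mixreg`, `posterior`, `instability`, and `select_k` are hypothetical, the three-way random split, the plain EM fit with Gaussian errors, and the co-membership distance used as the instability measure are all assumptions made for illustration, and the stage selection scheme is not covered.

```python
import numpy as np

def posterior(X, y, params):
    """Posterior component probabilities for each observation."""
    pi, betas, sig2 = params
    resid = y[:, None] - X @ betas.T                      # shape (n, k)
    logp = np.log(pi + 1e-300) - 0.5 * np.log(2 * np.pi * sig2) - 0.5 * resid**2 / sig2
    logp -= logp.max(axis=1, keepdims=True)               # stabilise before exponentiating
    p = np.exp(logp)
    return p / p.sum(axis=1, keepdims=True)

def fit_mixreg(X, y, k, rng, n_iter=100):
    """EM for a k-component mixture of linear regressions (assumed Gaussian errors)."""
    resp = rng.dirichlet(np.ones(k), size=len(y))         # random soft start
    for _ in range(n_iter):
        pi = resp.mean(axis=0)
        betas, sig2 = [], []
        for c in range(k):
            w = resp[:, c] + 1e-10
            Xw = X * w[:, None]
            # M-step: weighted least squares with a tiny ridge for stability
            beta = np.linalg.solve(X.T @ Xw + 1e-8 * np.eye(X.shape[1]), Xw.T @ y)
            r = y - X @ beta
            betas.append(beta)
            sig2.append(max((w * r**2).sum() / w.sum(), 1e-6))
        params = (pi, np.array(betas), np.array(sig2))
        resp = posterior(X, y, params)                    # E-step
    return params

def instability(X, y, k, n_splits=10, seed=0):
    """Average disagreement between two fits, measured on a held-out validation set."""
    rng = np.random.default_rng(seed)
    vals = []
    for _ in range(n_splits):
        a, b, v = np.array_split(rng.permutation(len(y)), 3)
        pa = posterior(X[v], y[v], fit_mixreg(X[a], y[a], k, rng))
        pb = posterior(X[v], y[v], fit_mixreg(X[b], y[b], k, rng))
        # Co-membership probabilities do not depend on how components are labelled.
        vals.append(np.abs(pa @ pa.T - pb @ pb.T).mean())
    return float(np.mean(vals))

def select_k(X, y, candidates=(2, 3, 4)):
    """Return the candidate with the smallest estimated instability, plus all scores."""
    scores = {k: instability(X, y, k) for k in candidates}
    return min(scores, key=scores.get), scores
```

Here `X` is a design matrix that already includes an intercept column and `y` is the response vector; `select_k(X, y)` returns the chosen number of components together with the instability score of each candidate. Candidates start at 2 in this sketch because a single component always yields posterior probabilities of 1 and hence zero instability under this measure.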

 

| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Huang, Beixue | - |
| dc.contributor.author | 黄北雪 | - |
| dc.date.accessioned | 2015-10-26T23:12:00Z | - |
| dc.date.available | 2015-10-26T23:12:00Z | - |
| dc.date.issued | 2015 | - |
| dc.identifier.citation | Huang, B. [黄北雪]. (2015). Selection of the number of components in mixture regression models by instability. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5576790 | - |
| dc.identifier.uri | http://hdl.handle.net/10722/221109 | - |
| dc.description.abstract | Mixture regression models have proven to be a useful tool for studying heterogeneous populations arising in many fields. An important question in model building is the selection of the number of components, a problem that is specific to mixture models and especially challenging. Many studies have therefore been devoted to this topic, and various approaches have been discussed. However, most of them emphasize goodness of fit and do not differentiate this problem from the diagnosis of other aspects of a mixture model. This thesis proposes a method, originating from cluster analysis, that offers a different perspective on the problem. The classification probability instability method, based on the concept of instability, is introduced to select the number of components in mixture regression models. Using cross-validation, it estimates the instability of the posterior probabilities on the validation set under each candidate number of components and selects the candidate with the smallest instability. Consistency is established following the theorem in Wang (2010). To cater for possible multilevel structure among the components, a stage selection scheme is devised that detects major structures first and then minor structures. The output of this scheme can be represented by a tree diagram. Simulations show that the method achieves high accuracy compared with several information criterion methods in a number of settings. Smaller samples are also considered, and two variations on the original method are examined. One considers only the top part of the tree diagram produced by the original method. The other samples with replacement so that the instability method is applicable in smaller samples. Simulations show that the two variations can effectively complement the original method, and the different features of these methods are illustrated by examples. A real data set on plasma beta-carotene concentration is analyzed using the proposed methods. | - |
| dc.language | eng | - |
| dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
| dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
| dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | - |
| dc.rights | Creative Commons: Attribution 3.0 Hong Kong License | - |
| dc.subject.lcsh | Regression analysis | - |
| dc.title | Selection of the number of components in mixture regression models by instability | - |
| dc.type | PG_Thesis | - |
| dc.identifier.hkul | b5576790 | - |
| dc.description.thesisname | Master of Philosophy | - |
| dc.description.thesislevel | Master | - |
| dc.description.thesisdiscipline | Statistics and Actuarial Science | - |
| dc.description.nature | published_or_final_version | - |
