Some topics in modeling ranking data

Qi, Fang; 齊放

File Download

FullText.pdf

Links for fulltext

(May Require Subscription)

DOI: 10.5353/th_b5194731

Supplementary

Citations:
Appears in Collections:
- Statistics & Actuarial Science: Theses
- HKU Theses Online

postgraduate thesis: Some topics in modeling ranking data

Title	Some topics in modeling ranking data
Authors	Qi, Fang 齊放
Issue Date	2014
Publisher	The University of Hong Kong (Pokfulam, Hong Kong)
Citation	Qi, F. [齊放]. (2014). Some topics in modeling ranking data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5194731
Abstract	Many applications of analysis of ranking data arise from different fields of study, such as psychology, economics, and politics. Over the past decade, many ranking data models have been proposed. AdaBoost is proved to be a very successful technique to generate a stronger classifier from weak ones; it can be viewed as a forward stagewise additive modeling using the exponential loss function. Motivated by this, a new AdaBoost algorithm is developed for ranking data. Taking into consideration the ordinal structure of the ranking data, I propose measures based on the Spearman/Kendall distance to evaluate classifier instead of the usual misclassification rate. Some ranking datasets are tested by the new algorithm, and the results show that the new algorithm outperforms traditional algorithms. The distance-based model assumes that the probability of observing a ranking depends on the distance between the ranking and its central ranking. Prediction of ranking data can be made by combining distance-based model with the famous k-nearest-neighbor (kNN) method. This model can be improved by assigning weights to the neighbors according to their distances to the central ranking and assigning weights to the features according to their relative importance. For the feature weighting part, a revised version of the traditional ReliefF algorithm is proposed. From the experimental results we can see that the new algorithm is more suitable for ranking data problem. Error-correcting output codes (ECOC) is widely used in solving multi-class learning problems by decomposing the multi-class problem into several binary classification problems. Several ECOCs for ranking data are proposed and tested. By combining these ECOCs and some traditional binary classifiers, a predictive model for ranking data with high accuracy can be made. While the mixture of factor analyzers (MFA) is useful tool for analyzing heterogeneous data, it cannot be directly used for ranking data due to the special discrete ordinal structures of rankings. I fill in this gap by extending MFA to accommodate for complete and incomplete/partial ranking data. Both simulated and real examples are studied to illustrate the effectiveness of the proposed MFA methods.
Degree	Doctor of Philosophy
Subject	Ranking and selection (Statistics)
Dept/Program	Statistics and Actuarial Science
Persistent Identifier	http://hdl.handle.net/10722/209210
HKU Library Item ID	b5194731

DC Field	Value	Language
dc.contributor.author	Qi, Fang	-
dc.contributor.author	齊放	-
dc.date.accessioned	2015-04-11T23:10:03Z	-
dc.date.available	2015-04-11T23:10:03Z	-
dc.date.issued	2014	-
dc.identifier.citation	Qi, F. [齊放]. (2014). Some topics in modeling ranking data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5194731	-
dc.identifier.uri	http://hdl.handle.net/10722/209210	-
dc.description.abstract	Many applications of analysis of ranking data arise from different fields of study, such as psychology, economics, and politics. Over the past decade, many ranking data models have been proposed. AdaBoost is proved to be a very successful technique to generate a stronger classifier from weak ones; it can be viewed as a forward stagewise additive modeling using the exponential loss function. Motivated by this, a new AdaBoost algorithm is developed for ranking data. Taking into consideration the ordinal structure of the ranking data, I propose measures based on the Spearman/Kendall distance to evaluate classifier instead of the usual misclassification rate. Some ranking datasets are tested by the new algorithm, and the results show that the new algorithm outperforms traditional algorithms. The distance-based model assumes that the probability of observing a ranking depends on the distance between the ranking and its central ranking. Prediction of ranking data can be made by combining distance-based model with the famous k-nearest-neighbor (kNN) method. This model can be improved by assigning weights to the neighbors according to their distances to the central ranking and assigning weights to the features according to their relative importance. For the feature weighting part, a revised version of the traditional ReliefF algorithm is proposed. From the experimental results we can see that the new algorithm is more suitable for ranking data problem. Error-correcting output codes (ECOC) is widely used in solving multi-class learning problems by decomposing the multi-class problem into several binary classification problems. Several ECOCs for ranking data are proposed and tested. By combining these ECOCs and some traditional binary classifiers, a predictive model for ranking data with high accuracy can be made. While the mixture of factor analyzers (MFA) is useful tool for analyzing heterogeneous data, it cannot be directly used for ranking data due to the special discrete ordinal structures of rankings. I fill in this gap by extending MFA to accommodate for complete and incomplete/partial ranking data. Both simulated and real examples are studied to illustrate the effectiveness of the proposed MFA methods.	-
dc.language	eng	-
dc.publisher	The University of Hong Kong (Pokfulam, Hong Kong)	-
dc.relation.ispartof	HKU Theses Online (HKUTO)	-
dc.rights	The author retains all proprietary rights, (such as patent rights) and the right to use in future works.	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.subject.lcsh	Ranking and selection (Statistics)	-
dc.title	Some topics in modeling ranking data	-
dc.type	PG_Thesis	-
dc.identifier.hkul	b5194731	-
dc.description.thesisname	Doctor of Philosophy	-
dc.description.thesislevel	Doctoral	-
dc.description.thesisdiscipline	Statistics and Actuarial Science	-
dc.description.nature	published_or_final_version	-
dc.identifier.doi	10.5353/th_b5194731	-
dc.identifier.mmsid	991036876839703414	-

File Download

Links for fulltext

(May Require Subscription)

Supplementary

postgraduate thesis: Some topics in modeling ranking data

Export via OAI-PMH Interface in XML Formats

OR

Export to Other Non-XML Formats