Appears in Collections: postgraduate thesis
Title | Some topics in modeling ranking data |
---|---|
Authors | Qi, Fang (齊放) |
Issue Date | 2014 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Qi, F. [齊放]. (2014). Some topics in modeling ranking data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5194731 |
Abstract | The analysis of ranking data has applications in many fields of study, such as psychology, economics, and politics, and many models for ranking data have been proposed over the past decade. AdaBoost has proved to be a very successful technique for building a strong classifier from weak ones; it can be viewed as forward stagewise additive modeling with the exponential loss function. Motivated by this, a new AdaBoost algorithm is developed for ranking data. To take the ordinal structure of ranking data into account, I propose measures based on the Spearman or Kendall distance, instead of the usual misclassification rate, to evaluate classifiers. The new algorithm is tested on several ranking datasets, and the results show that it outperforms traditional algorithms.
The distance-based model assumes that the probability of observing a ranking depends on the distance between that ranking and the central ranking. Rankings can be predicted by combining the distance-based model with the well-known k-nearest-neighbor (kNN) method. This model can be improved by weighting the neighbors according to their distances to the central ranking and weighting the features according to their relative importance. For feature weighting, a revised version of the traditional ReliefF algorithm is proposed. The experimental results show that the new algorithm is better suited to ranking data problems.
Error-correcting output codes (ECOC) are widely used to solve multi-class learning problems by decomposing a multi-class problem into several binary classification problems. Several ECOCs for ranking data are proposed and tested. By combining these ECOCs with some traditional binary classifiers, a highly accurate predictive model for ranking data can be built.
While the mixture of factor analyzers (MFA) is a useful tool for analyzing heterogeneous data, it cannot be used directly for ranking data because of the special discrete ordinal structure of rankings. I fill this gap by extending MFA to accommodate both complete and incomplete (partial) ranking data. Both simulated and real examples are studied to illustrate the effectiveness of the proposed MFA methods. |
Degree | Doctor of Philosophy |
Subject | Ranking and selection (Statistics) |
Dept/Program | Statistics and Actuarial Science |
Persistent Identifier | http://hdl.handle.net/10722/209210 |
HKU Library Item ID | b5194731 |
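The Spearman and Kendall distances and the distance-based kNN prediction described in the abstract can be illustrated with a short Python sketch. It is not the thesis's implementation, and several choices are assumptions: a ranking of m items is a tuple whose i-th entry is the rank (1 = most preferred) given to item i; the Spearman distance is taken as the sum of squared rank differences; neighbors are found by Euclidean distance on feature vectors and weighted by inverse feature distance (a hypothetical weighting); and the predicted ranking is the permutation minimizing the weighted total distance to the neighbors' rankings. That last choice is the maximum-likelihood central ranking under a Mallows-type model P(π) ∝ exp(-θ d(π, π0)), whose normalizing constant does not depend on π0 for right-invariant distances such as Kendall's. The function names are hypothetical.

```python
import math
from itertools import combinations, permutations


def spearman_distance(r, s):
    """Sum of squared differences between the ranks two rankings give each item."""
    return sum((a - b) ** 2 for a, b in zip(r, s))


def kendall_distance(r, s):
    """Number of item pairs that the two rankings order differently."""
    return sum(
        1
        for i, j in combinations(range(len(r)), 2)
        if (r[i] - r[j]) * (s[i] - s[j]) < 0
    )


def knn_predict_ranking(x, train_X, train_R, k=3, dist=kendall_distance, eps=1e-9):
    """Predict a ranking for feature vector x from its k nearest neighbours.

    Hypothetical sketch: Euclidean neighbours, inverse-distance weights, and a
    brute-force search over all m! rankings (only practical for small m).
    """
    # Indices of the k training instances closest to x in feature space.
    nearest = sorted(range(len(train_X)), key=lambda n: math.dist(x, train_X[n]))[:k]
    # Closer neighbours get larger weights (assumed weighting scheme).
    weights = {n: 1.0 / (eps + math.dist(x, train_X[n])) for n in nearest}
    m = len(train_R[0])
    # Central ranking: the permutation with the smallest weighted total distance.
    return min(
        permutations(range(1, m + 1)),
        key=lambda cand: sum(w * dist(cand, train_R[n]) for n, w in weights.items()),
    )


if __name__ == "__main__":
    print(spearman_distance((1, 2, 3), (3, 2, 1)))  # 8
    print(kendall_distance((1, 2, 3), (3, 2, 1)))   # 3
    train_X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0)]
    train_R = [(1, 2, 3), (1, 2, 3), (2, 1, 3), (3, 2, 1)]
    print(knn_predict_ranking((0.05, 0.05), train_X, train_R, k=3))  # (1, 2, 3)
```

In the toy example, the three nearest neighbors carry rankings (1, 2, 3), (1, 2, 3) and (2, 1, 3), so the weighted consensus is (1, 2, 3). The exhaustive search keeps the sketch exact but grows as m!, so larger problems would need an approximate consensus search.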
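The abstract does not specify which ECOCs are proposed, so the following Python sketch uses one simple, assumed code for illustration only. Each binary classifier predicts whether item i is ranked above item j, so a ranking of m items has an m(m-1)/2-bit codeword. Decoding picks the ranking whose codeword is closest in Hamming distance to the classifiers' outputs. For this particular code, that Hamming distance counts the pairwise preferences on which the ranking disagrees with the classifiers, which is exactly the Kendall distance whenever the predicted bits are themselves consistent with a ranking. The binary classifiers are omitted, and `codeword`, `hamming` and `ecoc_decode` are hypothetical names.

```python
from itertools import combinations, permutations


def codeword(ranking):
    """Pairwise code for a ranking: bit (i, j) is 1 if item i is ranked above item j."""
    return tuple(
        1 if ranking[i] < ranking[j] else 0
        for i, j in combinations(range(len(ranking)), 2)
    )


def hamming(a, b):
    """Number of positions at which two codewords differ."""
    return sum(x != y for x, y in zip(a, b))


def ecoc_decode(predicted_bits, m):
    """Return the ranking of m items whose codeword is closest to the predicted bits.

    predicted_bits holds one output per binary classifier, ordered like
    itertools.combinations(range(m), 2). Brute force over all m! rankings.
    """
    return min(
        permutations(range(1, m + 1)),
        key=lambda r: hamming(codeword(r), predicted_bits),
    )


if __name__ == "__main__":
    # Classifiers say: item 0 above item 1, item 0 above item 2, item 2 above item 1.
    print(ecoc_decode((1, 1, 0), 3))  # (1, 3, 2)
```

Nearest-codeword decoding always returns a valid ranking, even when the individual classifiers produce cyclic preferences (for example, 0 above 1, 1 above 2, 2 above 0); in that case several rankings tie and the first one found is returned.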
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Qi, Fang | - |
dc.contributor.author | 齊放 | - |
dc.date.accessioned | 2015-04-11T23:10:03Z | - |
dc.date.available | 2015-04-11T23:10:03Z | - |
dc.date.issued | 2014 | - |
dc.identifier.citation | Qi, F. [齊放]. (2014). Some topics in modeling ranking data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5194731 | - |
dc.identifier.uri | http://hdl.handle.net/10722/209210 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Ranking and selection (Statistics) | - |
dc.title | Some topics in modeling ranking data | - |
dc.type | PG_Thesis | - |
dc.identifier.hkul | b5194731 | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Statistics and Actuarial Science | - |
dc.description.nature | published_or_final_version | - |
dc.identifier.doi | 10.5353/th_b5194731 | - |
dc.identifier.mmsid | 991036876839703414 | - |