
postgraduate thesis: Data analytics in actuarial science

Title: Data analytics in actuarial science
Authors: Chen, Yongzhao (陳永釗)
Advisor(s): Cheung, KC
Issue Date: 2022
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Chen, Y. [陳永釗]. (2022). Data analytics in actuarial science. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: In the era of Big Data, the importance of data analytics cannot be overstated, particularly in actuarial science: insurance companies rely heavily on historical financial market data to predict their future profits, and on the claim history of policyholders to improve the pricing of insurance products. To this end, in this thesis, I shall explore two major aspects of actuarial science from the perspective of data analytics. In the first part of the thesis, I shall conduct the first systematic study of risk premium calibration under the celebrated evolutionary credibility models, which were studied in [2,33] but only for the net premium; this study simultaneously estimates the process variance and the hypothetical mean. The objective is to minimize the mean square deviation of the empirical estimates from the respective theoretical mean and process variance, which leads to an extension of the set of classical normal equations. Although closed-form solutions of the normal equations are no longer available, an effective numerical scheme featuring a novel recursive LU algorithm is developed for the progressively enlarging coefficient matrices, and its effectiveness is demonstrated on several common time series models, namely AR, MA and ARMA. The proposed method can also be viewed as a robust extension of the recent SURE estimator used in the statistics literature: the SURE estimator assumes the underlying data to be i.i.d. with the Normal-Inverse-Wishart structure, whereas the proposed method allows a temporal dependence structure among the data without specifying the probability model. In the second part of the thesis, I shall focus on the important topic of risk classification in InsurTech. A number of well-known classifiers are widely adopted for risk classification problems in practice, including Random Forest, Classification and Regression Tree (CART), Logistic Regression, Shallow Neural Network (NN) and Support Vector Machine (SVM).
However, among these classifiers, applying SVM, NN and Logistic Regression to insurance datasets could lead to a substantial loss of information: these datasets usually involve many categorical variables, yet none of these classifiers handles them comprehensively, and they are sometimes discarded altogether. On the other hand, while CART handles categorical and discrete feature variables well by design, it lacks a mechanism to deal with continuous feature variables. Moreover, the relatively strong dependence structures among feature variables in insurance practice, especially among the large number of categorical feature variables, are not explicitly accounted for in the aforementioned classifiers. I here propose to model such implicit yet sufficiently strong dependence by comonotonicity. In particular, both an organized treatment of all categorical, continuous and discrete feature variables and efficient modelling of the dependence structure among them are achieved through the newly proposed Comonotone-Independence Bayes classifier (CIBer), which leads to a far better clustering of the predictive feature variables and thereby a superior classification of insurable risks. I shall also demonstrate the effectiveness of CIBer as a data analytics tool against those common classifiers through empirical studies on several representative insurance datasets.
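The abstract mentions a recursive LU algorithm for progressively enlarging coefficient matrices but does not spell it out. As a rough illustration only, the sketch below uses the standard bordering update, which may differ from the thesis's algorithm: when a factored matrix gains one extra row and column, the existing factors are reused and only two triangular solves are needed. The function names `extend_lu`, `lu_by_bordering` and `matmul` are hypothetical, pivoting is omitted, and every leading principal minor is assumed to be nonzero.

```python
def extend_lu(L, U, col, row, corner):
    """Extend the factors A = L U (L unit lower, U upper, both n x n) to
    factors of the bordered matrix [[A, col], [row, corner]].
    Illustrative sketch: no pivoting, nonzero pivots assumed."""
    n = len(L)
    # Solve L u = col by forward substitution (L has a unit diagonal).
    u = []
    for i in range(n):
        u.append(col[i] - sum(L[i][k] * u[k] for k in range(i)))
    # Solve l^T U = row, i.e. U^T l = row, also by forward substitution.
    l = []
    for j in range(n):
        s = row[j] - sum(l[k] * U[k][j] for k in range(j))
        l.append(s / U[j][j])
    # New pivot: the Schur complement of A in the bordered matrix.
    pivot = corner - sum(l[k] * u[k] for k in range(n))
    new_L = [L[i] + [0.0] for i in range(n)] + [l + [1.0]]
    new_U = [U[i] + [u[i]] for i in range(n)] + [[0.0] * n + [pivot]]
    return new_L, new_U


def lu_by_bordering(A):
    """Build the LU factors of A one leading block at a time."""
    L, U = [[1.0]], [[float(A[0][0])]]
    for n in range(1, len(A)):
        col = [float(A[i][n]) for i in range(n)]
        row = [float(A[n][j]) for j in range(n)]
        L, U = extend_lu(L, U, col, row, float(A[n][n]))
    return L, U


def matmul(X, Y):
    """Plain matrix product, used only to check the factorization."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


# Example input (an assumption, not data from the thesis): a 3 x 3
# AR(1)-style correlation matrix with lag-one correlation 0.5.
A = [[1.0, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 1.0]]
L, U = lu_by_bordering(A)
```

Each call to `extend_lu` costs O(n^2) operations, compared with O(n^3) for refactoring the enlarged matrix from scratch, which is the usual motivation for updating factors as the system grows.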
Degree: Doctor of Philosophy
Subject: Insurance - Statistical methods; Insurance - Data processing; Big data
Dept/Program: Statistics and Actuarial Science
Persistent Identifier: http://hdl.handle.net/10722/318321

 

DC Field | Value | Language
dc.contributor.advisor | Cheung, KC | -
dc.contributor.author | Chen, Yongzhao | -
dc.contributor.author | 陳永釗 | -
dc.date.accessioned | 2022-10-10T08:18:41Z | -
dc.date.available | 2022-10-10T08:18:41Z | -
dc.date.issued | 2022 | -
dc.identifier.citation | Chen, Y. [陳永釗]. (2022). Data analytics in actuarial science. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/318321 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Insurance - Statistical methods | -
dc.subject.lcsh | Insurance - Data processing | -
dc.subject.lcsh | Big data | -
dc.title | Data analytics in actuarial science | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Statistics and Actuarial Science | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2022 | -
dc.identifier.mmsid | 991044600194103414 | -
