
Postgraduate thesis: An interpretable machine-learning approach for fine-grained income estimation in a developed context

Title: An interpretable machine-learning approach for fine-grained income estimation in a developed context
Authors: Bai, Ruiqiao (柏瑞乔)
Advisor(s): Li, VOK; Lam, JCK
Issue Date: 2023
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Bai, R. [柏瑞乔]. (2023). An interpretable machine-learning approach for fine-grained income estimation in a developed context. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Estimating income at a fine-grained geographical scale is critical and challenging, even in a developed context. Given greater welfare allocations, citizens in developed economies tend to spend more, making income level a good indicator of concurrent spending. In addition, because developed economies may have a higher potential for intra-city income inequality than some low-income developing economies, greater data transparency and accurate fine-grained income data are important for evidence-based decision-making. Traditionally, income data at high spatial granularity are gathered via field surveys. However, because income information is more sensitive, accurate income data are harder to collect than other socio-economic variables. To address this challenge, various income estimation models have been developed. However, these models have weaknesses, such as heavy dependence on non-public datasets, data privacy issues, and low estimation power. The accuracy of such models can be improved with domain-specific machine learning techniques and enhanced combinations of input features. In addition, building interpretability into fine-grained income estimation models can help us understand how the models predict income and which predictors have a stronger effect on the outcome. Chapter 3 develops the GP-Mixed-Siamese-like-Double-Ridge model, the Spatial-Information-GP model and the Mixed-Siamese-like model, making use of big data and machine learning techniques. Building on Chapter 3, Chapter 4 develops a Socioeconomic & Spatial-Information-GP model, incorporating data obtained from both big data techniques and field surveys. Our model achieves outstanding income estimation accuracy and enables income values estimated at fine-grained resolution to be coupled with other types of data at the same scale, supporting future inequality-related studies and policy decision-making in developed contexts.
To facilitate improved model interpretation, Chapter 4 further investigates the salient socio-economic features, such as education, sex and race, that influence fine-grained district-based income in New York City (a developed metropolis) via SHapley Additive exPlanations (SHAP) analysis, and discusses their relative contributions to income estimation. Policy implications are drawn from the results of the domain-specific machine learning model. These include the need to address urban income inequality attributable to sex and race, and the need to provide enhanced higher education opportunities to residents of lower-income districts to narrow the district-based income divide, thereby improving urban sustainability. To highlight the importance of fine-grained income estimation in supporting sustainability-related studies in developed contexts, Chapter 5 reviews the health cost accounting of air pollution in China. It identifies research gaps and explains the significance of more accurate, fine-grained health cost accounting of air pollution in China. This chapter justifies the need for a fine-grained income estimation model in a developed context. In conclusion, this thesis demonstrates that interpretable machine learning methods can facilitate fine-grained district-based income estimation in a developed context by integrating big data techniques with traditional field surveys and by conducting SHAP analysis. Our income estimation results can inform fine-grained income-related policy decisions, such as policies addressing social and environmental inequality, and provide useful insights for future fine-grained urban sustainability studies.
Degree: Doctor of Philosophy
Subject: Machine learning; Income distribution - Mathematical models
Dept/Program: Electrical and Electronic Engineering
Persistent Identifier: http://hdl.handle.net/10722/327818

 

DC Field | Value | Language
dc.contributor.advisor | Li, VOK | -
dc.contributor.advisor | Lam, JCK | -
dc.contributor.author | Bai, Ruiqiao | -
dc.contributor.author | 柏瑞乔 | -
dc.date.accessioned | 2023-06-05T03:46:18Z | -
dc.date.available | 2023-06-05T03:46:18Z | -
dc.date.issued | 2023 | -
dc.identifier.citation | Bai, R. [柏瑞乔]. (2023). An interpretable machine-learning approach for fine-grained income estimation in a developed context. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/327818 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Machine learning | -
dc.subject.lcsh | Income distribution - Mathematical models | -
dc.title | An interpretable machine-learning approach for fine-grained income estimation in a developed context | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Electrical and Electronic Engineering | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2023 | -
dc.identifier.mmsid | 991044683803803414 | -
