postgraduate thesis: An interpretable machine-learning approach for fine-grained income estimation in a developed context
Title | An interpretable machine-learning approach for fine-grained income estimation in a developed context |
---|---|
Authors | Bai, Ruiqiao (柏瑞乔) |
Advisors | Li, VOK; Lam, JCK |
Issue Date | 2023 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Bai, R. [柏瑞乔]. (2023). An interpretable machine-learning approach for fine-grained income estimation in a developed context. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | Estimating income at a fine-grained geographical scale is critical and challenging, even in a developed context. Given greater welfare allocations, citizens in developed economies tend to spend more, making income level a good indicator of concurrent spending. In addition, as developed economies may create a higher potential for intra-city income inequality than some low-income developing economies, higher data transparency and accurate fine-grained income data are important for facilitating evidence-based decision-making. Traditionally, income data of high spatial granularity are gathered via field surveys. However, because income information is more sensitive, gathering accurate income data is more challenging than collecting other socio-economic variables. To address this challenge, different income estimation models have been developed. However, these models have weaknesses, such as high dependency on non-public datasets, data privacy issues, and low estimation power. Model accuracy can be improved by domain-specific machine learning techniques and enhanced input feature combinations. In addition, injecting interpretability into fine-grained income estimation models can help us understand how models predict income and which predictors have a stronger effect on the outcome.
Chapter 3 develops the GP-Mixed-Siamese-like-Double-Ridge model, the Spatial-Information-GP model and the Mixed-Siamese-like model, making use of big data and machine learning techniques.
Capitalising on Chapter 3, Chapter 4 develops a Socioeconomic & Spatial-Information-GP model, incorporating data obtained from both big data techniques and field surveys. Our model achieves outstanding income estimation accuracy and enables the income values estimated in fine-grained resolution to be coupled with other types of data on the same scale, to support future inequality-related studies and policy decision-making in developed contexts.
To facilitate improved model interpretation, Chapter 4 further investigates the salient socio-economic features, such as education, sex and race, that influence fine-grained district-based income in New York City (a developed metropolis) via SHapley Additive exPlanations (SHAP) analysis. Their relative contributions to income estimation are discussed. Policy implications are drawn from the results generated by the domain-specific machine learning model, including the need to address urban income inequality attributable to sex and race, while providing enhanced higher education opportunities to residents of lower-income districts to mend the district-based income divide and improve urban sustainability.
To highlight the importance of fine-grained income estimation in supporting sustainability-related studies in developed contexts, Chapter 5 reviews the health cost accounting of air pollution in China. It identifies the research gaps in, and the significance of, more accurate health cost accounting of air pollution at fine-grained resolution in China. This chapter rationalises the necessity of developing a fine-grained income estimation model in a developed context.
In conclusion, this thesis demonstrates that interpretable machine learning methods can facilitate fine-grained district-based income estimation in a developed context, by means of integrating big data techniques with traditional field surveys, and by conducting SHAP analysis. Our income estimation results can inform fine-granularity income-related policy decisions, such as policies dealing with social and environmental inequality, and provide useful insights for future fine-grained urban sustainability-related studies. |
Degree | Doctor of Philosophy |
Subject | Machine learning; Income distribution - Mathematical models |
Dept/Program | Electrical and Electronic Engineering |
Persistent Identifier | http://hdl.handle.net/10722/327818 |
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Li, VOK | - |
dc.contributor.advisor | Lam, JCK | - |
dc.contributor.author | Bai, Ruiqiao | - |
dc.contributor.author | 柏瑞乔 | - |
dc.date.accessioned | 2023-06-05T03:46:18Z | - |
dc.date.available | 2023-06-05T03:46:18Z | - |
dc.date.issued | 2023 | - |
dc.identifier.citation | Bai, R. [柏瑞乔]. (2023). An interpretable machine-learning approach for fine-grained income estimation in a developed context. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/327818 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Machine learning | - |
dc.subject.lcsh | Income distribution - Mathematical models | - |
dc.title | An interpretable machine-learning approach for fine-grained income estimation in a developed context | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Electrical and Electronic Engineering | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2023 | - |
dc.identifier.mmsid | 991044683803803414 | - |