An interpretable machine-learning approach for fine-grained income estimation in a developed context

Bai, Ruiqiao; 柏瑞乔

Appears in Collections:
- HKU Theses Online
- Electrical & Electronic Engineering: Theses

postgraduate thesis: An interpretable machine-learning approach for fine-grained income estimation in a developed context

Title	An interpretable machine-learning approach for fine-grained income estimation in a developed context
Authors	Bai, Ruiqiao 柏瑞乔
Advisors	Li, VOK; Lam, JCK
Issue Date	2023
Publisher	The University of Hong Kong (Pokfulam, Hong Kong)
Citation	Bai, R. [柏瑞乔]. (2023). An interpretable machine-learning approach for fine-grained income estimation in a developed context. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract	Estimating income at a fine-grained geographical scale is critical and challenging, even in a developed context. Given greater welfare allocations, citizens in developed economies tend to spend more, making income level a good indicator of concurrent spending. In addition, because developed economies may create a higher potential for intra-city income inequality than some low-income developing economies, greater data transparency and accurate fine-grained income data are important for evidence-based decision-making. Traditionally, income data of high spatial granularity are gathered via field surveys. However, because income information is more sensitive, gathering accurate income data is more challenging than collecting other socio-economic variables. To address this challenge, various income estimation models have been developed. However, these models have weaknesses, such as heavy dependence on non-public datasets, data privacy issues, and low estimation power. Model accuracy can be improved by domain-specific machine learning techniques and enhanced input feature combinations. Moreover, injecting interpretability into fine-grained income estimation models can help us understand how the models predict income and which predictors have a stronger effect on the outcome. Chapter 3 develops the GP-Mixed-Siamese-like-Double-Ridge model, the Spatial-Information-GP model and the Mixed-Siamese-like model, making use of big data and machine learning techniques. Building on Chapter 3, Chapter 4 develops a Socioeconomic & Spatial-Information-GP model, incorporating data obtained from both big data techniques and field surveys. Our model achieves outstanding income estimation accuracy and enables the income values estimated at fine-grained resolution to be coupled with other types of data on the same scale, to support future inequality-related studies and policy decision-making in developed contexts.
To improve model interpretation, Chapter 4 further investigates the salient socio-economic features, such as education, sex and race, that influence fine-grained district-based income in New York City (a developed metropolis) via SHapley Additive exPlanations (SHAP) analysis. Their relative contributions to income estimation are discussed. Policy implications are drawn from the results of the domain-specific machine learning model, including the need to address urban income inequality attributable to sex and race, and to provide enhanced higher education opportunities to residents of lower-income districts in order to narrow the district-based income divide and improve urban sustainability. To highlight the importance of fine-grained income estimation in supporting sustainability-related studies in developed contexts, Chapter 5 reviews the health cost accounting of air pollution in China. It identifies the research gaps and the significance of more accurate, fine-grained health cost accounting of air pollution in China. This chapter justifies the need to develop a fine-grained income estimation model in a developed context. In conclusion, this thesis demonstrates that interpretable machine learning methods can facilitate fine-grained district-based income estimation in a developed context by integrating big data techniques with traditional field surveys and by conducting SHAP analysis. Our income estimation results can inform fine-granularity income-related policy decisions, such as policies dealing with social and environmental inequality, and provide useful insights for future fine-grained urban sustainability-related studies.
Degree	Doctor of Philosophy
Subject	Machine learning; Income distribution - Mathematical models
Dept/Program	Electrical and Electronic Engineering
Persistent Identifier	http://hdl.handle.net/10722/327818

DC Field	Value	Language
dc.contributor.advisor	Li, VOK	-
dc.contributor.advisor	Lam, JCK	-
dc.contributor.author	Bai, Ruiqiao	-
dc.contributor.author	柏瑞乔	-
dc.date.accessioned	2023-06-05T03:46:18Z	-
dc.date.available	2023-06-05T03:46:18Z	-
dc.date.issued	2023	-
dc.identifier.citation	Bai, R. [柏瑞乔]. (2023). An interpretable machine-learning approach for fine-grained income estimation in a developed context. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.	-
dc.identifier.uri	http://hdl.handle.net/10722/327818	-
dc.language	eng	-
dc.publisher	The University of Hong Kong (Pokfulam, Hong Kong)	-
dc.relation.ispartof	HKU Theses Online (HKUTO)	-
dc.rights	The author retains all proprietary rights (such as patent rights) and the right to use in future works.	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.subject.lcsh	Machine learning	-
dc.subject.lcsh	Income distribution - Mathematical models	-
dc.title	An interpretable machine-learning approach for fine-grained income estimation in a developed context	-
dc.type	PG_Thesis	-
dc.description.thesisname	Doctor of Philosophy	-
dc.description.thesislevel	Doctoral	-
dc.description.thesisdiscipline	Electrical and Electronic Engineering	-
dc.description.nature	published_or_final_version	-
dc.date.hkucongregation	2023	-
dc.identifier.mmsid	991044683803803414	-
