postgraduate thesis: Air pollution monitoring, forecasting, and causal pathway identification with spatio-temporal (ST) urban big data
Title | Air pollution monitoring, forecasting, and causal pathway identification with spatio-temporal (ST) urban big data |
---|---|
Authors | Zhu, Yixuan (朱益萱) |
Issue Date | 2016 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Zhu, Y. [朱益萱]. (2016). Air pollution monitoring, forecasting, and causal pathway identification with spatio-temporal (ST) urban big data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5801614. |
Abstract | Air pollution is a major problem in China, affecting the health of billions of people. Many cities have built on-the-ground monitoring stations to inform people of the hourly concentrations of major pollutants. However, these stations are geographically sparse (e.g., only 36 monitoring stations in Beijing and 16 in Hong Kong), severely limiting evidence-based air quality decision-making and leading to severe criticism of the transparency and public relevance of the official Air Quality Index (AQI). Urban big data may fill this gap. By analyzing causality among heterogeneous spatio-temporal (ST) big data (e.g., air quality, meteorology, and traffic), one can estimate fine-grained air pollution, forecast the future AQI, and identify the causal pathways of pollutants to inform public policy. In this thesis, three inter-related projects are investigated.
The first project aims to estimate fine-grained air pollution at locations not covered by monitoring stations. We propose a Granger-causality-based model to address two challenges. The first challenge is data diversity: there are different categories of urban dynamic data, and some may be useless or even detrimental to the estimation. To overcome this, we extend the Granger causality model to the ST space to analyze all the causalities among urban dynamics in a consistent manner. Then, by applying a non-causality test, we rule out the urban dynamics that do not “Granger” cause air pollution. The second challenge is the time complexity of processing the massive volume of data. We propose to discover the region of influence (ROI) by selecting the data with the highest causality levels spatially and temporally. We verify our model with datasets from Shenzhen and Hong Kong and find that it is indeed not necessary to process "all" the data. Better precision and time efficiency can be achieved by transforming "big data" into "the most influential data".
The second project aims to forecast future AQI with urban big data. We study the uncertainty caused by complex dependencies among urban data and the over-fitting problems that arise when training models. We compare the performance of the causality-based model with widely used supervised learning models, such as linear regression and neural networks, as well as deep learning methods, and conclude that the causality-based model achieves relatively high and stable forecasting precision compared with other methodologies.
The third project seeks to identify the ST causal pathways of air pollutants to inform public policy. This problem is challenging because: (1) there are numerous noisy and low-pollution periods in the raw air pollution data; (2) the air pollution and meteorological data are usually huge; and (3) the causal pathways are complex in nature because of the interactions of multiple pollutants and the influence of environmental factors. To accurately identify ST causal pathways for air pollutants, we present p-Causality, a novel pattern-aided causality analysis approach that combines the strengths of pattern mining and statistical modeling. The results, based on three years’ worth of urban data in China, show that our approach outperforms existing methods in time efficiency, inference accuracy, and interpretability. |
Degree | Doctor of Philosophy |
Subject | Air - Pollution |
Dept/Program | Electrical and Electronic Engineering |
Persistent Identifier | http://hdl.handle.net/10722/246689 |
HKU Library Item ID | b5801614 |
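
The non-causality test mentioned in the abstract can be illustrated with the classical bivariate Granger F-test. The sketch below is not the thesis's spatio-temporal model; it is a minimal, plain time-series version under stated assumptions. The names `granger_f`, `_ols_rss`, `traffic`, `unrelated`, and `pollution`, the lag of 2, and the synthetic data are all hypothetical and chosen only for illustration.

```python
import random


def _ols_rss(rows, targets):
    """Residual sum of squares of a least-squares fit.

    Solves the normal equations (X^T X) beta = X^T y by Gauss-Jordan
    elimination with partial pivoting (assumes X^T X is non-singular).
    """
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(k):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    return sum(
        (t - sum(b * v for b, v in zip(beta, r))) ** 2
        for r, t in zip(rows, targets)
    )


def granger_f(cause, effect, lag=2):
    """F statistic for the null hypothesis 'cause does not Granger-cause effect'.

    Restricted model:   effect[t] ~ 1 + effect[t-1..t-lag]
    Unrestricted model: restricted terms + cause[t-1..t-lag]
    """
    n = len(effect)
    targets = effect[lag:]
    restricted = [
        [1.0] + [effect[t - i] for i in range(1, lag + 1)] for t in range(lag, n)
    ]
    unrestricted = [
        row + [cause[t - i] for i in range(1, lag + 1)]
        for row, t in zip(restricted, range(lag, n))
    ]
    rss_r = _ols_rss(restricted, targets)
    rss_u = _ols_rss(unrestricted, targets)
    dof = len(targets) - (2 * lag + 1)  # observations minus fitted parameters
    return ((rss_r - rss_u) / lag) / (rss_u / dof)


# Synthetic example: 'traffic' drives 'pollution' with a one-step delay,
# while 'unrelated' is independent noise.
random.seed(0)
traffic = [random.gauss(0, 1) for _ in range(300)]
unrelated = [random.gauss(0, 1) for _ in range(300)]
pollution = [0.0]
for t in range(1, 300):
    pollution.append(
        0.5 * pollution[t - 1] + 0.8 * traffic[t - 1] + random.gauss(0, 0.3)
    )

f_traffic = granger_f(traffic, pollution)
f_unrelated = granger_f(unrelated, pollution)
print("F(traffic -> pollution):  ", round(f_traffic, 2))
print("F(unrelated -> pollution):", round(f_unrelated, 2))
```

A large F statistic is evidence against the non-causality hypothesis; in practice it would be compared with a critical value of the F distribution with `lag` and `dof` degrees of freedom. Series whose statistic stays below that threshold would be the ones ruled out as inputs.
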
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Zhu, Yixuan | - |
dc.contributor.author | 朱益萱 | - |
dc.date.accessioned | 2017-09-22T03:40:13Z | - |
dc.date.available | 2017-09-22T03:40:13Z | - |
dc.date.issued | 2016 | - |
dc.identifier.citation | Zhu, Y. [朱益萱]. (2016). Air pollution monitoring, forecasting, and causal pathway identification with spatio-temporal (ST) urban big data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5801614. | - |
dc.identifier.uri | http://hdl.handle.net/10722/246689 | - |
dc.description.abstract | Air pollution is a major problem in China, affecting the health of billions of people. Many cities have built on-the-ground monitoring stations to inform people of the hourly concentrations of major pollutants. However, these stations are geographically sparse (e.g., only 36 monitoring stations in Beijing and 16 in Hong Kong), severely limiting evidence-based air quality decision-making and leading to severe criticism of the transparency and public relevance of the official Air Quality Index (AQI). Urban big data may fill this gap. By analyzing causality among heterogeneous spatio-temporal (ST) big data (e.g., air quality, meteorology, and traffic), one can estimate fine-grained air pollution, forecast the future AQI, and identify the causal pathways of pollutants to inform public policy. In this thesis, three inter-related projects are investigated. The first project aims to estimate fine-grained air pollution at locations not covered by monitoring stations. We propose a Granger-causality-based model to address two challenges. The first challenge is data diversity: there are different categories of urban dynamic data, and some may be useless or even detrimental to the estimation. To overcome this, we extend the Granger causality model to the ST space to analyze all the causalities among urban dynamics in a consistent manner. Then, by applying a non-causality test, we rule out the urban dynamics that do not “Granger” cause air pollution. The second challenge is the time complexity of processing the massive volume of data. We propose to discover the region of influence (ROI) by selecting the data with the highest causality levels spatially and temporally. We verify our model with datasets from Shenzhen and Hong Kong and find that it is indeed not necessary to process "all" the data. Better precision and time efficiency can be achieved by transforming "big data" into "the most influential data".
The second project aims to forecast future AQI with urban big data. We study the uncertainty caused by complex dependencies among urban data and the over-fitting problems that arise when training models. We compare the performance of the causality-based model with widely used supervised learning models, such as linear regression and neural networks, as well as deep learning methods, and conclude that the causality-based model achieves relatively high and stable forecasting precision compared with other methodologies. The third project seeks to identify the ST causal pathways of air pollutants to inform public policy. This problem is challenging because: (1) there are numerous noisy and low-pollution periods in the raw air pollution data; (2) the air pollution and meteorological data are usually huge; and (3) the causal pathways are complex in nature because of the interactions of multiple pollutants and the influence of environmental factors. To accurately identify ST causal pathways for air pollutants, we present p-Causality, a novel pattern-aided causality analysis approach that combines the strengths of pattern mining and statistical modeling. The results, based on three years’ worth of urban data in China, show that our approach outperforms existing methods in time efficiency, inference accuracy, and interpretability. | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.subject.lcsh | Air - Pollution | - |
dc.title | Air pollution monitoring, forecasting, and causal pathway identification with spatio-temporal (ST) urban big data | - |
dc.type | PG_Thesis | - |
dc.identifier.hkul | b5801614 | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Electrical and Electronic Engineering | - |
dc.description.nature | published_or_final_version | - |
dc.identifier.doi | 10.5353/th_b5801614 | - |
dc.identifier.mmsid | 991043959797603414 | - |