Links for full text (may require subscription)
- Publisher Website: 10.1016/j.jclepro.2019.118955
- Scopus: eid_2-s2.0-85074426112
- WOS: WOS:000503172600073
Article: Identification of high impact factors of air quality on a national scale using big data and machine learning techniques
Title | Identification of high impact factors of air quality on a national scale using big data and machine learning techniques |
---|---|
Authors | Ma, Jun; Ding, Yuexiong; Cheng, Jack C.P.; Jiang, Feifeng; Tan, Yi; Gan, Vincent J.L.; Wan, Zhiwei |
Keywords | National scale; Variable importance; XGBoost; GIS; Big data; Air quality index |
Issue Date | 2020 |
Citation | Journal of Cleaner Production, 2020, v. 244, article no. 118955 |
Abstract | © 2019 Elsevier Ltd. To effectively control and prevent air pollution, it is necessary to study the factors that influence air quality. A number of previous studies have explored the relationships between air pollution and related factors. However, the methods currently used either cannot adequately address the multicollinearity problem or fail to explain the importance of the influential factors. Moreover, most of the existing literature restricts its study area to a single city or small region and examines factors from only one aspect. There is a lack of studies that analyze the influential factors at the scale of a whole country or that take multiple variables into consideration. To fill this research gap, this paper proposes a multivariate analysis at the national scale to investigate the most important factors of air quality. In order to study as many influential factors as possible, 171 features spanning environmental, demographic, economic, meteorological, and energy aspects were collected and analyzed. To tackle such a “big data” problem, a non-linear machine learning algorithm, Extreme Gradient Boosting (XGBoost), is used to model the relationship and measure variable importance. A Geographical Information System (GIS) is employed to preprocess the diverse variables and visualize the results. The performance of XGBoost is compared with that of other models, and its hyperparameters are tuned using Bayesian optimization. Experimental results from a case study in the U.S. show that the proposed methodological framework can effectively uncover the important factors of air quality. Six kinds of factors are found to have the largest impact on air quality, and practical suggestions for controlling and preventing air pollution are proposed for each of these six aspects. |
Persistent Identifier | http://hdl.handle.net/10722/287006 |
ISSN | 0959-6526 (2023 Impact Factor: 9.7; 2023 SCImago Journal Rankings: 2.058) |
ISI Accession Number ID | WOS:000503172600073 |
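The abstract describes fitting a gradient-boosted tree model (XGBoost) and using it to rank features by variable importance. As a rough illustration only, the sketch below swaps in a much simpler technique: plain gradient boosting with squared-error loss and depth-1 regression stumps, written in standard-library Python. It is not the XGBoost library and not the authors' pipeline (no regularization, no second-order terms, no Bayesian tuning). Each feature's importance is the total split gain it contributes, similar in spirit to XGBoost's `total_gain` importance type. The synthetic data and the names `fit_stump` and `boost_importance` are hypothetical.

```python
import random


def fit_stump(X, r):
    """Find the single-feature threshold split that most reduces squared error on residuals r."""
    n = len(r)
    total = sum(r)
    best = (0.0, None, None, 0.0, 0.0)  # (gain, feature, threshold, left_value, right_value)
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(n - 1):
            left_sum += r[order[k]]
            a, b = X[order[k]][j], X[order[k + 1]][j]
            if a == b:
                continue  # cannot place a threshold between identical values
            nl, nr = k + 1, n - k - 1
            right_sum = total - left_sum
            # Reduction in sum of squared errors from splitting here.
            gain = left_sum ** 2 / nl + right_sum ** 2 / nr - total ** 2 / n
            if gain > best[0]:
                best = (gain, j, (a + b) / 2, left_sum / nl, right_sum / nr)
    return best


def boost_importance(X, y, rounds=50, learning_rate=0.1):
    """Boost stumps on residuals and return normalized total-gain importance per feature."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    importance = [0.0] * len(X[0])
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        gain, j, thr, left_val, right_val = fit_stump(X, residuals)
        if j is None:
            break  # no split reduces the error any further
        importance[j] += gain
        # Shrink each stump's contribution by the learning rate.
        pred = [p + learning_rate * (left_val if row[j] <= thr else right_val)
                for p, row in zip(pred, X)]
    total = sum(importance) or 1.0
    return [g / total for g in importance]


random.seed(0)
X = [[random.random() for _ in range(4)] for _ in range(200)]
# Hypothetical target: driven mostly by feature 0, weakly by feature 2; features 1 and 3 are noise.
y = [3.0 * row[0] + 0.5 * row[2] + random.gauss(0, 0.1) for row in X]
scores = boost_importance(X, y)
ranking = sorted(range(4), key=lambda j: scores[j], reverse=True)
print(ranking)
```

Gain-based importance rewards features that are repeatedly chosen for large error reductions, which is why the dominant synthetic driver should rank first here; with real, highly correlated predictors, importance can be split among correlated features, so rankings should be read with care.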
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Ma, Jun | - |
dc.contributor.author | Ding, Yuexiong | - |
dc.contributor.author | Cheng, Jack C.P. | - |
dc.contributor.author | Jiang, Feifeng | - |
dc.contributor.author | Tan, Yi | - |
dc.contributor.author | Gan, Vincent J.L. | - |
dc.contributor.author | Wan, Zhiwei | - |
dc.date.accessioned | 2020-09-07T11:46:14Z | - |
dc.date.available | 2020-09-07T11:46:14Z | - |
dc.date.issued | 2020 | - |
dc.identifier.citation | Journal of Cleaner Production, 2020, v. 244, article no. 118955 | - |
dc.identifier.issn | 0959-6526 | - |
dc.identifier.uri | http://hdl.handle.net/10722/287006 | - |
dc.language | eng | - |
dc.relation.ispartof | Journal of Cleaner Production | - |
dc.subject | National scale | - |
dc.subject | Variable importance | - |
dc.subject | XGBoost | - |
dc.subject | GIS | - |
dc.subject | Big data | - |
dc.subject | Air quality index | - |
dc.title | Identification of high impact factors of air quality on a national scale using big data and machine learning techniques | - |
dc.type | Article | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1016/j.jclepro.2019.118955 | - |
dc.identifier.scopus | eid_2-s2.0-85074426112 | - |
dc.identifier.volume | 244 | - |
dc.identifier.spage | article no. 118955 | - |
dc.identifier.epage | article no. 118955 | - |
dc.identifier.isi | WOS:000503172600073 | - |
dc.identifier.issnl | 0959-6526 | - |