Predicting the Danger of Particulate Matter Pollution from Wildfires Using Classification Models
DOI:
https://doi.org/10.47611/jsrhs.v11i3.3272
Keywords:
Wildfires, PM2.5, GradientBoostingClassifier
Abstract
Due to a combination of climate change and California’s mega-drought, California’s wildfire seasons have grown progressively longer, more destructive, and more expensive. In 2020 alone, around 9,900 wildfires burned about 4.3 million acres, costing the state over $12 billion (Kerlin, 2022). Larger and more numerous wildfires release billions of harmful particles into the atmosphere, including PM2.5. This study uses features of a wildfire and its surroundings to predict whether the wildfire emits enough PM2.5 to be detrimental to human health. The 8 features used in the model are the acres burned, the duration of the fire in days, available green space within a 15-mile radius of the fire, the highest population density within a 15-mile radius of the fire, electricity usage, median income, temperature, and precipitation. A Gradient Boosting Classifier (GBC) was applied to the dataset to predict whether a wildfire’s emissions necessitated an evacuation. The GBC achieved an accuracy of 0.931 and an Area Under the Curve (AUC) of 0.911. By far the most important feature in the GBC is Length (the duration of the fire in days), with a feature importance score of 0.109 +/- 0.009.
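The abstract reports two evaluation metrics, accuracy and AUC. As a rough illustration of what those numbers measure, the sketch below computes both from scratch on a small made-up example. The labels, scores, 0.5 threshold, and function names are assumptions chosen for illustration; they are not the study's data or code, and the study itself used a Gradient Boosting Classifier rather than hand-written scores.

```python
# Toy illustration only: the labels and scores below are made up,
# they are NOT taken from the study's wildfire dataset.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive score and a negative score counts as half a win.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical labels: 1 = PM2.5 level judged dangerous, 0 = not dangerous.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
# Hypothetical predicted probabilities of the "dangerous" class.
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.55, 0.70]
# Threshold the probabilities at 0.5 (an assumed cutoff) to get class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"accuracy = {acc:.3f}, AUC = {auc:.3f}")
```

Accuracy depends on the chosen threshold, while AUC depends only on how the scores rank positives against negatives, which is why the two are usually reported together.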
References or Bibliography
Air Quality Data (PST) query tool. California Environmental Protection Agency Air Resources Board. (n.d.). Retrieved February 28, 2022, from https://www.arb.ca.gov/aqmis2/aqdselect.php
Ares. (2020, February 9). California wildfires (2013-2020). Kaggle. Retrieved February 28, 2022, from https://www.kaggle.com/ananthu017/california-wildfire-incidents-20132020
Bhandari, A. (2020, June 16). AUC-ROC curve in machine learning clearly explained. Analytics Vidhya. Retrieved June 28, 2022, from https://www.analyticsvidhya.com/blog/2020/06/auc-roc-curve-machine-learning/#:~:text=The%20Area%20Under%20the%20Curve,the%20positive%20and%20negative%20classes
Brownlee, J. (2020, August 14). Linear regression for machine learning. Machine Learning Mastery. Retrieved March 4, 2022, from https://machinelearningmastery.com/linear-regression-for-machine-learning/#:~:text=Linear%20regression%20is%20a%20linear,the%20input%20variables%20(x).
Chong, J. (2021, August 29). Battle of the ensemble - random forest vs gradient boosting. Medium. Retrieved March 3, 2022, from https://towardsdatascience.com/battle-of-the-ensemble-random-forest-vs-gradient-boosting-6fbfed14cb7
Environmental Protection Agency. (n.d.). Particulate matter (PM) basics. EPA. Retrieved June 28, 2022, from https://www.epa.gov/pm-pollution/particulate-matter-pm-basics
Environmental Protection Agency. (n.d.). What is particulate matter? | urban environmental program in New England. EPA. Retrieved June 28, 2022, from https://www3.epa.gov/region1/eco/uep/particulatematter.html#:~:text=%22Particulate%20matter%2C%22%20also%20known,and%20soil%20or%20dust%20particles
Google. (n.d.). Classification: ROC curve and AUC | machine learning crash course | Google developers. Google. Retrieved June 28, 2022, from https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc
Kambhampati, D. D. (2020, October 5). California- electricity consumption by county. Kaggle. Retrieved March 2, 2022, from https://www.kaggle.com/devkambhampati/california-electricity-consumption-by-county/version/1
Kerlin, K. E. (2022, May 4). California's 2020 wildfire season. UC Davis. Retrieved June 25, 2022, from https://www.ucdavis.edu/climate/news/californias-2020-wildfire-season-numbers
Malik, A., Rao, M. R., Puppala, N., Koouri, P., Thota, V. A. K., Liu, Q., Chiao, S., & Gao, J. (2021, January 13). Data-driven wildfire risk prediction in Northern California. MDPI. Retrieved February 27, 2022, from https://www.mdpi.com/2073-4433/12/1/109/htm
Median family income, by family type (regions of 10,000 residents or more). Kidsdata.org. (n.d.). Retrieved March 2, 2022, from https://www.kidsdata.org/topic/545/income-family-type-10k/table
Miller, R. (2020, October 29). Climate change is central to California's wildfires. Scientific American. Retrieved June 28, 2022, from https://www.scientificamerican.com/article/climate-change-is-central-to-californias-wildfires/
Narkhede, S. (2021, June 15). Understanding AUC - ROC curve. Medium. Retrieved July 23, 2022, from https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5
National Geographic Society. (2019, July 15). Wildfires. National Geographic Society. Retrieved March 2, 2022, from https://www.nationalgeographic.org/encyclopedia/wildfires/
nikki2398. (2020, September 2). ML - gradient boosting. GeeksforGeeks. Retrieved June 28, 2022, from https://www.geeksforgeeks.org/ml-gradient-boosting/
Sklearn.ensemble.gradientboostingclassifier. scikit. (n.d.). Retrieved June 28, 2022, from https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html
Sklearn.ensemble.gradientboostingregressor. scikit. (n.d.). Retrieved February 27, 2022, from https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html
Sklearn.ensemble.randomforestregressor. scikit. (n.d.). Retrieved February 27, 2022, from https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html
United States Cities Database. simplemaps. (n.d.). Retrieved March 2, 2022, from https://simplemaps.com/data/us-cities
Urban Forest Data for California. State Urban Forest Data: California. (n.d.). Retrieved February 28, 2022, from https://www.nrs.fs.fed.us/data/urban/state/?state=CA
Wikimedia Foundation. (2022, June 20). Random Forest. Wikipedia. Retrieved June 28, 2022, from https://en.wikipedia.org/wiki/Random_forest
Copyright (c) 2022 William Lai
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright holder(s) granted JSR a perpetual, non-exclusive license to distribute & display this article.