Predicting the Danger of Particulate Matter Pollution from Wildfires Using Classification Models

Authors

  • William Lai, The Bishop's School

DOI:

https://doi.org/10.47611/jsrhs.v11i3.3272

Keywords:

Wildfires, PM2.5, GradientBoostingClassifier

Abstract

Due to a combination of climate change and California’s mega-drought, California’s wildfire seasons have become progressively longer, more destructive, and more expensive. In 2020 alone, around 9,900 wildfires burned about 4.3 million acres, costing the state over $12 billion (Kerlin, 2022). Larger and more numerous wildfires emit billions of harmful particles into the atmosphere, including PM2.5. This study uses features of a wildfire and its surroundings to predict whether the wildfire emits enough PM2.5 to be detrimental to human health. The 8 features used in the model are the acres burned, the duration of the fire in days, the available green space within a 15-mile radius of the fire, the highest population density within a 15-mile radius of the fire, electricity usage, median income, temperature, and precipitation. A Gradient Boosting Classifier (GBC) was applied to the dataset to predict whether a wildfire’s emissions necessitated an evacuation. The GBC achieved a high accuracy of 0.931 and an Area Under the Curve (AUC) of 0.911. By far the most important feature in the GBC is Length (the duration of the fire), with a feature importance score of 0.109 +/- 0.009.
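The workflow summarized in the abstract can be illustrated with a minimal sketch in scikit-learn. It is not the author's code: the data are synthetic, the feature names and the labelling rule are hypothetical, the hyperparameters are library defaults, and permutation importance is assumed only because the abstract reports the importance score as a mean +/- a spread. The numbers it prints will not match those reported in the study.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical column names standing in for the 8 features in the abstract.
FEATURES = [
    "acres_burned", "length_days", "green_space", "max_pop_density",
    "electricity_usage", "median_income", "temperature", "precipitation",
]


def make_synthetic_fires(n=400, seed=0):
    """Generate stand-in data; the real study used wildfire records."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, len(FEATURES)))
    # Assumed labelling rule: fire length dominates, acres burned contributes.
    score = 1.5 * X[:, 1] + 0.5 * X[:, 0] + rng.normal(scale=0.5, size=n)
    y = (score > 0).astype(int)  # 1 = PM2.5 judged harmful (hypothetical)
    return X, y


def fit_and_evaluate(X, y, seed=0):
    """Fit a GBC and report accuracy, AUC, and permutation importances."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    model = GradientBoostingClassifier(random_state=seed)
    model.fit(X_train, y_train)

    accuracy = accuracy_score(y_test, model.predict(X_test))
    # AUC is computed from the predicted probability of the positive class.
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

    # Mean and standard deviation of the score drop when a feature is shuffled.
    perm = permutation_importance(
        model, X_test, y_test, n_repeats=10, random_state=seed
    )
    importances = {
        name: (float(mean), float(std))
        for name, mean, std in zip(
            FEATURES, perm.importances_mean, perm.importances_std
        )
    }
    return accuracy, auc, importances


X, y = make_synthetic_fires()
accuracy, auc, importances = fit_and_evaluate(X, y)
print(f"accuracy={accuracy:.3f}  AUC={auc:.3f}")
for name, (mean, std) in sorted(importances.items(), key=lambda kv: -kv[1][0]):
    print(f"{name:>18}: {mean:.3f} +/- {std:.3f}")
```

Evaluating on a held-out split keeps the accuracy and AUC from being inflated by the training data, and reporting each importance as a mean with its standard deviation mirrors the "0.109 +/- 0.009" format used above.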

Author Biography

William Lai, The Bishop's School

San Diego, CA

References or Bibliography

Air Quality Data (PST) query tool. California Environmental Protection Agency Air Resources Board. (n.d.). Retrieved February 28, 2022, from https://www.arb.ca.gov/aqmis2/aqdselect.php

Ares. (2020, February 9). California wildfires (2013-2020). Kaggle. Retrieved February 28, 2022, from https://www.kaggle.com/ananthu017/california-wildfire-incidents-20132020

Bhandari, A. (2020, June 16). AUC-ROC curve in machine learning clearly explained. Analytics Vidhya. Retrieved June 28, 2022, from https://www.analyticsvidhya.com/blog/2020/06/auc-roc-curve-machine-learning/#:~:text=The%20Area%20Under%20the%20Curve,the%20positive%20and%20negative%20classes

Brownlee, J. (2020, August 14). Linear regression for machine learning. Machine Learning Mastery. Retrieved March 4, 2022, from https://machinelearningmastery.com/linear-regression-for-machine-learning/#:~:text=Linear%20regression%20is%20a%20linear,the%20input%20variables%20(x).

Chong, J. (2021, August 29). Battle of the ensemble - random forest vs gradient boosting. Medium. Retrieved March 3, 2022, from https://towardsdatascience.com/battle-of-the-ensemble-random-forest-vs-gradient-boosting-6fbfed14cb7

Environmental Protection Agency. (n.d.). Particulate matter (PM) basics. EPA. Retrieved June 28, 2022, from https://www.epa.gov/pm-pollution/particulate-matter-pm-basics

Environmental Protection Agency. (n.d.). What is particulate matter? | urban environmental program in New England. EPA. Retrieved June 28, 2022, from https://www3.epa.gov/region1/eco/uep/particulatematter.html#:~:text=%22Particulate%20matter%2C%22%20also%20known,and%20soil%20or%20dust%20particles

Google. (n.d.). Classification: ROC curve and AUC | Machine Learning Crash Course | Google Developers. Google. Retrieved June 28, 2022, from https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc

Kambhampati, D. D. (2020, October 5). California- electricity consumption by county. Kaggle. Retrieved March 2, 2022, from https://www.kaggle.com/devkambhampati/california-electricity-consumption-by-county/version/1

Kerlin, K. E. (2022, May 4). California's 2020 wildfire season. UC Davis. Retrieved June 25, 2022, from https://www.ucdavis.edu/climate/news/californias-2020-wildfire-season-numbers

Malik, A., Rao, M. R., Puppala, N., Koouri, P., Thota, V. A. K., Liu, Q., Chiao, S., & Gao, J. (2021, January 13). Data-driven wildfire risk prediction in Northern California. MDPI. Retrieved February 27, 2022, from https://www.mdpi.com/2073-4433/12/1/109/htm

Median family income, by family type (regions of 10,000 residents or more). Kidsdata.org. (n.d.). Retrieved March 2, 2022, from https://www.kidsdata.org/topic/545/income-family-type-10k/table

Miller, R. (2020, October 29). Climate change is central to California's wildfires. Scientific American. Retrieved June 28, 2022, from https://www.scientificamerican.com/article/climate-change-is-central-to-californias-wildfires/

Narkhede, S. (2021, June 15). Understanding AUC - ROC curve. Medium. Retrieved July 23, 2022, from https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5

National Geographic Society. (2019, July 15). Wildfires. National Geographic Society. Retrieved March 2, 2022, from https://www.nationalgeographic.org/encyclopedia/wildfires/

nikki2398@nikki2398. (2020, September 2). ML - gradient boosting. GeeksforGeeks. Retrieved June 28, 2022, from https://www.geeksforgeeks.org/ml-gradient-boosting/

Sklearn.ensemble.gradientboostingclassifier. scikit. (n.d.). Retrieved June 28, 2022, from https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html

Sklearn.ensemble.gradientboostingregressor. scikit. (n.d.). Retrieved February 27, 2022, from https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html

Sklearn.ensemble.randomforestregressor. scikit. (n.d.). Retrieved February 27, 2022, from https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html

United States Cities Database. simplemaps. (n.d.). Retrieved March 2, 2022, from https://simplemaps.com/data/us-cities

Urban Forest Data for California. State Urban Forest Data: California. (n.d.). Retrieved February 28, 2022, from https://www.nrs.fs.fed.us/data/urban/state/?state=CA

Wikimedia Foundation. (2022, June 20). Random Forest. Wikipedia. Retrieved June 28, 2022, from https://en.wikipedia.org/wiki/Random_forest

Published

08-31-2022

How to Cite

Lai, W. (2022). Predicting the Danger of Particulate Matter Pollution from Wildfires Using Classification Models. Journal of Student Research, 11(3). https://doi.org/10.47611/jsrhs.v11i3.3272

Section

HS Research Projects