Predicting housing prices and analyzing real estate markets in the Chicago suburbs using machine learning

Kevin Xu

doi:10.47611/jsrhs.v11i3.3459

Authors

Kevin Xu Neuqua Valley High School

DOI:

https://doi.org/10.47611/jsrhs.v11i3.3459

Keywords:

machine learning, data science, regression, real estate, modeling, support vector, random forest, decision tree, XGBoost, high school

Abstract

The pricing of housing properties is determined by a variety of factors. However, post-pandemic markets in the Chicago suburban area have experienced volatility, which has affected house prices greatly. In this study, the Naperville/Bolingbrook real estate market was analyzed to predict property prices from housing attributes using machine learning models, and to evaluate the effectiveness of such models in a volatile market. Sales data from 2018 through the summer of 2022 were collected from Redfin, a real estate website. By analyzing sales across this period, we can also examine the state of the housing market and identify trends in price. Five models were used: linear regression, support vector regression, decision tree regression, random forest regression, and XGBoost regression. The models were compared on their mean absolute error (MAE), root mean squared error (RMSE), and R-squared values. The XGBoost model was found to perform best in predicting house prices despite the additional volatility introduced by post-pandemic conditions. After modeling, Shapley values (SHAP) were used to evaluate the weight of each variable in the constructed models. The code and data files can be found at https://github.com/GeometricBison/HousePriceML.
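
To make the comparison on MAE, RMSE, and R-squared concrete, the following is a minimal sketch of how the three metrics can be computed and used to rank models. It is not the study's code: the sale prices, the predictions, and the names `model_a` and `model_b` are hypothetical values chosen only for illustration, and ranking by RMSE is one possible choice of criterion.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: penalizes large errors more than MAE."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def r_squared(actual, predicted):
    """Share of the variance in actual prices explained by the predictions."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def score(actual, predicted):
    """Collect the three metrics for one model."""
    return {
        "MAE": mae(actual, predicted),
        "RMSE": rmse(actual, predicted),
        "R2": r_squared(actual, predicted),
    }


# Hypothetical held-out sale prices in dollars (not taken from the study).
actual = [300_000, 450_000, 520_000, 610_000]

# Hypothetical predictions from two placeholder models.
predictions = {
    "model_a": [310_000, 440_000, 500_000, 630_000],
    "model_b": [350_000, 400_000, 560_000, 580_000],
}

results = {name: score(actual, preds) for name, preds in predictions.items()}

# Assumption: the model with the lowest RMSE is treated as the best one.
best = min(results, key=lambda name: results[name]["RMSE"])

for name, metrics in results.items():
    print(f"{name}: MAE={metrics['MAE']:.0f} "
          f"RMSE={metrics['RMSE']:.0f} R2={metrics['R2']:.3f}")
print("lowest RMSE:", best)
```

Lower MAE and RMSE indicate smaller errors in dollar terms, while an R-squared closer to 1 indicates that the model explains more of the variation in sale prices. Reporting all three together guards against a model that looks good on one measure only.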

References or Bibliography

Chen, T., & Guestrin, C. (2016, August). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785-794). https://dl.acm.org/doi/10.1145/2939672.2939785

Noble, W. S. (2006). What is a support vector machine?. Nature biotechnology, 24(12), 1565-1567. https://doi.org/10.1038/nbt1206-1565

Svetnik, V., Liaw, A., Tong, C., Culberson, J. C., Sheridan, R. P., & Feuston, B. P. (2003). Random forest: a classification and regression tool for compound classification and QSAR modeling. Journal of chemical information and computer sciences, 43(6), 1947-1958. https://doi.org/10.1021/ci034160g

Quinlan, J. R. (1986). Induction of decision trees. Machine learning, 1(1), 81-106. https://doi.org/10.1007/BF00116251

Montgomery, D. C., Peck, E. A., & Vining, G. G. (2021). Introduction to linear regression analysis. John Wiley & Sons.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., … others. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct), 2825-2830. https://dl.acm.org/doi/10.5555/1953048.2078195

Hunter, J. D. (2007). Matplotlib: A 2D graphics environment. Computing in Science & Engineering, 9(3), 90-95. https://doi.org/10.5281/zenodo.3264781
