Automated College Application Essay Grading
DOI: https://doi.org/10.47611/jsrhs.v13i1.6041
Keywords: machine learning, natural language processing, Automated Essay Grading, Long Short-Term Memory, Logistic Regression, Random Forest Classifier, college application essays
Abstract
In recent years, computer programs for scoring essays have been explored extensively, and many different approaches have been developed. Most of these approaches use Natural Language Processing (NLP) techniques (Ade-Ibijola et al., 2012). NLP is a field of artificial intelligence, often relying on machine learning, that is used to analyze and understand text. These approaches fall under the name of Automated Essay Scoring (AES), which typically assesses essay quality with a single score (Ke and Ng, 2019). This paper proposes an NLP model that predicts the quality of a college application essay, measured by proxy through a college's acceptance rate. Key essay factors include the number of grammar mistakes, the sophistication of the writing, repetition, and the text of the essay itself. Several models were tested. A Random Forest Classifier relying solely on the grammar, writing-sophistication, and repetition metrics achieved the best performance, with an accuracy of 89.7%. The second-best model combined a Long Short-Term Memory (LSTM) network with a logistic regression model. The other models significantly underperformed, with accuracies in the range of 40% to 60%. Ultimately, our model may help students going through the college application process understand where their essay stands relative to those of other students.
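The abstract does not say exactly how the grammar, sophistication, and repetition metrics are computed. The sketch below shows one plausible way to turn an essay into such a three-number feature vector, under stated assumptions. Sophistication is approximated with the Gunning Fog index (cited in the references): 0.4 × (words per sentence + 100 × complex words / words), where complex words have three or more syllables. Syllables are counted with a rough vowel-group heuristic rather than a dictionary. The grammar-error count is assumed to come from an external checker and is simply passed in. The `repetition_ratio` metric and all function names here are hypothetical, not the authors' code.

```python
from __future__ import annotations

import re
from collections import Counter

VOWEL_GROUPS = re.compile(r"[aeiouy]+")


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, discounting a silent final 'e'."""
    word = word.lower()
    count = len(VOWEL_GROUPS.findall(word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def tokenize(text: str) -> list[str]:
    """Extract alphabetic word tokens (apostrophes kept)."""
    return re.findall(r"[A-Za-z']+", text)


def split_sentences(text: str) -> list[str]:
    """Naive sentence split on terminal punctuation."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def gunning_fog(text: str) -> float:
    """Gunning Fog index: 0.4 * (avg sentence length + % complex words)."""
    words = tokenize(text)
    sentences = split_sentences(text)
    if not words or not sentences:
        return 0.0
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences) + 100 * len(complex_words) / len(words))


def repetition_ratio(text: str, min_len: int = 4) -> float:
    """Hypothetical metric: share of longer tokens that repeat an earlier token."""
    words = [w.lower() for w in tokenize(text) if len(w) >= min_len]
    if not words:
        return 0.0
    repeats = sum(c - 1 for c in Counter(words).values())
    return repeats / len(words)


def essay_features(text: str, grammar_errors: int) -> list[float]:
    """Feature vector: [grammar errors per 100 words, Fog index, repetition ratio]."""
    n_words = max(len(tokenize(text)), 1)
    return [100 * grammar_errors / n_words, gunning_fog(text), repetition_ratio(text)]


if __name__ == "__main__":
    sample = "I love my family. My family loves me."
    print(essay_features(sample, grammar_errors=2))
```

Normalizing the grammar count per 100 words keeps long and short essays comparable. Vectors like these could then be fed to a tree ensemble such as scikit-learn's `RandomForestClassifier`, which handles unscaled, heterogeneous features without preprocessing.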
References
Burchfiel, A. (2022, May 16). What is NLP (Natural Language Processing) Tokenization? TokenEx. https://www.tokenex.com/blog/ab-what-is-nlp-natural-language-processing-tokenization/
Cairns, H. (2022, December 22). 4 Things Admissions Officers Don't Like In Your College Essays. College Raptor. https://www.collegeraptor.com/getting-in/articles/college-applications/4-things-in-college-app-essays-that-admissions-officers-dont-like/
Chugh, A. (2023, May 31). Deep Learning | Introduction to Long Short Term Memory. GeeksforGeeks. https://www.geeksforgeeks.org/deep-learning-introduction-to-long-short-term-memory/
DiMascio, C. (n.d.). py-readability-metrics. PyPI. https://pypi.org/project/py-readability-metrics/
Fanning, P. (2012, April 7). 24. Good & Bad Repetition. guinlist. https://guinlist.wordpress.com/2012/04/07/24-good-bad-repetition/
Gina. (n.d.). How Many Words Should Be in a Sentence? LanguageTool. Retrieved September 27, 2023, from https://languagetool.org/insights/post/sentence-length/
Jaschik, S. (2019, March 17). A look at the many legal ways wealthy applicants have an edge in admissions. Inside Higher Ed. https://www.insidehighered.com/admissions/article/2019/03/18/look-many-legal-ways-wealthy-applicants-have-edge-admissions
Ke, Z., & Ng, V. (2019, July). Automated Essay Scoring: A Survey of the State of the Art. Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 6300-6308. https://doi.org/10.24963/ijcai.2019/879
Kirasich, K., Smith, T., & Sadler, B. (2018). Random Forest vs Logistic Regression: Binary Classification for Heterogeneous Datasets. SMU Data Science Review, 1(3). https://scholar.smu.edu/datasciencereview/vol1/iss3/9
Lee, J., Thymes, B., Zhou, J., Joachims, T., & Kizilcec, R. F. (2023, June 30). Augmenting Holistic Review in University Admission using Natural Language Processing for Essays and Recommendation Letters. https://doi.org/10.48550/arXiv.2306.17575
Liddy, E. D. (2001). Natural Language Processing. In Encyclopedia of Library and Information Science (2nd ed.). New York: Marcel Dekker, Inc.
Munson, N. (2021, December 13). 9 basic elements of a complete college application. Marymount University. https://marymount.edu/blog/9-basic-elements-of-a-complete-college-application/
Nadkarni, P. M., Machado, L. O., & Chapman, W. W. (2011, Sep-Oct). Natural language processing: an introduction. J Am Med Inform Assoc, 18(5), 544-51. https://doi.org/10.1136/amiajnl-2011-000464
Nietzel, M. T. (2023, March 30). College Applications Are Up Dramatically In 2023. Forbes. https://www.forbes.com/sites/michaeltnietzel/2023/03/30/college-applications-are-up-dramatically-in-2023/
Notorc. (2006, November 2). Clear Writing: How to Achieve and Measure Readability. Postscripts. http://notorc.blogspot.com/2006/09/devils-in-details-measuring.html
Ade-Ibijola, A. O., Wakama, I., & Amadi, J. (2012). An Expert System for Automated Essay Scoring (AES) in Computing using Shallow NLP Techniques for Inferencing. International Journal of Computer Applications, 51, 37-45. https://doi.org/10.5120/8080-1480
Otter, D. W., Medina, J. R., & Kalita, J. K. (2019, December 21). A Survey of the Usages of Deep Learning of Natural Language Processing. IEEE Transactions on Neural Networks and Learning Systems, 32(2), 22. https://doi.org/10.48550/arXiv.1807.10854
Page, E. B., & Paulus, D. H. (1968, April). The Analysis of Essays by Computer. Final Report. ERIC. https://eric.ed.gov/?id=ED028633
Ramesh, D., & Sanampudi, S. K. (2021, September 23). An automated essay scoring systems: a systematic literature review. Artif Intell Rev, 55(3), 2495-2527. https://doi.org/10.1007/s10462-021-10068-2
SaturnCloud. (2023, July 10). Why Is Logistic Regression Not Working But Decision Tree Is? Saturn Cloud. https://saturncloud.io/blog/why-is-logistic-regression-not-working-but-decision-tree-is/
Scott, B. (2023, August 1). The Gunning's Fog Index (or FOG) Readability Formula. Readability Formulas. Retrieved September 27, 2023, from https://readabilityformulas.com/the-gunnings-fog-index-or-fog-readability-formula/
Selingo, J., & Mull, A. (2022, March 23). The College-Admissions Process Is Completely Broken. The Atlantic. https://www.theatlantic.com/ideas/archive/2022/03/change-college-acceptance-application-process/627581/
Sorensen, T. (2022, April 18). What Parents of College Applicants Need to Know. USNews.com. https://www.usnews.com/education/blogs/college-admissions-playbook/articles/changes-that-parents-of-college-applicants-need-to-know
TensorFlow. (2023, May 27). Word embeddings | Text. TensorFlow. https://www.tensorflow.org/text/guide/word_embeddings
Turin Tech. (2021, October 4). Machine Learning vs Statistical Modelling: which one is right for your business problem? TurinTech AI. Retrieved October 1, 2023, from https://www.turintech.ai/2021/10/04/machine-learning-vs-statistical-modelling-which-one-is-right-for-your-business-problem/
Copyright (c) 2024 Rinah Zhang, Stepan Malkov, Peter Mbua
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright holder(s) granted JSR a perpetual, non-exclusive license to distribute & display this article.