Automated College Application Essay Grading

Authors

  • Rinah Zhang, Holmdel High School
  • Stepan Malkov, University of California, Los Angeles
  • Peter Mbua, University of Florida

DOI:

https://doi.org/10.47611/jsrhs.v13i1.6041

Keywords:

machine learning, natural language processing, Automated Essay Grading, Long Short-Term Memory, Logistic Regression, Random Forest Classifier, college application essays

Abstract

In recent years, computer programs that score essays have been explored extensively, and many different approaches have been developed. Most of these approaches use techniques from Natural Language Processing (NLP), a field of machine learning often used to analyze and understand text (Ade-Ibijola et al., 2012). These approaches are collectively known as Automated Essay Scoring (AES), which typically assesses essay quality with a single score (Ke and Ng, 2019). This paper proposes an NLP model that predicts the quality of a college application essay, using the acceptance rate of the college as a proxy for quality. The essay factors considered are the number of grammar mistakes, sophistication of writing, repetition, and the text of the essay itself. Several models were tested. A Random Forest Classifier relying solely on the grammar, sophistication, and repetition metrics achieved the best performance, with an accuracy of 89.7%. The second-best model combined an LSTM with a logistic regression model. The other models performed substantially worse, with accuracies in the range of 40%-60%. Ultimately, our model may help students going through the college application process understand where their essays stand relative to those of other applicants.
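To make the hand-crafted metrics more concrete, the following is a minimal sketch of how repetition and writing-sophistication features might be computed from raw essay text. It is not the authors' implementation: the function names, the vowel-group syllable heuristic, and the choice of the Gunning Fog Index as the sophistication measure are all assumptions made for illustration. The grammar-mistake count is omitted because it would require an external grammar checker.

```python
import re


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    """Naively split on sentence-ending punctuation (sketch only)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def count_syllables(word):
    """Crude syllable estimate: count groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word)))


def essay_features(text):
    """Return illustrative (hypothetical) metrics for one essay.

    avg_sentence_length: words per sentence
    repetition: share of tokens that repeat an earlier word
    fog_index: Gunning Fog Index, 0.4 * (words/sentences
               + 100 * complex_words/words), where a complex word
               is approximated as one with three or more syllables
    """
    words = tokenize(text)
    sentences = split_sentences(text)
    if not words or not sentences:
        return {"avg_sentence_length": 0.0, "repetition": 0.0, "fog_index": 0.0}

    avg_len = len(words) / len(sentences)
    repetition = 1 - len(set(words)) / len(words)
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    fog = 0.4 * (avg_len + 100 * complex_words / len(words))
    return {
        "avg_sentence_length": avg_len,
        "repetition": repetition,
        "fog_index": fog,
    }


if __name__ == "__main__":
    sample = "I love writing. Writing is how I understand the world."
    print(essay_features(sample))
```

Feature dictionaries of this kind could be assembled into a table, one row per essay, and passed to a tree-based classifier such as a random forest. The acceptance-rate label and any binning of it into classes would have to come from the dataset itself.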

References or Bibliography

Burchfiel, A. (2022, May 16). What is NLP (Natural Language Processing) Tokenization? - tokenex. TokenEx. https://www.tokenex.com/blog/ab-what-is-nlp-natural-language-processing-tokenization/

Cairns, H. (2022, December 22). 4 Things Admissions Officers Don't Like In Your College Essays. College Raptor. https://www.collegeraptor.com/getting-in/articles/college-applications/4-things-in-college-app-essays-that-admissions-officers-dont-like/

Chugh, A. (2023, May 31). Deep Learning | Introduction to Long Short Term Memory. GeeksforGeeks. https://www.geeksforgeeks.org/deep-learning-introduction-to-long-short-term-memory/

DiMascio, C. (n.d.). py-readability-metrics · PyPI. PyPI. https://pypi.org/project/py-readability-metrics/

Fanning, P. (2012, April 7). 24. Good & Bad Repetition. guinlist. https://guinlist.wordpress.com/2012/04/07/24-good-bad-repetition/

Gina. (n.d.). How Many Words Should Be in a Sentence? LanguageTool. Retrieved September 27, 2023, from https://languagetool.org/insights/post/sentence-length/

Jaschik, S. (2019, March 17). A look at the many legal ways wealthy applicants have an edge in admissions. Inside Higher Ed. https://www.insidehighered.com/admissions/article/2019/03/18/look-many-legal-ways-wealthy-applicants-have-edge-admissions

Ke, Z., & Ng, V. (2019, July). Automated Essay Scoring: A Survey of the State of the Art. Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 6300-6308. 10.24963/ijcai.2019/879

Kirasich, K., Smith, T., & Sadler, B. (2018). Random Forest vs Logistic Regression: Binary Classification for Heterogeneous Datasets. SMU Data Science Review, 1(3). https://scholar.smu.edu/datasciencereview/vol1/iss3/9

Lee, J., Thymes, B., Zhou, J., Joachims, T., & Kizilcec, R. F. (2023, June 30). Augmenting Holistic Review in University Admission using Natural Language Processing for Essays and Recommendation Letters. https://doi.org/10.48550/arXiv.2306.17575

Liddy, E. D. (2001). Natural Language Processing. In Encyclopedia of Library and Information Science (2nd ed.). New York: Marcel Dekker, Inc.

Munson, N. (2021, December 13). 9 basic elements of a complete college application. Marymount University. https://marymount.edu/blog/9-basic-elements-of-a-complete-college-application/

Nadkarni, P. M., Machado, L. O., & Chapman, W. W. (2011, Sep-Oct). Natural language processing: an introduction. J Am Med Inform Assoc, 18(5), 544-51. 10.1136/amiajnl-2011-000464

Nietzel, M. T. (2023, March 30). College Applications Are Up Dramatically In 2023. Forbes. https://www.forbes.com/sites/michaeltnietzel/2023/03/30/college-applications-are-up-dramatically-in-2023/

Notorc. (2006, November 2). Clear Writing: How to Achieve and Measure Readability. Postscripts. http://notorc.blogspot.com/2006/09/devils-in-details-measuring.html

OluAde-Ibijola, A., Wakama, I., & Amadi, J. (2012). An Expert System for Automated Essay Scoring (AES) in Computing using Shallow NLP Techniques for Inferencing. International Journal of Computer Applications, 51, 37-45. 10.5120/8080-1480

Otter, D. W., Medina, J. R., & Kalita, J. K. (2019, December 21). A Survey of the Usages of Deep Learning for Natural Language Processing. IEEE Transactions on Neural Networks and Learning Systems, 32(2), 22. https://doi.org/10.48550/arXiv.1807.10854

Page, E. B., & Paulus, D. H. (1968, April). The Analysis of Essays by Computer. Final Report. ERIC. https://eric.ed.gov/?id=ED028633

Ramesh, D., & Sanampudi, S. K. (2021, September 23). An automated essay scoring systems: a systematic literature review. Artif Intell Rev, 55(3), 2495-2527. 10.1007/s10462-021-10068-2

SaturnCloud. (2023, July 10). Why Is Logistic Regression Not Working But Decision Tree Is? Saturn Cloud. https://saturncloud.io/blog/why-is-logistic-regression-not-working-but-decision-tree-is/

Scott, B. (2023, August 1). The Gunning's Fog Index (or FOG) Readability Formula – ReadabilityFormulas.com. Readability Formulas. Retrieved September 27, 2023, from https://readabilityformulas.com/the-gunnings-fog-index-or-fog-readability-formula/

Selingo, J., & Mull, A. (2022, March 23). The College-Admissions Process Is Completely Broken. The Atlantic. https://www.theatlantic.com/ideas/archive/2022/03/change-college-acceptance-application-process/627581/

Sorensen, T. (2022, April 18). What Parents of College Applicants Need to Know. USNews.com. https://www.usnews.com/education/blogs/college-admissions-playbook/articles/changes-that-parents-of-college-applicants-need-to-know

TensorFlow. (2023, May 27). Word embeddings | Text. TensorFlow. https://www.tensorflow.org/text/guide/word_embeddings

Turin Tech. (2021, October 4). Machine Learning vs Statistical Modelling: which one is right for your business problem? – TurinTech AI. TurinTech AI. Retrieved October 1, 2023, from https://www.turintech.ai/2021/10/04/machine-learning-vs-statistical-modelling-which-one-is-right-for-your-business-problem/

Published

02-29-2024

How to Cite

Zhang, R., Malkov, S., & Mbua, P. (2024). Automated College Application Essay Grading. Journal of Student Research, 13(1). https://doi.org/10.47611/jsrhs.v13i1.6041

Issue

Section

HS Research Projects