The Limits of AI Content Detectors

Authors

  • Hongyu Wu, Singapore American School
  • Tom Flanagan, Singapore American School

DOI:

https://doi.org/10.47611/jsrhs.v12i3.5064

Keywords:

GPT-2 Output Detector, Academic Integrity, Post-editing Essays, AI Content Detectors, AI, ChatGPT, Artificial Intelligence, Detecting Generated Essays, Detecting AI, AI in Education

Abstract

As ChatGPT became a popular and powerful language model used by people worldwide in 2023, the problem of students using it to cheat on schoolwork became palpable. While many existing AI content detectors, such as GPT-2 Output Detector and GPTZero, can detect AI-generated text, their accuracy on generated essays that have been post-edited by humans is unknown. This research examined the limitations of the GPT-2 Output Detector and addressed the question, “How does human post-editing of AI-generated high school English essays affect the result of an AI content detector?” Ten English essays were generated using ChatGPT Plus based on prompts from high school English teachers. Each essay was then edited in five different ways to create pairs of unedited and edited essays. All unedited and edited essays were evaluated using the GPT-2 Output Detector Demo, and the detector’s results were then analyzed. It was found that introducing spelling mistakes into generated essays and processing the essays with QuillBot made the detector’s results less accurate. These findings can guide companies developing AI-generated text detectors toward tools that remain accurate on edited generated text. They can also help schools and educators: knowing that students can edit essays to bypass AI content detectors, educators can develop new ways to examine students’ writing ability.
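
To make one of the editing strategies concrete, the sketch below shows one possible way to introduce spelling mistakes into a generated essay. The abstract does not say how the mistakes were produced, so everything here is an assumption for illustration: the function name `add_spelling_mistakes`, the `rate` and `seed` parameters, and the choice to swap two adjacent interior letters of a word are all hypothetical and are not the authors' procedure.

```python
import random
import re


def add_spelling_mistakes(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Swap two adjacent interior letters in roughly `rate` of eligible words.

    Hypothetical helper for illustration only; the study's actual editing
    procedure is not described in the abstract.
    """
    rng = random.Random(seed)  # seeded so the same input gives the same output

    def scramble(match: re.Match) -> str:
        word = match.group(0)
        if rng.random() >= rate:
            return word  # leave this word untouched
        # Choose i so that positions i and i + 1 are both interior letters,
        # keeping the first and last letters of the word in place.
        i = rng.randrange(1, len(word) - 2)
        # Note: if the two letters are identical (e.g. "ll"), the word is unchanged.
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]

    # Only words with at least four letters have two interior letters to swap.
    return re.sub(r"[A-Za-z]{4,}", scramble, text)


if __name__ == "__main__":
    essay = "Introducing spelling mistakes changes the detector result."
    print(add_spelling_mistakes(essay, rate=0.5, seed=1))
```

In a study like this one, the unedited text and the output of such a function would form one unedited/edited pair, and each member of the pair would be submitted to the detector separately so the two results can be compared.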

Published

08-31-2023

How to Cite

Wu, H., & Flanagan, T. (2023). The Limits of AI Content Detectors. Journal of Student Research, 12(3). https://doi.org/10.47611/jsrhs.v12i3.5064

Section

HS Research Articles