Investigating a Key COVID-19 Question by Using Natural Language Processing on Scientific Publications
DOI:
https://doi.org/10.47611/jsrhs.v11i3.2977
Keywords:
natural language processing, transformer, health informatics, COVID-19, CORD-19
Abstract
The COVID-19 pandemic has posed an unprecedented challenge to public health. Numerous scientific publications on COVID-19 appear daily, exploring poorly understood facets of the disease. The sheer volume of these publications makes it daunting for researchers to quickly find information and evaluate data related to specific COVID-19 queries. Natural Language Processing (NLP), a form of artificial intelligence, helps sift through these large volumes of text algorithmically. The purpose of this study is to investigate a key COVID-19 question by using NLP on scientific publications. Using the T5 (Text-To-Text Transfer Transformer) model, we analyzed 740,000 journal abstracts for specific answers to an important COVID-19 question. We used qualitative observations, t-tests (p-values and inferences), and accuracy metrics (precision, recall, and F1 score) to evaluate the models in this study. As the number of scientific publications increases, our proposed methodology provides an efficient mechanism for performing specific information retrieval for emerging questions, diseases, and related conditions, especially for underrepresented populations.
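For readers unfamiliar with the accuracy metrics named in the abstract, the following is a minimal sketch of how precision, recall, and F1 score are computed for a binary relevance judgment. It is not the authors' evaluation code. The function name `precision_recall_f1` and the example labels are hypothetical, chosen only for illustration, and the sketch assumes that 1 marks a relevant answer and 0 an irrelevant one.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = relevant).

    Hypothetical helper for illustration; not taken from the study.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Made-up labels: human judgment (y_true) vs. model output (y_pred).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Precision is the share of answers the model marked relevant that truly are relevant, recall is the share of truly relevant answers the model found, and F1 is their harmonic mean.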
Copyright (c) 2022 Devika Dua; John Mapes
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright holder(s) granted JSR a perpetual, non-exclusive license to distribute and display this article.