Evaluating Machine Learning Models on Predicting Change in Enzyme Thermostability
DOI: https://doi.org/10.47611/jsrhs.v12i2.4364
Keywords: enzyme, thermostability, artificial intelligence, machine learning
Abstract
Enzymes are efficient catalysts for biological reactions and can potentially be designed to speed up non-biological reactions, such as those in industrial processes. However, testing new protein designs experimentally is time-consuming, so an efficient method for predicting protein stability is needed. We set out to identify which machine learning model best predicts the change in enzyme thermostability caused by a single point mutation in the amino acid sequence. We trained several machine learning models and found that the XGBoost model performed best, with an R² score of 0.593. (The R² score, or coefficient of determination, is a metric where higher is better and a perfect model would score 1.)
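As an illustration of the R² metric mentioned in the abstract, the following is a minimal sketch that computes the coefficient of determination from its standard definition, 1 - SS_res / SS_tot. The function name `r2_score` and the sample values are assumptions made up for this example; they are not the authors' code or data.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical stability changes (illustrative numbers only).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]

print(r2_score(measured, predicted))  # imperfect predictions
print(r2_score(measured, measured))   # perfect predictions give 1.0
```

A model that always predicts the mean of the measured values scores 0 under this definition, so a score of 0.593 sits between that baseline and a perfect fit.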
Copyright (c) 2023 Avnith Vijayram; Jacklyn Luu
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright holder(s) granted JSR a perpetual, non-exclusive license to distribute & display this article.