Machine learning models accurately predict T-DNA insertion into plant genomes
DOI: https://doi.org/10.47611/jsr.v13i3.2542
Keywords: Agrobacterium, Machine Learning
Abstract
Agrobacterium tumefaciens is a Gram-negative bacterium of the family Rhizobiaceae, known for its pathogenic ability to induce a neoplastic response in over 100 different species of plants, often leading to a significant decline in individual plant health. Tumors are induced when a segment of DNA carried on the bacterium’s Ti plasmid, the T-DNA, is integrated into the host genome. The T-DNA is oncogenic, encoding enzymes that increase the production of certain plant hormones, ultimately leading to tumor formation. The ability of T-DNA to integrate into plant genomes has led to its use as a common method of genetic transformation in plants. While T-DNA insertion has been documented to occur at double-strand breaks, the mechanism of insertion remains elusive. Currently, the point at which the T-DNA is inserted into the host genome is believed to be somewhat random with respect to the surrounding sequences, and uncontrolled insertion at multiple sites appears to be common. In this study, we used machine learning algorithms to assess which nucleotide sequences are important for integration of the Ti plasmid's T-DNA into the host genome. Several machine learning algorithms yielded high-accuracy models when provided with sequence data alone.
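As a rough illustration of what "sequence data alone" can mean as model input, the sketch below one-hot encodes short DNA windows into numeric feature vectors. This is an assumption made for illustration, not the encoding or pipeline used in the study: the helper name `one_hot_encode`, the treatment of ambiguous bases, and the toy windows are all hypothetical.

```python
# Hypothetical sketch: the abstract does not state how sequences were encoded.
NUCLEOTIDES = "ACGT"


def one_hot_encode(sequence):
    """Flatten a DNA sequence into a one-hot vector, four values per base.

    Bases outside A/C/G/T (for example N) become four zeros; that handling
    is an assumption of this sketch.
    """
    features = []
    for base in sequence.upper():
        features.extend(1 if base == nt else 0 for nt in NUCLEOTIDES)
    return features


# Toy windows invented for illustration; not data from the study.
windows = ["ACGT", "ttaN"]
encoded = [one_hot_encode(w) for w in windows]
```

Vectors of this form, one per window around a candidate insertion site, are the kind of fixed-length input that a general-purpose classifier could be trained on.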
Copyright (c) 2025 Sawyer H. Smith; Azin Agah

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright holder(s) granted JSR a perpetual, non-exclusive license to distribute and display this article.