
Computational Analysis of Microbial Sequence Data Using Statistics and Machine Learning

Date Issued
May 1, 2023
Author(s)
Lu, Zhixiu  
Advisor(s)
Scott J. Emrich
Additional Advisor(s)
Scott Emrich
Michela Taufer
Stephanie Kivlin
Audris Mockus
Permanent URI
https://trace.tennessee.edu/handle/20.500.14382/29378
Abstract

Since the discovery of the double helix of DNA in 1953, modern molecular biology has opened the door to a better understanding of how genes control chemical processes within cells, including protein synthesis. Although we are still far from a complete understanding, recent advances in sequencing technologies, increased computational capacity, and more sophisticated computational methods have enabled new applications that provide further insight into DNA sequence data and into how the information they encode affects living organisms and their environment. Sequencing data can now be used to begin identifying the relationships between microorganisms, where they live, and in some cases how they affect their host organisms. We introduce and compare methods used for this bioinformatics application, and develop a machine learning model that can effectively predict environmental factors associated with these microorganisms. Codon Usage Bias (CUB), which refers to the highly non-uniform usage of codons that code for the same amino acid, is known to reflect the expression level of a protein-coding gene, under the evolutionary theory that selection favors certain synonymous codons. Traditional methods for estimating CUB and its relationship to protein translation have proven effective on single-celled organisms such as yeast and E. coli, but their applicability is limited for more complex multicellular organisms such as plants and animals. To better understand the relationship between codon usage patterns and the protein translation process in these organisms, we develop a novel deep learning model that can discover patterns in codon usage bias between different species using only their DNA sequences.

Subjects

Machine Learning

Bioinformatics

Codon Usage

DNA

Gene Expression

Disciplines
Bioinformatics
Degree
Doctor of Philosophy
Major
Computer Science
Comments

Revised submission based on feedback given.

File(s)
Name

Zhixiu_Lu_s_Dissertation__Computational_Analysis_of_Microbial_Sequence_Data_Using_Statistics_and_Machine_Learning.pdf

Size

3.09 MB

Format

Adobe PDF

Checksum (MD5)

f1a4f759936f98e02c972171954a519c
