"Comparative Analysis of TCR and TCR-pMHC Complex Structure Prediction " by Yudan Shi
 

Masters Theses

Date of Award

12-2024

Degree Type

Thesis

Degree Name

Master of Science

Major

Life Sciences

Major Professor

Jeremy C. Smith

Committee Members

Hong Guo, Francisco N. Barrera, Rajan Lamichhane

Abstract

T cell receptor (TCR) and TCR-peptide-major histocompatibility complex (pMHC) structure prediction tools have developed rapidly following AlphaFold's success. While these deep-learning tools offer new opportunities for studying T cell recognition and developing immunotherapies, their relative accuracy remains unclear because standardized benchmarking is lacking. This study presents a comprehensive evaluation of TCR and TCR-pMHC structure prediction tools to assess their accuracy and limitations.

We analyzed six TCR and five TCR-pMHC structure prediction tools, encompassing homology-based methods, TCR-specific deep-learning methods, and general protein structure prediction tools (AlphaFold 2 and 3). The evaluation used strictly curated benchmark sets of 40 αβ [alpha-beta] TCR structures and 27 TCR-pMHC structures (21 Class I and 6 Class II), selected to postdate the training-set cutoff dates and filtered by sequence identity of the TCR variable region to ensure non-redundancy. We assessed accuracy using multiple metrics: Root Mean Square Deviation (RMSD) and Template Modeling score (TM-score) for global and regional similarity, Local Distance Difference Test (lDDT) for local accuracy, and DockQ scores with Critical Assessment of Predicted Interactions (CAPRI) criteria for interface evaluation.
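To illustrate what a superposition-free local metric such as lDDT measures, the following is a minimal sketch and not the scoring code used in this thesis. It assumes matched C-alpha coordinates only, the commonly used 15 Å inclusion radius, and distance-difference thresholds of 0.5, 1, 2, and 4 Å. The function name `lddt_ca` and its parameters are hypothetical, and the full lDDT definition works on all atoms and includes additional checks that are omitted here.

```python
from itertools import combinations
from math import dist


def lddt_ca(reference, model, cutoff=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified global lDDT over matched C-alpha coordinates (illustrative only).

    reference, model: equal-length sequences of (x, y, z) tuples in the same
    residue order. cutoff and thresholds are assumed default values.
    """
    if len(reference) != len(model):
        raise ValueError("reference and model must contain the same number of atoms")

    preserved = 0  # (pair, threshold) combinations that are preserved in the model
    total = 0      # all (pair, threshold) combinations that were considered

    for i, j in combinations(range(len(reference)), 2):
        d_ref = dist(reference[i], reference[j])
        if d_ref >= cutoff:
            continue  # only pairs that are close in the reference are scored
        diff = abs(d_ref - dist(model[i], model[j]))
        total += len(thresholds)
        preserved += sum(diff < t for t in thresholds)

    if total == 0:
        raise ValueError("no atom pairs fall within the inclusion radius")
    return preserved / total


# Toy example: three atoms on a line, with the last one displaced by 1.5 Å.
ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
mod = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (9.1, 0.0, 0.0)]
score = lddt_ca(ref, mod)
```

Because only internal distances are compared, the score is unchanged when the whole model is translated or rotated. This is why a model can score well on lDDT while still showing a large RMSD when chains or domains are misoriented relative to one another.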

For isolated TCRs, AlphaFold showed superior accuracy, with a mean RMSD of 1.6 Å, lDDT of 0.88, and TM-score of 0.96. In TCR-pMHC prediction, TCRmodel2 and AlphaFold 2 performed best, with a mean RMSD of 2.5 Å, lDDT of 0.85, and TM-score of 0.93. While deep-learning-based tools outperformed traditional homology-based approaches in complementarity-determining region 3 (CDR3) prediction, significant challenges remain. Notable outliers in CDR3β [beta] regions revealed difficulties in modeling long CDR3β [beta] loops and unique CDR3-peptide interactions. Analysis across the metrics showed that extreme outliers result from incorrect orientations between the MHC and the TCR or between the TCR chains, highlighting a major bottleneck in this field.

This comprehensive analysis provides critical insights into the strengths and limitations of current TCR and TCR-pMHC structure prediction tools. It also emphasizes the importance of using multiple complementary metrics when assessing model accuracy.
