Date of Award
Doctor of Philosophy
Arvind Ramanathan, Hairong Qi, Russell Zaretzki
Electronic health records (EHRs) are the primary method for documenting and storing patient outcomes in modern healthcare; data mining and machine learning approaches utilize the information stored in EHRs to assist in clinical decision support and other critical healthcare applications. Important information in EHRs is often stored in the form of unstructured clinical text. Unfortunately, the state-of-the-art methods used to automatically extract useful information from unstructured clinical text lag significantly behind the state-of-the-art methods used in the general natural language processing (NLP) community for other tasks such as machine translation, question answering, and sentiment analysis. In this work, we attempt to bridge this gap by applying and developing hierarchical neural approaches to classify key data elements in cancer pathology reports, such as cancer site, histology, grade, and behavior. We (1) show that a hierarchical attention network (HAN), which has strong performance on classifying general text such as Yelp reviews and news snippets, achieves better classification accuracy and macro F-score in identifying cancer site and grade than previous state-of-the-art approaches, (2) develop a novel hierarchical self-attention network (HiSAN), which not only achieves better accuracy and macro F-score in cancer pathology report classification than the HAN but also trains over 10x faster, and (3) introduce a hierarchical framework for incorporating case-level context when classifying cancer pathology reports and show that it gives a significant boost in accuracy and macro F-score.
Gao, Shang, "Hierarchical Neural Architectures for Classifying Cancer Pathology Reports." PhD diss., University of Tennessee, 2019.