Doctoral Dissertations

Date of Award

12-2019

Degree Type

Dissertation

Degree Name

Doctor of Philosophy

Major

Computer Science

Major Professor

Georgia Tourassi

Committee Members

Arvind Ramanathan, Hairong Qi, Russell Zaretzki

Abstract

Electronic health records (EHRs) are the primary method for documenting and storing patient outcomes in modern healthcare; data mining and machine learning approaches use the information stored in EHRs to assist in clinical decision support and other critical healthcare applications. Important information in EHRs is often stored in the form of unstructured clinical text. Unfortunately, the state-of-the-art methods used to automatically extract useful information from unstructured clinical text lag significantly behind the state-of-the-art methods used in the general natural language processing (NLP) community for other tasks such as machine translation, question answering, and sentiment analysis. In this work, we attempt to bridge this gap by applying and developing hierarchical neural approaches to classify key data elements in cancer pathology reports, such as cancer site, histology, grade, and behavior. We (1) show that a hierarchical attention network (HAN), which has strong performance on classifying general text such as Yelp reviews and news snippets, achieves better classification accuracy and macro F-score on identifying cancer site and grade than previous state-of-the-art approaches, (2) develop a novel hierarchical self-attention network (HiSAN) which not only achieves better accuracy and macro F-score in cancer pathology report classification than the HAN but also trains over 10x faster, and (3) introduce a hierarchical framework for incorporating case-level context when classifying cancer pathology reports and show that it gives a significant boost in accuracy and macro F-score.

Comments

This dissertation is a manuscript-style dissertation in which each of the three core chapters is a previously published journal paper or is a paper currently undergoing the submission process.

Orcid ID

http://orcid.org/0000-0003-1803-1457
