Doctoral Dissertations
Date of Award
8-2023
Degree Type
Dissertation
Degree Name
Doctor of Philosophy
Major
Computer Science
Major Professor
Gregory D Peterson
Committee Members
Edmon Begoli, Audris Mockus, Russell Zaretzki, Suzanne Tamang
Abstract
Increasing demand for retrieving accurate and timely information from unstructured text data in healthcare motivates the need for effective biomedical machine reading comprehension (MRC) systems. However, the scarcity of high-quality labeled data in new biomedical problem domains presents an obstacle to training state-of-the-art MRC models, thereby limiting their performance and applicability. Consequently, the application of MRC in the biomedical domain is an under-explored area of research. Using transfer learning to carry knowledge from a high-resource domain to the low-resource biomedical domain is a potential solution for limited data availability. Unfortunately, transferring knowledge directly between domains often hurts model performance due to the domain shift phenomenon.
In this dissertation, we focus on advancing biomedical text understanding using MRC in the absence of large-scale annotated datasets, with particular attention to transfer learning. We study the mitigation of domain shift and propose BioADAPT-MRC, a novel framework designed to improve transfer learning in the biomedical domain by employing a domain adaptation algorithm based on adversarial learning. BioADAPT-MRC aims to overcome data scarcity challenges and enhance the performance of biomedical MRC. We identify two new problem domains that can benefit from biomedical MRC tasks: MRC on clinical practice guidelines (CPGs) and MRC on clinical notes to extract injection drug use (IDU) information. We facilitate the evaluation and benchmarking of MRC tasks on CPGs by manually building a benchmark dataset called cpgQA. Furthermore, we create a framework named IDU-QA, enabling us to develop a reference dataset and an MRC model capable of retrieving information regarding IDU from clinical notes. We conduct a thorough evaluation of the BioADAPT-MRC framework using MRC datasets sourced from the widely known BioASQ competition. We tackle the challenge of limited labeled data in the new problem domains by employing different transfer learning settings, along with BioADAPT-MRC. We perform a comprehensive error analysis for each problem domain, highlighting the strengths and limitations of the MRC models. We further discuss future research directions to enhance the performance and applicability of biomedical MRC models. Overall, this dissertation contributes to the advancement of MRC systems in the biomedical domain, paving the way for improved healthcare interventions and a deeper understanding of critical biomedical information.
Recommended Citation
Mahbub, Maria, "Towards Machine Reading Comprehension to Advance Biomedical Text Understanding in The Absence of Large-scale Labeled Dataset." PhD diss., University of Tennessee, 2023.
https://trace.tennessee.edu/utk_graddiss/8658
Included in
Artificial Intelligence and Robotics Commons, Bioinformatics Commons, Data Science Commons
Comments
"In reference to IEEE copyrighted material which is used with permission in this thesis, the IEEE does not endorse any of The University of Tennessee, Knoxville's products or services. Internal or personal use of this material is permitted. If interested in reprinting/republishing IEEE copyrighted material for advertising or promotional purposes or for creating new collective works for resale or redistribution, please go to http://www.ieee.org/publications_standards/publications/rights/rights_link.html to learn how to obtain a License from RightsLink. If applicable, University Microfilms and/or ProQuest Library, or the Archives of Canada may supply single copies of the dissertation."