Doctoral Dissertations

Towards Machine Reading Comprehension to Advance Biomedical Text Understanding in The Absence of Large-scale Labeled Dataset

Maria Mahbub, University of Tennessee, Knoxville

Date of Award

8-2023

Degree Type

Dissertation

Degree Name

Doctor of Philosophy

Major

Computer Science

Major Professor

Gregory D Peterson

Committee Members

Edmon Begoli, Audris Mockus, Russell Zaretzki, Suzanne Tamang

Abstract

Increasing demand for accurate and timely retrieval of information from unstructured text data in healthcare motivates the need for effective biomedical machine reading comprehension (MRC) systems. However, the scarcity of high-quality labeled data in new biomedical problem domains presents an obstacle to training state-of-the-art MRC models, thereby limiting their performance and applicability. Consequently, the application of MRC in the biomedical domain is an under-explored area of research. Transfer learning, which carries knowledge from a high-resource domain to the low-resource biomedical domain, is a potential solution for limited data availability. Unfortunately, transferring knowledge directly between domains often hurts models' performance due to the domain shift phenomenon.

In this dissertation, we focus on advancing biomedical text understanding using MRC in the absence of large-scale annotated datasets, with particular attention to transfer learning. We study the mitigation of domain shift and propose BioADAPT-MRC, a novel framework designed to implement improved transfer learning in the biomedical domain by employing a domain adaptation algorithm based on adversarial learning. BioADAPT-MRC aims to overcome data scarcity challenges and enhance the performance of biomedical MRC. We identify two new problem domains that can benefit from biomedical MRC tasks: MRC on clinical practice guidelines (CPGs) and MRC on clinical notes to extract injection drug use (IDU) information. We facilitate the evaluation and benchmarking of MRC tasks on CPGs by manually building a benchmark dataset called cpgQA. Furthermore, we create a framework named IDU-QA, enabling us to develop a reference dataset and an MRC model capable of retrieving information regarding IDU from clinical notes. We conduct a thorough evaluation of the BioADAPT-MRC framework using MRC datasets sourced from the widely known BioASQ competition. We tackle the challenge of limited labeled data in the new problem domains by employing different transfer learning settings, along with BioADAPT-MRC. We perform a comprehensive error analysis for each problem domain, highlighting the strengths and limitations of the MRC models. We further discuss future research directions to enhance the performance and applicability of biomedical MRC models. Overall, this dissertation contributes to the advancement of MRC systems in the biomedical domain, paving the way for improved healthcare interventions and a deeper understanding of critical biomedical information.

Comments

"In reference to IEEE copyrighted material which is used with permission in this thesis, the IEEE does not endorse any of The University of Tennessee, Knoxville's products or services. Internal or personal use of this material is permitted. If interested in reprinting/republishing IEEE copyrighted material for advertising or promotional purposes or for creating new collective works for resale or redistribution, please go to http://www.ieee.org/publications_standards/publications/rights/rights_link.html to learn how to obtain a License from RightsLink. If applicable, University Microfilms and/or ProQuest Library, or the Archives of Canada may supply single copies of the dissertation."

Recommended Citation

Mahbub, Maria, "Towards Machine Reading Comprehension to Advance Biomedical Text Understanding in The Absence of Large-scale Labeled Dataset." PhD diss., University of Tennessee, 2023.
https://trace.tennessee.edu/utk_graddiss/8658

Download

Available for download on Saturday, August 15, 2026


Included in

Artificial Intelligence and Robotics Commons, Bioinformatics Commons, Data Science Commons
