Doctoral Dissertations
Date of Award
12-2025
Degree Type
Dissertation
Degree Name
Doctor of Philosophy
Major
Data Science and Engineering
Major Professor
Jack J. Dongarra
Committee Members
Heike Jagode, Anthony Danalis, Russell L. Zaretzki
Abstract
Efficient execution of scientific applications on high-performance computing (HPC) systems depends heavily on effective performance analysis. Performance analysis is the process of evaluating how well an application performs on an HPC system, including its interaction with the underlying hardware components, in order to identify potential areas for improvement. This typically involves measuring performance metrics such as speed, efficiency, resource utilization, and responsiveness. The insights gained allow the user of an HPC system to optimize hardware usage, pinpoint inefficiencies and bottlenecks, and ensure scalability. Many of the most informative performance metrics can only be obtained by monitoring hardware events (i.e., low-level operations tracked by the hardware itself). Without them, the only available performance metric is execution time.
However, the sheer volume of hardware events in modern HPC systems is overwhelming, making them difficult for users to comprehend and use effectively. The research in this dissertation focuses on mitigating this problem by developing an approach that quantitatively characterizes hardware events, automatically classifies them, and automatically derives meaningful performance metrics from them. To better understand the system behaviors captured by hardware events, this dissertation presents benchmarks consisting of well-defined operations that stress different hardware attributes in isolation. In addition, it describes an automated mathematical analysis that identifies key hardware events and defines useful performance metrics from them. Lastly, it establishes strategies for benchmarking the hardware shared among processor cores and identifying key inter-core events.
Recommended Citation
Barry, Daniel P., "Automated Classification and Verification of Performance Counters." PhD diss., University of Tennessee, 2025.
https://trace.tennessee.edu/utk_graddiss/13559