Doctoral Dissertations

Orcid ID

Date of Award


Degree Type


Degree Name

Doctor of Philosophy


Computer Science

Major Professor

Michela Taufer

Committee Members

Michael Jantz, Jian Huang, Rodrigo Vargas, Yoonho Park, Jay Lofstead


Scientific communities across different domains increasingly run complex workflows for scientific discovery. Scientists require these workflows to be robust: they must be reproducible, scale in performance, and exhibit trustworthiness in terms of the computational techniques, infrastructures, and people involved. However, as scientists leverage advanced techniques (big data analytics, AI, and ML) and infrastructure (HPC and cloud), their workflows grow in complexity, leading to new challenges in scientific computing that hinder robustness.

In this dissertation, we examine the needs of diverse scientific communities across different fields and identify three main challenges that hinder the robustness of workflows: (i) lack of traceability, explainability, and reproducibility; (ii) hidden intermediate data that reduces scalability; and (iii) inefficient data management in workflow orchestration. We codesign scientific workflows and converged HPC and cloud infrastructure to enable robust science, bridging the gap between computational and domain scientists.

First, we develop fine-grained containerized environments that enable data traceability and results explainability by automatically annotating provenance information, advancing widespread reproducibility. Second, we integrate the workflows into HPC and cloud infrastructure and tune the storage technology to enable better I/O and data scalability. Finally, we provide a software architecture that enables efficient data management (scalable and trustworthy data) in the orchestration of scientific workflows while leveraging the high throughput and low latency of node-local storage.

