Doctoral Dissertations

Date of Award

12-2019

Degree Type

Dissertation

Degree Name

Doctor of Philosophy

Major

Computer Science

Major Professor

Jack Dongarra

Committee Members

Michela Taufer, Michael Berry, Dimitry Liakh

Abstract

Making effective use of modern heterogeneous many-core architectures with complex memory hierarchies is a challenge for many application developers. Portability and performance of existing and new applications are the key challenges that scientific application developers continuously face. Many evolutionary solutions have been proposed, including ones that seek to extend the capabilities of the current message-passing paradigm with intra-node features (MPI+X). A different, more revolutionary solution explores data-flow task-based runtime systems as a substitute for managing both local and distributed data dependencies. The method of programming such a runtime is important, as it directly affects the productivity of developers and the performance of applications. This work extends one such runtime, the Parallel Runtime Scheduling and Execution Controller (PaRSEC), with a novel programming approach that allows users to insert tasks into the runtime by writing sequential code. This programming model, called Dynamic Task Discovery (DTD), discovers tasks dynamically at runtime and uses optimized graph unrolling techniques to accommodate applications with large task graphs.

In this work, PaRSEC's capability is extended by providing a new programming model, DTD. Bottlenecks of this programming model are identified, and solutions to overcome its limitations are proposed. The performance of the DTD implementation on dense linear algebra workloads is analyzed at scale, where DTD has shown excellent results. In distributed memory, it achieves between 1.3x and 2.3x better performance than ScaLAPACK at 128 nodes for QR factorization; in shared memory, it achieves 4x to 5x better performance for Cholesky factorization than the other runtimes, StarPU and QUARK. DTD was also evaluated via the coupled-cluster method of the state-of-the-art quantum chemistry application NWChem, where it performed remarkably well among all considered runtimes at a scale of 128 nodes. The hope is that the concept and development of DTD, the detailed evaluation of its practical performance at scale, the analysis of its theoretical limitations, the thorough study and classification of various task-based runtime systems, and the design, implementation, and evaluation of the chosen runtimes on micro-benchmarks will help the broad community of scientific application developers.
