Optimization of MPI Collective Communication Operations

Date Issued
May 15, 2020
Author(s)
Luo, Xi
Advisor(s)
Jack Dongarra
Additional Advisor(s)
Dali Wang
Michela Taufer
Yingkui Li
Permanent URI
https://trace.tennessee.edu/handle/20.500.14382/27068
Abstract

High-performance computing (HPC) systems keep growing in scale and heterogeneity to satisfy the increasing need for computation, and this brings new challenges to the design of Message Passing Interface (MPI) libraries, especially with regard to collective operations. State-of-the-art implementations of MPI collective operations rely heavily on synchronizations, and these implementations magnify noise across the participating processes, resulting in significant performance slowdowns. Therefore, I create a new collective communication framework in Open MPI, using an event-driven design to relax synchronizations and maintain the minimal data dependencies of MPI collective operations. The recent growth in hardware heterogeneity results in increasingly complex hardware hierarchies and larger communication performance differences. Hence, in this dissertation, I present two approaches to perform hierarchical collective operations; both exploit the different bandwidths of hardware in heterogeneous systems and maximize concurrent communications. Finally, to provide a fast and accurate autotuning mechanism for my framework, I design a new autotuning approach by combining two existing methods. This new approach significantly reduces the search space to save autotuning time while still providing accurate estimations. I evaluate my work with microbenchmarks and applications at different scales. Microbenchmark results show that my work speeds up MPI_Bcast and MPI_Allreduce by up to 7.34X and 4.86X, respectively, on 4096 processes. In terms of applications, I achieve a 24.3% improvement for Horovod and a 143% improvement for ASP on 1536 processes as compared to the current Open MPI.

Subjects

MPI

Collective Operation

Hierarchical Collecti...

Noise

Autotuning

Degree
Doctor of Philosophy
Major
Computer Science
Comments
Portions of this document were previously published in HPDC, 18.
File(s)
Name
utk.ir.td_13490.pdf
Size
1.86 MB
Format
Adobe PDF
Checksum (MD5)
d13493fc56238b4249332c11b01cc4cd
