Accelerating Dense Linear Algebra for GPUs, Multicores and Hybrid Architectures: an Autotuned and Algorithmic Approach

Date Issued
August 1, 2010
Author(s)
Nath, Rajib Kumar
Advisor(s)
Jack Dongarra
Additional Advisor(s)
Stanimire Z. Tomov
Lynne E. Parker
Permanent URI
https://trace.tennessee.edu/handle/20.500.14382/43736
Abstract

Dense linear algebra (DLA) is one of the seven most important kernels in high performance computing. The introduction of new machines from vendors gives us opportunities to optimize DLA libraries for those machines and thus exploit their power. Unfortunately, the optimization phase is not straightforward. The optimum code for a given Basic Linear Algebra Subprograms (BLAS) kernel, which is the core of DLA algorithms, can differ between two machines built on different semiconductor processes even if they share the same instruction set architecture, memory hierarchy, and clock speed. It has become a tradition to optimize BLAS for new machines. Vendors maintain highly optimized BLAS libraries targeting their CPUs. Unfortunately, the existing BLAS for GPUs is not highly optimized for DLA algorithms. In my research, I have provided new algorithms for several important BLAS kernels for different generations of GPUs and introduced a pointer redirecting approach to make BLAS run faster for generic problem sizes. I have also presented an auto-tuning approach to parameterize the developed BLAS algorithms and select the best set of parameters for a given card.

Hardware trends have also created a need to update existing legacy DLA software packages, such as the sequential LAPACK. To take advantage of the new computational environment, successors of LAPACK must incorporate algorithms with three main characteristics: high parallelism, reduced communication, and heterogeneity-awareness. On multicore architectures, Parallel Linear Algebra Software for Multicore Architectures (PLASMA) was developed to meet the challenges of multicore. At the other extreme, the Matrix Algebra on GPU and Multicore Architectures (MAGMA) library demonstrated a hybridization approach that streamlined the development of high performance DLA for multicores with GPU accelerators. The performance of these two libraries depends on the right choice of parameters for a given problem size and a given number of cores and/or GPUs. In this work, the issue of automatically tuning these two libraries is presented. A prune-based empirical auto-tuning method is proposed for tuning PLASMA. Part of the PLASMA tuning method was also considered for tuning the hybrid MAGMA library.

Subjects

Optimization

GPUs

Multicore

Dense Linear Algebra

BLAS

Hybrid Architecture

Degree
Master of Science
Major
Computer Science
Embargo Date
December 1, 2011
File(s)
Name
my_dissertation.pdf
Size
1.73 MB
Format
Adobe PDF
Checksum (MD5)
374b8e38e2176b59b3d9dcf8072d353d
