Date of Award


Degree Type


Degree Name

Master of Science


Computer Science

Major Professor

Jack Dongarra

Committee Members

Stanimire Z. Tomov, Lynne E. Parker


Dense linear algebra (DLA) is one of the seven most important kernels in
high performance computing. The introduction of new machines from vendors
gives us the opportunity to optimize DLA libraries for those machines and
thus exploit their power. Unfortunately, the optimization phase is not
straightforward. The optimal code for a given Basic Linear Algebra
Subprograms (BLAS) kernel, the core of DLA algorithms, can differ between
two machines built on different semiconductor processes, even if they
share the same instruction set architecture, memory hierarchy, and clock
speed. It has become a tradition to optimize BLAS for new machines.
Vendors maintain highly optimized BLAS libraries targeting their CPUs.
Unfortunately, the existing BLAS for GPUs is not highly optimized for DLA
algorithms. In my research, I have provided new algorithms for several
important BLAS kernels for different generations of GPUs and introduced a
pointer redirecting approach to make BLAS run faster for generic problem
sizes. I have also presented an auto-tuning approach that parameterizes
the developed BLAS algorithms and selects the best set of parameters for a
given card.

Hardware trends have also created the need to update existing legacy DLA
software packages, such as the sequential LAPACK. To take advantage of the
new computational environment, successors of LAPACK must incorporate
algorithms with three main characteristics: high parallelism, reduced
communication, and heterogeneity-awareness. On multicore architectures,
Parallel Linear Algebra Software for Multicore Architectures (PLASMA) has
been developed to meet these challenges. At the other extreme, the Matrix
Algebra on GPU and Multicore Architectures (MAGMA) library demonstrated a
hybridization approach that streamlined the development of high
performance DLA for multicores with GPU accelerators. The performance of
these two libraries depends on the right choice of parameters for a given
problem size and a given number of cores and/or GPUs. This work addresses
the issue of automatically tuning these two libraries. A pruning-based
empirical auto-tuning method is proposed for tuning PLASMA. Part of the
tuning method for PLASMA was also considered for tuning the hybrid MAGMA
library.
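
The abstract does not spell out how the empirical tuning works, so the following is only a generic sketch of the idea of pruned empirical search, not the method developed in the thesis. It assumes a single tunable parameter (a tile size `nb` for a blocked matrix multiply) and a two-stage search: every candidate is timed on a small proxy problem, all but the best few are pruned, and only the survivors are timed at the target size. The names `blocked_matmul`, `measure`, `prune_and_tune`, the candidate tile sizes, and the problem sizes are all hypothetical choices made for illustration.

```python
import time


def blocked_matmul(a, b, n, nb):
    """Multiply two n x n matrices (lists of lists) using nb x nb tiles."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, nb):
        for kk in range(0, n, nb):
            for jj in range(0, n, nb):
                for i in range(ii, min(ii + nb, n)):
                    for k in range(kk, min(kk + nb, n)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + nb, n)):
                            c[i][j] += aik * b[k][j]
    return c


def measure(n, nb):
    """Wall-clock seconds for one blocked multiply of size n with tile nb."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    start = time.perf_counter()
    blocked_matmul(a, b, n, nb)
    return time.perf_counter() - start


def prune_and_tune(candidates, cheap_cost, full_cost, keep=2):
    """Two-stage empirical search (an illustrative heuristic).

    Stage 1 ranks every candidate with the cheap measurement and prunes all
    but the `keep` best. Stage 2 runs the expensive measurement only on the
    survivors and returns the winner together with the survivors' timings.
    """
    survivors = sorted(candidates, key=cheap_cost)[:keep]
    timings = {cand: full_cost(cand) for cand in survivors}
    best = min(timings, key=timings.get)
    return best, timings


if __name__ == "__main__":
    tile_sizes = [4, 8, 16, 32]  # hypothetical candidate tile sizes
    best, timings = prune_and_tune(
        tile_sizes,
        cheap_cost=lambda nb: measure(32, nb),  # small proxy problem
        full_cost=lambda nb: measure(96, nb),   # target problem size
    )
    print("selected tile size:", best)
    for nb, seconds in timings.items():
        print(f"  nb={nb}: {seconds:.4f} s")
```

Pruning trades accuracy for tuning time: a candidate that is slow on the proxy problem but fast at the target size is discarded in stage 1, so the proxy must be chosen to rank candidates in roughly the same order as the full problem.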
