Doctoral Dissertations

ORCID iD

https://orcid.org/0009-0007-7749-188X

Date of Award

12-2025

Degree Type

Dissertation

Degree Name

Doctor of Philosophy

Major

Computer Science

Major Professor

Jack Dongarra

Committee Members

Jack Dongarra, Piotr Luszczek, Michael Berry, Hartwig Anzt, James Demmel, Michael Mahoney

Abstract

Randomized Numerical Linear Algebra (RandNLA) is a relatively young branch of the field of Numerical Linear Algebra (NLA). It leverages randomization as the main approach to develop algorithms for computing solutions to classical linear algebra problems with performance superior to deterministic NLA schemes. Despite this, RandNLA methods have yet to gain broad adoption within mainstream NLA software libraries, and, by extension, within the High-Performance Computing (HPC) toolkit aimed at combating the challenges posed by the end of Moore's law. This dissertation argues that the HPC community is missing a powerful opportunity by not embracing RandNLA methods. As part of this work, I propose and analyze three RandNLA methods for two classes of NLA problems: full matrix factorization through QR decomposition with column pivoting (QRCP), and low-rank factorization via the Singular Value Decomposition (SVD). The core contribution of the projects I present lies in my development of high-performance implementations of each method, integrated into the open-source RandLAPACK library. First, I present a novel QRCP algorithm for tall matrices. I provide a comprehensive analysis of the method in exact arithmetic and broadly characterize its behavior in finite-precision arithmetic. Experiments on the algorithm's implementation demonstrate order-of-magnitude speedups over LAPACK's standard function for QRCP. Next, I present an algorithmic framework that enables the design of practical QRCP algorithms for general matrices through user-controlled choices of the core subroutines. I provide a comprehensive overview of how to navigate these choices on modern hardware platforms, offering detailed descriptions of alternative methods for both CPUs and GPUs. CPU experiments demonstrate that the proposed method achieves performance improvements of up to two orders of magnitude over LAPACK's standard QRCP routine.
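The sketch-and-pivot idea underlying such randomized QRCP algorithms can be illustrated in a few lines. The following NumPy/SciPy sketch is not the RandLAPACK implementation described in the dissertation; the function name, the dense Gaussian sketching operator, and the sketch size `d` are illustrative assumptions. Pivots are selected by running classical QRCP on a short sketch of a tall matrix, after which only a cheap unpivoted QR of the permuted columns is needed:

```python
import numpy as np
from scipy.linalg import qr

def sketched_qrcp(A, d, seed=None):
    """Illustrative sketch-based QRCP for a tall matrix A (m >> n).

    Hypothetical helper, not RandLAPACK code: pivots come from
    classical QRCP on a small d-by-n sketch S = G @ A, followed by
    an unpivoted (BLAS-3-friendly) QR of the permuted columns.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Dense Gaussian sketching operator; d << m for a tall matrix.
    G = rng.standard_normal((d, m)) / np.sqrt(d)
    S = G @ A
    # Pivot selection on the small sketch via LAPACK's QRCP (geqp3).
    _, _, piv = qr(S, mode="economic", pivoting=True)
    # Unpivoted QR of the full matrix with columns permuted by piv.
    Q, R = qr(A[:, piv], mode="economic")
    return Q, R, piv
```

The expensive pivoted factorization touches only the small sketch, which is why such schemes can outperform a full pivoted QR of the original tall matrix.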
GPU experiments suggest that my method attains approximately 65% of the performance of cuSOLVER's unpivoted QR factorization. Finally, I discuss a high-performance implementation of a partial SVD algorithm based on the Krylov subspace method. The algorithm is tested against a randomized SVD and a deterministic SVD based on the implicitly restarted Lanczos method. Results on a dense synthetic dataset and a real-world sparse model-reduction problem demonstrate the advantages of the algorithm I implemented.
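For context, the basic randomized low-rank SVD used as a baseline in such comparisons can be sketched as follows. This is a textbook Halko–Martinsson–Tropp-style range finder in NumPy, not the dissertation's RandLAPACK code; the oversampling amount and power-iteration count are illustrative defaults:

```python
import numpy as np

def randomized_svd(A, k, oversample=10, n_iter=2, seed=None):
    """Textbook randomized SVD sketch (HMT-style range finder).

    Illustrative only: not the implementation from the dissertation.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    ell = min(n, k + oversample)
    # Capture the range of A by sketching with a Gaussian test matrix.
    Q, _ = np.linalg.qr(A @ rng.standard_normal((n, ell)))
    # A few power iterations sharpen accuracy when singular values
    # decay slowly; re-orthogonalize each pass for stability.
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    # Solve the small projected problem, then lift back to m dimensions.
    U_small, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    U = Q @ U_small
    return U[:, :k], s[:k], Vt[:k, :]
```

All heavy work happens in dense matrix-matrix products on the sketched subspace, which is what makes this family of methods attractive on modern hardware.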

Comments

Keywords: Randomized Numerical Linear Algebra, RandNLA, High-Performance Computing, Partial SVD, Krylov Subspace Methods, QRCP, RandLAPACK, GPU.
