Date of Award
Doctor of Philosophy
Robert C. Ward
Michael W. Berry, Jack J. Dongarra, Robert J. Hinde
In the first-principles calculation of electronic structures, one of the most time-consuming tasks is computing the eigensystem of a large symmetric nonlinear eigenvalue problem. The standard approach is an iterative scheme that solves a large symmetric linear eigenvalue problem in each iteration. In the early and intermediate iterations, significant gains in efficiency may result from solving the eigensystem to reduced accuracy; as the iteration nears convergence, the eigensystem can be computed to the required accuracy.
Traditional real symmetric eigensolvers compute the eigensystem in three steps: 1) reduce the dense matrix to symmetric tridiagonal form using orthogonal transformations; 2) compute the eigenpairs of the tridiagonal matrix; 3) back-transform the eigenvectors of the tridiagonal matrix to those of the original matrix. Stable and efficient eigendecomposition algorithms for symmetric tridiagonal matrices are under constant investigation, while the performance of the orthogonal reduction step remains a bottleneck.
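The three steps can be sketched in NumPy with an explicit Householder tridiagonalization. This is illustrative only: production codes use the equivalent LAPACK routines (e.g., dsytrd for the reduction, dstedc for the tridiagonal solve), and the loop below forms full Householder matrices, which is far more expensive than the blocked library implementations.

```python
import numpy as np

def tridiagonalize(A):
    """Step 1: reduce symmetric A to tridiagonal T = Q^T A Q via Householder
    reflections, accumulating Q for the later back-transformation."""
    T = A.copy().astype(float)
    n = T.shape[0]
    Q = np.eye(n)
    for k in range(n - 2):
        x = T[k + 1:, k]
        v = x.copy()
        # choose the reflection sign that avoids cancellation
        v[0] += np.sign(x[0] if x[0] != 0 else 1.0) * np.linalg.norm(x)
        if np.linalg.norm(v) == 0:
            continue
        v /= np.linalg.norm(v)
        H = np.eye(n)
        H[k + 1:, k + 1:] -= 2.0 * np.outer(v, v)
        T = H @ T @ H   # similarity transform (H is symmetric orthogonal)
        Q = Q @ H       # accumulate transformations
    return T, Q

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
A = (A + A.T) / 2                # make A symmetric
T, Q = tridiagonalize(A)
w, V_trid = np.linalg.eigh(T)    # step 2: eigenpairs of the tridiagonal matrix
V = Q @ V_trid                   # step 3: back-transform the eigenvectors
```

After step 3, `(w, V)` is the eigensystem of the original dense matrix `A`.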
The main contribution of this dissertation is an efficient parallel approximate eigensolver that computes the eigenpairs of a real symmetric matrix to reduced accuracy. This eigensolver consists of three major parts: 1) a parallel block tridiagonal divide-and-conquer algorithm that computes the approximate eigenpairs of a block tridiagonal matrix to prescribed accuracy; 2) a parallel block tridiagonalization algorithm that constructs a block tridiagonal matrix from a sparse or “effectively” sparse matrix, i.e., a matrix with many small elements that can be treated as zeros without affecting the prescribed accuracy of the eigenvalues; 3) a parallel orthogonal block tridiagonal reduction algorithm that reduces a dense real symmetric matrix to block tridiagonal form using similarity transformations with a high ratio of level 3 BLAS operations. The parallel approximate eigensolver chooses the proper combination of these three algorithms, depending on the structure of the input matrix, and computes all the eigenpairs of the input matrix to prescribed accuracy.
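The notion of an “effectively” sparse matrix in part 2 rests on a standard perturbation bound: zeroing out small entries perturbs the matrix by some E, and by Weyl's theorem each eigenvalue then shifts by at most ||E||_2. A minimal NumPy sketch of this thresholding idea follows; the test matrix and threshold are illustrative, and this is not the dissertation's block tridiagonalization algorithm.

```python
import numpy as np

def sparsify(A, threshold):
    """Zero out entries with magnitude below `threshold`; return the
    sparsified matrix and the 2-norm of the dropped perturbation."""
    A_sparse = np.where(np.abs(A) >= threshold, A, 0.0)
    return A_sparse, np.linalg.norm(A - A_sparse, 2)

rng = np.random.default_rng(1)
n = 100
# symmetric matrix with O(1) diagonal and many tiny off-diagonal entries
A = np.diag(rng.standard_normal(n)) + 1e-8 * rng.standard_normal((n, n))
A = (A + A.T) / 2
A_sparse, err = sparsify(A, threshold=1e-7)
# Weyl's theorem: each eigenvalue moves by at most ||A - A_sparse||_2
shift = np.max(np.abs(np.linalg.eigvalsh(A) - np.linalg.eigvalsh(A_sparse)))
```

If the tolerance on the eigenvalues is `tau`, any set of entries whose collected perturbation satisfies ||E||_2 <= tau can therefore be discarded safely.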
Numerical results show that the parallel block tridiagonal divide-and-conquer
algorithm is very efficient when at least a few off-diagonal blocks have relatively low rank. With very low computational cost, the parallel block tridiagonalization algorithm constructs a block tridiagonal matrix from a sparse or “effectively” sparse input matrix. The parallel orthogonal block tridiagonal reduction algorithm achieves high performance due to its high ratio of level 3 BLAS operations. Using a small block size in the parallel orthogonal block tridiagonal reduction algorithm is a critical factor for competitive performance when it is combined with the parallel block tridiagonal divide-and-conquer algorithm.
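A standard way to expose the low-rank off-diagonal blocks that make the divide-and-conquer step cheap is to replace each block with a truncated SVD; the resulting eigenvalue error is again bounded, via Weyl's theorem, by the largest discarded singular value. The following is a hedged sketch of that idea on a 2x2 block example, not a reproduction of the dissertation's algorithm.

```python
import numpy as np

def truncate_block(B, rank):
    """Best rank-`rank` approximation of B via truncated SVD; also return
    the first discarded singular value, which equals ||B - B_trunc||_2."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    dropped = s[rank] if rank < s.size else 0.0
    return (U[:, :rank] * s[:rank]) @ Vt[:rank], dropped

rng = np.random.default_rng(2)
m = 50
A11 = rng.standard_normal((m, m)); A11 = (A11 + A11.T) / 2
A22 = rng.standard_normal((m, m)); A22 = (A22 + A22.T) / 2
# off-diagonal block that is numerically rank 3 (rank-3 part + tiny noise)
B = (rng.standard_normal((m, 3)) @ rng.standard_normal((3, m))
     + 1e-8 * rng.standard_normal((m, m)))
A = np.block([[A11, B], [B.T, A22]])
B_lr, dropped = truncate_block(B, 3)
A_lr = np.block([[A11, B_lr], [B_lr.T, A22]])
# the symmetric perturbation has 2-norm ||B - B_lr||_2 = dropped, so by
# Weyl's theorem every eigenvalue moves by at most `dropped`
shift = np.max(np.abs(np.linalg.eigvalsh(A) - np.linalg.eigvalsh(A_lr)))
```

The cost of the divide-and-conquer merge step grows with the rank kept, which is why low-rank off-diagonal blocks (rank around 20 or less, as noted below) are the favorable case.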
Our parallel approximate eigensolver has the limitation that the block tridiagonal matrices, either as input matrices or as produced by the pre-processing steps, should have off-diagonal blocks of low rank, say 20 or less, or a very high rate of deflation, to achieve satisfactory performance. In addition, large variation in the deflation rate may lead to workload imbalance, although such cases appear to be rare. Future work may include a completely data-parallel implementation of the block tridiagonal divide-and-conquer algorithm and a parallel adaptive eigensolver that detects matrix structure automatically, adjusts the accuracy requirement when necessary, and chooses the proper algorithms to solve the eigenproblem.
Bai, Yihua, "High Performance Parallel Approximate Eigensolver for Real Symmetric Matrices." PhD diss., University of Tennessee, 2005.