Sparse matrix-vector multiplication kernels on the Cray T3D
In this thesis we examine sparse matrix-vector multiplication algorithms for massively parallel computers such as the CRAY T3D. Performance results on a 256-processor CRAY T3D are presented along with a detailed analysis of each algorithm's computational complexity. The specific algorithms discussed are the block-block algorithm (BBA) for square matrices and the row and column block algorithms (RBA, CBA) for rectangular matrices. We also discuss the performance of these algorithms within applications, such as the Conjugate Gradient kernel from the NAS Parallel Benchmarks and a block-Lanczos method from SVDPACK for finding the largest singular triplets of a sparse matrix. Results of this study demonstrate that the Conjugate Gradient benchmark for the class A problem size (matrix order 14,000) can be executed in 1.3 seconds, which is competitive with published results for this benchmark on other massively parallel machines.
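As a rough illustration of the idea behind a row block decomposition, the sketch below splits the rows of a sparse matrix into contiguous blocks, one per processor, and lets each block compute its own slice of the product. It is not the thesis's RBA implementation, whose details the abstract does not give. The compressed sparse row (CSR) storage, the function names, and the sequential loop standing in for the processors are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def row_block_bounds(n_rows: int, n_procs: int) -> List[Tuple[int, int]]:
    """Split n_rows into n_procs contiguous row blocks whose sizes differ by at most one."""
    base, extra = divmod(n_rows, n_procs)
    bounds = []
    start = 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def csr_matvec_rows(indptr: Sequence[int], indices: Sequence[int],
                    data: Sequence[float], x: Sequence[float],
                    row_start: int, row_end: int) -> List[float]:
    """Multiply rows [row_start, row_end) of a CSR matrix by the dense vector x."""
    y_block = []
    for i in range(row_start, row_end):
        acc = 0.0
        # Nonzeros of row i occupy positions indptr[i] .. indptr[i + 1] - 1.
        for k in range(indptr[i], indptr[i + 1]):
            acc += data[k] * x[indices[k]]
        y_block.append(acc)
    return y_block


def row_block_spmv(indptr: Sequence[int], indices: Sequence[int],
                   data: Sequence[float], x: Sequence[float],
                   n_procs: int) -> List[float]:
    """Compute y = A x one row block at a time.

    Assumption: each loop iteration stands in for one processor. On a real
    distributed-memory machine the blocks would run concurrently and each
    processor would need access to the entries of x that its rows reference.
    """
    n_rows = len(indptr) - 1
    y: List[float] = []
    for row_start, row_end in row_block_bounds(n_rows, n_procs):
        y.extend(csr_matvec_rows(indptr, indices, data, x, row_start, row_end))
    return y


# A 3 x 4 rectangular example matrix in CSR form:
# [[1, 0, 2, 0],
#  [0, 3, 0, 0],
#  [4, 0, 0, 5]]
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 3]
data = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0, 4.0]

y = row_block_spmv(indptr, indices, data, x, n_procs=2)
```

Because every row belongs to exactly one block, the partial results can simply be concatenated. A column block decomposition would instead have each processor produce a full-length partial vector, and those vectors would have to be summed.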