A Fill Estimation Algorithm for Sparse Matrices and Tensors in Blocked Formats

Peter Ahrens, Helen Xu, and Nicholas Schiefer, “A Fill Estimation Algorithm for Sparse Matrices and Tensors in Blocked Formats,” in 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2018, pp. 546–556.


Many sparse matrices and tensors from a variety of applications, such as finite element methods and computational chemistry, have a natural aligned rectangular nonzero block structure. Researchers have designed high-performance blocked sparse operations that take advantage of this sparsity structure to reduce the cost of storing the locations of nonzeros. The performance of a blocked sparse operation depends on how well the block size reflects the structure of nonzeros in the tensor. Sparse tensor structure is generally unknown until runtime, so block size selection must be efficient. For a given block size, the fill is a quantity that relates the number of nonzero blocks to the number of nonzeros. Many performance models use the fill to help choose a block size. However, the fill is expensive to compute exactly. We present a sampling-based algorithm called Phil to estimate the fill of sparse matrices and tensors in any format. We provide theoretical guarantees for sparse matrices and tensors, and experimental results for matrices. The existing state-of-the-art fill estimation algorithm, which we will call OSKI, runs in time linear in the number of elements in the tensor. The number of samples Phil needs to compute a fill estimate is unrelated to the number of nonzeros and depends only on the order (number of dimensions) of the tensor, the desired accuracy of the estimate, the desired probability of achieving this accuracy, and the number of considered block sizes. We compare Phil and OSKI on a suite of 42 matrices. On most inputs, Phil estimates the fill at least 2 times faster and often more than 20 times faster than OSKI. Phil consistently produced accurate estimates; in all cases that we tested, Phil was faster and/or more accurate than OSKI. Finally, we find that Phil and OSKI produce comparable speedups in multicore blocked sparse matrix-vector multiplication (SpMV) when the block size is chosen using fill estimates in a model due to Vuduc et al.
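The abstract relates the fill to the numbers of nonzero blocks and nonzeros without stating a formula. The sketch below assumes the fill-ratio definition commonly used in OSKI-style block size models: for an r × c block size, fill = (number of nonzero aligned r × c blocks · r · c) / nnz. Under that assumption, if k(x) is the number of nonzeros that share nonzero x's block, then summing 1/k(x) over all nonzeros counts every nonzero block exactly once, so averaging r·c/k(x) over uniformly sampled nonzeros gives an unbiased estimate of the fill. This only illustrates the kind of identity a sampling estimator can rely on; it is not Phil's published implementation, and the helper names (`to_rows`, `block_nnz`, `exact_fill`, `estimate_fill`) are hypothetical.

```python
import bisect
import random


def to_rows(nonzeros):
    """Group (row, col) coordinates into a CSR-like dict: row -> sorted column list."""
    rows = {}
    for i, j in nonzeros:
        rows.setdefault(i, []).append(j)
    for cols in rows.values():
        cols.sort()
    return rows


def block_nnz(rows, i, j, r, c):
    """Count the nonzeros in the aligned r x c block that contains entry (i, j)."""
    i0, j0 = (i // r) * r, (j // c) * c
    total = 0
    for row in range(i0, i0 + r):
        cols = rows.get(row, [])
        # Binary search for the columns falling in [j0, j0 + c).
        total += bisect.bisect_left(cols, j0 + c) - bisect.bisect_left(cols, j0)
    return total


def exact_fill(nonzeros, r, c):
    """Assumed fill definition: entries stored in nonzero r x c blocks / true nonzeros."""
    blocks = {(i // r, j // c) for i, j in nonzeros}
    return len(blocks) * r * c / len(nonzeros)


def estimate_fill(nonzeros, rows, r, c, num_samples, rng):
    """Unbiased fill estimate: mean of r*c / k(x) over uniformly sampled nonzeros x."""
    total = 0.0
    for _ in range(num_samples):
        i, j = rng.choice(nonzeros)  # sample a nonzero uniformly, with replacement
        total += r * c / block_nnz(rows, i, j, r, c)
    return total / num_samples


if __name__ == "__main__":
    # A dense 2 x 2 block in the top-left corner plus one isolated nonzero.
    nz = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 3)]
    rows = to_rows(nz)
    print(exact_fill(nz, 2, 2))
    print(estimate_fill(nz, rows, 2, 2, 10000, random.Random(0)))
```

In this small example the exact 2 × 2 fill is 2 · 4 / 5 = 1.6, and `estimate_fill` approaches that value as `num_samples` grows. Each sample inspects only the r rows of a single block, so the work scales with the number of samples rather than with the total number of nonzeros (apart from the binary searches); how many samples suffice for a given accuracy and confidence is the question the paper's guarantees address.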
