Doctoral Dissertations

Date of Award

8-2023

Degree Type

Dissertation

Degree Name

Doctor of Philosophy

Major

Computer Science

Major Professor

Michael W. Berry

Committee Members

Michael Jantz, Audris Mockus, Joan Lind

Abstract

Tensors, or n-way arrays, are useful for storing indexable data in an arbitrary number of dimensions. Interest in tensor analysis using tensor decomposition has expanded to a variety of fields, including data mining, signal processing, computer vision, and machine learning. Tensors modelling interesting data may also be sparse, meaning that the majority of their values are zero. These tensors can be extremely large, containing millions of entries, so they cannot be stored explicitly in dense form. To address this problem, various formats have arisen in the past decade to compress such massive data. However, most of these existing structures are static and do not support tensor updates. This motivated the proposal in 2021 of a new format, Hashed Coordinate Storage (HaCOO), a mode-agnostic format that stores sparse tensor indexes and values in a separate-chaining hash table so that arbitrary entries can be inserted and accessed in constant time. To investigate the benefits of this novel format, we introduce a MATLAB class to create and manipulate sparse tensors in HaCOO format. This class was evaluated against the MATLAB Tensor Toolbox on several real-world sparse tensor datasets to compare tensor update capability and the performance of the matricized tensor times Khatri-Rao product (MTTKRP), a key kernel in Canonical Polyadic Decomposition. Additionally, we discuss how the HaCOO format can greatly accelerate building document tensors in a practical application: a text analysis model based on sparse tensor decomposition.
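
As a rough illustration of the idea described in the abstract, the following Python sketch stores sparse tensor entries in a separate-chaining hash table keyed by the full index tuple, and computes an MTTKRP by iterating over the stored nonzeros. It is not the dissertation's MATLAB class: the names `HashedSparseTensor` and `mttkrp`, the use of Python's built-in tuple hash, the bucket count, and the load-factor threshold are all assumptions made for this example.

```python
class HashedSparseTensor:
    """Illustrative sparse tensor stored in a separate-chaining hash table.

    Hypothetical sketch only: the hash function, bucket count, and growth
    policy are assumptions, not the HaCOO specification.
    """

    def __init__(self, shape, nbuckets=8, max_load=0.75):
        self.shape = tuple(shape)
        self.buckets = [[] for _ in range(nbuckets)]  # one chain per bucket
        self.nnz = 0
        self.max_load = max_load

    def _slot(self, idx):
        # Python's built-in tuple hash stands in for a custom index hash.
        return hash(idx) % len(self.buckets)

    def set(self, idx, value):
        """Insert a new entry or overwrite an existing one."""
        idx = tuple(int(i) for i in idx)
        chain = self.buckets[self._slot(idx)]
        for pos, (key, _) in enumerate(chain):
            if key == idx:
                chain[pos] = (idx, value)  # update in place
                return
        chain.append((idx, value))
        self.nnz += 1
        if self.nnz > self.max_load * len(self.buckets):
            self._rehash()

    def get(self, idx):
        """Return the stored value, or 0.0 for an entry that is not stored."""
        idx = tuple(int(i) for i in idx)
        for key, value in self.buckets[self._slot(idx)]:
            if key == idx:
                return value
        return 0.0

    def _rehash(self):
        # Double the number of buckets and redistribute every entry.
        old = self.buckets
        self.buckets = [[] for _ in range(2 * len(old))]
        for chain in old:
            for key, value in chain:
                self.buckets[self._slot(key)].append((key, value))

    def items(self):
        """Yield (index tuple, value) pairs in bucket order."""
        for chain in self.buckets:
            yield from chain


def mttkrp(tensor, factors, mode):
    """MTTKRP along `mode`; `factors` holds one row-major matrix per mode."""
    rank = len(factors[0][0])
    out = [[0.0] * rank for _ in range(tensor.shape[mode])]
    for idx, value in tensor.items():
        for r in range(rank):
            prod = value
            for m, factor in enumerate(factors):
                if m != mode:  # skip the factor of the output mode
                    prod *= factor[idx[m]][r]
            out[idx[mode]][r] += prod
    return out
```

Because the key is the whole index tuple, no mode is privileged in the layout, so the same loop serves an MTTKRP along any mode. Lookups and insertions touch a single chain, which stays short on average as long as the table is resized when the load factor grows.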

