Doctoral Dissertations

Date of Award

8-2020

Degree Type

Dissertation

Degree Name

Doctor of Philosophy

Major

Mathematics

Major Professor

Vasileios Maroulas

Committee Members

Piotr Franaszczuk, Michael Berry, Mathew Langford

Abstract

Topological data analysis encompasses a broad set of ideas and techniques that address 1) how to rigorously define and summarize the shape of data, and 2) how to use these constructs for inference. This dissertation addresses the second problem by developing new inferential tools for topological data analysis and applying them to real-world data problems. First, a Bayesian framework to approximate probability distributions of persistence diagrams is established. The key insight underpinning this framework is that persistence diagrams may be viewed as Poisson point processes with prior intensities. Under this assumption, posterior intensities can be computed using techniques from the theory of marked point processes. After a general Bayesian model is defined, a conjugate family of prior intensities is introduced via Gaussian mixtures, which yields a closed form for the posterior intensity. This closed form enables efficient computation of posterior distributions for persistence diagrams. The utility of this Bayesian framework is demonstrated on classification problems with materials science and electroencephalography data. Viewing persistence diagrams as point processes also makes it possible to define a kernel density estimator that approximates probability distributions of persistence diagrams nonparametrically. This dissertation uses the kernel density estimator to create a novel hypothesis test that detects specific time series dynamics in noisy measurements. Finally, the dissertation considers data augmentation, whose overarching goal is to increase training set diversity by generating additional training examples from existing ones while preserving large-scale structures in elements of the training set. Herein, a novel data augmentation framework that accounts for the topology of data is introduced. Intuitively speaking, this new method 'adds noise' to training examples through controlled topological perturbations, which preserve large-scale structure in the data.
The effectiveness of this data augmentation pipeline is examined by training deep learners to classify atom probe tomography and image data, with both balanced and unbalanced training sets.
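The abstract attributes the closed-form posterior intensity to a conjugate family of Gaussian-mixture prior intensities. The sketch below illustrates only the algebraic reason such conjugacy works, and it is not the dissertation's posterior formula. It assumes isotropic two-dimensional Gaussians for both the prior components and the likelihood of an observed diagram point. Under that assumption, multiplying one weighted prior component by the likelihood gives another weighted Gaussian in closed form, so a mixture remains a mixture. The names `Component`, `gaussian_pdf`, `update_component`, and `update_mixture` are hypothetical. The sketch also ignores terms the full model involves, such as the constraint that diagram points lie above the diagonal (death greater than birth).

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Component:
    """One weighted isotropic Gaussian in a mixture intensity on R^2 (hypothetical helper)."""
    weight: float
    mean: Point
    var: float  # isotropic variance sigma^2


def gaussian_pdf(x: Point, mean: Point, var: float) -> float:
    """Density of N(mean, var * I) in two dimensions, evaluated at x."""
    d2 = (x[0] - mean[0]) ** 2 + (x[1] - mean[1]) ** 2
    return math.exp(-d2 / (2.0 * var)) / (2.0 * math.pi * var)


def update_component(c: Component, y: Point, noise_var: float) -> Component:
    """Multiply prior component c by the assumed likelihood N(y; x, noise_var * I).

    Uses the Gaussian product identity
        N(x; mu, s2 I) * N(y; x, t2 I) = N(y; mu, (s2 + t2) I) * N(x; m, v I),
    with v = s2 * t2 / (s2 + t2) and m = (t2 * mu + s2 * y) / (s2 + t2).
    The result is an unnormalized Gaussian component in x.
    """
    s2, t2 = c.var, noise_var
    v = s2 * t2 / (s2 + t2)
    m = (
        (t2 * c.mean[0] + s2 * y[0]) / (s2 + t2),
        (t2 * c.mean[1] + s2 * y[1]) / (s2 + t2),
    )
    w = c.weight * gaussian_pdf(y, c.mean, s2 + t2)
    return Component(w, m, v)


def update_mixture(prior: List[Component], y: Point, noise_var: float) -> List[Component]:
    """Apply the Gaussian product step to every component of a mixture intensity."""
    return [update_component(c, y, noise_var) for c in prior]


if __name__ == "__main__":
    # Illustrative prior intensity with two components in (birth, death) coordinates.
    prior = [Component(2.0, (1.0, 3.0), 0.5), Component(1.0, (0.0, 1.0), 1.0)]
    observed = (1.5, 2.5)  # one hypothetical observed diagram point
    for comp in update_mixture(prior, observed, noise_var=0.25):
        print(comp)
```

The updated components have means pulled toward the observed point, variances smaller than the prior's, and weights scaled by how well each prior component predicts the observation. Those three closed-form quantities are what keep the computation efficient when the prior is a Gaussian mixture.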
