Date of Award

12-2011

Degree Type

Dissertation

Degree Name

Doctor of Philosophy

Major

Computer Science

Major Professor

Jian Huang

Committee Members

Jack Dongarra, Joshua Fu, Richard Mills

Abstract

According to a recent exascale roadmap report, analysis will be the limiting factor in gaining insight from exascale data. Analysis problems that must operate on the full range of a dataset are among the most difficult. Some of the primary challenges in this regard come from disk access, data managment, and programmability of analysis tasks on exascale architectures. In this dissertation, I have provided an architectural approach that simplifies and scales data analysis on supercomputing architectures while masking parallel intricacies to the user. My architecture has three primary general contributions: 1) a novel design pattern and implmentation for reading multi-file and variable datasets, 2) the integration of querying and sorting as a way to simplify data-parallel analysis tasks, and 3) a new parallel programming model and system for efficiently scaling domain-traversal tasks.

The design of my architecture has allowed studies in several application areas that were not previously possible. Some of these include large-scale satellite data and ocean flow analysis. The major driving example is of internal-model variability assessments of flow behavior in the GEOS-5 atmospheric modeling dataset. This application issued over 40 million particle traces for model comparison (the largest parallel flow tracing experiment to date), and my system was able to scale execution up to 65,536 processes on an IBM BlueGene/P system.

Files over 3MB may be slow to open. For best results, right-click and select "save as..."

Share

COinS