Masters Theses

Date of Award

5-1997

Degree Type

Thesis

Degree Name

Master of Science

Major

Computer Science

Major Professor

James S. Plank

Committee Members

Brad Vander Zanden

Abstract

As the choice of parallel platforms shifts from dedicated parallel machines to networks of workstations, the need for program fault tolerance has never been greater. Checkpointing is the only means of providing programs with fault tolerance in general-purpose computing environments, and it usually involves saving program state to disk. In parallel environments, however, stable storage becomes a bottleneck that prevents efficient checkpointing. This thesis presents algorithms that provide parallel programs with fault tolerance without relying on stable storage. These algorithms were implemented and compared with traditional disk-based algorithms. The results show that diskless checkpointing is a viable way to provide fault tolerance with low overhead.
