Date of Award

12-2015

Degree Type

Thesis

Degree Name

Master of Science

Major

Computer Science

Major Professor

Michael D. Vose

Committee Members

Gregory D. Peterson, Jeremy Holleman

Abstract

There is an increasing gap between the rate at which data is generated by scientific and non-scientific fields and the rate at which data can be processed by available computing resources. In this paper, we introduce the fields of Bioinformatics and Cheminformatics; two fields where big data has become a problem due to continuing advances in the technologies that drives these fields: such as gene sequencing and small ligand exploration. We introduce high performance computing as a means to process this growing base of data in order to facilitate knowledge discovery. We enumerate goals of the project including reusability, efficiency, reliability, and scalability. We then describe the implementation of a software scheduler which aims to improve input and output performance of a targeted collection of informatics tools, as well as the profiling and optimization needed to tune the software. We evaluate the performance of the software with a scalability study of the Bioinformatics tools BLAST, HMMER, and MUSCLE; as well as the Cheminformatics tool DOCK6.

Files over 3MB may be slow to open. For best results, right-click and select "save as..."

Share

COinS