Masters Theses

Date of Award


Degree Type


Degree Name

Master of Science


Computer Science

Major Professor

Rafael C. Gonzalez

Committee Members

Michael G. Thomason, Reinhold C. Mann


When paper documents are converted into a format suitable for storage in a computer system, the resulting electronic documents or document images can be retrieved only if they are indexed. Unfortunately, manual indexing can account for over 75% of the conversion cost, so for many applications it is important to consider automated methods of reading indexing fields in document images.

An automated technique was developed to read document control information on labels that are affixed in a free-form manner to documents recorded by the Office of the Knox County Register of Deeds. The method consists of two main steps: Intelligent Field Detection (IFD) and field recognition. An IFD algorithm was developed to find the labels and is discussed with particular emphasis on accuracy, speed, and practicality in a production environment. The method used to recognize indexing fields on the label is based on commercially available Optical Character Recognition (OCR) technology and a post-processing approach that uses a modifiable set of image enhancement operations to improve recognition. The algorithms were tested extensively using a database of over 13,000 document images; rates of 97% and 87% were achieved for label detection and the recognition of indexing fields, respectively. The methods developed can be used now to verify the correctness of the existing database of 1,000,000 document images, assist in the indexing of scanned images, and monitor the print quality of the labels that are affixed to documents recorded at the Knox County registry. They are also applicable to similar problems at most registries in the United States and at other organizations where indexing fields are positioned on documents in an unconstrained manner.
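The idea of a modifiable set of image enhancement operations applied around an OCR engine can be illustrated with a minimal sketch. The abstract does not say how the operations are chosen, ordered, or combined, so everything below is an assumption made for illustration: the function names (`recognize_with_enhancement`, `threshold`, `invert`), the stand-in `stub_ocr` recognizer, and the rule that a field is accepted when it consists only of digits are all hypothetical and are not taken from the thesis.

```python
from typing import Callable, List, Optional, Sequence

# A grayscale image as rows of pixel values (0 = black, 255 = white).
Image = List[List[int]]


def threshold(img: Image, cutoff: int = 128) -> Image:
    """Binarize: pixels below the cutoff become black, all others white."""
    return [[0 if p < cutoff else 255 for p in row] for row in img]


def invert(img: Image) -> Image:
    """Swap dark and light pixels."""
    return [[255 - p for p in row] for row in img]


def recognize_with_enhancement(
    img: Image,
    ocr: Callable[[Image], str],
    is_valid: Callable[[str], bool],
    enhancements: Sequence[Callable[[Image], Image]],
) -> Optional[str]:
    """Run OCR on the raw image, then on each enhanced version in turn.

    Returns the first result that passes validation, or None if none does.
    Each enhancement is applied to the original image, not cumulatively
    (an assumption of this sketch).
    """
    attempts = [lambda x: x] + list(enhancements)
    for enhance in attempts:
        text = ocr(enhance(img))
        if is_valid(text):
            return text
    return None


def stub_ocr(img: Image) -> str:
    """Hypothetical stand-in for a commercial OCR engine.

    It 'reads' a field only when the image is cleanly binarized.
    """
    clean = all(p in (0, 255) for row in img for p in row)
    return "12345" if clean else "?"


noisy_label = [[10, 200], [130, 90]]
result = recognize_with_enhancement(
    noisy_label, stub_ocr, str.isdigit, [invert, threshold]
)
```

Because the list of operations is passed in as data, operations can be added, removed, or reordered without changing the recognition loop, which is one plausible reading of "modifiable" in the abstract.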

