Faculty Publications and Other Works -- EECS

Advances in sequencing technologies are outpacing the rate at which genomes can be thoroughly finished and analyzed. Over the next year, genome sequencing will increase many-fold, but high-quality, high-throughput annotation methods have yet to be developed to meet the need. As more microbial genomes are sequenced, whole-genome annotation methods identify many putative genes that need further verification. By analyzing a broad range of annotated genomes, we can identify patterns and statistics useful for assessing annotation quality and detecting spurious gene outliers. Our work aims to identify quality control measures based on a full inter-genomic comparison rather than on individual sequence-level or database-specific statistics. Using these methods to compare and filter annotations, it is possible to narrow the scope of manual gene curation and to scrutinize putative genes more closely before publication, making higher-quality genome annotation possible. Our results plainly show the quality of well-studied genomes and the weaknesses of draft genome builds, and they illustrate the need for further high-throughput quality control measures.
