
A systematic comparison of genome-scale clustering algorithms

Date Issued
June 25, 2012
Author(s)
Jay, Jeremy J.  
Eblen, John D.
Zhang, Yun  
Benson, Mikael
Perkins, Andy D.  
Saxton, Arnold M.  
Voy, Brynn H.  
Chesler, Elissa J.
Langston, Michael A.
Permanent URI
https://trace.tennessee.edu/handle/20.500.14382/15320
Abstract

Background


A wealth of clustering algorithms has been applied to gene co-expression experiments. These algorithms cover a broad range of approaches, from conventional techniques such as k-means and hierarchical clustering, to graphical approaches such as k-clique communities, weighted gene co-expression networks (WGCNA) and paraclique. Comparison of these methods to evaluate their relative effectiveness provides guidance to algorithm selection, development and implementation. Most prior work on comparative clustering evaluation has focused on parametric methods. Graph theoretical methods are recent additions to the tool set for the global analysis and decomposition of microarray co-expression matrices that have not generally been included in earlier methodological comparisons. In the present study, a variety of parametric and graph theoretical clustering algorithms are compared using well-characterized transcriptomic data at a genome scale from Saccharomyces cerevisiae.

Methods

For each clustering method under study, a variety of parameters were tested. Jaccard similarity was used to measure each cluster's agreement with every GO and KEGG annotation set, and the highest Jaccard score was assigned to the cluster. Clusters were grouped into small, medium, and large bins, and the Jaccard scores of the five top-scoring clusters in each bin were averaged and reported as the best average top 5 (BAT5) score for the particular method.
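The scoring procedure described above can be illustrated with a minimal sketch. The function names, the bin thresholds (`small_max`, `medium_max`), and the input layout are assumptions made for illustration only; the abstract does not state the bin boundaries or how the "best" value is selected across parameter settings, so this sketch computes only the top-five average per size bin for a single clustering result.

```python
from statistics import mean


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two gene sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_annotation_score(cluster, annotation_sets):
    """Highest Jaccard score of a cluster against any annotation set."""
    return max(
        (jaccard(cluster, genes) for genes in annotation_sets.values()),
        default=0.0,
    )


def size_bin(n, small_max=10, medium_max=100):
    """Assign a cluster size to a bin. Thresholds are hypothetical."""
    if n <= small_max:
        return "small"
    if n <= medium_max:
        return "medium"
    return "large"


def top5_average_by_bin(clusters, annotation_sets, top_n=5):
    """Average the top_n best-annotation scores within each size bin."""
    scores = {"small": [], "medium": [], "large": []}
    for cluster in clusters:
        score = best_annotation_score(cluster, annotation_sets)
        scores[size_bin(len(cluster))].append(score)
    # Bins with no clusters are omitted from the result.
    return {
        name: mean(sorted(values, reverse=True)[:top_n])
        for name, values in scores.items()
        if values
    }


# Toy example with made-up gene identifiers and annotation labels.
clusters = [{"a", "b"}, {"a", "b", "c"}]
annotations = {"GO:x": {"a", "b"}, "KEGG:y": {"a", "b", "c", "d"}}
result = top5_average_by_bin(clusters, annotations)
```

Under these assumptions, a BAT5-style value for a method would be obtained by running this over each tested parameter setting and keeping the best per-bin average.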

Results

Clusters produced by each method were evaluated based upon the positive match to known pathways. This produces a readily interpretable ranking of the relative effectiveness of clustering on the genes. Methods were also tested to determine whether they were able to identify clusters consistent with those identified by other clustering methods.

Conclusions

Validation of clusters against known gene classifications demonstrates that for these data, graph-based techniques outperform conventional clustering approaches, suggesting that further development and application of combinatorial strategies is warranted.

Disciplines
Animal Sciences
Recommended Citation
BMC Bioinformatics 2012, 13(Suppl 10):S7 doi:10.1186/1471-2105-13-S10-S7
Embargo Date
July 15, 2013
File(s)
Name
1471_2105_13_S10_S7.pdf
Size
567.43 KB
Format
Adobe PDF
Checksum (MD5)
83d54dfaf2e498d6288676194313a71f
