Biotechnology Progress, Vol.19, No.4, 1142-1148, 2003
An adaptive strategy for single- and multi-cluster gene assignment
Strict assignment of genes to one class, dimensionality reduction, a priori specification of the number of classes, the need for a training set, nonunique solution, and complex learning mechanisms are some of the inadequacies of current clustering algorithms. Existing algorithms cluster genes on the basis of high positive correlations between their expression patterns. However, genes with strong negative correlations can also have similar functions and are most likely to have a role in the same pathways. To address some of these issues, we propose the adaptive centroid algorithm (ACA), which employs an analysis of variance (ANOVA)-based performance criterion. The ACA also uses Euclidian distances, the center-of-mass principle for heterogeneously distributed mass elements, and the given data set to give unique solutions. The proposed approach involves three stages. In the first stage a two-way ANOVA of the gene expression matrix is performed. The two factors in the ANOVA are gene expression and experimental condition. The residual mean squared error (MSE) from the ANOVA is used as a performance criterion in the ACA. Finally, correlated clusters are found based on the Pearson correlation coefficients. To validate the proposed approach, a two-way ANOVA is again performed on the discovered clusters. The results from this last step indicate that MSEs of the clusters are significantly lower compared to that of the fibroblast-serum gene expression matrix. The ACA is employed in this study for single- as well as multi-cluster gene assignments.