DATE Analysis: A General Theory of Biological Change Applied to Microarray Data

Rasnick D

Biotechnology Progress, Vol.25, No.5, 1275-1288, 2009

In contrast to conventional data mining, which searches for specific subsets of genes (extensive variables) to correlate with specific phenotypes, DATE analysis correlates intensive state variables calculated from the same datasets. At the heart of DATE analysis are two biological equations of state that do not depend on genetic pathways. This independence from pathways distinguishes DATE analysis from other bioinformatics approaches. The dimensionless state variable F quantifies the relative overall cellular activity of test cells compared to well-chosen reference cells. The variable pi(i) is the fold-change in the expression of the ith gene of test cells relative to reference. It is the fraction phi of the genome undergoing differential expression, not the magnitude pi, that controls biological change. The state variable phi is equivalent to the control strength of metabolic control analysis. For tractability, DATE analysis assumes a linear system of enzyme-connected networks and exploits the small average contribution of each cellular component. This approach was validated by reproducible values of the state variables F, RNA index, and phi calculated from random subsets of transcript microarray data. Using published microarray data, F, RNA index, and phi were correlated with: (1) the blood-feeding cycle of the malaria parasite, (2) embryonic development of the fruit fly, (3) temperature adaptation of killifish, (4) exponential growth of cultured S. pneumoniae, and (5) human cancers. DATE analysis was also applied to array comparative genomic hybridization (aCGH) data from the great apes. A good example of the power of DATE analysis is its application to genomically unstable cancers, which have been refractory to data mining strategies.

© 2009 American Institute of Chemical Engineers. Biotechnol. Prog., 25: 1275-1288, 2009
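The abstract defines pi(i) as a per-gene fold-change and phi as the fraction of the genome undergoing differential expression, and reports that these values are reproducible across random subsets of the microarray data. The following is a minimal Python sketch of those two ideas only. It assumes positive expression values and counts a gene as differentially expressed when its fold-change is at least twofold in either direction; the twofold cutoff, the function names, and the toy data are hypothetical illustrations, not the paper's criterion or method. It does not compute F or the RNA index, whose definitions are not given in the abstract.

```python
import random
from statistics import mean, pstdev


def fold_changes(test, reference):
    """pi(i): expression of gene i in test cells divided by its expression in reference cells."""
    if any(r <= 0 for r in reference):
        raise ValueError("reference expression values must be positive")
    return [t / r for t, r in zip(test, reference)]


def fraction_changed(pis, threshold=2.0):
    """phi: fraction of genes whose fold-change is >= threshold or <= 1/threshold.

    ASSUMPTION: the twofold default cutoff is illustrative, not taken from the paper.
    """
    changed = sum(1 for p in pis if p >= threshold or p <= 1.0 / threshold)
    return changed / len(pis)


def subset_phis(pis, subset_size, n_subsets, seed=0):
    """Recompute phi on random gene subsets (sampled without replacement) to gauge reproducibility."""
    rng = random.Random(seed)
    return [fraction_changed(rng.sample(pis, subset_size)) for _ in range(n_subsets)]


if __name__ == "__main__":
    # Hypothetical toy data for 8 genes (arbitrary expression units).
    reference = [100, 200, 50, 80, 120, 60, 300, 90]
    test = [210, 190, 20, 85, 130, 15, 310, 95]

    pis = fold_changes(test, reference)
    print("phi on all genes:", fraction_changed(pis))  # 3 of 8 genes pass the cutoff -> 0.375

    phis = subset_phis(pis, subset_size=6, n_subsets=100)
    print("phi over random subsets: mean", mean(phis), "sd", pstdev(phis))
```

With real arrays containing thousands of genes, a small spread of phi across random subsets would be the kind of reproducibility the abstract describes; with this 8-gene toy set the spread is naturally large.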

Keywords: metabolic; bioinformatics; state equations; control analysis; data mining