Selecting maximally informative genes

Androulakis IP

Computers & Chemical Engineering, Vol.29, No.3, 535-546, 2005

DOI: 10.1016/j.compchemeng.2004.08.037

Microarray experiments are emerging as one of the main driving forces in modern biology. By allowing the simultaneous monitoring of the expression of the entire genome for a given organism, array experiments provide tremendous insight into the fundamental biological processes that translate genetic information. One of the major challenges is to identify computationally efficient and biologically meaningful analysis approaches to extract the most informative and unbiased components of the microarray data. This process is complicated by the fact that a number of uncertainties are associated with array experiments. Therefore, the assumption of the existence of a unique computational descriptive model needs to be challenged. In this paper, we introduce a framework that integrates machine learning and optimization techniques for the selection of maximally informative genes in microarray expression experiments. The fundamental premise of the approach is that maximally informative genes are the ones that lead to the least complex descriptive and predictive models. We propose a methodology, based on decision trees, which identifies ensembles of groups of maximally informative genes. We raise a number of computational issues that need to be comprehensively addressed and illustrate the approach by analyzing recently published microarray experimental data. (c) 2004 Elsevier Ltd. All rights reserved.

Keywords: maximally informative genes; microarray experiments; genetic information; machine learning; optimization
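
The abstract's premise, that informative genes are those yielding the least complex decision-tree models and that several alternative gene groups may exist, can be illustrated with a small sketch. This is not the paper's algorithm, which the abstract does not specify. It assumes a simple greedy scheme: grow an entropy-based tree, record the genes it uses, remove them, and repeat. The function names, the toy expression matrix, and the `max_groups` parameter are all hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(X, y, genes):
    """Return (gain, gene, threshold) for the best single-gene threshold split."""
    base, best = entropy(y), (0.0, None, None)
    for g in genes:
        values = sorted(set(row[g] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between adjacent expression levels
            left = [lab for row, lab in zip(X, y) if row[g] <= t]
            right = [lab for row, lab in zip(X, y) if row[g] > t]
            gain = base - (len(left) * entropy(left)
                           + len(right) * entropy(right)) / len(y)
            if gain > best[0] + 1e-12:  # ties keep the earlier gene
                best = (gain, g, t)
    return best

def tree_genes(X, y, genes):
    """Greedily grow a tree; return the genes it uses, or None if it fails."""
    if len(set(y)) <= 1:
        return set()  # pure node, no further split needed
    _, g, t = best_split(X, y, genes)
    if g is None:
        return None  # no split improves purity
    used = {g}
    for side in (True, False):
        idx = [i for i, row in enumerate(X) if (row[g] <= t) == side]
        sub = tree_genes([X[i] for i in idx], [y[i] for i in idx], genes)
        if sub is None:
            return None
        used |= sub
    return used

def gene_group_ensemble(X, y, max_groups=5):
    """Collect disjoint gene groups, each sufficient to separate the classes."""
    remaining = list(range(len(X[0])))
    groups = []
    while remaining and len(groups) < max_groups:
        used = tree_genes(X, y, remaining)
        if not used:  # tree failed, or the labels were already pure
            break
        groups.append(sorted(used))
        remaining = [g for g in remaining if g not in used]
    return groups

# Toy data (invented): rows are samples, columns are genes 0..3.
X = [
    [1.0, 5.0, 0.2, 3.0],
    [1.2, 5.5, 0.9, 3.1],
    [0.9, 6.0, 0.4, 2.9],
    [3.0, 1.0, 0.3, 3.0],
    [3.2, 1.5, 0.8, 3.2],
    [2.9, 0.5, 0.5, 2.8],
]
y = ["A", "A", "A", "B", "B", "B"]

groups = gene_group_ensemble(X, y, max_groups=2)
print(groups)  # [[0], [1]]
```

In this toy example, genes 0 and 1 each separate the two classes with a single split, so each forms its own one-gene group. Removing the genes used before refitting is only one possible way to generate alternative groups; an optimization-based formulation, as the abstract suggests, could instead search over groups directly.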