Virulence factor prediction in Streptococcus pyogenes using classification and clustering based on microarray data

Lopez-Kleine L; Torres-Aviles F; Tejedor FH; Gordillo LA

Applied Microbiology and Biotechnology, Vol.93, No.5, 2091-2098, 2012

DOI: 10.1007/s00253-012-3917-3

Biologically interesting information, such as gene expression data from microarrays, can be extracted from publicly available genomic data. To narrow down the many possible wet-lab experiments, global high-throughput data and existing knowledge should first be used to infer biological knowledge and generate biological hypotheses. Here, based on microarray data, we propose using clustering and classification methods that have become very popular and are implemented in freely available software to predict whether proteins encoded by genes of the pathogen Streptococcus pyogenes participate in virulence mechanisms. Confidence in the predictions is based on the classification errors for known genes and on repeated prediction by more than three methods. Special emphasis is placed on the nonlinear kernel classification methods used. We propose a list of interesting candidates that could be virulence factors or that could participate in the virulence process of S. pyogenes. Biological validation should start from this list of candidates, as they behave similarly to known virulence factors.
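
The consensus criterion mentioned in the abstract (a gene is retained when more than three methods predict it) can be illustrated with a minimal sketch. This is not the authors' code: the function name `consensus_candidates`, the method labels, and the gene identifiers are hypothetical placeholders, and the toy data are invented purely to show the counting step.

```python
from collections import Counter


def consensus_candidates(predictions, min_methods=4):
    """Return genes flagged as virulence-related by at least `min_methods` methods.

    predictions: dict mapping a method label to the set of gene ids that
    the method predicts as virulence-related.
    min_methods=4 encodes "more than three methods" (an assumption about
    how the threshold in the abstract would be applied).
    """
    votes = Counter()
    for genes in predictions.values():
        # set() guards against a method listing the same gene twice
        votes.update(set(genes))
    return sorted(gene for gene, n in votes.items() if n >= min_methods)


# Invented toy input: method labels and gene ids are placeholders only.
toy_predictions = {
    "svm_rbf": {"geneA", "geneB", "geneC"},
    "svm_poly": {"geneA", "geneB"},
    "knn": {"geneA", "geneC"},
    "kmeans": {"geneA", "geneB"},
    "hclust": {"geneB"},
}

candidates = consensus_candidates(toy_predictions)
print(candidates)
```

In this toy input, `geneA` and `geneB` are each flagged by four methods and are kept, while `geneC` is flagged by only two and is dropped. In practice the per-method sets would come from classifiers and clustering runs on the expression data.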

Keywords: Classification; Microarray data; Protein function; Statistical genomics; Support vector machines; Virulence factor