Journal of Bioscience and Bioengineering, Vol.106, No.2, 154-161, 2008
A Motif Detection and Classification Method for Peptide Sequences Using Genetic Programming
An exploration of common rules (property motifs) in amino acid sequences has been required for the design of novel sequences and elucidation of the interactions between molecules controlled by the structural or physical environment. In the present study, we developed a new method to search property motifs that are common in peptide sequence data. Our method comprises the following two characteristics: (i) the automatic determination of the position and length of common property motifs by calculating the physicochemical similarity of amino acids, and (ii) the quick and effective exploration of motif candidates that discriminates the positives and negatives by the introduction of genetic programming (GP). Our method was evaluated by two types of model data sets. First, the intentionally buried property motifs were searched in the artificially derived peptide data containing intentionally buried property motifs. As a result, the expected property motifs were correctly extracted by our algorithm. Second, the peptide data that interact with MHC class II molecules were analyzed as one of the models of biologically active peptides with buried motifs in various lengths. Twofold MHC class II binding peptides were identified with the rule using our method, compared to the existing scoring matrix method. In conclusion, our GP based motif searching approach enabled to obtain knowledge of functional aspects of the peptides without any prior knowledge.