Biochemical and Biophysical Research Communications, Vol.347, No.1, 141-144, 2006
Automatic transcription factor classifier based on functional domain composition
To understand transcriptional regulatory mechanisms, it is indispensable to identify transcription factors (TFs) across the whole genome and to classify them into distinct classes. We have developed a new computational approach that distinguishes TFs from non-TFs and further classifies TFs into four classes, based on protein functional domain composition [K.C. Chou, Y.D. Cai, Using functional domain composition and support vector machines for prediction of protein subcellular location, J. Biol. Chem. 277 (2002) 45765-45769]. We trained and tested the method on a non-redundant dataset consisting of 74 transcription factors collected from TRANSFAC v7.0 [V. Matys, O.V. Kel-Margoulis, E. Fricke, I. Liebich, S. Land, A. Barre-Dirrie, I. Reuter, D. Chekmenev, M. Krull, K. Hornischer, N. Voss, P. Stegmaier, B. Lewicki-Potapov, H. Saxel, A.E. Kel, E. Wingender, TRANSFAC(R) and its module TRANSCompel(R): transcriptional gene regulation in eukaryotes, Nucleic Acids Res. 34 (2006) D108-D110] and 1558 non-transcription factors from UniProtKB/Swiss-Prot Release 49.3 of 21-Mar-2006. In jackknife cross-validation tests, the overall success rates reached 98.4% for TF/non-TF identification and 97.2% for classification into the four TF classes: basic domains, zinc-coordinating DNA-binding domains, helix-turn-helix, and beta-scaffold factors. (c) 2006 Elsevier Inc. All rights reserved.
Keywords: transcription factors; functional domain composition; intimate sorting classifier; jackknife cross-validation test
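The abstract names two ingredients: a functional domain composition encoding of each protein and a jackknife (leave-one-out) evaluation of a sorting classifier. The sketch below shows how these pieces could fit together under stated assumptions. Each protein is assumed to become a binary vector over a fixed domain vocabulary (1 if the domain is present, else 0). As a stand-in for the paper's intimate sorting classifier, whose exact similarity measure is not given here, it uses a plain nearest-neighbour rule with cosine similarity. The domain names and the toy proteins are hypothetical illustrations, not data from the study.

```python
import math

def domain_composition(domains, vocabulary):
    """Binary functional-domain vector (assumed encoding): 1.0 if present, else 0.0."""
    present = set(domains)
    return [1.0 if d in present else 0.0 for d in vocabulary]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has no domains."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)

def nearest_neighbour(query, train):
    """Stand-in for intimate sorting: label of the most similar training vector.
    Ties are broken in favour of the earliest training sample."""
    best_label, best_score = None, -1.0
    for vec, label in train:
        score = cosine(query, vec)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def jackknife_accuracy(samples):
    """Leave each sample out in turn, predict it from the rest, report the success rate."""
    correct = 0
    for i, (vec, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        if nearest_neighbour(vec, train) == label:
            correct += 1
    return correct / len(samples)

# Hypothetical domain vocabulary and toy proteins (illustration only).
vocab = ["bZIP_1", "zf-C2H2", "Homeodomain", "Pkinase", "ABC_tran"]
toy = [
    ("TF", ["bZIP_1"]),
    ("TF", ["bZIP_1", "zf-C2H2"]),
    ("TF", ["zf-C2H2"]),
    ("TF", ["Homeodomain", "zf-C2H2"]),
    ("nonTF", ["Pkinase"]),
    ("nonTF", ["Pkinase", "ABC_tran"]),
    ("nonTF", ["ABC_tran"]),
]
samples = [(domain_composition(doms, vocab), label) for label, doms in toy]

accuracy = jackknife_accuracy(samples)
query = domain_composition(["bZIP_1", "Homeodomain"], vocab)
prediction = nearest_neighbour(query, samples)
print(accuracy, prediction)  # 1.0 TF
```

The jackknife loop rebuilds the training set for every held-out protein. This is why it is usually regarded as a stricter test than a single train/test split: every sample is predicted exactly once by a model that never saw it.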