화학공학소재연구정보센터
Korean Journal of Chemical Engineering, Vol.22, No.3, 345-352, May, 2005
A Gene Clustering Method with Masking Cross-matching Fragments Using Modified Suffix Tree Clustering Method
E-mail:
Multiple sequence alignment is a method for comparing two or more DNA or protein sequences. Most multiple sequence alignment methods rely on pairwise alignment and Smith-Waterman algorithm [Needleman and Wunsch, 1970; Smith and Waterman, 1981] to generate an alignment hierarchy. Therefore, as the number of sequences increases, the runtime increases exponentially. To resolve this problem, this paper presents a multiple sequence alignment method using a parallel processing suffix tree algorithm to search for common subsequences at one time without pairwise alignment. The cross-matched subsequences among the searched common subsequences may be generated and those cause inexact-matching. So the procedure of masking cross-matching pairs was suggested in this study. The proposed method, improved STC (Suffix Tree Clustering), is summarized as follows: (1) construction of suffix tree; (2) search and overlap of common subsequences; (3) grouping of subsequence pairs; (4) masking of cross-matching pairs; and (5) clustering of gene sequences. The new method was successfully evaluated with 23 genes in Mus musculus and 22 genes in three species, clustering nine and eight clusters, respectively.
  1. Chen JY, Carlis JV, Inf. Syst., 28, 287 (2003) 
  2. Choi SH, Manousiouthakis V, Korean J. Chem. Eng., 19(2), 227 (2002)
  3. Delcher AL, Kasif S, Fleischmann RD, Peterson J, White O, Salzberg SL, Nucleic Acids Res., 27(11), 2369 (1999) 
  4. Delcher AL, Phillippy A, Carlton J, Salzberg SL, Nucleic Acids Res., 30(11), 2478 (2002) 
  5. Gusfield D, Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Cambridge University Press, Cambridge, London (1997)
  6. Higgins DG, Thompson JD, Gibson TJ, Methods Enzymol., 266, 383 (1996)
  7. Higgins DG, Sharp PM, Gene, 73, 237 (1988) 
  8. Hon WK, Sadakane K, "Space-Economical Algorithms for Finding Maximal Unique Matches", Proceedings of the 13th Annual Symposium on Combinatorial Pattern Matching, 144 (2002)
  9. Kalyanaraman A, Aluru S, Kothari S, Parallel EST Clustering, HICOMB 2002, 185 (2002)
  10. Kim DK, Lee KS, Yang DR, Korean J. Chem. Eng., 21(5), 942 (2004)
  11. Lee JM, Lee JH, Korean J. Chem. Eng., 21(2), 338 (2004)
  12. McCreight E, J. ACM, 23, 262 (1976) 
  13. Miller RT, Christoffels AG, Gopalakrishnan C, Burke J, Ptitsyn AA, Broveak TR, Hide WA, Genome Res., 9, 1143 (1999) 
  14. Morgenstem B, Frech K, Dress A, Wemer T, Bioinformatics, 14, 290 (1998) 
  15. Mount DW, Bioinformatics : Sequence and Genome Analysis, Cold Spring Harbor Laboratory Press (2001)
  16. Needleman SB, Wunsch CD, J. Mol. Biol., 48, 443 (1970) 
  17. Notredame C, Higgins DG, Nucleic Acids Res., 24, 1515 (1996) 
  18. Ostell JM, Wheelan SJ, Kans JA, Methods Biochem. Anal., 43, 19 (2001)
  19. Pearson WR, Miller W, Methods Enzymol., 210, 575 (1992)
  20. Phillips A, Janies D, Wheeler W, Molecular Phylogenetics and Evolution, 16, 317 (2000) 
  21. Randal LS, Christiansen T, Learning Perl, Second Edition, O'Reilly (1997)
  22. Salzberg SL, Searls DB, Kasif S, Trends Guide to Bioinformatics, Elsevier Science (1998)
  23. Shin PK, Koo JH, Lee WJ, Seo JH, Korean J. Chem. Eng., 13(1), 82 (1996)
  24. Smith TF, Waterman MS, J. Mol. Biol., 197, 723 (1981)
  25. Thompson JD, Higgins DG, Gibson TJ, Nucleic Acids Res., 22, 4673 (1994)
  26. Tisdall JD, Beginning Perl for Bioinformatics, O'REILLY (2001)
  27. Ukkonen E, Algorithmica, 14, 249 (1995) 
  28. Volfovsky N, Haas BJ, Salzberg SL, Genome Biology, 2, 1 (2001)
  29. Weiner P, "Linear Pattern Matching Algorithms", In Proc. of the 14th IEEE Annual Symposium on Switching and Automata Theory, 1 (1973)
  30. Zamir O, Etzioni O, Madani O, Karp RM, "Fast and Intuitive Clustering of Web Documents", In Proc. of the 3nd International Conference on Knowledge Discovery and Data Mining, 287 (1997)
  31. Zamir O , Etzioni O, "Web Document Clustering: A Feasibility Demonstration", In Proc. of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 46 (1998)