Korean Journal of Chemical Engineering, Vol.22, No.3, 345-352, May, 2005
A Gene Clustering Method with Masking Cross-matching Fragments Using Modified Suffix Tree Clustering Method
E-mail:
Multiple sequence alignment is a method for comparing two or more DNA or protein sequences. Most multiple sequence alignment methods rely on pairwise alignment and Smith-Waterman algorithm [Needleman and Wunsch, 1970; Smith and Waterman, 1981] to generate an alignment hierarchy. Therefore, as the number of sequences increases, the runtime increases exponentially. To resolve this problem, this paper presents a multiple sequence alignment method using a parallel processing suffix tree algorithm to search for common subsequences at one time without pairwise alignment. The cross-matched subsequences among the searched common subsequences may be generated and those cause inexact-matching. So the procedure of masking cross-matching pairs was suggested in this study. The proposed method, improved STC (Suffix Tree Clustering), is summarized as follows: (1) construction of suffix tree; (2) search and overlap of common subsequences; (3) grouping of subsequence pairs; (4) masking of cross-matching pairs; and (5) clustering of gene sequences. The new method was successfully evaluated with 23 genes in Mus musculus and 22 genes in three species, clustering nine and eight clusters, respectively.
- Chen JY, Carlis JV, Inf. Syst., 28, 287 (2003)
- Choi SH, Manousiouthakis V, Korean J. Chem. Eng., 19(2), 227 (2002)
- Delcher AL, Kasif S, Fleischmann RD, Peterson J, White O, Salzberg SL, Nucleic Acids Res., 27(11), 2369 (1999)
- Delcher AL, Phillippy A, Carlton J, Salzberg SL, Nucleic Acids Res., 30(11), 2478 (2002)
- Gusfield D, Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Cambridge University Press, Cambridge, London (1997)
- Higgins DG, Thompson JD, Gibson TJ, Methods Enzymol., 266, 383 (1996)
- Higgins DG, Sharp PM, Gene, 73, 237 (1988)
- Hon WK, Sadakane K, "Space-Economical Algorithms for Finding Maximal Unique Matches", Proceedings of the 13th Annual Symposium on Combinatorial Pattern Matching, 144 (2002)
- Kalyanaraman A, Aluru S, Kothari S, Parallel EST Clustering, HICOMB 2002, 185 (2002)
- Kim DK, Lee KS, Yang DR, Korean J. Chem. Eng., 21(5), 942 (2004)
- Lee JM, Lee JH, Korean J. Chem. Eng., 21(2), 338 (2004)
- McCreight E, J. ACM, 23, 262 (1976)
- Miller RT, Christoffels AG, Gopalakrishnan C, Burke J, Ptitsyn AA, Broveak TR, Hide WA, Genome Res., 9, 1143 (1999)
- Morgenstem B, Frech K, Dress A, Wemer T, Bioinformatics, 14, 290 (1998)
- Mount DW, Bioinformatics : Sequence and Genome Analysis, Cold Spring Harbor Laboratory Press (2001)
- Needleman SB, Wunsch CD, J. Mol. Biol., 48, 443 (1970)
- Notredame C, Higgins DG, Nucleic Acids Res., 24, 1515 (1996)
- Ostell JM, Wheelan SJ, Kans JA, Methods Biochem. Anal., 43, 19 (2001)
- Pearson WR, Miller W, Methods Enzymol., 210, 575 (1992)
- Phillips A, Janies D, Wheeler W, Molecular Phylogenetics and Evolution, 16, 317 (2000)
- Randal LS, Christiansen T, Learning Perl, Second Edition, O'Reilly (1997)
- Salzberg SL, Searls DB, Kasif S, Trends Guide to Bioinformatics, Elsevier Science (1998)
- Shin PK, Koo JH, Lee WJ, Seo JH, Korean J. Chem. Eng., 13(1), 82 (1996)
- Smith TF, Waterman MS, J. Mol. Biol., 197, 723 (1981)
- Thompson JD, Higgins DG, Gibson TJ, Nucleic Acids Res., 22, 4673 (1994)
- Tisdall JD, Beginning Perl for Bioinformatics, O'REILLY (2001)
- Ukkonen E, Algorithmica, 14, 249 (1995)
- Volfovsky N, Haas BJ, Salzberg SL, Genome Biology, 2, 1 (2001)
- Weiner P, "Linear Pattern Matching Algorithms", In Proc. of the 14th IEEE Annual Symposium on Switching and Automata Theory, 1 (1973)
- Zamir O, Etzioni O, Madani O, Karp RM, "Fast and Intuitive Clustering of Web Documents", In Proc. of the 3nd International Conference on Knowledge Discovery and Data Mining, 287 (1997)
- Zamir O , Etzioni O, "Web Document Clustering: A Feasibility Demonstration", In Proc. of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 46 (1998)