Biochemical and Biophysical Research Communications, Vol. 302, No. 2, 296–301, 2003
Non-randomness in Shine-Dalgarno regions: links to gene characteristics
A probabilistic approach was used to identify the most non-random positions in the Shine-Dalgarno region by parsing the genomes of four species: Escherichia coli, Bacillus subtilis, the AT-rich Clostridium perfringens, and the GC-rich Streptomyces coelicolor. Compositional non-randomness shows a clear peak centered about 9-11 nucleotides upstream of the start codon. In all four species, guanine was the most abundant nucleotide at this peak, flanked by guanines at the immediately adjacent positions and by adenines farther away (cytosines in the case of S. coelicolor). Contingency-table analysis showed that the nucleotides in the Shine-Dalgarno region are strongly associated with the choice of start codon. We also show that gene characteristics such as length, aromaticity, and lipophilicity are related to the nucleotide at this peak position upstream of the start codon. © 2003 Elsevier Science (USA). All rights reserved.
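The abstract does not specify the probabilistic measure or the contingency-table test used. As a hedged illustration only, the sketch below substitutes two standard techniques. The first is per-position relative entropy (Kullback-Leibler divergence, in bits) of start-codon-aligned upstream sequences against their pooled background composition, one common way to score positional compositional non-randomness. The second is a Pearson chi-square statistic on a nucleotide-by-start-codon contingency table. The function names (`positional_relative_entropy`, `chi_square`, `nucleotide_by_start_codon`), the pseudocounts, and the pooled-background model are all assumptions for illustration, not the authors' method.

```python
import math
from collections import Counter

NUCS = "ACGT"


def background_freqs(seqs, pseudocount=1.0):
    """Pooled nucleotide frequencies over all sequences (assumed background model)."""
    counts = Counter(c for s in seqs for c in s if c in NUCS)
    total = sum(counts.values()) + 4 * pseudocount
    return {n: (counts[n] + pseudocount) / total for n in NUCS}


def positional_relative_entropy(seqs, pseudocount=0.5):
    """Relative entropy (bits) of each aligned position versus the pooled background.

    Assumes equal-length upstream sequences aligned so that the last character
    is the nucleotide immediately before the start codon. Higher values mean
    the position deviates more from background composition.
    """
    bg = background_freqs(seqs)
    scores = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs if s[i] in NUCS)
        total = sum(counts.values()) + 4 * pseudocount
        d = 0.0
        for n in NUCS:
            p = (counts[n] + pseudocount) / total
            d += p * math.log2(p / bg[n])
        scores.append(d)
    return scores


def nucleotide_by_start_codon(seqs, starts, pos):
    """Contingency table: rows are A, C, G, T at index pos; columns are start codons."""
    codons = sorted(set(starts))
    table = [[0] * len(codons) for _ in NUCS]
    for s, codon in zip(seqs, starts):
        if s[pos] in NUCS:
            table[NUCS.index(s[pos])][codons.index(codon)] += 1
    return table, codons


def chi_square(table):
    """Pearson chi-square statistic; cells with zero expected count are skipped."""
    row_sums = [sum(r) for r in table]
    col_sums = [sum(c) for c in zip(*table)]
    n = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            if expected > 0:
                stat += (obs - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    # Toy upstream sequences (hypothetical data, not from the study).
    upstream = ["AGCT", "CGTA", "GGAC", "TGCA"]
    print(positional_relative_entropy(upstream))
    table, codons = nucleotide_by_start_codon(
        ["AG", "GG", "AG", "GG"], ["ATG", "GTG", "ATG", "GTG"], 0
    )
    print(codons, table, chi_square(table))
```

With real data, each sequence would be a fixed-length window ending just before the start codon, so a peak 9-11 nucleotides upstream would appear near index `len(window) - 10`. For a p-value one would likely drop all-zero rows and use a library routine such as `scipy.stats.chi2_contingency` rather than the bare statistic computed here.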