Synergistic use of promoter prediction algorithms: a choice of small training dataset?
Oppon, Ekow CruickShank
Promoter detection, especially in prokaryotes, has always been an uphill task and may remain so because of the many varieties of sigma factors employed by various organisms in transcription. The situation is made more complex by the fact that any seemingly unimportant sequence segment may be turned into a promoter sequence by an activator or repressor (if the actual promoter sequence is made unavailable). Nevertheless, a computational approach to promoter detection is needed for a number of reasons. The most obvious is the long and tedious process involved in elucidating promoters in ‘wet’ laboratories, not to mention the financial cost of such endeavors. Promoter detection/prediction for an organism with few characterized promoters (M. tuberculosis), as envisaged at the beginning of this work, was never going to be easy. Even for the few known mycobacterial promoters, most of the sigma factors associated with their transcription were not known. Had this promoter-sigma factor information been available, the research would have focused on categorizing the promoters according to sigma factor and training the methods on the respective categories, assuming there were enough training data for each category. Most promoter detection/prediction studies have been carried out on E. coli because of the availability of a number of experimentally characterized promoters (approximately 310). Even then, no researcher to date has extended the research to the entire E. coli genome.