A bioinformatics approach to the study of the transcriptional regulation of AMPA glutamate receptors (GRIAs) and of genes whose expression is co-regulated with GRIAs
Abstract
It was postulated that each gene has three main sets of transcriptional elements: one that is gene-specific, one that is family-specific, and a third that is tissue-specific. The starting hypothesis for this project was: "Each family of genes has a distinct set of transcriptional elements that is unique to this family". The primary aim of this project was therefore to identify the family-specific set of transcriptional elements within the AMPA receptor gene family. The question, then, is how to measure or identify this uniqueness within the promoters of this gene family. The answer appeared to lie in assessing the promoters of this family against a comprehensive background set of promoter sequences and, in the process, identifying transcriptional elements that were present in the AMPA receptor gene promoters but comparatively uncommon in the general population of gene promoters.

To achieve this aim, a comprehensive dataset of promoter sequences was essential. Ample data are freely available on the web; however, they are often not in the form required. Another recurring problem is the lack of consensus within the research community on a standard annotation. For example, a gene may be given two or three different names by different laboratories that have independently cloned it. This, in turn, hinders data collection. At the start of this project, two relevant resources existed: a curated database of experimentally verified eukaryotic promoter sequences, the Eukaryotic Promoter Database (EPD), and a program called Promoter Extraction from GenBank (PEG), which, as its name implies, extracts promoter sequences from GenBank (Cavin Perier et al., 1998; Zhang & Zhang, 2001; Praz et al., 2002; Schmid et al., 2004). However, both resources had limitations. EPD contained relatively few curated promoter sequences, and those sequences were short. For PEG, the main limitation was that extraction from GenBank yielded sequences of variable length.
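One common way to avoid variable-length promoter sequences is to cut every promoter as a fixed window around an annotated transcription start site (TSS). The sketch below is a hypothetical illustration, not the procedure used in this project: the function name `promoter_window`, the use of a 0-based TSS coordinate on the forward strand, and the default window of 1000 bp upstream and 100 bp downstream are all assumptions.

```python
# Complement table used to reverse-complement minus-strand windows.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def promoter_window(chrom_seq, tss, strand, upstream=1000, downstream=100):
    """Cut a fixed-length promoter window around a transcription start site.

    `tss` is a 0-based position on the forward strand of `chrom_seq`.
    The result is always `upstream + downstream` bases long, written 5'->3'
    on the gene's own strand, with the TSS base at index `upstream`.
    Returns None if the window would run past either end of the sequence.
    (Hypothetical helper; the default window sizes are assumptions.)
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # On the minus strand, "upstream" lies at higher forward coordinates.
        start, end = tss - downstream + 1, tss + upstream + 1
    else:
        raise ValueError("strand must be '+' or '-'")

    if start < 0 or end > len(chrom_seq):
        return None  # too close to a sequence end for a full-length window

    window = chrom_seq[start:end]
    if strand == "-":
        window = window.translate(_COMPLEMENT)[::-1]
    return window


if __name__ == "__main__":
    chrom = "AAAACCCCGGGGTTTT"
    print(promoter_window(chrom, 8, "+", upstream=4, downstream=2))  # CCCCGG
    print(promoter_window(chrom, 7, "-", upstream=4, downstream=2))  # CCCCGG
```

Returning None rather than a truncated sequence keeps every promoter in the dataset the same length, at the cost of dropping genes whose TSS lies too close to the end of the available sequence.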
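The idea of finding transcriptional elements present in the AMPA receptor gene promoters but uncommon in the general population of promoters can be illustrated as a motif over-representation test. The sketch below assumes the promoters are already available as plain strings, represents a transcriptional element as a regular expression scanned on both strands, and scores it with a one-sided hypergeometric test. The function names, the regex representation, and the choice of test are illustrative assumptions, not the statistics used in this project; it also assumes the background set does not contain the target promoters.

```python
import math
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def has_site(seq, pattern):
    """True if regex `pattern` matches either strand of `seq`."""
    fwd = seq.upper()
    rev = fwd.translate(_COMPLEMENT)[::-1]  # reverse complement
    return re.search(pattern, fwd) is not None or re.search(pattern, rev) is not None


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(M items, n with the site, N drawn)."""
    tail = sum(math.comb(n, i) * math.comb(M - n, N - i)
               for i in range(k, min(n, N) + 1))
    return tail / math.comb(M, N)


def motif_enrichment(target, background, pattern):
    """Compare site occurrence in target promoters against a background set.

    Assumes `background` does not contain the target promoters.
    """
    k = sum(has_site(s, pattern) for s in target)       # target promoters with a site
    b = sum(has_site(s, pattern) for s in background)   # background promoters with a site
    M = len(target) + len(background)                   # all promoters considered
    target_frac = k / len(target)
    bg_frac = b / len(background)
    fold = target_frac / bg_frac if bg_frac > 0 else math.inf
    return {
        "target_hits": k,
        "background_hits": b,
        "fold": fold,
        "p_value": hypergeom_sf(k, M, k + b, len(target)),
    }


if __name__ == "__main__":
    # Toy sequences; the motif is only an example pattern.
    target = ["TTTGACGTCAAA", "GGTGACGTCAGG", "AAAAAAAA"]
    background = ["AAAAAAAAAA"] * 6 + ["CCTGACGTCACC"]
    print(motif_enrichment(target, background, "TGACGTCA"))
```

When many candidate elements are tested at once, the resulting p-values would also need a multiple-testing correction; that step is omitted from this sketch.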