A bioinformatics approach to the study of the transcriptional regulation of AMPA glutamate receptors (GRIAs) and genes whose expression are co-regulated with GRIAs
Abstract
It was postulated that each gene has three main sets of transcriptional elements: one that is gene-specific, one that is family-specific, and a third that is tissue-specific. The starting hypothesis for this project was: “Each family of genes has a distinct set of transcriptional elements that is unique to this family”. The primary aim of this project was therefore to identify the family-specific set of transcriptional elements within the AMPA receptor gene family. The question then is how one measures or identifies this uniqueness within the promoters of this family of genes. The answer seemed to lie in assessing the promoters of this family against a background of a comprehensive set of promoter sequences and, in the process, finding the transcriptional elements that were present in the AMPA receptor gene promoters but were less common in the general population of gene promoters.
To achieve the primary aim of this project, a comprehensive dataset of promoter sequences was essential. Ample data are freely available through the web. However, they are often not available in the form required. Another problem that one constantly encounters is the lack of consensus within the research community on a standard annotation. For example, a gene can sometimes be given two or three different names by different laboratories that have successfully cloned the same gene. This, in turn, hinders the data collection process. At the start of this project, there was an existing curated database of experimentally verified eukaryotic promoter sequences called the Eukaryotic Promoter Database (EPD) and a software tool called Promoter Extraction from GenBank (PEG) which, as its name implies, extracts promoter sequences available through GenBank (Cavin Périer et al., 1998; Zhang & Zhang, 2001; Praz et al., 2002; Schmid et al., 2004). However, both resources had limitations. For EPD, the number of curated promoter sequences available was low and the promoter sequences were short. For PEG, the main limitation was that extraction from GenBank yielded sequences of variable lengths.
Therefore, the 5’-end Information Extraction (FIE) system was developed for the express purpose of collecting promoter sequences without the limitations of PEG. This software aligns multiple mRNA/cDNA sequences that are representative of a gene to the human genomic sequence to determine the transcription start site (TSS) of the gene and, with this information, extracts the promoter sequence for the gene from the available human genomic sequence. This was the first promoter extraction software to work on this principle (Chong et al., 2002). This method was later supported by experimental work carried out by Coleman and colleagues (2002). Using the FIE2 software (Chong et al., 2003), some 10,000-odd human promoter sequences were extracted, each spanning from 1500 bp upstream to 1000 bp downstream of the 5’-most TSS.
Following the collection of the human promoter sequences, the approach developed by Bajic et al. (2004) was applied to study the promoters of the AMPA receptor genes. This approach relies on both the MATCH program, which maps putative transcription factor binding sites (TFBSs) to the promoter sequences, and software developed by Bajic et al. (2004) that calculates the density of each TFBS or composite element.
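The extraction window described above (1500 bp upstream to 1000 bp downstream of the 5’-most TSS) can be illustrated with a minimal sketch. This is not the FIE or FIE2 code: the function names, the 0-based half-open coordinate convention, the choice to count the TSS base as the first downstream base, and the clamping at the start of the sequence are all assumptions made for illustration. The candidate TSS positions are assumed to come from the 5’ ends of the aligned mRNA/cDNA sequences.

```python
# Hypothetical sketch of a strand-aware promoter window around the 5'-most TSS.
# Coordinates are 0-based; the returned window is half-open: [start, end).

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def promoter_window(tss_candidates, strand, upstream=1500, downstream=1000):
    """Return (start, end) of the promoter window in genomic coordinates.

    tss_candidates: genomic positions of the 5' ends of aligned transcripts.
    strand: "+" or "-". On the minus strand, "5'-most" means the largest
    genomic coordinate, and upstream lies at higher coordinates.
    """
    if strand == "+":
        tss = min(tss_candidates)
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        tss = max(tss_candidates)
        start, end = tss - downstream + 1, tss + upstream + 1
    else:
        raise ValueError("strand must be '+' or '-'")
    # Assumption: clamp at the start of the sequence rather than fail.
    return max(start, 0), end


def extract_promoter(chrom_seq, tss_candidates, strand,
                     upstream=1500, downstream=1000):
    """Slice the window and return it in transcript (5'->3') orientation."""
    start, end = promoter_window(tss_candidates, strand, upstream, downstream)
    seq = chrom_seq[start:end]
    return reverse_complement(seq) if strand == "-" else seq
```

With the default arguments, every extracted sequence has the same length (2500 bp under these conventions), unless the window runs off the start of the sequence.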
Having calculated the densities of the TFBSs and composite elements for both the target promoters (in this case, the AMPA receptor gene promoters) and the background promoters (the 10,000-odd human promoters), the software then calculates the degree of over-representation of each TFBS and composite element in the target promoters (measured against the background promoters) and ranks the “singles”, “pairs” and “triplets” in order of their degree of over-representation. Using this method, I identified the top 3 ranked “single”, “pair” and “triplet” transcriptional elements found commonly within the AMPA receptor promoters. In addition, a conventional phylogenetic footprinting study was carried out on the human, mouse and rat GRIA1 promoters to identify key transcriptional elements within this subunit’s promoter. While the approach developed by Bajic et al. (2004) identifies key family-specific transcriptional elements, the phylogenetic footprinting study helps identify key gene-specific transcriptional elements. Thus, the two approaches complement one another.
The approach developed by Bajic et al. (2004) yielded an interesting result. The combination of the top 3 ranked “single”, “pair” and “triplet” transcriptional elements found in the AMPA receptor promoters was also found in 47 other genes. It was postulated that these 47 genes might, in fact, be co-regulated / co-expressed with the GRIAs, which would explain their shared promoter profile with the GRIA promoters. In support of this hypothesis, evidence was found in the published literature that 7 of these 47 genes (VAMP4, Rab3B, FKBP8, 3-OST-3A, CLSTN3, SOCS1 and IκBβ) might indeed be involved in the expression and functioning of the AMPA receptors.
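The ranking of “singles”, “pairs” and “triplets” by over-representation can be sketched roughly as follows. This is an illustrative stand-in, not the software of Bajic et al. (2004): the abstract does not define the density or the over-representation measure, so the sketch assumes that density is the fraction of promoters containing a given combination of distinct elements and that over-representation is the ratio of target density to background density, with a small pseudocount to avoid division by zero. The function names and the toy input in the usage note are hypothetical.

```python
# Hypothetical sketch: rank TFBS combinations by target/background density ratio.
from collections import Counter
from itertools import combinations


def combo_density(promoter_hits, k):
    """Fraction of promoters containing each k-combination of distinct elements.

    promoter_hits: dict mapping a promoter ID to the list of element names
    mapped onto it (for example, the output of a TFBS-mapping step).
    k: 1 for "singles", 2 for "pairs", 3 for "triplets".
    """
    counts = Counter()
    for hits in promoter_hits.values():
        counts.update(combinations(sorted(set(hits)), k))
    n_promoters = len(promoter_hits)
    return {combo: c / n_promoters for combo, c in counts.items()}


def rank_over_representation(target, background, k, pseudocount=1e-6):
    """Return (combination, score) pairs, most over-represented first."""
    target_density = combo_density(target, k)
    background_density = combo_density(background, k)
    scores = {
        combo: density / (background_density.get(combo, 0.0) + pseudocount)
        for combo, density in target_density.items()
    }
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

As a usage example with invented toy data, calling `rank_over_representation({"t1": ["A", "B", "C"], "t2": ["A", "B"]}, {"b1": ["A"], "b2": ["A", "C"], "b3": ["B"], "b4": ["A"]}, k=1)` ranks element `B` first, because it occurs in every target promoter but in only a quarter of the background promoters. Taking the first three entries for k = 1, 2 and 3 gives the top 3 ranked elements of each kind under these assumptions.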