Integrating regulatory and methylome data for the discovery of clear cell renal cell carcinoma (ccRCC) variants
Kidney cancers, of which clear cell renal cell carcinoma comprises an estimated 70%, rank among the ten most common cancers in both males and females. With a mortality rate that exceeds 40%, kidney cancer is considered the most lethal cancer of the genitourinary system. Despite advances in treatment, mortality and incidence rates across all stages of the disease have continued to climb. Since the Human Genome Project released its sequence in the early 2000s, most genetic studies have focused on the protein-coding regions of the human genome, which account for a mere 2% of the entire genome. It has been suggested that shifting our focus to the other 98% of the genome, previously dismissed as non-functional “junk DNA”, could contribute significantly to our understanding of the mechanisms underlying complex diseases.
In this study, a whole-genome sequencing somatic mutation data set from the International Cancer Genome Consortium was used. The non-coding somatic mutations within the promoter, intronic, 5' untranslated and 3' untranslated regions of genes implicated in clear cell renal cell carcinoma were extracted and submitted to RegulomeDB for functional annotation.
As expected, most of the variants were located within intronic regions, and only a small subset of the identified variants was predicted to be deleterious. Although the variants all belonged to a selected subset of kidney cancer-associated genes, the genes most frequently mutated in the non-coding regions were not the genes most frequently mutated in whole-exome studies (where the focus is on the coding sequences). This indicates that whole-genome sequencing studies could identify a new set of genes and variants not previously associated with clear cell renal cell carcinoma. In addition, most of the non-coding somatic variants fell within multiple transcription factor binding sites.
Since many of these variants were also predicted to be deleterious by RegulomeDB, mutations in the non-coding regions could contribute to disease by disrupting transcription factor binding sites and thereby altering transcriptional regulation. The substantial overlap between the genes with the most aberrantly methylated variants and the genes with the most transcription factor binding site disruptions suggests a potential link between differential methylation and transcription factor binding affinity. In contrast to the increased DNA methylation (hypermethylation) generally reported in promoter methylation studies, all of the significant hits in this study were hypomethylated, with subsequent up-regulation of the genes of interest. This suggests that in clear cell renal cell carcinoma aberrant methylation may play a role in activating proto-oncogenes rather than in silencing genes. When a cross-analysis was carried out between the gene expression patterns, the transcription factor binding site disruptions, the non-coding somatic variants and the differential methylation profiles, the affected genes again showed a clear overlap. Interestingly, most of the variants were not present in the 1000 Genomes Project data and thus represent novel mutations, which possibly arose as a result of genomic instability. Identifying novel variants is promising, since they may open new ways to target disease. This study demonstrates the numerous detrimental effects that a single non-coding mutation can have on other genomic processes, which supports the inclusion of non-coding regions of the genome in genetic studies of complex multifactorial diseases.
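The two computational steps summarised above, assigning somatic variants to non-coding regions of selected genes and then comparing the most affected genes between analyses, can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the `Region` and `Variant` types, the function names, the placeholder gene labels and all coordinates are hypothetical, and region boundaries are assumed to be 1-based and inclusive.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class Region(NamedTuple):
    """A non-coding region of a gene of interest (hypothetical structure)."""
    gene: str
    kind: str    # e.g. "promoter", "intron", "5UTR", "3UTR"
    chrom: str
    start: int   # assumed 1-based, inclusive
    end: int     # assumed 1-based, inclusive


class Variant(NamedTuple):
    """A somatic single-position variant (hypothetical structure)."""
    chrom: str
    pos: int


def annotate(variant: Variant, regions: Iterable[Region]) -> Optional[Region]:
    """Return the first region containing the variant, or None if there is none."""
    for region in regions:
        if region.chrom == variant.chrom and region.start <= variant.pos <= region.end:
            return region
    return None


def variants_per_gene(variants: Iterable[Variant], regions: list) -> Counter:
    """Count how many variants fall inside the listed regions of each gene."""
    counts = Counter()
    for variant in variants:
        hit = annotate(variant, regions)
        if hit is not None:
            counts[hit.gene] += 1
    return counts


def top_gene_overlap(a: Counter, b: Counter, n: int) -> set:
    """Genes that are among the n highest-count genes in both analyses."""
    top_a = {gene for gene, _ in a.most_common(n)}
    top_b = {gene for gene, _ in b.most_common(n)}
    return top_a & top_b


# Placeholder data for illustration only; not taken from the study.
example_regions = [
    Region("GENE_A", "intron", "chr1", 100, 200),
    Region("GENE_B", "promoter", "chr2", 50, 80),
]
example_variants = [
    Variant("chr1", 150),
    Variant("chr1", 250),   # outside every listed region
    Variant("chr2", 50),
    Variant("chr1", 100),
]
example_counts = variants_per_gene(example_variants, example_regions)
```

A linear scan over regions keeps the sketch short; for genome-scale data an interval index or a dedicated tool for interval intersection would be the more realistic choice. The same `top_gene_overlap` comparison could be applied to any pair of per-gene counts, for example variant counts against counts of disrupted transcription factor binding sites.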