Semantic discovery and computational filtering to identify potentially novel breast cancer genes and signatures in omics data
Abstract
High-throughput sequencing technologies have developed rapidly in recent years. Sequencing DNA and RNA samples on such platforms has proven to be a powerful way to analyze the genome and transcriptome of even very complex eukaryotic organisms, including humans, and of diseases such as cancer, which cause substantial genomic and gene expression changes compared to healthy tissue. Such analyses have led to the discovery of hundreds of thousands of novel genetic and transcriptomic variations associated with disease conditions such as breast cancer. The Cancer Genome Atlas (TCGA) is one resource that stores such data, including RNA sequencing data for tumor samples across many cancer types. However, searching such a database for aberrations that contribute to a specific disease condition can be cumbersome, especially since only a relatively small set of mutations and/or expression changes are drivers of the disease, while the large majority are ‘passengers’. Moreover, mutated or differentially expressed genes that are not yet known to be related to breast cancer may be incorrectly discarded because they are not ‘classical’ cancer genes.