Normalization and statistical methods for crossplatform expression array analysis

Mapiye, Darlington S

dc.contributor.advisor	Gamieldien, Junaid
dc.contributor.advisor	Christoffels, Alan
dc.contributor.author	Mapiye, Darlington S
dc.date.accessioned	2015-10-19T12:49:53Z
dc.date.available	2015-10-19T12:49:53Z
dc.date.issued	2012
dc.identifier.uri	http://hdl.handle.net/11394/4586
dc.description	Magister Scientiae - MSc	en_US
dc.description.abstract	A large volume of gene expression data exists in public repositories such as the NCBI’s Gene Expression Omnibus (GEO) and the EBI’s ArrayExpress. This presents a significant opportunity to re-use data in various combinations for novel in-silico analyses that would otherwise be too costly to perform, or for which equivalent sample numbers would be difficult to collect. For example, combining and re-analysing large numbers of datasets from the same cancer type would increase statistical power while weakening the effects of individual study-specific variability, which would result in more reliable gene expression signatures. Similarly, as the number of normal control samples associated with various cancer datasets is often limiting, datasets can be combined to establish a reliable baseline for accurate differential expression analysis. However, combining different microarray studies is hampered by the fact that different studies use different analysis techniques, microarray platforms and experimental protocols. We have developed and optimised a method which transforms gene expression measurements from continuous to discrete data points by grouping similarly expressed genes into quantiles on a per-sample basis. Each probe on each chip is cross-mapped to the gene it represents, which enables us to integrate experiments based on the genes they have in common across different platforms. We optimised the quantile discretization method on previously published prostate cancer datasets produced on two different array technologies and then applied it to a larger breast cancer dataset of 411 samples from 8 microarray platforms. Statistical analysis of the breast cancer datasets identified 1371 differentially expressed genes.
Cluster, gene set enrichment and pathway analysis identified functional groups that were previously described in breast cancer. We also identified a novel module of genes encoding ribosomal proteins that have not been previously reported, but whose overall functions have been implicated in cancer development and progression. The former indicates that our integration method does not destroy the statistical signal in the original data, while the latter is strong evidence that the increased sample size increases the chances of finding novel gene expression signatures. Such signatures are also robust to inter-population variation, and show promise for translational applications such as tumour grading, disease subtype classification, treatment selection and molecular prognostics.	en_US
dc.language.iso	en	en_US
dc.publisher	University of the Western Cape	en_US
dc.subject	Differential expression analysis	en_US
dc.subject	Expression array	en_US
dc.subject	Quantile discretization	en_US
dc.subject	Gene expression	en_US
dc.title	Normalization and statistical methods for crossplatform expression array analysis	en_US
dc.rights.holder	University of the Western Cape	en_US
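
The per-sample quantile discretization summarised in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the probe and gene identifiers, the choice of four bins, the averaging of probes that map to the same gene, and the order of the steps (collapse probes, discretize, then keep common genes) are all assumptions made for illustration.

```python
from bisect import bisect_left
from statistics import mean


def collapse_probes(probe_values, probe_to_gene):
    """Map probe-level values to gene-level values for one sample.

    Probes that map to the same gene are averaged (an assumption; the
    abstract does not say how multiple probes per gene are handled).
    Probes without a gene mapping are dropped.
    """
    per_gene = {}
    for probe, value in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:
            per_gene.setdefault(gene, []).append(value)
    return {gene: mean(values) for gene, values in per_gene.items()}


def discretize_sample(gene_values, n_bins=4):
    """Replace each gene's value with a quantile bin in 1..n_bins.

    The bin depends only on the gene's rank within this one sample, so
    samples measured on different intensity scales become comparable.
    Tied values always receive the same bin.
    """
    ordered = sorted(gene_values.values())
    n = len(ordered)
    return {
        gene: bisect_left(ordered, value) * n_bins // n + 1
        for gene, value in gene_values.items()
    }


def integrate(samples):
    """Keep only the genes that are present in every sample."""
    if not samples:
        return {}
    common = set.intersection(*(set(s) for s in samples.values()))
    return {
        name: {gene: s[gene] for gene in sorted(common)}
        for name, s in samples.items()
    }


# Hypothetical probe annotations and measurements for two platforms.
map_a = {"a1": "GENE1", "a2": "GENE1", "a3": "GENE2", "a4": "GENE3", "a5": "GENE4"}
map_b = {"b1": "GENE1", "b2": "GENE2", "b3": "GENE3", "b4": "GENE5"}
sample_a = {"a1": 2.0, "a2": 4.0, "a3": 8.0, "a4": 10.0, "a5": 12.0}
sample_b = {"b1": 150.0, "b2": 900.0, "b3": 1200.0, "b4": 300.0}

integrated = integrate({
    "A": discretize_sample(collapse_probes(sample_a, map_a)),
    "B": discretize_sample(collapse_probes(sample_b, map_b)),
})
print(integrated)
```

In this toy example the two samples are on very different intensity scales, yet after discretization both are expressed as bins over the three genes they share, which is what allows samples from different platforms to be analysed together.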

Files in this item

Name: Mapiye_MSC_2012.pdf
Size: 20.35Mb
Format: PDF
Description: Thesis

This item appears in the following Collection(s)

Magister Scientiae - MSc (Bioinformatics)