The molecular evolution and epidemiology of Rubella virus
Despite widespread rubella virus (RV) vaccination programs, RV still causes severe congenital defects in an estimated 100,000 children globally each year. A concerted attempt to eradicate RV is currently underway, and analytical tools to monitor the global decline of the last remaining RV lineages will be useful for assessing the effectiveness of this endeavour. Importantly, RV evolves rapidly enough that much of its epidemiological information might be inferable from RV genomic sequence data. Using BEAST v1.8.0, I analysed publicly available RV sequence data to estimate genome-wide and gene-specific nucleotide substitution rates, and to test whether current estimates of RV substitution rates are representative of the entire RV genome. During these analyses, I specifically accounted for possible confounders of substitution rate estimates, such as temporally biased sampling, sporadic recombination, and natural selection favouring either increased or decreased genetic diversity (estimated with the PARRIS and FUBAR methods) at nucleotide sites within RV nucleic acid secondary structures (predicted with the NASP method). I determined that RV nucleotide substitution rates range from 7.52×10⁻⁴ substitutions/site/year (in the P150 region) to 1.19×10⁻³ substitutions/site/year (in the E1 region). These differences between rate estimates in different RV gene regions are largely attributable to temporal sampling biases: datasets containing a higher proportion of recently sampled sequences tend to yield inflated estimates of mean substitution rates. Although there is little evidence of positive selection or natural genetic recombination in RV, I found that RV genomes possess extensive, biologically functional nucleic acid secondary structures, and that purifying selection acting to maintain these structures contributes substantially to variation in estimated substitution rates across RV genomes.
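As a rough illustration of how a substitution rate relates to dated sequences, the sketch below fits a root-to-tip regression: genetic distance from the root is regressed on sampling date, the slope approximates the substitution rate, and the x-intercept approximates the date of the root. This is a swapped-in, simpler exploratory technique, not the Bayesian BEAST analysis described above; the function name `root_to_tip_regression` and the input data are hypothetical. Because tips share ancestry, the points are not independent, so the fit is only a diagnostic of temporal signal.

```python
from statistics import mean


def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance against sampling date.

    Hypothetical helper for illustration. Returns (rate, root_date, r2):
    rate is the slope (substitutions/site/year), root_date is the
    x-intercept (estimated date of the root), and r2 is a crude
    measure of temporal signal.
    """
    if len(dates) != len(distances) or len(dates) < 2:
        raise ValueError("need at least two paired (date, distance) values")
    mx, my = mean(dates), mean(distances)
    sxx = sum((x - mx) ** 2 for x in dates)
    if sxx == 0:
        raise ValueError("sampling dates must not all be identical")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dates, distances))
    rate = sxy / sxx
    if rate == 0:
        raise ValueError("no temporal signal: slope is zero")
    intercept = my - rate * mx
    root_date = -intercept / rate  # date at which fitted distance is zero
    # Coefficient of determination of the linear fit
    ss_tot = sum((y - my) ** 2 for y in distances)
    ss_res = sum((y - (intercept + rate * x)) ** 2
                 for x, y in zip(dates, distances))
    r2 = 1 - ss_res / ss_tot if ss_tot else 1.0
    return rate, root_date, r2


if __name__ == "__main__":
    # Synthetic, noise-free data: rate 1e-3 subs/site/year, root in 1900
    dates = [1960, 1970, 1980, 1990, 2000, 2010]
    dists = [1e-3 * (d - 1900) for d in dates]
    print(root_to_tip_regression(dates, dists))
```

With real data the points scatter around the line; a dataset weighted towards recent samples places more leverage at one end of the date axis, which is one intuitive way to see why temporally biased sampling can distort the fitted slope.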
Although both temporal sampling biases and purifying selection favouring the conservation of RV nucleic acid secondary structures have an appreciable impact on substitution rate estimates, I found that these factors do not preclude the use of RV sequence data to date ancestral sequences and evaluate the associated RV phylodynamics. The combination of uniformly high substitution rates across the RV genome and strong temporal signal in the available sequence data enabled me to analyse the epidemiological and demographic dynamics of RV during current attempts to eradicate it. By implementing a generalized linear model (GLM) and a symmetric model of discrete phylogeographic spread, I identified several predictors of geographical RV spread and detected transmission linkages between distinct geographical regions. These results suggest that, in addition to strengthened vaccination strategies, an increased effort is needed to educate people about the effects of vaccination and the risks of RV infection.
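For orientation, the block below gives the general log-linear form commonly used for GLM-based discrete phylogeography in BEAST; it is a sketch of that general form, not the specific model fitted here, and the symbols are introduced for illustration. Here $\Lambda_{ij}$ is the migration rate between locations $i$ and $j$ (with $\Lambda_{ij} = \Lambda_{ji}$ in a symmetric model), $x_{k,ij}$ is the value of predictor $k$ for that pair of locations, $\beta_k$ is its coefficient, and $\delta_k \in \{0, 1\}$ indicates whether predictor $k$ is included in the model. The predictors actually used are not listed in this abstract.

```latex
\log \Lambda_{ij} = \sum_{k=1}^{n} \beta_k \, \delta_k \, \log x_{k,ij},
\qquad \Lambda_{ij} = \Lambda_{ji}, \qquad \delta_k \in \{0, 1\}
```

Under this form, a predictor is typically reported as supported when its indicator $\delta_k$ has a high posterior probability of being 1, and the sign of $\beta_k$ indicates whether the predictor increases or decreases the rate of spread.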