dc.contributor.advisor Christoffels, Alan
dc.contributor.advisor Gamieldien, Junaid
dc.contributor.author Direko, Mmakamohelo
dc.date.accessioned 2016-10-25T12:13:58Z
dc.date.available 2016-10-25T12:13:58Z
dc.date.issued 2011
dc.identifier.uri http://hdl.handle.net/11394/5286
dc.description Magister Scientiae - MSc en_US
dc.description.abstract Next generation sequencing (NGS) technology platforms have accelerated the ability to produce completed genome assemblies. Recently, collaborators at Tygerberg Medical School outsourced the sequencing of Oryx bacillus, a member of the Mycobacterium tuberculosis complex (MTC). A total of 31,271,059 short reads were generated and required filtering, assembly and annotation using bioinformatics algorithms. In this project, an NGS assembly pipeline was implemented, tailored specifically for SOLiD sequence data. The raw reads were aligned to seven fully sequenced and annotated MTC members, namely Mycobacterium tuberculosis H37Rv, H37Ra, CDC1551, F11, KZN 1435, Mycobacterium bovis AF2122/97 and Mycobacterium bovis BCG str. Pasteur 1173P2, using NovoalignCS. Depth and breadth of sequence coverage across each base of the reference genome were calculated using BEDTools. Structural variation at the nucleotide level, including deletions, insertions and single nucleotide polymorphisms (SNPs), was called using three tools: GATK, SAMtools and Nesoni. These variations were further filtered using in-house Perl scripts. Putative functional roles for the alterations at the DNA level were extrapolated from the overlap with essential genes present in annotated MTC members. Approximately 20,730,631 short reads (59.78%) out of a total of 31,271,059 reads aligned to the seven reference genomes. The per base sequence coverage calculations revealed an average of 1,243 unaligned regions. These unaligned regions overlapped with mycobacterial regions of difference (RD) and genetic phage elements acquired by the MTC through horizontal gene transfer, and included genes prevalent in clinical isolates of M. tuberculosis. A total of 2,680 genetic variations were identified and categorised into 845 synonymous and 1,724 non-synonymous SNPs, together with 44 insertions and 67 deletions.
Some of the variant alleles overlapped genes known to be involved in TB drug resistance. While the biological significance of our findings remains to be elucidated, they nonetheless deserve further attention, because SNPs have the potential to affect strain phenotype through gene disruption. Therefore, any hypotheses generated from these large-scale analyses will be tested by our collaborators at Tygerberg Medical School. en_US
dc.publisher University of the Western Cape en_US
dc.subject Mycobacterium tuberculosis en_US
dc.subject Oryx bacillus en_US
dc.subject Comparative genomics en_US
dc.subject ABI SOLiD system en_US
dc.subject Next generation sequencing en_US
dc.title Genome assembly of next-generation sequencing data for the Oryx bacillus: species of the Mycobacterium tuberculosis complex en_US
dc.rights.holder University of the Western Cape en_US

