Mining a Chinese hyperthermophilic metagenome
Abstract
Metagenomic sequencing of environmental samples provides direct access to the
genomic information of organisms within the respective environments. This
sequence information represents a significant resource for the identification and subsequent characterization of potentially novel genes, or of known genes that have acquired novel characteristics. Within this context, thermophilic environments are of particular interest because of their potential to yield novel thermostable enzymes with biotechnological and industrial applications. In this work, metagenomic library construction, random sequencing and sequence analysis strategies were employed to enhance the identification and characterization of potentially novel genes from a thermophilic soil sample. High molecular weight metagenomic DNA was extracted from two Chinese hydrothermal soil samples and used as source material for the construction of four genomic DNA libraries. The combined libraries were estimated to contain on the order of 1.3 million genes, providing a rich resource for gene identification. Approximately 70 kbp of sequence data was generated from one of the libraries as a resource for sequence-based analysis. Initial BLAST analysis predicted the presence of 53 ORFs or partial ORFs. The BLAST similarity scores for the investigated ORFs were sufficiently high (>40%) to infer homology with database proteins, while also indicating that the ORFs are novel sequence variants of these database matches. In an attempt to recover more full-length ORFs, a novel strategy based on whole genome amplification (WGA) technology was employed. This resulted in the recovery of the near-complete sequence of partial ORF5, directly from the
WGA DNA of the environmental sample. While the full-length ORF5 could not be recovered, the feasibility of this novel approach for enhanced metagenomic sequence recovery was demonstrated in principle. The implementation of multiple in silico strategies resulted in the identification of two ORFs, classified as homologs of the DUF29 and Usp protein families, respectively. The functional inference obtained from the integrated in silico predictions was, furthermore, highly suggestive of a putative nucleotide binding/interaction role for both ORFs. A putative novel DNA polymerase gene (denoted TC11pol) was identified from the sequence data. Expression and characterization of the full-length TC11pol did not, however, result in detectable polymerase activity. A homology modeling approach proved successful for deriving a structural model of the polymerase, which was used for: (i) deriving functional inferences about the potential activities of the polymerase and (ii) designing a 5’ exonuclease deletion mutant for functional analysis. Expression and subsequent functional characterization of the putative 5’exo- TC11pol mutant resulted in detectable polymerase and 3’-5’ exonuclease activity at 37 and 45 °C, following a heat denaturation step at 55 °C for 1 hour. It was therefore concluded that the putative 5’exo- TC11pol mutant was functionally equivalent to the Klenow fragment of E. coli, while exhibiting increased thermostability.
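
As an illustration of the BLAST screening step summarized in this abstract, the following is a minimal sketch of how hits above a 40% threshold might be selected from tabular BLAST output. It is not the procedure used in this work. It assumes that the threshold refers to the percent identity column (pident) of the default BLAST+ tabular format (-outfmt 6), and the function names and example rows are hypothetical, chosen for illustration only.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST_TAB_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def parse_hits(lines):
    """Yield one dict per non-empty, non-comment tabular BLAST line."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        hit = dict(zip(BLAST_TAB_COLUMNS, line.split("\t")))
        # Convert the numeric fields that are used for filtering and ranking.
        hit["pident"] = float(hit["pident"])
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        yield hit


def best_hits_above(lines, min_pident=40.0):
    """Return the highest-bitscore hit per query among hits with pident > min_pident.

    Assumption: the 40% threshold is applied to percent identity.
    """
    best = {}
    for hit in parse_hits(lines):
        if hit["pident"] <= min_pident:
            continue
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Invented example rows (not data from this study).
example_rows = [
    "ORF_a\tprotX\t62.5\t120\t45\t0\t1\t360\t1\t120\t1e-40\t150",
    "ORF_a\tprotY\t45.0\t100\t55\t0\t1\t300\t1\t100\t1e-20\t90",
    "ORF_b\tprotZ\t31.0\t80\t55\t0\t1\t240\t1\t80\t1e-5\t40",
]

for query, hit in best_hits_above(example_rows).items():
    print(query, hit["sseqid"], hit["pident"], hit["bitscore"])
```

In this sketch a query with no hit above the threshold is simply absent from the result. A fuller screen would typically also consider the E-value and the alignment coverage before inferring homology.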