Chapter 3. Whole genome sequencing, general genome features, and biosynthetic
3.3 Results and Discussion
genome sequences and needs to be further confirmed via experiments because of their large sizes.
3.3.3 Secondary metabolites biosynthetic potential of the class Ktedonobacteria
Ktedonobacteria could be an alternative to Streptomyces and serve as a next-generation microbial resource.
Moreover, the composition of antiSMASH-predicted BGC types was highly diverse across the 23 Ktedonobacteria genomes (Fig. 3-4). For example, RiPP-family gene clusters encoding lanthipeptides, lasso peptides, thiopeptides, bacteriocins, and linear azol(in)e-containing peptides (LAPs) predominated and accounted for almost half of all antiSMASH-predicted BGCs in the genomes of Ts. hazakensis SK20-1T, Ts. hazakensis COM3, Dictyobacter sp. SOSP1-9, Ktedonobacteraceae bacterium SOSP1-1, and Ktedonobacterales bacterium SCAWS-G2. By contrast, NRPS/T1PKS hybrid gene clusters were the main antiSMASH-predicted BGC type in D. alpinus Uno16T and Dictyobacter sp. Uno17, whereas the genomes of Ktedonobacterales bacterium 150040 and Tg. aurantia A1-2T were enriched in NRPS gene clusters. Overall, BGCs encoding peptide compounds, including NRPS, NRPS/T1PKS hybrid, and RiPP-family gene clusters, predominated in the Ktedonobacteria genomes (Fig. 3-4); their products may help these bacteria compete with rivals and defend against predators in their niches (Challis & Naismith, 2004; Naughton et al., 2017; Ortega & van der Donk, 2016; Zhang et al., 2012).
This hypothesis is consistent with the observation that the genus Thermogemmatispora, which has the fewest BGCs in the class Ktedonobacteria, predominates in geothermal niches where microbial community richness is markedly reduced (de Miera et al., 2014; Jiang et al., 2016; Yabe et al., 2017a). Owing to the scarcity of competitors and predators in these niches (Yabe et al., 2017a), Thermogemmatispora may have reduced the number of BGCs in its genomes, because secondary metabolites are dispensable yet always costly for microorganisms to produce. In addition, PKS gene clusters encoding polyketide compounds were the second most abundant BGC type. We therefore focused on NRPS, PKS, NRPS/PKS hybrid, and lantipeptide BGCs in the next section.
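To make the "almost half of all BGCs" comparison concrete, the following Python sketch groups antiSMASH-style region labels into broad categories and computes each category's share within one genome. The label-to-category mapping, the `NRPS/T1PKS` hybrid label, the function names, and the example region list are hypothetical illustrations, not the study's data or the exact antiSMASH output format.

```python
from collections import Counter

# Hypothetical mapping of antiSMASH-style region labels to broad categories.
# Check the exact labels produced by the antiSMASH version actually used.
CATEGORY = {
    "lanthipeptide": "RiPP",
    "lassopeptide": "RiPP",
    "thiopeptide": "RiPP",
    "bacteriocin": "RiPP",
    "LAP": "RiPP",
    "NRPS": "NRPS",
    "NRPS/T1PKS": "NRPS/T1PKS hybrid",
    "T1PKS": "PKS",
    "T2PKS": "PKS",
    "terpene": "terpene",
}


def category_fractions(region_types):
    """Return the share of each broad category among all regions of one genome."""
    counts = Counter(CATEGORY.get(t, "other") for t in region_types)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def dominant_category(region_types):
    """Return the category with the largest share (ties go to the first seen)."""
    fractions = category_fractions(region_types)
    return max(fractions, key=fractions.get)


# Hypothetical genome with 10 predicted regions (illustrative, not study data).
example_regions = [
    "lanthipeptide", "lassopeptide", "bacteriocin", "LAP", "thiopeptide",
    "NRPS", "NRPS", "NRPS/T1PKS", "T1PKS", "terpene",
]
fractions = category_fractions(example_regions)
print(fractions["RiPP"], dominant_category(example_regions))
```

In this toy genome, five of ten regions are RiPP-family, so the RiPP share is 0.5; the same function applied per genome would reproduce a per-genome breakdown like the one summarized in Fig. 3-4, provided the real labels are mapped correctly.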
3.3.3.2 NRPS, PKS, NRPS/PKS hybrid, and lantipeptide gene clusters
A total of 39 putative antiSMASH-predicted NRPS gene clusters were identified in the 23 available Ktedonobacteria genomes, ranging in size from 2.1 kbp (Region 14.1 in strain Tg. aurantia A1-2T) to 154.4 kbp (Region 2.4 in strain T. tsumagoiensis Uno3T). Domain composition and organization analysis indicated that some NRPS gene clusters, such as Region 10 in Ts. hazakensis COM3 (Fig. 3-6A), could produce peptide products containing a ring moiety, as suggested by the presence of multiple epimerization (E) and heterocyclization (HC) domains in the cluster (Bloudoff et al., 2017; Chen et al., 2016). In addition, the presence of a fatty acyl-AMP ligase (FAAL) domain in gene clusters such as Region 2.1 of T. tsumagoiensis Uno3T (Fig. 3-6B) suggests that the final product of this gene cluster may be a lipopeptide or a lipopeptide derivative. The FAAL domain was first discovered in Mycobacterium tuberculosis; it activates fatty acids as acyl adenylates and subsequently catalyzes their transfer onto the acyl carrier proteins (ACPs) of PKSs or non-ribosomal peptide synthetases to produce lipidic metabolites (Hayashi et al., 2011).
As for PKS gene clusters, 19 type I and 13 type II PKS gene clusters were identified. The type I PKS gene clusters ranged in size from 33.9 kbp (Region 4.1 in strain Ktedonobacter sp. SOSP1-30) to 46.7 kbp (Region 2.2 in strain Dictyobacteraceae bacterium SOSP1-142), whereas the type II PKS gene clusters ranged from 28.8 kbp (Region 97.1 in strain Dictyobacter sp. Uno17) to 72.5 kbp (Region 2.2 in strain Ktedonobacterales bacterium 150040).
Domain composition and organization analysis indicated that some PKS gene clusters, such as Region 2.4 in strain D. alpinus Uno16T (Fig. 3-7), contain two additional NRPS adenylation (A) domains, which Protein-Protein BLAST (BLASTp) predicted to function as a long-chain fatty acid-CoA ligase and an acyl-CoA synthetase. This suggests that the final products of this cluster may include lipid derivatives (Hisanaga et al., 2004; Soupene & Kuypers, 2008).
Thirty-five antiSMASH-predicted BGCs were classified as NRPS/T1PKS hybrid gene clusters; these were distributed mainly in strains D. alpinus Uno16T, Dictyobacter sp. Uno17, T. tsumagoiensis Uno3T, Ts. hazakensis SK20-1T, and Ts. hazakensis COM3. The 35 putative hybrid NRPS/T1PKS gene clusters ranged in size from 40.8 kbp (Region 1.4 in strain Dictyobacter sp. SOSP1-9) to 231.8 kbp (Region 2.5 in strain D. vulcani W12T). Based on the NRPS/PKS collinearity rule (Challis & Naismith, 2004; Dutta et al., 2014; Shen, 2003), antiSMASH v5.0 roughly predicted the core scaffold structures of the Ktedonobacteria-derived NRPS/T1PKS hybrid products from the composition and organization of their biosynthetic domains. Although tailoring reactions such as final cyclization and oxidation-reduction reactions (Bloudoff et al., 2017; Chen et al., 2016) were not taken into account, representative examples (Fig. 3-8) suggest that the Ktedonobacteria-derived NRPS/T1PKS hybrid products may comprise structurally diverse compounds, including linear aminopolyols (Rogers & Molinski, 2007), cyclic lipopeptides (Edwards et al., 2004; Hrouzek et al., 2012), macrocyclic peptides (Tsakos et al., 2016), bicyclic peptides (Cornelio et al., 2019), cyclic depsipeptides (Xu et al., 2012), and sulfur-containing peptides (Zhao & Jiang, 2018).
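The collinearity rule can be sketched as follows: building blocks are read off the assembly line in module order, NRPS modules contribute their predicted amino acid (switched to the D-form when the module carries an E domain), and PKS modules contribute an extender unit whose β-carbon state follows the standard KR/DH/ER reductive loop. This is a simplified Python illustration under stated assumptions (no loading modules, no HC or cyclization, no trans-acting domains); the `Module` class and the example assembly line are hypothetical, and the sketch is not how antiSMASH v5.0 is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """One assembly-line module (hypothetical, simplified representation)."""
    kind: str                                  # "NRPS" or "PKS"
    substrate: str                             # amino acid or extender unit
    domains: set = field(default_factory=set)  # optional tailoring domains


def pks_beta_state(domains):
    """Beta-carbon state after the KR/DH/ER reductive loop."""
    if {"KR", "DH", "ER"} <= domains:
        return "CH2"      # fully reduced methylene
    if {"KR", "DH"} <= domains:
        return "C=C"      # dehydrated to a double bond
    if "KR" in domains:
        return "CH-OH"    # ketoreduced to a hydroxyl
    return "C=O"          # keto group retained


def predict_backbone(modules):
    """Collinearity rule: one building block per module, in module order."""
    units = []
    for module in modules:
        if module.kind == "NRPS":
            # Simplification: an E domain epimerizes the residue of its own module.
            prefix = "D-" if "E" in module.domains else "L-"
            units.append(prefix + module.substrate)
        elif module.kind == "PKS":
            units.append(f"{module.substrate}({pks_beta_state(module.domains)})")
        else:
            raise ValueError(f"unknown module type: {module.kind}")
    return " - ".join(units)


# Hypothetical hybrid NRPS/T1PKS assembly line (not a cluster from this study).
example_modules = [
    Module("NRPS", "Ser"),
    Module("NRPS", "Cys", {"E"}),
    Module("PKS", "malonyl-CoA", {"KR"}),
    Module("PKS", "methylmalonyl-CoA", {"KR", "DH", "ER"}),
]
print(predict_backbone(example_modules))
```

The output is only a linear backbone; as the text notes, tailoring steps such as cyclization and redox modifications would still have to be considered separately.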
Following a comprehensive analysis with antiSMASH v5.0 and BAGEL v4, the 62 identified lantipeptide gene clusters were further classified into two classes according to their biosynthetic machinery (Zhang et al., 2012). In Class I gene clusters, the precursor peptide is modified by two separate enzymes, the dehydratase LanB and the cyclase LanC (Fig. 3-9A), whereas in Class II gene clusters both steps are performed by a single bifunctional enzyme, LanM (Fig. 3-9B). Moreover, after aligning the amino acid sequences of the Ktedonobacteria-derived precursor peptides with MUSCLE and generating sequence logos with WebLogo 4, we observed conserved F(E/D)LD (Fig. 3-10A) and GG (Fig. 3-10B) cleavage-site motifs in class I and class II lantipeptide gene clusters, respectively.
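The class assignment and motif search can be illustrated with a minimal Python sketch. It assumes each cluster is summarized by a set of modification-enzyme labels (`LanB`, `LanC`, `LanM`) and treats the F(E/D)LD and GG motifs as plain regular expressions; the helper names, the "cleave right after the first GG" rule, and the example precursor sequences are hypothetical.

```python
import re

CLASS_I_MOTIF = re.compile(r"F[ED]LD")   # F(E/D)LD motif (class I)
CLASS_II_MOTIF = re.compile(r"GG")       # double-glycine motif (class II)


def lanthipeptide_class(enzymes):
    """Assign class from modification enzymes: LanB + LanC -> I, LanM -> II."""
    enzymes = set(enzymes)
    if "LanM" in enzymes:
        return "II"
    if {"LanB", "LanC"} <= enzymes:
        return "I"
    return "unassigned"


def motif_positions(precursor, lan_class):
    """0-based (start, end) spans of the class-specific motif in a precursor."""
    if lan_class == "I":
        pattern = CLASS_I_MOTIF
    elif lan_class == "II":
        pattern = CLASS_II_MOTIF
    else:
        raise ValueError(f"unknown class: {lan_class}")
    return [(m.start(), m.end()) for m in pattern.finditer(precursor)]


def split_after_first_gg(precursor):
    """Hypothetical rule: leader ends at the first GG; returns (leader, core) or None."""
    hits = motif_positions(precursor, "II")
    if not hits:
        return None
    end = hits[0][1]
    return precursor[:end], precursor[end:]


# Hypothetical precursor sequences (illustrative only).
class_i_precursor = "MSKFDLDVQVSAPQTRSITCTPGCKTGAL"
class_ii_precursor = "MNKELSEVVGGSTCSCAKTTC"

print(lanthipeptide_class(["LanB", "LanC", "LanT"]))   # I
print(lanthipeptide_class(["LanM", "LanT"]))           # II
print(motif_positions(class_i_precursor, "I"))         # [(3, 7)]
print(split_after_first_gg(class_ii_precursor))        # ('MNKELSEVVGG', 'STCSCAKTTC')
```

A regex scan like this only flags candidate positions; the alignment and sequence-logo step described above is what establishes that a motif is conserved across the precursor peptides.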
3.3.3.3 Phylogenetic analysis of the PKS-KS and NRPS-C domains
Because the class Ktedonobacteria constitutes a relatively new bacterial taxon, little is known to date about its secondary metabolites. Domain-specific phylogenetic analysis of the Ktedonobacteria-originated secondary metabolite BGCs identified in the present study may therefore provide a better understanding of the functional and evolutionary classification of their domains. Furthermore, because C and KS domains are responsible for peptide and polyketide chain elongation in NRPS and PKS biosynthesis, respectively, these two domains are the best candidates for domain-specific phylogenetic analysis (Jenke-Kodama & Dittmann, 2009; Rausch et al., 2007). As shown in Fig. 3-11A, NaPDoS classification assigned the largest share of Ktedonobacteria KS domains to the hybrid KS functional type, reflecting the high occurrence of hybrid NRPS/T1PKS clusters in the 23 Ktedonobacteria genomes.
Modular KS, the second most abundant KS functional type in the class Ktedonobacteria, was also derived mainly from hybrid NRPS/T1PKS clusters. In modular PKSs, each module is responsible for incorporating one building block and contains at least three domains: KS, acyltransferase (AT), and ACP (Jenke-Kodama & Dittmann, 2009).
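The minimal-module definition can be expressed as a simple check. The Python sketch below splits an ordered list of PKS domain labels into segments at each KS domain and tests whether each segment contains KS, AT, and ACP. The splitting rule is a simplifying assumption: a loading AT-ACP before the first KS forms its own segment, and trans-AT PKSs, which lack cis-AT domains, would fail the check. The example domain list is hypothetical.

```python
MINIMAL_DOMAINS = {"KS", "AT", "ACP"}


def split_modules(domains):
    """Split an ordered domain list into segments, starting a new one at each KS.

    Simplification: domains before the first KS (e.g. a loading AT-ACP) form
    their own segment, and module boundaries in real clusters may differ.
    """
    modules, current = [], []
    for domain in domains:
        if domain == "KS" and current:
            modules.append(current)
            current = []
        current.append(domain)
    if current:
        modules.append(current)
    return modules


def has_minimal_domains(module):
    """True if the segment contains at least KS, AT, and ACP."""
    return MINIMAL_DOMAINS <= set(module)


# Hypothetical domain order (illustrative, not a cluster from this study).
example_domains = ["KS", "AT", "KR", "ACP", "KS", "AT", "DH", "KR", "ACP", "KS", "ACP"]
modules = split_modules(example_domains)
print([has_minimal_domains(m) for m in modules])  # [True, True, False]
```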
According to the NaPDoS functional classification, the most abundant types of Ktedonobacteria-originated C domains are LCL and DCL (Fig. 3-11B). In NRPS biosynthesis, an LCL-type C domain catalyzes peptide bond formation between two L-amino acids, whereas a DCL-type C domain links an L-amino acid to a growing peptide ending with a D-amino acid (Rausch et al., 2007). Consistent with our domain analysis of the NRPS and hybrid clusters, the condensation-type domains annotated as HC domains in strains Ts. hazakensis SK20-1T and Ts. hazakensis COM3 were classified by NaPDoS as cyclization domains, which catalyze both peptide bond formation and the subsequent cyclization of cysteine, serine, or threonine residues.
In terms of evolutionary classification, the Ktedonobacteria C domains were more diverse than the KS domains. As shown in Fig. 3-11B, only a small percentage of Ktedonobacteria-originated C domains clustered with the reference sequences; instead, a large proportion formed independent branches, suggesting that they are evolutionarily distinct from C domains originating from other phyla.
3.3.4 CAZymes biosynthetic potential of the class Ktedonobacteria
In the physiological assays described in Chapter 2, members of the class Ktedonobacteria utilized or degraded a broad range of carbohydrates, indicating that they may represent a cellulolytic bacterial group. However, comprehensive characterization of CAZymes in Ktedonobacteria genomes is still lacking in the literature. Accordingly, I performed a genome-wide analysis to profile the composition and distribution of CAZymes in the 23 available Ktedonobacteria genomes. As shown in Fig. 3-12, 153–320 genes per genome were predicted to encode putative CAZymes. Specifically, the Ktedonobacterales order harbored 176–320 putative CAZymes per genome, likely reflecting their relatively large genomes (Charlesworth & Barton, 2004). This exceeded the number of CAZymes annotated in other Chloroflexi species (12–165 CAZymes per genome) and was comparable to that of well-known cellulolytic actinomycetes such as Micromonospora aurantiaca, Cellulomonas flavigena, and Thermobifida fusca (100–244 CAZymes per genome), as shown in Fig. 3-12. Moreover, although cellulolytic fungi, including Aspergillus niger, Trichoderma harzianum, and Trichoderma reesei, had higher absolute numbers of CAZymes (372–548 CAZymes in 33.35–40.96 Mb genomes), the Ktedonobacterales order harbored more CAZymes relative to genome size (176–320 CAZymes in 7.21–13.66 Mb genomes). In addition, although the Thermogemmatisporales order harbored fewer CAZymes than the Ktedonobacterales order, these Thermogemmatispora species (153–158 CAZymes per genome) still far exceeded the outgroup bacterial strains (Bacillus subtilis and Escherichia coli) in absolute CAZyme number. Furthermore, although the ancient and phylogenetically highly diverse phylum Chloroflexi is generally known as a group of filamentous anoxygenic phototrophs, its potential for carbohydrate degradation remains poorly characterized. It is therefore noteworthy that the class Ktedonobacteria within this phylum harbors so many CAZymes in its genomes.
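The per-megabase comparison with cellulolytic fungi depends on how the count ranges pair with the genome-size ranges, which the text does not specify. The Python sketch below uses only the quoted ranges: it computes a paired estimate under the assumption that the smallest CAZyme count belongs to the smallest genome (and the largest to the largest), and, for contrast, the extreme bounds obtained without any pairing assumption. The function and variable names are illustrative.

```python
def per_mb(cazymes, genome_mb):
    """CAZyme genes per megabase of genome."""
    return cazymes / genome_mb


# Ranges quoted in the text: (min, max) CAZyme counts and genome sizes in Mb.
KTEDONOBACTERALES = {"cazymes": (176, 320), "genome_mb": (7.21, 13.66)}
FUNGI = {"cazymes": (372, 548), "genome_mb": (33.35, 40.96)}


def paired_density(group):
    """Assumption: smallest count belongs to smallest genome, largest to largest."""
    (c_lo, c_hi), (g_lo, g_hi) = group["cazymes"], group["genome_mb"]
    return per_mb(c_lo, g_lo), per_mb(c_hi, g_hi)


def density_bounds(group):
    """No pairing assumption: lowest and highest possible CAZymes per Mb."""
    (c_lo, c_hi), (g_lo, g_hi) = group["cazymes"], group["genome_mb"]
    return per_mb(c_lo, g_hi), per_mb(c_hi, g_lo)


for name, group in [("Ktedonobacterales", KTEDONOBACTERALES), ("fungi", FUNGI)]:
    paired = [round(x, 1) for x in paired_density(group)]
    bounds = [round(x, 1) for x in density_bounds(group)]
    print(name, "paired:", paired, "bounds:", bounds)
```

Under the pairing assumption, the Ktedonobacterales values (about 23 to 24 CAZymes per Mb) are roughly twice the fungal values (about 11 to 13 per Mb); without it, the ranges overlap (about 12.9 to 44.4 versus 9.1 to 16.4), so per-genome densities are the most direct way to confirm the comparison.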
GHs and GTs were the most abundant CAZyme classes in the Ktedonobacteria genomes, with 63–139 GHs and 53–108 GTs per genome. GHs hydrolyze glycosidic bonds in complex carbohydrates, whereas GTs catalyze the formation of glycosidic bonds in disaccharides, oligosaccharides, and polysaccharides (Lombard et al., 2014). Interestingly, GTs can also transfer sugar moieties to specific acceptor molecules, such as diverse natural products, enhancing their water solubility, stability, and bioavailability (Pandey, 2017; Schmid et al., 2016). Given the high abundance of secondary metabolite BGCs in Ktedonobacteria genomes, we propose that some of these GTs may participate in the tailoring modification of natural products.
To deepen our understanding of the composition and distribution of CAZyme families in the genomes of the class Ktedonobacteria, we classified the annotated putative CAZymes into sequence-based families according to the CAZy database (Lombard et al., 2014). The putative Ktedonobacteria CAZymes were assigned to 85 GH families, 18 GT families, 11 CE families, 8 AA families, 5 PL families, and 18 CBM families (Fig. 3-13). Overall, the GT2 and GT4 families predominated, accounting for almost half of all GTs identified in the genomes of the 23 Ktedonobacteria strains. Reflecting their high sequence diversity, these two families cover a wide range of catalytic activities according to the CAZy database (Lombard et al., 2014), including cellulose synthase (EC 2.4.1.12), chitin synthase (EC 2.4.1.16), and mannosyltransferase (EC 2.4.1.83) in GT2, and sucrose synthase (EC 2.4.1.13), sucrose-phosphate synthase (EC 2.4.1.14), and α-glucosyltransferase (EC 2.4.1.52) in GT4. Among GH genes, the GH3, GH5, GH13, and GH15 families were the most abundant in the 23 Ktedonobacteria genomes. Notably, the GH3 (β-glucosidase, xylan 1,4-β-xylosidase, β-glucosylceramidase, etc.) and GH5 (endo-β-1,4-glucanase/cellulase, endo-β-1,4-xylanase, β-glucosidase, etc.) families are characterized as plant polysaccharide-degrading enzymes and play important roles in cellulose and hemicellulose degradation (Rytioja et al., 2014).
Thus, the predominance of the GH3 and GH5 families, together with the abundant GH13 (α-amylase) and GH15 (glucoamylase) families, may contribute to the ability of Ktedonobacteria strains to utilize and degrade a wide range of polysaccharide substrates, such as starch, chitin, CMC, and the crystalline cellulose Avicel, as shown in Chapter 2. Interestingly, genes of the CE10 family predominated among the identified CEs. However, unlike other CE families, which hydrolyze carbohydrate esters (Lombard et al., 2014), the CE10 family has recently been removed from the CAZy classification because most of its members act on ester bonds of non-carbohydrate substrates (Lombard et al., 2014), suggesting that the corresponding genes in Ktedonobacteria may act on other metabolites. In addition, consistent with the CAZyme composition shown in Fig. 3-12, the distribution of annotated CAZyme families was more diverse in the Ktedonobacterales order and more conserved in the Thermogemmatisporales order. Collectively, the large numbers of CAZymes in these genomes are consistent with their broad carbohydrate degradation ability, supporting the view of the class Ktedonobacteria as a cellulolytic bacterial group and a potential microbial resource for novel CAZyme discovery.
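Assigning annotated genes to CAZy classes and computing within-class family shares, as in the GT2/GT4 statement above, can be sketched in Python as follows. It assumes family labels in the common CAZy style (for example `GT2`, `CBM50`, or a subfamily such as `GH5_2`); the function names and the example annotation list are hypothetical and do not reproduce the study's annotations.

```python
import re
from collections import Counter

# CAZy-style family label: class prefix + family number, optional subfamily suffix.
FAMILY_LABEL = re.compile(r"(GH|GT|PL|CE|AA|CBM)(\d+)(?:_\d+)?")


def parse_family(label):
    """Return (class, family), e.g. 'GH5_2' -> ('GH', 'GH5')."""
    match = FAMILY_LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"not a CAZy family label: {label!r}")
    cazy_class, number = match.groups()
    return cazy_class, cazy_class + number


def class_counts(labels):
    """Number of annotated genes per CAZy class."""
    return Counter(parse_family(label)[0] for label in labels)


def family_share(labels, cazy_class, families):
    """Fraction of genes in one class that belong to the given families."""
    in_class = [parse_family(label)[1] for label in labels
                if parse_family(label)[0] == cazy_class]
    if not in_class:
        return 0.0
    return sum(1 for family in in_class if family in families) / len(in_class)


# Hypothetical per-genome annotation list (illustrative, not study data).
example_labels = ["GT2", "GT2", "GT4", "GT4", "GT4", "GT28", "GT51", "GT83",
                  "GH3", "GH5_2", "GH13", "CE10", "CBM50"]
print(class_counts(example_labels))
print(family_share(example_labels, "GT", {"GT2", "GT4"}))  # 0.625
```

Collapsing subfamilies to their parent family, as `parse_family` does, keeps family-level counts comparable with the family totals reported for Fig. 3-13.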