Table 8. The results of the meta-analyses for the six protein families under positive selection by the DerSimonian-Laird method, the Restricted
maximum-likelihood method, the Peto method and the Mantel-Haenszel
method.
therefore referred to as functional constraints. The relative intensities of the structural
constraints and the functional constraints differ from site to site. The active sites
registered in CSA did not always have a low correlation coefficient. At such sites, the
structural constraint may be consistent with the functional constraint. Despite the
inconsistency among the sites, the integrated analysis by the meta-analysis suggested
that the active sites tend to have low correlation coefficients. Likewise, the correlation
coefficients of some positively selected sites did not fall into the tail region. At some
sites, accelerated amino acid substitution within the structural constraints may be
sufficient for positive selection. For example, replacing an amino acid residue with
a physicochemically similar residue at an antigenic determinant site may be
enough to escape recognition by antibodies. However, the meta-analyses
suggested that the correlation coefficients for the positively selected sites tend to have
lower values than those for the non-positively selected sites. In other words, the
acceleration of the amino acid substitutions at the positively selected sites tends to occur
in a manner that is subject to the functional constraints.
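The pooling step behind such a meta-analysis can be illustrated with a minimal Python sketch of the DerSimonian-Laird random-effects estimator applied to log odds ratios. This is not the code used in this study. The 2x2 table layout (positively selected versus other sites, low versus higher correlation coefficient), the 0.5 continuity correction applied to every cell, and all function names are assumptions made for illustration only.

```python
import math


def log_odds_ratio(a, b, c, d):
    """Log odds ratio and its variance for one 2x2 table.

    Assumed layout (hypothetical): a, b = positively selected sites with
    low / higher correlation; c, d = other sites with low / higher correlation.
    A 0.5 continuity correction is added to every cell (an assumption).
    """
    a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def dersimonian_laird(effects, variances):
    """Pool per-family effects with the DerSimonian-Laird estimator.

    Returns (pooled effect, standard error, between-family variance tau^2).
    """
    w = [1.0 / v for v in variances]          # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q statistic measures heterogeneity among the families.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    k = len(effects)
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2
```

In use, one table per protein family would be converted with `log_odds_ratio`, and the resulting effects and variances passed to `dersimonian_laird`. A pooled log odds ratio whose confidence interval excludes zero would indicate a consistent tendency across families, even when individual families disagree.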
The positively selected sites, which are listed in Table 5, are mapped on the
tertiary structures of α-toxin, β-toxin, Group I PLA2, Group II PLA2, HA and MHC (see
Figure 6). In the figure, the positively selected residues whose correlation
coefficient between the two profiles is < 0.1 are colored red. The residues colored red are
basically concentrated at the same face, in any case except for the analysis with -toxin
(see Figure 6). As for the α-toxin, four residues, corresponding to residue positions
11, 18, 38 and 40 of PDB ID:2ASC, are colored red (see Figure 6 (A)). Of these, the
residues at positions 11, 18 and 40 have been reported to affect the toxin activity
(Gordon et al., 2007; Weinberger et al., 2010). Furthermore, the residue at
position 40 is reported to be involved in toxin selectivity as well as toxin activity
(Weinberger et al., 2010). Only two residues, corresponding to residue positions 15
and 16 of PDB ID:2I61, were detected for the β-toxin by my analysis. These residues are
also colored red in Figure 6 (B). They have been reported to be important for
both toxin activity and selectivity for the insect voltage-gated sodium channel (Karbat et
al., 2007; Tian et al., 2008). Most of the residues colored red are located around the
active sites (blue sites) of Group I and Group II PLA2 (see Figure 6 (C) and (D)). Both
groups of enzymes are used as toxins by snakes. The positive
selection is thought to be related to adaptation to changes in prey across
different habitats (Lynch, 2007). In addition, some of the residues detected in Group
II PLA2 are located in or near the C-terminal region (see Figure 6 (D)). The
C-terminal region, which corresponds to residue positions 115-133 of PDB ID:1OZ6,
has been reported to be involved in the toxin action (Prijatelj et al., 2000; Ivanovski et
al., 2000; Fujisawa et al., 2008). Three residues of HA are colored red (see Figure 6
(E)). Of these, two residues, corresponding to residue positions 156 and 159 of
PDB ID:2VIU, are located in the antigenic region B1. This region, which consists of
residue positions 150-170 of PDB ID:2VIU, has been reported to play an important
role in evading antibody attack (Sato et al., 2004). Most of the residues
colored red in MHC are located at or around the ligand-binding sites in the tertiary
structure (see Figure 6 (F)). Thus, examining the individual positively
selected residues with a correlation coefficient < 0.1 suggested that substitutions
under constraints other than the structural one are effective for such adaptation or
selectivity.
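The coloring rule used for Figure 6 can be illustrated with a short Python sketch. It assumes that each profile is a numeric vector of equal length (for example, one value per amino acid type) and that the correlation coefficient is Pearson's r. Both are assumptions made for this illustration, as are the function names. The figure legend leaves the case of exactly 0.1 unspecified, so the sketch arbitrarily colors it yellow.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def color_for_site(r, threshold=0.1):
    """Color of a positively selected site: red below the threshold.

    Treating r == threshold as yellow is an assumption of this sketch.
    """
    return "red" if r < threshold else "yellow"
```

A site whose two profiles are anti-correlated or uncorrelated would be colored red, whereas a site whose observed substitution pattern follows the structure-derived profile would be colored yellow.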
As described above, the sites under positive selection do not always have a
low correlation coefficient between profiles. Therefore, it would be difficult to apply
this tendency to the prediction of positively selected sites. Rather, my method seems to
connect the two schools of thought on the detection of positive selection. Several different
methods are available to detect positive selection at a single amino acid site in proteins.
Roughly speaking, the methods are classified into two approaches. One approach
uses the ratio of non-synonymous to synonymous substitution rates (the dN/dS
ratio), or an estimate of it, as an indicator of positive selection, and has been developed by introducing
statistical approaches (Fitch et al., 1997; Nielsen and Yang, 1998; Suzuki and Gojobori,
1999; Yang et al., 2000; Kosakovsky Pond and Frost, 2005; Massingham and Goldman,
2005). The PAML program package (Yang, 2007) used in this study is a representative
tool for the identification of codons under positive selection. In contrast, there are
different approaches in which the ratio of the radical to conservative amino acid
substitution rates is used as a measure for positive selection (Hughes et al., 1990; Rand
et al., 2000; Zhang, 2000; Suzuki, 2007). Dagan et al. (2002) suggested that the ratio of
radical to conservative substitutions may not be a good indicator of positive selection,
since the ratio is affected by the transition-transversion rate ratio and the amino acid
compositions. In addition, the radical and conservative substitutions are defined based
on the physicochemical characteristics of the amino acid residues in the approaches.
However, the structural aspects have been neglected in these studies. Even the
substitutions of amino acids belonging to the same physicochemical group may be
radical, and the substitutions between amino acids in different physicochemical groups
may be conservative, depending on the structural context. In this study, I examined the
amino acid replacements under positive selection indicated by the dN/dS ratio, in which
neither radical-conservative substitutions nor structural-functional constraints are taken
into account. Despite the exclusion of such aspects, the profile comparison
suggested that the amino acid substitutions under positive selection may have occurred
in a manner that deviates from the structural constraints. If the
amino acid substitutions that follow the structural constraints at a site are regarded as
conservative, and those that deviate from the structural constraints as radical, then my
study can be considered an attempt to connect the approaches based on the dN/dS ratio to
those based on the radical-conservative ratio. The key is the introduction of the structural
information. Recently, Suzuki (2013) extended his work by using structural
information, in which the thermodynamic stability obtained by an in silico method is
used, instead of the physicochemical group. Considering the accumulation of the
coordinate data of proteins, the use of structural information will shed light on studies of
positive selection.
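The distinction between radical and conservative substitutions can be made concrete with a small Python sketch. The charge-based grouping below (R, H, K positive; D, E negative; all others neutral) is only one illustrative physicochemical classification, and the function names are hypothetical. The classification rule is passed in as a function, so a structure-based rule, such as one derived from the profile comparison, could replace the physicochemical one without changing the counting. Published methods compare substitution rates normalized by the numbers of radical and conservative sites rather than raw counts, so the sketch shows the classification step only.

```python
# Illustrative charge classes (an assumption for this sketch).
CHARGE_CLASS = {aa: "positive" for aa in "RHK"}
CHARGE_CLASS.update({aa: "negative" for aa in "DE"})


def charge_class(aa):
    """Charge class of a one-letter amino acid code; default is neutral."""
    return CHARGE_CLASS.get(aa, "neutral")


def is_radical_by_charge(src, dst):
    """A substitution is radical if it changes the charge class."""
    return charge_class(src) != charge_class(dst)


def radical_conservative_counts(substitutions, is_radical=is_radical_by_charge):
    """Count radical and conservative substitutions.

    `substitutions` is an iterable of (from, to) one-letter codes.
    Identical pairs are ignored because they are not substitutions.
    """
    radical = conservative = 0
    for src, dst in substitutions:
        if src == dst:
            continue
        if is_radical(src, dst):
            radical += 1
        else:
            conservative += 1
    return radical, conservative
```

Under the charge rule, K to R is conservative and K to E is radical. A structural rule could classify the same pairs differently at different sites, which is the point made above about structural context.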
Figure 6. Mapping of the amino acid residues corresponding to positively
selected sites.
The positively selected sites are mapped on the tertiary structures of α-toxin (A),
β-toxin (B), Group I PLA2 (C), Group II PLA2 (D), HA (E) and MHC (F). The residues
corresponding to positively selected sites with a correlation coefficient < 0.1 are colored
red, whereas the residues corresponding to the positively selected sites with a
correlation coefficient > 0.1 are colored yellow. The residues colored blue indicate the
catalytic sites for Group I and II PLA2. The ligands of MHC are represented as green
sticks. Two views of the same structure are shown in each row. The view on the left
is oriented to expose as many of the residues colored red as possible. The view on the
right shows the same structure rotated by approximately 180 degrees around the Z-axis,
which runs from the bottom of the page to the top.
Finally, I would like to conclude this manuscript by describing a possible extension of my
profile comparison as future work. Recently, several groups, including ours, have
investigated positive selection from a structural viewpoint, although the number of such
studies is still small. For example, Meyer and Wilke (2013) and Echave et al. (2016)
described the possibility that even positively selected codon sites have dN/dS < 1.0 due to
structural constraints. This means that ordinary approaches based on the dN/dS ratio are not applicable to
the detection of such codon sites. Echave et al. (2016) argued that a structural baseline
is required to detect such positively selected sites. If the substitution rate at a site is
greater than the baseline, the site is regarded as a positively selected site, although there
is no concrete method to generate such a baseline at this stage. During the research
described in this manuscript, I found that some positively selected sites identified in this
study seem to be under relatively strong structural constraint, since the correlation
coefficients of the profile comparison at such sites were 0.5 or more. This observation is
the opposite of the case suggested by Echave et al. (2016). That is, amino acid substitution
could be accelerated in accordance with the structural constraints. There are two
interpretations of my observation. For the first interpretation, consider the antigenic
determinants of an antigen. If the purpose of the amino acid substitution is to avoid
detection by the host antibodies, replacement in accordance with the structural constraint
would be enough for the survival of the pathogen. The second interpretation may be
related to a report by Dasmeh et al. (2014), in which the relationship between the
protein stability and the dN/dS ratio is examined through the simulated evolution of
myoglobin in silico. In their simulation, they observed that positive selection can
occur to maintain structural stability when the structural stability is
low. The positively selected sites with high correlation coefficients detected in this
study may contribute to the structural stability as suggested by Dasmeh et al. (2014).
The profile comparison could be used to further investigate the mechanism of
positive selection at sites with high correlation coefficients between the profiles. The study of
positive selection from a structural viewpoint has only just begun. The introduction of
structural information would shed light on the study of positive selection and other
topics of molecular evolution.
Acknowledgments
I would like to express my greatest appreciation to my supervisor, Professor
Hiroyuki Toh, for providing me with the opportunity to conduct this research and for his guidance
from the beginning to the completion of this research. My heartfelt appreciation goes to
Professor Motonori Ota for providing the program used to calculate the 3D profiles in this study
and for his useful advice. I owe my deepest gratitude to Professor Daisuke Kohda for advising
me as a chief investigator and for giving me the opportunity to conduct this research in his
laboratory from April 1 to September 30, 2011. I am very grateful to everyone
belonging to Professor Toh's laboratory and Professor Kohda's laboratory, in particular
Drs. Kazutaka Katoh, Tetsuya Sato, and Fredrik Johansson for their valuable comments
and discussion on my computational analyses.
References
Aguileta, G., Refregier, G., Yockteng, R., Fournier, E., and Giraud, T. (2009). Rapidly
evolving genes in pathogens: Methods for detecting positive selection and examples
among fungi, bacteria, viruses and protists. Infection, Genetics and Evolution 9,
656-670.
Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., and
Lipman, D.J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein
database search programs. Nucleic Acids Research 25, 3389-3402.
Angelis, K., dos Reis, M., and Yang, Z. (2014). Bayesian estimation of
nonsynonymous/synonymous rate ratios for pairwise sequence comparisons.
Mol. Biol. Evol. 31(7), 1902–1913.
Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a
practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57,
289-300.
Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H.,
Shindyalov, I.N., and Bourne, P.E. (2000). The Protein Data Bank. Nucleic Acids
Research 28, 235-242.
Chelliah, V., Chen, L., Blundell, T.L., and Lovell, S.C. (2004). Distinguishing structural
and functional restraints in evolution in order to identify interaction sites. J Mol Biol
342, 1487-1504.
Cheng, G., Qian, B., Samudrala, R., and Baker, D. (2005). Improvement in protein
functional site prediction by distinguishing structural and functional constraints on
protein family evolution using computational design. Nucleic Acids Res 33, 5861-5867.
Chothia, C., and Lesk, A.M. (1986). The relation between the divergence of sequence
and structure in proteins. EMBO J 5, 823-826.
Cochran, W.G. (1954). The combination of estimates from different experiments. Biometrics 10,
101-129.
Corbeil, R. R., and Searle, S. R. (1976). Restricted maximum likelihood (REML)
estimation of variance components in the mixed model. Technometrics, 18, 31–38.
Dagan, T., Talmor, Y., and Graur, D. (2002). Ratios of radical to conservative amino
acid replacement are affected by mutational and compositional factors and may not be
indicative of positive Darwinian selection. Mol Biol Evol 19(7), 1022-1025.
Dasmeh, P., Serohijos, A.W.R., Kepp, K.P., and Shakhnovich, E.I. (2014). The
influence of selection for protein stability on dN/dS estimations. Genome Biology and
Evolution 6(10), 2956–2967.
Davidson, F.F., and Dennis, E.A. (1990). Evolutionary relationships and implications
for the regulation of phospholipase A2 from snake venom to human secreted forms. J
Mol Evol 31, 228-238.
DerSimonian, R., and Laird, N. (1986). Meta-analysis in clinical trials. Controlled
Clinical Trials, 7, 177–188.
DerSimonian, R., and Laird, N. (2015). Meta-analysis in clinical trials revisited.
Contemporary Clinical Trials, 45, 139–145.
Dutheil, J.Y., Galtier, N., Romiguier, J., Douzery, E.J., Ranwez, V., and Boussau, B.
(2012). Efficient selection of branch-specific models of sequence evolution. Mol. Biol.
Evol. 29, 1861–1874.
Echave, J., Spielman, S.J., and Wilke, C.O. (2016). Causes of evolutionary rate variation
among protein sites. Nature Reviews Genetics 17(2), 109–121.
Elcock, A.H. (2001). Prediction of functionally important residues based solely on the
computed energetics of protein structure. J Mol Biol 312, 885-896.
Fitch, W.M., Bush, R.M., Bender, C.A., and Cox, N.J. (1997). Long term trends in the
evolution of H(3) HA1 human influenza type A. Proc Natl Acad Sci U S A 94,
7712-7718.
Fujisawa, D., Yamazaki, Y., Lomonte, B., and Morita, T. (2008). Catalytically inactive
phospholipase A2 homologue binds to vascular endothelial growth factor receptor-2 via
a C-terminal loop region. Biochem. J. 411, 515–522.
Gharib, W.H., and Robinson-Rechavi, M. (2013). The branch-site test of positive
selection is surprisingly robust but lacks power under synonymous substitution
saturation and variation in GC. Mol. Biol. Evol 30, 1675–1686.
Godzik, A., Kolinski, A., and Skolnick, J. (1992). Topology fingerprint approach to the
inverse protein folding problem. J Mol Biol 227, 227-238.
Hatala, R., Keitz, S., Wyer, P., and Guyatt, G. (2005). Tips for learners of
evidence-based medicine: Assessing heterogeneity of primary studies in systematic reviews and
whether to combine their results. CMAJ 172(5), 661–665.
Henikoff, S., and Henikoff, J.G. (1994). Position-based sequence weights. J Mol Biol
243, 574-578.
Higgins, J.P.T., Thompson, S.G., Deeks, J.J., and Altman, D.G. (2003). Measuring
inconsistency in meta-analyses. BMJ 327, 557–560.
Ho, S.Y., Yu, F.C., Chang, C.Y., and Huang, H.L. (2007). Design of accurate predictors
for DNA-binding sites in proteins using hybrid SVM-PSSM method. Biosystems 90(1),
234-241.
Huang, C., and Yuan, J. (2013). Using radial basis function on the general form of
Chou's pseudo amino acid composition and PSSM to predict subcellular locations of
proteins with both single and multiple sites. Biosystems 113(1), 50-57.
Hughes, A., Ota, T., and Nei, M. (1990). Positive Darwinian selection promotes charge
profile diversity in the antigen-binding cleft of class I major-histocompatibility-complex
molecules. Mol Biol Evol 7(6), 515-524.
Ivanovski, G., Copic, A., Krizaj, I., Gubensek, F., and Pungercar, J. (2000). The amino
acid region 115-119 of ammodytoxins plays an important role in neurotoxicity.
Biochem Biophys Res Commun 276(3), 1229–1234.
Kanehisa, M., and Goto, S. (2000). KEGG: Kyoto Encyclopedia of Genes and
Genomes. Nucleic Acids Res 28(1), 27-30.
Karbat, I., Turkov, M., Cohen, L., Kahn, R., Gordon, D., Gurevitz, M., and Frolow, F.
(2007). X-ray structure and mutagenesis of the scorpion depressant toxin LqhIT2 reveals
key determinants crucial for activity and anti-insect selectivity. J Mol Biol 366,
586–601.
Katoh, K., Misawa, K., Kuma, K., and Miyata, T. (2002). MAFFT: a novel method for
rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res
30, 3059-3066.
Katoh, K., and Toh, H. (2008). Recent developments in the MAFFT multiple sequence
alignment program. Brief Bioinform 9, 286-298.
Kosakovsky Pond, S.L., and Frost, S.D. (2005). Not so different after all: a comparison
of methods for detecting amino acid sites under selection. Mol Biol Evol 22, 1208–
1222.
Kuhlman, B., and Baker, D. (2000). Native protein sequences are close to optimal for
their structures. Proc Natl Acad Sci U S A 97, 10383-10388.
Lynch, V.J. (2007). Inventing an arsenal: adaptive evolution and neofunctionalization of
snake venom phospholipase A2 genes. BMC Evol Biol 7, 2.
Mantel, N., and Haenszel, W. (1959). Statistical aspects of the analysis of data from
retrospective studies of disease. Journal of the National Cancer Institute 22(4),
719–748.
Massingham, T., and Goldman, N. (2005). Detecting amino acid sites under positive
selection and purifying selection. Genetics 169, 1753–1762.
Meyer, A.G., and Wilke, C.O. (2013). Integrating sequence variation and protein
structure to identify sites under selection. Mol Biol Evol 30, 36–44.
Nei, M., and Gojobori, T. (1986). Simple methods for estimating the numbers of
synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol 3, 418-426.
Nielsen, R., and Yang, Z. (1998). Likelihood models for detecting positively selected
amino acid sites and applications to the HIV-1 envelope gene. Genetics 148, 929–936.
Ota, M., Isogai, Y., and Nishikawa, K. (2001). Knowledge-based potential defined for a
rotamer library to design protein sequences. Protein Eng 14, 557-564.
Ota, M., Kinoshita, K., and Nishikawa, K. (2003). Prediction of catalytic residues in
enzymes based on known tertiary structure, stability profile, and sequence conservation.
J Mol Biol 327, 1053-1064.
Porter, C., Bartlett, G., and Thornton, J.M. (2004). The Catalytic Site Atlas: a resource
of catalytic sites and residues identified in enzymes using structural data. Nucleic Acids
Res 32, D129-133.
Prijatelj, P., Copic, A., Krizaj, I., Gubensek, F., and Pungercar, J. (2000). Charge
reversal of ammodytoxin A, a phospholipase A2-toxin, does not abolish its
neurotoxicity. Biochem J 352, 251–255.
Rand, D., Weinreich, D., and Cezairliyan, B. (2000). Neutrality tests of
conservative-radical amino acid changes in nuclear- and mitochondrially-encoded proteins. Gene
261(1), 115-125.
Redelings, B. (2014). Erasing Errors due to Alignment Ambiguity When Estimating
Positive Selection. Mol. Biol. Evol 31(8), 1979–1993.
Rodríguez de la Vega, R.C., and Possani, L.D. (2005). Overview of scorpion toxins
specific for Na+ channels and related peptides: biodiversity, structure-function
relationships and evolution. Toxicon 46, 831-844.
Russell, R.B., and Barton, G.J. (1994). Structural features can be unconserved in
proteins with similar folds. An analysis of side-chain to side-chain contacts secondary
structure and accessibility. J Mol Biol 244, 332-350.
Saitou, N., and Nei, M. (1987). The neighbor-joining method: a new method for
reconstructing phylogenetic trees. Mol Biol Evol 4, 406-425.
Sato, K., Morishita, T., Nobusawa, E., Tonegawa, K., Sakae, K., Nakajima, S., and
Nakajima, K. (2004). Amino-acid change on the antigenic region B1 of H3
haemagglutinin may be a trigger for the emergence of drift strain of influenza A virus.
Epidemiol Infect 132, 399–406.
Suzuki, Y. (2007). Inferring natural selection operating on conservative and radical
substitution at single amino acid sites. Genes Genet Syst 82(4), 341-360.
Suzuki, Y. (2013). Detection of positive selection eliminating effects of structural
constraints in hemagglutinin of H3N2 human influenza A virus. Infection, Genetics and
Evolution 16, 93-98.
Suzuki, Y., and Gojobori, T. (1999). A method for detecting positive selection at single
amino acid sites. Mol Biol Evol 16, 1315-1328.
Tamura, K., Dudley, J., Nei, M., and Kumar, S. (2007). MEGA4: Molecular
Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol 24(8),
1596-1599.
Tian, C., Yuan, Y., and Zhu, S. (2008). Positively selected sites of scorpion depressant
toxins: possible roles in toxin functional divergence. Toxicon 51, 555-562.
Viechtbauer, W. (2010). Conducting Meta-Analyses in R with the metafor Package. J
Stat Software 36(3), 1-48.
Weinberger, H., Moran, Y., Gordon, D., Turkov, M., Kahn, R., and Gurevitz, M.
(2010). Positions under positive selection--key for selectivity and potency of scorpion
alpha-toxins. Mol Biol Evol 27, 1025-1034.
Wilson, D.J., and McVean, G. (2006). Estimating diversifying selection and functional
constraint in the presence of recombination. Genetics 172, 1411-1425.
Yang, Z. (2006). Computational molecular evolution. Oxford Series in Ecology and
Evolution.
Yang, Z. (2007). PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol
Evol 24, 1586-1591.
Yang, Z., and Nielsen, R. (2002). Codon substitution models for detecting molecular
adaptation at individual sites along specific lineages. Mol Biol Evol 19, 908-917.
Yang, Z., Nielsen, R., Goldman, N., and Pedersen, A.-M.K. (2000). Codon-substitution
models for heterogeneous selection pressure at amino acid sites. Genetics 155, 431–449.
Yang, Z., and Swanson, W. (2002). Codon-substitution models to detect adaptive
evolution that account for heterogeneous selective pressures among site classes. Mol
Biol Evol 19(1), 49-57.
Yang, Z., Wong, W.S.W., and Nielsen, R. (2005). Bayes empirical Bayes Inference of
amino acid sites under positive selection. Mol Biol Evol 22, 1107-1118.
Yusuf, S., Peto, R., Lewis, J., Collins, R., and Sleight, P. (1985). Beta blockade during
and after myocardial infarction: an overview of the randomized trials. Prog Cardiovasc
Dis 27, 335–371.
Zaheri, M., Dib, L., and Salamin, N. (2014). A Generalized Mechanistic Codon Model.
Mol. Biol. Evol. 31(9), 2528–2541.
Zhang, J. (2000). Rates of conservative and radical nonsynonymous nucleotide
substitutions in mammalian nuclear genes. J Mol Evol 50(1), 56-68.