Discussion - Analysis of positive selection using protein tertiary structure information

Table 8. The results of the meta-analyses for the six protein families under positive selection by the DerSimonian-Laird method, the restricted maximum-likelihood method, the Peto method and the Mantel-Haenszel method.

therefore referred to as functional constraints. The relative intensities of the structural and functional constraints differ from site to site. The active sites registered in CSA did not always have low correlation coefficients. At such sites, the structural constraint may be consistent with the functional constraint. Despite this inconsistency among the sites, the integrated analysis by meta-analysis suggested that the active sites tend to have low correlation coefficients. Likewise, the correlation coefficients of some positively selected sites did not fall into the tail region. For some sites, accelerating amino acid substitutions within the structural constraints may be sufficient for positive selection. For example, the substitution of an amino acid residue with a physicochemically similar residue at an antigenic determinant site may be enough to escape recognition by an antibody. However, the meta-analyses suggested that the correlation coefficients for the positively selected sites tend to be lower than those for the non-positively selected sites. In other words, the acceleration of amino acid substitutions at the positively selected sites tends to occur in a manner that is subject to the functional constraints.
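The pooling step behind Table 8 can be illustrated with a minimal sketch of the DerSimonian-Laird random-effects estimator. The sketch assumes that each protein family is summarized as a 2x2 table of site counts and that the effect size is the log odds ratio; the function name, the table layout and the continuity correction are assumptions made for illustration and are not taken from the analysis described here.

```python
import math

def dersimonian_laird(tables):
    """Pool log odds ratios from 2x2 tables with the DerSimonian-Laird estimator.

    Each table is (a, b, c, d) with hypothetical counts, e.g.
    a = positively selected sites inside the low-correlation tail,
    b = positively selected sites outside the tail,
    c, d = the same two counts for the remaining sites.
    Returns (pooled_log_odds_ratio, standard_error, tau2).
    """
    effects, variances = [], []
    for a, b, c, d in tables:
        if 0 in (a, b, c, d):
            # Continuity correction so that the log odds ratio is defined.
            a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
        effects.append(math.log((a * d) / (b * c)))
        variances.append(1 / a + 1 / b + 1 / c + 1 / d)

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))

    # Method-of-moments estimate of the between-family variance tau^2.
    k = len(tables)
    c_term = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c_term) if k > 1 and c_term > 0 else 0.0

    # Random-effects weights and pooled estimate.
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, se, tau2


# Hypothetical counts for three families, for illustration only.
example = [(12, 30, 40, 400), (5, 22, 18, 310), (9, 15, 25, 260)]
pooled, se, tau2 = dersimonian_laird(example)
print(math.exp(pooled), se, tau2)
```

A pooled odds ratio above 1 in this layout would correspond to positively selected sites being over-represented in the low-correlation tail. The restricted maximum-likelihood, Peto and Mantel-Haenszel methods differ in how the weights and the between-family variance are obtained and are not covered by this sketch.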

The positively selected sites, which are listed in Table 5, are mapped on the tertiary structures of α-toxin, β-toxin, Group I PLA2, Group II PLA2, HA and MHC (see Figure 6). In the figure, the positively selected residues whose correlation coefficient between the two profiles is < 0.1 are colored red. The residues colored red are largely concentrated on the same face of the structure in every case except α-toxin (see Figure 6). For the α-toxin, four residues, corresponding to positions 11, 18, 38 and 40 of PDB ID:2ASC, are colored red (see Figure 6 (A)). Of these, the residues at positions 11, 18 and 40 have been reported to affect the toxin activity (Gordon et al., 2007; Weinberger et al., 2010). Furthermore, position 40 is reported to be involved in toxin selectivity as well as toxin activity (Weinberger et al., 2010). For the β-toxin, my analysis detected only two residues, corresponding to positions 15 and 16 of PDB ID:2I61. These residues are also colored red in Figure 6 (B). They have been reported to be important for both toxin activity and selectivity for the insect voltage-gated sodium channel (Karbat et al., 2007; Tian et al., 2008). Most of the residues colored red are located around the active sites (blue sites) in both Group I and Group II PLA2 (see Figure 6 (C) and (D)). Both groups of enzymes are used as toxins by snakes. The positive selection has been suggested to be related to adaptation to changes of prey in different habitats (Lynch, 2007). In addition, some of the residues detected in Group II PLA2 are located at or near the C-terminal region (see Figure 6 (D)). The C-terminal region, which corresponds to positions 115-133 of PDB ID:1OZ6, has been reported to be involved in the toxin action (Prijatelj et al., 2000; Ivanovski et al., 2000; Fujisawa et al., 2008). Three residues of HA are colored red (see Figure 6 (E)). Of these, two residues, corresponding to positions 156 and 159 of PDB ID:2VIU, are located in the antigenic region B1. This region, which consists of positions 150-170 of PDB ID:2VIU, has been reported to play an important role in evading antibody attack (Sato et al., 2004). Most of the residues colored red in MHC are located at or around the ligand binding sites in the tertiary structure (see Figure 6 (F)). Thus, examining the individual positively selected residues with correlation coefficients < 0.1 suggested that substitutions under a constraint different from the structural one are effective for such adaptation or selectivity.
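The per-site classification used in this paragraph can be sketched as follows. The sketch assumes that the two profiles compared at a site are numeric vectors of equal length (for example, one value per amino acid type) and that the correlation coefficient is Pearson's; both the choice of coefficient and the function names are assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(vx * vy)

def figure6_color(r, threshold=0.1):
    """Color assigned to a positively selected site from its correlation r."""
    # Red for r < 0.1 and yellow for r > 0.1 follow the Figure 6 legend;
    # assigning r == 0.1 to yellow is an assumption of this sketch.
    return "red" if r < threshold else "yellow"
```

A site whose two profiles agree closely (r near 1) would be labeled yellow, whereas a site whose observed substitutions deviate from the structure-derived preferences (r below 0.1, including negative values) would be labeled red.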

As described above, the sites under positive selection do not always have a low correlation coefficient between the profiles. Therefore, it would be difficult to use this tendency to predict positively selected sites. Rather, my method seems to connect the two schools of methods for detecting positive selection. Several different methods are available to detect positive selection at a single amino acid site in proteins. Roughly speaking, the methods are classified into two approaches. One approach uses the ratio of non-synonymous to synonymous substitution rates, or an estimate of the ω ratio, as an indicator of positive selection, and has been developed by introducing statistical approaches (Fitch et al., 1997; Nielsen and Yang, 1998; Suzuki and Gojobori, 1999; Yang et al., 2000; Kosakovsky Pond and Frost, 2005; Massingham and Goldman, 2005). The PAML program package (Yang, 2007) used in this study is a representative tool for the identification of codons under positive selection. In contrast, the other approach uses the ratio of radical to conservative amino acid substitution rates as a measure of positive selection (Hughes et al., 1990; Rand et al., 2000; Zhang, 2000; Suzuki, 2007). Dagan et al. (2002) suggested that the ratio of radical to conservative substitutions may not be a good indicator of positive selection, since the ratio is affected by the transition-transversion rate ratio and the amino acid composition. In addition, in these approaches the radical and conservative substitutions are defined based on the physicochemical characteristics of the amino acid residues, and the structural aspects have been neglected. Even substitutions between amino acids belonging to the same physicochemical group may be radical, and substitutions between amino acids in different physicochemical groups may be conservative, depending on the structural context. In this study, I examined the amino acid replacements under positive selection indicated by the ω ratio, in which neither radical-conservative substitutions nor structural-functional constraints are taken into account. Although such aspects were excluded, the profile comparison suggested that the amino acid substitutions under positive selection may have occurred in a manner that deviates from the structural constraints. If the amino acid substitutions that follow the structural constraints at a site are regarded as conservative, and those that deviate from the structural constraints as radical, then my study can be considered an attempt to connect the approaches based on the ω ratio to those based on the radical-conservative ratio. The key is the introduction of structural information. Recently, Suzuki (2013) extended his work by using structural information, in which the thermodynamic stability obtained by an in silico method is used instead of the physicochemical group. Considering the accumulation of coordinate data for proteins, the use of structural information will shed light on studies of positive selection.
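For reference, the two indicators contrasted above can be written compactly. In the notation below, d_N and d_S are the non-synonymous and synonymous substitution rates; the symbols d_R, d_C and ρ for the radical rate, the conservative rate and their ratio are introduced here only for illustration and do not appear in the cited works in this form.

```latex
\omega = \frac{d_N}{d_S},
\qquad
\begin{cases}
\omega > 1 & \text{positive selection} \\
\omega = 1 & \text{neutral evolution} \\
\omega < 1 & \text{purifying selection}
\end{cases}
\qquad\qquad
\rho = \frac{d_R}{d_C}
```

In the second class of methods, an excess of radical over conservative substitutions (ρ > 1 in this notation) is taken as a sign of positive selection. The connection proposed in this study amounts to redefining "radical" and "conservative" by deviation from, or agreement with, the structural constraints rather than by physicochemical group.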

Figure 6. Mapping of the amino acid residues corresponding to positively selected sites.

The positively selected sites are mapped on the tertiary structures of α-toxin (A), β-toxin (B), Group I PLA2 (C), Group II PLA2 (D), HA (E) and MHC (F). The residues corresponding to positively selected sites with a correlation coefficient < 0.1 are colored red, whereas those with a correlation coefficient > 0.1 are colored yellow. The residues colored blue indicate the catalytic sites of Group I and II PLA2. The ligands of MHC are represented as green sticks. Two views of the same structure are shown in each row. The view on the left is oriented to show as many of the red residues as possible; the view on the right is obtained by rotating it by approximately 180 degrees around the Z-axis, which runs from the bottom of the page to the top.

Finally, I would like to conclude this manuscript by describing a possible extension of my profile comparison as future work. Recently, several groups, including ours, have investigated positive selection from a structural viewpoint, although the number of such studies is still small. For example, Meyer and Wilke (2013) and Echave et al. (2016) described the possibility that even positively selected codon sites have dN/dS < 1.0 due to structural constraints. This means that ordinary approaches based on the ω ratio are not applicable to the detection of such codon sites. Echave et al. (2016) argued that a structural baseline is required to detect such positively selected sites. If the substitution rate at a site is greater than the baseline, the site is regarded as positively selected, although there is no concrete method to generate such a baseline at this stage. During the research described in this manuscript, I found that some positively selected sites identified in this study seem to be under relatively strong structural constraint, since the correlation coefficients of the profile comparison at these sites were 0.5 or more. This observation is the inverse of the case suggested by Echave et al. (2016). That is, amino acid substitution could be accelerated in accordance with the structural constraints. There are two interpretations of my observation. For the first interpretation, consider the antigenic determinants of an antigen. If the purpose of the amino acid substitution is to avoid detection by the host antibodies, replacements that follow the structural constraint would be enough for the survival of the pathogen. The second interpretation may be related to a report by Dasmeh et al. (2014), in which the relationship between protein stability and the dN/dS ratio was examined by simulating the evolution of myoglobin in silico. In their simulation, they observed that positive selection would occur to maintain structural stability when the structural stability is low. The positively selected sites with high correlation coefficients detected in this study may contribute to structural stability, as suggested by Dasmeh et al. (2014). The profile comparison could be used to further investigate the mechanism of positive selection at sites with high correlation coefficients between the profiles. The study of positive selection from a structural viewpoint has only just begun. The introduction of structural information would shed light on the study of positive selection and other topics in molecular evolution.

Acknowledgments

I would like to express my greatest appreciation to my supervisor, Professor Hiroyuki Toh, for providing me with the opportunity to conduct this research and for his guidance from its beginning to its completion. My heartfelt appreciation goes to Professor Motonori Ota for giving me a program to calculate the 3D profiles used in this study and for his useful advice. I owe my deepest gratitude to Professor Daisuke Kohda for advising me as a chief investigator and for giving me the opportunity to conduct this research in his laboratory from April 1 to September 30, 2011. I am very grateful to everyone in Professor Toh's and Professor Kohda's laboratories, in particular Drs. Kazutaka Katoh, Tetsuya Sato, and Fredrik Johansson, for their valuable comments and discussion on my computational analyses.

References

Aguileta, G., Refregier, G., Yockteng, R., Fournier, E., and Giraud, T. (2009). Rapidly evolving genes in pathogens: Methods for detecting positive selection and examples among fungi, bacteria, viruses and protists. Infection, Genetics and Evolution 9, 656-670.

Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25, 3389-3402.

Angelis, K., dos Reis, M., and Yang, Z. (2014). Bayesian estimation of nonsynonymous/synonymous rate ratios for pairwise sequence comparisons. Mol Biol Evol 31(7), 1902-1913.

Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Statist Soc Ser B 57, 289-300.

Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., and Bourne, P.E. (2000). The Protein Data Bank. Nucleic Acids Research 28, 235-242.

Chelliah, V., Chen, L., Blundell, T.L., and Lovell, S.C. (2004). Distinguishing structural and functional restraints in evolution in order to identify interaction sites. J Mol Biol 342, 1487-1504.

Cheng, G., Qian, B., Samudrala, R., and Baker, D. (2005). Improvement in protein functional site prediction by distinguishing structural and functional constraints on protein family evolution using computational design. Nucleic Acids Res 33, 5861-5867.

Chothia, C., and Lesk, A.M. (1986). The relation between the divergence of sequence and structure in proteins. EMBO J 5, 823-826.

Cochran, W.G. (1954). The combination of estimates from different experiments. Biometrics 10, 101-129.

Corbeil, R.R., and Searle, S.R. (1976). Restricted maximum likelihood (REML) estimation of variance components in the mixed model. Technometrics 18, 31-38.

Dagan, T., Talmor, Y., and Graur, D. (2002). Ratios of radical to conservative amino acid replacement are affected by mutational and compositional factors and may not be indicative of positive Darwinian selection. Mol Biol Evol 19(7), 1022-1025.

Dasmeh, P., Serohijos, A.W.R., Kepp, K.P., and Shakhnovich, E.I. (2014). The influence of selection for protein stability on dN/dS estimations. Genome Biology and Evolution 6(10), 2956-2967.

Davidson, F.F., and Dennis, E.A. (1990). Evolutionary relationships and implications for the regulation of phospholipase A2 from snake venom to human secreted forms. J Mol Evol 31, 228-238.

DerSimonian, R., and Laird, N. (1986). Meta-analysis in clinical trials. Controlled Clinical Trials 7, 177-188.

DerSimonian, R., and Laird, N. (2015). Meta-analysis in clinical trials revisited. Contemporary Clinical Trials 45, 139-145.

Dutheil, J.Y., Galtier, N., Romiguier, J., Douzery, E.J., Ranwez, V., and Boussau, B. (2012). Efficient selection of branch-specific models of sequence evolution. Mol Biol Evol 29, 1861-1874.

Echave, J., Spielman, S.J., and Wilke, C.O. (2016). Causes of evolutionary rate variation among protein sites. Nature Reviews Genetics 17(2), 109-121.

Elcock, A.H. (2001). Prediction of functionally important residues based solely on the computed energetics of protein structure. J Mol Biol 312, 885-896.

Fitch, W.M., Bush, R.M., Bender, C.A., and Cox, N.J. (1997). Long term trends in the evolution of H(3) HA1 human influenza type A. Proc Natl Acad Sci U S A 94, 7712-7718.

Fujisawa, D., Yamazaki, Y., Lomonte, B., and Morita, T. (2008). Catalytically inactive phospholipase A2 homologue binds to vascular endothelial growth factor receptor-2 via a C-terminal loop region. Biochem J 411, 515-522.

Gharib, W.H., and Robinson-Rechavi, M. (2013). The branch-site test of positive selection is surprisingly robust but lacks power under synonymous substitution saturation and variation in GC. Mol Biol Evol 30, 1675-1686.

Godzik, A., Kolinski, A., and Skolnick, J. (1992). Topology fingerprint approach to the inverse protein folding problem. J Mol Biol 227, 227-238.

Hatala, R., Keitz, S., Wyer, P., and Guyatt, G. (2005). Tips for learners of evidence-based medicine: Assessing heterogeneity of primary studies in systematic reviews and whether to combine their results. CMAJ 172(5), 661-665.

Henikoff, S., and Henikoff, J.G. (1994). Position-based sequence weights. J Mol Biol 243, 574-578.

Higgins, J.P.T., Thompson, S.G., Deeks, J.J., and Altman, D.G. (2003). Measuring inconsistency in meta-analyses. BMJ 327, 557-560.

Ho, S.Y., Yu, F.C., Chang, C.Y., and Huang, H.L. (2007). Design of accurate predictors for DNA-binding sites in proteins using hybrid SVM-PSSM method. Biosystems 90(1), 234-241.

Huang, C., and Yuan, J. (2013). Using radial basis function on the general form of Chou's pseudo amino acid composition and PSSM to predict subcellular locations of proteins with both single and multiple sites. Biosystems 113(1), 50-57.

Hughes, A., Ota, T., and Nei, M. (1990). Positive Darwinian selection promotes charge profile diversity in the antigen-binding cleft of class I major-histocompatibility-complex molecules. Mol Biol Evol 7(6), 515-524.

Ivanovski, G., Copic, A., Krizaj, I., Gubensek, F., and Pungercar, J. (2000). The amino acid region 115-119 of ammodytoxins plays an important role in neurotoxicity. Biochem Biophys Res Commun 276(3), 1229-1234.

Kanehisa, M., and Goto, S. (2000). KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res 28(1), 27-30.

Karbat, I., Turkov, M., Cohen, L., Kahn, R., Gordon, D., Gurevitz, M., and Frolow, F. (2007). X-ray structure and mutagenesis of the scorpion depressant toxin LqhIT2 reveals key determinants crucial for activity and anti-insect selectivity. J Mol Biol 366, 586-601.

Katoh, K., Misawa, K., Kuma, K., and Miyata, T. (2002). MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 30, 3059-3066.

Katoh, K., and Toh, H. (2008). Recent developments in the MAFFT multiple sequence alignment program. Brief Bioinform 9, 286-298.

Kosakovsky Pond, S.L., and Frost, S.D. (2005). Not so different after all: a comparison of methods for detecting amino acid sites under selection. Mol Biol Evol 22, 1208-1222.

Kuhlman, B., and Baker, D. (2000). Native protein sequences are close to optimal for their structures. Proc Natl Acad Sci U S A 97, 10383-10388.

Lynch, V.J. (2007). Inventing an arsenal: adaptive evolution and neofunctionalization of snake venom phospholipase A2 genes. BMC Evol Biol 7, 2.

Mantel, N., and Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute 22(4), 719-748.

Massingham, T., and Goldman, N. (2005). Detecting amino acid sites under positive selection and purifying selection. Genetics 169, 1753-1762.

Meyer, A.G., and Wilke, C.O. (2013). Integrating sequence variation and protein structure to identify sites under selection. Mol Biol Evol 30, 36-44.

Nei, M., and Gojobori, T. (1986). Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol 3, 418-426.

Nielsen, R., and Yang, Z. (1998). Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 148, 929-936.

Ota, M., Isogai, Y., and Nishikawa, K. (2001). Knowledge-based potential defined for a rotamer library to design protein sequences. Protein Eng 14, 557-564.

Ota, M., Kinoshita, K., and Nishikawa, K. (2003). Prediction of catalytic residues in enzymes based on known tertiary structure, stability profile, and sequence conservation. J Mol Biol 327, 1053-1064.

Porter, C., Bartlett, G., and Thornton, J.M. (2004). The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data. Nucleic Acids Res 32, D129-133.

Prijatelj, P., Copic, A., Krizaj, I., Gubensek, F., and Pungercar, J. (2000). Charge reversal of ammodytoxin A, a phospholipase A2-toxin, does not abolish its neurotoxicity. Biochem J 352, 251-255.

Rand, D., Weinreich, D., and Cezairliyan, B. (2000). Neutrality tests of conservative-radical amino acid changes in nuclear- and mitochondrially-encoded proteins. Gene 261(1), 115-125.

Redelings, B. (2014). Erasing errors due to alignment ambiguity when estimating positive selection. Mol Biol Evol 31(8), 1979-1993.

Rodríguez de la Vega, R.C., and Possani, L.D. (2005). Overview of scorpion toxins specific for Na+ channels and related peptides: biodiversity, structure-function relationships and evolution. Toxicon 46, 831-844.

Russell, R.B., and Barton, G.J. (1994). Structural features can be unconserved in proteins with similar folds. An analysis of side-chain to side-chain contacts, secondary structure and accessibility. J Mol Biol 244, 332-350.

Saitou, N., and Nei, M. (1987). The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4, 406-425.

Sato, K., Morishita, T., Nobusawa, E., Tonegawa, K., Sakae, K., Nakajima, S., and Nakajima, K. (2004). Amino-acid change on the antigenic region B1 of H3 haemagglutinin may be a trigger for the emergence of drift strain of influenza A virus. Epidemiol Infect 132, 399-406.

Suzuki, Y. (2007). Inferring natural selection operating on conservative and radical substitution at single amino acid sites. Genes Genet Syst 82(4), 341-360.

Suzuki, Y. (2013). Detection of positive selection eliminating effects of structural constraints in hemagglutinin of H3N2 human influenza A virus. Infection, Genetics and Evolution 16, 93-98.

Suzuki, Y., and Gojobori, T. (1999). A method for detecting positive selection at single amino acid sites. Mol Biol Evol 16, 1315-1328.

Tamura, K., Dudley, J., Nei, M., and Kumar, S. (2007). MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol 24(8), 1596-1599.

Tian, C., Yuan, Y., and Zhu, S. (2008). Positively selected sites of scorpion depressant toxins: possible roles in toxin functional divergence. Toxicon 51, 555-562.

Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package. J Stat Software 36(3), 1-48.

Weinberger, H., Moran, Y., Gordon, D., Turkov, M., Kahn, R., and Gurevitz, M. (2010). Positions under positive selection--key for selectivity and potency of scorpion alpha-toxins. Mol Biol Evol 27, 1025-1034.

Wilson, D.J., and McVean, G. (2006). Estimating diversifying selection and functional constraint in the presence of recombination. Genetics 172, 1411-1425.

Yang, Z. (2006). Computational molecular evolution. Oxford Series in Ecology and Evolution.

Yang, Z. (2007). PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol 24, 1586-1591.

Yang, Z., and Nielsen, R. (2002). Codon substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol Biol Evol 19, 908-917.

Yang, Z., Nielsen, R., Goldman, N., and Pedersen, A.-M.K. (2000). Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155, 431-449.

Yang, Z., and Swanson, W. (2002). Codon-substitution models to detect adaptive evolution that account for heterogeneous selective pressures among site classes. Mol Biol Evol 19(1), 49-57.

Yang, Z., Wong, W.S.W., and Nielsen, R. (2005). Bayes empirical Bayes inference of amino acid sites under positive selection. Mol Biol Evol 22, 1107-1118.

Yusuf, S., Peto, R., Lewis, J., Collins, R., and Sleight, P. (1985). Beta blockade during and after myocardial infarction: an overview of the randomized trials. Prog Cardiovasc Dis 27, 335-371.

Zaheri, M., Dib, L., and Salamin, N. (2014). A generalized mechanistic codon model. Mol Biol Evol 31(9), 2528-2541.

Zhang, J. (2000). Rates of conservative and radical nonsynonymous nucleotide substitutions in mammalian nuclear genes. J Mol Evol 50(1), 56-68.
