Identification of the VERNALIZATION 4 gene reveals the origin of spring growth habit in ancient wheats from South Asia

(1)

Supplementary Information Appendix

Identification of the VERNALIZATION 4 gene reveals the origin of spring growth habit in ancient wheats from South Asia

Nestor Kippes, Juan M. Debernardi, Hans Vasquez-Gross, Bala A. Akpinar, Budak Hikmet, Kenji Kato, Shiaoman Chao, Eduard Akhunov and Jorge Dubcovsky*

* correspondence to: [email protected]

This file includes:

SI Appendix, Methods S1 to S6 SI Appendix, Figures S1 to S14 SI Appendix, Tables S1 to S8 SI Appendix, References

(2)

SI Appendix, Methods

SI Appendix, Method S1. TILLING Mutants Screening

A mutagenized population of 1,153 lines of TDF treated with 0.9% EMS was screened with three different set of primers using a CelI assay described before (1). The first set of PCR primers (TILL-5DS-F1 and TILL-A-DS-R10, Table S8) included the VRN-D4 specific SNP A367C, and amplified a region from exon 4 to 8 only from VRN-D4. Two additional sets of primers, developed in a previous study (2), were used to amplify different regions of the VRN-D4 gene. Primers V1A-5PRIME-F1 and V1-INT1R amplified a region including exon 1, whereas primers VA1-3PRIME-F2 and VA1-3PRIME-R3 amplified a region from exon 3 to 6. Even though these primers amplified both VRN-D4 and VRN-A1, mutations were still detectable with the CelI assay. To determine which of the two genes carry the detected mutations, we extracted RNA from the first leaf of selected mutants and sequenced the resulting cDNAs. Since only VRN-D4 is expressed in unvernalized plants during the early stages of development (Fig. S3), the mutations detected in the cDNAs from the first leaf were assigned to VRN-D4. The potential effect of these mutations on protein structure and function was predicted using programs PROVEAN (3), SIFT (4), and PolyPhen-2 (5).

VRN-D4 specific primers: Using primers TILL-5DS-F1 and TILL-A-DS-R10 (Table S8) we detected 21 mutations. Ten of these mutations were located in the VRN-D4 coding region, and four of them were predicted to produce amino acid changes. Prediction programs consistently identified mutation E158K in the K-box as having the strongest effect among the mutations detected (Table S1). The same prediction programs suggested limited effects for the other three mutations (A180T, D184N and A219T, Table S1). Based on these results the last three mutations were discarded and only the E158K mutation was selected for further phenotypic

characterization. RNAs were extracted from the first leaf of the E158K mutant line and the presence of the causing mutation in the transcripts was validated by sequencing the

corresponding cDNAs.

VRN-D4 and VRN-A1 specific primers: The screening of the TDF TILLING population with

(3)

within the exon 1 region. Two of these mutations were predicted to generate amino acid changes (E12K and E42K) but, unfortunately, both were located in the VRN-A1 gene.

Primers VA1-3PRIME-F2 and VA1-3PRIME-R3 detected 7 mutants in the exons 3-6 region.

One of these mutations was predicted to produce an amino acid change (E129K) but it was located in the VRN-A1 gene. A second mutation resulted in the loss of the donor splicing site of the fourth exon of VRN-D4. The un-spliced intron sequence is predicted to encode two premature stop codons 8 bp and 17 bp downstream of the induced mutation. The first stop codon is

predicted to truncate the protein after position 143. This premature stop codon eliminates 30% of the K-box and more than 40% of the protein and, therefore, will almost certainly result in a non- functional protein. To confirm the presence of these premature stop codons in the VRN-D4 transcripts we extracted RNA from the first leaf of unvernalized mutant and wildtype TDF plants. Amplified VRN-D4 cDNAs were cloned using NEB® PCR Cloning Kit (New England Biolabs, USA) and sequenced by Sanger sequencing. The presence of the un-spliced intron and the two predicted stop codons was detected in the sequences of 16 independent clones from the TDF mutant plant. The sequenced transcripts included the diagnostic A367C SNP, confirming that they were encoded by VRN-D4. By contrast, all the sequences from the 22 clones from wildtype TDF used as control showed the correctly spliced transcripts with the A367C natural SNP corresponding to VRN-D4.

Visualization of the effect of VRN-D4 mutations: We selected 268 protein sequences from PROVEAN (provean.jcvi.org) using a BLAST search with the VRN-D4 protein as query. These protein sequences were aligned using MEGA 6 (6) (http://www.megasoftware.net/) and the multiple alignment file was used to generate Figure S8 using the program WebLogo

(weblogo.berkeley.edu), in which the size of the letters for the different amino acids are proportional to their frequency in the multiple alignment.

SI Appendix, Method S2. BAC Sequencing and Physical Map

We screened the Minimum Tiling Path (MTP) clones of the 5DS physical map constructed from ditelosomic 5DS stock in Chinese Spring (CS-dt5DS) using three sets of primers for VRN-D4 (Table S8). The 5DS BAC library was developed by the Institute of Experimental Botany AS CR and the physical map was generated by the International Wheat Genome Sequencing Consortium

(4)

(IWGSC, www.wheatgenome.org). We identified one BAC (TaeCsp5DShA_0038_M14) that is part of overlapping contigs CTG87 and CTG61 (Table S2). A total of 12 MTP clones from these two contigs were selected for sequencing. Sequencing libraries were prepared using the KAPA LTP Library Preparation Kit according to manufacturer instructions (KAPA Biosystems, USA) with an average insert size of 250 bp. Four additional libraries were prepared for BACs 9-12 (Table S2) with an average insert size of 400 bp to increase the quality of the assembly in this region. Libraries were barcoded, pooled, and sequenced at the Beijing Genomic Institute (Sacramento, CA, USA) using 100-bp paired-end reads (Illumina Hi Seq2500).

Each BAC was assembled using Abyss v1.5.2 (k=63) and combined libraries with 250 and 400- bp insert sizes to assess library coverage (individual BACs coverage was higher than 100X).

Since extensive overlap existed between adjacent BACs, a better assembly was obtained by analyzing together the Illumina reads from groups of four BACs and performing the assembly with Abyss (k=63). Additional contigs generated from 454 sequencing of the same BACs were used to fill some of the gaps between the Illumina contigs. 454 sequencing was carried out on GS FLX Titanium platform (Roche 454 Life Sciences, Branford, CT, USA) using kits and

instructions provided by the manufacturer. The assembled 680-kb sequence was deposited in GenBank (AC270085).

Genes were annotated using the TriAnnot pipeline (7). The resulting assembly was compared using BLAT with the IWGSC sequence of the 5AL chromosome arm

(https://urgi.versailles.inra.fr/blast/), with the A genome of T. urartu (8) and with three different Ae. tauschii databases (http://plants.ensembl.org/Aegilops_tauschii/, GCA_000347335.1 (9), http://aegilops.wheat.ucdavis.edu/ATGSP/, (10), and

https://urgi.versailles.inra.fr/download/iwgsc/TGAC_WGS_assemblies_of_other_wheat_species, TGAC_WGS_tauschii_v1). Hits were filtered to be larger than 500 bp and more than 99%

identical (Fig. 3A). Results were converted to gff format and visualized using IGV (www.broadinstitute.org/igv/).

SI Appendix, Method S3. Marker for the 5DS/5AL Insertion Site

(5)

insertion site”, Fig. 3B). Upstream of this point, the CS-dt5DS sequence was 99.3% identical to Ae. tauschii contig 5265.1, and downstream of this point it was 99.6 % identical to contig IWGSC_5AL_2795346 (chromosome arm 5AL). We designed primers VRND4-ins.F4

(CGTCTACAGAACCAGCTGAC) and VRND4-ins.R3 (GAAAATCAGGGTGTGACAAA) at both sides of this 5DS/5AL insertion point. These primers amplify a 1,440-bp product only in the lines carrying the 5DS/5AL insertion. As a PCR amplification control, we included a second pair of primers that amplifies a gene located in the centromeric region of chromosome arm 5DL

(BJ315664) (11). Primers BJ315664-F (ACTTGCGTTCCATGTTCACACTTGTAGG) and BJ315664-R (GCGCTCCTGGCAGCAGTCTC) amplify a 687-bp product in all lines.

The other border of the 5DS/5AL insertion was identified within a 1.7-kb region located 201,341 to 203,027 bp downstream of VRN-D4 (henceforth “downstream insertion site”, Fig 3B). The CS-dt5DS sequences flanking this region are 98.7% identical to 5AL contig 2801682 and 99.8%

identical to Ae. tauschii contig 280242 (TGAC_WGS_tauschii_v1). We used primers VRND4- ins2.F1 (TATCTACACCCAGCGAGAGA) and VRND4-ins2.R1 (GGCCTACCAAAAACAAAAGT) to amplify a PCR product of 1,283bp. As a positive control we use primers for BE606654, a gene located in chromosome arm 5DS (11). Primers BE606654-F (ACCGACAAGAACTCCTAGAT) and BE606654-R (CAAAAGCATCGCAGAGAAACAC), amplify a 534-bp product in all lines.

PCR reactions were conducted in a total volume of 20 μl containing 10 mM Tris–HCl, PH 8.3, 50 mM KCl, 2 mM MgCl2, 0.2 mM dNTP, 0.5 μM of each primer for the insertion and 0.05 μM of each control primer, 50–100 ng of genomic DNA, and 0.5 U Taq DNA polymerase. PCR amplification was performed as follows. 94°C for 5 m, 8 cycles of initial touch down starting at 65°C (-0.5°C per cycle), followed by 32 cycles of 94 °C for 30 s, 61 °C for 30 s, and 72 °C for 90 s, with a final extension at 72 °C for 7 min.

SI Appendix, Method S4. RNA-EMSA

RNA probes (27 nt and 34 nt) biotinylated at the 3’ were acquired from SIGMA-ALDRICH (St.

Louis, MO USA). Purified RNase-free proteins GST-TaGRP2 and GST (as control) were used in the binding assays. GST-TaGRP2 was cloned in the pGEX-6P-1 vector, expressed in E. coli, and purified using GST SpinTrap Purification Module (GE Healthcare) (Fig. S9A). RNA-

Electrophoretic Mobility Shift Assays (EMSA) were performed according to the LightShift®

(6)

Chemiluminescent RNA EMSA Kit instructions (Pierce, USA), including 5% DMSO in the binding reaction. To avoid variation in protein quantity between binding reactions, a master mix with GST-TaGRP2 and all the binding components, except RNA probes, was prepared and aliquoted. We also included a pre-incubation step of the free RNA probes for 3 minutes at 80 °C to favor complete denaturation of the RNA structure. Samples were then allowed to re-nature slowly at room temperature before adding to the binding reaction. A separate experiment was performed by transferring the samples from room temperature to 4°C and performing the binding reaction at this temperature to simulate vernalization conditions. Probe sequences are listed in Table S8. RNA structure for the 27-nt and 34-nt probes and for the larger flanking region

(including 200 nt from each side of the T. monococcum RIP-3) were generated with the program RNAfold (12) using default parameters.

SI Appendix, Method S5. Population Structure analysis

To determine if the absence of VRN-D4 in two lines classified as T. aestivum ssp.

sphaerococcum from China (CItr 8610 and CItr 10911, Supplemental Table S6) was the result of introgression from other T. aestivum subspecies, we performed a population structure analysis.

The analysis included 100 T. aestivum lines from six different subspecies: T. aestivum ssp.

aestivum (10 spring and 10 winter accessions), T. aestivum ssp. compactum (17 accession), T.

aestivum ssp. macha (15 accession), T. aestivum ssp. spelta (15 accession), and T. aestivum ssp.

sphaerococcum (33 accessions, including the two conflictive accessions from China) (Table S6).

A total of 16,371 polymorphic SNP were used to determine the population structure (13), using a model-based method (Bayesian clustering) implemented in STRUCTURE v.2.3.4 (13). The SNP data for the 100 accessions listed in Table S6 was deposited in the Triticeae Toolbox website (https://triticeaetoolbox.org/wheat/). The number of subpopulations (K) tested varied from 2 to 8.

For each K value 10 runs were repeated separately, starting with a burn-in period of 5,000 iterations, and followed by a sampling phase of 10,000 replicates. To detect the numbers of groups that better fit with the population structure, a plot of the mean likelihood values per K was produce using STRUCTURE HARVESTER v 0.6.93 (14). The kinship matrix was

(7)

SI Appendix, Method S6. Genetic Diversity

Variability at each locus was determined using the Polymorphism Index Content (PIC) (15). PIC values were calculated separately for each of the five subspecies.





 ⁿ

i

pi

PIC 1 ² (where piis the frequency of the i^th allele)

Genetic diversity was estimated using Weir’s formula (16) that is equivalent to an average PIC value,







l i

pi

D L1 ²

1

where p is the frequency of the i allele at the l locus and L is the number of loci. PIC values for each subspecies were averaged across genomes (Table S7), chromosomes (Fig. S12), and chromosome regions (Fig. 5E). To reduce the effect of ascertainment bias in the last two parameters, the average diversity per chromosome or chromosome region (for each of the

subspecies) was divided by the corresponding genome average in Table S7. This ratio is referred to as “standardized genetic diversity”. The values from Table S7 cannot be used to compare diversity levels among subspecies because many of the SNPs used in this study were specifically selected for higher minor allele frequencies in T. aestivum ssp. aestivum (17). These values were calculated only to standardize the average genetic diversity values per chromosome and

chromosome region. These studies were performed using a subset of 14,236 SNP for which chromosome locations were available.

(8)

SI Appendix, Figures

Figure S1. Levels of expression of the three VRN1 homoeologs. Expression of the three different VRN1 homoeologs in leaves. (A) Three-week old plants. (B) Five-week old plants. The results for week 1 are presented in the main text (Fig. 1). Bars represent means calculated from eight biological replications. Error bars represent standard errors of the mean (SEM). Growth habit and alleles for the different lines are: TDC (winter, vrn-A1 vrn-B1 vrn-D1 vrn-D4), TDF (spring, vrn-A1 vrn-B1 vrn-D1 Vrn-D4), TDD (spring, Vrn-A1 vrn-B1 vrn-D1 vrn-D4), and BW (Bobwhite, spring, Vrn-A1 Vrn-B1 Vrn-D1 vrn-D4). NS = non-significant, * = P < 0.05, ** = P

< 0.01, and *** = P < 0.001.

A B

(9)

Figure S2. Detection of the VRN-D4 A367C SNP. (A) Sequencing chromatograms showing the A367C polymorphism in lines carrying the dominant Vrn-D4 allele (TDF and Gabo) and its absence in the lines carrying the recessive vrn-D4 allele (CS5402, CS: Chinese Spring and TDC). (B) Chromatograms including the A367C polymorphisms in plants of the TDF x CS5402 high-density mapping population with the closest recombination events to VRN-D4 (10). (C) Marker for the 5AL/5DS insertion site upstream of VRN-D4 (1,440-bp band). (D) Marker for the 5AL/5DS insertion site downstream of VRN-D4 (1,283-bp band). Note that all lines carrying the A367C polymorphism show amplification of the 1,440-bp and the 1,283bp band. H= progeny segregating for growth habit (heterozygous VRN-D4 vrn-D4), early= all progeny early

(homozygous Vrn-D4), late= all progeny late (homozygous vrn-D4), control= PCR amplification from a separate set of primers that produced a smaller PCR product in all lines (Supplementary Method 3).

A B

C D

(10)

Figure S3. Expression of VRN1 paralogs from chromosome arms 5AL (VRN-A1) and 5DS (temporary name VRN-5DS). RNA was extracted one, two, and three weeks after germination from leaves of TDC (winter, vrn-A1 vrn-B1 vrn-D1 vrn-5DS), TDF (spring, vrn-A1 vrn-B1 vrn- D1 Vrn-5DS), TDD (Vrn-A1 vrn-B1 vrn-D1 vrn-5DS) and BW (Bobwhite, Vrn-A1 Vrn-B1 Vrn- D1 vrn-5DS). RT-PCR with primers cDNA-VRN1-A-5DS forward and reverse (Table S8) amplified both VRN-A1 and VRN-5DS. The two transcripts can be differentiated by the A367C polymorphism, which generates an additional BstNI restriction site in VRN-5DS. After BstNI digestion, the 251-bp fragment detected in VRN-A1 (arrowhead) is cut into two fragments of 195 bp and 56 bp in VRN-5DS (arrows). In TDF, where VRN-5DS and vrn-A1 are together, only VRN-5DS transcripts (“C” variant at position 367) are detected during the first three weeks. At five weeks, a faint band for VRN-A1 is also detected in TDF (“A” variant at position 367). TDC has a winter growth habit so no VRN1 gene is expressed. Three biological repetitions per genotype were tested. ACTIN (18) was used as control for PCR amplification and to confirm similar cDNA levels in all the reactions. These results support the hypothesis that VRN-5DS is VRN-D4.

(11)

VRN-D4 1 MGRGKVQLKRIENKINRQVTFSKRRSGLLKKAHEISVLCDAEVGLIIFSTKGKLYEFSTESCMDKILERYERYSYAEKVLVS 82 Chinese_Spring 1 MGRGKVQLKRIENKINRQVTFSKRRSGLLKKAHEISVLCDAEVGLIIFSTKGKLYEFSTESCMDKILERYERYSYAEKVLVS 82 Jagger-JQ915055.1 1 MGRGKVQLKRIENKINRQVTFSKRRSGLLKKAHEISVLCDAEVGLIIFSTKGKLYEFSTESCMDKILERYERYSYAEKVLVS 82 Claire-JF965395.1 1 MGRGKVQLKRIENKINRQVTFSKRRSGLLKKAHEISVLCDAEVGLIIFSTKGKLYEFSTESCMDKILERYERYSYAEKVLVS 82 2174-JQ915056.1 1 MGRGKVQLKRIENKINRQVTFSKRRSGLLKKAHEISVLCDAEVGLIIFSTKGKLYEFSTESCMDKILERYERYSYAEKVLVS 82 TDC-AY747600.1 1 MGRGKVQLKRIENKINRQVTFSKRRSGLLKKAHEISVLCDAEVGLIIFSTKGKLYEFSTESCMDKILERYERYSYAEKVVSL 82 Hereward-JF965397.1 1 MGRGKVQLKRIENKINRQVTFSKRRSGLLKKAHEISVLCDAEVGLIIFSTKGKLYEFSTESCMDKILERYERYSYAEKVLVS 82 Malacca-JF965396.1 1 MGRGKVQLKRIENKINRQVTFSKRRSGLLKKAHEISVLCDAEVGLIIFSTKGKLYEFSTESCMDKILERYERYSYAEKVLVS 82 Tm_G2528-AY244509.2 1 MGRGKVQLKRIENKINRQVTFSKRRSGLLKKAHEISVLCDAEVGLIIFSTKGKLYEFSTESCMDKILERYERYSYAEKVLVS 82 T.urartu 1 MGRGKVQLKRIENKINRQVTFSKRRSGLLKKAHEISVLCDAEVGLIIFSTKGKLYEFSTESCMDKILERYERYSYAEKVLVS 82 T.turgidum-AAW73219 1 MGRGKVQLKRIENKINRQVTFSKRRSGLLKKAHEISVLCDAEVGLIIFSTKGKLYEFSTESCMDKILERYERYSYAEKVLVS 82 Tm_DV92-AY188331.1 1 MGRGKVQLKRIENKINRQVTFSKRRSGLLKKAHEISVLCDAEVGLIIFSTKGKLYEFSTESCMDKILERYERYSYAEKVLVS 82 VRN-D4 81 SESEIQGNWCHEYRKLKAKVETIQKCQKHLMGEDLESLNLQELQQLEQQLESSLKHIRSRKNQLMHESISELQKKERSLQEE 164 Chinese_Spring 81 SESEIQGNWCHEYRKLKAKVETIQKCQKHLMGEDLESLNLKELQQLEQQLESSLKHIRSRKNQLMHESISELQKKERSLQEE 164 Jagger-JQ915055.1 81 SESEIQGNWCHEYRKLKAKVETIQKCQKHLMGEDLESLNLKELQQLEQQLESSLKHIRSRKNQLMHESISELQKKERSLQEE 164 Claire-JF965395.1 81 SESEIQGNWCHEYRKLKAKVETIQKCQKHLMGEDLESLNLKELQQLEQQLESSLKHIRSRKNQLMHESISELQKKERSLQEE 164 2174-JQ915056.1 81 SESEIQGNWCHEYRKLKAKVETIQKCQKHLMGEDFESLNLKELQQLEQQLESSLKHIRSRKNQLMHESISELQKKERSLQEE 164 TDC-AY747600.1 81 SESEIQGNWCHEYRKLKAKVETIQKCQKHLMGEDLESLNLKELQQLEQQLESSLKHIRSRKNQLMHESISELQKKERSLQEE 164 Hereward-JF965397.1 81 SESEIQGNWCHEYRKLKAKVETIQKCQKHLMGEDLESLNLKELQQLEQQLESSLKHIRSRKNQLMHESISELQKKERSLQEE 164 Malacca-JF965396.1 81 SESEIQGNWCHEYRKLKAKVETIQKCQKHLMGEDLESLNLKELQQLEQQLESSLKHIRSRKNQLMHESISELQKKERSLQEE 164 Tm_G2528-AY244509.2 81 SESEIQGNWCHEYRKLKAKVETIQKCQKHLMGEDLESLNLKELQQLEQQLESSLKHIRSRKNQLMHESISELQKKERSLQEE 164 T.urartu 81 SESEIQGNWCHEYRKLKAKVETIQKCQKHLMGEDLESLNLKELQQLEQQLESSLKHIRSRKNQLMHESISELQKKERSLQEE 164 T.turgidum-AAW73219 81 SESEIQGNWCHEYRKLKAKVETIQKCQKHLMGEDLESLNLKELQQLEQQLESSLKHIRSRKNQLMHESISELQKKERSLQEE 164 Tm_DV92-AY188331.1 81 SESEIQGNWCHEYRKLKAKVETIQKCQKHLMGEDLESLNLKELQQLEQQLESSLKHIRSRKNQLMHESISELQKKERSLQEE 164 VRN-D4 161 NKVLQKELVEKQKAHAAQQDQTQPQTSSSSSSFMLRDAPPAANTSIHPAATGERAEDAAVQPQAPPRTGLPPWMVSHING* 245 Chinese_Spring 161 NKVLQKELVEKQKAHAAQQDQTQPQTSSSSSSFMLRDAPPAANTSIHPAATGERAEDAAVQPQAPPRTGLPPWMVSHING* 245 Jagger-JQ915055.1 161 NKVLQKELVEKQKAHAAQQDQTQPQTSSSSSSFMLRDAPPAANTSIHPAATGERAEDAAVQPQAPPRTGLPPWMVSHING* 245 Claire-JF965395.1 161 NKVLQKELVEKQKAHAAQQDQTQPQTSSSSSSFMLRDAPPAANTSIHPAATGERAEDAAVQPQAPPRTGLPPWMVSHING* 245 2174-JQ915056.1 161 NKVLQKELVEKQKAHVAQQDQTQPQTSSSSSSFMLRDAPPAANTSIHPAATGERAEDAAVQPQAPPRTGLPPWMVSHING* 245 TDC-AY747600.1 161 NKVLQKELVEKQKAHAAQQDQTQPQTSSSSSSFMLRDAPPAANTSIHPAATGERAEDAAVQPQAPPRTGLPPWMVSHING* 24 Hereward-JF965397.1 161 NKVLQKELVEKQKAHVAQQDQTQPQTSSSSSSFMLRDAPPAANTSIHPAATGERAEDAAVQPQAPPRTGLPPWMVSHING* 245 Malacca-JF965396.1 161 NKVLQKELVEKQKAHVAQQDQTQPQTSSSSSSFMLRDAPPAANTSIHPAATGERAEDAAVQPQAPPRTGLPPWMVSHING* 245 AY244509.2-Tm_G2528 161 NKVLQKELVEKQKAHAAQQDQTQPQTSSSSSSFMLRDAPPAANTSIHPAAAGERAEDAAVQPQAPPRTGLPPWMVSHING* 245 T.urartu 161 NKVLQKELVEKQKAHAAQQDQTQPQTSSSSSSFMLRDAPPAANTSIHPAAAGERAEDAAVQPQAPPRTGPPPWMVSHING* 245 T.turgidum-AAW73219 161 NKVLQKELVEKQKAHAAQQDQTQPQTSSSSSSFMLRDAPPAANTSIHPAATGERAEDAAVQPQAPPRTGLPPWMVSHING* 245 Tm_DV92-AY188331.1 161 NKVLQKELVEKQKAHAAQQDQTQPQTSSSSSSFMLRDAPPAANTSIHPAAAGERAEDAAVQPQAPPRTGLPPWMVSHING* 245

Figure S4. VRN-D4 protein sequence analysis. Multiple sequence alignment between the predicted protein sequence of VRN-D4 and the publicly available VRN1 protein sequences.

Highlighted in red is the K123Q amino acid change characteristic of VRN-D4. The K-Box domain is highlighted in light gray. T. urartu sequence was obtained from plants.ensembl.org (scaffold60538). The complete Chinese Spring VRN-A1 (GenBank KR422423) and VRN-D4 (GenBank KR422424) sequences were obtained in this study.

(12)

Figure S5. Conservation of amino acids affected by natural and induced mutations in VRN- D4. A multiple alignment of 268 sequences including AP1, AG, FUL, CAULIFLOWER, VRN1 and other related MAD-BOX proteins was used to generate a graphical representation of the conservation at each amino acid position. We used the WebLogo tool (weblogo.berkeley.edu) that generates a graph where the size of each amino acid letter is proportional to its frequency at that position. The region shown covers amino acids 115 to 175 (VRN-D4 coordinates) in the K- Box domain. Arrows indicate the natural amino acid change K123Q and the EMS induced amino acid change E158K. Note that the amino acids resulting from these mutations are not detected among the 268 closest available sequences for this protein domain.

(13)

Figure S6. PCR amplification of VRN-A1 flanking genes from chromosome arm 5DS. DNA extracted from flow-sorted 5DS chromosome arms arm (Institute of Experimental Botany, Olomouc, Czech Republic) was used to test if genes flanking VRN-A1 on chromosome arm 5AL (18) were also present in the 5AL segment inserted in chromosome arm 5DS. CS: Chinese Spring, 5DS: 5DS arm DNA, TDC: Triple Dirk C. Using the same 5DS DNA sample for all primers, only the VRN-A1 marker was amplified, indicating that PHYC, CYS and AGLG1 are not present in 5DS. Multiple bands in CYS PCR amplification corresponds to multiple copies of CYS genes, none of them present in the 5DS arm DNA. A BLAST search of the CS-dt5DS database showed no genes with significant similarity to PHYC, CYS and AGLG1, confirming the PCR results.

900 bp 750 bp

CS 5DS TDC CS 5DS TDC CS 5DS TDC CS 5DS TDC VRN1 PHYC CYS AGLG1

(14)

Figure S7. VRN-D4 phylogeny. Neighbor-Joining tree based on genomic DNA sequences of the VRN-A1 / VRN-D4 gene regions (from 1000-bp upstream of start codon to 1000-bp downstream of stop codon) estimated using MEGA6 (6). Numbers in the nodes of the tree indicate bootstrap confidence values based on 5000 subsamples (% of tress showing that node).

Except for the sequence identified as VRN-D4 all other sequences correspond to VRN-A1.

Numbers indicate GenBank accession numbers, except for T. urartu, where they indicate scaffold numbers in the genome sequence assembly (plants.ensamble.org, GCA_000347455.1).

(15)

Figure S8. RIP-3 binding region in VRN-1 and VRN-D4 first intron. (A) Published binding site of Arabidopsis GRP7 (19) shares 8 identical nucleotides with the RIP-3 binding site for TaGRP2 in VRN1 and VRN-D4 first intron. (B) The canonical RIP-3 sequence is found in VRN- B1, VRN-D1 and VRN-A^m1 (T. monococcum), whereas VRN-A1 in T. urartu and all tetraploid and hexaploid wheats analyzed in Table S3 carry the T2783C mutation. The RIP-3 haplotype with the 3 SNPs has been found in VRN-D4 and in the VRN-A1 alleles from Chinese Spring, Jagger and Claire. Arrows in RIP-3 indicate polymorphisms at positions 2780, 2783 and 2784 (counting from the ATG of VRN-D4). Names of sequences include GenBank accession numbers.

A

B

(16)

Figure S9. TaGRP2 binding to the RIP-3 region in VRN1/VRN-D4 first intron. (A) GST- TaGRP2 expression in E. coli, Te: total extract, Pe: protein elution. (B) RNA-electrophoretic mobility shift assay (REMSA) showing specific interaction of TaGRP2 with a 27-nt RNA RIP-3 probe published before (20). Two concentration of TaGRP2 were tested (1= 1µM and 2= 2µM) (C) REMSA showing interactions of TaGRP2 with the three different 27-nt RNA probes: a strong interactions with the canonical RIP-3, a reduced interaction with the RIP-3^VRN-A1 probe (SNP T2783C), and a very weak interaction with the RIP-3^VRN-D4 probe (SNPs G2780C, T2783C and C2784T, also shared by VRN-A1 in Jagger, Claire and CS). (D) Predicted secondary

structures of the three 27-nt probes using the RNA fold tool (12). Arrows indicate the position of the three SNPs counted from the VRN-D4 start codon (2780, 2783 and 2784).

55kDa

35kDa

(17)

Figure S10. Collection site of the T. aestivum ssp. sphaerococcum entries from the NSGC.

Colors represent each geographical location according to the collection site coordinates available at GRIN (www.ars-grin.gov/npgs/searchgrin.html). Points were positioned according to the collection site coordinates using Google Maps (maps.google.com). All entries carry VRN-D4 with the exception of CItr 8610 and CItr 10911, both collected in east-China (light gray).

(18)

Figure S11. Heat map of a kinship matrix of 100 T. aestivum accessions (5 subspecies). This study was based on 16,371 SNPs. Red= high similarity. Numbers to the right indicate line IDs, described in Table S6. (A) In the outer bar in gray, the dark gray color indicates the presence of VRN-D4 spring allele and the light gray its absence. (B) In the color bar, T. aestivum ssp.

aestivum accessions with a spring growth habit are indicated in green and those with a winter growth habit in light blue. T. aestivum ssp. compactum is indicated in orange, T. aestivum ssp.

sphaerococcum in dark blue, T. aestivum ssp. macha in yellow, and T. aestivum ssp. spelta in pink. Asterisks indicate the two accessions from China described as T. aestivum ssp.

(19)

Figure S12. Genetic diversity by subspecies and chromosomes. (A) Standardized genetic diversity* across complete chromosomes. Note that in T. aestivum ssp. sphaerococcum (sphaero.) chromosome 5D is the one with the lowest average diversity, and also that in the other four

subspecies the same chromosome shows higher diversity. (B) Standardized genetic diversity values for the distal and proximal regions of chromosomes from the D genome. Note that none of the other D genome chromosomes of T. aestivum ssp. sphaerococcum shows a reduction in standardized genetic diversity as severe as the one observed in the proximal region of

chromosome 5D.

* Genetic diversity was first calculated for each locus (14,236 SNP) and each subspecies using the Polymorphism Information Content (PIC) formula. PIC values were then averaged for each chromosome. To reduce the effect of ascertainment bias, the average diversity of each chromosome was divided by the average genetic diversity of the corresponding genome/subspecies (“standardized diversity”, Table S7, Supplementary Method S6).

(20)

Figure S13. Fixation indexes along the 5D chromosome of T. aestivum spp. sphaerococcum.

FST values were calculated using 120 polymorphic SNPs mapped on chromosome 5D (17). The arrow head indicates the predicted position of the centromere based on the arm location of the SNP markers. VRN-D4 is linked to the centromere of the chromosome 5D. (The horizontal dashed line represent the 95^th percentile of the FST values on this chromosome = 0.57).

(21)

Figure S14. Fixation indexes across T. aestivum spp. sphaerococcum chromosomes. Mean FST values were calculated using a sliding window of 7 mapped SNPs (17) with steps of 4 SNPs to generate smoother curves. The horizontal dashed line represent genome-specific 95^th

percentile of mean FST values (A genome=0.53, B genome = 0.53, and D genome = 0.40).

(22)

SI Appendix, Tables

Table S1. Analysis of natural and induced mutations found in VRN-D4.

Natural Induced mutations

mutations exon 4 exon 6 exon 7 exon 8

Line TDF Till-665 Till-526 Till-174 Till-481 Till278

GenBank KR422424 KR119063 KR119062 KR119061 KR119060 KR119059

nt. change A367C G10511A G472A G538A G550A G655A

aa. change K123Q Splice site E158K A180T D184N A219T

PROVEAN^a -3.527 - -3.292 -0.848 0.158 -0.306

PolyPhen-2^b 0.999 - 0.999 0.045 0.026 0.002

SIFT^c ^Not

tolerated - Not

tolerated tolerated tolerated tolerated

PSSM^d 1 - -3 After

K-box

After

K-box After K-box

BLOSUM62 1 - 1 -1 1 -1

Nucleotide positions are calculated from the start codon in the VRN-D4 coding sequence, except for the G10511A splicing site mutation that is indicated relative to the ATG in the genomic DNA sequence. Amino acid positions in the protein are indicated relative to the initial methionine. The letter before the number indicates the wild type allele and the letter after the number the mutant allele.

a PROVEAN scores were calculated at provean.jcvi.org. Scores < -2.5 are predicted to affect protein function.

b Phylophen-2 scores were calculated using genetics.bwh.harvard.edu/pph2/. Values close to 0 suggest limited effects on protein function, and values closer to 1 suggest more significant effects on protein structure and function.

c SIFT was calculated at sift.jcvi.org and its scores are finally summarized into a tolerated or non-tolerated score.

d Position-Specific Scoring Matrix (PSSM) scores are based on the conservation of amino acid positions among multiple aligned proteins. Scores below zero indicate amino acid changes that are predicted to have significant effects on the protein function. The PSSM score was calculated only for mutations within the K-box domain (pfam01486) and the VRN-A1 protein sequence (AAW73221.1) at NCBI PSSM viewer

(www.ncbi.nlm.nih.gov/Class/Structure/pssm/pssm_viewer.cgi).

(23)

Table S2. List of BACs from the Chinese Spring 5DS arm library selected for sequencing.

The BAC library was developed at the Institute of Experimental Botany (Czech Republic) and their nomenclature is followed. The sequence analysis confirmed that contigs CTG87 and CTG61 are connected.

No. in Figure 3 BAC Name Contig 1 TaeCsp5DShA_0045_O10 CTG87 2 TaeCsp5DShA_0085_P15 CTG87 3 TaeCsp5DShA_0056_K08 CTG87 4 TaeCsp5DShA_0076_N19 CTG87 5 TaeCsp5DShA_0038_M14 CTG87 6 TaeCsp5DShA_0036_J13 CTG87 7 TaeCsp5DShA_0082_O19 CTG61 8 TaeCsp5DShA_0043_I14 CTG61 9 TaeCsp5DShA_0035_M17 CTG61 10 TaeCsp5DShA_0058_E14 CTG61 11 TaeCsp5DShA_0086_K20 CTG61 12 TaeCsp5DShA_0039_M05 CTG61

(24)

Table S3. Accessions sequenced for the VRN1 RIP-3 region. ‘PI’ and ‘CItr’ numbers are from GRIN identifiers, ‘UH’ numbers are from the University of Haifa wheat germplasm collection (21, 22), and ‘MG’ and ‘ IDG’ are from the University of Bologna T. dicoccum collection.

Wheat Location Identifiers

T. urartu El Beqaa, Lebanon PI 428275, PI 428280, PI 428282, PI 428283, PI 428285, PI 428286, PI 428287, PI 428288PI 428291, PI 428292, PI 428294, PI 428295, PI 428296, PI 428304, PI 428321, PI 428327, PI 538741, PI 428279, PI 538740, PI 428293, PI 428276 Urfa, Turkey PI 428222, PI 428223, PI 428240, PI 428244, PI 428251, PI 428239, PI 538733 Mardin, Turkey PI 428201, PI 428202, PI 428228

T. dicoccoides Beir‐Oren, Israel UH 28 Daliyya, Israel (south Haifa) UH 29

Diyarbakir, Turkey PI 428028, PI 428041 Gamla, Golan Heights UH 08

Iraq UH 41, UH 42

J'aba, Mid‐north Israel UH 23

Kokhav‐Hashahar UH 19

Mt. Gerizim, Israel UH 17

Rosh‐Pinna UH 09

Seryia UH 40

Golan Heights UH 11

Yabad, West Bank UH 32

Yehudiyya UH 07

T. dicoccum Armenia MG 5274

Ethiopia MG 5306

Georgia MG 5314, MG 5315

Germany MG 5398

Iran MG 5273, MG 5275, MG 5276

Italy IDG 8634

Kenia MG 3428, MG 3430

Palestine PI 355496

Russia MG 5269, MG 5270

Turkey PI 94626

T. aestivum Canada Norstar (CItr 17735), Fife (PI 283820)

Argentina Barleta (CItr 8398)

Australia Falcon (PI 292578), Golden drop (PI 92399)

Sweden Kronen (PI 278526)

United States Little club (CItr 4066), Sterling (CItr 17859), Finch (PI 628640), Alpowa (PI 566596), Louise (PI 634865), Eltan (PI 536994)

(25)

Table S4. Geographical location of accessions postulated to carry VRN-D4 in previous studies. The presence of markers for the 5AL/5DS insertion borders, and the RIP-3^VRN-D4is indicated by a “+” and its absence by a “-”. VRN-A1 copy number variation (CNV) is described in Material and Methods. VRNA1 tandem duplication was detected as the presence/absence of a C/T double peak at the position 349 of the VRN-A1 coding region as described before (23). TDF, GABO, CS5402 and TDC were included as controls. Name of lines correspond to the previously published studies (24, 25). Names starting with ‘N’ correspond to the Institute of Cytology and Genetics (Novosibirsk, Russia).

Name Upstream border

Downstream border

VRN-D4 SNP A367C

RIP-3

VRN-D4 CNV VRN-A1 tandem

duplication [*] Location

TDF + + A/C + 2 absent GABO (N6827) + + A/C + 3 absent Australia

CS5402 - - A (+)¹ 1 absent TDC - - A - 1 absent CL035 - - A (+)¹ 1 absent China (south east)

IL193 - - A (+)¹ 1 absent Ethiopia CL030 - - A - 2 present China (south east)

IL346 - - A - 2 present Afghanistan (west) IL165 + + A/C + 3 absent Egypt IL430 + + A/C + 2 absent Pakistan (north)

SS23 + + A/C + 3 absent Australia IL027 + + A/C + 2 absent Afghanistan IL047 + + A/C + 2 absent Turkey IL114 + + A/C + 2 absent Nepal IL154 + + A/C + 4 absent Pakistan (south) IL163 + + A/C + 3 absent Egypt IL213 + + A/C + 2 absent Nepal IL216 + + A/C + 2 absent Nepal GR007 + + A/C + 2 absent India GR014 + + A/C + 2 absent India GR015 + + A/C + 2 absent India GR023 + + A/C + 2 absent India IL164 + + A/C + 3 absent Egypt IL175 + + A/C + 3 present Bhutan IL176 + + A/C + 3 present Bhutan IL008 + + A/C + 3 present Nepal IL431 + + A/C + 4 present Pakistan (north) GR001 + + A/C + 3 present India

N7451 + + A/C + 2 absent N6721 + + A/C + 3 present 1 These accessions carry the RIP-3^VRN-D4 SNPs in the VRN-A1 gene.

(26)

Table S5. VRN1 and VRN-D4 spring alleles present in the T. aestivum ssp. sphaerococcum accessions used in this study.

VRN1 promoter VRN1 intron deletion VRN-D4

Name VRN-

A1a¹ VRN-

B1b¹ VRN-

A1c¹ VRN-

B1a¹ VRN-

D1a¹ VRN-B3¹ Presence² Expression 1^st week³ CItr 4528 absent absent absent absent absent absent present Yes PI 337997 absent absent absent absent absent absent present Yes CItr 4530 absent absent absent absent absent absent present Yes PI 277164 absent absent absent absent absent absent present Yes PI 324492 absent absent absent absent absent absent present Yes PI 277141 absent absent absent absent absent absent present Yes PI 277165 absent absent absent absent absent absent present Yes PI 115818 absent absent absent absent absent absent present Yes PI 323439 absent absent absent absent absent absent present Yes PI 277142 absent absent absent absent absent absent present Yes CItr 9054 absent absent absent absent absent absent present Yes PI 324491 absent absent absent absent absent absent present Yes PI 282452 absent absent absent absent absent absent present Yes PI 282451 absent absent absent absent absent absent present Yes PI 182118 absent absent absent absent absent absent present Yes PI 272581 absent absent absent absent absent absent present Yes PI 352499 absent absent absent absent absent absent present Yes PI 190982 absent absent absent absent absent absent present Yes

CItr 8610 absent absent absent absent present absent absent NT

CItr 10911 absent absent absent absent present absent absent NT

PI 278650 absent absent absent absent absent absent present Yes CItr 4529 absent absent absent absent absent absent present Yes PI 40944 absent absent absent absent absent absent present Yes CItr 12212 absent absent absent absent absent absent present Yes PI 191301 absent absent absent absent absent absent present Yes PI 272580 absent absent absent absent absent absent present Yes PI 352498 absent absent absent absent absent absent present Yes PI 42013 absent absent absent absent absent absent present Yes CItr 4924 absent absent absent absent absent absent present Yes PI 330556 absent absent absent absent absent absent present Yes

K-5498 absent absent absent absent absent absent present Yes

1VRN-A1a (26)= duplication in promoter region. VRN-B1b (27) & VRN-B3 (= FT-B1) (28) = retroelement insertions in promoter. VRN-A1c, VRN-B1a and VRN-D1a (29) = intron deletions. Names starting with ‘PI’ or ‘CItr’

(27)

Table S6. Description of T. aestivum accessions used for the population structure analysis.

PI numbers correspond to the Germplasm Resources Information Network (GRIN) identifiers (www.ars-grin.gov/npgs) and ‘K’ numbers correspond to the Novosibirsk Institute catalog numbers. T. aestivum spring and winter lines were described before (17). The last two columns indicate the identification numbers used in Fig. 5B and SI Appendix, Fig. S11.

Identifier Description Location Growth habit Fig. 5B Fig. S11

PI 272580 sphaerococcum Pest Hungary ‐ 1 94

PI 352499 sphaerococcum India ‐ 2 96

PI 330556 sphaerococcum England United Kingdom ‐ 4 97

Tandem aestivum winter 5 1

Gent aestivum winter 6 2

Colt aestivum winter 7 3

Chisholm aestivum winter 8 5

Custer aestivum winter 9 6

OK05526 aestivum winter 10 4

Bronze aestivum winter 11 10

PI 352299 compactum ‐ 12 13

Antelope aestivum ‐ 13 8

OK04505 aestivum winter 14 7

OK05723W aestivum winter 17 9

PI 572913 macha ‐ 18 63

PI 542466 macha ‐ 23 61

PI 352466 macha ‐ 24 60

PI 361768 spelta ‐ 25 17

PI 355508 macha ‐ 26 62

PI 324491 sphaerococcum Delhi India ‐ 29 23

Cltr 4528 sphaerococcum Punjab Pakistan ‐ 32 81

PI 324492 sphaerococcum Delhi India ‐ 35 69

(28)

Table S6. Continuation.

PI 277141 sphaerococcum Germany ‐ 36 79

PI 277165 sphaerococcum Punjab Pakistan ‐ 37 80

PI 115818 sphaerococcum Punjab India ‐ 38 75

Cltr 9054 sphaerococcum Iraq ‐ 39 66

PI 272581 sphaerococcum Pest Hungary ‐ 41 67

Cltr 12212 sphaerococcum Kansas United States ‐ 43 78

PI 191301 sphaerococcum Portugal ‐ 44 77

K‐5498 sphaerococcum ‐ 46 73

PI 182118 sphaerococcum Sind Pakistan ‐ 50 71

PI 42013 sphaerococcum Uttar Pradesh India ‐ 51 84

Cltr 4924 sphaerococcum Uttar Pradesh India ‐ 52 86

PI 367200 spelta ‐ 55 87

PI 367199 spelta ‐ 57 88

Cltr 8610 sphaerococcum Jiangsu China ‐ 58 90

Cltr 10911 sphaerococcum Shanxi China ‐ 59 91

PI 572912 macha ‐ 61 50

PI 290507 macha ‐ 62 54

PI 355511 macha ‐ 63 52

PI 278660 macha ‐ 64 55

PI 323436 macha ‐ 65 51

PI 572905 macha ‐ 66 53

PI 355514 macha ‐ 67 56

PI 611470 macha ‐ 68 57

PI 361862 macha ‐ 69 59

PI 428146 macha ‐ 70 58

(29)

Table S6. Continuation.

WA8100 aestivum Spring 71 29

Tara2002 aestivum Spring 72 32

Macon aestivum Spring 73 34

H0900009 aestivum Spring 74 33

Alturas aestivum Spring 75 30

IDO377s aestivum Spring 76 28

H0800080 aestivum Spring 78 35

9245 aestivum Spring 79 25

9248 aestivum Spring 80 26

PI 190982 sphaerococcum Belgium ‐ 81 99

PI 323439 sphaerococcum Vienna Austria ‐ 82 98

PI 278650 sphaerococcum United Kingdom ‐ 83 100

Treasure aestivum Spring 85 27

PI 323438 spelta ‐ 88 40

PI 295059 spelta ‐ 89 37

PI 306550 spelta ‐ 90 36

PI 428178 macha ‐ 91 39

PI 355642 spelta ‐ 92 42

PI 631161 spelta ‐ 93 43

PI 348303 spelta ‐ 94 41

PI 221419 spelta ‐ 95 38

PI 192717 spelta ‐ 96 44

PI 348033 spelta ‐ 97 45

PI 348273 spelta ‐ 98 48

PI 378469 spelta ‐ 99 46

PI 469032 spelta ‐ 100 47

(30)

Table S7. Average genetic diversity by genome for five T. aestivum subspecies. A total of 14,263 mapped SNPs were selected from the 16,371 polymorphic SNPs identified in the 90K iSelect Illumina SNP chip (17) to test the genetic diversity of each genome.

No. of SNPs sphaerococcum macha spelta aestivum* compactum*

A genome 5675 0.202 0.216 0.226 0.388 0.390

B genome 7344 0.243 0.189 0.215 0.388 0.382

D genome 1217 0.269 0.156 0.283 0.385 0.354

*SNPs were developed in T. aestivum ssp. aestivum. Therefore, the higher diversity observed in this subspecies and in the related T. aestivum ssp. compactum, can be influenced by ascertainment bias. These numbers were used only to adjust average diversity values per chromosome and to provide standardized diversity values.

(31)

Table S8. PCR primers and RNA-probes used in this study.

Description Name Sequence

VRN-D4 cDNA

amplification cDNA-VRN1-A-5DS-F GTTCTCCACCGAGTCATGTA cDNA-VRN1-A-5DS-R TGTGGCTCACCATCCACG 5DS-BAC library 5P-BAC-F1 GCTCAAGAGGTAGCAATTCC screening 5P-BAC-R1 CATTTTCTTATTGGGAACCGT 3P-BAC-F1 CAAAATATGGAAGGGTGGAAG 3P-BAC-R1 TAGGTACCTGCACACACACA 3P-BAC-F2 ACAACGTAGATGCACACACA 3P-BAC-R2 CTCCATTGAATAGCTTGTCG VRN-D4 TILLING

Screening TILL-5DS-F1 GATCTTGAATCTTTGAATCgCC*

TILL-A-DS-R10 GCTGCAGCTTGCTACTTTAC

VRN-D4 A367C SNP detection**

VA1-3PRIME-F2 GCCTATTTGTAGCATTTCTGTCATT VRN-A1-CAPS-R2 GAACATCTCAGTCTAGAATCTGAT RIP-3 genome A

sequencing RIP-3-F AATCACACCTCAGGATTTCAT RIP-3-R GATGGGTCATAAGGTTTTGC

RNA-EMSA probes RIP-3 GUCAUUGUUGUUGGUAUGGAUCGUAUG-Biotin RIP-3^VRN-A1 GUCAUUGUUGUUGGUAUGGACCGUAUG-Biotin RIP-3^VRN-D4 GUCAUUGUUGUUGGUAUCGACUGUAUG-Biotin RIP-3L GUCAUUGUUGUUGGUAUGGAUCGUAUGCCAAAAA-

Biotin

RIP-3L^VRN-A1 GUCAUUGUUGUUGGUAUGGACCGUAUGCCAAAAA- Biotin

RIP-3L^VRN-D4 GUCAUUGUUGUUGGUAUCGACUGUAUGCCAAAAA- Biotin

* Lowercase letter indicates a SNP introduced to increase PCR specificity ** Primers for genomic DNA were design to be VRN- A1 specific ¹,this primers amplifiy both VRN-A1 and VRN-D4 transcripts that are then differentiated by BstNI difgestion.

(32)

References

1. Uauy C, et al. (2009) A modified TILLING approach to detect induced mutations in tetraploid and hexaploid wheat. BMC Plant Biol 9:115-128.

2. Chen A, Dubcovsky J (2012) Wheat TILLING mutants show that the vernalization gene VRN1 down-regulates the flowering repressor VRN2 in leaves but is not essential for flowering. PLoS Genet 8:e1003134.

3. Choi Y, Sims GE, Murphy S, Miller JR, Chan AP (2012) Predicting the functional effect of amino acid substitutions and indels. PLoS One 7:e46688.

4. Ng PC, Henikoff S (2001) Predicting deleterious amino acid substitutions. Genome Res 11:863-874.

5. Adzhubei IA, et al. (2010) A method and server for predicting damaging missense mutations. Nat Methods 7:248-249.

6. Tamura K, Stecher G, Peterson D, Filipski A, Kumar S (2013) MEGA6: Molecular evolutionary genetics analysis version 6.0. Mol Biol Evol 30:2725-2729.

7. Leroy P, et al. (2012) TriAnnot: a versatile and high performance pipeline for the automated annotation of plant genomes. Front Plant Sci 3:1-14.

8. Ling HQ, et al. (2013) Draft genome of the wheat A-genome progenitor Triticum urartu.

Nature 496:87-90.

9. Jia JZ, et al. (2013) Aegilops tauschii draft genome sequence reveals a gene repertoire for wheat adaptation. Nature 496:91-95.

10. Luo MC, et al. (2013) A 4-gigabase physical map unlocks the structure and evolution of the complex genome of Aegilops tauschii, the wheat D-genome progenitor. Proc Natl Acad Sci USA 110:7940-7945.

11. Kippes N, et al. (2014) Fine mapping and epistatic interactions of the vernalization gene VRN-D4 in hexaploid wheat. Theor Appl Genet 289:47–62.

12. Gruber AR, Lorenz R, Bernhart SH, Neuboock R, Hofacker IL (2008) The Vienna RNA websuite. Nucleic Acids Res 36:W70-W74.

13. Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using

(33)

14. Earl DA, Vonholdt BM (2012) STRUCTURE HARVESTER: a website and program for visualizing STRUCTURE output and implementing the Evanno method. Conserv Genet Resour 4:359-361.

15. Anderson JA, Churchill GA, Autrique JE, Tanksley SD, Sorrells ME (1993) Optimizing parental selection for genetic-linkage maps. Genome 36:181-186.

16. Weir BS (1996) Genetic data analyisis II (Sinauer Publishers Sunder-land, MA).

17. Wang S, et al. (2014) Characterization of polyploid wheat genomic diversity using a high-density 90,000 SNP array. Plant Biotechnol J 12:787-796.

18. Yan L, et al. (2003) Positional cloning of wheat vernalization gene VRN1. Proc Natl Acad Sci USA 100:6263-6268.

19. Leder V, et al. (2014) Mutational definition of binding requirements of an hnRNP-like protein in Arabidopsis using fluorescence correlation spectroscopy. Biochem Bioph Res Co 453:69-74.

20. Xiao J, et al. (2014) O-GlcNAc-mediated interaction between VER2 and TaGRP2 elicits TaVRN1 mRNA accumulation during vernalization in winter wheat. Nat Commun 5:4572.

21. Nevo E, Beiles A (1989) Genetic diversity of wild emmer wheat in Israel and Turkey - structure, evolution, and application in breeding. Theor Appl Genet 77:421-455.

22. Peleg Z, et al. (2005) Genetic diversity for drought resistance in wild emmer wheat and its ecogeographical associations. Plant Cell Environ 28:176-191.

23. Diaz A, Zikhali M, Turner AS, Isaac P, Laurie DA (2012) Copy number variation affecting the Photoperiod-B1 and Vernalization-A1 genes is associated with altered flowering time in wheat (Triticum aestivum). PLoS One 7:e33234.

24. Iwaki K, Haruna S, Niwa T, Kato K (2001) Adaptation and ecological differentiation in wheat with special reference to geographical variation of growth habit and Vrn genotype.

Plant Breeding 120:107-114.

25. Iwaki K, Nakagawa K, Kuno H, Kato K (2000) Ecogeographical differentiation in east Asian wheat, revealed from the geographical variation of growth habit and Vrn genotype.

Euphytica 111:137-143.

26. Yan L, et al. (2004) Allelic variation at the VRN-1 promoter region in polyploid wheat.

Theor Appl Genet 109:1677-1686.

(34)

27. Chu CG, et al. (2011) A novel retrotransposon inserted in the dominant Vrn-B1 allele confers spring growth habit in tetraploid wheat (Triticum turgidum L.). G3-Genes Genom Genet 1:637-645.

28. Yan L, et al. (2006) The wheat and barley vernalization gene VRN3 is an orthologue of FT. Proc Natl Acad Sci USA 103:19581-19586.

29. Fu D, et al. (2005) Large deletions within the first intron in VRN-1 are associated with spring growth habit in barley and wheat. Mol Genet Genomics 273:54-65.