Articles
https://doi.org/10.1038/s41587-021-01058-4

Population genomic analysis of Aegilops tauschii identifies targets for bread wheat improvement

Kumar Gaurav1,39, Sanu Arora1,39, Paula Silva2,3,39, Javier Sánchez-Martín4,39, Richard Horsnell5,39, Liangliang Gao2, Gurcharn S. Brar6,7, Victoria Widrig4, W. John Raupp2, Narinder Singh2,36, Shuangye Wu2, Sandip M. Kale8, Catherine Chinoy1, Paul Nicholson1, Jesús Quiroz-Chávez1, James Simmonds1, Sadiye Hayta1, Mark A. Smedley1, Wendy Harwood1, Suzannah Pearce1, David Gilbert1, Ngonidzashe Kangara1, Catherine Gardener1, Macarena Forner-Martínez1, Jiaqian Liu1,9, Guotai Yu1,37, Scott A. Boden1,10, Attilio Pascucci1,11, Sreya Ghosh1, Amber N. Hafeez1, Tom O’Hara1, Joshua Waites1, Jitender Cheema1, Burkhard Steuernagel1, Mehran Patpour12, Annemarie Fejer Justesen12, Shuyu Liu13, Jackie C. Rudd13, Raz Avni14, Amir Sharon14, Barbara Steiner15, Rizky Pasthika Kirana15,16, Hermann Buerstmayr15, Ali A. Mehrabi17, Firuza Y. Nasyrova18, Noam Chayut19, Oadi Matny20, Brian J. Steffenson20, Nitika Sandhu21, Parveen Chhuneja21, Evans Lagudah22, Ahmed F. Elkot23, Simon Tyrrell24, Xingdong Bian24, Robert P. Davey24, Martin Simonsen25, Leif Schauser25, Vijay K. Tiwari26, H. Randy Kutcher6, Pierre Hucl6, Aili Li27, Deng-Cai Liu28, Long Mao27, Steven Xu29, Gina Brown-Guedira30, Justin Faris29, Jan Dvorak31, Ming-Cheng Luo31, Ksenia Krasileva32, Thomas Lux33, Susanne Artmeier33, Klaus F. X. Mayer33,34, Cristobal Uauy1, Martin Mascher8,35, Alison R. Bentley5,38 ✉, Beat Keller4 ✉, Jesse Poland2,37 ✉ and Brande B. H. Wulff1,37 ✉

Aegilops tauschii, the diploid wild progenitor of the D subgenome of bread wheat, is a reservoir of genetic diversity for improving bread wheat performance and environmental resilience. Here we sequenced 242 Ae. tauschii accessions and compared them to the wheat D subgenome to characterize genomic diversity. We found that a rare lineage of Ae. tauschii geographically restricted to present-day Georgia contributed to the wheat D subgenome in the independent hybridizations that gave rise to modern bread wheat. Through k-mer-based association mapping, we identified discrete genomic regions with candidate genes for disease and pest resistance and demonstrated their functional transfer into wheat by transgenesis and wide crossing, including the generation of a library of hexaploids incorporating diverse Ae. tauschii genomes. Exploiting the genomic diversity of the Ae. tauschii ancestral diploid genome permits rapid trait discovery and functional genetic validation in a hexaploid background amenable to breeding.

The success of bread wheat (Triticum aestivum) as a major worldwide crop is underpinned by its adaptability to diverse environments, high grain yield and nutritional content1. With the combined challenge of population expansion and hotter, less favorable climates, wheat yields must be sustainably increased to ensure global food security. The rich reservoir of genetic diversity amongst the wild relatives of wheat provides a means to improve productivity1,2. Maximizing the genetic potential of wheat requires a deep understanding of the structure and function of its genome, including its relationship with its wild progenitor species.

The evolution of bread wheat from its wild relatives is typically depicted as two sequential interspecific hybridization and genome duplication events leading to the genesis of the allohexaploid bread wheat genome2,3. The first hybridization between T. urartu (AA) and a presumed extinct diploid (BB) species formed tetraploid emmer wheat, T. turgidum (AABB), ~0.5 million years ago4. The gradual process of domestication of T. turgidum started with its cultivation in the Fertile Crescent some 10,000 years ago5. Subsequent hybridization with Ae. tauschii (DD) formed the hexaploid T. aestivum (AABBDD)6. Whereas ancient gene flow incorporated the majority of the AABB genome diversity into hexaploid wheat, only a small fraction of the D genome diversity was captured7. Indeed, hybridization between T. turgidum and Ae. tauschii was thought to be restricted to a subpopulation of Ae. tauschii from the shores of the Caspian Sea in present-day Iran8. Despite sampling limited diversity, this genomic innovation created a plant more widely adapted to a broad range of environments and with end-use qualities not found in its progenitors1.

A full list of affiliations appears at the end of the paper.
Nature Biotechnology | VOL 40 | March 2022 | 422–431 | www.nature.com/naturebiotechnology

[Figure 1: panels a–g. a, map of accession origins around the Fertile Crescent; b, phylogeny; c, STRUCTURE plots for K = 2 to K = 6; d, FST along chromosomes 1 to 7; e, Venn diagram; f, chromosome 1D genomic position (Mb) for CDC Stanley, PI190962 (spelt wheat), Norin 61, SY Mattis, Mace, CDC Landmark, LongReach Lancer, Julius, Jagger, Chinese Spring and ArinaLrFor; g, hybridization schematic.]

Fig. 1 | Characterization of a third lineage of Ae. tauschii and its contribution to the wheat D subgenome. The color code for all panels is shown for wheat and Ae. tauschii lineages (L1, L2, L3) in the top left corner. a, Distribution of the 242 Ae. tauschii samples used in this study. The five L3 accessions are indicated by an orange vertical arrow. Country abbreviations are provided in Extended Data Fig. 1a. b, Phylogeny showing the D subgenome of 28 wheat landraces in relation to Ae. tauschii, a tetraploid (AABB genome) outgroup (O) and an Ae. tauschii RIL (labeled R) derived from L1 and L2. c, STRUCTURE analysis of the randomly selected ten accessions from each of L1 and L2 along with the five accessions of L3 and the RIL.
K denotes the number of subpopulations considered. d, Genome-wide fixation index (FST) estimates of the Ae. tauschii lineages. e, Venn diagram showing the percentage of lineage-specific and shared k-mers between the lineages. f,g, Chromosome 1D of wheat cultivars/accessions colored according to their Ae. tauschii lineage-specific origin (f). The pattern of lineage-specific contribution to the wheat D subgenome, highlighted for one region by a dashed rectangle, suggests that at least two polyploidization events with distinct Ae. tauschii lineages, as shown in g, followed by intraspecific crossing gave rise to extant hexaploid bread wheat. Ma, million years ago.

The low genetic diversity of the bread wheat D subgenome has long motivated breeders to recruit diversity from Ae. tauschii. The most common route involves hybridization between tetraploid wheat and Ae. tauschii followed by chromosome doubling to create synthetic hexaploids9. Alternatively, direct hybridization between hexaploid wheat and Ae. tauschii is possible. This approach usually requires embryo rescue but has the advantage that it does not disrupt desirable allele combinations in the bread wheat A and B subgenomes10,11. Notwithstanding, the products of all these wide crosses require backcrossing to domesticated cultivars to remove unwanted agronomic traits from the wild progenitor and restore optimal end-use qualities. The boost to genetic diversity and resilience therefore comes at a cost to the breeder9. However, if haplotypes underlying useful traits could be directly identified in Ae. tauschii, this would mitigate a critical limitation in breeding wheat with Ae. tauschii; such haplotypes can be tagged with molecular markers for accelerated delivery into domesticated wheat by combining marker-assisted selection12 with rapid generation advancement13. Furthermore, a gene-level understanding would permit next-generation breeding by gene editing and transformation.

In this study, we performed whole-genome shotgun short-read sequencing on a diverse panel of 242 Ae. tauschii accessions. We discovered that an uncharacterized Ae. tauschii lineage contributed to the initial gene flow into domesticated wheat, thus broadening our understanding of the evolution of bread wheat. To facilitate the discovery of useful genetic variation from Ae. tauschii, we established a k-mer-based association mapping pipeline and demonstrated the mobilization of the untapped diversity from Ae. tauschii into wheat through the use of synthetic wheats and genetic transformation for biotic stress resistance genes.

Results
Multiple hybridizations shaped the bread wheat D subgenome. We identified a set of 242 non-redundant Ae. tauschii accessions with minor residual heterogeneity after short-read sequencing of 306 accessions covering the geographical range spanned by diverse Ae. tauschii collections (Fig. 1a, Extended Data Fig. 1a–d, Supplementary Tables 1–5 and Supplementary Note). To capture the genetic diversity of the Ae. tauschii species complex, we generated a k-mer matrix specifying the presence and absence of a comprehensive set of 51-mer variants in the sequenced accessions and a single-nucleotide polymorphism (SNP) matrix relative to the AL8/78 reference genome14.

Ae. tauschii is generally categorized into two lineages, lineage 1 (L1) and lineage 2 (L2)15,16, with L2 considered the major contributor to the wheat D subgenome8. To better understand the relationship between Ae. tauschii and wheat, we randomly selected 100,000 k-mers and checked their presence in the short-read sequences of 28 hexaploid wheat landraces17. We used a tetraploid wheat accession as an outgroup in the phylogenetic analysis and included a recent Ae. tauschii L1–L2 recombinant inbred line (RIL)15 as a control in our population structure analysis. We generated a phylogeny based on the presence/absence of these k-mers and found it to be consistent with earlier phylogenies generated using molecular markers in that Ae. tauschii L1 and L2 formed two major clades, whereas the wheat D subgenome formed a discrete and narrow clade most closely related to L2 (Fig. 1b)8,15,16. This supports the L2 origin of the wheat D subgenome and its limited genetic diversity relative to Ae. tauschii. A group of five accessions formed a distinct clade separate from L1 and L2, as previously observed15,16, which seems to be a basal lineage based on the split from the outgroup. Matsuoka et al. hypothesized that this group could be a separate lineage18, whereas Singh et al. hypothesized that it could have arisen from interlineage hybridization followed by isolated evolution15. To resolve this question, we conducted Bayesian clustering analysis using STRUCTURE19. Because this algorithm does not reliably recover the correct population structure when sampling is uneven20, we randomly selected ten accessions from L1 and from L2 for this analysis along with the five accessions of the putative lineage 3 (L3) and the control L1–L2 RIL (Supplementary Table 6). STRUCTURE analysis with the number of subpopulations set to K = 2 showed the putative L3 accessions as an admixture of L1 and L2, similar to the L1–L2 RIL; but with K = 3, these accessions were assigned to a distinct lineage (Fig. 1c). Further increasing the value of K did not reveal any discernible substructure. This interpretation was supported by the ΔK curve, which showed a clear peak at K = 3 (Extended Data Fig. 1e). Principal-component analysis (PCA) also separated the population into three clusters corresponding to L1, L2 and L3 (Extended Data Fig. 1f). Computing the genome-wide pairwise fixation index (FST) between the three lineages using SNPs in a sliding window of 1 megabase (Mb) with a step size of 100 kilobases (kb) indicated a high level of population differentiation across the genome, with values near 1.0 in the centromeric regions and around 0.3–0.5 near the telomeric ends (Fig. 1d). These observations demonstrate the existence of a differentiated third lineage within Ae. tauschii.

Consistent with the above population structure, we found that 64% of the Ae. tauschii k-mer space, obtained by summing up the percentages in the non-overlapping sections of the Venn diagram (Fig. 1e), is lineage specific. We used the lineage-specific k-mers to understand the origin of the wheat D subgenome by representing the D subgenomes of the available chromosome-scale wheat assemblies21 as 100-kb segments and assigning them to the Ae. tauschii lineage predominantly contributing lineage-specific k-mers to that segment (Extended Data Fig. 2). To account for recent alien introgressions in modern cultivars due to breeding, only those k-mers that were also present in the 28 hexaploid wheat landraces17 were used. The differential presence of L2 and L3 segments at multiple independent regions in these wheat lines (shown for chromosome 1D in Fig. 1f and chromosomes 2D–7D in Extended Data Fig. 3) suggests that at least two hybridization events gave rise to the extant wheat D subgenome (Fig. 1g) and that one of the D genome donors was of predominantly L2 origin, while the other was of predominantly L3 origin. The total L3 contribution across all seven chromosomes ranges from 0.5% for spelt (T. aestivum ssp. spelta) to 1.9% for T. aestivum ssp. aestivum ArinaLrFor, with an average of 1.1% for all the 11 reference genomes (Extended Data Fig. 3).

Discovery of Ae. tauschii trait–genotype correlations. Identification of genes or haplotypes in Ae. tauschii underpinning useful variation would permit accelerated wheat improvement through wide crossing and marker-assisted selection or biotechnological approaches to introduce them into wheat. To identify this variation, we adapted our k-mer-based association mapping pipeline, previously developed for resistance gene families obtained using sequence capture22, to whole-genome shotgun data (Extended Data Fig. 4a). The significantly associated k-mers were not just directly mapped to the Ae. tauschii AL8/78 reference genome but were also mapped to the de novo assembly of a relevant accession, which was anchored to the reference genome. In theory, using a set of de novo assemblies (either reference or anchored to a reference) covering the species diversity in this manner would enable us to determine the genomic context of all the significant k-mers. To demonstrate the advantage of this approach, we generated a de novo assembly of accession TOWWC0112 (N50 = 196 kb; Supplementary Table 7), which carries two cloned stem rust resistance genes that could be used as controls, and then anchored this assembly to a reference genome14. This enabled identification of the cis-associated k-mers rather than those linked in repulsion to the corresponding region in the reference genome (Extended Data Fig. 4c,d). Note the improvement in the association signal with the improvement in the quality of de novo assembly; when the quality is poor (Extended Data Fig. 4c), some of the short scaffolds with the significant k-mers are anchored outside the true locus, but with the improved de novo assembly (Extended Data Fig. 4d), most of the scaffolds with the significant k-mers tend to concentrate around the true locus. We also determined that the sequencing coverage could be reduced from tenfold to fivefold with no appreciable loss of signal from the two control genes (Extended Data Fig. 5). To test our method further, we performed association mapping for resistance to additional stem rust isolates and flowering time. For stem rust, we identified a peak within the genetic linkage group of SrTA1662 (ref. 23) (Fig. 2a, Extended Data Fig. 6a and Supplementary Table 8). Annotation of the associated 50-kb linkage disequilibrium (LD) block revealed two genes, one of which was the nucleotide-binding and leucine-rich repeat (NLR) gene previously identified in our sequence capture association pipeline22 (Fig. 2a, Supplementary Tables 9 and 10 and Supplementary Note). We also recorded flowering time and found that it mapped to a broad peak of 5.46 Mb on chromosome arm 7DS containing 35 genes, including FLOWERING LOCUS T1 (Fig. 2b, Extended Data Fig. 6b,c and Supplementary Tables 8 and 10), a well-known regulator of flowering time in dicots and monocots, including wheat24–26.

[Figure 2: panels a and b. y axis, –log10P; x axis, chromosomes 1 to 7; inset, number of k-mers per point (0–10, 11–20, 21–30, 31–40, 41–50, >200). a, SrTA1662, 1D: 11.45–11.50 Mb; b, FLOWERING LOCUS T1 (FT1), 7D: 64.26–69.72 Mb.]

Fig. 2 | Genetic identification of candidate genes for stem rust resistance and flowering time by k-mer-based association mapping. a, k-mers significantly associated with resistance to Puccinia graminis f. sp. tritici race QTHJC mapped to scaffolds of a de novo assembly of Ae. tauschii accession TOWWC0112 anchored to chromosomes 1 to 7 of the D subgenome of Chinese Spring51. Points on the y axis show k-mers significantly associated with resistance (blue) and susceptibility (red). b, k-mers significantly associated with flowering time mapped to Ae. tauschii reference genome AL8/78 with early (red) or late (blue) flowering time association relative to the population mean across the diversity panel. Candidate genes for both phenotypes are highlighted. Point size is proportional to the number of k-mers (see inset). The association score is defined as the –log10 of the P value obtained using the likelihood ratio test for nested models. The threshold of significant association scores is adjusted for multiple comparisons using the Bonferroni method.

We next screened the Ae. tauschii panel for leaf trichomes (a biotic and abiotic resilience trait27,28), spikelet number per spike

[Figure 3: panels a–d for four traits. Trichome number (low to high), 4D: 510.28–510.81 Mb, α/β-hydrolase; spikelet number (low to high), 1D: 304.02–304.12 Mb, trehalose phosphate phosphatase; powdery mildew (resistant to susceptible), 7D: 4.76–5.08 Mb, WTK; wheat curl mite (resistant to susceptible), 6D: 2.14–2.58 Mb, NLR. b, y axis, number of accessions (L1, L2); c, y axis, –log10P; x axis, chromosomes 1 to 7.]

Fig. 3 | Genome-wide association mapping in Ae. tauschii for morphology, disease and pest resistance traits. a, Representation of the scale of phenotypic variation observed. b, Frequency distribution of the different phenotypic scales corresponding to a. L1 and L2 are shown in dark and light gray, respectively. c, k-mer-based association mapping to a de novo assembly of accession TOWWC0112 anchored to the AL8/78 reference genome (trichome number, spikelet number) or accession TOWWC0106 anchored to AL8/78 (response to powdery mildew) or directly mapped to AL8/78 (response to wheat curl mite). k-mer color coding, association score, threshold and dot size are as in Fig. 2. d, Identification of genes under the peak in the GWAS plot with promising candidate(s) indicated. The WTK gene resides within a 60-kb insertion relative to the AL8/78 reference genome.
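The Fig. 2 legend defines the association score as the –log10 of the P value from a likelihood ratio test for nested models, with a Bonferroni-adjusted significance threshold. The following is a minimal sketch of that arithmetic only, not the authors' pipeline. It assumes that the two nested models differ by a single parameter (one degree of freedom) and that the family-wise error rate is 0.05; neither value is stated in the text, and all function names are hypothetical.

```python
import math


def lrt_p_value(loglik_null: float, loglik_full: float) -> float:
    """P value of a likelihood ratio test, ASSUMING 1 degree of freedom.

    For a chi-square variable X with 1 degree of freedom,
    P(X > x) = erfc(sqrt(x / 2)), so no external library is needed.
    """
    statistic = max(0.0, 2.0 * (loglik_full - loglik_null))
    return math.erfc(math.sqrt(statistic / 2.0))


def association_score(p_value: float) -> float:
    """Association score as defined in the Fig. 2 legend: -log10(P)."""
    if p_value <= 0.0:  # P underflowed to zero
        return math.inf
    return -math.log10(p_value)


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Score that must be exceeded after Bonferroni correction.

    alpha = 0.05 is an assumption made for this sketch.
    """
    return -math.log10(alpha / n_tests)


def significant_kmers(logliks: dict, alpha: float = 0.05) -> dict:
    """logliks maps k-mer -> (loglik_null, loglik_full).

    Returns the scores of the k-mers that pass the corrected threshold.
    """
    cutoff = bonferroni_threshold(len(logliks), alpha)
    scores = {
        kmer: association_score(lrt_p_value(l0, l1))
        for kmer, (l0, l1) in logliks.items()
    }
    return {kmer: score for kmer, score in scores.items() if score > cutoff}
```

With one million tested k-mers and the assumed alpha of 0.05, `bonferroni_threshold(1_000_000)` gives a cutoff of about 7.3 on the association-score axis, which illustrates why only tall peaks in a plot of this kind count as significant.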
[Figure 4: panel a, genome-wide phylogeny of L1, L2, L3 and wheat with presence/absence tracks for Cmc4, WTK, Trehalose, Hydrolase, SrTA1662, Sr45 and Sr46; panel b, Sr46 haplotype; panel c, Cmc4 haplotype. Accession labels on the tree tips are not reproduced here.]

Fig. 4 | Comparison of genome-wide phylogeny with phylogenies of haplotypes surrounding specific genes. a, Genome-wide k-mer-based phylogeny of Ae. tauschii and hexaploid wheat landraces with designation of the presence of candidate and cloned genes/alleles for disease and pest resistance and morphological traits. The presence and absence of allele-specific polymorphisms is indicated by circles filled with black or white, respectively, for all but outgroup and RIL (gray edges). b, Phylogeny of Ae. tauschii L1 and L2 accessions based on SNPs restricted to the 200-kb region surrounding Sr46. c, Phylogeny based on SNPs of the 440-kb region in LD with Cmc4. Only the most resistant and susceptible Ae. tauschii accessions were included, along with resistant and susceptible modern elite wheat cultivars (different from the landraces shown in a).
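The Results describe two k-mer bookkeeping steps: summing the non-overlapping Venn sections to get the lineage-specific share of the k-mer space, and assigning each 100-kb segment of a wheat D subgenome to the lineage that contributes most of its lineage-specific k-mers, using only k-mers also found in the landraces. A minimal sketch of both steps follows. The function names, the tie handling (a tied segment is left unassigned) and the toy three-letter "k-mers" standing in for 51-mers are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter


def lineage_specific(kmers_by_lineage: dict) -> dict:
    """Return {lineage: set of k-mers observed in that lineage only}."""
    seen_in = Counter(
        kmer for kmers in kmers_by_lineage.values() for kmer in set(kmers)
    )
    return {
        lineage: {kmer for kmer in kmers if seen_in[kmer] == 1}
        for lineage, kmers in kmers_by_lineage.items()
    }


def specific_fraction(kmers_by_lineage: dict) -> float:
    """Share of the total k-mer space that is private to a single lineage."""
    specific = lineage_specific(kmers_by_lineage)
    total = len(set().union(*kmers_by_lineage.values()))
    return sum(len(kmers) for kmers in specific.values()) / total


def assign_segments(kmer_hits, specific, segment_size=100_000, allowed=None):
    """Assign each segment to its predominant lineage.

    kmer_hits: iterable of (position_bp, kmer) along one chromosome.
    allowed:   optional set of k-mers to keep (for example, those also
               present in landraces); all other k-mers are ignored.
    Segments with a tie for first place map to None (a choice made here).
    """
    owner = {kmer: lin for lin, kmers in specific.items() for kmer in kmers}
    tallies = {}
    for position, kmer in kmer_hits:
        if kmer not in owner:
            continue  # shared between lineages, carries no lineage signal
        if allowed is not None and kmer not in allowed:
            continue
        segment = position // segment_size
        tallies.setdefault(segment, Counter())[owner[kmer]] += 1
    calls = {}
    for segment, tally in tallies.items():
        ranked = tally.most_common(2)
        tied = len(ranked) == 2 and ranked[0][1] == ranked[1][1]
        calls[segment] = None if tied else ranked[0][0]
    return calls
```

Segments without any informative k-mer are simply absent from the returned dictionary, which keeps "no evidence" distinct from "conflicting evidence".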
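The Results report pairwise FST between lineages computed from SNPs in sliding windows of 1 Mb with a 100-kb step. The sketch below shows one way such a scan could be laid out. The text does not say which FST estimator was used, so Hudson's estimator (summed numerators divided by summed denominators within a window) is an assumption here, as are the function names and the input layout of per-SNP allele frequencies.

```python
def hudson_fst(sites, n1, n2):
    """Hudson's FST over a set of SNPs (ratio of sums).

    sites: iterable of (p1, p2), the allele frequency of the same allele
           in population 1 and population 2.
    n1, n2: number of sampled chromosomes in each population.
    """
    numerator = denominator = 0.0
    for p1, p2 in sites:
        numerator += (
            (p1 - p2) ** 2
            - p1 * (1 - p1) / (n1 - 1)
            - p2 * (1 - p2) / (n2 - 1)
        )
        denominator += p1 * (1 - p2) + p2 * (1 - p1)
    return numerator / denominator if denominator > 0 else float("nan")


def sliding_fst(snps, n1, n2, chrom_len, window=1_000_000, step=100_000):
    """Scan one chromosome with overlapping windows.

    snps: list of (position_bp, p1, p2).
    Returns a list of (start, end, fst); windows without SNPs are skipped.
    """
    results = []
    start = 0
    while start < chrom_len:
        end = min(start + window, chrom_len)
        sites = [(p1, p2) for pos, p1, p2 in snps if start <= pos < end]
        if sites:
            results.append((start, end, hudson_fst(sites, n1, n2)))
        if end == chrom_len:
            break  # the last window already reaches the chromosome end
        start += step
    return results
```

A window containing only fixed differences (frequency 1.0 in one lineage and 0.0 in the other) returns exactly 1.0, the kind of value the text reports near the centromeres.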
01 67 TOW TO WWC WC0137 TOW TOWW C0011 WC0088 0110 TO TOWW C WWC0185 C TOWW 0130 TOWWC0186 TOWWC 0191 TOWWC0013 TOWWC0 248 TOWWC0045 TOWWC021 4 TOWWC0149 TOWWC0015 TOWWC0172 TOWWC0236 TOWWC0173 WS0487 WS0476 TOWWC0278 TOWWC0256 TOWWC0219 TOWWC0239 TOWWC0301 TOWWC0 TOWWC 0206 299 TOWWC0246 TOWW C0204 TOWWC03 C0221 00 TOWW TOW TOWW C0203 WC0303 TOWWC0258 WC01 21 TOW TOWWC0 C020 2 196 TOW W TOWWC02 T 59 O TOW WC0 251 WWC02 T 4 O 7 TOW WC0 077 WWC0 T 1 O 97 W TOW WC 021 3 WC0 C02 89 W T 2O 67 TOW W TOW WC 028 7 WC T 0 O 19 W 8WC W 0S 22 0 94 TO WW C01 84 029 8 9 T 8 OW WWC W TO T C O 0232 TO WWC0 297 TA1011 TO WWC0 241 WWC0212 3 TO WWC0 24 4 TOWWC02 WC0 28 0 TOWW 6C 3 TO W T C0 29 5 OW 02 WW WC TO T C02 66 31 OW 021 WW T W 1O C TO C02 10 W 02 WW 49 4 S0 T W 7O C 1 TO W 0W W 10 02 05 TO C 2 C W 0 W T W 2 O 83 TO W C00 52 C0 W 28 W 2 72 W TO W 4 C0 W C TO 49 W 0 W 1 C02 C 9 TO W 50 W 21 TO W 6 NaTurE BioTEcHNoloGy Articles L1 L2 L3 70% 25% Landraces Synthetics Ae. tauschii 1 2 Polyploidization Enrichment by bottleneck synthetics Fig. 5 | Restricted gene flow from Ae. tauschii to wheat and the capture of Ae. tauschii diversity in a panel of synthetic hexaploid wheats. Genetic diversity private to Ae. tauschii L1, L2 and L3 is color coded blue, red and orange, respectively, whereas black dots represent k-mer sequences (51-mers) common to more than one lineage. The number of dots is proportional to the number of k-mers. The polyploidization bottleneck (1) incorporated 25% of the variant k-mers found in Ae. tauschii into wheat landraces. The addition of 32 synthetic hexaploid wheats (2) restored this to 70%. (a yield component), infection by Blumeria graminis f. sp. tritici The interval contained ten genes, including an NLR immune (cause of powdery mildew) and resistance to the wheat curl receptor, a gene class previously reported to confer arthropod mite Aceria tosichella (vector of wheat streak mosaic virus)29 resistance in melon and tomato41. 
These results highlight the abil- (Supplementary Table 8). All four phenotypes presented con- ity of the panel, with its rapid LD decay (Extended Data Fig. 8) and tinuous variation in the panel (Fig. 3a,b and Extended Data k-mer-based association mapping combined with de novo genome Fig. 7a). Mean trichome number along the leaf margin mapped assembly and annotation, to identify candidate genes, including to a 530-kb LD block on chromosome arm 4DL (Fig. 3c,d and those in insertions with respect to the reference genome, within Supplementary Table 10) within a 12.5-cM region previously discrete genomic regions for quantitative traits of agronomic value. defined by biparental linkage mapping30. The 530-kb interval contains seven genes, including an α/β-hydrolase, a gene class L1 and L2 share regions of low genetic divergence. We inves- with increased transcript abundance in developing trichomes tigated the population-wide distribution of the candidate genes of Arabidopsis thaliana31. The number of spikelets per spike was controlling disease resistance and morphology identified by asso- associated with a discrete 100-kb peak on chromosome arm 1DL ciation mapping (Figs. 2 and 3) across a genome-wide phylogeny containing six genes (Fig. 3c,d and Supplementary Table 10). of Ae. tauschii and a worldwide collection of 28 wheat landraces17. One of these encodes a trehalose-6-phosphate phosphatase that The absence of the alleles promoting disease resistance, more is homologous to RAMOSA3 and TPP4, known to control inflo- spikelets and higher trichome density in the wheat landraces for rescence branch number in maize32, and SISTER OF RAMOSA3 the new candidate genes suggest that they were not incorporated that influences spikelet fertility in barley33. Powdery mildew resis- into the initial gene flow into wheat (Fig. 4a). We next examined tance mapped to a 320-kb LD block on chromosome arm 7DS the distribution of these alleles between the three lineages of Ae. 
containing 19 genes in the resistant haplotype, including a ~60-kb tauschii. The Cmc4 gene candidate for resistance to wheat curl insertion with respect to the reference genome AL8/78 (Fig. 3c,d mite was largely confined to L1, whereas the allele variants pro- and Supplementary Table 10). No NLR immune receptor-encoding moting higher trichome density, spikelet number and resistance to gene was detected; however, the insertion contains a wheat-tandem wheat stem rust and powdery mildew were largely confined to L2 kinase (WTK), a gene class previously reported to confer resis- (Fig. 4a). Exceptions included three occurrences of the Sr46 gene tance to wheat stripe rust (Yr15)34, stem rust (Rpg1 and Sr60)35,36 in L1 and five occurrences of the candidate Cmc4 gene in L2. To and powdery mildew (Pm24)37. Resistance to wheat curl mite investigate whether this was due to a common genetic origin or mapped to a 440-kb LD block on chromosome arm 6DS within convergent evolution, we generated phylogenies based on the SNPs a region previously determined by biparental mapping38–40 within the respective 200-kb and 440-kb Sr46 and Cmc4 LD blocks. (Fig. 3c,d, Supplementary Table 10 and Supplementary Note). This showed that all functional haplotypes clustered together Fig. 6 | Functional transfer of disease and pest resistance from Ae. tauschii into wheat. a, WTK4 gene structure represented by rectangles (exons E1 to E12) joined by lines (introns). Kinase domains are shown in blue and orange. Exons used for designing VIGS target 1 (T1) and target 2 (T2) are shown in brown and red, respectively. Below, schematic of the cross between Ae. tauschii accession Ent-079 (contains WTK4) and T. turgidum durum line hoh-501 (lacks WTK4) that generated the synthetic hexaploid wheat line NIAB-144. Leaf segments from plants subjected to VIGS with empty vector (EV), T1, T2 or non-virus control (Φ) and super-infected with B. graminis f. sp. tritici isolate Bgt96224 avirulent to WTK4. 
b, Introgression of the Cmc4 locus from Ae. tauschii accession TA1618 into wheat. The 440-kb Cmc4 LD block (black) resides within a 7.9-Mb introgressed segment on chromosome 6D (light brown) in wheat cultivar TAM 115. Below, drawings of wheat curl mite-induced phenotypes. c, Structure of the SrTA1662 candidate gene. The predicted 970-amino acid protein has domains with homology to coiled-coil (cc), nucleotide-binding (NB-Arc) and leucine-rich repeats (Lrr). Right, transformation with an SrTA1662 genomic construct into cv. Fielder and response to P. graminis f. sp. tritici isolate UK-01 (avirulent to SrTA1662) of single-copy hemizygous transformants (1, DPrM0059; 2, DPrM0051; 3, DPrM0071) and non-transgenic controls.

[Fig. 6 panel labels: a, Powdery mildew resistance (resynthesis of synthetic wheat, AABBDD, from Ae. tauschii, DD, and T. turgidum, AABB; VIGS); b, Curl mite resistance (introgression of the 440-kb Cmc4 block within a 7.9-Mb segment on chromosome 6D); c, Stem rust resistance (transformation; single-copy versus non-transformed plants).]

irrespective of genome-wide lineage assortment, indicative of a common genetic origin and not convergent evolution (Fig. 4b,c, Supplementary Table 11 and Supplementary Note).

After domestication delivery of Ae. tauschii genes into wheat. The ability to precisely identify Ae. tauschii haplotypes and candidate genes for target traits provides an opportunity for accelerating their introduction into cultivated wheat. We selected 32 non-redundant and genetically diverse Ae. tauschii accessions, which capture 70% of the genetic diversity across all lineages, and crossed them to tetraploid durum wheat (T. turgidum var. durum; AABB) to generate independent synthetic hexaploid wheat lines (Fig. 5, Supplementary Table 12 and Supplementary Note). From this ‘library’, we selected four synthetic lines with the powdery mildew WTK candidate resistance gene. These synthetics as well as their respective Ae. tauschii donors were resistant to powdery mildew, while the durum line was susceptible (Fig. 6a and Extended Data Fig. 9a). Annotation of WTK identified seven alternative transcripts, of which only one, accounting for ~80% of the transcripts, leads to a complete 2,160-base pair (bp) 12-exon open reading frame (Fig. 6a, Extended Data Fig. 9b, Supplementary Tables 13 and 14 and Supplementary Note). Next, we targeted two exons with very low homology to other genes for virus-induced gene silencing (VIGS; Supplementary Note). WTK-containing Ae. tauschii and synthetics inoculated with the WTK-VIGS constructs became susceptible to powdery mildew, whereas empty vector-inoculated plants remained resistant (Fig. 6a and Extended Data Fig. 9a). This supports the conclusion that WTK, hereafter designated WTK4, is required for powdery mildew resistance and remains effective in synthetic hexaploids. Thus, these synthetic lines can serve as prebreeding stocks for introduction of the trait into elite wheat.

Developing wheat cultivars improved with traits from Ae. tauschii can also be achieved by direct crossing between the diploid and hexaploid species10. The wheat curl mite resistance gene Cmc4 was originally transferred by crossing of Ae. tauschii accession TA2397 (L1) into wheat42,43 and genetically localized to chromosome 6D in agreement with our association mapping38–40. Given the common resistant haplotype of Cmc4 in L1 and L2 (Fig. 4), we hypothesized that Cmc4 is the same as a gene originating from L2 accession TA1618, which was introgressed at the same locus into wheat cv. TAM 112 via a synthetic wheat39,43. Consistent with this hypothesis, we observed the same haplotype at the wheat curl mite resistance locus across all derived resistant hexaploid wheat lines and in the Ae. tauschii donors of Cmc4 and CmcTAM112 (Fig. 4c). We delimited the length of the introgressed Ae. tauschii wheat curl mite fragments by comparing SNP data for resistant wheat lines and the corresponding Ae. tauschii donors. The TA2397 (L1) introgression spanned 41.5 Mb, whereas the TA1618 (L2) introgression was reduced to 7.9 Mb in wheat cv. TAM 115 (Fig. 6b, Extended Data Fig. 7b,c and Supplementary Note).

As an alternative to conventional breeding, we targeted the SrTA1662 candidate stem rust resistance gene (Fig. 2d) for introduction into wheat by direct transformation. We cloned a 10,541-bp genomic fragment encompassing the complete SrTA1662 transcribed region as well as >3 kb of 3′- and 5′-untranslated region (UTR) putative regulatory sequences; this was sufficient to confer full race-specific stem rust resistance in transgenic wheat (Fig. 6c, Extended Data Fig. 10, Supplementary Table 15 and Supplementary Note).

Discussion

The origin of hexaploid bread wheat has long been the subject of intense scrutiny. Archeological and genetic evidence suggests that diploid and tetraploid wheats were first cultivated 10,000 years ago in the Fertile Crescent (Fig. 1a)5,6. The expansion of tetraploid wheat cultivation northeast into Caspian Iran and towards the Caucasus region resulted in sympatry with Ae. tauschii and the emergence of hexaploid bread wheat6. Ae. tauschii displays a high level of genetic differentiation among local populations, and genetic marker analysis suggests that the wheat D subgenome donor was recruited from an L2 population of Ae. tauschii in the southwestern coastal area of the Caspian Sea8. However, not all the diversity within the wheat D subgenome can be explained by a single hybridization event6,44,45. Our population genomic analysis revealed the existence of a third lineage of Ae. tauschii, L3, which also contributed to the extant wheat genome. For example, a glutenin allele required for superior dough quality was recently found to be of L3 origin46. L3 accessions are restricted to present-day Georgia and may represent a relict population from a glacial refugium as observed in Arabidopsis47. We observed genomic signatures specific to L2 and L3 in hexaploid wheat supporting the multiple hybridization hypothesis (Fig. 1g).

The creation of hexaploid bread wheat, while giving rise to a crop better adapted to a wider range of environments and end uses1, came at the cost of a pronounced genetic bottleneck7. Our analysis suggested that only 25% of the genetic diversity of Ae. tauschii contributed to the initial gene flow into hexaploid wheat (Fig. 5). To explore this diversity, we performed association mapping and discovered new gene candidates for disease and pest resistance and agromorphological traits underpinning abiotic stress tolerance and yield, exemplifying the potential of Ae. tauschii for wheat improvement (Fig. 6). We obtained discrete LD blocks of 50 to 520 kb, with the exception of flowering time, which resulted in a broad LD block of 5.5 Mb around the FT1 locus (Figs. 2 and 3). The low degree of historical recombination around FT1 is likely imposed by the reduced probability of intraspecies hybridization between populations carrying alleles promoting different flowering times. In contrast to the discrete mostly submegabase mapping intervals we obtained by association mapping with k-mer-based marker saturation, conventional biparental mapping studies on the D subgenome resulted in large intervals with a median of 10 Mb (Supplementary Table 16 and Supplementary Note).

In polyploid wheat, recessive variants are not readily observed; hence, genetics and genomics in wheat have mostly focused on rare dominant or semidominant variants48. Reflecting this, of 69 genes cloned in polyploid wheat by forward genetics, at least 62 have dominant or semidominant modes of action (Supplementary Table 17). This constraint is removed in Ae. tauschii by virtue of being diploid, which along with its rapid LD decay makes it an ideal platform for gene discovery by association mapping. Genes and allelic variants discovered in Ae. tauschii can subsequently be studied in wheat by generating transgenics or mutants or by using synthetic wheats. The first synthetic wheats were created in the middle of the last century by E. Sears and E. McFadden49, and since the late 1980s, synthetic wheats have been used extensively in breeding, for example, by the International Maize and Wheat Improvement Center (CIMMYT)50. However, without the use of high-resolution genomic information, the use of synthetic wheats was not precisely tracked. As illustrated here for wheat curl mite resistance, this led to the same gene being introgressed from two different Ae. tauschii lineages. Our study highlights how synthetic wheats can now be explored in a more directed manner. Our public library of synthetic wheats, which captures 70% of the diversity present across all three Ae. tauschii lineages, allows immediate trait assessment in a hexaploid background. The trait-associated haplotypes can be used to design molecular markers to precisely track the desired gene in a breeding program. In conclusion, our study provides an end-to-end pipeline for rapid and systematic exploration of the Ae. tauschii gene pool for improving modern bread wheat.

Online content

Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41587-021-01058-4.

Received: 1 February 2021; Accepted: 16 August 2021; Published online: 1 November 2021

Nature Biotechnology | Vol. 40 | March 2022 | 422–431 | www.nature.com/naturebiotechnology

References

1. Dubcovsky, J. & Dvorak, J. Genome plasticity a key factor in the success of polyploid wheat under domestication. Science 316, 1862–1866 (2007).
2. Pont, C. et al. Tracing the ancestry of modern bread wheats. Nat. Genet. 51, 905–911 (2019).
3. Marcussen, T. et al. Ancient hybridizations among the ancestral genomes of bread wheat. Science 345, 1250092 (2014).
4. Huang, S. et al. Genes encoding plastid acetyl-CoA carboxylase and 3-phosphoglycerate kinase of the Triticum/Aegilops complex and the evolutionary history of polyploid wheat. Proc. Natl Acad. Sci. USA 99, 8133–8138 (2002).
5. Zohary, D., Hopf, M. & Weiss, E. Domestication of Plants in the Old World: The Origin and Spread of Domesticated Plants in Southwest Asia, Europe, and the Mediterranean Basin 4th edn (Oxford Scholarship Online, 2012).
6. Giles, R. J. & Brown, T. A. GluDy allele variations in Aegilops tauschii and Triticum aestivum: implications for the origins of hexaploid wheats. Theor. Appl. Genet. 112, 1563–1572 (2006).
7. Zhou, Y. et al. Triticum population sequencing provides insights into wheat adaptation. Nat. Genet. 52, 1412–1422 (2020).
8. Wang, J. et al. Aegilops tauschii single nucleotide polymorphisms shed light on the origins of wheat D-genome genetic diversity and pinpoint the geographic origin of hexaploid wheat. New Phytol. 198, 925–937 (2013).
9. Li, A., Liu, D., Yang, W., Kishii, M. & Mao, L. Synthetic hexaploid wheat: yesterday, today, and tomorrow. Engineering 4, 552–558 (2018).
10. Gill, B. S. & Raupp, W. J. Direct genetic transfers from Aegilops squarrosa L. to hexaploid wheat. Crop Sci. 27, 445–450 (1987).
11. Gill, B. S. et al. Wheat Genetics Resource Center: the first 25 years. Adv. Agron. 89, 73–136 (2006).
12. Paux, E., Sourdille, P., Mackay, I. & Feuillet, C. Sequence-based marker development in wheat: advances and applications to breeding. Biotechnol. Adv. 30, 1071–1088 (2012).
13. Watson, A. et al. Speed breeding is a powerful tool to accelerate crop research and breeding. Nat. Plants 4, 23–29 (2018).
14. Luo, M. C. et al. Genome sequence of the progenitor of the wheat D genome Aegilops tauschii. Nature 551, 498–502 (2017).
15. Singh, N. et al. Genomic analysis confirms population structure and identifies inter-lineage hybrids in Aegilops tauschii. Front. Plant Sci. 10, 9 (2019).
16. Mizuno, N., Yamasaki, M., Matsuoka, Y., Kawahara, T. & Takumi, S. Population structure of wild wheat D-genome progenitor Aegilops tauschii Coss.: implications for intraspecific lineage diversification and evolution of common wheat. Mol. Ecol. 19, 999–1013 (2010).
17. Cheng, H. et al. Frequent intra- and inter-species introgression shapes the landscape of genetic variation in bread wheat. Genome Biol. 20, 136 (2019).
18. Matsuoka, Y. et al. Genetic basis for spontaneous hybrid genome doubling during allopolyploid speciation of common wheat shown by natural variation analyses of the paternal species. PLoS ONE 8, e68310 (2013).
19. Pritchard, J. K., Stephens, M. & Donnelly, P. Inference of population structure using multilocus genotype data. Genetics 155, 945–959 (2000).
20. Puechmaille, S. J. The program STRUCTURE does not reliably recover the correct population structure when sampling is uneven: subsampling and new estimators alleviate the problem. Mol. Ecol. Resour. 16, 608–627 (2016).
21. Walkowiak, S. et al. Multiple wheat genomes reveal global variation in modern breeding. Nature 588, 277–283 (2020).
22. Arora, S. et al. Resistance gene discovery and cloning by sequence capture and association genetics. Nat. Biotechnol. 37, 139–143 (2019).
23. Olson, E. L. et al. Simultaneous transfer, introgression, and genomic localization of genes for resistance to stem rust race TTKSK (Ug99) from Aegilops tauschii to wheat. Theor. Appl. Genet. 126, 1179–1188 (2013).
24. Yan, L. et al. The wheat and barley vernalization gene VRN3 is an orthologue of FT. Proc. Natl Acad. Sci. USA 103, 19581–19586 (2006).
25. Bonnin, I. et al. FT genome A and D polymorphisms are associated with the variation of earliness components in hexaploid wheat. Theor. Appl. Genet. 116, 383–394 (2008).
26. Dixon, L. E. et al. Developmental responses of bread wheat to changes in ambient temperature following deletion of a locus that includes FLOWERING LOCUS T1. Plant Cell Environ. 41, 1715–1725 (2018).
27. Pshenichnikova, T. A. et al. Quantitative characteristics of pubescence in wheat (Triticum aestivum L.) are associated with photosynthetic parameters under conditions of normal and limited water supply. Planta 249, 839–847 (2019).
28. Glas, J. J. et al. Plant glandular trichomes as targets for breeding or engineering of resistance to herbivores. Int. J. Mol. Sci. 13, 17077–17103 (2012).
29. Navia, D. et al. Wheat curl mite, Aceria tosichella, and transmitted viruses: an expanding pest complex affecting cereal crops. Exp. Appl. Acarol. 59, 95–143 (2013).
30. Wan, H., Yang, Y., Li, J., Zhang, Z. & Yang, W. Mapping a major QTL for hairy leaf sheath introgressed from Aegilops tauschii and its association with enhanced grain yield in bread wheat. Euphytica 205, 275–285 (2015).
31. Jakoby, M. J. et al. Transcriptional profiling of mature Arabidopsis trichomes reveals that NOECK encodes the MIXTA-like transcriptional regulator MYB106. Plant Physiol. 148, 1583–1602 (2008).
32. Claeys, H. et al. Control of meristem determinacy by trehalose 6-phosphate phosphatases is uncoupled from enzymatic activity. Nat. Plants 5, 352–357 (2019).
33. Koppolu, R. et al. Six-rowed spike4 (Vrs4) controls spikelet determinacy and row-type in barley. Proc. Natl Acad. Sci. USA 110, 13198–13203 (2013).
34. Klymiuk, V. et al. Cloning of the wheat Yr15 resistance gene sheds light on the plant tandem kinase-pseudokinase family. Nat. Commun. 9, 3735 (2018).
35. Brueggeman, R. et al. The barley stem rust-resistance gene Rpg1 is a novel disease-resistance gene with homology to receptor kinases. Proc. Natl Acad. Sci. USA 99, 9328–9333 (2002).
36. Chen, S. et al. Wheat gene Sr60 encodes a protein with two putative kinase domains that confers resistance to stem rust. New Phytol. 225, 948–959 (2020).
37. Lu, P. et al. A rare gain of function mutation in a wheat tandem kinase confers resistance to powdery mildew. Nat. Commun. 11, 680 (2020).
38. Malik, R., Brown-Guedira, G. L., Smith, C. M., Harvey, T. L. & Gill, B. S. Genetic mapping of wheat curl mite resistance genes Cmc3 and Cmc4 in common wheat. Crop Sci. 43, 644–650 (2003).
39. Dhakal, S. et al. Mapping and KASP marker development for wheat curl mite resistance in ‘TAM 112’ wheat using linkage and association analysis. Mol. Breed. 38, 119 (2018).
40. Zhao, J. et al. Development of single nucleotide polymorphism markers for the wheat curl mite resistance gene Cmc4. Crop Sci. 59, 1567–1575 (2019).
41. Smith, C. M. & Clement, S. L. Molecular bases of plant resistance to arthropods. Annu. Rev. Entomol. 57, 309–328 (2012).
42. Cox, T. S. et al. Registration of KS96WGRC40 hard red winter wheat germplasm resistant to wheat curl mite, Stagnospora leaf blotch, and Septoria leaf blotch. Crop Sci. 39, 597–597 (1999).
43. Rudd, J. C. et al. ‘TAM 112’ wheat, resistant to greenbug and wheat curl mite and adapted to the dryland production system in the Southern High Plains. J. Plant Regist. 8, 291–297 (2014).
44. Talbert, L. E., Smith, L. Y. & Blake, N. K. More than one origin of hexaploid wheat is indicated by sequence comparison of low-copy DNA. Genome 41, 402–407 (1998).
45. Dvorak, J., Luo, M. C. & Yang, Z.-L. Genetic evidence on the origin of Triticum aestivum L. In The Origins of Agriculture and Crop Domestication, Proceedings of the Harlan Symposium, Aleppo, Syria (eds Damania, A. B. et al) 235–251 (ICARDA, 1997).
46. Delorean, E. et al. High molecular weight glutenin gene diversity in Aegilops tauschii demonstrates unique origin of superior wheat quality. Commun. Biol. https://doi.org/10.1038/s42003-021-02563-7 (2021).
47. Alonso-Blanco, C. et al. 1,135 genomes reveal the global pattern of polymorphism in Arabidopsis thaliana. Cell 166, 481–491 (2016).
48. Uauy, C., Wulff, B. B. H. & Dubcovsky, J. Combining traditional mutagenesis with new high-throughput sequencing and genome editing to reveal hidden variation in polyploid wheat. Annu. Rev. Genet. 51, 435–454 (2017).
49. McFadden, E. S. & Sears, E. R. The origin of Triticum spelta and its free-threshing hexaploid relatives. J. Hered. 37, 81–89 (1946).
50. Das, M. K., Bai, G., Mujeeb-Kazi, A. & Rajaram, S. Genetic diversity among synthetic hexaploid wheat accessions (Triticum aestivum) with resistance to several fungal diseases. Genet. Resour. Crop Evol. 63, 1285–1296 (2016).
51. International Wheat Genome Sequencing Consortium (IWGSC) et al. Shifting the limits in wheat research and breeding using a fully annotated reference genome. Science 361, eaar7191 (2018).

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2021

1John Innes Centre, Norwich Research Park, Norwich, UK. 2Department of Plant Pathology and Wheat Genetics Resource Center, Kansas State University, Manhattan, KS, USA. 3Programa Nacional de Cultivos de Secano, Instituto Nacional de Investigación Agropecuaria (INIA), Estación Experimental La Estanzuela, Colonia, Uruguay. 4Department of Plant and Microbial Biology, University of Zurich, Zurich, Switzerland. 5The John Bingham Laboratory, NIAB, Cambridge, UK. 6Crop Development Centre, Department of Plant Sciences, University of Saskatchewan, Saskatoon, Saskatchewan, Canada. 7Faculty of Land and Food Systems, The University of British Columbia, Vancouver, British Columbia, Canada. 8Leibniz-Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany. 9National Key Laboratory of Crop Genetics and Germplasm Enhancement, Cytogenetics Institute, Nanjing Agricultural University/JCIC-MCP, Nanjing, China. 10School of Agriculture, Food and Wine, University of Adelaide, Glen Osmond, South Australia, Australia. 11Department of Agricultural and Food Sciences, Alma Mater Studiorum, University of Bologna, Bologna, Italy. 12Department of Agroecology, Global Rust Reference Center, Aarhus University, Slagelse, Denmark. 13Texas A&M AgriLife Research, Amarillo, TX, USA. 14Institute for Cereal Crops Improvement, School of Plant Sciences and Food Security, Tel Aviv University, Tel Aviv, Israel. 15Department of Agrobiotechnology (IFA-Tulln), Institute of Biotechnology in Plant Production, University of Natural Resources and Life Sciences, Vienna, Austria.
16Laboratory of Plant Breeding, Department of Agronomy, Faculty of Agriculture, Universitas Gadjah Mada, Yogyakarta, Indonesia. 17Department of Agronomy and Plant Breeding, Ilam University, Ilam, Iran. 18Institute of Botany, Plant Physiology and Genetics, Tajik National Academy of Sciences, Dushanbe, Tajikistan. 19Germplasm Resources Unit, John Innes Centre, Norwich Research Park, Norwich, UK. 20Department of Plant Pathology, University of Minnesota, Saint Paul, MN, USA. 21School of Agricultural Biotechnology, Punjab Agricultural University, Ludhiana, India. 22Commonwealth Scientific and Industrial Research Organization (CSIRO), Agriculture and Food, Canberra, Australian Capital Territory, Australia. 23Wheat Research Department, Field Crops Research Institute, Agricultural Research Center, Giza, Egypt. 24Earlham Institute, Norwich Research Park, Norwich, UK. 25QIAGEN Aarhus A/S, Aarhus, Denmark. 26Department of Plant Science and Landscape Architecture, University of Maryland, College Park, MD, USA. 27Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing, China. 28Triticeae Research Institute, Sichuan Agricultural University, Chengdu, China. 29USDA-ARS Cereal Crops Research Unit, Edward T. Schafer Agricultural Research Center, Fargo, ND, USA. 30USDA-ARS, Plant Science Research Unit, Raleigh, NC, USA. 31Department of Plant Sciences, University of California, Davis, CA, USA. 32Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA. 33Plant Genome and Systems Biology, Helmholtz Center Munich, Neuherberg, Germany. 34Faculty of Life Sciences, Technical University Munich, Weihenstephan, Germany. 35German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany. 36Present address: Bayer R&D Services LLC, Kansas City, MO, USA.
37Present address: Center for Desert Agriculture, Biological and Environmental Science and Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia. 38Present address: International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico. 39These authors contributed equally: Kumar Gaurav, Sanu Arora, Paula Silva, Javier Sánchez-Martín, Richard Horsnell. ✉e-mail: a.bentley@cgiar.org; bkeller@botinst.uzh.ch; jpoland@ksu.edu; brande.wulff@kaust.edu.sa

Methods

SNP calling relative to the AL8/78 reference genome. Following whole-genome shotgun sequencing, we called SNPs across the panel relative to the Ae. tauschii AL8/78 reference genome assembly. The 306 Ae. tauschii samples were aligned to the Ae. tauschii AL8/78 reference genome14 using HISAT2 default parameters52. All alignment BAM files were sorted and duplicates removed using SAMtools (v.1.9 ‘view’, ‘sort’ and ‘rmdup’ sub-commands). All BAM files were fed into the variant call pipeline using BCFtools (-q 20 -a DP,DV | call -mv -f GQ) with parallelization ‘-r $region’ of 4-Mb windows for a total of 1,010 intervals (regions). The raw variant files were filtered or recalled using a published AWK script based on DP/DV ratios (the ratio of non-reference read depth to total read depth) with default parameters (https://bitbucket.org/ipk_dg_public/vcf_filtering/src/master/) except for the minPresent parameter (we used minPresent = 0.8 and minPresent = 0.1). The minPresent = 0.8 dataset was used for redundancy analysis. The minPresent = 0.1 and minPresent = 0.8 datasets were both used for genome-wide association study (GWAS) analysis. The resulting matrix (104 million SNPs for minPresent = 0.1 concatenated using BCFtools v.1.11) was uploaded to Zenodo.

Quality control for redundancy and residual heterogeneity. A total of 100,900 (100 every 4-Mb window) SNPs were randomly chosen to compute pairwise identity by state among all samples for a total of 46,665 comparisons using custom R and AWK scripts (https://github.com/wheatgenetics/owwc). For every sample pair, a percent identity greater than 99.5% was deemed redundant based on the histogram distribution of all identity by state values (Extended Data Fig. 1c). This analysis confirmed the results of the KASP analysis conducted on the L2 accessions (Extended Data Fig. 1b and Supplementary Note).

For each accession (except TOWWC0193, which is related to the reference genome AL8/78), the fraction of heterozygous SNPs in the total number of biallelic SNPs was computed. Based on the distribution of these values (Extended Data Fig. 1d and Supplementary Table 3), 0.1 was deemed to indicate a low degree of residual heterogeneity. BW_26042, with a value of 0.17, was found to be the only outlier exceeding this threshold.

Based on these quality control analyses, a non-redundant and genetically stable set of 242 accessions was retained for further analysis. The redundant pairs, along with the different similarity scores, are given in Supplementary Table 4, and the set of 242 non-redundant accessions is provided in Supplementary Table 5.

Trichomes. For counting trichomes and measuring flowering time in Ae. tauschii, 50 L1 accessions and 150 L2 accessions were pregerminated at ~4 °C in Petri dishes on wet filter paper for 2 d in the dark. They were transferred to room temperature (~20 °C) and daylight for 4 d. Three seedlings of each genotype were transplanted on 22 January 2019 into 96-cell trays filled with a mixture of peat and sand and then grown under natural vernalization in a glasshouse with no additional light source or heating at the John Innes Centre, Norwich, UK. Trichome phenotyping was conducted 1 month later. Close-up photographs of the second leaf from seedlings at the three-leaf stage were taken and visualized in ImageJ, and trichomes were counted along one side of a 20-mm leaf margin in the mid-leaf region. Measurements were taken from three biological replicates (Supplementary Table 8).

Flowering time, biological replicate 1. Three seedlings used for trichome phenotyping (see above) were transferred on 25 March into individual 2 l pots filled with cereal mix soil60. Flowering time was recorded when the first five spikes were three-fourths emerged from the flag leaf sheath, equivalent to 55 on the Zadoks growth scale61 (Supplementary Table 8).

Flowering time, biological replicates 2 and 3. A total of 147 Ae. tauschii L2 accessions were grown in the winters of 2018/2019 and 2019/2020 in the greenhouse at the Department of Agrobiotechnology, University of Natural Resources and Life Sciences, Vienna, Austria. Seeds of each accession were sown in multitrays in a mixture of heat-sterilized compost and sand and stratified for 1 week before germination at 4 °C with a 12 h day/12 h night light regimen. Thereafter, the seeds were germinated at 22 °C and at the one-leaf stage vernalized for 11 weeks. Five seedlings per accession were transplanted to 4 l pots (18 cm in diameter, 21 cm in height) filled with a mixture of heat-sterilized compost, peat, sand and rock flour. In the winter of 2018/2019, one pot (= one replicate) per accession was planted, whereas in 2019/2020, two pots (= two replicates) were planted. The pots were randomly arranged in the greenhouse and maintained at a temperature of 14/10 °C day/night with a 12 h photoperiod for the first 40 d. At spike emergence, the temperature was increased to 22/18 °C day/night with a 16 h photoperiod at 15,000 lx. At least ten spikes per pot were evaluated for beginning of anthesis, taken as 60 on the Zadoks growth scale61, resulting in a minimum of 30 assessed spikes per accession.
Flowering time was recorded every second day. The flowering date was analyzed using a linear mixed model, which considered subsampling of individual spikes within each pot as follows:

$$Y_{ijkl} = \mu + g_i + e_j + ge_{ij} + r_{jk} + p_{ijk} + \varepsilon_{ijkl}$$

Here, $Y_{ijkl}$ denotes the flowering date observation of the individual spikes, $\mu$ is the grand mean and $g_i$ is the genetic effect of the ith accession. The environment effect, $e_j$, is defined as the effect of the jth year, and the genotype-by-environment interaction is described by $ge_{ij}$. $r_{jk}$ is the effect of the kth replication within the jth year, $p_{ijk}$ is the effect of the ith pot within the kth replication and jth year and $\varepsilon_{ijkl}$ is the residual term. Analysis was performed with R v.3.5.1 (ref. 62) using the package sommer63 with all effects considered as random except $g_i$, which was modeled as a fixed effect to obtain the best linear unbiased estimates (Supplementary Table 8).

De novo assembly from whole-genome shotgun short-read data. The primary sequence data of non-redundant accessions were trimmed using Trimmomatic v.0.238 and de novo assembled with the MEGAHIT v.1.1.3 assembler using default parameters53. The output of the assembler for each accession was a FASTA file containing all the contig sequences. The assemblies are available from Zenodo.

Genome assembly of Ae. tauschii accession TOWWC0112. TOWWC0112 (line BW_01111) was assembled by combining paired-end and mate-pair sequencing reads using TRITEX54, an open-source computational workflow. A PCR-free 250-bp paired-end library with an insert size range of 400–500 bp was sequenced to a coverage of ~70. Mate-pair libraries MP3 and MP6, with insert size ranges of 2–4 kb and 5–7 kb, respectively, were sequenced to a coverage of ~20. The assembly generated had an N50 of 196 kb (Supplementary Table 7). The assembly is available from the electronic Data Archive Library (e!DAL).

Genome assembly of Ae. tauschii accession TOWWC0106. Accession TOWWC0106 (line BW_01105) was sequenced on a PacBio Sequel II platform (Pacific Biosciences) with single-molecule, real-time chemistry and on the Illumina platform. For single-molecule, real-time library preparation, ~7 μg of high-quality genomic DNA was fragmented to a 20-kb target size and assessed on an Agilent 2100 Bioanalyzer55. The sheared DNA was end repaired, ligated to blunt-end adaptors and size selected. The libraries were sequenced by Berry Genomics. A standard Illumina protocol was followed to make libraries for PCR-free paired-end genome sequencing with ~1 μg of genomic DNA that was fragmented and size selected (350 bp) by agarose gel electrophoresis. The size-selected DNA fragments were end blunted, provided with an A-base overhang and then ligated to sequencing adapters. A total of 251.8 Gb of high-quality 150-bp paired-end PCR-free reads were generated and sequenced on the NovaSeq sequencing platform.

A set of 11.35 million PacBio long reads (289.6 Gb), representing a ~66-fold genome coverage, was assembled using the CANU pipeline with default parameters56. The assembled contigs were polished with 251.8 Gb of PCR-free reads using Pilon default parameters57. The resulting assembly had an N50 of 1.5 Mb (Supplementary Table 7). The assembly is available from e!DAL.

Phenotyping the Ae. tauschii diversity panel and synthetic hexaploid wheat lines. Wheat stem rust. The wheat stem rust phenotypes with P. graminis f. sp. tritici isolate 04KEN156/04, race TTKSK, and isolate 75ND717C, race QTHJC, were obtained from Arora et al.22. As part of this study, we also phenotyped the same Ae. tauschii lines with isolate UK-01 (race TKTTF)58 (Supplementary Table 8) using the same procedures as described in ref. 59. UK-01 was obtained from Limagrain.

Spikelets per spike. For Ae. tauschii spikelet phenotyping, 151 accessions from L2 were vernalized at a constant temperature of 4 °C for 8 weeks in a growth chamber (Conviron). After vernalization, the accessions were transplanted to 3.8 l pots in potting mix (peat moss and vermiculite) and placed in a temperature-controlled Conviron growth chamber with diurnal temperatures gradually changing from 12 °C at 02:00 to 17 °C at 14:00 with a 16 h photoperiod and 80% relative humidity. To represent biological replication, each accession was grown in two pots, and each pot contained two plants. At the transplanting stage, 10 g of a slow-release N-P-K fertilizer was added to each pot. At physiological maturity, 5–15 main stem/tiller spikes per replication (that is, per pot) were collected, and the numbers of immature as well as mature spikelets were counted. Any obvious weak heads from late-growing tillers were not included. Least square means for each replication were used for k-mer-based association genetic analysis (Supplementary Table 8).

Powdery mildew. Resistance to B. graminis f. sp. tritici was assessed with Bgt96224, a highly avirulent isolate from Switzerland64, using inoculation procedures previously described65. Disease levels were assessed 7–9 d after inoculation as one of five classes of host reactions: resistance (R; 0–10% of leaf area covered), intermediate resistance (IR; 10–25% of leaf area covered), intermediate (I; 25–50% of leaf area covered), intermediate susceptible (IS; 50–75% of leaf area covered) and susceptible (S; >75% of leaf area covered) (Supplementary Table 8).

Wheat curl mite. A total of 210 Ae. tauschii accessions, 102 from L1 and 108 from L2 (Supplementary Table 8), were screened for their response against wheat curl mite. Aceria tosichella (Keifer) biotype 1 colonies (courtesy of M. Smith, Department of Entomology, Kansas State University) were mass reared under controlled conditions at 24 °C in a 14 h light/10 h dark cycle using the susceptible wheat cv. Jagger. The biotype 1 colony was previously reported as avirulent toward all Cmc resistance genes38,66–68. A single colony consisted of an individual pot with
A single colony consisted of an individual pot with from Limagrain. ~50 seedlings, and 20 colonies were grown to have sufficient mite inoculum to NATURE BiOTECHNOLOGY | www.nature.com/naturebiotechnology NaTurE BioTEcHNoloGy Articles conduct the phenotyping. Colonies were placed inside 45 cm × 45 cm × 75 cm Ae. tauschii if at least 20% of 100,000 k-mers within that segment were usable as mite-proof cages covered with a 36-µm mesh screen (ELKO Filtering Co.) to avoid well as present in at least one non-redundant Ae. tauschii accession. A segment contamination until being used to infest the Ae. tauschii accessions. Accessions assigned to Ae. tauschii was further assigned to one of the three lineages (L1, L2 from L1 and L2 were evaluated in independent experiments. Six plants per and L3) if the count of usable k-mers specific to that lineage exceeded the count of accession were individually grown in 5 cm × 5 cm × 5 cm pots under controlled those specific to the other lineages by at least 0.01% of 100,000 k-mers. Scripts to conditions at 24 °C in a 14 h light/10 h dark cycle. Pots were arranged randomly in determine the counts of lineage-specific and total Ae. tauschii k-mers per 100-kb an incomplete block design where the block was the tray fitting 32 pots (8 rows and segment are published at https://github.com/wheatgenetics/owwc, and the output 4 columns). A single pot with the susceptible check cv. Jagger was included in each files obtained for 11 wheat assemblies were collated in an Excel file that is available tray. Accessions were infested at the two-leaf stage, with mite colonies collected from Zenodo. from infested pieces of leaves from the susceptible plants and spread as straw over the pots. Plants were evaluated individually 10–14 d after infestation. Wheat curl Anchoring of a de novo assembly to a reference genome. 
The contigs of a mite damage was assessed as curled or trapped leaves using a visual scale from de novo assembly were ordered along a chromosome-level reference genome using 0 to 4, with 0 indicating no symptoms and 1 to 4 indicating increasing levels of minimap2 (ref. 75) (version 2.14 or above), and the genomic coordinates of their curliness or trapped leaves (Extended Data Fig. 7a). longest hits were assigned. The adjusted mean or best linear unbiased estimator for each accession was calculated with the ‘lme4’ R package69 using the following linear regression model: Correlation prefiltering. For each of the assembly k-mers (including those present at multiple loci), if also present in the precalculated presence/absence matrix, yijkl = μ + Gi + Tj + Rk(j) + Cl(j) + eijkl Pearson’s correlation between the vector of that k-mer’s presence/absence and the vector of the phenotype scores was calculated. Only those k-mers for which the Here, yijkl is the phenotypic value, µ is the overall mean, Gi is the fixed effect of absolute value of correlation obtained was higher than a threshold (0.2 by default) the ith accession (genotype), Tj is the random effect of the jth tray assumed as were retained to reduce the computational burden of association mapping using independent and identically distributed (iid) Tj ≈ N(0, σ2 T), Rk(j) is the random linear regression. effect of the kth row nested within the jth tray assumed distributed as iid Rk(j) ≈ N(0, σ2 R), Cl(j) is the random effect of the lth column nested within the Linear regression model accounting for population structure. To each filtered jth tray assumed distributed as iid Cl(j) ≈ N(0, σ2 C) and eijkl is the residual error k-mer from the previous step, a P value was assigned using linear regression with distributed as iid eijkl ≈ N(0, σ2 e). a number of leading PCA dimensions as covariates to control for the population structure. PCA was computed using the aforementioned set of 100,000 k-mers. 
k-mer presence/absence matrix. k-mers (k = 51) were counted in trimmed raw The exact number of leading PCA dimensions was chosen heuristically. Too data per accession using Jellyfish70 (version 2.2.6 or above). k-mers with a count of high a number might overcorrect for population structure, while too few might less than two in an accession were discarded immediately. k-mer counts from all undercorrect. In the context of this study, three dimensions were found to accessions were integrated to create a presence/absence matrix with one row per represent a good trade-off. k-mer and one column per accession. The entries were reduced to 1 (presence) and 0 (absence). k-mers occurring in less than two accessions or in all but one Approximate Bonferroni threshold computation. For each phenotype in this accession were removed during the construction of the matrix. Programs to study, the total number of k-mers used in association mapping varied between process the data were implemented in Python and are published at https://github. 3,000,000,000 and 5,000,000,000. In general, if the k-mer size is 51, a SNP or any com/wheatgenetics/owwc. The k-mer matrix is available from e!DAL. other structural variant would give rise to at least 51 k-mer variants. Therefore, the total number of tested k-mer variants should be divided by 51 to get the effective Phylogenetic tree construction. A random set of 100,000 k-mers was extracted number of variants to adjust the P value threshold for multiple testing. Assuming from the k-mer matrix to build an unweighted pair group method with arithmetic a P value threshold of 0.05, a Bonferroni-adjusted –log P value threshold between mean (UPGMA) tree with 100 bootstraps using the Bio.Phylo module from the 9.1 and 9.3 was obtained for each phenotype. The more stringent cutoff of 9.3 was Biopython v.1.77 (http://biopython.org) package. Further, a Python script was chosen throughout this study. 
used to generate an iTOL-compatible (https://itol.embl.de/) tree for rendering and annotation. The Python script and the random set of 100,000 k-mers used for Generating association mapping plots. Association mapping plots were generated generating the tree are available at https://github.com/wheatgenetics/owwc. using Python. For a chromosome-level reference assembly, each integer on the x axis corresponds to a 10-kb genomic block starting from that position. For Bayesian cluster analysis using STRUCTURE. Bayesian clustering implemented an anchored assembly, each integer on the x axis represents the scaffold that is in STRUCTURE19 version 2.3.4 was used to investigate the number of distinct anchored starting from that position. Dots on the plot represent the –log P values lineages of Ae. tauschii. To control the bias due to the highly unbalanced of the filtered k-mers within each block. Dot size is proportional to the number of proportion of the three groups20 in the non-redundant sequenced accessions k-mers with the specific –log P value. The plotting script is published at https:// (119 accessions of L2, 118 accessions of L1 and 5 accessions of putative L3), 10 github.com/wheatgenetics/owwc. accessions each of L1 and L2 were randomly selected for each STRUCTURE run along with the 5 accessions of the putative L3 and the control L1–L2 RIL. The Optimization of k-mer GWAS in Ae. tauschii. We used previously generated stem random selection of 10 accessions each of L1 and L2 was performed 11 times rust phenotype data for P. graminis f. sp. tritici isolate 04KEN156/04, race TTKSK, without replacement, thus covering a total of 110 accessions each of L1 and L2 on 142 Ae. tauschii L2 accessions22. Mapping k-mers with an association score of over 11 STRUCTURE runs (Supplementary Table 6). STRUCTURE simulations >6 to the Ae. 
tauschii reference genome AL8/78 gave rise to significant peaks for were run using a random set of 100,000 k-mers with a burn-in length of 100,000 the positive controls Sr45 and Sr46 (Extended Data Fig. 4a). The peaks contain iterations followed by 150,000 Markov chain Monte Carlo iterations for five k-mers that are negatively correlated with resistance (shown as red dots) because replicates each of K ranging from 1 to 6. STRUCTURE output was uploaded to the AL8/78 reference accession does not contain Sr45 and Sr46. To identify the Structure Harvester (http://taylor0.biology.ucla.edu/structureHarvester; Web true Sr45 and Sr46 haplotypes, accession TOWWC0112 (which contains Sr45 and v.0.6.94 July 2014; Plot vA.1 November 2012; Core vA.2 July 2014)71 to generate Sr46)22 was assembled from tenfold whole-genome shotgun data using MEGAHIT a ΔK plot for each run. For each STRUCTURE run, a clear peak was observed (N50 = 1.1 kb) and used in association mapping. However, noise masked the at K = 3 in the ΔK plot, suggesting that there are three distinct lineages of Ae. positive signals from Sr45 and Sr46 when the short scaffolds were distributed tauschii19,71. STRUCTURE results were processed and plotted using CLUMPAK72,73 randomly along the x axis (Extended Data Fig. 4b). Anchoring the scaffolds to the (http://clumpak.tau.ac.il/; beta version accessed on 11 May 2021) to maintain the AL8/78 reference genome considerably improved the plot and produced positive label collinearity for multiple replicates of each K. signals for Sr45 and Sr46 (blue peaks; Extended Data Fig. 4c). An improved assembly (N50 = 196 kb), generated with mate-pair libraries and again anchored to Determination of genome-wide fixation index. Genome-wide pairwise fixation AL8/78, further reduced the background noise (Extended Data Fig. 4d). index (FST) between the three Ae. 
tauschii lineages was computed using VCFtools74 v.0.1.15 with the parameters ‘–fst-window-size’ and ‘–fst-window-step’ set to Performing k-mer GWAS in Ae. tauschii with reduced coverage. The trimmed 1,000,000 and 100,000, respectively. sequence data of each non-redundant accession was randomly subsampled to reduce the coverage to 7.5-fold, 5-fold, 3-fold and 1-fold. For each coverage point, Admixture analysis of the wheat D subgenome. To assign segments of the the k-mer GWAS pipeline was applied, and k-mers with an association score of >6 wheat D subgenome to Ae. tauschii lineages for each of the 11 chromosome-scale were mapped to the Ae. tauschii reference genome AL8/78 (Extended Data Fig. 5). wheat assemblies21, we considered only those k-mers as usable that were present at a single locus in the D subgenome. Furthermore, out of these k-mers, for Computing genome-wide LD. The Ae. tauschii AL8/78 reference genome was nine modern cultivars, only those k-mers were considered usable that were also partitioned into five segments (R1, R2a, C, R2b and R3; Extended Data Fig. 8) present in the short-read sequences from 28 hexaploid wheat landraces17. For the based on the distribution of the recombination rate, where the boundaries between assembled wheat genomes, each chromosome of the D subgenome was divided these regions were imputed using the boundaries established for the Chinese into 100-kb non-overlapping segments. A 100-kb segment was assigned to Spring RefSeqv1.0 D subgenome51. PopLDdecay76 v.3.41 with the parameter NATURE BiOTECHNOLOGY | www.nature.com/naturebiotechnology Articles NaTurE BioTEcHNoloGy ‘-MaxDist’ set to 5 Mb was used to determine the LD decay in these regions for 65. Sánchez-Martín, J. et al. Rapid gene isolation in barley and wheat by mutant both L1 and L2. For L2, the value of mean r2 in the telomeric regions R1 and R3 chromosome sequencing. Genome Biol. 17, 221 (2016). 
dropped below 0.1 at genomic distances of 291 kb and 476 kb, respectively, while 66. Aguirre-Rojas, L. et al. Resistance to wheat curl mite in arthropod-resistant for L1, the corresponding genomic distances were 661 kb and 561 kb, respectively. rye-wheat translocation lines. Agronomy 7, 74 (2017). 67. Chuang, W. P. et al. Wheat genotypes with combined resistance to wheat curl Reporting Summary. Further information on research design is available in the mite, wheat streak mosaic virus, wheat mosaic virus, and Triticum mosaic Nature Research Reporting Summary linked to this article virus. J. Econ. Entomol. 110, 711–718 (2017). 68. Harvey, T. L., Seifers, D. L., Martin, T. J., Brown-Guedira, G. & Gill, B. S. Data availability Survival of wheat curl mites on different sources of resistance in wheat. Crop The raw PacBio and Illumina sequences used for the assembly of Ae. tauschii Sci. 39, 1887–1889 (1999). accession TOWWC0106 have been submitted to the Genome Sequence Archive 69. Bates, D., Mächler, M., Bolker, B. M. & Walker, S. C. Fitting linear (GSA) of the National Genomics Data Center hosted by the Beijing Genomics mixed-effects models using lme4. J. Stat. Softw. https://doi.org/10.18637/jss. Institute, Beijing, under the accession number CRA002681 and to NCBI under v067.i01 (2015). study number PRJNA730363. The genome assemblies and annotations of 70. Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel TOWWC0112 and TOWWC0106 are available from the Leibniz Institute of counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011). Plant Genetics and Crop Plant Research (IPK) at https://doi.ipk-gatersleben. 71. Earl, D. A. & vonHoldt, B. M. STRUCTURE HARVESTER: a website and de/DOI/4bb6f03f-3a15-429a-b542-9962cb676e63/953a2d8a-5ade-479a-9304- program for visualizing STRUCTURE output and implementing the Evanno 6fdd12da7ce4/2/1847940088. The 150-bp paired-end Illumina sequences for the method. Conserv. Genet. Resour. 
4, 359–361 (2012). 306 Ae. tauschii accessions, the 250-bp paired-end and mate-pair libraries for 72. Jakobsson, M. & Rosenberg, N. A. CLUMPP: a cluster matching and accession T0WW0112 and the RNA sequencing data for 8 Ae. tauschii accessions permutation program for dealing with label switching and multimodality in are available from NCBI, study number PRJNA685125. The 150-bp paired-end analysis of population structure. Bioinformatics 23, 1801–1806 (2007). Illumina sequences for the hexaploid wheat accessions and the two additional Ae. 73. Kopelman, N. M., Mayzel, J., Jakobsson, M., Rosenberg, N. A. & Mayrose, I. tauschii accessions used in the Cmc4 and CmcTAM112 haplotype analysis (Fig. 4 and Clumpak: a program for identifying clustering modes and packaging Extended Data Fig. 7b,c) are available from NCBI, study number PRJNA694980. population structure inferences across K. Mol. Ecol. Resour. 15, 1179–1191 The k-mer matrix for 305 Ae. tauschii accessions and the tetraploid donor T. durum (2015). Hoh-501 used to generate synthetic hexaploids can be obtained from https:// 74. Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, doi.ipk-gatersleben.de/DOI/dfc2d351-b5fe-41e6-bd6c-efe96cfcc7aa/0cef0e89- 2156–2158 (2011). acf2-451c-8efc-a71c0368fec4/2/1847940088. The variant call (SNP) file for 306 75. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics Ae. tauschii accessions based on the AL8/78 reference is available from Zenodo 34, 3094–3100 (2018). at https://doi.org/10.5281/zenodo.4317950. Counts of lineage-specific k-mers in 76. Zhang, C., Dong, S. S., Xu, J. Y., He, W. M. & Yang, T. L. PopLDdecay: a fast wheat genome assemblies are available from Zenodo at https://doi.org/10.5281/ and effective tool for linkage disequilibrium decay analysis based on variant zenodo.4474428. MEGAHIT assemblies for 303 Ae. tauschii accessions (including call format files. Bioinformatics 35, 1786–1788 (2019). 
the 242 non-redundant accessions) are available from Zenodo at https://doi. org/10.5281/zenodo.4430803, https://doi.org/10.5281/zenodo.4430872 and https:// Acknowledgements doi.org/10.5281/zenodo.4430891. A 29,243-bp fragment extracted from contig We are grateful to the germplasm banks at Kansas State University Wheat Genetics 00015145 of the Ae. tauschii TOWWC0106 assembly was deposited in the NCBI Resource Center, International Center for Agricultural Research in the Dry Areas, GenBank along with the coordinates of the WTK4 transcript SV01 under study USDA-ARS National Small Grains Collection, Leibniz Institute of Plant Genetics and number MW295405. The SrTA1662 gene and transcript sequence have been Crop Plant Research, Ilam University, Tajikistan Academy of Sciences and the N. I. deposited in NCBI Genbank under accession number MW526949. Figures that Vavilov Research Institute of Plant Industry for providing seed and/or collection data have associated raw data include Figs. 1–6 and Extended Data Figs. 1–9. of Ae. tauschii. We thank our colleagues Y. Yue, P. Crane and S. Burrows and John Innes Centre (JIC) Horticultural Services for plant husbandry, M. Ambrose for help with public Code availability distribution of germplasm, M. Craze and S. Bowden for help with creation of synthetic Scripts for SNP calling, k-mer matrix generation, redundancy analysis, wheats, H. Jones for help with elucidating provenance of Ae. tauschii donors used for determination of residual heterogeneity and phylogenetic tree construction, synthetic wheats, C. Kling for developing and making available the durum wheat line including iTOL.nwk files, admixture analysis, k-mer GWAS and SNP GWAS, can Hoh-501 used for generating synthetic wheats, R. Graf for supplying wheat cv. Radiant, be found in the repository https://github.com/wheatgenetics/owwc. M. Feldman for help with delimiting the Fertile Crescent in Fig. 1, H. Cherry Guo for managing Illumina sequencing, T. 
Olsson for Illumina data handling, C. Michael Smith for maintenance of wheat curl mite colonies, H. Ahlers for creating graphics, M. References Buttner for helpful discussions, A. Galvin and A. Lawn for OWWC communications, 52. Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based A. Meldrum for drafting the OWWC research agreement, the JIC NBI Computing genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Infrastructure for Science group and the Kansas State University (KSU) BEOCAT for Biotechnol. 37, 907–915 (2019). HPC access and maintenance and S. Krattinger for reviewing the draft manuscript. 53. Li, D., Liu, C. M., Luo, R., Sadakane, K. & Lam, T. W. MEGAHIT: an This research was financed by the UK Biotechnology and Biological Sciences Research ultra-fast single-node solution for large and complex metagenomics assembly Council (BBSRC) Wheat Improvement Strategic Programme BB/I002561/1 to R.H. and via succinct de Bruijn graph. Bioinformatics 31, 1674–1676 (2015). A.R.B.; BBSRC Designing Future Wheat Institute Strategic Programme BB/P016855/1 54. Monat, C. et al. TRITEX: chromosome-scale sequence assembly of Triticeae to R.H., A.R.B., P.N., S.A.B., X.B., R.P.D., C.U. and B.B.H.W.; BBSRC Earlham Institute genomes with open-source tools. Genome Biol. 20, 284 (2019). Strategic Programme BBS/E/T/000PR9817 to R.P.D.; BBSRC-Embrapa Newton Fund BB/ 55. Pendleton, M. et al. Assembly and diploid architecture of an individual N019113/1 to P.N.; BBSRC grant BB/PPR1740/1 to W.H.; BBSRC National Capability human genome via single-molecule technologies. Nat. Methods 12, 780–786 award BBS/E/T/000PR9814 to R.P.D.; UK Research and Innovation-BBSRC National (2015). Capability grant BBS/E/J/000PR8000 to N.C.; a UKRI BBSRC Norwich Research Park 56. Koren, S. et al. Canu: scalable and accurate long-read assembly via adaptive Biosciences Doctoral Training Partnership scholarship (BB/M011216/1) to A.N.H.; k-mer weighting and repeat separation. 
Genome Res. 27, 722–736 (2017). US National Science Foundation (NSF) Industry-University Cooperative Research 57. Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial Center (IUCRC) Award 1822162 to J.P.; Phase II IUCRC at the KSU Center for Wheat variant detection and genome assembly improvement. PLoS ONE 9, e112963 Genetic Resources to J.P.; US-NSF award grant/FAIN 1339389 to J.P.; Kansas Wheat (2014). Commission award B65336 to J.P.; US-NSF award IOS-1238231 to J.D. and M.-C.L.; 58. Lewis, C. M. et al. Potential for re-emergence of wheat stem rust in the United States Department of Agriculture (USDA) to G.B.-G., S.X. and J.F.; National United Kingdom. Commun. Biol. 1, 13 (2018). Institute of Food and Agriculture-USDA awards to V.K.T. (2020-67013-31460) and 59. Kangara, N. et al. Mutagenesis of Puccinia graminis f. sp. tritici and selection L.G.; a Fulbright Scholars Program to P.S.; Swiss National Science Foundation award of gain-of-virulence mutants. Front. Plant Sci. 11, 570180 (2020). 310030B_182833 to B.K.; Newton-Mosharafa Fund award 332408563 to A.F.E. and 60. Ghosh, S. et al. Speed breeding in growth chambers and glasshouses for crop B.B.H.W.; a JIC Institute Development Grant to B.B.H.W; Agriculture Development Fund breeding and model plant research. Nat. Protoc. 13, 2944–2963 (2018). of the Saskatchewan Ministry of Agriculture project 20180095 to G.S.B. and H.R.K.; 61. Zadoks, J. C., Chang, T. T. & Konzak, C. F. A decimal code for the growth Saskatchewan Wheat Development Commission to G.S.B. and H.R.K.; Alberta Wheat stages of cereals. Weed Res. 14, 415–421 (1974). Development Commission to G.S.B. and H.R.K.; Manitoba Crop Alliance to G.S.B. 62. R Core Team. R: A Language and Environment for Statistical Computing (R and H.R.K.; Government of Saskatchewan Ministry of Agriculture to P.H.; European Foundation for Statistical Computing, 2017). Research Council award ERC-2016-STG-716233-MIREDI to K.K.; a Consejo Nacional 63. 
Covarrubias-Pazaran, G. Genome-assisted prediction of quantitative traits de Ciencia y Tecnología scholarship to J.Q.-C.; JIC International Scholarships to J.Q.-C. using the R package sommer. PLoS ONE 11, e0156744 (2016). and S.G.; Monsanto’s (now Bayer) Beachell-Borlaug International Scholars’ Program 64. Wicker, T. et al. The wheat powdery mildew genome shows the unique fellowship to S.G.; 2Blades Foundation to S.G. and B.B.H.W.; John Innes Foundation evolution of an obligate biotroph. Nat. Genet. 45, 1092–1096 (2013). to J.W.; European Union’s Horizon 2020 research and innovation programme Marie NATURE BiOTECHNOLOGY | www.nature.com/naturebiotechnology NaTurE BioTEcHNoloGy Articles Skłodowska-Curie grant agreement 674964 to N.K., B.B.H.W. and C.U.; JIC Science performed genome-wide LD analysis. S.M.K., K.G. and L.G. performed GWAS control For Africa Initiative to N.K.; The Royal Society award UF150081 to S.A.B.; Australian experiments. J.Q.-C. and C.U. interpreted Ae. tauschii trait–genotype relationships for Research Council award DP210103744 to S.A.B.; a Università di Bologna scholarship to flowering time, S.P. and S. Arora for trichomes, S.A.B, J.W., K.G. and J.L. for spikelets, A.P.; Innovation Fund Denmark award 4105-00022B to M.P. and A.F.J.; Jewish National J.S.-M., S. Arora and K.G. for powdery mildew and P.S., L.G. and S. Arora for wheat curl Fund of Australia to R.A. and A.S.; Ministry of Education and Culture of the Republic mite. K.G. determined gene level and K.G., P.S., S. Arora and L.G. determined haplotype of Indonesia and the Austrian Agency for International Cooperation in Education and distribution in Ae. tauschii and wheat. K.G. estimated genetic diversity captured by wheat Research (OeAD-GmbH) in cooperation with ASEA-UNINET to R.P.K.; Department of landraces and synthetic wheats. S. Arora and J.S.-M. annotated WTK4. J.S.-M. and V.W. Biotechnology, India award BT/PR30871/BIC/101/1159/2018 to N. 
Sandhu and award determined WTK4 gene structure and/or performed functional analysis. R.H. and A.B. BT/IN/Indo-UK/CGAT/14/PC/2014-15 to P.C.; Science and Technology Development generated synthetic wheats. S.L. and J.C.R. developed wheat germplasm for curl mite Fund, Egypt-UK Newton-Mosharafa Institutional Links award 30718 to A.F.E. and resistance. S. Arora annotated SrTA1662, M.A.S. and S. Arora designed and engineered B.B.H.W. National Science Foundation of China grants 91731305 and 31661143007 to binary constructs, S.H. and W.H. transformed wheat and N.K., M.P., A.F.J. and S. L.M.; Knowledge Innovation Program of Chinese Academy of Agricultural Sciences Arora phenotyped transgenics. S. Arora, K.G., J.S.-M., P.S., C.G., T.L., B.B.H.W. and J.P. award CAAS-DRW202002 to LM and the breeding companies KWS, Limagrain, designed figures. K.G., S. Arora, P.S., J.S.-M., L.G., G.S.B., C.C., C.U., M.M., A.R.B., B.K., Syngenta and Bayer to the Open Wild Wheat Consortium. J.P. and B.B.H.W. conceived and designed experiments. B.B.H.W., K.G., P.S., J.S.-M., R.H., S. Arora, J.P., D.G., R.A., L.G., C.G., N. Sandhu, A.P., S.H., M.S., M.P., C.U., M.M., B.K., Author contributions K.F.X.M. and A.S. drafted the manuscript. B.B.H.W., J.P. and B. Steuernagel conceived, S. Arora, M.F.-M., C.G., N. Singh, W.J.R., N.C., S.G., A.N.H., T.O. and J.L. configured, founded and/or managed OWWC. All authors read and approved the manuscript. bulked and/or distributed Ae. tauschii germplasm. A.A.M. and F.Y.N. collected and curated new Ae. tauschii accessions from Iran and Tajikistan, respectively. M.F.-M., S.W., Competing interests J.L., A.P., S. Arora and G.Y. extracted plant DNA, and A.F.E. and S. Arora extracted plant K.G. and B.B.H.W. are inventors on UK patent application PC931335GB, and S. Arora, RNA. S.W. prepared DNA libraries. B.B.H.W., J.P., J.F., G.B.-G., S.X., P.C., K.K., A.S., B. Steuernagel and B.B.H.W. 
are inventors on PCT/US2019/013430; these patents are E.L., J.D., M.-C.L., K.F.X.M., A.R.B., B.J.S. and V.K.T. acquired DNA sequences. K.G., based on part of the work presented here. The remaining authors declare no competing J.C., M.M., S. Arora, B. Steuernagel, L.G., S.T., X.B., R.P.D., M.S. and L.S. undertook interests. sequence data curation, back-up and/or distribution. L.G. and K.G. performed variant calling and filtering, L.G., S. Arora and K.G. performed Ae. tauschii redundancy analysis and K.G. performed the heterogeneity analysis. K.G. and M.M. assembled genomes Additional information of TOWW0112, A.L., L.M. and D.-C.L. assembled genomes of TOWWC0106 and Extended data is available for this paper at https://doi.org/10.1038/s41587-021-01058-4. K.G. assembled the diversity panel. T.L., S. Artmeier and K.F.X.M. performed genome Supplementary information The online version contains supplementary material annotations. K.G., S. Arora and J.C. performed genome-wide phylogenetic analysis. available at https://doi.org/10.1038/s41587-021-01058-4. K.G. characterized L3 and discovered its contribution to wheat. S. Arora performed the Correspondence and requests for materials should be addressed to FST analysis. N.K., S. Arora, O.M. and B.J.S. phenotyped Ae. tauschii accessions for stem Alison R. Bentley, Beat Keller, Jesse Poland or Brande B. H. Wulff. rust, J.Q.-C., J.S., C.U., B. Steiner, R.P.K. and H.B. phenotyped flowering time, C.C., S.P. and P.N. phenotyped trichomes, G.S.B., H.R.K. and P.H. phenotyped spikelets, J.S.-M. Peer review information Nature Biotechnology thanks Rudi Appels and the other, phenotyped powdery mildew and P.S. phenotyped wheat curl mite. K.G. established anonymous, reviewer(s) for their contribution to the peer review of this work. k-mer GWAS methodology and discovered candidate genes. K.G. and S. Arora. Reprints and permissions information is available at www.nature.com/reprints. 
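Worked example: approximate Bonferroni threshold. The threshold range reported in the Methods (a –log P value between 9.1 and 9.3) can be checked with a few lines of arithmetic. The sketch below is only an illustration, not the authors' script: the function name and its parameters are hypothetical, and it assumes that the logarithm is base 10 and that the number of tests is simply the k-mer count divided by the k-mer size.

```python
import math


def bonferroni_neg_log10_threshold(n_kmers, k=51, alpha=0.05):
    """Return the -log10(P) cutoff for n_kmers tested k-mers.

    Assumption (taken from the Methods text): one variant gives rise to
    roughly k k-mers, so the effective number of tests is n_kmers / k.
    """
    effective_tests = n_kmers / k
    return -math.log10(alpha / effective_tests)


# Lower and upper bounds of the k-mer counts quoted in the Methods.
low = bonferroni_neg_log10_threshold(3_000_000_000)
high = bonferroni_neg_log10_threshold(5_000_000_000)
print(round(low, 1), round(high, 1))  # prints: 9.1 9.3
```

Under these assumptions the two bounds reproduce the reported range, and the more stringent value of 9.3 corresponds to the largest k-mer set.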
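Worked example: presence/absence matrix and correlation prefiltering. The two Methods steps of building the k-mer presence/absence matrix and prefiltering k-mers by Pearson correlation can be sketched on toy data as follows. This is a minimal pure-Python illustration with hypothetical function names and three-letter stand-ins for 51-mers; it is not the code published in the owwc repository. The filter on accession counts is one possible reading of the Methods text (keep k-mers present in at least two accessions and absent from at least two) and should be treated as an assumption.

```python
import math


def build_presence_absence(counts_per_accession, min_count=2):
    """Build {kmer: [0/1 per accession]} from per-accession k-mer counts.

    counts_per_accession is a list of dicts (k-mer -> count), one per
    accession; it stands in for parsed k-mer counter output.
    """
    n = len(counts_per_accession)
    matrix = {}
    for col, counts in enumerate(counts_per_accession):
        for kmer, count in counts.items():
            if count < min_count:  # discard k-mers counted fewer than twice
                continue
            matrix.setdefault(kmer, [0] * n)[col] = 1
    # Assumed reading of the filter: present in >= 2 and absent from >= 2.
    return {k: row for k, row in matrix.items() if 2 <= sum(row) <= n - 2}


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no association signal
    return sxy / math.sqrt(sxx * syy)


def prefilter(matrix, phenotype, threshold=0.2):
    """Keep k-mers whose absolute correlation exceeds the threshold."""
    kept = {}
    for kmer, row in matrix.items():
        r = pearson(row, phenotype)
        if abs(r) > threshold:
            kept[kmer] = r
    return kept


# Toy data: four accessions with phenotype scores in the same order.
counts = [
    {"AAA": 5, "CCC": 3, "GGG": 1},
    {"AAA": 4, "CCC": 2, "TTT": 2},
    {"CCC": 6, "TTT": 3, "GGG": 2},
    {"CCC": 2, "TTT": 1, "GGG": 4},
]
phenotype = [1.0, 1.0, 0.0, 0.0]
matrix = build_presence_absence(counts)
kept = prefilter(matrix, phenotype)
```

In this toy case the k-mer present in every accession is dropped when the matrix is built, and the k-mer whose presence is uncorrelated with the phenotype is dropped by the prefilter, so only the two informative k-mers would proceed to regression.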
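Worked example: assigning a 100-kb wheat segment to a lineage. The assignment rule in the admixture analysis can be written as a small decision function. The sketch below uses hypothetical names and return labels, and it is not taken from the published scripts. It assumes that "the other lineages" means each other lineage individually, so the leading lineage has to exceed the runner-up by the margin; the Methods text does not settle this point.

```python
def assign_segment(usable_total, lineage_counts, window_kmers=100_000):
    """Classify one 100-kb segment from its usable k-mer counts.

    usable_total: usable k-mers in the segment that are also present in at
        least one non-redundant Ae. tauschii accession.
    lineage_counts: dict such as {"L1": n1, "L2": n2, "L3": n3} with the
        usable k-mers specific to each lineage (at least two lineages).
    Returns None if the segment is not assigned to Ae. tauschii, the
    lineage name if one lineage leads by the margin, else "ambiguous".
    """
    min_usable = window_kmers * 20 // 100  # 20% of 100,000 = 20,000 k-mers
    min_margin = window_kmers // 10_000    # 0.01% of 100,000 = 10 k-mers
    if usable_total < min_usable:
        return None
    ranked = sorted(lineage_counts.items(), key=lambda kv: kv[1], reverse=True)
    (best, best_n), (_, runner_up_n) = ranked[0], ranked[1]
    # Assumption: the winner must beat the runner-up by the margin.
    if best_n - runner_up_n >= min_margin:
        return best
    return "ambiguous"


too_few = assign_segment(15_000, {"L1": 900, "L2": 100, "L3": 50})
close_call = assign_segment(30_000, {"L1": 50, "L2": 400, "L3": 395})
clear_l2 = assign_segment(30_000, {"L1": 50, "L2": 400, "L3": 120})
```

Integer arithmetic is used for both thresholds so that the 20% and 0.01% cutoffs are exact for a 100,000 k-mer window.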
NATURE BiOTECHNOLOGY | www.nature.com/naturebiotechnology Articles NaTurE BioTEcHNoloGy Extended Data Fig. 1 | Configuration and genetic structure of the Aegilops tauschii diversity panel used in this study. a, Geographical distribution of 242 Ae. tauschii accessions. Filled squares and circles represent accessions sequenced as part of this study, while accessions represented by unfilled squares and circles were not sequenced. Accessions highlighted in green were used as D genome donors to generate synthetic hexaploid wheat (ShW) lines. Three accessions outside of the map, one from Turkey and two from china, are indicated by white arrow heads. AFG, Afghanistan; ArM, Armenia; AZE, Azerbaijan; chN, china; GEO, Georgia; IrN, Iran; IrQ, Iraq; KAZ, Kazakhstan; KGZ, Kyrgyzstan; PAK, Pakistan; SYr, Syria; TJK, Tajikistan; TUr, Turkey; TKM, Turkmenistan; UZB, Uzbekistan. The Fertile crescent follows the shaded area in Fig. 1 of harlan and Zohary (1966) and is bound by the Mediterranean in the west, by chains of large and high mountain ranges in the north and east (the Amanos in northwestern Syria, the Taurus in southern Turkey, Ararat in north-eastern Turkey and the Zagros in western Iran), and in the south by the Syrio-Arabian desert, with its western extension (for example, Paran desert) in the Sinai Peninsula. b, Identification of non-redundant Ae. tauschii accessions using KASP markers on 195 accessions and: c, 100,000 random SNPs obtained from whole genome shotgun sequencing of 306 accessions. The vertical red line in both histogram similarity plots indicates the redundancy cut-off at which the peak of the high similarity values is clearly separated from the rest. d, Identification of Ae. tauschii accessions with minimal residual heterogeneity. The histogram of heterozygosity scores was generated using all the bi-allelic SNPs obtained from whole genome shotgun sequencing of 305 accessions (excluding TOWWc0193). 
The vertical red line indicates the cut-off at which the cluster of the low heterozygosity values is clearly separated. e, ΔK plot for a STrUcTUrE run with 10 randomly selected accessions each of L1 and L2 along with the five accessions of the putative L3 and the control L1-L2 rIL. f, Principal component Analysis with the same set of accessions as used in panel a. The recombinant inbred control line is indicated by r. NATURE BiOTECHNOLOGY | www.nature.com/naturebiotechnology NaTurE BioTEcHNoloGy Articles Extended Data Fig. 2 | See next page for caption. NATURE BiOTECHNOLOGY | www.nature.com/naturebiotechnology Articles NaTurE BioTEcHNoloGy Extended Data Fig. 2 | Fraction of lineage-specific k-mers in non-overlapping 100 kb windows of Chromosome 1D for the 11 wheat genome assemblies. For the nine modern cultivars21, only those k-mers were considered which were also present in the short-read sequences of 28 hexaploid wheat landraces17. chromosomes are colored according to their Ae. tauschii lineage-specific origin as displayed in Fig. 1. NATURE BiOTECHNOLOGY | www.nature.com/naturebiotechnology NaTurE BioTEcHNoloGy Articles Extended Data Fig. 3 | Lineage-specific origin of extant wheat D-subgenomes. chromosomes 2D-7D of 11 wheat cultivars colored according to their Ae. tauschii lineage-specific origin as in Fig. 1f. NATURE BiOTECHNOLOGY | www.nature.com/naturebiotechnology Articles NaTurE BioTEcHNoloGy Extended Data Fig. 4 | Optimization of k-mer GWAS with the positive controls Sr45 and Sr46. Blue/red dots on the y-axis represent one or more k-mers significantly associated with resistance/susceptibility, respectively, to Puccinia graminis f. sp. tritici isolate 04KEN156/04 (race TTKSK) across the diversity panel. Definition of association score, threshold, and dot size (which is proportional to the number of k-mers having the specific value on the y-axis), is as in Fig. 2. a, Significantly associated k-mers mapped to AL8/78 which is susceptible to TTKSK. 
The peaks marked Sr45 and Sr46 contain the non-functional (not providing resistance to TTKSK) alleles of Sr45 and Sr46. The x-axis represents the seven chromosomes of the Ae. tauschii reference accession, AL8/78. Each dot column represents a 10 kb interval. b, Significantly associated k-mers mapped to the unordered de novo assembly of TOWWC0112 (N50 1.1 kb), an Ae. tauschii accession resistant to TTKSK. Each dot column on the x-axis represents an unordered contig from the de novo assembly. c, Significantly associated k-mers mapped to the same assembly of TOWWC0112 as in (b), but now each contig has been ordered by anchoring to the reference genome of AL8/78 (x-axis). d, Association mapping with an improved TOWWC0112 assembly (N50 196 kb) anchored to the AL8/78 reference genome (x-axis).

Extended Data Fig. 5 | Impact of sequencing coverage on the power to detect the positive controls, Sr45 and Sr46. Sequencing coverage was artificially reduced by sub-sampling the original 10-fold coverage sequencing reads and mapping associated k-mers to AL8/78. Definition of association score, threshold, and dot size is as in Fig. 2. a, Plot obtained with 7.5-fold coverage (compare with 10-fold coverage in Extended Data Fig. 7a). b, Plot obtained with 5-fold coverage. c, Plot obtained with 3-fold coverage. d, Plot obtained with 1-fold coverage.

Extended Data Fig. 6 | k-mers significantly associated with FLOWERING LOCUS T1 and SrTA1662 identified by GWAS. Definition of association score, threshold, and dot size is as in Fig. 2. a, Resistance to Puccinia graminis f. sp. tritici isolate UK-01 maps to the SrTA1662 locus. The peak indicated by the arrow contains the region delimited by the SrTA1662 LD block obtained with P. graminis f. sp. tritici race QTHJC.
b, Biological replicate 2 and c, biological replicate 3 for flowering time identify FLOWERING LOCUS T1. The associated k-mers were mapped to the Aegilops tauschii AL8/78 reference genome where they define a peak similar to that in Fig. 2b.

Extended Data Fig. 7 | Wheat curl mite (WCM) symptoms in Aegilops tauschii and introgression of WCM resistance into wheat. a, Phenotype scale used to characterize Ae. tauschii response to WCM infestation. Symptoms used were leaf trapping and leaf curliness. The visual scale ranged from 0 to 4, with 0 equivalent to no symptoms and 1 to 4 denoting increasing levels of curliness or trapped leaves indicative of susceptibility. b, Delineation of Ae. tauschii Lineage 1 accession TA2397 carrying wheat curl mite resistance introgressed into wheat line KS96WGRC40. The retained polymorphic markers were obtained by pairwise comparisons of the Ae. tauschii donor with the corresponding wheat line. KS96WGRC40 is the original line where Cmc4 was mapped. c, The donor of resistance in wheat line TAM 112 is the Lineage 2 accession TA1618. Wheat lines TAM 115 and TAM 204 are both resistant through TAM 112. The black vertical line indicates the Cmc4 position. The three grey dashed vertical lines denote the size of the introgressed fragments, 7.9 Mb, 11.9 Mb, and 41.5 Mb, in the wheat lines TAM 115, TAM 112 and TAM 204, and KS96WGRC40, respectively. SNP density is based on number of SNPs within 1 Mb bins.

Extended Data Fig. 8 | Genome-wide decay of linkage disequilibrium (LD) in Aegilops tauschii. Genomic regions (R1, R2a, C, R2b, R3) in L2 (top) and L1 (bottom) were determined based on the distribution of the recombination rate in T.
aestivum cv. Chinese Spring. The distance at which r² for a region drops below 0.1 is highlighted.

Extended Data Fig. 9 | Analysis of powdery mildew resistance in Aegilops tauschii and durum donors and their derived synthetic hexaploid wheat lines. a, Top, disease reactions to Blumeria graminis f. sp. tritici Bgt96224 are displayed for the Ae. tauschii accessions Ent-079, Ent-080, Ent-085 and Ent-102. Bottom, disease reactions to Bgt96224 are displayed for the corresponding synthetic hexaploid lines (NIAB_144, derived from Ent-079; NIAB_088, derived from Ent-080; NIAB_149, derived from Ent-085; and NIAB_090, derived from Ent-102) using the tetraploid durum wheat donor line Hoh-501, which is highly susceptible to Bgt96224. Each Ae. tauschii accession and its corresponding synthetic hexaploid line was either not inoculated with BSMV (Ø) or inoculated with a BSMV construct, either an empty vector (EV) or one targeting WTK4 exon 8 (target 1, T1) or exon 10 (target 2, T2) for silencing, and then super-infected with Bgt96224. b, Alternative splicing of WTK4. Alternative splicing variants (SV01 to SV07) revealed by sequencing 51 WTK4 cDNAs. At the top, in black, is the splicing variant SV01, which encodes the complete WTK4 protein. Below SV01, six aberrant alternative splicing variants (SV02 to SV07) are shown in grey. The number of clones identified for each SV is indicated in parentheses. Diamond-arrowed red lines point to the first stop codons at the protein level.

Extended Data Fig. 10 | The Aegilops tauschii stem rust resistance gene SrTA1662 maintains race specificity as a transgene in wheat. The SrTA1662 gene was transformed into the stem rust susceptible wheat cultivar Fielder.
Shown are T2 generation lines selected to be homozygous for the transgene or to be non-transgenic segregants. a, Inoculation with isolate IT200a/18 (race TKKTF). b, Inoculation with isolate IT16a/18 (race TTRTF). c, Inoculation with isolate ET11a/18 (race TKTTF). d, Inoculation with isolate KE184a/18 (Kenya). Numbering refers to 1 = DPRM0050 (null of DPRM0051), 2 = DPRM0051, 3 = DPRM0059, 4 = DPRM0062 (null of DPRM0059), 5 = DPRM0071, 6 = DPRM0072 (null of DPRM0071) (see Supplementary Table E).
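
Extended Data Fig. 7 reports SNP density as the number of SNPs within 1 Mb bins. The following is a minimal sketch of that kind of binning, assuming 1-based SNP coordinates on a single chromosome. The function name `snp_density` and the example coordinates are hypothetical and are not taken from the study's pipeline.

```python
from collections import Counter


def snp_density(positions, bin_size=1_000_000):
    """Count SNPs per non-overlapping bin on one chromosome.

    positions: iterable of 1-based SNP coordinates (assumption).
    Returns a dict mapping each 0-based bin start to its SNP count;
    bins that contain no SNPs are omitted.
    """
    counts = Counter()
    for pos in positions:
        if pos < 1:
            raise ValueError("positions must be 1-based and positive")
        # Coordinates 1..bin_size fall in the bin starting at 0,
        # bin_size+1..2*bin_size in the bin starting at bin_size, etc.
        bin_start = ((pos - 1) // bin_size) * bin_size
        counts[bin_start] += 1
    return dict(counts)


# Hypothetical coordinates, for illustration only.
example = [10, 999_999, 1_000_000, 1_000_001, 2_500_000]
density = snp_density(example)
```

Passing `bin_size=100_000` gives the same kind of non-overlapping 100 kb windows mentioned in Extended Data Fig. 2, although the lineage-specific k-mer fraction shown there would need a per-window denominator that this sketch does not compute.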