Diversity and Distributions. 2020;00:1–13. | wileyonlinelibrary.com/journal/ddi
Received: 24 June 2019 | Revised: 31 January 2020 | Accepted: 4 February 2020
DOI: 10.1111/ddi.13046

BIODIVERSITY RESEARCH

A gap analysis modelling framework to prioritize collecting for ex situ conservation of crop landraces

Julian Ramirez-Villegas1,2 | Colin K. Khoury1,3,4 | Harold A. Achicanoy1 | Andres C. Mendez1 | Maria Victoria Diaz1 | Chrystian C. Sosa1 | Daniel G. Debouck1 | Zakaria Kehel5 | Luigi Guarino6

1 International Center for Tropical Agriculture (CIAT), Cali, Colombia
2 CGIAR Research Program on Climate Change, Agriculture and Food Security (CCAFS), c/o CIAT, Cali, Colombia
3 United States Department of Agriculture, Agricultural Research Service, National Laboratory for Genetic Resources Preservation, Fort Collins, CO, USA
4 Department of Biology, Saint Louis University, St. Louis, MO, USA
5 International Center for Agricultural Research in the Dry Areas (ICARDA), Rabat, Morocco
6 Global Crop Diversity Trust, Bonn, Germany

Correspondence: Julian Ramirez-Villegas, International Center for Tropical Agriculture (CIAT), Km 17 Recta Cali-Palmira, 763537, Cali, Colombia. Email: j.r.villegas@cgiar.org

Editor: Martin Jung

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2020 The Authors. Diversity and Distributions Published by John Wiley & Sons Ltd.

Abstract

Aim: The conservation and effective use of crop genetic diversity are crucial to overcome challenges related to human nutrition and agricultural sustainability. Farmers’ traditional varieties (“landraces”) are major sources of genetic variation.
The degree of representation of crop landrace diversity in ex situ conservation is poorly understood, partly due to a lack of methods that can negotiate both the anthropogenic and environmental determinants of their geographic distributions. Here, we describe a novel spatial modelling and ex situ conservation gap analysis modelling framework for crop landraces, using common bean (Phaseolus vulgaris L.) as a case study.

Location: The Americas.

Methods: The modelling framework includes five main steps: (a) determining relevant landrace groups using literature to develop and test classification models; (b) modelling the potential geographic distributions of these groups using occurrence (landrace presences) combined with environmental and socioeconomic predictor data; (c) calculating geographic and environmental gap scores for current genebank collections; (d) mapping ex situ conservation gaps; and (e) compiling expert inputs.

Results: Modelled distributions and conservation gaps for the two genepools of common bean (Andean and Mesoamerican) were robustly predicted and align well with expert opinions. Both genepools are relatively well conserved, with Andean ex situ collections representing 78.5% and Mesoamerican 98.2% of their predicted geographic distributions. Modelling revealed that additional collecting priorities for Andean landraces occur primarily in Chile, Peru, Colombia and, to a lesser extent, Venezuela. Mesoamerican landrace collecting priorities are concentrated in Mexico, Belize and Guatemala.

Conclusions: The modelling framework represents an advance in tools that can be deployed to model the geographic distributions of cultivated crop diversity, to assess the comprehensiveness of conservation of this diversity ex situ and to highlight geographic areas where further collecting may be conducted to fill gaps in ex situ conservation.
1 | INTRODUCTION

The effective use of crop genetic resources, including both traditional farmer varieties (or “landraces”) and wild relatives, is important in efforts to overcome challenges related to human nutrition and agricultural sustainability (Burke, Lobell, & Guarino, 2009; Esquinas-Alcázar, 2005; Khoury et al., 2016). Progress in plant breeding and crop diversification depends on understanding and utilizing the available crop genetic resources (Glaszmann, Kilian, Upadhyaya, & Varshney, 2010; Hajjar & Hodgkin, 2007). The erosion of genetic diversity within many common crops has occurred over the last century through a combination of land use change, habitat degradation and the ongoing adoption of improved crop varieties or the substitution of crop species by farming communities (Hoisington et al., 1999; van de Wouw, Kik, Hintum, Treuren, & Visser, 2010). In some crops, only a fraction of the genetic diversity once present is still found today in farmers’ fields, for example wheat landraces in the Fertile Crescent (Gepts, 2006; Harlan, 1975). Consequently, ex situ crop genebanks have become essential not only for the distribution of genetic resources to various users (e.g. breeders, other genebanks), but also for the conservation of such resources (Gepts, 2006; Hoisington et al., 1999).

Understanding the representation of crop diversity in ex situ repositories provides a foundation for conservation planning (Castañeda-Álvarez et al., 2016; García, Parra-Quijano, & Iriondo, 2017; van Treuren, Engels, Hoekstra, & Hintum, 2009). Methods to assess the current degree of representation, and to inform further collecting efforts, have been developed for more than a decade [e.g. Rodrigues et al. (2004); Maxted, Dulloo, Ford-Lloyd, Iriondo, and Jarvis (2008)].
Owing to the general lack of genetic data, these methods are typically based on ecogeographic methodologies as a proxy for assessments of genetic diversity (Khoury et al., 2019; Ramirez-Villegas, Khoury, Jarvis, Debouck, & Guarino, 2010). Such methods have proved useful in estimating the representation of wild relatives and other wild species in genebanks in comparison with extant diversity in their natural environments (Castañeda-Álvarez et al., 2016; Khoury et al., 2019; Syfert et al., 2016). However, their application to cultivated plants, whose spatial distributions are determined by anthropogenic factors as well as environmental drivers, is limited (Fuller, 2007; Hilbert et al., 2017; Morris et al., 2013). This represents a critical gap, since cultivated materials are generally preferred over wild relatives for use by plant breeders (Camacho Villa, Maxted, Scholten, & Ford-Lloyd, 2005; Hammer, Knüpffer, Xhuveli, & Perrino, 1996).

Here, we present a conservation gap analysis modelling framework for cultivated crop diversity that improves on current ecogeographic methods, using landraces of the common bean (Phaseolus vulgaris L.) as a case study. As opposed to previous analyses of the distributions of cultivated crop diversity [e.g. Upadhyaya, Reddy, Irshad Ahmed, and Gowda (2012), Upadhyaya et al. (2017)], our methods explicitly aim to include anthropogenic drivers in the modelling of the distributions of landraces. The results predict geographic areas that are likely gaps in ex situ landrace conservation collections and provide metrics that can be used to track conservation progress. These results are supplemented with expert knowledge, which is vital for elucidating spatial patterns and drivers of range change that are difficult to model.
Common bean is the grain legume most widely consumed by humans, playing an essential role in food and nutritional security, particularly in Latin America and Sub-Saharan Africa (Beebe, 2012; Broughton et al., 2003). Two independent domestication events of wild P. vulgaris have been identified: one in Mexico and Central America, and the second in the Andes mountains of South America (Gepts, Osborn, Rashka, & Bliss, 1986). Significant movement of genetic material and gene exchange between genepools has occurred since domestication, with considerable overlap in current geographic distributions, both in the Neotropics and across other major cultivation areas (Singh, 1989; Singh, Gepts, & Debouck, 1991). These processes have resulted in recognized secondary regions of diversity in Brazil, Europe, Africa and Asia (Escribano & De Ron, 1991; Lobo Burle et al., 2011; Logozzo et al., 2007). Globally, there are some 250 ex situ collections of cultivated P. vulgaris, with the largest and most diverse maintained at the International Center for Tropical Agriculture (CIAT) with ~40,000 accessions, and the United States Department of Agriculture (USDA) National Genetic Resources Program with ~15,000 accessions (Debouck, 2014). Here, we assess the representation of common bean landraces in such major genebank collections, including estimating overall conservation and identifying gaps.

2 | MATERIALS AND METHODS

Our modelling framework first requires defining the study area, gathering landrace occurrence and characterization data, and compiling environmental and socioeconomic spatial predictor information.
The modelling and conservation gap analysis is then performed, consisting of five main steps: (a) determining relevant landrace groups using the literature to develop and test classification models; (b) modelling the potential geographic distributions of these groups using the occurrence and predictor data; (c) calculating geographic and environmental gap scores for current genebank collections; (d) mapping ex situ conservation gaps; and (e) compiling expert inputs. The overall process is depicted in Figure 1.

KEYWORDS: common bean, crop diversity, gap analysis, landrace, plant genetic resources

2.1 | Study area

Crop landraces have been defined as “dynamic population(s) of a cultivated plant that has historical origin, distinct identity and lacks formal crop improvement, as well as often being genetically diverse, locally adapted and associated with traditional farming systems” (Camacho Villa et al., 2005; Casañas, Simó, Casals, & Prohens, 2017). A landrace can be further classified as autochthonous when grown in the original location where it developed its unique genetic and socioeconomic characteristics through grower selection, and allochthonous when introduced from another region and then locally adapted. “Secondary” landraces may also be recognized: these were developed by the formal plant breeding sector but are now maintained through repeated farmer selection and seed saving (Zeven, 1998).

While landraces cultivated over time in any given location may possess novel traits useful for plant breeding, our distribution modelling method rests on the premise that these varieties have distinct, local environmental adaptations (see 2.4.1–2.4.2).
As adaptation to the environment develops over time, the geographic areas where landraces have occurred the longest (the origins and primary regions of diversity) would be considered to have the most significant association between environmental adaptation and genetic variation (Khoury et al., 2016). For this reason, landrace distribution modelling may focus foremost on autochthonous ranges.

For our case study, we focused on the Americas as the centre of domestication and primary region of diversity for P. vulgaris (Gepts et al., 1986). We included all areas extending from the southern United States to central Chile and northern Argentina, including the Caribbean, as this broadly includes the two reported domestication events and distributions of the progenitor and close relatives of the species (Chacon, Pickersgill, & Debouck, 2005; Gepts et al., 1986). We also included Brazil since it is geographically close to the putative regions of domestication and because existing evidence suggests clear relationships between Brazilian bean landraces and Andean and Mesoamerican types (Lobo Burle et al., 2011; Lobo Burle, Fonseca, Kami, & Gepts, 2010).

2.2 | Landrace occurrence and characterization data

Our distribution modelling and conservation gap analysis modelling framework requires geographic occurrence (presence) data for landraces and information on the locations where these landraces have been previously collected for conservation ex situ, as well as characterization data on the landrace accessions. To assess the world's common bean landrace collections, we compiled available genebank accession-level passport (i.e.
site where collected) data from major online germplasm databases, including the Genesys plant genetic resources portal (Global Crop Diversity Trust, 2019) and the United Nations Food and Agriculture Organization World Information and Early Warning System on Plant Genetic Resources for Food and Agriculture (WIEWS) (FAO, 2019). To ensure inclusion of the crop's major germplasm collections, we specifically gathered occurrence and characterization data from the CIAT database (CIAT, 2018), freely available at and from the United States Department of Agriculture (USDA) Genetic Resources Information Network (GRIN)–Global (USDA ARS NPGS, 2018). Additional occurrences were gathered from the Global Biodiversity Information Facility (GBIF) (GBIF.org, 2019), which contained 25,670 observations from herbaria, botanic gardens and other plant repositories, to provide independent data from non-genebank sources.

FIGURE 1 Conservation gap analysis modelling framework implemented in this study

We compiled the datasets into a single database and performed a thorough quality check of all records. Duplicate observations were eliminated, with preference given to retaining the original data; for example, copies of USDA-GRIN or CGIAR records included in Genesys or WIEWS were discarded. Records were corrected or, where correction was not possible, eliminated when their latitude and longitude were both equal to zero, fell in inland water bodies or in the ocean, fell in the wrong country, had an inverted sign in the latitude and/or longitude, or had low coordinate precision (i.e. fewer than 2 decimal places). Our full occurrence dataset for P. vulgaris is available in Dataset S1.
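The attribute-based part of these coordinate checks can be sketched as follows. This is an illustration in Python only, not the code used in the study; the `Occurrence` record and its field names are hypothetical, the precision rule is assumed to apply when either coordinate has fewer than 2 decimal places, and the checks that need spatial layers (water bodies, country boundaries, sign inversion) are left out.

```python
from dataclasses import dataclass


@dataclass
class Occurrence:
    """Hypothetical passport record; coordinates are kept as written in the source."""
    record_id: str
    lat_text: str
    lon_text: str


def decimal_places(text):
    """Number of digits after the decimal point in a coordinate string."""
    text = text.strip()
    return len(text.split(".")[1]) if "." in text else 0


def coordinate_flags(rec, min_decimals=2):
    """Return the list of quality problems found for one record."""
    try:
        lat, lon = float(rec.lat_text), float(rec.lon_text)
    except ValueError:
        # Extra guard, not described in the text: unparseable coordinates.
        return ["not numeric"]
    flags = []
    if lat == 0 and lon == 0:
        flags.append("zero coordinates")
    # Assumption: the record is flagged if EITHER coordinate is imprecise.
    precision = min(decimal_places(rec.lat_text), decimal_places(rec.lon_text))
    if precision < min_decimals:
        flags.append("low precision")
    return flags
```

Records returning an empty list would pass these two checks and move on to the spatial checks; flagged records would be candidates for correction or removal.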
2.3 | Spatial predictors

With the aim of compiling a robust global dataset of important environmental and anthropogenic drivers of the geographic distributions of crop landraces, we gathered and/or calculated spatially explicit (gridded) information for a total of 50 potential predictors, including climate, topography, diversity and domestication, and socioeconomic variables (Table S2.1). For climate, we used a total of 40 variables, derived from a combination of the WorldClim version 2 (Fick & Hijmans, 2017) and the Environmental Rasters for Ecological Modelling (ENVIREM) (Title & Bemmels, 2018) databases. We included topography from the Shuttle Radar Topography Mission (SRTM) dataset of the CGIAR Consortium for Spatial Information (CGIAR-CSI) portal (Jarvis, Reuter, Nelson, & Guevara, 2008; Reuter, Nelson, & Jarvis, 2007). Two crop genetic diversity and domestication proxy variables were included, namely the distance to known common bean wild relative populations and the distance to human settlements before AD 1500. Regarding socioeconomic variables (8 in total), we included datasets on the geographic distribution of ethnic groups (Weidmann, Rød, & Cederman, 2010); crop yield, harvested area and crop production quantity (You et al., 2017); population density (CIESIN, 2018); population accessibility (Nelson, 2008); distance to navigable rivers (Natural Earth, 2019); and percentage of area under irrigation (Siebert, Henrich, Frenken, & Burke, 2013). All spatial predictor data were scaled to or computed on a common 2.5 arc-min grid, using the geographic coordinate system (GCS) with WGS84 as datum. A complete description of these data sources and the justification for their inclusion is provided in Text S2.1 and Table S2.1. The full dataset of ecogeographic and socioeconomic variables is available in Dataset S1.
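To make the common grid concrete, the sketch below maps a longitude/latitude pair to a row and column on a 2.5 arc-min grid (24 cells per degree). It is only an illustration: the global extent, the north-west origin and the function names are assumptions, not details given in the text.

```python
import math

CELLS_PER_DEGREE = 24  # 60 arc-min per degree / 2.5 arc-min per cell


def cell_index(lon, lat, west=-180.0, north=90.0):
    """Row/column of the grid cell containing a point (assumed north-west origin)."""
    col = math.floor((lon - west) * CELLS_PER_DEGREE)
    row = math.floor((north - lat) * CELLS_PER_DEGREE)
    return row, col


def cell_centre(row, col, west=-180.0, north=90.0):
    """Longitude/latitude of the centre of a grid cell."""
    lon = west + (col + 0.5) / CELLS_PER_DEGREE
    lat = north - (row + 0.5) / CELLS_PER_DEGREE
    return lon, lat
```

On this grid a cell spans 1/24 of a degree, roughly 4.6 km at the equator, which is the resolution at which predictor values would be matched to occurrence records.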
2.4 | Landrace distribution modelling and conservation gap analysis

2.4.1 | Determination of landrace groups

Crop landraces are domesticated, locally adapted varieties of crops, developed through farmer selection over time in specific agricultural ecosystems (Camacho Villa et al., 2005; Jones et al., 2008) and, for most crops, are considered to number in the thousands (Harlan, 1975; Jones et al., 2008). Crop landraces are associated with specific local adaptation traits and farmer preferences, and an understanding of these drivers is important to modelling their potential distributions. Given the large number of landraces and the knowledge necessary to distinguish their biocultural and ecological differences, our method seeks a compromise between recognizing this complexity and performing spatial modelling at scales that are feasible and permit comparison with existing genebank collections. Therefore, the first step of our modelling method was to identify recognized groups within the crop that could be tested for whether they have distinct environmental and socioeconomic niches. We used Google Scholar™ to identify and review publications that, through morphological, physiological, chemical, genetic, nomenclatural or other characters, establish or propose groups of landraces (e.g. by identifying genepools, races, domestication centre(s), genetic clusters or other acknowledged groupings) (Table S2.2).

We then used classification models to test the significance of these classifications. The classification models allowed us to determine whether the classes identified could be predicted on the basis of the spatial predictors from Section 2.3. This process used data from the occurrence database (if the distinguishing characters of the identified landrace groups were reported in the database) or from training datasets containing both characters and geographic coordinates, compiled from the literature review.
For this analysis, we used random forest (RF) (Pal, 2005), support vector machine (SVM) (Meyer, Leisch, & Hornik, 2003), K-nearest neighbour (KNN) (Guo, Wang, Bell, Bi, & Greer, 2003) and artificial neural networks (ANN) (Dreiseitl & Ohno-Machado, 2002). The response variable in all models was the group to which a given accession was assigned, whereas the explanatory variables were the spatial predictors. Models were combined into an ensemble using the mode (i.e. the most frequent predicted value amongst models) and tested using 15-fold cross-validation (80% training, 20% testing). We accepted a given classification if each of its classes was predicted with an average cross-validated accuracy of at least 80% (i.e. 8 of every 10 accessions are predicted correctly). Finally, we used the trained models to predict the corresponding class for any records in the database missing such information.

2.4.2 | Modelling landrace geographic distributions

The objective of this step was to develop a Landrace Distribution Model (LDM) which describes the probability of occurrence of the landrace groups derived from Section 2.4.1. To predict the probability of occurrence for each landrace group, we fitted a MaxEnt model (Elith et al., 2010; Phillips, Anderson, & Schapire, 2006) using the “maxnet” R package (Phillips, Anderson, Dudík, Schapire, & Blair, 2017). We chose MaxEnt because it is a standard and very commonly used tool for species distribution modelling (Costa, Nogueira, Machado, & Colli, 2010; Elith et al., 2006). MaxEnt has been demonstrated to yield robust results when compared with other species distribution modelling algorithms (Barbet-Massin, Jiguet, Albert, & Thuiller, 2012; Elith et al., 2006; Giovanelli, Siqueira, Haddad, & Alexandrino, 2010).
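For the landrace groups that feed into the LDM, the mode-based ensemble and the 80% acceptance rule of Section 2.4.1 can be sketched as below. This is a minimal Python illustration rather than the study's code; the function names are hypothetical, and because the text does not say how ties between models are broken, the sketch simply keeps the label that appears first.

```python
from collections import Counter


def ensemble_mode(model_predictions):
    """Most frequent label amongst the models for one accession.

    Ties go to the label encountered first (an arbitrary, assumed rule).
    """
    return Counter(model_predictions).most_common(1)[0][0]


def per_class_accuracy(true_labels, predicted_labels):
    """Fraction of accessions of each true class that were predicted correctly."""
    totals, hits = Counter(), Counter()
    for truth, predicted in zip(true_labels, predicted_labels):
        totals[truth] += 1
        if truth == predicted:
            hits[truth] += 1
    return {label: hits[label] / totals[label] for label in totals}


def classification_accepted(accuracy_by_class, minimum=0.80):
    """Accept the grouping only if every class reaches the minimum accuracy."""
    return all(acc >= minimum for acc in accuracy_by_class.values())
```

In use, `ensemble_mode` would be applied to the RF, SVM, KNN and ANN predictions for each held-out accession, and the resulting per-class accuracies would be averaged across folds before the acceptance check.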
Variables used in the model were sub-selected from the environmental and socioeconomic predictors using a combination of the variance inflation factor (VIF) and a principal component analysis (PCA) to control for unwarranted model complexity and collinearity between explanatory variables (Warren & Seifert, 2011). We first removed any variables that did not contribute significantly (defined as contributing <15% to the first component) to the variance in the PCA and then discarded any variables with a VIF greater than 10 (Braunisch et al., 2013). The list of variables selected (or alternatively eliminated) for use in modelling is available in Table S2.1. We tried different model configurations (i.e. only climate, only non-climate and both) but present only the best-performing one (i.e. where all variables are used). Other results are presented in Text S2.2.

Background points (pseudo-absences) were generated based on the three-step method of Senay, Worner, and Ikeda (2013). In short, we took a random sample of pseudo-absences from areas that (a) were within the same ecological land units [as reported by Sayre et al. (2014)] as the occurrence points, (b) were deemed as potentially suitable according to a support vector machine (SVM) classifier that uses all occurrences and predictor variables and (c) were further than 5 km from any occurrence. The number of pseudo-absences drawn was equivalent to 10 times the total number of unique occurrences for a given landrace group.

MaxEnt models were fitted through a fivefold (K = 5) cross-validation process in which 80% of the occurrences (and pseudo-absences) were used to train the models, and the remaining 20% were used for testing. For each fold, we calculated the area under the receiver operating characteristic curve (AUC), sensitivity, specificity and Cohen's kappa as measures of model performance.
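Three of the per-fold measures named above follow directly from a confusion matrix, as the Python sketch below illustrates. It assumes that the model output for the test records has already been converted to presence/absence at some threshold, and it omits the AUC, which needs the continuous predictions; the function names are hypothetical.

```python
def confusion_counts(observed, predicted):
    """Count true/false positives and negatives for binary (0/1) labels."""
    tp = fp = tn = fn = 0
    for obs, pred in zip(observed, predicted):
        if obs and pred:
            tp += 1
        elif obs:
            fn += 1
        elif pred:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def fold_metrics(observed, predicted):
    """Sensitivity, specificity and Cohen's kappa for one cross-validation fold."""
    tp, fp, tn, fn = confusion_counts(observed, predicted)
    n = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)          # presences correctly predicted
    specificity = tn / (tn + fp)          # pseudo-absences correctly predicted
    p_observed = (tp + tn) / n            # overall agreement
    # Agreement expected by chance, from the marginal totals.
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return sensitivity, specificity, kappa
```

The function assumes each fold contains at least one presence and one pseudo-absence; with the 1:10 presence to pseudo-absence ratio described above this would normally hold.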
To create a single prediction that represents the probability of occurrence for the landrace group, we computed the median across models. Finally, any areas above the probability value at the maximum sum of sensitivity and specificity were considered the final Landrace Distribution Model (LDM).

2.4.3 | Calculating geographic and environmental gap scores

We developed three scores that compare the geographic and environmental diversity in existing ex situ conservation collections against the LDM, revealing ex situ conservation gaps.

The accession connectivity score (SCON) was formed with Delaunay triangulation (Lee & Schachter, 1980), that is, triangles linking every three (closest) accession occurrence locations, using the “deldir” R package (Turner, 2019). For each 2.5 arc-min pixel within each Delaunay triangle, we computed SCON following Equation 1:

$$S_{CON} = \frac{A_{T-i}}{\max\left(A_{T-i}, \cdots, A_{T-n}\right)} \times \left(1 - D_{C-i}\right) \times D_{NV-i} \qquad (1)$$

where $A_{T-i}$ is the area (km²) of the triangle in which the pixel is located (i.e. the i-th triangle); $\max\left(A_{T-i}, \cdots, A_{T-n}\right)$ is the area of the largest triangle amongst all triangles; $D_{C-i}$ is the Euclidean distance from the pixel to the centroid of the triangle in which it is located, normalized by the longest distance (using all pixels) within the given triangle; and $D_{NV-i}$ is the Euclidean distance from the pixel to the nearest vertex of the triangle in which it is located, normalized by the longest distance (using all pixels) within the given triangle. From Equation 1, it is clear that SCON for any given pixel is largest (i.e. indicates the highest likelihood of a gap) when the triangle is large (i.e. high area), when the pixel is close to the centroid of the triangle (i.e. where there are no accessions) and when the distance to the vertices (where the accessions are located) is high.

The accession accessibility score (SACC) was calculated by computing travel time from each pixel within the LDM to the nearest genebank accession, following Weiss et al. (2018). Travel time was in this case estimated from the distance and the speed of travel (defined by a friction surface). Once the travel time from each location was computed, it was normalized by dividing pixel values by the longest travel time within the LDM, to derive a metric in the range 0–1, with high values reflecting long travel time.

The environmental score (SENV) measures how well the environments where the landraces are distributed are represented in ex situ collections. We first performed a hierarchical clustering analysis (Ward's method) for the pixels in the LDM using the predictor variables used to construct the LDM. On a per-cluster basis, we computed the Mahalanobis distance between each pixel and the environmentally closest germplasm accession. The distance was finally normalized (0–1), with high values indicative of large distances to sites with similar environments that have previously been collected for ex situ conservation.

2.4.4 | Mapping ex situ conservation gaps

Spatial ex situ conservation gaps were calculated from the conservation gap scores using a cross-validation procedure to derive a threshold for each landrace group and each of the gap scores (SCON, SACC, SENV). To do so, we created synthetic (artificial) gaps by removing genebank occurrences in five randomly chosen circular areas of 100 km radius within the LDM. We then tested whether these synthetic gaps could be predicted by our method and determined the threshold value of each gap score that would maximize the prediction of these synthetic gaps. Performance for each of the five synthetically created gaps was assessed using the AUC, sensitivity and specificity. Finally, the average threshold value of each gap score,
maximizing the prediction of the synthetic gaps (balanced with minimizing false positives), was used to discretize the gap score datasets into areas with a high priority for further collecting (areas with gap score above the threshold, assigned a value of 1) as opposed to relatively well-conserved areas (areas with gap score below the threshold, assigned a value of 0).

We then summed the three binary gap score maps, resulting in a map with values from 0 to 3. Areas with a value of 0 indicate that there are no accession connectivity, accessibility or environmental gaps (i.e. well-conserved areas); areas with a value of 1 indicate that a gap exists according to any one of accession connectivity, accessibility or environment (low confidence gaps); areas with a value of 2 indicate that a gap exists according to two metrics (medium confidence gaps); and areas with a value of 3 indicate a gap according to all three metrics (highest confidence gaps). We termed the area with a value of 3 our “final gaps map.”

Once the final gaps map was calculated, we estimated the coverage of existing germplasm collections. The coverage is simply one minus the ratio of the area considered as gap to the total area of the LDM. We computed only the coverage resulting from the agreement of the three gap metrics, as an upper-level coverage estimate.

2.4.5 | Compilation of expert inputs

Gap analysis is a tool for assessing collection completeness as well as for planning collecting (García et al., 2017; Marinoni, Bortoluzzi, Parra-Quijano, Zabala, & Pensiero, 2015). Collecting based on model predictions may require extensive discussion with local institutions and crop experts including botanists, collectors, agronomists and breeders.
This is because agricultural landscapes are highly dynamic, and areas predicted as gaps may have been subject to recent land use change, varietal replacement by improved or foreign material or significant genetic drift, resulting in loss of uncollected genetic material predicted to be of value (Hammer et al., 1996; van Heerwaarden, Hellin, Visser, & Eeuwijk, 2009; van de Wouw et al., 2010). This means that while the “final gaps map” resulting from Section 2.4.4 provides a detailed regional picture of collecting priorities, the planning of collecting missions will effectively require discussion with experts and further analysis (Greene et al., 1999a, 1999b; Jarvis et al., 2005). In this sense, gap analysis results are a discussion support tool that aims at guiding, rather than prescribing, where and how collecting may be done. Here, we illustrate this by conducting a semi-structured interview process with two relevant crop landrace experts. These inputs were used to add value to the model results.

3 | RESULTS

3.1 | Environmentally distinguishable groups of common bean landraces

Our literature review indicated that a single major classification system based on genetic, morphological and physiological characteristics has been accepted for common bean landraces. This system, first proposed by Singh et al. (1991), classifies beans into two genepools: Andean and Mesoamerican. The Andean genepool, derived from the domestication event proposed to have occurred around Peru, Chile and Bolivia, is composed of typically larger-seeded genotypes. The Mesoamerican genepool, derived from the domestication event in Mexico and Central America, is typically composed of smaller-seeded genotypes (Singh et al., 1991). These and subsequent authors divide these genepools into races according to morphological criteria, agro-ecological adaptation and genetic data (see Table S2.2 for a complete list of publications reviewed).
The Andean genepool is divided into races Chile, Nueva Granada and Peru, whereas the Mesoamerican genepool contains races Guatemala, Durango–Jalisco and Mesoamerica (Blair, Díaz, Hidalgo, Díaz, & Duque, 2007; Blair, Díaz, Buendía, & Duque, 2009; Singh et al., 1991).

We tested a variety of accession-level data pertinent to common bean genepools, including seed protein type; seed weight, colour, shape and brightness; and landrace names. Based on degree of acceptance in published literature and availability of accession-level data with geographic coordinates, we ultimately based our training data on genepool designations given in the CIAT accessions dataset and specific accession numbers gathered from the reviewed literature (Table S2.2).

Our average classification accuracy at the genepool level was 86% (88.3% for Andean and 85% for Mesoamerican landraces), indicating that these two genepools have distinct environmental and socioeconomic signatures, with Mesoamerican beans being present in lower, drier and hotter places compared to Andean beans. Identified predictors (see Figure S2.1) for the classification models agree with previously reported predictors of domesticated and wild bean distributions (Cortes, Monserrate, Ramirez-Villegas, Madrinan, & Blair, 2013; Ramirez-Villegas et al., 2010). At the race level, the classification accuracy was low (58.5% as a mean across all races) and hence deemed not informative. Based on these results, we concluded that the genepool level was the most appropriate for all subsequent distribution modelling and conservation gap analysis steps. Hence, in all following sections we show results separately for Andean and Mesoamerican common bean landrace groups.

3.2 | Geographic distributions of common bean landrace groups

Figure 2 shows the predicted geographic distributions of Andean (Figure 2a) and Mesoamerican (Figure 2b) landraces.
Cross-validated MaxEnt models performed well, with mean AUC values of 0.973 (Andean) and 0.996 (Mesoamerican). The MaxEnt-based LDMs also indicated that 23 variables were important for the geographic prediction of landrace presence. Importantly, seven of these are non-climatic variables (Table S2.1), and amongst these, we find that accessibility and the geographic distribution of ethnic groups contribute substantially to the model.

As expected, Andean landraces were predicted to be mostly distributed across the Andes mountains and to a lesser extent in Mexico and Central America. The converse was true for Mesoamerican landraces. Andean landraces were also predicted to occur in Brazil, which is considered a secondary diversity centre for common beans (Lobo Burle et al., 2010, 2011). Overlap was particularly evident in the geographic intermediate zone in Central America (Beebe, Rengifo, Gaitan, Duque, & Tohme, 2001; Beebe et al., 2000) and in some areas of Peru.

FIGURE 2 Predicted geographic distributions of Andean (a) and Mesoamerican (b) common bean (Phaseolus vulgaris L.) landrace groups

FIGURE 3 Final gaps map for Andean (a) and Mesoamerican (b) common bean (Phaseolus vulgaris L.) landrace groups. Red indicates areas where the three gap scores (SCON, SACC, SENV) agree in identifying a gap

3.3 | Conservation gap maps for common bean landraces

Conservation gap maps, displaying the overlap of results for the three gap scores per pixel, are shown in Figure 3. Figure S2.2 shows the individual gap scores, whereas Figure S2.3 shows model performance and coverage estimation. Overall gaps are larger for Andean compared to Mesoamerican beans, with representation of their distributions in genebanks estimated at 78.5% for the Andean and 98.2% for
There is significant agreement amongst the gap areas identified by the accessibility, connectivity and environmental scores, and all performed well at predicting gaps. For Andean beans, overlapping gap areas were found in the northern Venezuelan Andes, the Santander department in Colombia, specific pockets in the Andean hillsides between the Central and East cordillera in Colombia, the highlands of Ecuador, several areas in western and southern Peru, a major area in northern and central Chile, and central Brazil. In Mexico and Central America, where Andean bean variation is considered less diverse than in South America (Becerra Velasquez & Gepts, 1994; Beebe et al., 2001), gaps were identified in the state of Oaxaca and to a lesser extent in Chiapas. Gaps were also predicted for Andean beans in Guatemala and Panama.

For the Mesoamerican genepool, the largest overlapping predicted gap was found in the area around Belize–Guatemala–southern Mexico (state of Campeche). Smaller overlapping gap areas were predicted in the states of San Luis Potosí, Jalisco and Sinaloa in Mexico. Across South America, southern Peru is predicted to be a gap.

3.4 | Expert inputs for common bean landrace distributions and conservation gaps

To illustrate how gap analysis results may be used to discuss collecting priorities, semi-structured interviews were carried out with two national and international Phaseolus scientists from the study region. The first expert was Daniel G. Debouck (DGD), a member of many collecting missions for the genus across many countries in the Americas and an expert in bean taxonomy, ecology, domestication, diversity and conservation (both in situ and ex situ) (Freytag & Debouck, 2002). He discussed both Andean and Mesoamerican beans for the entire Americas.
The second expert was Eduardo Peralta (EP), a retired scientist, bean expert and breeder from Ecuador, and a pioneer in bean breeding in the Andean region who helped consolidate the National Legumes Program in Ecuador. He discussed Andean beans in the Andes. Detailed maps are shown in Figure S2.4.

Regarding areas of interest for collection for the Mesoamerican genepool, the experts indicated that collecting should be prioritized in predicted gaps in San Luis Potosí, Oaxaca and Chiapas (Mexico), as well as in Belize and Ecuador. Notably, Ecuador is not predicted to be a gap by our method. For Andean landraces, the experts suggested collecting in the Venezuelan Andes and in the Santander department of Colombia. For the Colombian and Ecuadorian Andes, however, they indicated that collecting work would need to be done with precision (i.e. targeting only specific sites and genotypes) rather than in an extensive manner.

Many areas were also identified by the two experts as unlikely to be considered collecting priorities. There were many areas, especially for Andean beans, where the experts indicated that landraces have likely already been lost due to the replacement of traditional cropping practices. This is the case in northern Chile and in southern and coastal Peru, where beans have been replaced by grapes grown for wine and pisco. Other areas were not considered collecting priorities by the experts since they are mostly “documentation” gaps (e.g. central Brazil for Andean beans); these materials are mostly held in national collections, and passport information (including coordinates of collection sites) from these collections was not available or was of insufficient quality for inclusion in our analyses.
4 | DISCUSSION

Here, we documented the development of a novel modelling framework to predict the distributions of crop landraces and to identify gaps in ex situ germplasm collections in relation to geographic and environmental variation in their distributions. We base our framework on the rationale that the distributions of landraces can be predicted using environmental and socioeconomic drivers, and that important conservation gaps can be identified by characterizing the geographic (accessibility and connectivity) and environmental space across which previous collecting has been carried out. Previous studies assessing gaps in landrace collections (Upadhyaya et al., 2012, 2017; Upadhyaya, Reddy, Irshad Ahmed, Gowda, & Haussmann, 2010) used only climate drivers and neither explicitly assessed the robustness of gap predictions nor incorporated expert inputs to prioritize collecting.

Our analysis suggests that both genepools of P. vulgaris are relatively well conserved and that progress towards comprehensive representation ex situ may be relatively fast if targeted collecting is performed in the areas outlined in the results. This contrasts with results for common bean wild relatives, for which research indicates that about two-thirds of the wild species in the genus need further conservation action, and about half are considered high priority for further collecting (Castañeda-Álvarez et al., 2016; Ramirez-Villegas et al., 2010). For Andean beans, gaps were predicted throughout most bean-producing countries in South America, with the highest priority being Chile, Peru, Colombia and specific spots in the Venezuelan Andes. For Mesoamerican landraces, the results target regions of Mexico, Belize, Guatemala and to a lesser extent South America (mostly Peru) for further collecting.
While current common bean collections already hold substantial diversity from across the Americas (Beebe et al., 2000, 2001), our results, supplemented by expert opinion, indicate that further collecting is warranted, especially where valuable traits such as phosphorus use efficiency (Beebe, Lynch, Galwey, Tohme, & Ochoa, 1997) or heat stress tolerance (CGIAR, 2015) may be found.

Our ongoing review of other crop landraces indicates that the classification approach, based on recognized groups, could be widely applied to other crops (van Heerwaarden et al., 2011; Lasky et al., 2015; Ndjiondjop et al., 2018). Moreover, the continuous generation of new genetic diversity data and related knowledge (Crossa et al., 2016; Halewood et al., 2018) will facilitate the further application of our methods, which are ultimately dependent on the availability of robust classification, occurrence and characterization data.

While our framework contributes to revealing existing gaps in current germplasm collections and to highlighting geographic areas where novel diversity may be collected, the question remains as to the extent to which the results can support on-the-ground collecting work. Our discussion with experts indicates that priorities for collecting can be drawn using our predicted gap maps. Moreover, previous ecogeographic analyses have proven useful for collecting planning (García et al., 2017; Jarvis et al., 2005; Marinoni et al., 2015). To further translate our results into action, it would be advantageous to design tools for real-time collecting mission support (e.g. route tracing) that combine the outputs with existing technologies for map visualization and navigation.
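As a purely illustrative sketch of what route tracing over predicted gap sites might look like, the snippet below orders a handful of candidate sites with a greedy nearest-neighbour rule based on great-circle (haversine) distance. It is not part of the framework described here. The site names, coordinates and starting point are hypothetical, and a real tool would presumably use road networks and travel times rather than straight-line distance.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def greedy_route(start, sites):
    """Visit every site by always moving to the nearest unvisited one.

    A simple heuristic, not an optimal tour.
    """
    remaining = dict(sites)
    route, here = [], start
    while remaining:
        name = min(remaining, key=lambda n: haversine_km(here, remaining[n]))
        here = remaining.pop(name)
        route.append(name)
    return route


# Hypothetical starting point and gap sites as (latitude, longitude).
start = (3.5, -76.4)
sites = {
    "site_1": (7.0, -73.0),
    "site_2": (4.5, -75.5),
    "site_3": (8.5, -71.0),
}

print(greedy_route(start, sites))  # ['site_2', 'site_1', 'site_3']
```

The greedy rule is chosen only for brevity. It makes the idea concrete without claiming to give the shortest possible mission route.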
4.1 | Challenges and limitations to landrace distribution modelling and conservation gap analysis

Predicting the distributions of cultivated plants, whose ranges are determined by anthropogenic as well as environmental drivers, presents a challenge that has not been fully resolved in geospatial sciences. While we attempted to gather the widest range of quality input occurrence and predictor data and used state-of-the-art approaches to ensure high species distribution model (SDM) performance, several further improvements can be suggested.

With regard to occurrence information, particularly for genebank collections, we incorporated data from the two central global repositories for such information (Genesys and WIEWS) and in addition (due to our focus here on common bean) ensured the full compilation of data from the world's two largest P. vulgaris collections (CIAT and USDA). This said, these sources are not fully representative of all common bean collections worldwide, omitting collections such as that of the Agricultural Research Institute (CIAP) in Cuba. Ongoing initiatives such as Genesys, which lists in a single location passport and (eventually) characterization data for many genebanks (Global Crop Diversity Trust, 2019), may help resolve this data challenge in the future. On the other hand, national policies influencing germplasm distributions hinder the international accessibility of many such “low-visibility” collections (Castiñeiras, Esquive, Lioi, & Hammer, 1991; Lobo Burle et al., 2011).

We also note that coordinate information, which is an essential input into our methods, is missing for many current genebank accessions. Further efforts to georeference records that lack coordinates but possess locality information, and to make this information easily available online, will facilitate a more robust assessment of the state of conservation of crop landraces ex situ.
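The georeferencing task described above can be made concrete with a small triage sketch that sorts passport records into three groups: those usable as they are, those that lack coordinates but carry a locality description and so could be georeferenced, and those with neither. The field names are borrowed from the FAO/Bioversity Multi-Crop Passport Descriptors (ACCENUMB, DECLATITUDE, DECLONGITUDE, COLLSITE). Whether the datasets used in this study carry exactly these columns is an assumption, and the records and function name are hypothetical.

```python
def triage_passport_records(records):
    """Split passport records into (usable, to_georeference, unusable)."""
    usable, to_georeference, unusable = [], [], []
    for rec in records:
        lat, lon = rec.get("DECLATITUDE"), rec.get("DECLONGITUDE")
        # Coordinates must be numeric and fall within valid ranges.
        has_coords = (
            isinstance(lat, (int, float))
            and isinstance(lon, (int, float))
            and -90 <= lat <= 90
            and -180 <= lon <= 180
        )
        if has_coords:
            usable.append(rec)
        elif (rec.get("COLLSITE") or "").strip():
            # No valid coordinates, but a locality text exists.
            to_georeference.append(rec)
        else:
            unusable.append(rec)
    return usable, to_georeference, unusable


# Hypothetical records for illustration only.
records = [
    {"ACCENUMB": "X-001", "DECLATITUDE": 6.9, "DECLONGITUDE": -73.1,
     "COLLSITE": ""},
    {"ACCENUMB": "X-002", "DECLATITUDE": None, "DECLONGITUDE": None,
     "COLLSITE": "5 km north of a hypothetical village"},
    {"ACCENUMB": "X-003", "DECLATITUDE": None, "DECLONGITUDE": None,
     "COLLSITE": "  "},
    {"ACCENUMB": "X-004", "DECLATITUDE": 120.0, "DECLONGITUDE": -73.1,
     "COLLSITE": None},
]

usable, to_georeference, unusable = triage_passport_records(records)
print(len(usable), len(to_georeference), len(unusable))  # 1 1 2
```

Counting the middle group gives a rough sense of how many accessions could be recovered for gap analysis through georeferencing alone.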
Distributions of crop landraces are influenced by factors beyond the environmental and socioeconomic predictors used here. These may include other abiotic (e.g. soil parent material and other edaphic characteristics), biotic (e.g. mycorrhizae, pathogens and pollinators), and agriculturally relevant socioeconomic (e.g. farm sizes and farming systems) factors. Further development of high-resolution global datasets will be needed to incorporate such information into our analyses. Similarly, we note that model uncertainty can be a challenge and highlight the need to use model results as a “discussion support” tool to prioritize collecting. Finally, while we employ a widely used distribution modelling algorithm, it is possible that incorporating other methods, or forming ensembles of multiple methods, could improve our prediction of gaps (Grenouillet, Buisson, Casajus, & Lek, 2011).

4.2 | Landrace conservation gap analysis for global targets

The high value of crop landrace diversity in breeding programmes and for farm-level resilience (Camacho Villa et al., 2005; van Etten et al., 2019; van de Wouw et al., 2010), and the evident erosion of these resources in their primary and secondary centres of diversity (van Heerwaarden et al., 2009; Mekbib, 2008), justify urgent action to secure ex situ the diversity of landraces still cultivated by farmers and, in addition (though not discussed in this article), to invest in farmer-based (i.e. in situ/on farm) conservation (Bellon, Dulloo, Sardos, Thormann, & Burdon, 2017).
The United Nations Sustainable Development Goal (SDG) Target 2.5; Aichi Biodiversity Target 13 of the Convention on Biological Diversity (CBD) Strategic Plan for Biodiversity 2011–2020 (CBD, 2010a); Target 9 of the Global Strategy for Plant Conservation (GSPC) (CBD, 2010b); and Article 5 of the International Treaty on Plant Genetic Resources for Food and Agriculture (ITPGRFA) (FAO, 2002) all discuss and/or establish targets for the maintenance of genetic diversity of cultivated plants and their wild relatives, both in situ and ex situ. Recently, Khoury et al. (2019) proposed an indicator to track the conservation of useful wild plants, which furthers tested gap analysis methodologies for wild flora (Ramirez-Villegas et al., 2010). Here, we developed a coverage metric that, if implemented for a sufficiently large number of crops, could be used to track progress towards the conservation of cultivated plants for SDG 2.5, Aichi 13 and other important international goals.

ACKNOWLEDGEMENTS

This work was carried out under the CGIAR Genebanks Platform (https://www.genebanks.org). The CGIAR Genebanks Platform enables CGIAR Research Centers to fulfil their legal obligation to conserve and make available accessions of crops and trees on behalf of the global community under the International Treaty on Plant Genetic Resources for Food and Agriculture (ITPGRFA). Authors thank Andy Jarvis from CIAT, Chris Richards from USDA and Paul Evangelista from Colorado State University for input on the methodology during early stages of this project, and Angela M. Hernández from CIAT for providing bean passport and phenotypic characterization data for the analyses. Authors also thank CGIAR Genebank Managers and Scientists present at the Annual Genebanks Meetings (AGM) in 2017 (Brussels) and 2018 (Fortaleza) for their feedback on the methodology.

CONFLICT OF INTEREST

The authors declare no conflict of interest.
DATA AVAILABILITY STATEMENT

Occurrence and predictor data are provided in Dataset S1. The data used as input of this research are available open access at the original sources, all of which are cited in the respective sections of the text. The R code for performing the analyses is available at: https://github.com/CIAT-DAPA/gap_analysis_landraces.

ORCID

Julian Ramirez-Villegas https://orcid.org/0000-0002-8044-583X
Colin K. Khoury https://orcid.org/0000-0001-7893-5744

REFERENCES

Barbet-Massin, M., Jiguet, F., Albert, C. H., & Thuiller, W. (2012). Selecting pseudo-absences for species distribution models: How, where and how many? Methods in Ecology and Evolution, 3, 327–338. https://doi.org/10.1111/j.2041-210X.2011.00172.x
Becerra Velasquez, V. L., & Gepts, P. (1994). RFLP diversity of common bean (Phaseolus vulgaris) in its centres of origin. Genome, 37, 256–263.
Beebe, S. (2012). Common bean breeding in the tropics. Plant breeding reviews (pp. 357–426). Hoboken, NJ, USA: John Wiley & Sons Inc.
Beebe, S., Lynch, J., Galwey, N., Tohme, J., & Ochoa, I. (1997). A geographical approach to identify phosphorus-efficient genotypes among landraces and wild ancestors of common bean. Euphytica, 95, 325–338.
Beebe, S., Rengifo, J., Gaitan, E., Duque, M. C., & Tohme, J. (2001). Diversity and Origin of Andean Landraces of Common Bean. Crop Science, 41, 854. https://doi.org/10.2135/cropsci2001.413854x
Beebe, S., Skroch, P. W., Tohme, J., Duque, M. C., Pedraza, F., & Nienhuis, J. (2000). Structure of genetic diversity among common bean landraces of Middle American origin based on correspondence analysis of RAPD. Crop Science, 40, 264. https://doi.org/10.2135/cropsci2000.401264x
Bellon, M. R., Dulloo, E., Sardos, J., Thormann, I., & Burdon, J. J. (2017). In situ conservation-harnessing natural and human-derived evolutionary forces to ensure future crop adaptation. Evolutionary Applications, 10, 965–977. https://doi.org/10.1111/eva.12521
Blair, M.
W., Díaz, J. M., Hidalgo, R., Díaz, L. M., & Duque, M. C. (2007). Microsatellite characterization of Andean races of common bean (Phaseolus vulgaris L.). Theoretical and Applied Genetics, 116, 29–43. https://doi.org/10.1007/s00122-007-0644-8
Blair, M. W., Díaz, L. M., Buendía, H. F., & Duque, M. C. (2009). Genetic diversity, seed size associations and population structure of a core collection of common beans (Phaseolus vulgaris L.). Theoretical and Applied Genetics, 119, 955–972. https://doi.org/10.1007/s00122-009-1064-8
Braunisch, V., Coppes, J., Arlettaz, R., Suchant, R., Schmid, H., & Bollmann, K. (2013). Selecting from correlated climate variables: A major source of uncertainty for predicting species distributions under climate change. Ecography, 36, 971–983. https://doi.org/10.1111/j.1600-0587.2013.00138.x
Broughton, W. J., Hernández, G., Blair, M., Beebe, S., Gepts, P., & Vanderleyden, J. (2003). Beans (Phaseolus spp.) – model food legumes. Plant and Soil, 252, 55–128. https://doi.org/10.1023/A:1024146710611
Burke, M. B., Lobell, D. B., & Guarino, L. (2009). Shifts in African crop climates by 2050, and the implications for crop improvement and genetic resources conservation. Global Environmental Change, 19, 317–325. https://doi.org/10.1016/j.gloenvcha.2009.04.003
Camacho Villa, T. C., Maxted, N., Scholten, M., & Ford-Lloyd, B. (2005). Defining and identifying crop landraces. Plant Genetic Resources: Characterization and Utilization, 3, 373–384. https://doi.org/10.1079/PGR200591
Casañas, F., Simó, J., Casals, J., & Prohens, J. (2017). Toward an evolved concept of landrace. Frontiers in Plant Science, 8, 145. https://doi.org/10.3389/fpls.2017.00145
Castañeda-Álvarez, N. P., Khoury, C. K., Achicanoy, H. A., Bernau, V., Dempewolf, H., Eastwood, R. J., … Toll, J. (2016). Global conservation priorities for crop wild relatives. Nature Plants, 2, 16022. https://doi.org/10.1038/nplants.2016.22
Castiñeiras, L., Esquive, M., Lioi, L., & Hammer, K.
(1991). Origin, diversity and utilization of the Cuban germplasm of common bean (Phaseolus vulgaris L.). Euphytica, 57, 1–8.
CBD (2010a). Aichi biodiversity targets. https://www.cbd.int/sp/targets/. Accessed August 8, 2018.
CBD (2010b). Global strategy for plant conservation. The targets, 2011–2020. https://www.cbd.int/gspc/targets.shtml. Accessed August 8, 2018.
CGIAR (2015). Developing beans that can beat the heat. Cali, Colombia: CGIAR.
Chacon, M. I., Pickersgill, B., & Debouck, D. G. (2005). Domestication patterns in common bean (Phaseolus vulgaris L.) and the origin of the Mesoamerican and Andean cultivated races. Theoretical and Applied Genetics, 110, 432–444. https://doi.org/10.1007/s00122-004-1842-2
CIAT (2018). CIAT Genebank Database, version 2018.
CIESIN (2018). Gridded population of the world, version 4 (GPWv4): Population density, revision 11. Palisades, USA.
Cortes, A. J., Monserrate, F. A., Ramirez-Villegas, J., Madrinan, S., & Blair, M. W. (2013). Drought tolerance in wild plant populations: The case of common beans (Phaseolus vulgaris L.). PLoS ONE, 8, e62898. https://doi.org/10.1371/journal.pone.0062898
Costa, G., Nogueira, C., Machado, R., & Colli, G. (2010). Sampling bias and the use of ecological niche modeling in conservation planning: A field evaluation in a biodiversity hotspot. Biodiversity and Conservation, 19, 883–899. https://doi.org/10.1007/s10531-009-9746-8
Crossa, J., Jarquín, D., Franco, J., Pérez-Rodríguez, P., Burgueño, J., Saint-Pierre, C., … Singh, S. (2016). Genomic prediction of gene bank wheat landraces. G3, Genes|genomes|genetics, 6, 1819–1834. https://doi.org/10.1534/g3.116.029637
Debouck, D. G. (2014). Conservation of Phaseolus beans genetic resources: A strategy. Rome, Italy: Global Crop Diversity Trust.
Dreiseitl, S., & Ohno-Machado, L. (2002). Logistic regression and artificial neural network classification models: A methodology review. Journal of Biomedical Informatics, 35, 352–359.
https://doi.org/10.1016/S1532-0464(03)00034-0
Elith, J., Graham, C., Anderson, R., Dudík, M., Ferrier, S., Guisan, A., … Zimmermann, N. (2006). Novel methods improve prediction of species distributions from occurrence data. Ecography, 29, 129–151. https://doi.org/10.1111/j.2006.0906-7590.04596.x
Elith, J., Phillips, S. J., Hastie, T., Dudík, M., Chee, Y. E., & Yates, C. J. (2010). A statistical explanation of MaxEnt for ecologists. Diversity and Distributions, 17, 43–57. https://doi.org/10.1111/j.1472-4642.2010.00725.x
Escribano, M., & De Ron, A. M. (1991). Taxonomical relationships among common bean populations from northern Spain. Anales de la Estación Experimental de Aula Dei, 20, 17–27.
Esquinas-Alcázar, J. (2005). Protecting crop genetic diversity for food security: Political, ethical and technical challenges. Nature Reviews Genetics, 6, 946–953. https://doi.org/10.1038/nrg1729
FAO (2002). The international treaty on plant genetic resources for food and agriculture. Rome, Italy: FAO.
FAO (2019). United Nations food and agriculture organization world information and early warning system on plant genetic resources for food and Agriculture (WIEWS). Rome, Italy: Food and Agriculture Organization of the United Nations (FAO).
Fick, S. E., & Hijmans, R. J. (2017). WorldClim 2: New 1-km spatial resolution climate surfaces for global land areas. International Journal of Climatology, 37, 4302–4315. https://doi.org/10.1002/joc.5086
Freytag, G. F., & Debouck, D. G. (2002). Taxonomy, distribution, and ecology of the genus Phaseolus in North America, Mexico and Central America. Fort Worth, TX, USA: Botanical Research Institute.
Fuller, D. Q. (2007). Contrasting patterns in crop domestication and domestication rates: Recent archaeobotanical insights from the old world. Annals of Botany, 100, 903–924. https://doi.org/10.1093/aob/mcm048
García, R. M., Parra-Quijano, M., & Iriondo, J. M. (2017).
A multispecies collecting strategy for crop wild relatives based on complementary areas with a high density of ecogeographical gaps. Crop Science, 57, 1059. https://doi.org/10.2135/cropsci2016.10.0860
GBIF.org (2019). Global Biodiversity Information Facility (GBIF) occurrence download.
Gepts, P. (2006). Plant genetic resources conservation and utilization. Crop Science, 46, 2278.
Gepts, P., Osborn, T. C., Rashka, K., & Bliss, F. A. (1986). Phaseolin-protein variability in wild forms and landraces of the common bean (Phaseolus vulgaris): Evidence for multiple centers of domestication. Economic Botany, 40, 451–468. https://doi.org/10.1007/BF02859659
Giovanelli, J. G. R., de Siqueira, M. F., Haddad, C. F. B., & Alexandrino, J. (2010). Modeling a spatially restricted distribution in the Neotropics: How the size of calibration area affects the performance of five presence-only methods. Ecological Modelling, 221, 215–224. https://doi.org/10.1016/j.ecolmodel.2009.10.009
Glaszmann, J., Kilian, B., Upadhyaya, H., & Varshney, R. (2010). Accessing genetic diversity for crop improvement. Current Opinion in Plant Biology, 13, 167–173. https://doi.org/10.1016/j.pbi.2010.01.004
Global Crop Diversity Trust (2019). Genesys-PGR: A gateway to genetic resources. Bonn, Germany: Global Crop Diversity Trust.
Greene, S. L., Hart, T. C., & Afonin, A. (1999a). Using geographic information to acquire wild crop germplasm for ex situ collections: I. Map development and field use. Crop Science, 39, 836. https://doi.org/10.2135/cropsci1999.0011183X003900030037x
Greene, S. L., Hart, T. C., & Afonin, A. (1999b). Using geographic information to acquire wild crop germplasm for ex situ collections: II. Post-collection analysis. Crop Science, 39, 843. https://doi.org/10.2135/cropsci1999.0011183X003900030038x
Grenouillet, G., Buisson, L., Casajus, N., & Lek, S. (2011). Ensemble modelling of species distribution: The effects of geographical and environmental ranges.
Ecography, 34, 9–17. https://doi.org/10.1111/j.1600-0587.2010.06152.x
Guo, G., Wang, H., Bell, D., Bi, Y., & Greer, K. (2003). KNN model-based approach in classification. In: R. Meersman, Z. Tari, D.C. Schmidt (eds) On The Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE. OTM 2003. Lecture Notes in Computer Science, 2888. (pp. 986–996). Berlin, Heidelberg: Springer.
Hajjar, R., & Hodgkin, T. (2007). The use of wild relatives in crop improvement: A survey of developments over the last 20 years. Euphytica, 156, 1–13. https://doi.org/10.1007/s10681-007-9363-0
Halewood, M., Lopez Noriega, I., Ellis, D., Roa, C., Rouard, M., & Sackville Hamilton, R. (2018). Using genomic sequence information to increase conservation and sustainable use of crop diversity and benefit-sharing. Biopreservation and Biobanking, 16, 368–376. https://doi.org/10.1089/bio.2018.0043
Hammer, K., Knüpffer, H., Xhuveli, L., & Perrino, P. (1996). Estimating genetic erosion in landraces — two case studies. Genetic Resources and Crop Evolution, 43, 329–336. https://doi.org/10.1007/BF00132952
Harlan, J. R. (1975). Our vanishing genetic resources. Science, 188, 617–621. https://doi.org/10.1126/science.188.4188.617
Hilbert, L., Neves, E. G., Pugliese, F., Whitney, B. S., Shock, M., Veasey, E., … Iriarte, J. (2017). Evidence for mid-Holocene rice domestication in the Americas. Nature Ecology & Evolution, 1, 1693–1698. https://doi.org/10.1038/s41559-017-0322-4
Hoisington, D., Khairallah, M., Reeves, T., Ribaut, J.-M., Skovmand, B., Taba, S., & Warburton, M. (1999). Plant genetic resources: What can they contribute toward increased crop productivity? Proceedings of the National Academy of Sciences of the United States of America, 96, 5937–5943. https://doi.org/10.1073/pnas.96.11.5937
Jarvis, A., Reuter, H. I., Nelson, A., & Guevara, E. (2008). Hole-filled seamless SRTM data V4.
Jarvis, A., Williams, K., Williams, D., Guarino, L., Caballero, P., & Mottram, G. (2005).
Use of GIS for optimizing a collecting mission for a rare wild pepper (Capsicum flexuosum Sendtn.) in Paraguay. Genetic Resources and Crop Evolution, 52, 671–682. https://doi.org/10.1007/s10722-003-6020-x
Jones, H., Lister, D. L., Bower, M. A., Leigh, F. J., Smith, L. M., & Jones, M. K. (2008). Approaches and constraints of using existing landrace and extant plant material to understand agricultural spread in prehistory. Plant Genetic Resources: Characterization and Utilization, 6, 98–112. https://doi.org/10.1017/S1479262108993138
Khoury, C. K., Achicanoy, H. A., Bjorkman, A. D., Navarro-Racines, C., Guarino, L., Flores-Palacios, X., … Struik, P. C. (2016). Origins of food crops connect countries worldwide. Proceedings of the Royal Society B: Biological Sciences, 283, 20160792. https://doi.org/10.1098/rspb.2016.0792
Khoury, C. K., Amariles, D., Soto, J. S., Diaz, M. V., Sotelo, S., Sosa, C. C., … Jarvis, A. (2019). Comprehensiveness of conservation of useful wild plants: An operational indicator for biodiversity and sustainable development targets. Ecological Indicators, 98, 420–429. https://doi.org/10.1016/j.ecolind.2018.11.016
Lasky, J. R., Upadhyaya, H. D., Ramu, P., Deshpande, S., Hash, C. T., Bonnette, J., … Morris, G. P. (2015). Genome-environment associations in sorghum landraces predict adaptive traits. Science Advances, 1, e1400218–e1400218. https://doi.org/10.1126/sciadv.1400218
Lee, D. T., & Schachter, B. J. (1980). Two algorithms for constructing a Delaunay triangulation. International Journal of Computer & Information Sciences, 9, 219–242. https://doi.org/10.1007/BF00977785
Lobo Burle, M. L., Fonseca, J. R., Jose del Peloso, M., Melo, L. C., Temple, S. R., & Gepts, P. (2011). Integrating phenotypic evaluations with a molecular diversity assessment of a Brazilian collection of common bean landraces. Crop Science, 51, 2668. https://doi.org/10.2135/cropsci2010.12.0710
Lobo Burle, M., Fonseca, J. R., Kami, J. A., & Gepts, P. (2010).
Microsatellite diversity and genetic structure among common bean (Phaseolus vulgaris L.) landraces in Brazil, a secondary center of diversity. Theoretical and Applied Genetics, 121, 801–813. https://doi.org/10.1007/s00122-010-1350-5
Logozzo, G., Donnoli, R., Macaluso, L., Papa, R., Knüpffer, H., & Zeuli, P. S. (2007). Analysis of the contribution of mesoamerican and andean gene pools to European common bean (Phaseolus vulgaris L.) germplasm and strategies to establish a core collection. Genetic Resources and Crop Evolution, 54, 1763–1779. https://doi.org/10.1007/s10722-006-9185-2
Marinoni, L., Bortoluzzi, A., Parra-Quijano, M., Zabala, J. M., & Pensiero, J. F. (2015). Evaluation and improvement of the ecogeographical representativeness of a collection of the genus Trichloris in Argentina. Genetic Resources and Crop Evolution, 62, 593–604. https://doi.org/10.1007/s10722-014-0184-4
Maxted, N., Dulloo, E., Ford-Lloyd, V. B., Iriondo, J. M., & Jarvis, A. (2008). Gap analysis: A tool for complementary genetic conservation assessment. Diversity and Distributions, 14, 1018–1030. https://doi.org/10.1111/j.1472-4642.2008.00512.x
Mekbib, F. (2008). Genetic erosion of sorghum (Sorghum bicolor (L.) Moench) in the centre of diversity, Ethiopia. Genetic Resources and Crop Evolution, 55, 351–364.
Meyer, D., Leisch, F., & Hornik, K. (2003). The support vector machine under test. Neurocomputing, 55, 169–186. https://doi.org/10.1016/S0925-2312(03)00431-4
Morris, G. P., Ramu, P., Deshpande, S. P., Hash, C. T., Shah, T., Upadhyaya, H. D., … Kresovich, S. (2013). Population genomic and genome-wide association studies of agroclimatic traits in sorghum. Proceedings of the National Academy of Sciences of the United States of America, 110, 453–458. https://doi.org/10.1073/pnas.1215985110
Natural Earth (2019). Rivers + lake centerlines.
Ndjiondjop, M. N., Semagn, K., Sow, M., Manneh, B., Gouda, A. C., Kpeki, S. B., … Warburton, M. L.
(2018). Assessment of genetic variation and population structure of diverse rice genotypes adapted to lowland and upland ecologies in Africa using SNPs. Frontiers in Plant Science, 9, 446. https://doi.org/10.3389/fpls.2018.00446
Nelson, A. (2008). A global map of Accessibility. Published by the Office for Official Publications of the European Communities. Luxembourg. https://doi.org/10.2788/95835. ISBN: 978-92-79-09771-3
Pal, M. (2005). Random forest classifier for remote sensing classification. International Journal of Remote Sensing, 26, 217–222. https://doi.org/10.1080/01431160412331269698
Phillips, S. J., Anderson, R. P., Dudík, M., Schapire, R. E., & Blair, M. E. (2017). Opening the black box: An open-source release of Maxent. Ecography, 40, 887–893. https://doi.org/10.1111/ecog.03049
Phillips, S., Anderson, R., & Schapire, R. (2006). Maximum entropy modeling of species geographic distributions. Ecological Modelling, 190, 231–259. https://doi.org/10.1016/j.ecolmodel.2005.03.026
Ramirez-Villegas, J., Khoury, C., Jarvis, A., Debouck, D. G., & Guarino, L. (2010). A gap analysis methodology for collecting crop genepools: A case study with Phaseolus beans. PLoS ONE, 5, e13497. https://doi.org/10.1371/journal.pone.0013497
Reuter, H. I., Nelson, A., & Jarvis, A. (2007). An evaluation of void-filling interpolation methods for SRTM data. International Journal of Geographical Information Science, 21, 983–1008. https://doi.org/10.1080/13658810601169899
Rodrigues, A. S. L., Akçakaya, H. R., Andelman, S. J., Bakarr, M. I., Boitani, L., Brooks, T. M., … Yan, X. (2004). Global gap analysis: Priority regions for expanding the global protected-area network. BioScience, 54, 1092–1100. https://doi.org/10.1641/0006-3568(2004)054[1092:GGAPRF]2.0.CO;2
Sayre, R., Dangermond, J., Frye, C., Vaughan, R., Aniello, P., Breyer, S., … Comer, P. (2014). A new map of global ecological land units — An ecophysiographic stratification approach. Washington, DC.
Senay, S. D., Worner, S. P., & Ikeda, T. (2013). Novel three-step pseudo-absence selection technique for improved species distribution modelling. PLoS ONE, 8, e71218. https://doi.org/10.1371/journal.pone.0071218
Siebert, S., Henrich, V., Frenken, K., & Burke, J. (2013). Update of the Global Map of Irrigation Areas to version 5. Project report.
Singh, S. (1989). Patterns of variation in cultivated common bean (Phaseolus vulgaris Fabaceae). Economic Botany, 43, 39–57.
Singh, S., Gepts, P., & Debouck, D. (1991). Races of common bean (Phaseolus vulgaris, Fabaceae). Economic Botany, 45, 379–396. https://doi.org/10.1007/BF02887079
Syfert, M. M., Castaneda-Alvarez, N. P., Khoury, C. K., Sarkinen, T., Sosa, C. C., Achicanoy, H. A., … Knapp, S. (2016). Crop wild relatives of the brinjal eggplant (Solanum melongena): Poorly represented in genebanks and many species at risk of extinction. American Journal of Botany, 103, 635–651.
Title, P. O., & Bemmels, J. B. (2018). ENVIREM: An expanded set of bioclimatic and topographic variables increases flexibility and improves performance of ecological niche modeling. Ecography, 41, 291–307. https://doi.org/10.1111/ecog.02880
Turner, R. (2019). deldir: Delaunay Triangulation and Dirichlet (Voronoi) Tessellation. R package version 0.1-16.
Upadhyaya, H. D., Reddy, K. N., Irshad Ahmed, M., & Gowda, C. L. L. (2012). Identification of gaps in pearl millet germplasm from East and Southern Africa conserved at the ICRISAT genebank. Plant Genetic Resources, 10, 202–213. https://doi.org/10.1017/S1479262112000275
Upadhyaya, H. D., Reddy, K. N., Irshad Ahmed, M., Gowda, C. L. L., & Haussmann, B. I. G. (2010). Identification of geographical gaps in the pearl millet germplasm conserved at ICRISAT genebank from West and Central Africa. Plant Genetic Resources, 8, 45–51. https://doi.org/10.1017/S147926210999013X
Upadhyaya, H. D., Reddy, K. N., Vetriventhan, M., Krishna Gumma, M., Irshad Ahmed, M., Thimma Reddy, M., & Singh, S. K.
(2017). Status, genetic diversity and gaps in sorghum germplasm from South Asia conserved at ICRISAT genebank. Plant Genetic Resources, 15, 527–538. https ://doi.org/10.1017/S1479 26211 600023X USDA ARS NPGS (2018). USDA GRIN Global. van de Wouw, M., Kik, C., van Hintum, T., van Treuren, R., & Visser, B. (2010). Genetic erosion in crops: Concept, research results and chal- lenges. Plant Genetic Resources, 8, 1–15. https ://doi.org/10.1017/ S1479 26210 9990062 van Etten, J., de Sousa, K., Aguilar, A., Barrios, M., Coto, A., Dell’Acqua, M., … Steinke, J. (2019). Crop variety management for climate ad- aptation supported by citizen science. Proceedings of the National Academy of Sciences of the United States of America, 116, 4194–4199. https ://doi.org/10.1073/pnas.18137 20116 van Heerwaarden, J., Doebley, J., Briggs, W. H., Glaubitz, J. C., Goodman, M. M., de Jesus Sanchez Gonzalez, J. & Ross-Ibarra, J. (2011). Genetic signals of origin, spread, and introgression in a large sample of maize landraces. Proceedings of the National Academy of Sciences of the United States of America, 108, 1088–1092. https ://doi.org/10.1073/ pnas.10130 11108 van Heerwaarden, J., Hellin, J., Visser, R., & van Eeuwijk, F. (2009). Estimating maize genetic erosion in modernized smallholder agricul- ture. TAG Theoretical and Applied Genetics, 119, 875–888. https ://doi. org/10.1007/s00122-009-1096-0 van Treuren, R., Engels, J. M. M., Hoekstra, R., & van Hintum, T. J. L. (2009). Optimization of the composition of crop collections for ex situ conservation. Plant Genetic Resources, 7, 185–193. https ://doi. org/10.1017/S1479 26210 8197477 Warren, D. L., & Seifert, S. N. (2011). Ecological niche modeling in Maxent: The importance of model complexity and the performance of model selection criteria. Ecological Applications, 21, 335–342. https ://doi.org/10.1890/10-1171.1 Weidmann, N. B., Rød, J. K., & Cederman, L.-E. (2010). Representing ethnic groups in space: A new dataset. 
Journal of Peace Research, 47, 491–499. https ://doi.org/10.1177/00223 43310 368352 Weiss, D. J., Nelson, A., Gibson, H. S., Temperley, W., Peedell, S., Lieber, A., … Gething, P. W. (2018). A global map of travel time to cities to as- sess inequalities in accessibility in 2015. Nature, 553, 333–336. https ://doi.org/10.1038/natur e25181 You, L., Wood-Sichra, U., Fritz, S., Guo, Z., See, L., & Koo, J. (2017). Spatial Production Allocation Model (SPAM) 2005 v3.2. Zeven, A. C. (1998). Landraces: A review of definitions and classifica- tions. Euphytica, 104, 127–139.      |  13RAMIREZ-VILLEGAS Et AL. BIOSKE TCH Dr. Julian Ramirez-Villegas is a senior scientist at the International Centre for Tropical Agriculture (CIAT), based at their headquar- ters in Cali, Colombia. He has a PhD from the School of Earth and Environment at the University of Leeds (UK). He leads a group on agricultural and climate modelling, doing research on climate information services, climate change impacts and adaptation, and crop agro-biodiversity conservation. The research presented here is part of the “Landrace Gap Analysis of 22 Crops” activ- ity, under the Conservation Module of the CGIAR Genebanks Platform (https ://www.geneb anks.org). Authors Contributions: J.R.-V., C.K.K., L.G. and Z.K. conceived the idea and designed the modelling framework. H.A.A., A.C.M., M.V.D. and C.C.S. collected and analysed the data. D.G.D. pro- vided critical inputs to correctly interpret results and to adjust models. J.R.-V. and C.K.K. wrote the manuscript. All authors re- viewed and edited the manuscript. SUPPORTING INFORMATION Additional supporting information may be found online in the Supporting Information section. How to cite this article: Ramirez-Villegas J, Khoury CK, Achicanoy HA, et al. A gap analysis modelling framework to prioritize collecting for ex situ conservation of crop landraces. Divers Distrib. 2020;00:1–13. https ://doi. org/10.1111/ddi.13046