The UCR Minicore: a valuable resource for cowpea research and breeding

Incorporation of new sources of genetic diversity into plant breeding programs is crucial for continuing to improve yield and quality, as well as tolerance to abiotic and biotic stresses. A minicore (the “UCR Minicore”) composed of 368 worldwide accessions of cultivated cowpea has been assembled, having been derived from the University of California, Riverside cowpea collection. High-density genotyping with 51,128 SNPs followed by principal component and genetic assignment analyses identified six subpopulations in the UCR Minicore, mainly differentiated by cultivar group and geographic origin. All six subpopulations were present to some extent in West African material, suggesting that West Africa is a center of diversity for cultivated cowpea. Additionally, population structure analyses supported two routes of introduction of cowpea into the U.S.: (1) from Spain to the southwest U.S. through Northern Mexico, and (2) from Africa to the southeast U.S. via the Caribbean. Genome-wide association studies (GWAS) of important agronomic traits including flowering time, resulted in the identification of significant SNPs for all traits and environments. The mapping resolution achieved by high-density genotyping of this diverse minicore collection allowed the identification of strong candidate genes, including orthologs of the Arabidopsis FLOWERING LOCUS T. In summary, this diverse, yet compact cowpea collection constitutes a suitable resource to identify loci controlling complex traits, consequently providing markers to assist with breeding to improve this crop of high relevance to global food and nutritional security.


INTRODUCTION
Cowpea (Vigna unguiculata L. Walp) is a diploid (2n = 22) warm-season legume of major importance for food and nutritional security. It provides a major source of dietary protein, fiber, minerals, and vitamins for millions of people in sub-Saharan Africa (SSA), and fodder for livestock. Most of the production is in SSA by smallholder farmers, most of whom are women. It is also grown in many other parts of the world including Latin America, Southeast Asia, the Mediterranean Basin, and the United States (FAOSTAT, www.fao.org). Cowpea is well-known for its adaptation to heat and drought, and to soils with low fertility, making it a successful crop in arid and semi-arid regions where most other crops do not perform as well [1]. However, breeding for increased heat and drought tolerance as well as for key agronomic traits and pest and disease-resistance is crucial as climate changes associated with global warming increase, and given that cowpea is primarily grown in regions that are quite vulnerable to climate change [2,3].
Plant genetic resources constitute the raw material for crop improvement. Cowpea is a genetically diverse crop species divided into five cultivar groups: unguiculata, biflora, sesquipedalis, textilis and melanophthalmus [4,5]. The two most important cultivar groups are unguiculata (grain-type cowpea) and sesquipedalis (vegetable-type cowpea, also known as asparagus bean or yardlong bean) [4]. The long-and succulent-podded vegetable cowpea is mostly grown in southeastern Asia while the short-podded grain cowpea is prevalent in Africa and elsewhere [1,6]. In addition to being grown for grain, cowpea is a source of nutritious fodder for livestock in dry savanna regions of sub-Saharan Africa [7]. One important aspect of cowpea agronomy is the time to flowering, as early-maturing types can in many cases be deployed as a strategy to capitalize on shortened periods of optimal growth, thus avoiding late-season drought with its accompanying array of biotic stressors [1].
Diverse cowpea germplasm is available from genebanks around the world, as partially summarized in Genesys (gene.sys-pgr.org/c/cowpea). The largest germplasm collection, comprised of 15,933 accessions, is located at the International Institute of Tropical Agriculture (IITA) in Ibadan, Nigeria. Other large collections, considerably overlapping in content, are at the United States Department of Agriculture -Agricultural Research Service (USDA-ARS) Plant Genetic Resources Conservation Unit (Griffin, Georgia, USA) with 8,242 accessions, the National Bureau of Plant Genetic Resources (NBPGR, New Delhi, India) with 3,704 accessions, and the University of California, Riverside (UCR, California, USA) with about 5,000 accessions. Managing and evaluating large germplasm collections is laborious and costly, and developing core and minicore subsets to facilitate access to the diversity contained in the entire set of accessions is a common practice in most ex situ collections [8,9].
Genetic and phenotypic evaluation of diverse collections are needed to fully utilize their potential in breeding programs. Several previous studies have reported on the genetic diversity of cultivated cowpea germplasm. Huynh et al. [10] genotyped 442 cowpea landraces, predominantly from Africa, with a 1,536-SNP GoldenGate assay [11] and identified two major genepools, nominally West versus Southeast Africa. Later, 768 accessions mostly from the USDA cowpea collection were characterized using 5,828 GBS (genotyping-by-sequencing) SNPs, revealing the presence of three major subpopulations [12], again one from West Africa, with the other two subpopulations separating Asian, European and some US accessions from those originating in India, Oceania, other parts of Africa, and the Americas. More recently, the majority of an IITA minicore collection (298 accessions) was genotyped using GBS with 2,276 SNPs, also identifying three major subpopulations [13], but with more dispersion of West and Central African accessions across the three sub-populations. Carvalho et al. [14] genotyped a smaller set of 96 worldwide cowpea accessions emphasizing Iberian Peninsula germplasm, but at a much higher SNP density using the Illumina Cowpea iSelect Consortium Array containing 51,128 SNPs [15]. In this latter work, logical relationships were noted between the genetic composition of accessions and colonial-era movement of germplasm from the Iberian Peninsula to the Caribbean, and from Africa to South America. In general, it is evident that there is much yet to be clarified regarding the spread of cowpeas worldwide, and the extent to which certain genetic variants have taken hold in different regions at different times.
Here, we report the development of a minicore (henceforth referred as the "UCR Minicore") composed of 368 domesticated cowpeas selected from a larger set of ~5,000 accessions comprising the UC Riverside cowpea collection. This minicore was genotyped using the 51,128-SNPs Cowpea iSelect Consortium Array [15] and the genotypic information was used to gain a better understanding of the genetic diversity of domesticated cowpea. Although this is the first summary report and full SNP data release for the UCR Minicore, this material has been utilized for more focused work on seed coat color [16], seed coat pattern [17], seed size [18], bruchid resistance [19], plant herbivore resistance [20] and pod shattering [21]. This study evaluated additional traits of agronomic importance including flowering time, dry pod weight, dry fodder weight and pod load score. Genome-wide association studies (GWAS) were conducted, identifying significant SNPs and candidate genes associated with each of these traits.

Plant materials and phenotyping
A total of 668 accessions representing a core subset from the University of California Riverside (UCR) cowpea germplasm collection were genotyped with an Illumina GoldenGate assay [11] (1,536-SNPs) in previous studies [10,22]. This SNP information, coupled with available phenotypic and passport data, was used to choose a smaller subset of non-redundant accessions, each of which was highly homozygous, collectively representing the genetic and phenotypic diversity of the larger core collection. A few additional accessions nominated by cowpea researchers and breeders based on regions of origin not represented among the core collection and possessing some specific traits were also included. The minicore of 368 accessions includes landraces and breeding materials from 50 countries in Africa, Asia, North and South America, Europe, and Australia (Table S1) encompassing members of the cultivar groups unguiculata and sesquipedalis. Individual plants were grown from each of the 368 accessions in a greenhouse at UCR for genotyping (see section 2.2) and seed production. Subsequent seed increases for distribution and phenotypic evaluations used seeds directly descended from these genotyped plants.
The UCR Minicore was phenotyped for days to flowering (DTF) in California (USA) under long-days at the UCR Citrus Research Center and Agricultural Experiment Station in Riverside (CA) during the summers of 2016 and 2017, as well as under short days at the UCR Coachella Valley Agricultural Research Station in Thermal (CA) during the autumn of 2016. For the Riverside summer planting, daylight hours (the time between sunrise and sunset) ranged from 14.4 h on 21 June to 11.9 h on 30 September. For the Thermal autumn planting, daylight hours ranged from 12.8 h on 1 September to 9.9 h on 21 December. Due to limited seed availability, 50 seeds of each accession were planted in single rows of 5.5 -6.1 m long with 0.75 m spacing between rows at both locations. Scoring of the minicore was also conducted at two locations in Nigeria during 2017. The minicore was sown in August 2017 at the International Institute of Tropical Agriculture (IITA) experimental field of Malamadori and Minjibir, near Kano, Nigeria. An alpha lattice design with three replications was used. Each accession was assigned to a plot of 2 m length. The distance between two consecutive plots was 0.75 m while two plants within a row were separated by 0.20 m. The fertilizer NPK (15-15-15) was applied two weeks after planting at the rate of 100 kg/ha. To control insect pests, the trial was sprayed with the insecticide Kartodim (Dimethoate 300g + Lambda-cyhalothrin 15g) at the rate of 1.2 l/ha four times: once at each of vegetative and at flower opening and twice during pod maturing. Manual weeding was performed twice to control weeds. At all locations, DTF was scored as the number of days from planting to when 50% of plants had at least one flower opened. Pod load score was recorded at plant maturity using a 1-3 scale with 1 for high pod load (80-100% of peduncles had 2-3 pods per peduncle), 2 for moderate pod load (80-100% of peduncles had 1-2 pods per peduncle) and 3 for poor load score (80-100% of peduncles had 1 or less pod per peduncle). At maturity, dry pods were harvested, and the other above-ground parts of the plant (leaves, twigs and stems) were cut and rolled up. Both the pods and the fodder were sun dried for one week to determine dry pod weight and fodder weight respectively.

SNP genotyping and data curation
Genomic DNA was extracted from young leaves of individual plants using DNeasy Plant Kit (Qiagen, Valencia, California, USA). The Cowpea iSelect Consortium Array including 51,128 SNPs [15] was used to genotype each DNA sample at the University of Southern California Molecular Genomics Core (Los Angeles, California, USA). SNPs were called in GenomeStudio (Illumina Inc., San Diego, California, USA) and manually curated to remove those with high levels (> 25%) of missing data and/or heterozygous calls. The highest percentage of heterozygosity for any accession was 3.83% (Cp 4906), and 96% of the accessions had heterozygosity levels below 1% (Table S2). The final dataset included 48,425 polymorphic SNPs, 47,334 with known physical positions [23], on the 368 minicore samples (Table S2).

Genetic diversity, population structure and linkage disequilibrium analyses
Expected heterozygosity (He) and nucleotide diversity (π) values were calculated for all 48,425 SNPs in the minicore as a measure of genetic diversity. He was calculated as , where Pi is the frequency for the i th allele among a total of k alleles. π was evaluated as in Xu et al. [6].
The admixture model implemented in STRUCTURE v2.3.4 [24] was used to infer population structure of the UCR Minicore. Only SNPs with minor allele frequency (MAF) ≥ 0.05 were included in the analysis. STRUCTURE was first run three times for each hypothetical number of subpopulations (K) between 1 and 10, with a burn-in period of 10,000 followed by 10,000 Monte Carlo Markov Chain (MCMC) iterations. LnP(D) values were plotted and ΔK values were calculated according to Evanno et al. [25] to estimate the optimum number of subpopulations. Plots were generated with Structure Harvester [26] ( Figure S1). Then, a new run using a burn-in period of 100,000 and 100,000 MCMC iterations was conducted for the estimated K to assign accessions to subpopulations based on a membership probability ≥ 0.80. In addition, principal component analysis (PCA) and linkage disequilibrium (LD) analysis were conducted on the same SNP set using TASSEL v5 [27].
Linkage disequilibrium (LD) was estimated for each chromosome as the correlation coefficient r 2 between pairs of SNPs, using SNPs with MAF ≥ 0.05 (42,711 SNPs). The decay of LD over physical distance was investigated by plotting pair-wise r 2 values and generating a locally weighted scatterplot smoothing (LOESS) curve in R. LD decay distance was determined when r 2 fell to the critical threshold estimated from the 95 th percentile r 2 distribution for unlinked markers (r 2 = 0.15).

Genome-wide association studies (GWAS)
The mixed linear model [28] implemented in TASSEL v.5 [27] (http://www.maizegenetics.net/tassel) was used for GWAS, with a population structure matrix (for K = 6) and a kinship coefficient matrix accounting for population structure and the relatedness of accessions, respectively. A false discovery rate (FDR; [29]) threshold of 0.01 was used to identify significant associations. The percentage contribution of each SNP to the total phenotypic variation was calculated using marker R 2 values from TASSEL multiplied by 100. Candidate genes within significant regions were identified from annotations of the cowpea reference genome v1.1 [23] (https://phytozome.jgi.doe.gov/).

Diversity and linkage disequilibrium in the UCR Minicore
The UCR Minicore was developed to represent available genetic and phenotypic diversity of cultivated cowpea while maintaining a sample size that can be managed by most researchers and breeders for evaluating traits of interest. This minicore is composed of 368 accessions from 50 countries including 242 landraces, 98 breeding lines, three accessions categorized as "weedy", and 25 accessions that are uncategorized (Table S1). It largely overlaps with the IITA minicore, with 233 accessions in common (Table S1), at least by name though not necessarily by genetic identity. The UCR Minicore showed great phenotypic diversity in pod and seed types and colors, flowering time and maturity, leaf shape, and plant architecture, among other traits. Figure 1 illustrates some of the morphological variation existing in the UCR Minicore. Genotyping with the 51,128-SNP array [15] enabled analyses of genetic diversity in the minicore. 94.7% of those SNPs (48,425) were polymorphic in the dataset. Expected heterozygosity (He) and nucleotide diversity (π) were calculated and both averaged 0.313 (the maximum for a biallelic SNP is 0.5).
The LD decay distance was also investigated for each cowpea chromosome using SNPs with MAF ≥ 0.05. The decay of LD with physical distance differed among chromosomes and ranged from 809 kb on Vu05 to 4,705 kb on Vu10 ( Figure S2). In addition to Vu05, the LD decay distance was shortest in Vu08 (813 kb) and Vu07 (1,048 kb), while Vu01 and Vu04 had the largest decay distances after Vu10 (3,803 kb and 3,286 kb, respectively; Figure S2). On average there was one SNP per 14.9 kb, indicating sufficient marker density for GWAS.

Population structure of the UCR Minicore
A total of 42,711 SNPs with MAF ≥ 0.05 were used to evaluate population structure in the UCR Minicore. STRUCTURE [24] was run for K = 1-10 and the inspection of both the estimated log probabilities of the data and the ∆K values calculated as in Evanno et al. [25] supported the presence of six genetic subpopulations ( Figure S1). Accessions were assigned to each subpopulation using a membership coefficient ≥ 0.8 (accessions with membership coefficients less than 0.8 were considered "admixed"; Table  S1). PCA also showed a clear separation of the six subpopulations on the first three principal components ( Figure S3). Subpopulation 1 included 31 accessions mostly from West Africa, 29 of which are landraces (Table  S1). Subpopulation 2 was the smallest subpopulation by number of accessions: it included 12 breeding lines developed at the International Institute of Tropical Agriculture (IITA) in Nigeria. Subpopulation 3 was composed mostly of landraces from Mediterranean countries including Egypt, Italy, and Portugal as well as accessions from California and Puerto Rico (Table S1). Subpopulation 4 included 27 accessions that are mostly landraces from India, China and Papua New Guinea, among other countries. Passport data and visual examination of accessions in Subpopulation 4 indicated that they belong to the sesquipedalis cultivar group (yardlong beans). Subpopulation 5 contained 32 accessions from West Africa, most of which are landraces (Table S1). The largest subpopulation was Subpopulation 6 (69 accessions), which was composed primarily of landraces from Southeastern Africa. The remainder of the accessions (165) were considered "admixed" (Table S1). Figure 2 shows the worldwide distribution of the six subpopulations, with each pie plot representing the proportion of the six subpopulations contributing to accessions in each country. Accessions within the USA were divided further based on the state they belonged to. Each accession from California is represented on the map, while the rest of the USA accessions were grouped together due to the lack of passport information about the state of origin for most of them. Their cultivar names, however, traced to U.S. southern states (Table S1). The figure graphically shows that while all subpopulations are present in West Africa, three of them (1, 2 and 5) are predominant. Accessions from Southeastern Africa have ancestry primarily from Subpopulation 6, while most germplasm from Mediterranean countries belonged to Subpopulation 3. Interestingly, a closer look at the germplasm from the USA indicates that accessions from California have ancestry mostly from Subpopulation 3, while accessions from other U.S. states are predominantly Subpopulation 6 ( Figure 2). Subpopulation 4 is primarily from countries in Asia and Oceania.

GWAS of agronomic traits
GWAS was conducted for four different agronomic traits including days to flowering (DTF), pod load score, dry pod weight, and dry fodder weight using 42,711 SNPs with MAF ≥ 0.05 (see Materials and Methods). DTF was evaluated in five different environments in the USA and Nigeria, three of which are considered short-day environments (< 12h of daylight) while the other two are considered long-day environments (> 12h of daylight) ( Table 1, Materials and Methods). Pod load score, dry pod weight, and dry fodder weight were evaluated in one environment in Nigeria. Significant marker-trait associations were identified for all traits and environments (Figure 3 and 4; Table 1 and 2; Table S3 and S4).

Days to flowering
Considering all five environments, 91 significant SNPs at 40 genomic regions were identified for DTF ( Figure 3, Table 1, Table S3). Among those 40 significant QTLs, 26 were associated with DTF under short days in Thermal (California, USA) and both Malamadori and Minjibir (Nigeria), while 14 were associated with DTF under long days in Riverside (California, USA) ( Figure 3, Table 1, Table S3). The phenotypic variation of significant SNPs ranged from 5 -9% (Table 1, Table S3).  Five of the significant genomic regions corresponded to previously reported QTLs for DTF ( Figure  3; Table 1; [30,31]) while the rest are novel (Table 1; Figure 3). Those five QTLs were investigated further by considering gene model annotations in Phytozome (phytozome.jgi.doe.gov/) and the Legume Information System (LIS; www.legumeinfo.com). The resolution of QTL positions was improved in the present work using higher-density genotyping than previously, together with a larger and more diverse set of accessions. Several candidate genes stood out for these five QTLs, as follows. One of the previously reported QTL was identified in Thermal 2016 and is located at the beginning of Vu05 (371,674 bp; Table 1). Just 47 kb away from that position, another QTL was identified in Minjibir 2017 between 324,586 bp and 325,016 bp (Table  1). These coordinates coincide with a QTL for DTF identified in a cowpea MAGIC population also under short days [30] and hence they are likely to represent the same QTL. Only six genes were found within this region (324,586 -371,674 bp) including a cluster of four genes (Vigun05g004000, Vigun05g004100, Vigun05g004200 and Vigun05g004300) encoding the flowering locus protein T. Another of these five DTF QTLs was identified by a SNP at 5,412,353 bp on Vu04 under short days in Minjibir in 2017 (Table 1; Figure 3). This position is contained within a QTL for flowering time under short days identified in the cowpea MAGIC population by Huynh et al. [30]. Genes near the significant SNP (2_36725) include Vigun04g057300, encoding a circadian clock coupling factor ZGT, ~200 kb from this SNP. Vigun04g057300 is an ortholog of the Arabidopsis Empfindlicher im Dunkelroten Licht 1 (EID1) gene involved in the regulation of phytochrome-A light signaling [32].
Another of the five previously reported QTLs affects flowering time under long-days. A single significant SNP (2_22037) was identified at 5,939,066 bp on Vu09 in Riverside in 2016 (Table 1; Figure  3). This position matches a QTL for DTF identified previously in the same environment in the MAGIC population [30] as well as a flowering time QTL identified in a RIL population derived from a cultivated x wild cross (CFt9; [31]). Vigun09g059700, which encodes a MADS-box protein and is orthologous to the Arabidopsis AGAMOUS-LIKE 8 (AGL8) gene also known as FRUITFULL (FUL), was identified 140 kb from this SNP. Lastly, another of these five QTLs was associated with flowering under long days. This locus was detected at position 37,711,373 on Vu11 for the Riverside 2017 environment (Table 1; Figure  3), which is contained within a flowering time QTL previously identified in the same environment using the cowpea MAGIC population [30]. Two cowpea genes, Vigun11g169600 and Vigun11g169400, encoding AP2/B3-like transcription factor family proteins were identified at 128 kb and 139 kb from the peak SNP, respectively. These genes are orthologous to the Arabidopsis REDUCED VERNALIZATION RESPONSE 1 (VRN1), a major gene involved in regulation of flowering and vernalization response.
In addition to these previously reported cowpea flowering time QTLs, it is worth mentioning three other QTLs that contain or are located near genes with clear roles in flowering. One is the QTL identified in Thermal in 2016 between 7,227,977 -7,307,028 bp on Vu05 (Table 1; Figure 3). The cowpea gene Vigun05g077400, encoding a MADS-box protein orthologous to the Arabidopsis AGAMOUS-like 20 (AGL20) gene, is located 43.7 kb from the peak SNP (2_19172) for this QTL. The second one is a QTL region spanning 470 kb (41152886 to 41623157 bp) on Vu09 that was identified in Riverside in 2017, a long-day environment. This region contains 69 genes, among which Vigun09g244300, encoding a protein of the BES1/BZR1 family, was found. This gene is orthologous to the Arabidopsis BRASSINAZOLE-RESISTANT 1 (BZR1), which positively regulates the brassinosteroid signaling pathway [33]. The third QTL of interest was also identified in Riverside in 2017; it was located on Vu03 between 9,053,619 and 9,060,012 bp (Table 1; Figure 3). The cowpea gene Vigun03g104200, which is an ortholog of the soybean E6, a main flowering and maturity gene, was identified 21 kb upstream of the peak SNP (2_24857).

Pod load score, dry pod weight, and dry fodder weight
Pod load score, dry pod weight and dry fodder weight were evaluated in Minjibir in 2017 (Nigeria). Correlations between these traits were calculated using Pearson's correlation coefficient. A strong positive correlation was observed between dry pod weight and dry fodder weight (0.93), while a moderate negative correlation was observed between pod load score and dry fodder weight (-0.45) as well as between pod load score and dry pod weight (-0.48). Note that lower pod load score numbers were given to plants with higher pod loads (see Methods).
A single QTL was identified for pod load score and dry pod weight on Vu04, while two QTLs on Vu04 were identified for dry fodder weight (Table 2; Figure 4; Table S4). Interestingly, the major QTL for dry fodder weight coincides with the QTLs for pod load score and dry pod weight (Table 2; Figure 4; Table  S4). The colocation of these QTLs together with the correlations between the three traits suggests pleiotropic effects of a single gene or the existence of closely linked genes.  Genes within this common QTL region which spans 125 kb were explored. Eleven genes were identified, seven of which encode members of the cyclic nucleotide-gated ion channel (CNGC) family of proteins. Four of those seven genes (Vigun04g039300, Vigun04g039400, Vigun04g039800 and Vigun04g039900) are orthologs to the Arabidopsis CNGC20 gene, also called CNBT1 (cyclic nucleotidebinding transporter 1), while the other three (Vigun04g039500, Vigun04g039600 and Vigun04g039700) are orthologs to the Arabidopsis CNGC19 gene. The encoded ion channel proteins mediate signaling pathways involved in responses to abiotic and biotic stresses [34], including response to herbivores, nematodes and heavy metals among others [35][36][37] (see Discussion).

Population structure and historical crop dispersal
The UCR Minicore and its associated high-density SNP data (51,128 SNPs) constitute a powerful combination of material and information resources to support genetic diversity analyses of cultivated cowpea. Six subpopulations were identified in this minicore, all of which were represented to at least some extent in West African material ( Figure 2). This indicates that West Africa is a center of diversity for cultivated cowpea, as suggested by previous studies [13,38,39]. Five of the six subpopulations were composed mostly of landraces, while Subpopulation 2 included cowpea lines only from IITA's cowpea breeding program. Based on how the six subpopulations split at different K numbers, as illustrated in Herniter et al. [40], it appears that Subpopulation 2 is the result of crosses between materials from Subpopulations 1 and 5. Subpopulations 1 and 5 are composed almost exclusively of West African accessions. A closer inspection of landraces from Subpopulations 1 and 5 revealed consistent differences in several traits. For example, Subpopulation 1 accessions flowered earlier than those in Subpopulation 5 (8 days earlier on average in short-day environments) and had decreased photoperiod sensitivity (most Subpopulation 5 accessions did not flower under long-days) (Table S1). Other work reported that Subpopulation 1 had much more pod shattering than Subpopulation 5 accessions [21]. In addition, all accessions in Subpopulation 1 have smooth seed coats, while most in Subpopulation 5 have rough coats (data not shown). Seed coat texture is an important quality trait that influences seed end-use; rough seedcoated varieties are preferred for food preparations requiring seed coat removal (e.g., making of Akara) as rough seed coats can be easily removed after soaking, while varieties with smooth seed coats are often preferred when cowpea is consumed as boiled intact seed [41]. Also, rough seed coat types imbibe water quicker and generally have reduced cooking times compared to smooth seed coat types.
Genotyping of the UCR Minicore has shed light on the history of cowpea in USA. Landraces and their breeding derivatives from California belong to Subpopulation 3, which is composed mostly of landraces from Mediterranean countries, while accessions from other U.S. states were predominantly from Subpopulation 6, which is composed mostly of landraces from Southeastern Africa. This population structure, together with textual evidence summarized by Herniter et al. [40], is consistent with a global dispersal of cowpea from its centers of domestication in West and East Africa along historical trade routes. For the USA, cowpea appears to have arrived through at least two distinct introduction routes. It is believed that in the US Southwest, cowpea was first introduced by the Spanish explorer Hernando de Alcorón in 1540 going northward from Mexico, possibly followed in the late 1600's by the Jesuit monk Eusebio Kino, including accessions that were popular in the Mediterranean basin during those times [40,42]. In the Southeastern United States, cowpea seems to have been brought on slave ships, perhaps as provisions. The two distinct introduction routes of apparently genetically distinct cowpeas contrast with older studies predating genotyping, which have often assumed a single introduction. This conjecture of more than one route of cowpea introductions into the USA, and genetic distinctions between them that now are evident, is further supported by the findings of Carvalho et al. [14], which showed relationships between a Cuban and Mediterranean accessions, and between sub-Saharan African and South American accessions, which represent yet another route of colonial-era dispersal of cowpeas.
Although the germplasm utilized in this study largely overlaps with that utilized by Huynh et al. [10], only two genetic clusters were identified in that previous study which correspond to the two major "West Africa" and "Southeastern Africa" gene pools. As shown by Herniter et al. [40] utilizing this same UCR Minicore material, at K=2 Subpopulation 6 (Southeastern Africa gene pool) splits from the rest of the subpopulations, confirming that as a primary genetic differentiation between subpopulations of domesticated cowpeas. The smaller number of SNPs that were available for Huynh et al. [10] to conduct population structure analyses, together with an emphasis on African landraces among accessions genotyped, most likely precluded further subdivision of the West African subpopulation.
Carvalho et al. [14] has been the only previous study to genotype a set of cowpea accessions at a high density (51,128 SNPs). Although their focus was on genetic diversity of Iberian Peninsula cowpeas, results from that study largely agree with the findings reported in the present work. In particular, Subpopulations 1, 2, and 3 from the study of Carvalho et al. [14], correspond to Subpopulations 4, 3 and 6 reported here, respectively. Their fourth subpopulation was composed only of four accessions from West Africa. This small representation of germplasm from West Africa certainly would preclude the detection of additional subpopulation structure present in that region.

Flowering time
Flowering time is one of the most important agronomic traits, which affect environmental adaptation and yield potential. It is controlled by multiple genes via different pathways, and it is influenced by environmental conditions [43].
Cowpea is generally a short-day plant and, although some accessions are day-neutral (photoperiodinsensitive), many cowpea genotypes are photoperiod sensitive and show a delay in flowering under long day conditions [44,45]. The degree of photoperiod sensitivity can vary between accessions and is influenced by temperature [45]. Understanding the underlying genetic factors of cowpea flowering time is important for the use of diverse germplasm to customize varieties for different environments.
GWAS using the UCR Minicore has enabled the identification of many loci and meaningful candidate genes associated with flowering under both short and long days. Five of the significant regions coincide with QTLs reported in previous studies [30,31]. Overall, these results are consistent with polygenic control of flowering time in cowpea.
One of the main flowering time regions identified under short-day conditions (and two different environments) was on Vu05. A cluster of four genes (Vigun05g004000, Vigun05g004100, Vigun05g004200 and Vigun05g004300) annotated as FLOWERING LOCUS T (FT) are in this region. Vigun05g004000 and Vigun05g004100 are orthologous to the Arabidopsis flowering gene TWIN SISTER OF FT (TSF; AT4G20370.1), which is a close relative of FT, whereas Vigun05g004200 and Vigun05g004300 are orthologs of the Arabidopsis FT (AT1G65480.1) gene. TSF and FT are main floral pathway integrators and play overlapping roles in the promotion of flowering [43,46,47]. They share a similar mode of regulation, and overexpression of both FT and TSF results in precocious flowering [46,47]. The identification of multiple copies of FT genes in the cowpea reference genome [23] suggests that copy number variation could play an important role in the regulation of flowering time in cowpea, as reported for other crops [48,49].
Another main flowering time locus identified under short days was located near Vigun04g057300, which is an ortholog of the Arabidopsis gene EID1. EID1 encodes an F-box protein that is a negative regulator in phytochrome A (phyA)-specific light signaling [50]. Mutations in EID1 causing the deceleration of the circadian clock have been selected during tomato domestication [51]. In Arabidopsis, EID1 mutations caused alterations in flowering induction [32].
A third QTL identified under a short-day environment on Vu05 was near Vigun05g077400, which is an ortholog of the Arabidopsis AGAMOUS-like 20 (AGL20) gene. AGL20 is an integrator of different pathways controlling flowering and is considered a central component for the induction of flowering [52,53]. Overexpression of AGL20 in Arabidopsis suppresses late flowering and delays phase transitions from the vegetative stages of plant development [52,53].
Under the long-day conditions of Riverside (CA, USA), most of the accessions in the UCR Minicore showed a delay in flowering, and about one fourth of the minicore did not flower (Table S1). Under long days, a major flowering time QTL was identified on Vu09 near Vigun09g059700, an ortholog of the Arabidopsis AGL8/FUL gene. This gene encodes a MADS-box family transcription factor that regulates inflorescence development and is negatively regulated by APETALA1 in Arabidopsis [54]. In addition, AGL8/FUL promotes floral determination in response to far-red-enriched light [55]. Loss of AGL8/FUL function in Arabidopsis caused a delay in flowering time [56,57], suggesting that the cowpea ortholog of FUL (Vigun09g059700) could similarly be involved in the response to photoperiod.
Another region associated with flowering under long day was identified near genes Vigun11g169600 and Vigun11g169400. Both genes encode AP2/B3-like transcription factors that are orthologs to the Arabidopsis VRN1 gene. VRN1, a close homolog of FUL, promotes flowering after prolonged cold (vernalization) in different plant species [58][59][60][61][62]. It is intriguing to identify a vernalization-pathway gene in a warm-season legume that does not need to undergo vernalization before flowering. However, vernalization pathway genes have been identified in other warm-season plants including soybean [63]. In particular, Glyma11g13220, which was a homolog of the Arabidopsis VRN1, was responsive to photoperiod as well as to low temperatures in soybean. This vernalization pathway gene could also be functional in cowpea.
The cowpea gene Vigun09g244300, located on Vu09 and encoding a protein of the BES1/BZR1 family, is another strong candidate gene for days to flowering under long-day conditions. Vigun09g244300 is an ortholog of the Arabidopsis BRASSINAZOLE-RESISTANT 1 (BZR1), one of the main regulators of the brassinosteroid signaling pathway [33]. In Arabidopsis, brassinosteroid signaling inhibits the floral transition and promotes vegetative growth. Furthermore, brassinosteroid-deficient mutants cause a strong delay in days to flowering [64]. Lastly, another QTL was found on Vu03 near Vigun03g104200, an ortholog of the soybean E6 gene, which affects both flowering and maturity [65,66]. E6 plays an important role in the long-juvenile trait (delayed flowering) in soybean [67].

Plant productivity traits
Pod load score, dry pod weight and dry fodder weight are important traits related to plant productivity. A cowpea genotype with high pod load (i.e., low pod load score) demonstrates high grain yield potential. High pod load is a result of a high number of pods per plant, which is an indication of low rate of flower abortion. Generally, low flower abortion is associated with resistance to insect attack (typically flower thrips or maruca) and high night temperatures (>18°C) [68]. Negative correlations have been identified between pod load score and dry pod weight, as well as between pod load score and number of pods per plant, and between pod load score and grain yield [69]. The negative correlation between pod load score and grain yield holds true as long as no attack by pod sucking bugs occurs, and in such a situation selection of high grain yielding genotypes can be made using pod load score. However, there are high positive correlations between dry pod weight and grain yield. In some studies, dry pod weight has been negatively correlated with dry fodder weight [70,71] except for some lines with dual purpose characteristics [72,73].
We identified one major QTL associated with pod load score, dry pod weight and dry fodder weight. Seven genes encoding members of the CNGC family proteins were located within the QTL region: four of them (Vigun04g039300, Vigun04g039400, Vigun04g039800 and Vigun04g039900) are orthologs of the Arabidopsis CNGC20 gene, also called CNBT1 (cyclic nucleotide-binding transporter 1), which is involved in the response to nematodes [34,74]. The other three (Vigun04g039500, Vigun04g039600 and Vigun04g039700) are orthologs of the Arabidopsis CNGC19 gene, which is involved in herbivore response [34,36]. Interestingly, this region corresponds to the major Rk locus for root-knot nematode resistance identified in cowpea [75,76]. However, the authors are not aware of any nematode infestation or herbivore damage at the Minjibir field location. Members of the CNGC family of proteins have also been involved in plant tolerance to heavy metals [37]. CNGC20 and CNGC19 play essential roles in the responses to biotic and abiotic stresses and thereby may influence plant productivity. Further studies are necessary clarify any possible role of these cowpea homologs in regulating pod load score, dry pod weight and dry fodder weight. Figure S1. Exploration of the optimal number of subpopulations in the UCR Minicore. Figure S2. Linkage disequilibrium decay for the eleven cowpea chromosomes. Figure S3. Principal component analysis of the UCR Minicore. Table S1. Information on the 368 accessions in the UCR Minicore. Table S2. SNP data from accessions in the UCR Minicore. Table S3. Significant SNPs for DTF in the five different environments. Table S4. Significant SNPs for pod load score, dry pod weight and dry fodder weight.