bioRxiv preprint doi: https://doi.org/10.1101/2023.01.29.524750; this version posted January 31, 2023. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Adaptation and genomic vulnerability in Dalbergia
Hung et al. 2023

Manuscript in Preparation

Range-wide differential adaptation and genomic vulnerability in critically endangered Asian rosewoods

Tin Hang Hung1,*, Thea So2, Bansa Thammavong3, Voradol Chamchumroon4, Ida Theilade5, Chhang Phourin2, Somsanith Bouamanivong6, Ida Hartvig7,8, Hannes Gaisberger9,10, Riina Jalonen11, David H. Boshier1, John J. MacKay1,*

1. Department of Biology, University of Oxford, Oxford OX1 3RB, United Kingdom
2. Institute of Forest and Wildlife Research and Development, Phnom Penh, Cambodia
3. National Agriculture and Forestry Research Institute, Forestry Research Center, Vientiane, Laos
4. The Forest Herbarium, Department of National Park, Wildlife and Plant Conservation, Ministry of Natural Resources and Environment, Bangkok, Thailand
5. Department of Food and Resource Economics, Faculty of Science, University of Copenhagen, Denmark
6. National Herbarium of Laos, Biotechnology and Ecology Institute, Ministry of Science and Technology, Vientiane, Laos
7. Forest Genetics and Diversity, Department of Geosciences and Natural Resource Management, University of Copenhagen, Denmark
8. Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen, Denmark
9. Bioversity International, Rome, Italy
10. Paris Lodron University, Salzburg, Austria
11. Bioversity International, Serdang, Malaysia

Corresponding authors:
*T.H.H.: tin-hang.hung@biology.ox.ac.uk; *J.J.M.: john.mackay@biology.ox.ac.uk

Classifications: Biological sciences: Ecology
Keywords: rosewood, ecological genomics, climate vulnerability, adaptation

Abstract

In the billion-dollar global illegal wildlife trade, rosewoods have been the world’s most trafficked wild product since 2005 [1]. Dalbergia cochinchinensis and D. oliveri are the most sought-after rosewoods in the Greater Mekong Subregion [2]. They are exposed to significant genetic risks and the lack of knowledge on their adaptability limits the effectiveness of conservation efforts. Here we present genome assemblies and range-wide genomic scans of adaptive variation, together with predictions of genomic vulnerability to climate change. Adaptive genomic variation was differentially associated with temperature- and precipitation-related variables between the species, although their natural ranges overlap. The findings are consistent with differences in pioneering ability and in drought tolerance [3]. We predict that their genomic offsets will increase over time and with higher carbon emission pathways, but at a faster pace in D. cochinchinensis than in D. oliveri.
These results and the distinct gene-environment association in the eastern coastal edge suggest species-specific conservation actions: germplasm representation across the range in D. cochinchinensis and a focus on vulnerability hotspots in D. oliveri. We translated our genomic models into a seed source matching application, seedeR, to rapidly inform restoration efforts. Our ecological genomic research uncovering contrasting selection forces acting in sympatric rosewoods is of relevance to conserving tropical trees globally and combating risks from climate change.

Significance statement

In the billion-dollar global illegal wildlife trade, rosewoods have been the world’s most trafficked wild product since 2005, with Dalbergia cochinchinensis and D. oliveri being the most sought-after and endangered species in Southeast Asia. Emerging efforts for their restoration have lacked a suitable evidence base on adaptability and adaptive potential. We integrated range-wide genomic data and climate models to detect the differential adaptation between D. cochinchinensis and D. oliveri in relation to temperature- and precipitation-related variables and projected their vulnerability until 2100. We highlighted the stronger local adaptation in the coastal edge of the species ranges, suggesting conservation priority. We developed genomic resources including chromosome-level genome assemblies and a web-based application, seedeR, for genomic model-enabled assisted migration and restoration.
Main

Rosewoods have been the world’s most trafficked wild product since 2005, amounting to 30–40% of the global illegal wildlife trade [1], which is estimated at 7–23 billion USD annually [4]. Dalbergia cochinchinensis Pierre and D. oliveri Gamble ex Prain are among the most sought-after and threatened rosewood species. They are exploited for their extremely valuable timber [2], alongside many other valued and threatened tree species in Asia’s tropical and subtropical forests [5], and growing demand and limited supply have driven prices as high as 50,000 USD per cubic metre [6]. The two Dalbergia species were classified as Vulnerable and Endangered, respectively, in the 1998 IUCN Red List [7,8]. The Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES) has listed the entire Dalbergia genus in its Appendix II since 2017 to reduce sequential exploitation of other closely related species [9]. In the IUCN's latest re-assessment in 2022, which raised their status to Critically Endangered [10,11], it is suspected that the populations of both species have already experienced a decline of at least 80% over the last three generations, and the decline is likely to continue [12].

D. cochinchinensis and D. oliveri are sympatric species, endemic to the Greater Mekong Subregion (GMS) in Southeast Asia, an area of high ecological and conservation concern as 84% of the GMS overlaps with the Indo-Burmese mega biodiversity hotspot [13].
The complex biogeographical and geological histories of the GMS have contributed to its high species richness, heterogeneous landscapes, and high endemism levels [14]. Ancient changes in the distribution of terrestrial and water bodies have been associated with changes in vegetation types and cover [15]. These forests contribute substantially to local livelihoods, economies, food security, and human health [16,17], though overexploitation undermines their potentially central role in nature-based solutions and most of them are unprotected [4].

Species- and environment-specific conservation approaches represent an immediate need in response to declining populations [5]. Conservation, collection, and use of genetically diverse germplasm are key to conserving diversity and restoring these rosewood populations. Genetic conservation actions were started in the early 2000s but were limited in scale, usually including fewer than 50 seed-producing trees per country [18–20]. Newer capacity-building initiatives targeting tree nurseries and seed value chain development [21] may still carry genetic risks associated with the supply and use of germplasm, and may compound the effects of over-exploitation. First, underrepresented genetic diversity during the sourcing of genetic materials can create a genetic bottleneck for the species and reduce the species’ ability to adapt and evolve in a changing climate [22]. Second, mismatch of habitat suitability can result in maladaptation, if populations have strong local adaptation [23].
Third, climate change will likely impose new forces of selection on the current genetic diversity, thus reducing the species’ adaptability, affecting population functioning [24,25], and leading to increased risk of local extirpations and species’ range collapse [26]. If unaddressed, these risks will reduce both the short- and long-term effectiveness of restoration projects. The genetic risks call for an understanding of adaptation and its genetic basis in Dalbergia species in the GMS to safeguard ongoing conservation and restoration efforts. Dalbergia are high-value species that could be used sustainably and generate income for farmers in developing countries if well-adapted planting material is available [5]. Planting for economic purposes and reducing risks to remaining natural populations of these species seem necessary, where ecological restoration alone is insufficient.

Of the 14,191 vascular plants that are listed as Vulnerable, Endangered, or Critically Endangered in the IUCN Red List, only 0.1% have their genomes published, far fewer than the 1% reported for listed animals [27]. There is a critical lack of genomic resources in threatened species and a disproportionate representation across taxa, in contrast with the rapid growth in genomic technologies. New reference genomes in threatened species will enable the analysis of functional genes, higher-resolution studies of species delineation, association mapping and adaptation, genetic rescue, and genome editing [28].
These in turn will help to address important conservation (and restoration) questions such as genetic monitoring of introduced and relocated populations, predicting population viability, disease resistance, synthetic alternatives, and de-extinction [29,30].

This paper develops an unprecedented understanding of adaptation in critically endangered rosewoods, integrating genomic analyses with the creation of a novel evidence and resource base to inform and expand ongoing conservation efforts. (1) We present genome assemblies of D. cochinchinensis and D. oliveri at chromosomal and near-chromosomal scale respectively. (2) We analyse range-wide patterns of adaptation by genotyping ~800 trees, and identify differential drivers of adaptive genetic diversity between the two species by using gene-by-environment association analyses. (3) We project current genotypes onto future climate scenarios and predict the potential maladaptation of populations. (4) We deploy an interactive application to predict optimal seed sources, based on our landscape genomic results, in D. cochinchinensis and D. oliveri for use in restoration under future climate scenarios. Our ecological genomic study in the GMS fills crucial knowledge gaps for genomic adaptation in tropical tree species, which are highly underrepresented in the current research literature.

Chromosome-scale genome characterisation

The D. cochinchinensis reference genome assembly (Dacoc_1.4) was 621 Mbp in size and comprised 10 pseudochromosomes (Figure 1a, Supplementary Figure 1, Supplementary Table 1). Whole-genome sequencing of a single seedling of D. cochinchinensis produced 165 Gbp (~260X) of long-read data. A diploid-aware draft assembly of 1.3 Gbp with 6,443 contigs and an N50 of 1.35 Mbp was first obtained, with the longest contig being 33.2 Mbp, at chromosome-arm length. We purged the haplotigs and scaffolded the draft genome with 54.97 Gbp (~88.52X) of Hi-C chromosome conformation capture reads into 511 scaffolds with an N50 of 60.0 Mbp (Supplementary Table 2). The 10 longest scaffolds were considered pseudochromosomes and 98.3% of the contigs were mapped onto them (Figure 1b).

The D. oliveri draft genome assembly (Daoli_0.3) was 689.25 Mbp in size (Supplementary Figure 1, Supplementary Table 3). Whole-genome sequencing of a single seedling of D. oliveri produced 15.13 Gbp (~22X) of long-read data. We first obtained a diploid-aware draft assembly of 814.69 Mbp with 3,249 contigs and an N50 of 474.02 Kbp. We purged the haplotigs and scaffolded the draft genome with 13.46 Gbp (~20X) of Pore-C multi-contact chromosome conformation capture reads into 2,977 scaffolds with an N50 of 38.43 Mbp. Syntenic analysis of the D. oliveri assembly (Daoli_0.3) against the 10 pseudochromosomes obtained in D. cochinchinensis (Dacoc_1.4) showed that the 16 largest scaffolds in Daoli_0.3 had 1-to-1 or 2-to-1 correspondences to Dacoc_1.4, implying that Daoli_0.3 was at chromosome-arm length (Figure 1c).

We constructed de novo repeat libraries of Dacoc_1.4 and Daoli_0.3, which contained 402 Mbp and 453 Mbp of repeat elements respectively (64.80% and 65.71% of the genomes) (Supplementary Table 4, Supplementary Table 5), the majority of which were annotated as containing LTR elements (46.63% and 48.55%) such as Ty1/Copia (15.25% and 15.75%) and Gypsy/DIRS1 (30.51% and 31.96%).
The repeat content of the two genomes was significantly higher than the average among Fabids (~49%), which may be due to their LTR content being nearly double the Fabid average (~22%) [31].

Figure 1. (a) Genomic landscape of the 10 assembled pseudochromosomes of D. cochinchinensis (Dacoc_1.4), showing tick marks every 1 Mb, gene density (orange), repeat density (green), 5-mC density (blue), and interchromosomal syntenic arrangement (brown). The densities are calculated in 1-Mb sliding windows. (b) High-resolution contact probability map of the final D. cochinchinensis genome assembly after scaffolding, revealing the 10 pseudochromosomes at 100 Kbp resolution. (c) Syntenic dot plot of assemblies of D. oliveri (Daoli_0.3) against D. cochinchinensis with a minimum identity of 0.25.

We predicted and annotated 27,852 and 33,558 gene models in Dacoc_1.4 and Daoli_0.3 respectively, using previous RNA sequencing data (Supplementary Table 6) and protein homology of Arabidopsis thaliana and Arachis ipaensis.
The gene models had a mean length of 4,284.20 and 3,942.71 bp respectively, of which 98.3% and 95.5% had an AED score less than 0.5, considered as strong confidence (Supplementary Figure 2). The gene models had a BUSCO v5.1.2 completeness of 96.2% and 88.3% using the eudicots_odb10 reference dataset, with 92.1% and 86.7% being both complete and single copy.

Range-wide genomic scan for adaptive signals

We obtained initial pools of 1,832,629 and 3,377,855 SNPs from genotyping 435 and 331 individuals of D. cochinchinensis and D. oliveri respectively, across their natural ranges (Supplementary Table 7), and final pools of 180,944 and 193,724 SNPs after filtering for missing data, minimum allele frequency, and linkage disequilibrium. The samples represented previous sampling work [32,33] and new sampling that covered all known existing populations.

We employed the sparse non-negative matrix factorisation (sNMF) algorithm to determine the optimal number of ancestral populations (K) for D. cochinchinensis and D. oliveri as 13 and 14 respectively (Supplementary Figure 3, Supplementary Figure 4, Supplementary Figure 5). These results were much higher than the previous estimation of K = 5–9 for the same species using nine microsatellite markers and 19 SNPs [32,33]. The analysis revealed a highly resolved hierarchical genetic structure for both species and distinct population clusters around the Cardamom Mountains in southwest Cambodia and in northern Laos. Our calculation gave a larger genomic inflation factor (λ) in D. cochinchinensis (range from 0.071 (evapotranspiration) to 0.25 (precipitation of driest quarter), mean of 0.13, standard deviation of 0.049) than that in D. oliveri (range from 0.038 (evapotranspiration) to 0.081 (mean diurnal range), mean of 0.056, standard deviation of 0.016) (Supplementary Table 8).

The numbers of SNPs found to be adaptive for at least one of the environmental variables were 20,373 (11.3%) and 6,953 (3.59%) in D. cochinchinensis and D. oliveri respectively (|Z-value| > 2 and Q-value < 0.01), after correcting for population structure (optimal K) and genomic inflation (Supplementary Figure 6, Supplementary Figure 7, Supplementary Table 9). Relatively few SNPs were associated with all or many environmental variables; 4 SNPs were associated with 11 out of 13 variables tested in D. cochinchinensis, and 46 SNPs were associated with all 12 variables in D. oliveri. These findings revealed the complex and polygenic nature of environmental adaptation, where multiple forces of natural selection can act together via different environmental cues and affect overlapping loci.

In D. cochinchinensis, ‘precipitation of the driest quarter’ (wc2.1_30s_bio_17) was the environmental variable with the strongest gene-environment association, with a SNP on chromosome 3 at position 36,345,659 (LFMM Z = 6.07237, Q = 4.77e-29). The SNP was located within the gene Dacoc08834, a homologue of the Ubiquitin-like-specific protease 1B (ULP1B). The highest allele frequencies of this SNP were found in the southwest of Cambodia, which has the highest precipitation of the driest quarter (Figure 2). ULP1B is one of the ubiquitin-like-specific proteases that mediate the maturation and deconjugation of a small ubiquitin-like modifier (SUMO) from target proteins as part of post-translational modification [34].
The SUMO process in plants has been shown to regulate stress responses including to drought, heat, salinity, and pathogens [35–37] and timing of flower initiation [38], which might explain its strong association with this drought-related environmental factor. In an analysis of transcriptomes from 6 Dalbergia species, ubiquitin-related proteins were found to be overrepresented compared to other legumes [27]. Taken together, these observations suggest that ubiquitin-related proteins have a role in Dalbergia adaptation to water assimilation.

Figure 2. (a) The most significant gene-environment association at 36,346,659 bp on chromosome 3, within the Dacoc08834 gene and upstream of Dacoc08835 and Dacoc08836 genes, which are homologues of ULP1B, TRX9, and Cbei_0202 respectively. (b) and (c) Correlation between allele frequency and wc2.1_30s_bio_17 (Precipitation of driest quarter) for this locus.

By contrast, the strongest association in D. oliveri was between precipitation of the wettest quarter (wc2.1_30s_bio_16) and a SNP on the scaffold Daoli_0035 at position 107,725 (LFMM Z = 6.1895, Q = 6.36e-102). The locus was 3,254 bp upstream of a predicted gene model Daoli32516 and 5,010 bp downstream of the gene Daoli32517, a homologue of tatC-like protein YMF16.
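The selection criteria used in this section (|Z-value| > 2 and Q-value < 0.01 after correcting for genomic inflation) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes a conventional median-based estimate of λ, two-sided normal p-values, and Benjamini-Hochberg adjusted p-values standing in for Q-values. It also assumes that the |Z| threshold is applied to the uncalibrated Z-scores. All function names are hypothetical.

```python
import math
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 d.f. (about 0.455),
# obtained from the standard normal quantile function.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(z_scores):
    """Median-based genomic inflation factor: median(z^2) / median(chi2_1)."""
    return median(z * z for z in z_scores) / CHI2_1DF_MEDIAN


def calibrated_p_values(z_scores, lam):
    """Two-sided normal p-values after rescaling z-scores by sqrt(lambda)."""
    scale = math.sqrt(lam)
    return [math.erfc(abs(z) / scale / math.sqrt(2.0)) for z in z_scores]


def bh_q_values(p_values):
    """Benjamini-Hochberg adjusted p-values (assumed stand-in for Q-values)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


def adaptive_snps(z_scores, z_min=2.0, q_max=0.01):
    """Indices of SNPs with |Z| > z_min and Q < q_max (assumed thresholds)."""
    lam = genomic_inflation(z_scores)
    q = bh_q_values(calibrated_p_values(z_scores, lam))
    return [i for i, (z, qi) in enumerate(zip(z_scores, q))
            if abs(z) > z_min and qi < q_max]
```

In practice the Z-scores would come from a latent factor mixed model fitted per environmental variable, with the number of latent factors set to the optimal K. The sketch covers only the post-processing step.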
Differential adaptation related to temperature and precipitation

Isothermality (wc2.1_30s_bio_3) was identified as the most important overall driver of both neutral and adaptive genomic variation among non-spatial environmental variables in D. cochinchinensis, based on our gradient forest (GF) model (Figure 3, Supplementary Figure 8a), in contrast to ‘precipitation of the wettest quarter’ (wc2.1_30s_bio_16) in D. oliveri (Figure 4, Supplementary Figure 8b). Spatial variables, as principal coordinates of a neighbourhood matrix (PCNM), were the most important variables that explained both neutral and adaptive genomic variation, which was unsurprising given that strong isolation by distance was known in these species [32] and environmental adaptation only affects a small portion of the genome [39]. Soil factors were among the lowest ranked variables for gene-environment associations for both species. We observed different patterns of geographic variation in D. cochinchinensis and D. oliveri when fitting the GF models across their native ranges. D. cochinchinensis had strong differentiation between North and South populations at around 16°N, which was mainly driven by isothermality (wc2.1_30s_bio_3) as seen in the PCA loadings. On the other hand, D. oliveri’s major differentiation was between coastal and inland areas, driven by both precipitation of the wettest quarter (wc2.1_30s_bio_16) and mean diurnal range (wc2.1_30s_bio_2).
The eastern coastal areas in Vietnam showed particularly strong differences in environmental associations with adaptive variation and neutral variation for both D. cochinchinensis and D. oliveri (Figure 5).

Figure 3. (a) Adaptive genomic variation across the species range predicted by GF model for D. cochinchinensis, visualised using the first two principal axes from the PCA. (b) Accuracy and R2-weighted importance for environmental predictor variables which explained adaptive genomic variation (adaptive SNPs) by the GF model. (c) Principal component analysis (PCA) of the adaptive genomic variation predicted by the GF model across the species range. Loadings are the environmental factors.

Figure 4. (a) Adaptive genomic variation across the species range predicted by GF model for D. oliveri, visualised using the first two principal axes from the PCA. (b) Accuracy and R2-weighted importance for the environmental predictor variables which explained the adaptive genomic variation (adaptive SNPs) by the GF model. (c) Principal component analysis (PCA) of the adaptive genomic variation predicted by the GF model across the species range. The loadings are the environmental factors.

Figure 5. Procrustes residuals between neutral and adaptive gene-environmental associations for (a) D. cochinchinensis and (b) D. oliveri.

We compared the allelic frequency turnover functions of the neutral and adaptive genomic variation for each environmental predictor variable. Adaptive genomic variation was significantly more strongly associated with environmental gradients than neutral variation (Supplementary Figure 9). The only exception was available soil water capacity at a depth of 60 cm (s_AWCh1_sl5), whose importance was near zero and similar for neutral and adaptive variation, regardless of the environmental gradient.

When exposed to drought stress under controlled conditions, D. cochinchinensis was more anisohydric than D. oliveri, which means that D. cochinchinensis, as a pioneering species with faster growth, optimises carbon assimilation and better tolerates reduced water availability [3]. D. oliveri is often found in moist areas and along streams and rivers [40], and the morphological characteristics of its seeds suggest that secondary dispersal by water is likely [32]. This could explain how isothermality, which is a useful metric in tropical environments [41] and has been shown to influence plant height growth [42], had a dominant effect on the adaptive variation only in D. cochinchinensis.
Pioneering species maximise height growth in early successional habitats to meet their light requirements [43], consistent with the observation of higher photosynthetic pigment levels in D. cochinchinensis [3]. On the other hand, precipitation of the wettest quarter could exert selection on seed dispersal and survival in D. oliveri during the wet season. Temperature and precipitation, and their variability such as isothermality [44], have been widely reported as the most important drivers shaping patterns of productivity and adaptation in tree species across the world [45–47].

To fill the current gaps in existing conservation actions, populations that are underrepresented but display distinct adaptive variation should be prioritised to avoid the potential loss of unique genetic diversity. Populations at the edge of the species ranges should be prioritised based on our findings on adaptive variation showing their distinct allelic frequencies and adaptation; however, they are currently underrepresented in conservation efforts and existing protected area networks. Importantly, hotspots of differential adaptive variation near the edges of species ranges are shared between D. cochinchinensis and D. oliveri. This observation reinforces the role of marginal populations in preserving evolutionary potential for range expansion and persistence due to their adaptation to distinct environmental conditions [48].
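Isothermality features prominently in this section, so a short illustration of how the variable is conventionally defined may help. In the WorldClim bioclimatic variables, isothermality (BIO3) is the mean diurnal range (BIO2) divided by the temperature annual range (BIO7), multiplied by 100. The Python sketch below follows that definition from monthly temperature summaries. The function name and the example values are hypothetical and are not taken from the study.

```python
def isothermality(monthly_tmin, monthly_tmax):
    """BIO3 = 100 * BIO2 / BIO7 from 12 monthly min/max temperatures (deg C).

    BIO2: mean of the monthly (tmax - tmin) differences.
    BIO7: max temperature of the warmest month minus
          min temperature of the coldest month (BIO5 - BIO6).
    """
    if len(monthly_tmin) != 12 or len(monthly_tmax) != 12:
        raise ValueError("expected 12 monthly values for tmin and tmax")
    mean_diurnal_range = sum(
        hi - lo for lo, hi in zip(monthly_tmin, monthly_tmax)
    ) / 12
    annual_range = max(monthly_tmax) - min(monthly_tmin)
    if annual_range == 0:
        raise ValueError("temperature annual range is zero")
    return 100.0 * mean_diurnal_range / annual_range


# Hypothetical site with a 10-degree daily swing and a 20-degree annual range.
tmin = [10.0] * 6 + [20.0] * 6
tmax = [20.0] * 6 + [30.0] * 6
site_bio3 = isothermality(tmin, tmax)
```

A value near 100 indicates that day-to-night temperature swings are as large as the seasonal swing, which is typical of tropical sites. Lower values indicate stronger seasonality.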
Genomic vulnerability under different climate change scenarios

Genetic offset in the form of Euclidean distance represented the mismatch between current and future gene-environment association, which was modelled over five general circulation models (GCMs), namely MIROC6, BCC-CSM2-MR, IPSL-CM6A-LR, CNRM-ESM2-1, and MRI-ESM2-0, under WCRP CMIP6 (Supplementary Figure 10). For both Dalbergia species, genetic offset generally increased over time (P = 2.71e–10) and with shared socioeconomic pathway (P = 4.54e–14), which implies increased carbon emission (Figure 6a, Supplementary Table 10). However, D. cochinchinensis showed a significantly larger increase in genetic offset over time compared to D. oliveri (P = 0.025), suggesting that D. cochinchinensis is more susceptible to any mismatch of current genotypes and future climate. The geographic patterns of genetic offset also differed between the two species: D. cochinchinensis had an increasing offset across the entire range, while D. oliveri had a distinctly high offset in the southeast part of the range (Figure 6b–c). The variation in genomic offset between the two species was mainly driven by the strong association with isothermality (wc2.1_30s_bio_3) in D. cochinchinensis, as demonstrated in the GF model, as it contributed to ~75% of the genomic offset on average (Figure 6d). Isothermality had a smaller effect (~35%) in D. oliveri (Figure 6e).
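The offset calculation described above can be sketched in a few lines of Python. The sketch assumes that each site is represented by a vector of GF-transformed environmental predictors (one value per variable) for the current climate and for each future GCM projection. Averaging the per-GCM distances and splitting the squared distance into per-variable shares are illustrative assumptions and may differ from how Figure 6 was derived. All names and numbers are hypothetical.

```python
import math


def genomic_offset(current, future):
    """Euclidean distance between current and future transformed predictors."""
    return math.sqrt(sum((f - c) ** 2 for c, f in zip(current, future)))


def mean_offset_across_gcms(current, futures_by_gcm):
    """Average the offset over several GCM projections for one site."""
    offsets = [genomic_offset(current, fut) for fut in futures_by_gcm.values()]
    return sum(offsets) / len(offsets)


def contribution_by_variable(current, future, names):
    """Share of the squared offset attributable to each predictor (assumed)."""
    squared = [(f - c) ** 2 for c, f in zip(current, future)]
    total = sum(squared)
    return {n: (s / total if total else 0.0) for n, s in zip(names, squared)}


# Hypothetical transformed values for two predictors at one site.
names = ["wc2.1_30s_bio_3", "wc2.1_30s_bio_16"]
current = [0.0, 0.0]
futures = {"MIROC6": [3.0, 4.0], "MRI-ESM2-0": [6.0, 8.0]}

site_offset = mean_offset_across_gcms(current, futures)
shares = contribution_by_variable(current, futures["MIROC6"], names)
```

Repeating the calculation for every grid cell, SSP, and time period would give maps and summaries of the kind shown in Figure 6.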
Our prediction contrasts with a separate sensitivity-and-exposure modelling study, which predicted that D. oliveri is likely to be slightly more vulnerable to climate change by 2055 (2041–2070 period) than D. cochinchinensis [12]. That study used growth rate and seed weight as proxy traits and predicted that both species have equally high sensitivity to climate change, but that D. oliveri is more exposed to the threat. Our findings suggest instead that isothermality, as the dominant environmental factor, weighs more heavily on the species’ vulnerability. As discussed, isothermality is likely to affect productivity and growth more in pioneering species like D. cochinchinensis than in later successional species like D. oliveri. Our work supports the use of isothermality and other temperature variation factors as more reliable indicators of the climate response of D. cochinchinensis, and encourages further studies of this response, such as greenhouse or common garden experiments, to validate the prediction with empirical data.

The different geographical patterns of genomic vulnerability support species-specific recommendations in conservation and restoration. While climate change is likely to affect D. cochinchinensis evenly across its range, greater attention is needed on the representation of adaptive variation in germplasm collection and conservation units; sampling should target edge populations in particular, as they show potential signals of local adaptation where the differences in environmental association between adaptive and neutral variation are greatest. By contrast, we recommend targeting hotspots of vulnerability in D. oliveri, especially around the borders between Cambodia, Laos, Vietnam, and Thailand, to improve conservation efforts.

In a rapidly changing environment, forest trees either persist through migration or phenotypic plasticity, or are extirpated [45] when environmental change outpaces adaptation potential. The spatially explicit model of genomic vulnerability helps to develop conservation decisions that balance in situ adaptation and assisted migration, as populations with lower vulnerability are likely to persist through adaptation [49].

Figure 6. (a) Absolute genomic offset of gene-environment association, quantified as the Euclidean distance, of D. cochinchinensis and D. oliveri in four SSPs (126, 245, 370, and 585) over three 20-year periods (2041–2060, 2061–2080, 2081–2100), averaged across five GCMs (BCC-CSM2-MR, CNRM-ESM2-1, IPSL-CM6A-LR, MIROC6, MRI-ESM2-0). Scaled genomic offset across the range of (b) D. cochinchinensis and (c) D. oliveri, using SSP585 between 2041 and 2060 as an example. Proportion of genomic variation explained by environmental variables in (d) D. cochinchinensis and (e) D. oliveri.

Genomic model-enabled assisted migration and restoration

We developed seedeR, an open-source web application that is freely available from https://trainingidn.shinyapps.io/seedeR/, where users can input the species (D. cochinchinensis or D. oliveri), shared socioeconomic pathway (SSP), time period, and geographical coordinates of the target restoration or planting site. With these inputs, seedeR predicts the genomic similarity between a current germplasm source and the target site from allelic frequency turnover functions and genetic offset, and projects it onto the species range. We demonstrate the utility of seedeR for a hypothetical target restoration site (14° N, 106° E) in northeast Cambodia for both D. cochinchinensis and D. oliveri, under the future climate scenario of SSP370 between 2081 and 2100 (Figure 7). In both predictions, genomic similarity was highest within several hundred kilometres of the site and decreased further away. In both species, coastal regions in northeast Vietnam, which were predicted to have the strongest local adaptation, showed lower genomic similarity. The geographical scale of suitable seed sources has an important implication: too many forest landscape projects collect seeds from very close (a few kilometres) to restoration sites, following the “local is best” paradigm [50], whereas our predictions showed otherwise. It is also important to note that local tree populations in landscapes in need of restoration are often degraded and have low genetic diversity.
Genetic quality of seed should be ensured by collecting seed from large populations and many unrelated trees, even if this means collecting from trees much further from the target restoration site.

Figure 7. Genomic similarity (scaled between 0, most dissimilar, and 1, most similar) between a hypothetical future restoration site (14° N, 106° E) and the current potential germplasm sources under the future climate scenario of SSP370 between 2081 and 2100 for (a) D. cochinchinensis and (b) D. oliveri, predicted with seedeR (https://trainingidn.shinyapps.io/seedeR/).

Matching seed sources and restoration sites remains one of the keys to effective conservation and restoration [51], in line with the importance of adaptive variation and potential in genetic materials. Our genome-enabled prediction tool considers the future climate of restoration sites, which in turn will greatly influence the future resilience and productivity of these species. In the case of maladaptation and extirpation due to environmental change [52], when the classical preference for local provenance may no longer hold, deliberate transfer of germplasm along climate gradients may be necessary [53].
Especially in the case of Dalbergia, where many local populations have been extirpated or are very small and strong environmental association was predicted, assisted migration based on admixture and predictive provenancing is deemed more appropriate to facilitate adaptation of the populations under climate change [54]. Genetic materials from regions with strong adaptive genomic variation, such as coastal Vietnam, can be moved to suitable regions using the seedeR prediction to facilitate gene flow and maintain unique genetic components of the population by admixture [53]. Populations in vulnerability hotspots, such as those in northern Cambodia, are candidates for transfer to new suitable areas to prevent loss of genetic diversity.

The seedeR application helps to visualise these spatially explicit predictive models of genomic vulnerability and match, which are most useful to frontline practitioners and managers [55]. Not only can it inform conservation and management strategies, but by simplifying the analytical pipelines through a user-friendly platform, it will also directly reduce the gap between conservation and genomics, a recognised challenge for the dissemination of genomic knowledge [56].

Narrowing the gap between conservation and genomics

Our study characterises range-wide gene-environment association in two sympatric endangered species, D. cochinchinensis and D. oliveri, for which there was virtually no prior knowledge on adaptability.
Building on previous understanding of their different physiologies, we demonstrate their differential adaptive characteristics, which point to species-specific implications for their conservation. These findings on differential genomic adaptation between sympatric species shed new light on tropical forests, which harbour many threatened species at risk from threats associated with climate change.

We show how genomic technologies can directly support rapid decision-making and conservation activities. The separation between scientific and conservation communities represents a long-standing challenge: advances in scientific research, and specifically in genomic technologies, are often inaccessible to conservation practitioners, which hinders translational science [57,58]. Through engagement with diverse stakeholders and conservation activities, we were strongly motivated to deliver the results of this study in a user-friendly (e.g. seedeR) and spatially explicit manner that can be integrated with ongoing conservation work.

Methods

Plant materials and sample preparation for genome assemblies

Dried seeds of Dalbergia cochinchinensis and D. oliveri were collected in 2018 from Bolikhamxay, Khamkend, Laos, and Phnom Penh, Cambodia, by their respective forestry authorities. We germinated the seeds in a greenhouse at 30°C with a 16L/8D photoperiod.
Leaf tissues were harvested from a selected 1-year-old individual for each species and ground in liquid nitrogen with a mortar and pestle.

High-molecular-weight genomic DNA was extracted from the reference individual with Carlson lysis buffer (100 mM Tris-HCl, pH 9.5, 2% CTAB, 1.4 M NaCl, 1% PEG 8000, 20 mM EDTA) followed by purification using the QIAGEN Genomic-tip 500/G. The quantity and quality of genomic DNA were determined with NanoDrop 2000 (Thermo, Wilmington, United States) and Qubit 4 (Thermo Fisher Scientific, United Kingdom). DNA integrity was preliminarily assessed on a 0.4% agarose gel against a NEB Quick-Load® 1 kb Extend DNA Ladder. A DNA sample passed the quality check only when a single band was observed near the lambda DNA band (~48.5 kb).

Genomic sequencing and assembly of D. cochinchinensis

For Oxford Nanopore sequencing, 9 µg of extracted DNA was size-selected using the Circulomics Short Read Eliminator XL Kit (Maryland, United States) to deplete fragments < 40 Kbp. Three libraries were prepared, each starting from 3 µg of size-selected DNA, with the Oxford Nanopore Technologies Ligation Sequencing Kit (SQK-LSK110). The libraries were sequenced on two R10.3 (FLO-109D) flow cells on a GridION sequencer for ~72 hours. Real-time basecalling was performed in MinKNOW release 19.10.1. Raw reads with a Phred score lower than 8 were filtered out.
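The read-quality filter mentioned above can be sketched as follows. This is a minimal illustration in Python, not the MinKNOW implementation: it assumes that the score is a per-read mean Phred quality derived from the average per-base error probability, and that FASTQ quality strings use the standard offset of 33. The names `mean_phred` and `filter_reads` are hypothetical.

```python
import math
from typing import Iterable, Iterator, Tuple

Read = Tuple[str, str, str]  # (name, sequence, quality string)


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean read quality, averaging per-base error probabilities (assumed definition)."""
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality]
    return -10 * math.log10(sum(probs) / len(probs))


def filter_reads(reads: Iterable[Read], min_q: float = 8.0) -> Iterator[Read]:
    """Yield only reads whose mean Phred score is at least min_q."""
    for name, seq, qual in reads:
        if qual and mean_phred(qual) >= min_q:
            yield name, seq, qual


# Toy reads: '+' encodes Q10 and '%' encodes Q4 under the offset-33 convention.
toy_reads = [("read_a", "ACGT", "++++"), ("read_b", "ACGT", "%%%%")]
passed = [name for name, _, _ in filter_reads(toy_reads)]
```

Averaging error probabilities rather than Phred values directly penalises reads with a few very poor bases; a plain arithmetic mean of Phred values would be a different, more lenient criterion.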
For PacBio sequencing, DNA samples were sent to the Genomics & Cell Characterization Core Facility at the University of Oregon for DNA library preparation and sequencing. Throughout the sample preparation, the quality of DNA was assessed using Fragment Analyzer 1.2.0.11 (Agilent, United States). A total of 20 µg of unsheared genomic DNA was used for library preparation with the SMRTbell Express Template Prep Kit 2.0 (Pacific Biosciences, United States). The library was size-selected using the BluePippin system (Sage Science, United States) at 45 kb and then sequenced on a single SMRT 8M cell on a Sequel II System (2.0 chemistry) using the Continuous Long-Read Sequencing (CLR) mode with a movie time of 30 hours.

For Hi-C sequencing, we harvested 0.5 g of fresh leaf from the same reference individual and immediately cross-linked the finely chopped tissue in 1% formaldehyde for 20 minutes. The cross-linking was then quenched with glycine (125 mM). The cross-linked samples were ground in liquid nitrogen with a mortar and pestle and shipped to Phase Genomics (Seattle, USA) for library preparation and sequencing. The Hi-C library was prepared with the restriction enzyme DpnII, proximity-ligated, and reverse-crosslinked using the Proximo Hi-C Kit (Plant) v2.0 (Phase Genomics, Seattle, USA). The library was sequenced on a HiSeq4000 to obtain ~300 M 150-bp paired-end reads.

Genomic sequencing of D. oliveri

For Nanopore sequencing, the same protocol and procedure were used as for D. cochinchinensis (see above).

For Pore-C sequencing, the library was prepared with the protocol and reagents described by Belaghzal et al. [59] with minor modifications. We harvested 2 g of fresh leaf from the same reference individual as for the Nanopore library and immediately cross-linked the finely chopped tissues in 1% formaldehyde for 20 minutes.
The cross-linking was quenched with 125 mM glycine for 20 minutes and then the samples were ground in liquid nitrogen with a mortar and pestle. Cell nuclei were isolated with a buffer containing 10 mM Trizma, 80 mM KCl, 10 mM EDTA, 1 mM spermidine trihydrochloride, 1 mM spermine tetrahydrochloride, 500 mM sucrose, 1% (w/v) PVP-40, 0.5% (v/v) Triton X-100, and 0.25% (v/v) β-mercaptoethanol, and then passed through a 40 µm cell strainer. The suspension was centrifuged at 3,000 g, according to the estimated genome size of ~700 Mbp. Chromatin was digested with the restriction enzyme NlaIII (New England Biolabs, United Kingdom) at a final concentration of 1 U/µL at 37°C for 18 hours. The enzyme was heat-denatured at 65°C for 20 minutes at 300 rpm rotation in a thermomixer. Proximity ligation, protein degradation, decrosslinking, and DNA extraction were performed according to the original Belaghzal protocol. The Pore-C library was prepared with the Oxford Nanopore Technologies Ligation Sequencing Kit (SQK-LSK110), then sequenced on two R10.3 (FLO-109D) Nanopore flow cells on a GridION sequencer for ~72 hours. The flow cells were washed once every 24 hours with the Flow Cell Wash Kit (EXP-WSH003).

Assembly pipelines

Raw reads shorter than 500 bp were filtered out. Due to the heterozygous nature of the wild individual, we assembled the sequences with Canu 2.1.1 using the options “corOutCoverage=200 correctedErrorRate=0.16 batOptions=-dg 3 -db 3 -dr 1 -ca 500 -cp 50”.
We then used purge_haplotigs v1.1.1 to collapse the assembly by separating the primary assembly and haplotigs.

Hi-C reads (for D. cochinchinensis) were mapped to the draft genome assembly using hicstuff 2.3.2 [60] to generate the contact matrix, which was then used to scaffold and polish the assembly using instaGRAAL 0.1.2 [61] with default options to produce the final assembly Dacoc 1.4 after removing contamination.

Pore-C reads (for D. oliveri) were mapped to the draft genome assembly and used to generate a contact map with Pore-C-Snakemake (https://github.com/nanoporetech/Pore-C-Snakemake) and to produce a merged_nodups (.mnd) file, which contains a duplicate-free list of paired alignments from the Pore-C reads to the draft assembly. The draft assembly and the merged_nodups file were used for scaffolding in 3D-DNA (version 180419) to produce the final genome Daoli 0.3.

To validate the scaffold arrangement, Daoli 0.3 was aligned to the D. cochinchinensis assembly (Dacoc 1.4) using minimap2 and D-GENIES [62] to produce a dot plot for visualising similarity, repetitions, breaks, and inversions, with a minimum identity of 0.25.

De novo repeat library

A de novo repeat library was constructed using RepeatModeler 2.0.1 [63], which incorporated RECON 1.08 [64], RepeatScout 1.0.6 [65], and TRF 4.0.9 [66] for identification and classification of repeat families. We then used RepeatMasker 4.1.1 [67] to mask low-complexity or simple repeats only (“-noint”).
A de novo library of long terminal repeat (LTR) retrotransposons was constructed on the simple-repeat-masked genome using LTRharvest [68] and annotated with the GyDB database and profile HMMs using the LTRdigest [69] module in the genometools 1.6.1 pipeline. Predicted LTR elements with no protein domain hits were removed from the library. We applied the RepeatClassifier module in RepeatModeler to format both repeat libraries. We merged the libraries and clustered sequences that were ≥ 80% identical with CD-HIT-EST 4.8.1 [70] (“-aS 80 -c 0.8 -g 1 -G 0 -A 80”) to produce the final repeat library.

Gene models and annotation

Filtered mRNA-sequencing data for D. cochinchinensis (50.5 Gbp) and D. oliveri (54.4 Gbp) from a previous project [27] (NCBI BioProject: PRJNA593817) were aligned against the genome assembly using STAR v2.7.6 and assembled using the genome-guided mode of Trinity v2.13.2. Protein sequences were obtained from Arabidopsis thaliana (Araport11) [71] and Arachis ipaensis (Araip1.1) [72]. After soft-masking the genome with the de novo repeat library using RepeatMasker (Dfam libraries 3.2), the transcript and protein evidence was used to produce gene models using MAKER 3.01.03 [73]. The MAKER pipeline was iteratively run for two more rounds to produce the final gene models. Between MAKER runs, the gene models were used to train the ab initio gene predictors SNAP (version 2006-07-28) [74] and AUGUSTUS 3.3.3 [75], which were then used in the MAKER pipeline.
tRNA genes were predicted with tRNAscan-SE 1.3.1 [76]. The quality of the gene models was assessed with two metrics: the annotation edit distance (AED) in MAKER 3.01.03 [73] and the BUSCO score (v5.1.2) [77].

Population sampling

We obtained a collection of 435 and 331 foliage samples of Dalbergia cochinchinensis and D. oliveri from 35 and 28 localities, respectively, across their native ranges (Supplementary Table 11). These samples were a combination of those collected in a previous study [32] and those newly collected between 2019 and 2020. Genomic DNA was purified using a two-round modified CTAB protocol (2% CTAB, 1.4 M NaCl, 1% PVP-40, 100 mM Tris-Cl pH 8.0, 20 mM EDTA pH 8.0, 1% 2-mercaptoethanol) with a sorbitol pre-wash (0.35 M Sorbitol, 1% PVP-40, 100 mM Tris-Cl pH 8.0, and 5 mM EDTA pH 8.0), as the samples were rich in polyphenols and polysaccharides [78]. Genomic DNA was treated with 5 μL RNase (10 mg/mL). Quality and quantity of the genomic DNA were assessed using NanoDrop One (Thermo, Wilmington, United States) and the Qubit dsDNA BR Assay kit on Qubit 4 (Thermo, Wilmington, United States), respectively.

Genotyping-by-sequencing (GbS)

DNA samples were normalised to 200 ng suspended in 10 μL water and sent to the Genomic Analysis Platform, Institute of Integrative and Systems Biology, Université Laval (Quebec, Canada) for GbS library preparation. DNA was digested with a combination of the restriction enzymes PstI/NsiI/MspI, ligated with barcoded adapters, and pooled to equimolarity.
The pooled library was amplified by PCR and sequenced on an Illumina NovaSeq6000 S4 with paired-end reads of 150 bp at the Génome Québec Innovation Centre (Montreal, Canada).

Variant calling

DNA sequence variant calling was done with the Fast-GBS v2.0 pipeline [79]. Illumina raw reads were demultiplexed with Sabre 1.0 [80] and trimmed with Cutadapt 1.18 [81] to remove the adaptors. Trimmed reads shorter than 50 bp were discarded. Reads were aligned against the Dacoc 1.0 genome (Hung et al., unpublished) and the Daoli 0.1 genome using BWA-MEM 0.7.17 [82]. The SAM alignment files were converted to BAM format and indexed using SAMtools 1.9 [83]. Variant calling was performed in Platypus [84], and variants were filtered with a maximum proportion of missing data of 0.2 and a minimum minor allele frequency (MAF) of 0.01 using VCFtools 0.1.16 [85]. Missing genotypes were imputed using Beagle 5.2. Finally, linkage disequilibrium among SNPs was detected using BCFtools 1.9 [83], and one SNP was removed from each SNP pair with r² > 0.5 in a genomic window of 5 Kbp.

Environmental heterogeneity characterisation

Environmental data were obtained from different sources (34 variables in total, Supplementary Table 12) and represented different measures of temperature, precipitation, their seasonality, soil, elevation, and vegetation. We calculated a correlation matrix across the sampling localities, and highly inter-correlated variables (|pairwise correlation coefficient| > 0.7) were detected.
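The screening of inter-correlated environmental variables can be sketched as follows. The sketch assumes Pearson correlation and a greedy procedure in which, for the most strongly correlated remaining pair above the 0.7 threshold, the variable with the larger mean absolute correlation across all variables is dropped. The function name `drop_correlated` and the toy data are hypothetical and are not taken from the study.

```python
import numpy as np


def drop_correlated(X, names, cutoff=0.7):
    """Return the names of variables kept after removing inter-correlated ones.

    X has shape (n_localities, n_variables); names lists the variable names.
    """
    corr = np.abs(np.corrcoef(np.asarray(X, dtype=float), rowvar=False))
    np.fill_diagonal(corr, 0.0)   # ignore self-correlation
    mean_abs = corr.mean(axis=0)  # average |r| per variable, diagonal counted as 0
    keep = list(range(len(names)))
    while len(keep) > 1:
        sub = corr[np.ix_(keep, keep)]
        i, j = np.unravel_index(np.argmax(sub), sub.shape)
        if sub[i, j] <= cutoff:
            break  # no remaining pair exceeds the threshold
        # Drop the member of the pair that is more correlated with all variables.
        drop = keep[i] if mean_abs[keep[i]] >= mean_abs[keep[j]] else keep[j]
        keep.remove(drop)
    return [names[k] for k in keep]


# Toy data: b is nearly a copy of a, c is independent of both.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = a + 0.01 * rng.normal(size=200)
c = rng.normal(size=200)
kept = drop_correlated(np.column_stack([a, b, c]), ["a", "b", "c"])
```

In this toy case one of the two near-duplicate variables is removed and the independent variable is retained. The mean absolute correlation is computed once on the full matrix, which is one reading of "across all variables"; recomputing it after each removal would be an alternative design.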
For each inter-correlated variable pair, the variable with the largest mean absolute correlation across all variables was removed.

Population genetic structure and identification of putatively adaptive loci

Population genetic structure was assessed with sparse non-negative matrix factorisation (sNMF) to estimate the number of discrete genetic clusters (K) [86]. The sNMF was run for 10 repetitions for each value of K from 1 to 15 with a maximum of 200 iterations. The optimal K was selected based on the lowest cross-entropy value from the sNMF run, or where the value began to plateau. Admixture plots were drawn for K = {2, 4, 8, optimal K}. Population structure-based outlier analysis was also conducted with sNMF: outlier SNPs that were significantly differentiated among populations, based on FST values estimated from the sNMF ancestry coefficients [87], were identified and mapped on the 10 putative chromosomes for D. cochinchinensis or the 16 longest scaffolds for D. oliveri in a Manhattan plot.

We used latent factor mixed modelling (LFMM) to test for significant associations between environmental variables and SNP allele frequencies. The optimal K obtained from the sNMF was used in LFMM to correct for the neutral genetic structure. LFMM was run for 3 repetitions with a maximum of 1,000 iterations and 500 burn-ins. Z-scores were obtained for all repetitions for each environmental variable, and then the median was taken for each SNP. Next, the genomic inflation factor λ, defined as the observed median of squared Z-scores divided by the expected median of the chi-squared distribution for each environmental association [88], was calculated to calibrate the P-values:

$$\lambda = \frac{\operatorname{median}(z^{2})}{\operatorname{median}(\chi^{2}_{1})}, \quad \text{such that} \quad P = 1 - F_{\chi^{2}_{1}}\!\left(\frac{z^{2}}{\lambda}\right),$$

where $z$ is the median Z-score of a SNP and $F_{\chi^{2}_{1}}$ is the cumulative distribution function of the chi-squared distribution with one degree of freedom.

The calibration was then inspected on a histogram of P-values for each environmental association. Finally, multiple testing was corrected with the Benjamini and Hochberg method to obtain Q-values.

The sNMF and LFMM calculations were performed in R 4.1.0 using the package LEA 3.4.0 [89].

Gradient forest modelling

For all predictions in gradient forest models, resampling was necessary because not all environmental raster layers had the same resolution and extent. They were all cropped to the latest-updated modelled and expert-validated species distribution [12] and reprojected to the WorldClim bioclimatic rasters, which have the highest resolution, using bilinear interpolation or the nearest neighbour method for continuous and categorical variables, respectively.

To correct for the genetic structure, spatial variables were generated using the principal coordinates of neighbour matrices (PCNM) approach [90]. Only half of the positive PCNM values were kept. A gradient forest model was used to predict and rank the importance of environmental variables in genomic variation, as its machine learning algorithm worked best with minimal prior and confounding variables. Putatively neutral SNPs and putatively adaptive SNPs were used as the response variables, and all the filtered environmental variables and PCNM variables were used as the predictor variables in the gradient forest model with 500 regression trees. The maximum number of splits to evaluate was determined as follows:
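The expression for this limit is not given in the text as it stands. As a hedged illustration only, the sketch below uses a convention that appears in gradientForest usage examples, where the maximum tree depth is set from the number of sampling sites n as log2(0.368 n / 2); whether the authors used exactly this rule is an assumption. The function name `max_level` is hypothetical, and the site counts are the 35 and 28 localities reported under population sampling.

```python
import math


def max_level(n_sites: int) -> float:
    """Maximum tree depth under the assumed rule log2(0.368 * n_sites / 2)."""
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    # 0.368 is approximately 1/e, the expected out-of-bag fraction of a bootstrap sample.
    return math.log2(0.368 * n_sites / 2)


depth_dc = max_level(35)  # 35 localities for D. cochinchinensis
depth_do = max_level(28)  # 28 localities for D. oliveri
```

Some usage examples round this value down to an integer before passing it to the model; that step is left out here because the text does not say whether it was applied.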
The turnovers of allelic frequencies were then projected spatially across the latest-updated predicted species distribution ranges [12] using the fitted gradient forest model and the environmental values across the range. Principal component analysis (PCA) was used to summarise the genomic variation across the distribution, and the first three principal components (PC1, PC2, and PC3) were used to visualise genomic variation across the range.

The PCAs of turnovers of allelic frequencies for adaptive SNPs and neutral SNPs were compared using Procrustes rotation, and the residuals were used to map where adaptive genomic variation deviates from neutral variation.

Prediction of genomic vulnerability

Future climate projections were obtained from five general circulation models (GCMs) (MIROC6, BCC-CSM2-MR, IPSL-CM6A-LR, CNRM-ESM2-1, MRI-ESM2-0) participating in the World Climate Research Programme Coupled Model Intercomparison Project 6 (WCRP CMIP6) for four shared socioeconomic pathways (SSPs) (126, 245, 370, and 585) over four 20-year periods (2021–2040, 2041–2060, 2061–2080, 2081–2100). The gradient forest model was used to predict patterns of genetic variation and local adaptation under future environmental scenarios.
The allelic frequency turnover function was fitted on the future landscape, and the genomic offset, defined as the genomic change required in a set of putatively adaptive loci to adapt to a future environment [91], was calculated on a grid-by-grid basis using the following equation for Euclidean distance, where p is the number of environmental (predictor) variables:

$$\text{genomic offset} = \sqrt{\sum_{i=1}^{p} \left(F_{i}^{\text{future}} - F_{i}^{\text{current}}\right)^{2}}$$

Here $F_{i}^{\text{current}}$ and $F_{i}^{\text{future}}$ denote the turnover-function value of environmental variable $i$ in a grid cell under current and future conditions.

The genetic offset was then scaled across all SSPs and time periods.

Prediction of genomic similarity between current germplasm source and future restoration site

It is of practical interest to a range of forestry stakeholders to predict whether a current germplasm source is a good match for future restoration sites, or where to source suitable germplasm for a proposed restoration site. We developed an interactive web application based on R Shiny and hosted it on the shinyapps.io server. seedeR v1.0 is open source and freely available from https://trainingidn.shinyapps.io/seeder/. The analysis workflow consists of the selection of the species of interest, time period and future climate scenario, and the restoration site’s geographical coordinates (Supplementary Figure 11).

The application maps the predicted turnover of allelic frequencies at a hypothetical future restoration site onto the current landscape on a grid-by-grid basis, with the genetic offset calculated as described above.
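The offset calculation, and the scaling and reversal that turn it into a similarity score, can be sketched as follows. This is a minimal illustration under stated assumptions, not the seedeR source code: the turnover values are taken as already predicted by a fitted gradient forest model, the scaling is assumed to be min-max scaling, and the function names and toy numbers are hypothetical.

```python
import numpy as np


def genomic_offset(current, future):
    """Euclidean distance between current and future turnover values per grid cell.

    Both inputs have shape (n_cells, p), one column per environmental predictor.
    """
    current = np.asarray(current, dtype=float)
    future = np.asarray(future, dtype=float)
    return np.sqrt(((future - current) ** 2).sum(axis=1))


def scale01(values):
    """Min-max scaling to the range 0 to 1 (assumed scaling method)."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.zeros_like(values)
    return (values - values.min()) / span


def genomic_similarity(offset):
    """Reverse the scaled offset so that 1 is most similar and 0 most dissimilar."""
    return 1.0 - scale01(offset)


# Toy example: three grid cells, two predictors.
current = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
future = [[3.0, 4.0], [1.0, 1.0], [2.0, 3.0]]
offset = genomic_offset(current, future)
similarity = genomic_similarity(offset)
```

For seed-source matching, `future` would hold the turnover values of the restoration site under the chosen scenario repeated for every row, and `current` the values of each candidate source cell, so that each cell receives one similarity score.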
After scaling, the values are reversed on a 0–1 scale to represent the genomic similarity between the current germplasm source and future restoration site.

Data availability

The research materials supporting this publication, including genomic assemblies, raw reads, and annotations, can be publicly accessed either in the Supplementary Information or in NCBI GenBank under the BioProjects PRJNA841235 and PRJNA841689.

References

1. UNODC. World Wildlife Crime Report: Trafficking in Protected Species. (2020).
2. UNODC. World Wildlife Crime Report: Trafficking in Protected Species. (United Nations Publication, 2016).
3. United Nations Environment Programme. The rise of environmental crime: A growing threat to natural resources, peace, development and security. (2016).
4. Gaisberger, H. et al. Tropical and subtropical Asia’s valued tree species under threat. Conservation Biology 36, e13873 (2022).
5. Winfield, K., Scott, M. & Grayson, C. Global status of Dalbergia and Pterocarpus rosewood producing species in trade. in Convention on International Trade in Endangered Species 17th Conference of Parties - Johannesburg (2016).
6. Asian Regional Workshop (Conservation & Sustainable Management of Trees Viet Nam). Dalbergia cochinchinensis. The IUCN Red List of Threatened Species. e.T32625A9719096 (1998) doi:10.2305/IUCN.UK.1998.RLTS.T32625A9719096.en.
7. Nghia, N. H. Dalbergia oliveri. The IUCN Red List of Threatened Species 1998.
e.T32306A9693932 (1998) doi:10.2305/IUCN.UK.1998.RLTS.T32306A9693932.en.
8. CITES. Consideration of proposals for amendment of appendices I and II. Convention on International Trade in Endangered Species of Wild Fauna and Flora. (Convention on International Trade in Endangered Species of Wild Fauna and Flora, 2017).
9. Barstow, M. et al. Dalbergia cochinchinensis. The IUCN Red List of Threatened Species 2022 (2022).
10. Barstow, M. et al. Dalbergia oliveri. The IUCN Red List of Threatened Species 2022 (2022).
11. Gaisberger, H. et al. Range-wide priority setting for the conservation and restoration of Asian rosewood species accounting for multiple threats and ecogeographic diversity. Biol Conserv 270, 109560 (2022).
12. Myers, N., Mittermeier, R. A., Mittermeier, C. G., da Fonseca, G. A. B. & Kent, J. Biodiversity hotspots for conservation priorities. Nature 403, 853–858 (2000).
13. Woodruff, D. S. Biogeography and conservation in Southeast Asia: how 2.7 million years of repeated environmental fluctuations affect today’s patterns and the future of the remaining refugial-phase biodiversity. Biodivers Conserv 19, 919–941 (2010).
14. Wurster, C. M. et al. Forest contraction in north equatorial Southeast Asia during the Last Glacial Period. Proc Natl Acad Sci U S A 107, 15508–15511 (2010).
15. Jansen, M. et al. Food for thought: The underutilized potential of tropical tree-sourced foods for 21st century sustainable food systems. People and Nature 2, 1006–1020 (2020).
16. Oldekop, J. A. et al. Forest-linked livelihoods in a globalized world. Nature Plants 6, 1400–1407 (2020).
17. Centre for Forest, Landscape and Planning, D., Cambodia Tree Seed Project & Forestry Administration, C. Conservation of valuable and endangered tree species in Cambodia 2001-2006 - a case study. Forest & Landscape: Development and Environment Series 3, (2006).
18. Version, D.
Conservation of valuable and endangered tree species in Cambodia 2001 - 2006 Moestrup, Søren; Sloth, Arvid; Burgess, Sarah. (2017).
19. Maningo, E. v. & Thea, S. Regional project for promotion of forest rehabilitation in Cambodia and Vietnam through demonstration models and improvement of seed supply system: lesson learned.
20. APFORGEN. Conserving Rosewood genetic resources for resilient livelihoods in the Mekong - Project Inception Workshop Report. (2018).
21. Frankham, R. et al. Loss of genetic diversity reduces ability to adapt. in Genetic Management of Fragmented Animal and Plant Populations (eds. Frankham, R. et al.) (Oxford University Press, 2017). doi:10.1093/OSO/9780198783398.003.0004.
22. Savolainen, O., Lascoux, M. & Merilä, J. Ecological genomics of local adaptation. Nat Rev Genet 14, 807–820 (2013).
23. Verkerk, P. J. et al. Climate-Smart Forestry: the missing link. For Policy Econ 115, 102164 (2020).
24. Petit-Cailleux, C. et al. Tree Mortality Risks Under Climate Change in Europe: Assessment of Silviculture Practices and Genetic Conservation Networks. Front Ecol Evol 9, 582 (2021).
25. Lindner, M. et al. Climate change and European forests: What do we know, what are the uncertainties, and what are the implications for forest management? J Environ Manage 146, 69–83 (2014).
26. Hung, T. H. et al. Reference transcriptomes and comparative analyses of six species in the threatened rosewood genus Dalbergia. Sci Rep 10, 17749 (2020).
27. Supple, M. A.
& Shapiro, B. Conservation of biodiversity in the genomics era. Genome Biol 19, (2018).
28. Allendorf, F. W., Hohenlohe, P. A. & Luikart, G. Genomics and the future of conservation genetics. Nat Rev Genet 11, 697–709 (2010).
29. Desalle, R. & Amato, G. Conservation Genetics, Precision Conservation, and De-extinction. Hastings Center Report 47, S18–S23 (2017).
30. Luo, X., Chen, S. & Zhang, Y. PlantRep: a database of plant repetitive elements. Plant Cell Rep 41, 1163–1166 (2022).
31. Hartvig, I. et al. Population genetic structure of the endemic rosewoods Dalbergia cochinchinensis and D. oliveri at a regional scale reflects the Indochinese landscape and life-history traits. Ecol Evol 8, 530–545 (2018).
32. Hartvig, I. et al. Conservation genetics of the critically endangered Siamese rosewood (Dalbergia cochinchinensis): recommendations for management and sustainable use. Conservation Genetics 1–16 (2020) doi:10.1007/s10592-020-01279-1.
33. Roy, D. & Sadanandom, A. SUMO mediated regulation of transcription factors as a mechanism for transducing environmental cues into cellular signaling in plants. Cellular and Molecular Life Sciences 78, 2641–2664 (2021).
34. Lee, J. et al. Salicylic acid-mediated innate immunity in Arabidopsis is regulated by SIZ1 SUMO E3 ligase. The Plant Journal 49, 79–90 (2007).
35. Catala, R. et al. The Arabidopsis E3 SUMO ligase SIZ1 regulates plant growth and drought responses. Plant Cell 19, 2952–2966 (2007).
36. Yoo, C. Y. et al. SIZ1 Small Ubiquitin-Like Modifier E3 Ligase Facilitates Basal Thermotolerance in Arabidopsis Independent of Salicylic Acid. Plant Physiol 142, 1548–1558 (2006).
37. Jin, J. B. et al. The SUMO E3 ligase, AtSIZ1, regulates flowering by controlling a salicylic acid-mediated floral promotion pathway and through affects on FLC chromatin structure. Plant J 53, 530–540 (2008).
38. Bay, R. A. et al.
Predicting Responses to Contemporary Environmental Change Using Evolutionary Response Architectures. Am Nat 189, 463–473 (2017).
39. Hung, T. H. et al. Physiological responses of rosewoods Dalbergia cochinchinensis and D. oliveri under drought and heat stresses. Ecol Evol (2020) doi:10.1002/ece3.6744.
40. Aerts, R. et al. Site requirements of the endangered rosewood Dalbergia oliveri in a tropical deciduous forest in northern Thailand. For Ecol Manage 259, 117–123 (2009).
41. Nix, H. A. A biogeographic analysis of Australian elapid snakes. in Atlas of elapid snakes of Australia: Canberra, Australian Flora and Fauna Series 7 (ed. Longmore, R.) 4–15 (Australian Government Publishing Service, 1986).
42. Moles, A. T. et al. Global patterns in plant height. Journal of Ecology 97, 923–932 (2009).
43. Woodcock, D. & Shier, A. Wood specific gravity and its radial variations: the many ways to make a tree. Trees 16, 437–443 (2002).
44. Garnier-Géré, P. H. & Ades, P. K. Environmental Surrogates for Predicting and Conserving Adaptive Genetic Variability in Tree Species. Conservation Biology 15, 1632–1644 (2001).
45. Aitken, S. N., Yeaman, S., Holliday, J. A., Wang, T. & Curtis-McLane, S. Adaptation, migration or extirpation: climate change outcomes for tree populations. Evol Appl 1, 95–111 (2008).
46. Supple, M. A. et al. Landscape genomic prediction for restoration of a Eucalyptus foundation species under climate change. Elife 7, e31835 (2018).
47. Manel, S. et al. Broad-scale adaptive genetic variation in alpine plants is driven by temperature and precipitation. Mol Ecol 21, 3729–3738 (2012).
48. Ledoux, J. B. et al. Potential for adaptive evolution at species range margins: contrasting interactions between red coral populations and their environment in a changing ocean. Ecol Evol 5, 1178 (2015).
49. Gougherty, A. V., Keller, S. R. & Fitzpatrick, M. C. Maladaptation, migration and extirpation fuel climate change risk in a forest tree species. Nature Climate Change 11, 166–171 (2021).
50. Jalonen, R., Valette, M., Boshier, D., Duminil, J. & Thomas, E. Forest and landscape restoration severely constrained by a lack of attention to the quantity and quality of tree seed: Insights from a global survey. Conserv Lett 11, e12424 (2018).
51. Fremout, T. et al. Diversity for Restoration (D4R): Guiding the selection of tree species and seed sources for climate-resilient restoration of tropical forest landscapes. Journal of Applied Ecology 59, 664–679 (2022).
52. Aitken, S. N. & Whitlock, M. C. Assisted Gene Flow to Facilitate Local Adaptation to Climate Change. Annu Rev Ecol Evol Syst 44, 367–388 (2013).
53. Bozzano, M. et al. Genetic Considerations in Ecosystem Restoration Using Native Tree Species. (FAO and Bioversity International, 2014).
54. Breed, M. F., Stead, M. G., Ottewell, K. M., Gardner, M. G. & Lowe, A. J. Which provenance and where? Seed sourcing strategies for revegetation in a changing environment. Conservation Genetics 14, 1–10 (2013).
55. Martins, K. et al. Landscape genomics provides evidence of climate-associated genetic variation in Mexican populations of Quercus rugosa. Evol Appl 11, 1842–1858 (2018).
56. Shafer, A. B. A. et al. Genomics and the challenging translation into conservation practice. Trends Ecol Evol 30, 78–87 (2015).
57. Taylor, H. R., Dussex, N. & van Heezik, Y.
Bridging the conservation genetics gap by identifying barriers to implementation for conservation practitioners. Glob Ecol Conserv 10, 231–242 (2017).
58. Shafer, A. B. A. et al. Genomics and the challenging translation into conservation practice. Trends Ecol Evol 30, 78–87 (2015).
59. Belaghzal, H., Dekker, J. & Gibcus, J. H. Hi-C 2.0: An optimized Hi-C procedure for high-resolution genome-wide mapping of chromosome conformation. Methods 123, 56 (2017).
60. Matthey-Doret, C. et al. koszullab/hicstuff: Use miniconda layer for docker and improved P(s) normalisation. (2020) doi:10.5281/ZENODO.4066363.
61. Baudry, L. et al. InstaGRAAL: Chromosome-level quality scaffolding of genomes using a proximity ligation-based scaffolder. Genome Biol 21, 1–22 (2020).
62. Cabanettes, F. & Klopp, C. D-GENIES: dot plot large genomes in an interactive, efficient and simple way. PeerJ 6, e4958 (2018).
63. Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci U S A 117, 9451–9457 (2020).
64. Bao, Z. & Eddy, S. R. Automated De Novo Identification of Repeat Sequence Families in Sequenced Genomes. Genome Res 12, 1269 (2002).
65. Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics 21 Suppl 1, (2005).
66. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res 27, 573–580 (1999).
67. Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to Identify Repetitive Elements in Genomic Sequences. Curr Protoc Bioinformatics 25, 4.10.1-4.10.14 (2009).
68. Ellinghaus, D., Kurtz, S. & Willhoeft, U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics 9, 1–14 (2008).
69. Steinbiss, S., Willhoeft, U., Gremme, G. & Kurtz, S.
Fine-grained annotation and classification of de novo predicted LTR retrotransposons. Nucleic Acids Res 37, 7002–7013 (2009).
70. Li, W. & Godzik, A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006).
71. Cheng, C.-Y. et al. Araport11: a complete reannotation of the Arabidopsis thaliana reference genome. The Plant Journal 89, 789–804 (2017).
72. Bertioli, D. J. et al. The genome sequences of Arachis duranensis and Arachis ipaensis, the diploid ancestors of cultivated peanut. Nat Genet 48, 438–446 (2016).
73. Holt, C. & Yandell, M. MAKER2: An annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics 12, 491 (2011).
74. Korf, I. Gene finding in novel genomes. BMC Bioinformatics 5, 1–9 (2004).
75. Stanke, M., Diekhans, M., Baertsch, R. & Haussler, D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24, 637–644 (2008).
76. Chan, P. P. & Lowe, T. M. tRNAscan-SE: Searching for tRNA genes in genomic sequences. Methods Mol Biol 1962, 1 (2019).
77. Manni, M., Berkeley, M. R., Seppey, M., Simão, F. A. & Zdobnov, E. M. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Mol Biol Evol 38, 4647–4654 (2021).
78. Inglis, P. W., Pappas, M. de C. R., Resende, L. V. & Grattapaglia, D.
Fast and inexpensive protocols for consistent extraction of high quality DNA and RNA from challenging plant and fungal samples for high-throughput SNP genotyping and sequencing applications. PLoS One 13, e0206085 (2018).
79. Torkamaneh, D., Laroche, J. & Belzile, F. Fast-GBS v2.0: an analysis toolkit for genotyping-by-sequencing data. Genome 63, 577–581 (2020) doi:10.1139/gen-2020-0077.
80. Joshi, N. A. sabre - A barcode demultiplexing and trimming tool for FastQ files. Preprint at (2013).
81. Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17, 10–12 (2011).
82. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. ArXiv 1303.3997, (2013).
83. Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10, 1–4 (2021).
84. Rimmer, A. et al. Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications. Nature Genetics 46, 912–918 (2014).
85. Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
86. Frichot, E., Mathieu, F., Trouillon, T., Bouchard, G. & François, O. Fast and efficient estimation of individual ancestry coefficients. Genetics 196, 973–983 (2014).
87. Martins, H., Caye, K., Luu, K., Blum, M. G. B. & François, O. Identifying outlier loci in admixed and in continuous populations using ancestral population differentiation statistics. Mol Ecol 25, 5029–5042 (2016).
88. Yang, J. et al. Genomic inflation factors under polygenic inheritance. European Journal of Human Genetics 19, 807 (2011).
89. Frichot, E. & François, O. LEA: An R package for landscape and ecological association studies. Methods Ecol Evol 6, 925–929 (2015).
90. Borcard, D. & Legendre, P. All-scale spatial analysis of ecological data by means of principal coordinates of neighbour matrices.
Ecol Modell 153, 51–68 (2002).
91. Rellstab, C., Dauphin, B. & Exposito-Alonso, M. Prospects and limitations of genomic offset in conservation management. Evol Appl 14, 1202–1212 (2021).

Competing interests statement

The authors declare no competing interests.

Author contributions

T.H.H.: designed the study, processed the samples, conducted the Oxford Nanopore sequencing, conceived and conducted the bioinformatic analyses, drafted the manuscript, and secured funding for the project;
T.S.: collected the samples, revised the manuscript, and secured funding for the project;
B.T.: collected the samples, revised the manuscript, and secured funding for the project;
V.C.: collected the samples, and revised the manuscript;
I.T.: collected the samples, revised the manuscript, and secured funding for the project;
C.P.: collected the samples, and revised the manuscript;
S.B.: collected the samples, and revised the manuscript;
I.H.: collected the samples, and revised the manuscript;
H.G.: provided expertise and materials for species distribution models, and revised the manuscript;
R.J.: revised the manuscript, and secured funding for the project;
D.H.B.: supervised the study, revised the manuscript, and secured funding for the project;
J.J.M.: designed and supervised the study, revised the manuscript, and secured funding for the project.

Acknowledgements

The genomic work was supported by funding to T.H.H.
from the Biotechnology and Biological Sciences Research Council (grant number BB/M011224/1) and to T.H.H. and J.J.M. from the Google Cloud Academic Grant. The sampling work was supported by funding to T.S., B.T., I.T., R.J., D.H.B., and J.J.M. from the UK Darwin Initiative (ref. 25-023). The work of H.G. and R.J. was supported by the CGIAR Fund Donors (https://www.cgiar.org/funder) through the CGIAR Research Programme on Forests, Trees and Agroforestry. T.H.H. wishes to thank Andrew Eckert and Stephen Harris, the examiners of his doctoral thesis, who provided very constructive feedback that improved this paper.