Highlights

A new adaptive identification strategy of best crop management with farmers

Romain Gautron, Dorian Baudry, Myriam Adam, Gatien N. Falconnier, Gerrit Hoogenboom, Brian King, Marc Corbeels

• We introduce a novel adaptive identification strategy of best crop management practices with farmers.
• Minimizing yield losses in field trials is an exploration-exploitation dilemma.
• Risk-aware bandit algorithms model farmers’ risk aversion in decision making.
• A bandit algorithm identifies best nitrogen fertilizer practices for maize in simulated conditions.
• Our bandit algorithm outperforms Explore-Then-Commit (ETC) methods in a simulated experiment.

A new adaptive identification strategy of best crop management with farmers

Romain Gautron (a,b,c), Dorian Baudry (d), Myriam Adam (e,f,g), Gatien N. Falconnier (a,b,h), Gerrit Hoogenboom (i), Brian King (c) and Marc Corbeels (j)

(a) AIDA, Université de Montpellier, Montpellier, France
(b) CIRAD, Montpellier, France
(c) CGIAR Platform for Big Data in Agriculture, Alliance of Bioversity International and CIAT, Km 17, Recta Cali-Palmira, 763537, Colombia
(d) CNRS, Université de Lille, INRIA, Villeneuve d’Ascq, France
(e) CIRAD, UMR AGAP Institut, Bobo-Dioulasso 01, Burkina Faso
(f) UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
(g) Institut National de l’Environnement et de Recherches Agricoles (INERA), Burkina Faso
(h) International Maize and Wheat Improvement Centre (CIMMYT)-Zimbabwe, 12.5 km Peg Mazowe Road, Harare, Zimbabwe
(i) Agricultural and Biological Engineering, 289 Frazier Rogers Hall, University of Florida, Gainesville, Florida, 32611-0570, USA
(j) International Institute of Tropical Agriculture, PO Box 30772, Nairobi, 00100, Kenya

ARTICLE INFO

Keywords: Crop management; Experimental Design; Bandit Algorithm; Artificial Intelligence; Risk; Uncertainty

ABSTRACT

Identification of best performing fertilizer practices with on-farm trials is challenging, in particular in
rainfed farming due to weather uncertainty. However, it remains crucial to test a range of viable practices to ascertain their performances, given that they are not known beforehand. This process also involves the testing of practices that could potentially yield inferior results in comparison to the best available practice(s). To identify a best management practice, an “intuitive strategy” typically sets up multi-year, multi-location field trials, wherein each practice is tested in equal proportion over a set number of years. Our objective was to provide an identification strategy for nitrogen fertilizer management by designing a bandit learning algorithm. We aimed for the bandit algorithm to be better at minimizing farmers’ losses occurring from the testing of management practices that do not perform best, compared with the “intuitive strategy”, which was formalized as the Explore-Then-Commit strategy. Our case study was for maize production in southern Mali. The bandit framework is a machine learning approach in which an agent learns from feedback over time and selects actions accordingly in order to maximize its cumulative reward in the long term. To mimic the maize responses to nitrogen fertilization, we used the Decision Support System for Agrotechnology Transfer (DSSAT) crop model. We compared nitrogen fertilizer practices using a risk-aware measure, the Conditional Value-at-Risk (CVaR), and a novel agronomic metric, the Yield Excess (YE). The YE accounts for both grain yield and agronomic nitrogen use efficiency. The bandit algorithm performed better than the intuitive strategy: it minimized farmers’ yield losses during the identification process.
This study is a methodological step which opens up new horizons for risk-aware identification of the performance of a range of crop management practices in real conditions.

Gautron et al.: Preprint submitted to Elsevier

1. Introduction

Identifying site-specific best-performing crop management is crucial for farmers to increase their income from crop production, but also for minimizing the negative environmental impacts of cropping activities (Tilman et al., 2002). However, due to weather variability, the identification of these practices can be challenging, in particular with rainfed farming: what worked best in a wet year or a year with sufficient rainfall might not work in the next year, when rainfall is lower (Affholder, 1995). The performance of crop management at a given site has an underlying “hidden” distribution due to inter-annual weather variability, thus creating great uncertainty (e.g., Fosu-Mensah et al., 2012). Because crop management decisions are recurrent, i.e. they are repeated for each new crop growing season, the identification of best available crop management falls into the category of sequential decision making under uncertainty (Gautron et al., 2022). Computer-based decision support tools can allow farmers to make more informed (less uncertain) decisions about their cropping practices from one year to the next, and can facilitate farmers’ risk management in the face of seasonal weather variability (Hochman and Carberry, 2011). There exist numerous decision support tools of widely ranging complexity for crop management that have been introduced to farmers with varying degrees of success (Gautron et al., 2022).

Machine learning (ML) and more generally artificial intelligence (AI) can help address sequential decision making under uncertainty.
Specifically, (multi-armed) bandit algorithms are a type of learning algorithm designed to perform in uncertain environments. The name comes from imagining a gambler at a row of slot machines, who has to repeatedly decide which machine to play for a given total number of trials. Thus, bandit problems (Lattimore and Szepesvári, 2020) consider a decision-maker, called an agent, that repeatedly faces a choice between contending actions, and that has to iteratively improve its decision-making with trials in order to get the highest reward in expectation. The canonical bandit problem originates from clinical trials with sequential drug allocation (Thompson, 1933). At each time step, the agent chooses one action (e.g., administering a particular drug to a patient) amongst a set of possible actions (e.g., administering other types of drugs). Each action provides a reward (e.g., a certain level of tumor cell reduction after taking a particular drug), drawn from a corresponding unknown reward distribution (e.g., the distribution of tumor cell reduction because of the drug). The best available action has the reward distribution with the highest mean reward (e.g., the highest mean level of tumor cell reduction). The objective of the agent is to sequentially choose actions such that the expected sum of rewards is maximized. Maximizing the total expected reward is equivalent to minimizing the regret, which is a measure of the total losses that occur with actions that do not perform best (Robbins, 1952).

∗ Corresponding author
r.gautron@cgiar.org (R. Gautron); dorian.baudry@ensae.fr (D. Baudry); myriam.adam@cirad.fr (M. Adam); gatien.falconnier@cirad.fr (G.N. Falconnier); gerrit@ufl.edu (G. Hoogenboom); b.king@cgiar.org (B. King); marc.corbeels@cirad.fr (M. Corbeels)
ORCID(s): 0000-0002-3218-7215 (R.
Gautron)

In a sequential decision-making setting, the agent refines its next action based on all previous results in an iterative way (which can be thought of as an iterative “active sampling”). To know how a given action performs, a sufficient number of (possibly poor) rewards must be observed, which requires selecting actions that are not the best performers: this is the exploration phase. To maximize the expected sum of rewards, the actions that have provided good results so far must be selected more frequently: this is the exploitation phase. Bandit algorithms aim at finding the right balance between exploration and exploitation. Through balancing exploration and exploitation, a bandit algorithm may come to describe the underlying reward distribution associated with each action with high confidence, allowing the learner to exploit and attain the best expected cumulative reward possible. Maximizing the expected reward (by minimizing the regret) is a key objective in bandit problems, and the convergence and robustness of the algorithm are important factors in achieving this goal. To do so, bandit algorithms compute and optimize statistics of the reward distribution associated with each action (e.g., upper confidence bounds). Optimizing these statistics is mathematically proven to optimally manage the exploration-exploitation dilemma (Lattimore and Szepesvári, 2020). The exploration-exploitation dilemma is a reality for farmers when implementing crop management. Farmers typically want to minimize overall crop yield losses and therefore may explore the performance of promising new crop management practices on small test field plots (Cerf and Meynard, 2006; Evans et al., 2017).
Thus, they avoid potentially large crop yield losses from new practices by managing a gradual transition between the current practices and the promising new one(s), based on the results they obtain on the small test plots.

In our study, we consider a set of pre-defined crop management practices suitable for the crop growing conditions that a group of farmers experience. Yet, the precise practice(s) that will perform best under the particular growing conditions is not known with certainty. The objective of our study is to identify the best crop management through a novel strategy which optimally addresses the exploration-exploitation dilemma. During the identification, we want the strategy to minimize farmers’ losses that result from crop management practices that do not perform best during field testing. Losses are defined as the differences in performance between a given crop management practice and the true best performing practice, which is unknown beforehand; crop yield losses are one example. We set as baseline the “intuitive strategy”, which consists of identifying the best crop management practice through multi-location, multi-year field trials in which the whole set of pre-defined practices is tested in equal proportions during a fixed number of years. We compare this intuitive strategy to a novel identification strategy, based on a bandit algorithm. Thus, we test the hypothesis that a bandit algorithm can help farmers better identify the best crop management practice for their context from on-farm trials, by minimizing crop yield losses from crop management practices that do not perform best in these trials.

Our study considers the case of nitrogen fertilization of rainfed maize production in southern Mali.
We compare both identification strategies of best available nitrogen fertilization practices based on results from maize growth simulations with a calibrated crop model in order to mimic real-world performance of crop management. We emphasize that we do not strictly optimize the nitrogen management itself, but rather the sequential choice (i.e. identification strategy) over time between contrasting nitrogen management practices that were preselected. As an example of crop management, we focused on nitrogen fertilization, but our study has a broader objective, i.e. providing a novel method for identifying best crop management practices from any set of predetermined practices (e.g., varietal choice, irrigation). Also, the identification strategies do not depend on model simulations, and ultimately could be applied in real field conditions. We used the Decision Support System for Agrotechnology Transfer (DSSAT) crop simulator (Hoogenboom et al., 2019) as the crop model.

2. Methods

2.1. The virtual crop management problem

In our virtual crop management identification problem, a population or ensemble of virtual farmers joined a participatory experiment to identify the best nitrogen fertilizer practices for maize production in their fields, i.e. in the Cercle of Koutiala in southern Mali (Coulibaly et al., 2017). The distribution of soil types of the fields of the group of virtual farmers was representative of the region (Table 1). A total population of 500 virtual farmers was considered. Each virtual farmer belonged to a cohort that corresponded to farmers growing maize on the same soil type. For each cohort, we wanted to identify the best nitrogen fertilizer practice from a set of candidate practices (see Table 2 and Section 2.2 for the performance measures we considered).
In addition, the research team set the objective of limiting the maize yield losses of individual virtual farmers that could arise from poor nitrogen fertilizer practice recommendations during the identification process.

At the beginning of each crop growing season, we assumed that a random number of virtual farmers (drawn uniformly between 250 and 350) out of the total population of 500 farmers volunteered to apply the fertilizer recommendations provided by the research team. Each year, the group of volunteers varied in size and in the representation of cohorts, as could occur in reality (Figure 1). Thus, researchers did not control the composition of the group of volunteers. Each virtual farmer indicated the fields and corresponding soil types on which she/he planned to grow maize. Following the identification strategies, researchers then provided a fertilizer recommendation (Table 3) to each virtual farmer for the ongoing growing season, depending on her/his soil type, i.e. cohort. At the end of the season, the farmers shared their results in terms of maize grain yields with the research team, allowing the recommendations to be refined for the next season. The whole experiment was repeated over 20 consecutive years following the same steps. Figure 2a illustrates this process, and corresponds to the steps described in Figure 1.

[Figure 1 diagram. Elements: virtual farmer population; cohort 1 (soil 1), cohort 2 (soil 2), ...; for each season: census of virtual farmer volunteers, identification strategy for each cohort, fertilizer practice assignation, yield reporting.]

Figure 1: Set-up of the numerical experiment with virtual farmers to identify best nitrogen fertilizer practices. Virtual farmers ($n = 500$) were grouped by cohorts ($c = 7$, Table 1), sharing the same soil type.
Each cohort defines an independent identification problem of best nitrogen fertilizer practice. A cohort is represented by identical symbols (only four of the seven cohorts are represented). At the start of the maize growing season, a random number of individuals ($n = 250$ to $350$) from the overall farmer population volunteered to test a certain fertilizer practice from a set of pre-selected practices (Table 2). The specific fertilizer practice to test was defined by the researcher for each cohort independently, according to the identification strategy employed (see Section 2.1). At the end of each season, the virtual farmers reported the maize yield from their fields. Maize yields were generated through DSSAT model simulations (see Section A in Supplementary Materials). The process was repeated over 20 growing seasons.

Nitrogen fertilizer practices. Ten nitrogen fertilizer practices were considered as recommendations in the virtual experiment (see Table 2). Practices 0 to 7 represent the following set of split fertilizer practices, for a total of up to 135 kg N/ha applied:
- Two split applications (Practice 0): 15 kg N/ha at 15 days after planting (DAP), and 120 kg N/ha at 30 DAP.
- Three split applications (Practice 4): 15 kg N/ha at 15 DAP, 60 kg N/ha at 30 DAP and 60 kg N/ha at 45 DAP.
- Split applications according to the rainfall amount (Practices 2, 3 and 6, 7): 2nd and 3rd top-dressing applications only if the cumulated rainfall amount from the start of the season to 30 DAP exceeds the 30th percentile of historical rainfall, i.e. 200 mm.
- Split applications according to plant nitrogen status (Practices 1, 3 and 5, 7): 2nd and 3rd top-dressing applications only if the simulated nitrogen stress factor (NSTRES in DSSAT, see below) exceeds 0.2 (0 standing for no stress, 1 for maximal stress) at 30 DAP, thereby mimicking the use of a portable chlorophyll meter to monitor plant nitrogen status (e.g.
Kalaji et al., 2017).

[Figure 2 diagrams. (a) For $T$ years: 1.a farmer population; 1.b get volunteer farmers for current year; 2 researcher’s identification strategies; 3 assign the specific fertilizer practices to the volunteers (beginning of the season); 4 get all volunteers’ yield outcomes (end of the season); year ← year + 1. (b) For $T$ times: 1 choose an action $k_t$ from $K$ actions; 2 make the action $k_t$; 3 observe an uncertain result $r_t$ of action $k_t$; $t \leftarrow t + 1$.]

(a) Best fertilizer practice identification process. At the start of the season, a number of farmers ($n = 250$ to $350$) volunteer (step 1.b) to test fertilizer practices recommended by the researcher following an identification strategy (steps 2 and 3). At the end of each season, the farmers share their yield outcomes with the experts (step 4). The experts use these results to improve their fertilizer recommendations for the next growing season. The process is repeated for a total of $T = 20$ years.

(b) Canonical bandit problem. For $T$ times, an agent sequentially makes decisions on an action $k_t$ from the set $\{1, \dots, K\}$ of possible actions (step 1). After making the action $k_t$ (step 2), the agent observes an uncertain result $r_t$ (step 3). This result is sampled from a fixed distribution, unknown to the agent, which corresponds to the effect of action $k_t$.

Figure 2: Schematic representation of the ensemble best fertilization identification process (a) and the canonical bandit problem (b).

Split fertilizer applications were considered in order to adjust the amount of nitrogen applied to the likely crop demand as the season develops. This adjustment can rely on factors such as weather conditions and the crop performance (Piha, 1993).

Practice 8 corresponds to the recommended fertilizer application for maize (70 kg N/ha) in the study region, which was determined based on model simulations (Huet et al., 2022), i.e.
the average of the nitrogen fertilizer rates that were expected to result in maximum positive return on fertilizer investment (Getnet et al., 2016). Practice 9 (180 kg N/ha) corresponds to a nitrogen fertilizer application that is likely excessive. In our model simulations (see below), the type of nitrogen fertilizer applied for all practices was set as ammonium nitrate broadcast on the soil surface.

Table 1: Main properties of the soil types of the fields of farmers growing maize in Koutiala, Mali (Adam et al., 2020).

| Soil name | Texture | SLDR (fraction/day) | SLOC (g C/100 g soil) | SLDP (cm) | AWCH (mm) | pH | Prop. (%) |
|---|---|---|---|---|---|---|---|
| ITML840101 | clay loam | 0.60 | 0.20 | 110 | 115 | 5.7 | 7 |
| ITML840102 | loam | 0.60 | 0.45 | 100 | 124 | 5.5 | 9 |
| ITML840103 | silty loam | 0.60 | 0.27 | 160 | 98 | 6.5 | 21 |
| ITML840104 | silty clay loam | 0.25 | 0.70 | 105 | 101 | 5.5 | 4 |
| ITML840105 | silty clay loam | 0.40 | 0.38 | 120 | 108 | 5.8 | 24 |
| ITML840106 | loam | 0.60 | 0.30 | 110 | 115 | 5.7 | 27 |
| ITML840107 | silty clay loam | 0.25 | 0.60 | 105 | 101 | 5.5 | 8 |

‘SLDR’: soil drainage rate (fraction/day); ‘SLOC’: soil organic matter (g C/100 g soil) in the 0-30 cm topsoil; ‘SLDP’: soil depth (cm); ‘AWCH’: soil available water-holding capacity (mm); ‘pH’ is the pH in water; ‘Prop.’ stands for the percentage of each soil type present in the study area.

Table 2: Nitrogen fertilizer practices for maize considered during the virtual experiment in Koutiala, southern Mali. The inclusion of rainfall and plant nitrogen stress as threshold factors in the fertilizer practice is denoted by “Yes” or “No”.

| Index of fertilizer practice | Max. # of fertilizer applications | Rainfall threshold | NSTRES threshold | Application at 15 DAP (kg N/ha) | Application at 30 DAP (kg N/ha) | Application at 45 DAP (kg N/ha) | Max. total amount applied (kg N/ha) |
|---|---|---|---|---|---|---|---|
| 0 | 2 | No | No | 15 | 120 | 0 | 135 |
| 1 | 2 | No | Yes | 15 | 120 | 0 | 135 |
| 2 | 2 | Yes | No | 15 | 120 | 0 | 135 |
| 3 | 2 | Yes | Yes | 15 | 120 | 0 | 135 |
| 4 | 3 | No | No | 15 | 60 | 60 | 135 |
| 5 | 3 | No | Yes | 15 | 60 | 60 | 135 |
| 6 | 3 | Yes | No | 15 | 60 | 60 | 135 |
| 7 | 3 | Yes | Yes | 15 | 60 | 60 | 135 |
| 8 | 2 | No | No | 23 | 0 | 47 | 70 |
| 9 | 3 | No | No | 60 | 60 | 60 | 180 |

‘NSTRES’ stands for plant nitrogen stress and ‘DAP’ for days after planting.

Maize growth simulations. In order to get a proxy for real-world performances of the maize nitrogen fertilizer practices, we simulated maize growth responses to fertilization under the growing conditions of the Cercle of Koutiala in southern Mali using gym-DSSAT v0.0.7, developed from DSSAT v4.7 (Gautron and Padrón González, 2022). gym-DSSAT is a modification of the DSSAT crop simulator (Hoogenboom et al., 2019) that allows a user to read daily internal DSSAT states and, accordingly, to make fertilization decisions on a daily basis. Evidence of the reliability of DSSAT in simulating maize responses to different nitrogen fertilization practices under the conditions of southern Mali is provided by Falconnier et al. (2020) and Huet et al. (2022). The soils (and associated model parameters) we used for simulations are the same as the ones used by Adam et al. (2020), who calibrated DSSAT for sorghum under different plant densities and nitrogen fertilizer practices in southern Mali. For each soil type (Table 1) that was parameterized in DSSAT (soil parameter files *.SOL), each simulated maize grain yield value is a sample of the yield response distribution for the considered fertilizer practice. This response distribution is the result of weather variability, generated in our study by the stochastic weather generator WGEN (Richardson and Wright, 1984; Soltani and Hoogenboom, 2003), which was parameterized using the 47-year weather records from the N’Tarla agricultural research station of the Institute of Rural Economics (12°35′ N, 5°42′ W, 302 m a.s.l.), about 30 km from the city of Koutiala (Ripoche et al., 2015).
The ‘Sotubaka’ maize cultivar (original name ‘Suwan 1 SR’, from the DSSAT default cultivar list) was used for all model simulations as a representative of the maize varieties grown in southern Mali. This cultivar was parameterized by the DSSAT team for the conditions of southern Mali (Jones et al., 1998). Planting date was defined by an automatic rule (see Table A.2 in Supplementary Materials) depending on soil water conditions. At the start of the simulations, the initial soil mineral nitrogen content was set to a fixed value depending on the soil type, as in Adam et al. (2020). Still, the variability of the weather from the beginning of the simulation to the occurrence of the automatic planting (itself dynamic) induced a variable soil mineral nitrogen content at the planting date for each simulation. Water and nitrogen stresses were simulated, but yield reductions due to pests and diseases were not considered, nor was weed competition.

In the model simulations, a different weather time series was generated with WGEN for each growing season, and also for each farmer (and thus each fertilizer recommendation) within a growing season, inducing sets of independent simulated maize yield responses to nitrogen fertilization. The variability introduced by weather randomness was indeed the main source of uncertainty. Had identical weather data been applied to all farmers during a single growing season, farmers implementing the same fertilizer practice under identical soil conditions would have provided redundant information. The modeling approach we adopted can be seen as distant farms encountering distinct weather patterns within the same year.
Section A of Supplementary Materials gives further details of the DSSAT simulation settings.

We simulated 105 times the maize grain yield responses to a given fertilizer practice for the different soil types, which corresponds to 105 hypothetical growing seasons. These samples were used i) to ensure that simulated maize yield responses were in realistic ranges, ii) to evaluate the complexity of the decision problem, and iii) to determine best nitrogen fertilizer practices whilst analyzing the performance of the crop management identification strategies. The samples were not provided to the algorithms prior to their learning (i.e. there was no prior knowledge of the problem).

2.2. Performance indicators of fertilizer practices

An indicator to evaluate both the economic and environmental performance of a nitrogen fertilizer practice $\pi$ is the Agronomic Nitrogen use Efficiency (ANE), as defined by Vanlauwe et al. (2011):

$$\mathrm{ANE}_\pi := \frac{Y_\pi - Y_0}{N_\pi} \tag{1}$$

where $Y_\pi$ is the crop yield obtained with the nitrogen fertilizer practice $\pi$ with a quantity $N_\pi$ of nitrogen, and $Y_0$ is the yield of the control obtained in the same conditions without nitrogen fertilization. Maximizing ANE is a proxy for minimizing the quantity of nitrogen losses, e.g. through nitrate leaching.

However, there are certain limitations associated with using ANE as an indicator for optimizing fertilizer rates. For example, an ANE value of 25 kg grain/kg N can be achieved with a fertilizer input of 20 kg N/ha resulting in a total yield gain of 500 kg/ha, or with an input of 60 kg N/ha resulting in a total gain of 1500 kg/ha. For the same ANE, a farmer is likely to prefer the fertilizer practice that provides the greatest crop yield gain, i.e. with 60 kg N/ha. Similarly, selecting fertilizer practices based only on the associated crop yield gains is not satisfactory.
A similar yield gain can be achieved with different nitrogen fertilizer rates which result in different ANEs: the practice with the highest ANE should be preferred as it requires less nitrogen fertilizer to achieve the same yield gain.

We introduced the Yield Excess (YE) indicator that favors the nitrogen fertilizer practice with the highest yield gain for those practices having the same ANE, and favors the practice with the highest ANE for those practices having the same yield gain. The YE of a nitrogen fertilizer practice $\pi$ with respect to the reference practice $\pi_{\mathrm{ref}}$ of fixed efficiency $\mathrm{ANE}_{\mathrm{ref}}$ using the same quantity of nitrogen fertilizer as practice $\pi$, denoted $N_\pi$, is computed as follows:

\begin{align}
\mathrm{YE}_\pi &:= Y_\pi - Y_{\pi_{\mathrm{ref}}} \tag{2} \\
&= \underbrace{Y_\pi - Y_0}_{\text{yield gain of } \pi \text{ w.r.t. control}} - \underbrace{\left(Y_{\pi_{\mathrm{ref}}} - Y_0\right)}_{\text{yield gain of } \pi_{\mathrm{ref}} \text{ w.r.t. control}} \tag{3} \\
&= Y_\pi - Y_0 - N_\pi \times \mathrm{ANE}_{\mathrm{ref}} \tag{4} \\
&= \left(Y_\pi - Y_0\right) \times \underbrace{\left(1 - \frac{\mathrm{ANE}_{\mathrm{ref}}}{\mathrm{ANE}_\pi}\right)}_{\text{penalization factor}} \tag{5}
\end{align}

The YE of practice $\pi$ with respect to the reference practice $\pi_{\mathrm{ref}}$ corresponds to the yield difference between the practice $\pi$ and a reference practice that has a fixed ANE equal to $\mathrm{ANE}_{\mathrm{ref}}$ and which uses the same quantity $N_\pi$ of nitrogen fertilizer as $\pi$. $\mathrm{YE}_\pi$ increases with $\mathrm{ANE}_\pi$ (Figure 3). $\mathrm{YE}_\pi$ is negative and decreases with $Y_\pi - Y_0$ when $\mathrm{ANE}_\pi < \mathrm{ANE}_{\mathrm{ref}}$, is zero when $\mathrm{ANE}_\pi = \mathrm{ANE}_{\mathrm{ref}}$, and is positive and increases with $Y_\pi - Y_0$ when $\mathrm{ANE}_\pi > \mathrm{ANE}_{\mathrm{ref}}$. Fertilizer practices with an efficiency below $\mathrm{ANE}_{\mathrm{ref}}$ are thus penalized by this metric. We chose $\mathrm{ANE}_{\mathrm{ref}} = 15$ kg grain/kg N for our study, i.e. the average ANE currently achieved by farmers across sub-Saharan Africa (Ten Berge et al., 2019; Vanlauwe et al., 2011).

Because farmers are usually risk averse (e.g.
Cerf and Sebillotte, 1997; Menapace et al., 2013; Jourdain et al., 2020), they are likely to prefer a stable maize grain yield of, for example, 3000 kg/ha rather than a yield of 5000 kg/ha in half of the years and of 1000 kg/ha in the other half of the years, although both distributions have the same expectation.

[Figure 3 plots. Axes: $Y_\pi - Y_0$ (kg/ha), from 0 to 5000; $\mathrm{ANE}_\pi$ (kg grain/kg N), from 10 to 70; $\mathrm{YE}_\pi$ (kg/ha), from −10000 to 4000. Panels: $\mathrm{ANE}_{\mathrm{ref}} = 15$ kg grain/kg N; $\mathrm{ANE}_{\mathrm{ref}} = 30$ kg grain/kg N.]

Figure 3: Yield Excess ($\mathrm{YE}_\pi$, Equation 5) for $\mathrm{ANE}_{\mathrm{ref}} = 15$ kg grain/kg N (left) and $\mathrm{ANE}_{\mathrm{ref}} = 30$ kg grain/kg N (right) as a function of $\mathrm{ANE}_\pi$ and $Y_\pi - Y_0$. $\mathrm{ANE}_\pi$ is the Agronomic Nitrogen use Efficiency of the nitrogen fertilizer practice $\pi$ (Equation 1). $Y_\pi$ is the maize grain yield obtained with nitrogen fertilizer practice $\pi$, and $Y_0$ is the yield obtained with no nitrogen fertilization (control).

To account for risk aversion, we computed the Conditional Value-at-Risk (CVaR, Mandelbrot, 1997; Acerbi and Tasche, 2002), a risk-aware measure that originated from the finance sector. Two definitions of the CVaR coexist in the literature, depending on whether an outcome is considered as a gain or a cost (Dowd, 2007). We adopted the gain point of view, in which the CVaR puts emphasis on the lower tail of a distribution.
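As a rough numerical illustration of this lower-tail view, the sketch below compares the two yield patterns from the example above (a stable 3000 kg/ha versus 5000 kg/ha and 1000 kg/ha in alternating years) with a simple sample-based estimate: the mean of the lowest share $\alpha$ of the observed yields. The function name `empirical_cvar`, the ten-season samples, the estimator itself and the level $\alpha = 0.3$ are illustrative assumptions, not the implementation used in the study.

```python
import math


def empirical_cvar(samples, alpha):
    """Empirical CVaR (gain point of view): mean of the lowest share alpha of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(samples)
    # Number of observations in the lower tail; the small tolerance guards
    # against floating-point noise in alpha * n.
    n_tail = max(1, math.ceil(alpha * len(ordered) - 1e-9))
    tail = ordered[:n_tail]
    return sum(tail) / n_tail


# Ten hypothetical seasons for each yield pattern (kg/ha).
stable = [3000] * 10
variable = [5000, 1000] * 5

for name, yields in [("stable", stable), ("variable", variable)]:
    mean = sum(yields) / len(yields)
    cvar = empirical_cvar(yields, alpha=0.3)
    print(f"{name}: mean = {mean:.0f} kg/ha, CVaR(30%) = {cvar:.0f} kg/ha")
```

Under these assumptions both patterns have the same mean (3000 kg/ha), but the empirical CVaR at the 30% level is 3000 kg/ha for the stable pattern and 1000 kg/ha for the variable one, so a criterion based on the lower tail separates two options that the expectation cannot distinguish.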
For a (continuous) random variable $X$ with cumulative distribution function $F_X$, we call Value-at-Risk (VaR) of level $\alpha$ the quantile of probability $\alpha \in (0, 1]$ of $X$, defined as:

$$\mathrm{VaR}_\alpha(X) := \inf \{x \in \mathbb{R} : F_X(x) > \alpha\} \tag{6}$$

Then the CVaR of $X$ of level $\alpha \in (0, 1]$ is the mean value of the left tail of $X$ of probability $\alpha$, defined as:

$$\mathrm{CVaR}_\alpha(X) := \mathbb{E}[X \mid X \leq \mathrm{VaR}_\alpha(X)] \tag{7}$$

A farmer is likely to choose the practice with the highest CVaR for the considered level $\alpha$. As $\alpha \to 0^+$, the measure puts more emphasis on the worst observable yields. Conversely, as $\alpha \to 1$, the CVaR becomes less risk averse. When $\alpha = 1$, the CVaR equals the usual expectation $\mathbb{E}[X]$, which is risk neutral (Figure 4). In our study, we chose $\alpha = 30\%$. The $\mathrm{CVaR}_{30\%}$ represents the mean crop yield of the 30% lowest observable yields.

[Figure 4 plots. Yield density curves (axes: yield, density) with $\alpha$, $\mathrm{CVaR}_\alpha$, $\mathrm{VaR}_\alpha$ and $\mu$ marked. Panels: (a) High risk aversion ($\alpha \approx 20\%$); (b) Low risk aversion ($\alpha \approx 80\%$).]

Figure 4: Conditional Value-at-Risk (CVaR) of level $\alpha$ in the case of high (a) and low (b) risk aversion. CVaR is the mean value of the blue area of the distribution, of probability $0 < \alpha \leq 1$. $\mathrm{VaR}_\alpha$ stands for Value-at-Risk of level $\alpha$ and is the quantile of probability $\alpha$ of the distribution. As $\alpha \to 1$, the CVaR becomes more risk neutral. $\mu$ represents the mean value of the distribution, which is equivalent to the CVaR of level $\alpha = 100\%$.

2.3. Identification of the best fertilizer practices

2.3.1. A special type of bandit problem

The identification of the best crop management with the constraint of minimizing farmers’ crop yield losses occurring during the process (Section 2.1) can be modeled as a special type of bandit problem. The canonical bandit problem assumes that at each time step, a single trial is made and is followed by a single observation of a result, in a purely sequential mode.
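The purely sequential protocol can be sketched as a short loop in which exactly one action is chosen and one reward is observed per time step. Everything in this sketch is a hypothetical illustration: the function names, the Gaussian reward distributions, their means, and the naive greedy placeholder policy, which is neither ETC nor BCB and does not balance exploration and exploitation.

```python
import random


def run_canonical_bandit(choose_action, draw_reward, n_actions, horizon):
    """Purely sequential loop: one action chosen and one reward observed per step."""
    history = [[] for _ in range(n_actions)]  # rewards observed so far, per action
    for _ in range(horizon):
        k = choose_action(history)  # step 1: choose an action k_t
        r = draw_reward(k)          # steps 2-3: make the action, observe r_t
        history[k].append(r)
    return history


def greedy_placeholder(history):
    """Placeholder policy: try each action once, then pick the best sample mean."""
    for k, rewards in enumerate(history):
        if not rewards:
            return k
    means = [sum(rewards) / len(rewards) for rewards in history]
    return means.index(max(means))


rng = random.Random(0)
true_means = [1.0, 2.0, 1.5]  # hypothetical values, unknown to the agent


def draw_reward(k):
    # Hypothetical Gaussian reward distribution attached to action k.
    return rng.gauss(true_means[k], 0.5)


history = run_canonical_bandit(greedy_placeholder, draw_reward, len(true_means), horizon=50)
pulls = [len(rewards) for rewards in history]
print(pulls)  # number of times each action was played
```

The policy is passed in as a function so that the loop itself stays independent of how actions are chosen; any other decision rule with the same signature could be substituted.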
In contrast, the batched bandit setting (Perchet et al., 2015) assumes that at each time244 step an ensemble of trials are conducted in parallel, followed by the observation of an ensemble of results. Figure 2245 illustrates on the one hand the ensemble identification process of best crop fertilizer practices (Figure 2a), modeled as246 a batched-bandit problem, and on the other hand the canonical bandit problem (Figure 2b). In bandits problems that247 are risk-aware (Cassel et al., 2018), the agent maximizes a risk-aware measure of the collected rewards, such as the248 CVaR (Section 2.2), instead of the expectation of rewards. Our ensemble fertilizer decision problem can be described249 as a risk-aware batched-bandit decision problem.250 Formally, in our virtual experiment, for 𝑡 ∈ {1, 2,⋯ , 𝑇 }, in each season 𝑡, researchers assigned a number 𝑛𝑡251 of volunteer farmers for season 𝑡 with a nitrogen fertilizer practice 𝜋 ∈ {1, 2,⋯ , 𝐾}. Each farmer belonged to a252 cohort 𝑐 ∈ {1, 2,⋯ , 𝐶}. At the end of season 𝑡, researchers assemble rewards 𝑌𝑡 = {𝑦1𝑡 ,… , 𝑦𝑛𝑡𝑡 } as a result of253 the fertilizer practices of all farmers for season 𝑡. For each cohort 𝑐 ∈ {1,⋯ , 𝐶}, rewards are independently and254 identically distributed from unknown stationary distributions {𝜈𝑐1,⋯ , 𝜈𝑐𝐾}. These reward distributions are the YE with255 ANEref = 15 kg grain/kg N associated to each of the 10 recommended nitrogen fertilizer practices, for a given soil256 type. We denote 𝑇 = ⋃𝑇 𝑡=1 𝑌𝑡 the set of all rewards obtained by all farmers between 𝑡 = 1 and 𝑡 = 𝑇 . 
The objective of an identification strategy is to maximize, for a given CVaR level 𝛼 and any time horizon 𝑇 ≥ 1:

$$\mathbb{E}[\mathrm{CVaR}_\alpha(\mathcal{Y}_T)] \qquad (8)$$

where $\mathcal{Y}_T$ is the set of all rewards collected up to season 𝑇. For each cohort $c \in \{1, \dots, C\}$, the best performing nitrogen fertilizer practice $\pi_*^c$ is given by:

$$\pi_*^c = \operatorname*{argmax}_{k} \mathrm{CVaR}_\alpha(\nu_k^c) \qquad (9)$$

Consequently, an optimal identification strategy always assigns nitrogen fertilizer practice $\pi_*^c$ to all farmers belonging to cohort $c$.

2.3.2. Identification strategies

We expected fertilizer practices to perform differently within each cohort, i.e. for each soil type. For example, the best performing nitrogen fertilizer practices were expected to differ between cohorts of farmers growing maize on a shallow sandy soil versus a deep clayey soil. Consequently, the results of one cohort were not assumed to be directly relevant for another cohort. Thus, each cohort was considered to represent an independent identification problem, i.e. it had its own identification strategy that did not share information with the identification strategies of other cohorts. On the other hand, for a given cohort, from one season to another, the identification strategy kept memory of all results obtained during past seasons. In our study, we considered two types of identification strategies: the standard ETC (Explore-Then-Commit) strategy, previously referred to as the “intuitive strategy”, and the risk-aware bandit-based BCB (Bounded-CVaR-Thompson-Sampling Batch) strategy. For the seven soil types in Table 1, the identification strategies were either all ETC or all BCB, never a mix of both.

Intuitive identification strategy (ETC). ETC provides a simple and intuitive solution to the exploration-exploitation dilemma (Lattimore and Szepesvári, 2020).
During an initial exploration phase of an arbitrary number of years, ETC tests all nitrogen fertilizer practices in equal proportions. Thereafter, the exploitation phase starts and, for the remaining time, ETC chooses the fertilizer practice that showed the best performance during the exploration phase. In Section B.2 of Supplementary Materials, we provide a simple adaptation of ETC to the batch setting (see Section 2.1) using the CVaR of rewards rather than the classical expectation of rewards. We considered ETC-3 and ETC-5, with respectively three and five years for the exploration phase. During the exploration phase, fertilizer practices are randomly assigned in equal proportions to the farmers within the cohort.

Bandit-based identification strategy (BCB). BCB is a risk-aware bandit algorithm (Cassel et al., 2018) which uses the CVaR of rewards as decision criterion, in the batched bandit setting. BCB is an extension of the algorithm presented in Baudry et al. (2021a). We specifically designed BCB for the identification of best management practices with a group of farmers. The general idea of this bandit algorithm is to use, in each crop growing season, the rewards acquired during all past growing seasons, such that the algorithm learns to optimally manage the exploration-exploitation dilemma.

An overview of the execution of BCB is shown in Algorithm 1. More detailed information can be found in Supplementary Materials Section B.1. For the execution of BCB (see first step of Algorithm 1), we set the maximum obtainable maize YE at 4000 kg/ha (Figure 3) for ANEref = 15 kg grain/kg N for all fertilizer practices. The statistical performance guarantees are demonstrated in Section C, Supplementary Materials.

Algorithm 1 Simplified pseudo-code of BCB (Bounded-CVaR-Thompson-Sampling Batch).
    for fertilizer practice 𝑘 ∈ {1, ⋯, 𝐾} do
        Add maximum observable value to the results of fertilizer practice 𝑘   // prior to any experiments
    end
    for season 𝑡 ∈ {1, ⋯, 𝑇} do
        for farmer 𝑓 ∈ {1, ⋯, 𝑛} do
            for fertilizer practice 𝑘 ∈ {1, ⋯, 𝐾} do
                Re-weight the rewards of fertilizer practice 𝑘 with random weights sampled from a Dirichlet distribution (Everitt and Skrondal, 2002)
                Score practice 𝑘 with a noisy empirical measure of the CVaR at level 𝛼 of practice 𝑘 from the re-weighted rewards
            end
            Recommend to farmer 𝑓 the fertilizer practice with the maximum score
        end
        Collect and store all results of the season for all fertilizer practices
    end

2.4. Performance measures of the identification strategies

In order to compare the identification strategies, for each season 𝑡, we computed:

1. An empirical measure of the objective that was defined in Equation 8. This is an estimate of the CVaR at 30% of the YE obtained by an identification strategy; it should be maximized. See Section D.1 of Supplementary Materials.
2. The cumulated regret, which mirrors the empirical measure of the objective expressed in terms of cumulated maize yield loss, and should be minimized. Yield loss is measured as the performance difference in yield between an omniscient identification strategy (that always chooses the best available fertilization practice for each soil type) and the strategy being evaluated. The cumulated regret is a convenient statistic for theoretical performance analysis and for representing the performance of an algorithm with less noise compared to the empirical measure of the objective. See Section D.2 of Supplementary Materials.

3. Results

3.1.
Simulated maize yield responses to nitrogen fertilizer practices

All simulated maize yield responses to nitrogen fertilization showed values within the expected ranges for the growing conditions in Koutiala, with an average grain yield varying from 3125 kg/ha for a sandy soil with low fertility (ITML840105) up to 3945 kg/ha for a loamy soil (ITML840106). When applying the most promising fertilization strategies, YE (i.e. yield gain compared to the reference) ranged from 1200 kg/ha to 1800 kg/ha, and CVaR30%(YE) (i.e. the mean of the 30% lowest YE values) from 500 kg/ha to 1032 kg/ha. Table 3 provides the statistics of the best available nitrogen fertilizer practices for each soil type (Table 1), and Figure A.1 in Supplementary Materials shows the distributions of grain maize yield, ANE and YE responses.

No simple parametric assumption, such as a Gaussian probability distribution, could be made about YE (e.g., for fertilizer Practice 5, Figure A.1e). The fat left tails, e.g. for Practices 0 and 4, or the bi-modality of YE, e.g. for Practices 6 and 7 (Figure A.1e), further supported the use of CVaR as a relevant risk measure. Above all, CVaR is most relevant for asymmetric and irregularly shaped distributions, such as fat-tailed or multi-modal distributions (e.g. Rockafellar et al., 2000). For all soil types, the best available nitrogen fertilizer practices were either Practice 0 or 8, i.e. practices with a single nitrogen top-dressing application that is not threshold dependent (Table 3).

Yet, the fertilizer practices had different responses for the different soil types in terms of grain yield and ANE (and consequently YE), and the ranking of the practices was inconsistent across the soil types (Figure A.1).
For instance, for the soil ITML840104 (silt clay loam of medium fertility), Practices 0 to 4 all had similar YE values (Figure A.1e), whilst, for the soil ITML840105 (silt clay loam of low fertility), Practices 0, 1 and 4 had substantially higher YE values compared to Practices 2 and 3 (Figure A.1f).

Threshold-based fertilizer practices behaved inconsistently across the soil types. For example, in the case of the bi-modal YE distribution of Practice 1, the YE probability density was predominantly concentrated around 0 kg/ha for the soil ITML840104 (Figure A.1e) and around 1800 kg/ha for the soil ITML840105 (Figure A.1f). The low to zero YE values in the case of soil ITML840104 and Practice 1 can be attributed to the fact that the nitrogen-stress factor threshold of 0.2 was not reached in most seasons, and consequently no top-dressing occurred (Table 2). In such cases, only a basal dressing of 15 kg N/ha was applied, instead of the total 135 kg N/ha when the top-dressing was triggered. The associated probability density of grain yield was concentrated around 1000 kg/ha (Figure A.1a). On the other hand, for the soil ITML840105, the nitrogen-stress threshold of 0.2 was reached in most seasons and Practice 1 involved both basal and top-dressing fertilization. This corresponded to YE values of around 1800 kg/ha (Figure A.1f), and the corresponding grain yields were generally around 4000 kg/ha (Figure A.1b).

Table 3: Statistics of the best available nitrogen fertilizer practices for each of the soil types presented in Table 1.
soil         𝜋∗   mean N𝜋∗      CVaR30%(Y𝜋∗)   mean Y𝜋∗     mean ANE𝜋∗   CVaR30%(YE𝜋∗)   mean YE𝜋∗
                  (kg/ha)       (kg/ha)        (kg/ha)      (kg/kg)      (kg/ha)         (kg/ha)
ITML840101   0    120.0 (1.0)   3091           3874 (666)   30.0 (5.4)   1032            1795 (651)
ITML840102   8    69.8 (4.0)    2391           3150 (653)   33.2 (7.5)   652             1270 (529)
ITML840103   8    70.0 (0.4)    2539           3152 (526)   34.4 (6.8)   808             1356 (475)
ITML840104   8    69.9 (2.7)    2533           3339 (682)   31.7 (8.1)   500             1169 (565)
ITML840105   8    70.0 (1.2)    2467           3127 (570)   34.2 (7.3)   757             1346 (508)
ITML840106   0    120.0 (1.2)   3132           3945 (695)   28.9 (5.5)   900             1667 (660)
ITML840107   8    69.9 (2.7)    2472           3247 (659)   32.5 (8.0)   565             1226 (559)

For the corresponding best available nitrogen fertilizer practice 𝜋∗, we define N𝜋∗: quantity of nitrogen fertilizer applied; CVaR30%(𝑋): conditional Value-at-Risk of 𝑋 of level 30% (Section 2.2); mean 𝑋: mean value of 𝑋; Y𝜋∗: maize grain yield; ANE𝜋∗: Agronomic Nitrogen use Efficiency; YE𝜋∗: Yield Excess (Section 2.2); values in parentheses indicate standard deviations.

3.2. Identification of best fertilizer practices

Section 3.2.1 provides a visual comparison of nitrogen fertilizer recommendations following the BCB and ETC-5 identification strategies. In Section 3.2.2, we present a direct measure of the empirical performance of the nitrogen fertilizer practice identification strategies (see also Section D.1 in Supplementary Materials), and in Section 3.2.3, we illustrate the regret as a proxy measure (see also Section D.2).

3.2.1. Visualization of identification strategies

Figure 5 provides the average proportions at which the fertilizer practices were selected by the identification strategies, from the beginning of the experiment to time 𝑇, exemplified for soil types ITML840105 and ITML840101. For the soil ITML840105 (silt clay loam of low fertility), after a span of 20 years of experimentation, BCB selected Practice 8, which was the best available one for this soil type (see Table 3), with an average proportion of 50%.
ETC-5 also decided on the same practice, with an average proportion of 31%. For the soil ITML840101, BCB and ETC-5 performed similarly after 20 years of experimentation. For this soil type, BCB sampled the best available Practice 0 (Table 3) with an average proportion of 27%, and ETC-5 selected the same practice with an average proportion of 26%. In the case of ETC-5, the constant and equal proportions of each management practice during the first five years seen in Figures 5b and 5d illustrate the equiproportional initial exploration phase used by the strategy.

[Figure 5, four panels plotting the proportion in sampling (0% to 100%) against the time step (year, 1 to 20), 960 replications: (a) BCB sampling proportions for soil ITML840105; (b) ETC-5 sampling proportions for soil ITML840105; (c) BCB sampling proportions for soil ITML840101; (d) ETC-5 sampling proportions for soil ITML840101. Practice index order in the legends: 8, 0, 4, 1, 5, 9, 7, 3, 2, 6 for soil ITML840105 and 0, 1, 4, 8, 9, 5, 7, 6, 2, 3 for soil ITML840101.]

Figure 5: Averaged sampling proportions for soils ITML840105 and ITML840101, 𝑇 = 20 years. The whole experiment was replicated 960 times with a different random generator initialization each time. The fertilizer practices are ordered according to the true Conditional Value-at-Risk (CVaR) at level 30% of their Yield Excess (YE) with ANEref = 15 kg grain/kg N; the greener the color, the better a fertilizer practice is. Close colors indicate similar practice performances. BCB: Bounded-CVaR-Thompson-Sampling Batch; ETC-5: Explore-Then-Commit strategy with an exploration phase of 5 years.

3.2.2. Empirical measure of the objective

On average, farmers following the nitrogen fertilizer recommendations based on the BCB identification strategy had a higher empirical CVaR at 30% of YE than farmers following the recommendations from the ETC strategies, from the second year of the experiment onwards. Figure 6 shows the evolution of the CVaR at 30% of the YE for all cohorts (soil types) throughout the years (Equation D.2). The difference in performance between BCB and ETC is relatively high during the initial years. For instance, at year 4, farmers following recommendations from the BCB identification strategy had a CVaR at 30% of YE of 318 kg/ha, compared to 168 kg/ha (47% less than BCB) and 74 kg/ha (77% less than BCB) for farmers following the recommendations from the ETC-3 and the ETC-5 strategies, respectively. Thus, BCB identified the best available fertilizer practices sooner and consequently avoided more low crop yield outcomes than the ETC strategies.
ETC strategies were adversely affected by their exploration phases, during which all fertilizer practices were equiproportionally tested. In contrast, BCB had a continuously increasing empirical CVaR during the whole duration of the experiment.

[Figure 6, plotting the empirical CVaR of YE (kg/ha) at 𝛼 = 30% against the time step 𝑇 (year, 1 to 20) for BCB, ETC-3 and ETC-5, with 90% confidence intervals; mean batch size: 299; 960 replications.]

Figure 6: Empirical Conditional Value-at-Risk (CVaR) at level 30% of maize Yield Excesses (YE) between 𝑇 = 0 and the considered 𝑇; ANEref = 15 kg grain/kg N. The whole experiment was replicated 960 times with a different random generator initialization each time. One time step 𝑇 is one year; ‘mean batch size’ is the number of virtual farmers who have volunteered to participate in the experiment, averaged over all years and all replications. Confidence intervals were computed following Thomas and Learned-Miller (2019).

3.2.3. Cumulated regret

For 𝛼 = 30%, the BCB identification strategy outperformed ETC strategies, regardless of the number of years during which the strategy was applied. Figure 7 shows the evolution of the mean cumulated regret for all cohorts throughout the years of the simulated experiment (Equation D.5). The difference in performance between BCB and ETC increased for the whole duration of the experiment. After 20 years, farmers following recommendations from the BCB identification strategy experienced a mean cumulated regret of 2400 kg/ha, compared to 3385 kg/ha (41% more than BCB) and 3701 kg/ha (54% more than BCB) for farmers following the recommendations from the ETC-3 and ETC-5 strategies, respectively. Consequently, farmers following BCB recommendations accumulated less regret than farmers following ETC recommendations.
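As a quick arithmetic check of these percentages, taking the three year-20 regret values exactly as quoted above (2400, 3385 and 3701 kg/ha):

```latex
\[
\frac{3385 - 2400}{2400} \approx 0.41, \qquad
\frac{3701 - 2400}{2400} \approx 0.54, \qquad
\frac{3701 - 2400}{3701} \approx 0.35 .
\]
```

The first two ratios express the additional regret of ETC-3 and ETC-5 relative to BCB. The third expresses the same gap relative to ETC-5 instead: BCB accumulated about 35% less regret than ETC-5 over the 20 years.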
Furthermore, the variance of the cumulated regret (due to the different weather series in the experiments, for each season and each field trial, and the variability in cohorts each year) was smaller for BCB than for ETC, indicating that the BCB strategy was more robust for this decision problem (see quantile ranges in Figure 7).

[Figure 7, plotting the cumulated YE CVaR regret (kg/ha, 0 to 5000) against the time step 𝑇 (year, 0 to 20) for BCB, ETC-3 and ETC-5, with 0.05 to 0.95 quantile ranges; averaged over 960 replications for 𝛼 = 30%; mean batch size: 299.]

Figure 7: Mean cumulated regret of the population, for the Conditional Value-at-Risk (CVaR) at level 30% of Yield Excess (YE); ANEref = 15 kg grain/kg N. The cumulated regret is averaged over the virtual farmers’ population, between 𝑇 = 0 and the considered 𝑇. The whole experiment was replicated 960 times with a different random generator initialization each time. One time step 𝑇 is one year; ‘mean batch size’ is the number of virtual farmers who have volunteered to participate in the experiment, averaged over all years and all replicates.

3.2.4. Sensitivity analysis

In Section E of Supplementary Materials, we present the same analyses as in Sections 3.2.2 and 3.2.3, but for the higher CVaR levels 𝛼 = 50% and 𝛼 = 100%. The CVaR with 𝛼 = 1 recovers the usual expectation 𝔼[𝑋]. For 𝛼 = 50%, the difference between BCB on one side and ETC-3 and ETC-5 on the other side is similar to what was observed for 𝛼 = 30%. For 𝛼 = 100%, ETC-3 was the best performer, and BCB and ETC-5 performed equally. Nonetheless, BCB showed a smaller variance than both ETC-3 and ETC-5.

4. Discussion

4.1.
Benefits from an adaptive identification strategy

Practical perspective. In multi-year multi-location on-farm trials, participating farmers simultaneously conduct field experiments with crops over multiple seasons to compare, for instance, crop management practices (e.g. Naudin et al., 2010; Baudron et al., 2012; Falconnier et al., 2016). After a given number of years, results, i.e. crop yields, are typically analyzed using mixed linear models (Laird and Ware, 1982) to account for random effects associated with fields and farms. Best crop management practices are then identified by researchers, based on the statistical analyses. In our simulated nitrogen fertilization practice decision problem, we adopted the intuitive ETC identification strategy as a substitute for the traditional approach of designing and analyzing multi-year, multi-location on-farm trials. Both replicated on-farm trials and ETC consist of an exploration phase of a fixed duration (data collection), followed by an exploitation phase (application of the best identified practice(s) after analysis of collected data). Consequently, both replicated on-farm trials and ETC can be considered as non-adaptive identification strategies: before the end of the exploration phase, the intermediary results are not exploited to gradually refine the experimental setup. In contrast, bandit-based identification strategies, such as BCB, refine the recommendations every year, based on the results observed in the previous years. The better a crop management practice, the more its representation among the tested practices grows over time. From a farmer’s perspective, this means that the probability of testing worse performing recommendations (compared to the best available practice) decreases over time.
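This yearly refinement can be illustrated with a minimal sketch of the scoring and recommendation steps outlined in Algorithm 1. It is not the authors' implementation (see the repository under Software availability for that). The uniform Dirichlet weights, the random tie-breaking, the helper names `weighted_cvar`, `bcb_recommend`, `run_bcb` and the toy Gaussian rewards are all assumptions made for illustration. Only the upper bound of 4000 kg/ha and the overall loop structure come from the text.

```python
import numpy as np

UPPER_BOUND = 4000.0  # maximum obtainable YE (kg/ha), the bound quoted in Section 2.3.2


def weighted_cvar(values, weights, alpha):
    """CVaR at level alpha of a discrete distribution (weights must sum to 1)."""
    order = np.argsort(values)
    v = np.asarray(values, dtype=float)[order]
    w = np.asarray(weights, dtype=float)[order]
    mass_below = np.cumsum(w) - w                   # probability strictly before each value
    in_tail = np.clip(alpha - mass_below, 0.0, w)   # share of each weight in the lowest alpha mass
    return float(np.dot(in_tail, v) / alpha)


def bcb_recommend(history, alpha, rng):
    """Score every practice from Dirichlet-re-weighted rewards and return the best one."""
    scores = {}
    for k, rewards in history.items():
        weights = rng.dirichlet(np.ones(len(rewards)))   # assumption: uniform Dirichlet
        scores[k] = weighted_cvar(rewards, weights, alpha)
    best = max(scores.values())
    ties = [k for k, s in scores.items() if s == best]
    return ties[rng.integers(len(ties))]                 # assumption: random tie-breaking


def run_bcb(draw_reward, n_practices, n_farmers, n_seasons, alpha, rng):
    """Batched loop: all farmers of a season are served before any result is observed."""
    history = {k: [UPPER_BOUND] for k in range(n_practices)}  # seed with the upper bound
    for _ in range(n_seasons):
        choices = [bcb_recommend(history, alpha, rng) for _ in range(n_farmers)]
        for k in choices:
            history[k].append(draw_reward(k, rng))
    return history


# Toy usage: hypothetical Gaussian yield excesses clipped to [0, UPPER_BOUND].
TOY_MEANS = [500.0, 1500.0, 1000.0]


def toy_reward(k, rng):
    return float(np.clip(rng.normal(TOY_MEANS[k], 300.0), 0.0, UPPER_BOUND))


rng = np.random.default_rng(0)
history = run_bcb(toy_reward, n_practices=3, n_farmers=30, n_seasons=10, alpha=0.30, rng=rng)
n_trials = {k: len(r) - 1 for k, r in history.items()}  # trials per practice, bound excluded
```

Seeding each practice with the upper bound keeps untested practices attractive, so every practice is tried early on. As observations accumulate, poorly performing practices receive low scores in most draws and are recommended to fewer farmers.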
This adaptive behaviour contrasts with non-adaptive identification strategies, which recommend all crop management practices in equal proportions during the exploration phase. The cost of identifying best management practices is therefore likely to be reduced for the farmers when using bandit-based approaches. Another common method to generate crop management recommendations consists in the use of calibrated crop simulation models and scenario analyses (e.g. Huet et al., 2022). Although this method has its limitations due to model uncertainty (Yin et al., 2017), it can be complementary to the bandit-based approach. For example, a set of candidate crop management practices can first be determined based on outcomes from crop modeling, and out of those, the true best option can then be identified from field trials, with the bandit algorithm defining the experimental set-up.

Theoretical perspective. ETC is theoretically proven to be a sub-optimal identification strategy unless the duration of the exploration phase is calibrated, which would require strong prior knowledge of the complexity of the decision problem that is usually unavailable (Lattimore and Szepesvári, 2020, Chapter 6). In the numerical experiments, for 𝛼 = 100%, ETC-3, with its three years of exploration, performed best. This is likely associated with the particular YE distributions and the size of the farmer groups in question. A relatively minor change in the decision problem (e.g. changing 𝛼 to 30% or 50%) may, however, mean that three years of exploration is no longer optimal. More generally, prior to an experiment, there is no guarantee that an arbitrary number of years of exploration in the ETC strategy will be optimal, and consequently there are no guarantees about the performance of ETC, as opposed to BCB (see theoretical results in Section C).
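The role of the exploration length can be made explicit with a minimal sketch of ETC in the batch setting, using the CVaR as selection criterion. This is an illustrative reading of the description in Section 2.3.2, not the adaptation given in Section B.2 of Supplementary Materials. The names `empirical_cvar`, `run_etc`, `n_explore` and the toy rewards are hypothetical, and the sketch assumes at least as many farmers as practices so that every practice is observed during exploration.

```python
import numpy as np


def empirical_cvar(rewards, alpha):
    """Mean of the lowest alpha-fraction of the empirical distribution of `rewards`."""
    x = np.sort(np.asarray(rewards, dtype=float))
    mass = alpha * x.size
    k = int(np.floor(mass))
    tail_sum = x[:k].sum() + ((mass - k) * x[k] if k < x.size else 0.0)
    return float(tail_sum / mass)


def run_etc(draw_reward, n_practices, n_farmers, n_seasons, n_explore, alpha, rng):
    """Explore in equal proportions for `n_explore` seasons, then commit for good."""
    history = [[] for _ in range(n_practices)]
    committed = None
    assignments = []
    for season in range(n_seasons):
        if season < n_explore:
            # Exploration: equal proportions, shuffled so allocation to farmers is random.
            chosen = rng.permutation(np.arange(n_farmers) % n_practices)
        else:
            if committed is None:
                # Commit once, using only the data gathered during exploration.
                committed = max(
                    range(n_practices),
                    key=lambda k: empirical_cvar(history[k], alpha),
                )
            chosen = np.full(n_farmers, committed)
        for k in chosen:
            history[k].append(draw_reward(k, rng))
        assignments.append(chosen)
    return assignments, committed


# Toy usage: hypothetical Gaussian yield excesses (kg/ha) clipped to [0, 4000].
TOY_MEANS = [500.0, 1500.0, 1000.0]


def toy_reward(k, rng):
    return float(np.clip(rng.normal(TOY_MEANS[k], 300.0), 0.0, 4000.0))


rng = np.random.default_rng(0)
assignments, committed = run_etc(
    toy_reward, n_practices=3, n_farmers=30, n_seasons=20, n_explore=3, alpha=0.30, rng=rng
)
```

In this sketch `n_explore` (3 for ETC-3, 5 for ETC-5) must be fixed before any data are seen, and a poor choice either wastes seasons on inferior practices or commits on too little evidence.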
The main benefit of the BCB strategy over the ETC strategy is that it eliminates the need to select parameter values that require knowledge that is a priori not available. Nor does BCB require strong assumptions about the probability laws of reward distributions, as opposed to other common bandit algorithms. The only requisite for BCB is knowledge of the maximum observable reward. In agronomy, such knowledge is usually available through expert knowledge. For instance, considering crop yield as reward, an expert can usually estimate the crop yield potential under the given crop growing conditions, e.g., through crop growth modeling or from field experiments that are conducted under optimal growing conditions (Affholder et al., 2013).

4.2. Adaptations to real field conditions

Definition of fertilization practices. In our study, the mean simulated maize yield with a total mineral nitrogen fertilizer application of 70 kg N/ha (Practice 8) ranged from 2.6 to 3.2 t/ha depending on soil type. In a set of on-farm experiments in the region of Koutiala, Falconnier et al. (2016) observed an interquartile range of maize grain yield of 1.2 to 3.1 t/ha for a local variety and a total mineral nitrogen fertilizer application of 80 kg N/ha (86 on-farm trials), which supports the reliability of our model simulations. As reported by Falconnier et al. (2016), yield was mostly influenced by seasonal weather, soil type (water holding capacity) and the effect of the previous crop. For all soil types, none of the identified best performing nitrogen fertilizer practices in our simulated experiment were based on thresholds linked to rainfall or plant nitrogen status, nor involved split top-dressing. This does not, however, rule out a potential benefit from threshold-based fertilizer practices or split top-dressing.
In semi-arid conditions such as in Koutiala, crop nitrogen demand can differ strongly between seasons and, therefore, should be evaluated as the season develops (Piha, 1993). It is, however, not certain that models such as DSSAT reproduce the crop responses to split applications if no nitrogen losses occur through e.g. nitrate leaching. It is important that the set of candidate fertilizer practices to explore is carefully selected from the vast number of possible combinations of practice attributes, e.g., application time, rate and number of splits. In our study, the values for the fertilizer practice attributes were most likely not optimal, because our objective was to establish an improved generic identification method for crop management, rather than to design refined fertilizer recommendations. For an application in real field conditions, we recommend that these attributes first be estimated using existing expert knowledge and/or crop growth model simulations (see Section 4.1). The set of candidate practices can also comprise practices that are based on advanced methods, such as refined balance-based methods or machine learning-based methods for nitrogen fertilization (e.g., Morris et al., 2018; Timsina et al., 2021). More generally, the design of fertilizer recommendations must include experts, local extensionists and farmers themselves (Cerf and Meynard, 2006; Hochman and Carberry, 2011). Finally, it is important to take into account that the quantity of mineral fertilizer a farmer can apply often depends on access to financial resources and markets (Jayne et al., 2003).

Objective to maximize. We defined the farmers’ objective as maximizing the CVaR at level 𝛼 = 30% of the YE with ANEref = 15 kg grain/kg N. The YE indicator is meaningful as it represents the yield gain compared to a reference fertilizer practice, and can be easily calculated.
The value of 𝛼 allows the risk aversion level to be adjusted for a cohort of farmers. The value of ANEref defines an invariant economic and environmental trade-off setting the boundaries of the performance of nitrogen fertilizer use. Considering a set of pre-defined nitrogen fertilizer practices, yield losses were defined as the expected performance difference between the best available fertilizer practice and the other, less well performing practices, in the face of seasonal weather uncertainty.

However, in our study we did not evaluate fertilizer practices by their economic return, which depends on many factors, such as fertilizer subsidies, fertilizer market price, application costs, and grain selling prices. Including those factors dramatically increases the complexity of the identification problem, and with it the amount of data required to identify the best practices (we provide more details in Supplementary Materials, Section F). In this context, we must keep in mind the inherent constraints of modeling farmers’ objectives and decisions, which always remains a proxy for real-life situations and choices (McCown, 2002). Farmers should play an active role in the formulation and validation of the objective to maximize, ensuring that mathematical terms are meaningfully translated into practical cases of crop management decision problems.

Evaluation of the method. Statistical performance guarantees of the identification strategies hold in the face of the many possible decision problems, i.e., in many different real field conditions. In the simulated experiment, we quantified the performances of identification strategies with thousands of replications in order to be as confident as possible in their evaluation.
However, conducting the identification strategies in real conditions poses challenges, and is not meaningful when done on a single experiment. Rather, the identification strategies should be tested in as many real-world situations as possible. Such large-scale experiments are costly, but necessary to objectively evaluate identification strategies, including the “intuitive strategy”.

4.3. Limits and possible improvements

In our simulated crop management decision problem, we largely simplified the experimental structure of multi-location, multi-year replicated field trials. First, weather time series random variables were independent and identically distributed for all model simulations. Such an assumption is unlikely to be true in the real world, because weather spatial correlations can be high, for instance in the case of extreme weather events (Tack and Holt, 2016). Second, within the same cohort, it was assumed that all farmers had identical soil type and maize cultivars, and closely adhered to the assigned fertilizer practices. For the application of our methodology in real field conditions, variations in site conditions and other potential random effects should be properly considered. This means that the bandit identification strategy we introduced should be extended to account for the experimental structure and the multiple factors at stake. Here, contextual bandits (Lattimore and Szepesvári, 2020) could potentially provide solutions by enabling the sharing of information between similar decision contexts (the cohorts) and similar fertilizer practices (i.e., the data collected about a given context/fertilizer practice also provide insights for other “close” contexts/fertilizer practices).

Finally, in the model simulations, we assumed average weather (rainfall, temperature) in southern Mali to remain the same throughout the 20 years of the experiment.
Such a hypothesis is unlikely to hold in real conditions given climate change (e.g., Traore et al., 2017). From a decision problem perspective, best available management practices are likely to change over time under climate change, as a response to the increasing occurrences of heat and water stress (Adam et al., 2020). Such a problem can be formalized as a non-stationary bandit problem (Lattimore and Szepesvári, 2020). To handle this, the BCB strategy can be equipped with a sliding window approach where the algorithm primarily focuses on the most recent rewards, discarding older ones over time (Garivier and Moulines, 2011; Baudry et al., 2021b).

5. Conclusion

Bandit learning algorithms aim at optimally balancing exploration (gathering information) and exploitation (using the information) in uncertain decision problems with repeated choice between contending actions. In a simulated problem of the identification of best fertilizer practices with virtual farmers, we compared our bandit-based algorithm to the “intuitive strategy” of Explore-Then-Commit (ETC), in which the set of pre-defined practices is tested in an equiproportional way during a fixed number of years. During simulated field trials in southern Mali, the bandit-based identification strategy reduced maize yield losses from testing fertilizer practices that perform worse than the true best available practice by up to 35% after 20 years. This novel approach opens up new perspectives as an alternative to the usual multi-year, multi-location on-farm trials.
The bandit-based identification strategy can be employed to identify best management practices in real field conditions, if variability in site conditions, possible correlations between site conditions, and other potential random effects are further considered.

Software availability

All the numerical experiments in this paper are meant to be as reproducible as possible, and the code is open source. The Python code with the necessary packages, instructions and experimental data are provided in the following public GitLab repository: https://gitlab.inria.fr/rgautron/batch-cvts/-/tree/master. The simulations are performed with gym-DSSAT (https://gitlab.inria.fr/rgautron/gym_dssat_pdi), a modified version of the Decision Support System for Agrotechnology Transfer (DSSAT) software (https://dssat.net/).

Declaration of competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work has been supported by:
• The French Agricultural Research Centre for International Development (CIRAD).
• The Consultative Group for International Agricultural Research (CGIAR) Platform for Big Data in Agriculture.
• The French Ministry of Higher Education and Research, Hauts-de-France region, Inria within the Scool team project and MEL.

Experiments presented in this paper were carried out using the Grid’5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations (see https://www.grid5000.fr).

References

Acerbi, C., Tasche, D., 2002.
On the coherence of expected shortfall. Journal of Banking & Finance 26, 1487–1503. doi:10.1016/S0378-4266(02)00283-2.
Adam, M., MacCarthy, D.S., Traoré, P.C.S., Nenkam, A., Freduah, B.S., Ly, M., Adiku, S.G., 2020. Which is more important to sorghum production systems in the Sudano-Sahelian zone of West Africa: Climate change or improved management practices? Agricultural Systems 185, 102920.
Affholder, F., 1995. Effect of organic matter input on the water balance and yield of millet under tropical dryland condition. Field Crops Research 41, 109–121.
Affholder, F., Poeydebat, C., Corbeels, M., Scopel, E., Tittonell, P., 2013. The yield gap of major food crops in family agriculture in the tropics: Assessment and analysis through field surveys and modelling. Field Crops Research 143, 106–118.
Agrawal, S., Koolen, W.M., Juneja, S., 2021. Optimal best-arm identification methods for tail-risk measures, in: Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual.
Baudron, F., Tittonell, P., Corbeels, M., Letourmy, P., Giller, K.E., 2012. Comparative performance of conservation agriculture and current smallholder farming practices in semi-arid Zimbabwe. Field Crops Research 132, 117–128.
Baudry, D., Gautron, R., Kaufmann, E., Maillard, O., 2021a. Optimal Thompson sampling strategies for support-aware CVaR bandits, in: International Conference on Machine Learning, PMLR. pp. 716–726.
Baudry, D., Russac, Y., Cappé, O., 2021b. On Limited-Memory Subsampling Strategies for Bandits, in: ICML 2021- International Conference on Machine Learning, Vienna / Virtual, Austria.
Cassel, A., Mannor, S., Zeevi, A., 2018. A general approach to multi-armed bandits under risk criteria, in: Conference On Learning Theory, PMLR. pp. 1295–1306.
Cerf, M., Meynard, J.M., 2006.
Les outils de pilotage des cultures: diversité de leurs usages et enseignements pour leur conception. Natures Sciences Sociétés 14, 19–29.
Cerf, M., Sebillotte, M., 1997. Approche cognitive des décisions de production dans l’exploitation agricole [confrontation aux théories de la décision]. Economie rurale 239, 11–18.
Coulibaly, D., Sissoko, F., Doumbia, S., Ba, A., Dembele, B., 2017. Evaluation de l’effet de la fertilisation minerale sur la production de varietes ameliorees de mais et le disponible fourrager en zone cotonniere du Mali-Sud (Mali). Agronomie Africaine 29, 109–117.
Dowd, K., 2007. Measuring market risk. John Wiley & Sons.
Evans, K.J., Terhorst, A., Kang, B.H., 2017. From data to decisions: helping crop producers build their actionable knowledge. Critical Reviews in Plant Sciences 36, 71–88.
Everitt, B., Skrondal, A., 2002. The Cambridge dictionary of statistics. volume 106. Cambridge University Press Cambridge.
Falconnier, G.N., Corbeels, M., Boote, K.J., Affholder, F., Adam, M., MacCarthy, D.S., Ruane, A.C., Nendel, C., Whitbread, A.M., Justes, É., et al., 2020. Modelling climate change impacts on maize yields under low nitrogen input conditions in sub-Saharan Africa. Global Change Biology 26, 5942–5964.
Falconnier, G.N., Descheemaeker, K., Van Mourik, T.A., Giller, K.E., 2016. Unravelling the causes of variability in crop yields and treatment responses for better tailoring of options for sustainable intensification in southern Mali. Field Crops Research 187, 113–126.
Fosu-Mensah, B., MacCarthy, D., Vlek, P., Safo, E., 2012.
Simulating impact of seasonal climatic variation on the response of maize (Zea mays L.) to inorganic fertilizer in sub-humid Ghana. Nutrient Cycling in Agroecosystems 94, 255–271.
Garivier, A., Moulines, E., 2011. On upper-confidence bound policies for switching bandit problems, in: International Conference on Algorithmic Learning Theory, Springer. pp. 174–188.
Gautron, R., Maillard, O.A., Preux, P., Corbeels, M., Sabbadin, R., 2022. Reinforcement learning for crop management support: Review, prospects and challenges. Computers and Electronics in Agriculture 200, 107182.
Gautron, R., Padrón González, E.J., 2022. gym-DSSAT - A crop model turned into a Reinforcement Learning environment. URL: https://gitlab.inria.fr/rgautron/gym_dssat_pdi.
Getnet, M., Van Ittersum, M., Hengsdijk, H., Descheemaeker, K., 2016. Yield gaps and resource use across farming zones in the Central Rift Valley of Ethiopia. Experimental Agriculture 52, 493–517.
Hochman, Z., Carberry, P., 2011. Emerging consensus on desirable characteristics of tools to support farmers’ management of climate risk in Australia. Agricultural Systems 104, 441–450.
Hoogenboom, G., Porter, C., Boote, K., Shelia, V., Wilkens, P., Singh, U., White, J., Asseng, S., Lizaso, J., Moreno, L., et al., 2019. The DSSAT crop modeling ecosystem. Advances in crop modelling for a sustainable agriculture, 173–216.
Huet, E., Adam, M., Traore, B., Giller, K., Descheemaeker, K., 2022. Coping with cereal production risks due to the vagaries of weather, labour shortages and input markets through management in southern Mali. European Journal of Agronomy 140, 126587.
Jayne, T.S., Govereh, J., Wanzala, M., Demeke, M., 2003. Fertilizer market development: a comparative analysis of Ethiopia, Kenya, and Zambia. Food Policy 28, 293–316.
Jones, J., Tsuji, G., Hoogenboom, G., Hunt, L., Thornton, P.K., Wilkens, P., Imamura, D., Bowen, W., Singh, U., 1998.
Decision support system for agrotechnology transfer: DSSAT v3. Understanding options for agricultural production, 157–177.
Jourdain, D., Lairez, J., Striffler, B., Affholder, F., 2020. Farmers’ preference for cropping systems and the development of sustainable intensification: a choice experiment approach. Review of Agricultural, Food and Environmental Studies 101, 417–437.
Kalaji, H.M., Dabrowski, P., Cetner, M.D., Samborska, I.A., Lukasik, I., Brestic, M., Zivcak, M., Tomasz, H., Mojski, J., Kociel, H., et al., 2017. A comparison between different chlorophyll content meters under nutrient deficiency conditions. Journal of Plant Nutrition 40, 1024–1034.
Laird, N.M., Ware, J.H., 1982. Random-effects models for longitudinal data. Biometrics, 963–974.
Lattimore, T., Szepesvári, C., 2020. Bandit algorithms. Cambridge University Press.
Mandelbrot, B.B., 1997. The variation of certain speculative prices, in: Fractals and scaling in finance. Springer, pp. 371–418.
Massart, P., 1990. The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality. Annals of Probability 18.
McCown, R.L., 2002. Changing systems for supporting farmers’ decisions: problems, paradigms, and prospects. Agricultural Systems 74, 179–220.
Menapace, L., Colson, G., Raffaelli, R., 2013. Risk aversion, subjective beliefs, and farmer risk management strategies. American Journal of Agricultural Economics 95, 384–389.
Morris, T.F., Murrell, T.S., Beegle, D.B., Camberato, J.J., Ferguson, R.B., Grove, J., Ketterings, Q., Kyveryga, P.M., Laboski, C.A., McGrath, J.M., et al., 2018. Strengths and limitations of nitrogen rate recommendations for corn and opportunities for improvement.
Agronomy Journal 110, 1.
Naudin, K., Gozé, E., Balarabe, O., Giller, K.E., Scopel, E., 2010. Impact of no tillage and mulching practices on cotton production in North Cameroon: a multi-locational on-farm assessment. Soil and Tillage Research 108, 68–76.
Perchet, V., Rigollet, P., Chassang, S., Snowberg, E., 2015. Batched bandit problems, in: Grünwald, P., Hazan, E., Kale, S. (Eds.), Proceedings of The 28th Conference on Learning Theory, COLT 2015, Paris, France, July 3-6, 2015, JMLR.org. p. 1456. URL: http://proceedings.mlr.press/v40/Perchet15.html.
Piha, M., 1993. Optimizing fertilizer use and practical rainfall capture in a semi-arid environment with variable rainfall. Experimental Agriculture 29, 405–415.
Richardson, C.W., Wright, D.A., 1984. WGEN: A model for generating daily weather variables. ARS (USA).
Ripoche, A., Crétenet, M., Corbeels, M., Affholder, F., Naudin, K., Sissoko, F., Douzet, J.M., Tittonell, P., 2015. Cotton as an entry point for soil fertility maintenance and food crop productivity in savannah agroecosystems–evidence from a long-term experiment in southern Mali. Field Crops Research 177, 37–48.
Robbins, H., 1952. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society 58, 527–535.
Rockafellar, R.T., Uryasev, S., et al., 2000. Optimization of conditional value-at-risk. Journal of Risk 2, 21–42.
Soltani, A., Hoogenboom, G., 2003. A statistical comparison of the stochastic weather generators WGEN and SIMMETEO. Climate Research 24, 215–230.
Tack, J.B., Holt, M.T., 2016. The influence of weather extremes on the spatial correlation of corn yields.
Climatic Change 134, 299–309.
Tamkin, A., Keramati, R., Dann, C., Brunskill, E., 2020. Distributionally-aware exploration for CVaR bandits, in: NeurIPS 2019 Workshop on Safety and Robustness in Decision Making; RLDM 2019.
Ten Berge, H.F., Hijbeek, R., Van Loon, M., Rurinda, J., Tesfaye, K., Zingore, S., Craufurd, P., van Heerwaarden, J., Brentrup, F., Schröder, J.J., et al., 2019. Maize crop nutrient input requirements for food security in sub-Saharan Africa. Global Food Security 23, 9–21.
Thomas, P., Learned-Miller, E., 2019. Concentration inequalities for conditional value at risk, in: International Conference on Machine Learning, PMLR. pp. 6225–6233.
Thompson, W.R., 1933. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25, 285–294.
Tilman, D., Cassman, K.G., Matson, P.A., Naylor, R., Polasky, S., 2002. Agricultural sustainability and intensive production practices. Nature 418, 671–677.
Timsina, J., Dutta, S., Devkota, K.P., Chakraborty, S., Neupane, R.K., Bishta, S., Amgain, L.P., Singh, V.K., Islam, S., Majumdar, K., 2021. Improved nutrient management in cereals using nutrient expert and machine learning tools: Productivity, profitability and nutrient use efficiency. Agricultural Systems 192, 103181.
Traore, B., Descheemaeker, K., Van Wijk, M.T., Corbeels, M., Supit, I., Giller, K.E., 2017. Modelling cereal crops to assess future climate risk for family food self-sufficiency in southern Mali. Field Crops Research 201, 133–145.
Vanlauwe, B., Kihara, J., Chivenge, P., Pypers, P., Coe, R., Six, J., 2011. Agronomic use efficiency of N fertilizer in maize-based systems in sub-Saharan Africa within the context of integrated soil fertility management.
Plant and Soil 339, 35–50.
Yin, X., Kersebaum, K.C., Kollas, C., Baby, S., Beaudoin, N., Manevski, K., Palosuo, T., Nendel, C., Wu, L., Hoffmann, M., Hoffmann, H., Sharif, B., Armas-Herrera, C.M., Bindi, M., Charfeddine, M., Conradt, T., Constantin, J., Ewert, F., Ferrise, R., Gaiser, T., de Cortazar-Atauri, I.G., Giglio, L., Hlavinka, P., Lana, M., Launay, M., Louarn, G., Manderscheid, R., Mary, B., Mirschel, W., Moriondo, M., Öztürk, I., Pacholski, A., Ripoche-Wachter, D., Rötter, R.P., Ruget, F., Trnka, M., Ventrella, D., Weigel, H.J., Olesen, J.E., 2017. Multi-model uncertainty analysis in predicting grain N for crop rotations in Europe. European Journal of Agronomy 84, 152–165. URL: https://www.sciencedirect.com/science/article/pii/S1161030116302532, doi:https://doi.org/10.1016/j.eja.2016.12.009.

Supplementary Materials

A. Maize simulations

The cultivation scenarios were based on the conditions found in Southern Mali. The soils came from Adam et al. (2020), who compiled the soils found in the literature for the location of Koutiala, Mali, and supplemented them with survey data. The data of Adam et al.
(2020) included soil depth, texture, water capacity, bulk density, organic matter content, pH and initial mineral nitrogen content. Soil characteristics and proportions in the population are summarized in Table 1, based on Adam et al. (2020). During the simulations, the weather time series were generated using the WGEN weather model (see Richardson and Wright, 1984; Soltani and Hoogenboom, 2003). WGEN was parameterized on 47 years of historical daily weather records (Ripoche et al., 2015) from a weather station located in N’Tarla, about 20 km from Koutiala; these historical weather records were the best available. The cultivar used in the simulations and its parameterization in DSSAT are presented in Table A.1; this cultivar comes with the DSSAT default data and was representative of the cultivars used in Mali. It was already calibrated based on experiments carried out in Mali. The simulations were initiated on Day Of Year (DOY) 140 and planting was automatically performed in a window ranging from DOY 155 to 185; the parameters of the automatic planting are given in Table A.2. For each soil, the initial soil nitrogen content was set according to the values found in Adam et al. (2020). The initial soil water content was set to the crop lower limit, reflecting the end of the dry season at the usual planting dates. Because the simulations were initiated prior to the planting date and because the weather was stochastically generated, the soil mineral nitrogen and water contents were uncertain at planting time. Each simulation was performed independently from the previous ones. At the beginning of the experiment, all the soils described in Table 1 were randomly distributed amongst the initial group of farmers following the proportions provided in Table 1. Figure A.1 shows the simulated yield distributions for ITML840104 and ITML840105 soils.

B. Algorithms

B.1.
Details about BCB

In Algorithm B.1, we provide the detailed pseudo-code of BCB. As shown by Figure B.1, the more rewards have been collected, the less variance the weights sampled from Dirichlet distributions exhibit. This variance directly relates to the noise introduced in the computation of the score of the different available actions.

Remark B.1 (First season). Algorithm B.1 is well defined for the first season: without data, all CVaRs are equal to the maximum observable result, so the algorithm chooses each option arbitrarily at random. On average, each option will be equally explored. Note that we could replace this step by an equi-proportional exploration step (similar to Explore-Then-Commit, see B.2) without changing the theoretical properties of our algorithm. Furthermore, the decision maker could also include any additional results collected before the experiment (if the practices have already been tested for some time) in the initialization of the algorithm.

Table A.1: Maize cultivar parametrization in DSSAT.

name      ecotype  P1     P2     P5     G2     G3    PHINT
Sotubaka  IB0001   300.0  0.520  930.0  500.0  6.00  38.90

Table A.2: Automatic planting parametrization in DSSAT. PFRST: starting date of the planting window; PLAST: end date of the planting window; PH2OL: lower limit on soil moisture for automatic planting; PH2OU: upper limit on soil moisture for automatic planting; PH2OD: depth to which average soil moisture is determined for automatic planting; PSTMX: maximum temperature of planting; PSTMN: minimum temperature of planting.

parameter    value
PFRST (DOY)  155
PLAST (DOY)  185
PH2OL (%)    40
PH2OU (%)    100
PH2OD (cm)   30
PSTMX (°C)   40
PSTMN (°C)   10

B.2. Explore-Then-Commit (ETC)

We provide the pseudo-code of the Explore-Then-Commit (ETC) strategy in Algorithm B.2. The noise introduced by random weights and the presence of the maximum observable results in the histories manage the exploration/exploitation dilemma. BCB will favor fertilizer practices with a higher CVaR than the others. However, the algorithm still prevents the under-exploration of fertilizer practices by choosing them with a proper probability, even if, for example, poor YE have been observed due to rare unfavorable weather events. Indeed, with the extra randomness introduced by the random weighting of rewards, poor rewards may receive smaller weights than higher rewards, yielding a good score. The amount of noise introduced by the random weights sampled from the Dirichlet distribution is related to the variance of these random weights. The greater the number of rewards, the smaller the variance and consequently the smaller the noise (Figure B.1). Thereby, the more often a fertilizer practice has been tried by the algorithm, the closer its score gets to the true CVaR of its rewards. The presence of the maximum observable YE acts as an "optimistic bonus" in the computation of the scores, encouraging exploration even for sub-optimal practices, as it raises their initial values when few rewards have been observed.

[Figure A.1, six panels; in each panel the fertilizer practice index (0 to 9) is on the vertical axis.
(a) Yield distributions for soil ITML840104 (horizontal axis: dry grain yield, kg/ha). Stars represent the CVaR at level 30%.
(b) Yield distributions for soil ITML840105 (horizontal axis: dry grain yield, kg/ha). Stars represent the CVaR at level 30%.
(c) Agronomic Nitrogen use Efficiency (ANE) distributions for soil ITML840104 (horizontal axis: nitrogen use efficiency, kg/kg). Stars represent the mean value.
(d) Agronomic Nitrogen use Efficiency (ANE) distributions for soil ITML840105 (horizontal axis: nitrogen use efficiency, kg/kg). Stars represent the mean value.
(e) Yield Excess (YE) distributions for soil ITML840104 with ANEref = 15 kg grain/kg N (horizontal axis: yield excess, kg/ha). Stars represent the CVaR at level 30%.
(f) Yield Excess (YE) distributions for soil ITML840105 with ANEref = 15 kg grain/kg N (horizontal axis: yield excess, kg/ha). Stars represent the CVaR at level 30%.]

Figure A.1: Simulated impact of maize fertilizer practices on grain yield, Agronomic Nitrogen use Efficiency (ANE), Yield Excess (YE) for 105 hypothetical years using a weather generator. Maize cultivar was the same for all simulations. Practices indexes are indicated on the left-hand side of each sub-figure.

C. Theoretical Analysis

This section is devoted to the theoretical analysis of the BCB algorithm. We will mostly adapt the analysis of Baudry et al.
(2021a), and show that the problem of learning with batched data of finite upper bounded size is no harder than the pure online learning problem considered in the original paper.

Algorithm B.1 BCB: identification strategy at cohort level (detailed)

Input: level $\alpha$, horizon $T$, $K$ options, upper bounds $B_1, \dots, B_K$, $\mathbb{F}^c$ the set of all farmers in the cohort
Init.: $\forall k \in \{1, \dots, K\}$: $\mathcal{X}_k = \{B_k\}$, $N_k = 0$; $\mathbb{F}^c_1 = \{f_1, \dots, f_{n_1}\}$; $t = 1$; $\mathcal{A}_1 = \{\emptyset\}$
// Beginning of first season
for $f \in \mathbb{F}^c_1$ do
    Randomly assign a crop management option $a \in \{1, \dots, K\}$ to the farmer $f$
    $\mathcal{A}_1 = \mathcal{A}_1 \cup \{a\}$
end
// End of first season
for $(a, f) \in (\mathcal{A}_1, \mathbb{F}^c_1)$ do
    Receive the result of the option $a$ from farmer $f$: $r_{f,a}$
    Update $\mathcal{X}_a = \mathcal{X}_a \cup \{r_{f,a}\}$, $N_a = N_a + 1$
end
for $t \in \{2, \dots, T\}$ do
    // Beginning of season $t$
    Get $\mathbb{F}^c_t = \{f_1, \dots, f_{n_t}\}$ ; // the set of farmers of the same cohort to provide recommendations
    for $k \in \{1, \dots, K\}$ do
        Update the empirical CVaR of action $k$: $c_{k,t-1} = C_\alpha(\mathcal{X}_k)$
    end
    for $f \in \mathbb{F}^c_t$ do
        Update the empirical regret of farmer $f$: $l_{f,t-1} = R^\alpha_f(t-1)$
    end
    $\mathcal{A}_t = \{\emptyset\}$ ; // the set of recommendations to provide to the farmers
    for $f \in \mathbb{F}^c_t$ do
        for $k \in \{1, \dots, K\}$ do
            Draw $\omega_k = \{w_1, \dots, w_{N_k}\} \sim \mathcal{D}_{N_k}$ ; // Dirichlet distribution with concentration parameter $(1, \dots, 1)$ ($N_k$ times)
            Search $j$ the maximum index such that $\sum_{i=1}^{j} w_i \le \alpha$
            Sort $\mathcal{X}_k$ in increasing order
            Compute $c_k = x_j - \frac{1}{\alpha} \sum_{i=1}^{N_k} w_i \max(x_j - x_i, 0)$ ; // assign a score to action $k$
        end
        $a = \operatorname{argmax}_{k \in \{1, \dots, K\}} c_k$
        $\mathcal{A}_t = \mathcal{A}_t \cup \{a\}$
    end
    for $(a, f) \in (\mathcal{A}_t, \mathbb{F}^c_t)$ do
        Assign action $a$ to farmer $f$
    end
    // End of season $t$
    for $(a, f) \in (\mathcal{A}_t, \mathbb{F}^c_t)$ do
        Receive the result of action $a$ from farmer $f$: $r_{f,a}$
        Update $\mathcal{X}_a = \mathcal{X}_a \cup \{r_{f,a}\}$, $N_a = N_a + 1$
    end
end

Theorem C.1 ($\alpha$-CVaR Regret of BCB). Consider a bandit problem $(F_1, \dots, F_K) \in \mathcal{F}^K$, with respective CVaR$_\alpha$ denoted by $(c_1, \dots, c_K)$ with $c_1 = \max_{k=1,\dots,K} c_k$.
Assume that BCB runs for $T$ seasons, and that at each season $t$ the size of the batch is $n_t \le F \in \mathbb{N}$. Then, for any $\varepsilon > 0$ small enough there exist some $\varepsilon_1 > 0$, $\varepsilon_2 > 0$ such that the regret of BCB satisfies:
\[
\mathcal{R}^\alpha_T \le \sum_{k=2}^{K} \Delta^\alpha_k \left( m^k_T + F + \frac{2 F e^{-2 m^k_T \varepsilon_1^2}}{1 - e^{-2 \varepsilon_1^2}} + C^\alpha_{1,\varepsilon_2} \right),
\]
where $m^k_T = \frac{\log(T) + \log(F)}{\mathcal{K}^{\alpha,\mathcal{F}}_{\inf}(F_k, c_1) - \varepsilon}$ and $C^\alpha_{1,\varepsilon_2}$ is a constant depending only on the distribution $F_1$, the family $\mathcal{F}$ and $\varepsilon_2$.

[Figure B.1 (title: Random weights from Dirichlet distributions; horizontal axis: weights, from 0.00 to 0.25; vertical axis: reward number, 10 and 100).]

Figure B.1: Examples of weights sampled from Dirichlet distributions during BCB execution, respectively for 10 and 100 rewards. The greater the number of rewards, the less variance the weights show. The variance of weights is related to the noise level in the computation of the empirical CVaR of BCB.

It is interesting to compare this regret upper bound to the one obtained in the purely sequential setting, which we recall in Theorem C.2.

Theorem C.2 ($\alpha$-CVaR Regret of B-CVTS with time horizon $S_T$ (adapted from Theorem 3 in Baudry et al. (2021a))). Consider a bandit problem $(F_1, \dots, F_K) \in \mathcal{F}^K$, with respective CVaR$_\alpha$ denoted by $(c_1, \dots, c_K)$ with $c_1 = \max_{k=1,\dots,K} c_k$. Consider a number of data collected $S_T$. Then, for any $\varepsilon > 0$ small enough there exist some $\varepsilon_1 > 0$, $\varepsilon_2 > 0$ such that the CVaR-regret of B-CVTS satisfies
\[
\mathcal{R}^\alpha_T \le \sum_{k=2}^{K} \Delta^\alpha_k \left( n^k_{S_T} + \frac{2 e^{-2 n^k_{S_T} \varepsilon_1^2}}{1 - e^{-2 \varepsilon_1^2}} + C^\alpha_{1,\varepsilon_2} \right),
\]
where $n^k_{S_T} = \frac{\log(S_T)}{\mathcal{K}^{\alpha,\mathcal{F}}_{\inf}(F_k, c_1) - \varepsilon}$ and $C^\alpha_{1,\varepsilon_2}$ is a constant depending only on the distribution $F_1$, the family $\mathcal{F}$ and $\varepsilon_2$.

First, we see that if $F$ is indeed a constant (i.e., does not depend on time), then when $T$ is large enough $F$ has no impact on the scaling of the regret.
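As a rough numerical illustration of this remark, the following minimal sketch compares the leading logarithmic terms of Theorems C.1 and C.2. The function names are hypothetical, and the values chosen for the divergence term in the denominator (`k_inf`) and for the batch size `F` are arbitrary placeholders, not quantities taken from the experiments.

```python
import math


def batch_threshold(T, F, k_inf, eps=0.0):
    """Leading term m_T^k of Theorem C.1: (log T + log F) / (k_inf - eps)."""
    return (math.log(T) + math.log(F)) / (k_inf - eps)


def sequential_threshold(S, k_inf, eps=0.0):
    """Leading term n_S^k of Theorem C.2: log S / (k_inf - eps)."""
    return math.log(S) / (k_inf - eps)


K_INF = 0.05  # hypothetical value of the divergence term in the denominator
F = 30        # hypothetical upper bound on the batch size

for T in (20, 10**3, 10**6):
    m = batch_threshold(T, F, K_INF)
    # Ratio to the term with log T alone, equal to 1 + log F / log T:
    # it decreases towards 1 as T grows, so F does not change the scaling.
    ratio = m / sequential_threshold(T, K_INF)
    print(f"T={T:>8}  m_T^k={m:8.1f}  F/m_T^k={F / m:.3f}  ratio={ratio:.3f}")

# With exactly F farmers per season, S_T = F * T and both leading terms coincide.
assert math.isclose(batch_threshold(20, F, K_INF), sequential_threshold(20 * F, K_INF))
```

The convergence of the ratio is only logarithmic in $T$, so for short horizons the additive term $F$ can remain a visible part of the bound.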
In our proof the main impact of the batch setting is an additive term $F$ for each arm, hence the regret becomes close to the one of the sequential setting once $m^k_T \gg F$. Finally, if the number of farmers in each batch is exactly $F$ at each step, then $S_T = FT$ and $m^k_T = n^k_{S_T}$, hence the asymptotically dominant (logarithmic) term is the same in the two settings.

These theoretical results show that learning with batch feedback does not introduce theoretical limitations in our setting, and so the BCB algorithm is theoretically grounded.

Algorithm B.2 ETC: identification strategy at cohort level

Input: level $\alpha$, horizon $T$, $K$ options, $\mathbb{F}^c$ the set of all farmers in the cohort, $t_{\text{trials}}$ the number of years of trials
Init.: $\forall k \in \{1, \dots, K\}$: $N_k = 0$
// Do trials during $t_{\text{trials}}$ years
for $t \in \{1, \dots, t_{\text{trials}}\}$ do
    // Beginning of the season $t$
    Get $\mathbb{F}^c_t = \{f_1, \dots, f_{n_t}\}$ ; // get the farmers willing to participate
    $\mathcal{A}_t = \{\emptyset\}$
    Fill $\mathcal{A}_t$ by uniformly distributing the $K$ options to the farmers in $\mathbb{F}^c_t$
    // End of the season $t$
    for $(a, f) \in (\mathcal{A}_t, \mathbb{F}^c_t)$ do
        Receive the result of the option $a$ from farmer $f$: $r_{f,a}$
        Update $\mathcal{X}_a = \mathcal{X}_a \cup \{r_{f,a}\}$, $N_a = N_a + 1$
    end
end
for $k \in \{1, \dots, K\}$ do
    Compute the empirical CVaR of action $k$: $c_{k,t-1} = C_\alpha(\mathcal{X}_k)$
end
$a_{\max} = \operatorname{argmax}_{k \in \{1, \dots, K\}} c_k$ ; // get the action that best performed during trials
// After trial phase, always recommend the action that best performed during trials
for $t \in \{t_{\text{trials}} + 1, \dots, T\}$ do
    // Beginning of the season $t$
    Get $\mathbb{F}^c_t = \{f_1, \dots, f_{n_t}\}$
    for $f \in \mathbb{F}^c_t$ do
        Assign option $a_{\max}$ to the farmer $f$
    end
    // End of the season $t$
    for $f \in \mathbb{F}^c_t$ do
        Receive the result of the option $a_{\max}$ from farmer $f$: $r_{f,a_{\max}}$
        Update $\mathcal{X}_{a_{\max}} = \mathcal{X}_{a_{\max}} \cup \{r_{f,a_{\max}}\}$, $N_{a_{\max}} = N_{a_{\max}} + 1$
    end
end

Proof of Theorem C.1. As in the proof of Baudry et al.
(2021a) we will decompose the expected number of pulls of each sub-optimal arm inside the cohort according to several possible events, corresponding to "good" scenarios (the empirical distributions accurately reflect the true distributions) and "bad" ones (the empirical distributions give a wrong idea of the true performance of some arms) for the trajectory of the bandit algorithm. We denote by $T$ the number of seasons in the experiments and $n_t$ the number of farmers at each season $t$ for this cohort, and by $F$ the total number of farmers available for the experiment. Then, the expected number of pulls of arm $k$ during the total duration of the experiment inside the cohort is
\[
\mathbb{E}[N_k(T)] = \mathbb{E}\left[ \sum_{t=1}^{T} \sum_{f=1}^{n_t} \mathbb{1}(A_{t,f} = k) \right],
\]
where $A_{t,f}$ denotes the recommendation to farmer $f$ at season $t$.

The first step of the proof consists in considering the number of pulls of $k$ when its sample size is larger (resp. smaller) than some fixed threshold $m_T$, that we will specify later:
\[
\begin{aligned}
\mathbb{E}[N_k(T)] &= \mathbb{E}\left[ \sum_{t=1}^{T} \sum_{f=1}^{n_t} \mathbb{1}(A_{t,f} = k) \right] \\
&\le \mathbb{E}\left[ \sum_{t=1}^{T} \sum_{f=1}^{n_t} \mathbb{1}(A_{t,f} = k, N_k(t-1) \le m_T) \right] + \mathbb{E}\left[ \sum_{t=1}^{T} \sum_{f=1}^{n_t} \mathbb{1}(A_{t,f} = k, N_k(t-1) \ge m_T) \right].
\end{aligned}
\]
We now consider the first term and introduce the random variable $\tau = \sup\{t \le T : N_k(t-1) \le m_T\}$. By construction, $\tau$ is the last season for which the total number of observations for arm $k$ is smaller than $m_T$.
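The role of $\tau$ can be checked numerically: whatever the assignment rule, the pulls of arm $k$ made in seasons that start with $N_k(t-1) \le m_T$ cannot exceed $m_T$ plus one batch of at most $F$ farmers. The following minimal sketch illustrates this counting argument; the function name and the random stand-in for the recommendation rule (`choose_k`) are hypothetical and are not part of BCB.

```python
import random


def pulls_below_threshold(batch_sizes, choose_k, m_T):
    """Count pulls of arm k made in seasons that start with N_k(t-1) <= m_T."""
    pulls_before = 0  # N_k(t-1): pulls of arm k before the current season
    counted = 0
    for n_t in batch_sizes:
        # Number of farmers of this season who are assigned arm k.
        pulls_now = sum(1 for _ in range(n_t) if choose_k())
        if pulls_before <= m_T:
            counted += pulls_now
        pulls_before += pulls_now
    return counted


rng = random.Random(0)
F, T, m_T = 12, 200, 25  # hypothetical batch bound, horizon and threshold
batch_sizes = [rng.randint(1, F) for _ in range(T)]
counted = pulls_below_threshold(batch_sizes, lambda: rng.random() < 0.4, m_T)
print(counted, "<=", m_T + F)
assert counted <= m_T + F
```

The bound holds for any rule because the counted seasons form a prefix ending at $\tau$: the seasons before $\tau$ contribute $N_k(\tau - 1) \le m_T$ pulls, and season $\tau$ itself contributes at most $n_\tau \le F$.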
Using the basic properties of $\tau$ we obtain that:
\[
\begin{aligned}
\sum_{t=1}^{T} \sum_{f=1}^{n_t} \mathbb{1}(A_{t,f} = k, N_k(t-1) \le m_T) &\le \sum_{t=1}^{\tau} \sum_{f=1}^{n_t} \mathbb{1}(A_{t,f} = k, N_k(t-1) \le m_T) + \sum_{t=\tau+1}^{T} \sum_{f=1}^{n_t} \mathbb{1}(A_{t,f} = k, N_k(t-1) \le m_T) \\
&\le N_k(\tau - 1) + \sum_{f=1}^{n_\tau} \mathbb{1}(A_{\tau,f} = k) \\
&\le m_T + F,
\end{aligned}
\]
where the second sum of the first line vanishes by definition of $\tau$.

As this result does not depend on the value of $\tau$, we can then obtain:
\[
\mathbb{E}[N_k(T)] \le m_T + F + \underbrace{\mathbb{E}\left[ \sum_{t=1}^{T} \sum_{f=1}^{n_t} \mathbb{1}(A_{t,f} = k, N_k(t-1) \ge m_T) \right]}_{A}.
\]
At this step, the only difference with the purely sequential bandit problem is the additional $F$. We now consider the term $A$, that we further analyze according to three events: (1) the empirical distribution of arm $k$ is not close to its true distribution, (2) the empirical distribution of arm $k$ is close to its true distribution but the "noisy" CVaR computed for arm $k$ over-estimates its true CVaR, and (3) the "noisy" CVaR computed for the optimal arm 1 under-estimates its true CVaR. Classically in bandit analysis, we decompose the number of pulls of arm $k$ according to these three events, as at least one of them must be true when $A_{t,f} = k$ holds, that is
\[
\{A_{t,f} = k\} \subset \{F_{k,t-1} \notin \mathcal{B}_{\varepsilon_1}(F_k)\} \cup \{F_{k,t-1} \in \mathcal{B}_{\varepsilon_1}(F_k),\ c_{k,t,f} \ge c_1 - \varepsilon_2\} \cup \{c_{1,t,f} \le c_1 - \varepsilon_2\},
\]
where $\mathcal{B}_{\varepsilon_1}(F_k)$ is an $\varepsilon_1$-Lévy ball around $F_k$, and $\varepsilon_1$, $\varepsilon_2$ are two small positive constants.
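For concreteness, the "noisy" CVaR $c_{k,t,f}$ appearing in events (2) and (3) is the score that Algorithm B.1 assigns to arm $k$ for farmer $f$. The following is a minimal sketch of such a score, not the authors' implementation: it assumes NumPy, the function names are hypothetical, one Dirichlet weight is drawn per element of the history including the upper bound (an assumption on the indexing), and the CVaR is computed with the Rockafellar and Uryasev (2000) maximization form evaluated at every support point rather than by locating the index $j$.

```python
import numpy as np


def weighted_cvar(values, weights, alpha):
    """CVaR at level alpha (mean of the worst alpha-fraction) of a discrete law."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Rockafellar-Uryasev form: max over x of x - E[(x - X)^+] / alpha.
    # The objective is concave and piecewise linear in x, so the maximum
    # is attained at one of the support points.
    shortfall = np.maximum(values[:, None] - values[None, :], 0.0) @ weights
    return float(np.max(values - shortfall / alpha))


def bcb_score(rewards, upper_bound, alpha, rng):
    """Noisy CVaR of one arm: Dirichlet re-weighting of its history."""
    # The history contains the observed rewards plus the maximum observable result.
    history = np.append(np.asarray(rewards, dtype=float), upper_bound)
    weights = rng.dirichlet(np.ones(history.size))
    return weighted_cvar(history, weights, alpha)


rng = np.random.default_rng(0)
few = [1200.0, 300.0, 900.0]                    # hypothetical yield excesses
many = list(rng.normal(800.0, 300.0, size=200).clip(-1000.0, 4000.0))
for name, rewards in (("few rewards", few), ("many rewards", many)):
    scores = [bcb_score(rewards, 4000.0, 0.3, rng) for _ in range(1000)]
    print(name, "score std:", round(float(np.std(scores)), 1))
```

With an empty history the score equals the upper bound, which matches Remark B.1, and the spread of the scores shrinks as more rewards are collected, in line with Figure B.1.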
This leads to
\[
\begin{aligned}
A \le{} & \underbrace{\mathbb{E}\left[ \sum_{t=1}^{T} \sum_{f=1}^{n_t} \mathbb{1}(A_{t,f} = k, N_k(t-1) \ge m_T, F_{k,t-1} \notin \mathcal{B}_{\varepsilon_1}(F_k)) \right]}_{A_1} \\
& + \underbrace{\mathbb{E}\left[ \sum_{t=1}^{T} \sum_{f=1}^{n_t} \mathbb{1}(A_{t,f} = k, N_k(t-1) \ge m_T, F_{k,t-1} \in \mathcal{B}_{\varepsilon_1}(F_k), c_{k,t,f} \ge c_1 - \varepsilon_2) \right]}_{A_2} \\
& + \underbrace{\mathbb{E}\left[ \sum_{t=1}^{T} \sum_{f=1}^{n_t} \mathbb{1}(A_{t,f} = k, N_k(t-1) \ge m_T, c_{1,t,f} \le c_1 - \varepsilon_2) \right]}_{A_3}.
\end{aligned}
\]

Upper bounding $A_1$. Denoting by $F_{k,n}$ the empirical distribution of arm $k$ after a total number of pulls $n$ (instead of after season $t$), we obtain
\[
\begin{aligned}
A_1 &:= \mathbb{E}\left[ \sum_{t=1}^{T} \sum_{f=1}^{n_t} \mathbb{1}(A_{t,f} = k, N_k(t-1) \ge m_T, F_{k,t-1} \notin \mathcal{B}_{\varepsilon_1}(F_k)) \right] \\
&\le \mathbb{E}\left[ \sum_{t=1}^{T} \mathbb{1}(N_k(t-1) \ge m_T, F_{k,t-1} \notin \mathcal{B}_{\varepsilon_1}(F_k)) \sum_{f=1}^{n_t} \mathbb{1}(A_{t,f} = k) \right] \\
&\le \mathbb{E}\left[ \sum_{t=1}^{T} \sum_{n=m_T}^{T} \mathbb{1}(N_k(t-1) = n, F_{k,t-1} \notin \mathcal{B}_{\varepsilon_1}(F_k)) \sum_{f=1}^{n_t} \mathbb{1}(A_{t,f} = k) \right],
\end{aligned}
\]
with a union bound on the number of pulls. Under $N_k(t-1) = n$ it holds that $F_{k,t-1} = F_{k,n}$, and so we can further write that
\[
\begin{aligned}
A_1 &\le \mathbb{E}\left[ \sum_{t=1}^{T} \sum_{n=m_T}^{T} \mathbb{1}(N_k(t-1) = n, F_{k,n} \notin \mathcal{B}_{\varepsilon_1}(F_k)) \sum_{f=1}^{n_t} \mathbb{1}(A_{t,f} = k) \right] \\
&\le \mathbb{E}\left[ \sum_{n=m_T}^{T} \mathbb{1}(F_{k,n} \notin \mathcal{B}_{\varepsilon_1}(F_k)) \sum_{t=1}^{T} \sum_{f=1}^{n_t} \mathbb{1}(A_{t,f} = k, N_k(t-1) = n) \right] \\
&\le F\, \mathbb{E}\left[ \sum_{n=m_T}^{T} \mathbb{1}(F_{k,n} \notin \mathcal{B}_{\varepsilon_1}(F_k)) \right] \\
&\le F \sum_{n=m_T}^{+\infty} \mathbb{P}(F_{k,n} \notin \mathcal{B}_{\varepsilon_1}(F_k)).
\end{aligned}
\]
Finally, using the Dvoretzky–Kiefer–Wolfowitz inequality (Massart, 1990) we obtain:
\[
A_1 \le F \sum_{n=m_T}^{+\infty} 2 e^{-2 n \varepsilon_1^2} \le \frac{2 F e^{-2 m_T \varepsilon_1^2}}{1 - e^{-2 \varepsilon_1^2}}.
\]
This upper bound holds for any choice of $m_T$, $\varepsilon_1$, and we remark that if $m_T \to +\infty$ then $A_1 \to 0$.

Upper bounding $A_2$. The term $A_2$ is then handled with similar tricks, and the arguments used in Baudry et al. (2021a).
\[
A_2 := \mathbb{E}\left[ \sum_{t=1}^{T} \sum_{f=1}^{n_t} \mathbb{1}(A_{t,f} = k, N_k(t-1) \ge m_T, F_{k,t-1} \in \mathcal{B}_{\varepsilon_1}(F_k), c_{k,t,f} \ge c_1 - \varepsilon_2) \right]
\]
≤ 𝔼 [ 𝑇 ∑ 𝑡=1 𝐹 ∑ 𝑓=1 1(𝑁𝑘(𝑡 − 1) ≥ 𝑚𝑇 , 𝐹𝑘,𝑡−1 ∈ 𝜀1 (