Highlights

A new adaptive identification strategy of best crop management with farmers

Romain Gautron, Dorian Baudry, Myriam Adam, Gatien N. Falconnier, Gerrit Hoogenboom, Brian King, Marc Corbeels

• We introduce a novel adaptive identification strategy of best crop management practices with farmers.
• Minimizing yield losses in field trials is an exploration-exploitation dilemma.
• Risk-aware bandit algorithms model farmers’ risk aversion in decision making.
• A bandit algorithm identifies best nitrogen fertilizer practices for maize in simulated conditions.
• Our bandit algorithm outperforms Explore-Then-Commit (ETC) methods in a simulated experiment.


A new adaptive identification strategy of best crop management with farmers

Romain Gautron (a,b,c), Dorian Baudry (d), Myriam Adam (e,f,g), Gatien N. Falconnier (a,b,h), Gerrit Hoogenboom (i), Brian King (c) and Marc Corbeels (j)

(a) AIDA, Université de Montpellier, Montpellier, France
(b) CIRAD, Montpellier, France
(c) CGIAR Platform for Big Data in Agriculture, Alliance of Bioversity International and CIAT, Km 17, Recta Cali-Palmira, 763537, Colombia
(d) CNRS, Université de Lille, INRIA, Villeneuve d’Ascq, France
(e) CIRAD, UMR AGAP Institut, Bobo-Dioulasso 01, Burkina Faso
(f) UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
(g) Institut National de l’Environnement et de Recherches Agricoles (INERA), Burkina Faso
(h) International Maize and Wheat Improvement Centre (CIMMYT)-Zimbabwe, 12.5 km Peg Mazowe Road, Harare, Zimbabwe
(i) Agricultural and Biological Engineering, 289 Frazier Rogers Hall, University of Florida, Gainesville, Florida, 32611-0570, USA
(j) International Institute of Tropical Agriculture, PO Box 30772, Nairobi, 00100, Kenya

ARTICLE INFO

Keywords: Crop management; Experimental Design; Bandit Algorithm; Artificial Intelligence; Risk; Uncertainty

ABSTRACT

Identification of best performing fertilizer practices with on-farm trials is challenging, in particular in rainfed farming due to weather uncertainty. However, it remains crucial to test a range of viable practices to ascertain their performances, given that they are not known beforehand. This process also involves the testing of practices that could potentially yield inferior results in comparison to the best available practice(s). To identify a best management practice, an “intuitive strategy” typically sets up multi-year, multi-location field trials, wherein each practice is tested in a proportionally equal manner over a set number of years. Our objective was to provide an identification strategy for nitrogen fertilizer management by designing a bandit learning algorithm. We aimed for the bandit algorithm to be better at minimizing farmers’ losses occurring from the testing of management practices that do not perform best, compared with the “intuitive strategy”, which was formalized as the Explore-Then-Commit strategy. Our case study was for maize production in southern Mali. The bandit framework is a machine learning approach in which an agent learns from feedback over time and accordingly selects actions in order to maximize its cumulative reward in the long term. To mimic the maize responses to nitrogen fertilization, we used the Decision Support System for Agrotechnology Transfer (DSSAT) crop model. We compared nitrogen fertilizer practices using a risk-aware measure, the Conditional Value-at-Risk (CVaR), and a novel agronomic metric, the Yield Excess (YE). The YE accounts for both grain yield and agronomic nitrogen use efficiency. The bandit algorithm performed better than the intuitive strategy: it minimized farmers’ yield losses during the identification process. This study is a methodological step which opens up new horizons for risk-aware identification of the performance of a range of crop management practices in real conditions.

Gautron et al.: Preprint submitted to Elsevier Page 1 of 39


Bandits for best crop management identification

1. Introduction

Identifying site-specific best-performing crop management is crucial for farmers to increase their income from crop production, but also for minimizing the negative environmental impacts of cropping activities (Tilman et al., 2002). However, due to weather variability, the identification of these practices can be challenging, in particular with rainfed farming: what worked best in a wet year or a year with sufficient rainfall might not work in the next year, when rainfall is lower (Affholder, 1995). The performance of crop management at a given site has an underlying “hidden” distribution due to inter-annual weather variability, thus creating great uncertainty (e.g., Fosu-Mensah et al., 2012). Because crop management decisions are recurrent, i.e. they are repeated for each new crop growing season, the identification of best available crop management falls into the category of sequential decision making under uncertainty (Gautron et al., 2022). Computer-based decision support tools can allow farmers to make more informed (less uncertain) decisions about their cropping practices from one year to the next, and can facilitate farmers’ risk management in the face of seasonal weather variability (Hochman and Carberry, 2011). Numerous decision support tools for crop management, of widely ranging complexity, have been introduced to farmers with varying degrees of success (Gautron et al., 2022).

Machine learning (ML) and more generally artificial intelligence (AI) can help address sequential decision making under uncertainty. Specifically, (multi-armed) bandit algorithms are a type of learning algorithm designed to perform in uncertain environments. The name comes from imagining a gambler at a row of slot machines, who has to repeatedly decide which machine to play for a given total number of trials. Thus, bandit problems (Lattimore and Szepesvári, 2020) consider a decision-maker, called the agent, that repeatedly faces a choice between contending actions, and that has to iteratively improve its decision-making with trials in order to get the highest reward in expectation. The canonical bandit problem originates from clinical trials with sequential drug allocation (Thompson, 1933). At each time step, the agent chooses one action (e.g., administering a particular drug to a patient) amongst a set of possible actions (e.g., administering other types of drugs). Each action provides a reward (e.g., a certain level of tumor cell reduction after taking a particular drug), drawn from a corresponding unknown reward distribution (e.g., the distribution of tumor cell reduction because of the drug). The best available action has the reward distribution with the highest mean reward (e.g., the highest mean level of tumor cell reduction). The objective of the agent is to sequentially choose actions such that the expected sum of rewards is maximized. Maximizing the total expected reward is equivalent to minimizing the regret, which is a measure of the total losses that occur with actions that do not perform best (Robbins, 1952).
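As a minimal illustration of the regret notion described above, the following Python sketch computes the expected regret of a sequence of actions when the mean reward of every action is known. The function name and the numerical values are hypothetical and serve only as an example; in a real bandit problem the means are unknown to the agent.

```python
from typing import Sequence


def expected_regret(mean_rewards: Sequence[float], chosen_actions: Sequence[int]) -> float:
    """Expected regret of a sequence of actions.

    mean_rewards[k] is the (normally unknown) mean reward of action k;
    chosen_actions[t] is the index of the action played at step t.
    """
    best_mean = max(mean_rewards)
    # Each step loses the gap between the best mean and the chosen action's mean.
    return sum(best_mean - mean_rewards[k] for k in chosen_actions)


# Hypothetical mean rewards for three actions (illustrative values only).
means = [1.0, 2.0, 3.0]
print(expected_regret(means, [0, 1, 2, 2, 2]))  # gaps: 2.0 + 1.0 + 0 + 0 + 0
```

Always playing the best action gives zero regret, so maximizing the expected sum of rewards and minimizing the regret select the same sequences of actions.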

∗Corresponding author
r.gautron@cgiar.org (R. Gautron); dorian.baudry@ensae.fr (D. Baudry); myriam.adam@cirad.fr (M. Adam);

gatien.falconnier@cirad.fr (G.N. Falconnier); gerrit@ufl.edu (G. Hoogenboom); b.king@cgiar.org (B. King);
marc.corbeels@cirad.fr (M. Corbeels)

ORCID(s): 0000-0002-3218-7215 (R. Gautron)


In a sequential decision-making setting, the agent refines its next action based on all previous results in an iterative way (which can be thought of as an iterative “active sampling”). To know how a given action performs, the agent needs a sufficient number of (possibly poor) rewards, which requires selecting actions that are not the best performers: this is the exploration phase. To maximize the expected sum of rewards, the actions that have provided good results so far must be selected more frequently: this is the exploitation phase. Bandit algorithms aim at finding the right balance between exploration and exploitation. Through balancing exploration and exploitation, a bandit algorithm may come to describe the underlying reward distribution associated with each action with high confidence, allowing the learner to exploit and attain the best expected cumulative reward possible. Maximizing the expected reward (by minimizing the regret) is a key objective in bandit problems, and the convergence and robustness of the algorithm are important factors in achieving this goal. To do so, bandit algorithms compute and optimize statistics of the reward distribution associated with each action (e.g., upper confidence bounds). Optimizing these statistics is mathematically proven to manage the exploration-exploitation dilemma optimally (Lattimore and Szepesvári, 2020). The exploration-exploitation dilemma is a reality for farmers when implementing crop management. Farmers typically want to minimize overall crop yield losses and therefore may explore the performance of promising new crop management practices on small test field plots (Cerf and Meynard, 2006; Evans et al., 2017). Thus, they avoid potentially large crop yield losses from new practices by managing a gradual transition between the current practices and the promising new one(s), based on the results they obtain on the small test plots.

In our study, we consider a set of pre-defined crop management practices suitable to the crop growing conditions that a group of farmers experience. Yet, the precise practice(s) that will perform best under these particular growing conditions are not known with certainty. The objective of our study is to identify the best crop management through a novel strategy which optimally addresses the exploration-exploitation dilemma. During the identification, we want the strategy to minimize farmers’ losses that result from testing crop management practices that do not perform best in the field. Losses are defined as the differences in performance between a given crop management practice and the true best performing practice, which is unknown beforehand. These are, for example, crop yield losses. We set as baseline the “intuitive strategy”, which consists in identifying the best crop management practice through multi-location, multi-year field trials in which the whole set of pre-defined practices is tested in an equiproportional way during a fixed number of years. We compare this intuitive strategy to a novel identification strategy based on a bandit algorithm. Thus, we test the hypothesis that a bandit algorithm can help farmers better identify the best crop management practice for their context from on-farm trials, by minimizing crop yield losses from crop management practices that do not perform best in these trials.

Our study considers the case of nitrogen fertilization of rainfed maize production in southern Mali. We compare both identification strategies of best available nitrogen fertilization practices based on results from maize growth simulations with a calibrated crop model in order to mimic real-world performance of crop management. We emphasize that we do not strictly optimize the nitrogen management itself, but rather the sequential choice (i.e. identification strategy) over time between contrasting nitrogen management practices that were preselected. As an example of crop management, we focused on nitrogen fertilization, but our study has a broader objective, i.e. providing a novel method for identifying best crop management practices from any set of predetermined practices (e.g., varietal choice, irrigation). Also, the identification strategies do not depend on model simulations, and ultimately could be applied in real field conditions. We used the Decision Support System for Agrotechnology Transfer (DSSAT) crop simulator (Hoogenboom et al., 2019) as crop model.

2. Methods

2.1. The virtual crop management problem

In our virtual crop management identification problem, a population or ensemble of virtual farmers joined a participatory experiment to identify the best nitrogen fertilizer practices for maize production in their fields, i.e. in the Cercle of Koutiala in southern Mali (Coulibaly et al., 2017). The distribution of soil types of the fields of the group of virtual farmers was representative of the region (Table 1). A total population of 500 virtual farmers was considered. Each virtual farmer belonged to a cohort that corresponded to farmers growing maize on the same soil type. For each cohort, we wanted to identify the best nitrogen fertilizer practice from a set of candidate practices (see Table 2 and Section 2.2 for the performance measures we considered). The research team also set the objective of limiting the maize yield losses of individual virtual farmers that could arise from poor nitrogen fertilizer practice recommendations during the identification process.

At the beginning of each crop growing season, we assumed that a random number of virtual farmers (drawn uniformly between 250 and 350) from the total population of 500 farmers volunteered to apply the fertilizer applications recommended by the research team. Each year, the group of volunteers varied in size and in the representation of cohorts, as could occur in reality (Figure 1). Thus, researchers did not control the composition of the group of volunteers. Each virtual farmer indicated the fields and corresponding soil types on which she/he planned to grow maize. Following the identification strategies, researchers then provided a fertilizer recommendation (Table 3) to each virtual farmer for the ongoing growing season, depending on her/his soil type, i.e. cohort. At the end of the season, the farmers shared their results in terms of maize grain yields with the research team, allowing the team to refine the recommendations for the next season. The whole experiment was repeated over 20 consecutive years following the same steps. Figure 2a illustrates this process, and corresponds to the steps described in Figure 1.
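The seasonal census described above can be sketched in Python as follows. This is only an illustration under stated assumptions: cohort sizes are taken as exactly proportional to the soil shares of Table 1, the number of volunteers is drawn as an integer uniformly in [250, 350] (bounds included), and the function names and the seed are hypothetical.

```python
import random
from collections import Counter

# Share (%) of each soil type in the study area, taken from Table 1.
SOIL_SHARES = {
    "ITML840101": 7, "ITML840102": 9, "ITML840103": 21, "ITML840104": 4,
    "ITML840105": 24, "ITML840106": 27, "ITML840107": 8,
}
POPULATION_SIZE = 500
N_YEARS = 20


def build_population():
    """Assign each virtual farmer to a cohort (soil type) following Table 1."""
    population = []
    for soil, share in SOIL_SHARES.items():
        population += [soil] * (POPULATION_SIZE * share // 100)
    return population


def draw_volunteers(population, rng):
    """Draw this season's volunteers: a uniform number between 250 and 350."""
    n_volunteers = rng.randint(250, 350)  # both bounds included (assumption)
    return rng.sample(population, n_volunteers)


rng = random.Random(0)  # fixed seed, for reproducibility of the sketch only
population = build_population()
for year in range(1, N_YEARS + 1):
    volunteers = draw_volunteers(population, rng)
    per_cohort = Counter(volunteers)
    # A real run would now ask each cohort's identification strategy for
    # recommendations, simulate yields, and feed the results back.
    print(year, len(volunteers), dict(per_cohort))
```

Because the volunteers are sampled from the whole population, the number of observations available to each cohort changes from one season to the next, which is what the identification strategies have to cope with.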

[Figure 1 diagram: the virtual farmer population is split into cohorts by soil type (cohort 1 on soil 1, cohort 2 on soil 2, ...). Each season, a census of the virtual farmer volunteers is taken, the identification strategy of each cohort assigns fertilizer practices to its volunteers, and the volunteers report their yields.]

Figure 1: Set-up of the numerical experiment with virtual farmers to identify best nitrogen fertilizer practices. Virtual farmers (𝑛 = 500) were grouped by cohorts (𝑐 = 7, Table 1), sharing the same soil type. Each cohort defines an independent identification problem of best nitrogen fertilizer practice. A cohort is represented by identical symbols (only four of the seven cohorts are represented). At the start of the maize growing season, a random number of individuals (𝑛 = 250 to 350) from the overall farmer population volunteered to test a certain fertilizer practice from a set of pre-selected practices (Table 2). The specific fertilizer practice to test was defined by the researcher for each cohort independently, according to the identification strategy employed (see Section 2.1). At the end of the season, the virtual farmers reported the maize yield from their fields. Maize yields were generated through DSSAT model simulations (see Section A in Supplementary Materials). The process was repeated over 20 growing seasons.

Nitrogen fertilizer practices. Ten nitrogen fertilizer practices were considered as recommendations in the virtual experiment (see Table 2). Practices 0 to 7 represent the following set of split fertilizer practices for a total amount of 135 kg N/ha applied:

- Two split applications (Practice 0): 15 kg N/ha at 15 days after planting (DAP), and 120 kg N/ha at 30 DAP.
- Three split applications (Practice 4): 15 kg N/ha at 15 DAP, 60 kg N/ha at 30 DAP and 60 kg N/ha at 45 DAP.
- Split applications according to the rainfall amount (Practices 2, 3 and 6, 7): 2nd and 3rd top-dressing applications only if the cumulative rainfall amount from the start of the season to 30 DAP exceeds the 30th percentile of historical rainfall, i.e. 200 mm.
- Split applications according to plant nitrogen status (Practices 1, 3 and 5, 7): 2nd and 3rd top-dressing applications only if the simulated nitrogen stress factor (NSTRES in DSSAT, see below) exceeds 0.2 (0 standing for no stress, 1 for maximal stress) at 30 DAP, thereby mimicking the use of a portable chlorophyll meter to monitor plant nitrogen status (e.g. Kalaji et al., 2017).
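The conditional rules listed above can be sketched in Python as follows. This is an illustrative reading of the rules, not the simulation code of the study: the class and function names are hypothetical, and we assume that, for practices combining both thresholds (3 and 7), both conditions must hold for the top-dressing applications to take place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitPractice:
    """One of practices 0 to 7 in Table 2 (amounts in kg N/ha)."""
    index: int
    rainfall_threshold: bool   # top-dress only if rainfall is sufficient
    nstres_threshold: bool     # top-dress only if the crop is nitrogen-stressed
    at_15_dap: float
    at_30_dap: float
    at_45_dap: float


RAINFALL_THRESHOLD_MM = 200.0  # 30th percentile of historical rainfall (from the text)
NSTRES_THRESHOLD = 0.2         # 0 = no stress, 1 = maximal stress


def total_nitrogen(practice, rainfall_to_30_dap_mm, nstres_at_30_dap):
    """Total N applied (kg N/ha) given the conditions observed at 30 DAP."""
    top_dress = True
    if practice.rainfall_threshold and rainfall_to_30_dap_mm <= RAINFALL_THRESHOLD_MM:
        top_dress = False
    # Assumption: when both thresholds are active, both must be exceeded.
    if practice.nstres_threshold and nstres_at_30_dap <= NSTRES_THRESHOLD:
        top_dress = False
    applied = practice.at_15_dap
    if top_dress:
        applied += practice.at_30_dap + practice.at_45_dap
    return applied


practice_0 = SplitPractice(0, False, False, 15, 120, 0)
practice_7 = SplitPractice(7, True, True, 15, 60, 60)
print(total_nitrogen(practice_0, 150.0, 0.1))  # unconditional: full 135 kg N/ha
print(total_nitrogen(practice_7, 250.0, 0.4))  # wet and stressed: 135 kg N/ha
print(total_nitrogen(practice_7, 150.0, 0.4))  # too dry: only the 15 kg N/ha basal dose
```

Under this reading, 135 kg N/ha is the maximum total amount for practices 1 to 3 and 5 to 7; the amount actually applied in a given season depends on the conditions at 30 DAP.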

[Figure 2 diagrams. Panel (a), for T years: 1.a farmer population; 1.b get volunteer farmers for current year; 2 researchers’ identification strategies; 3 assign the specific fertilizer practices to the volunteers (beginning of the season); 4 get all volunteers’ yield outcomes (end of the season); year ← year + 1. Panel (b), for T times: 1 choose an action 𝑘𝑡 from 𝐾 actions; 2 make the action 𝑘𝑡; 3 observe an uncertain result 𝑟𝑡 of action 𝑘𝑡; t ← t + 1.]

(a) Best fertilizer practice identification process. At the start of the season, a number of farmers (𝑛 = 250 to 350) volunteer (step 1.b) to test fertilizer practices recommended by the researchers following an identification strategy (steps 2 and 3). At the end of each season, the farmers share their yield outcomes with the experts (step 4). The experts use these results to improve their fertilizer recommendations for the next growing season. The process is repeated for a total of 𝑇 = 20 years.

(b) Canonical bandit problem. For 𝑇 times, an agent sequentially makes decisions on an action 𝑘𝑡 from the set {1,⋯ , 𝐾} of possible actions (step 1). After making the action 𝑘𝑡 (step 2), the agent observes an uncertain result 𝑟𝑡 (step 3). This result is sampled from a fixed distribution, unknown to the agent, which corresponds to the effect of action 𝑘𝑡.

Figure 2: Schematic representation of the ensemble best fertilization identification process (a) and the canonical bandit problem (b).

Split fertilizer applications were considered in order to adjust the amount of nitrogen applied to the likely crop demand as the season develops. This adjustment can rely on factors such as weather conditions and crop performance (Piha, 1993).

Practice 8 corresponds to the recommended fertilizer application for maize (70 kg N/ha) in the study region, which was determined based on model simulations (Huet et al., 2022), i.e. the average of the nitrogen fertilizer rates that were expected to result in maximum positive return on fertilizer investment (Getnet et al., 2016). Practice 9 (180 kg N/ha) corresponds to a nitrogen fertilizer application that is likely excessive. In our model simulations (see below), the type of nitrogen fertilizer applied for all practices was set as ammonium nitrate broadcast on the soil surface.

Maize growth simulations. In order to get a proxy for real-world performances of the maize nitrogen fertilizer practices, we simulated maize growth responses to fertilization under the growing conditions of the Cercle of Koutiala in southern Mali using gym-DSSAT v0.0.7 developed from DSSAT v4.7 (Gautron and Padrón González, 2022).

Table 1
Main properties of the soil types of the fields of farmers growing maize in Koutiala, Mali (Adam et al., 2020).

Soil name    Texture           SLDR   SLOC   SLDP   AWCH   pH    Prop.
ITML840101   clay loam         0.60   0.20   110    115    5.7   7
ITML840102   loam              0.60   0.45   100    124    5.5   9
ITML840103   silty loam        0.60   0.27   160    98     6.5   21
ITML840104   silty clay loam   0.25   0.70   105    101    5.5   4
ITML840105   silty clay loam   0.40   0.38   120    108    5.8   24
ITML840106   loam              0.60   0.30   110    115    5.7   27
ITML840107   silty clay loam   0.25   0.60   105    101    5.5   8

‘SLDR’: soil drainage rate (fraction/day); ‘SLOC’: soil organic matter (g C/100 g soil) in the 0-30 cm topsoil; ‘SLDP’: soil depth (cm); ‘AWCH’: soil available water-holding capacity (mm); ‘pH’: pH in water; ‘Prop.’: percentage of each soil type present in the study area.

Table 2
Nitrogen fertilizer practices for maize considered during the virtual experiment in Koutiala, southern Mali. The inclusion of rainfall and plant nitrogen stress as threshold factors in the fertilizer practice is denoted by “Yes” or “No”. Application amounts are in kg N/ha.

Practice   Max. no. of    Rainfall    NSTRES      Application   Application   Application   Max. total
index      applications   threshold   threshold   at 15 DAP     at 30 DAP     at 45 DAP     amount applied
0          2              No          No          15            120           0             135
1          2              No          Yes         15            120           0             135
2          2              Yes         No          15            120           0             135
3          2              Yes         Yes         15            120           0             135
4          3              No          No          15            60            60            135
5          3              No          Yes         15            60            60            135
6          3              Yes         No          15            60            60            135
7          3              Yes         Yes         15            60            60            135
8          2              No          No          23            0             47            70
9          3              No          No          60            60            60            180

‘NSTRES’ stands for plant nitrogen stress and ‘DAP’ for days after planting.

gym-DSSAT is a modification of the DSSAT crop simulator (Hoogenboom et al., 2019) that allows a user to read daily internal DSSAT states and, accordingly, to take fertilization decisions on a daily basis. Evidence of the reliability of DSSAT in simulating maize responses to different nitrogen fertilization practices under the conditions of southern Mali is provided by Falconnier et al. (2020) and Huet et al. (2022). The soils (and associated model parameters) we used for simulations are the same as the ones used by Adam et al. (2020), who calibrated DSSAT for sorghum under different plant densities and nitrogen fertilizer practices in southern Mali. For each soil type (Table 1) that was parameterized in DSSAT (soil parameter files *.SOL), each simulated maize grain yield value is a sample of the yield response distribution for the considered fertilizer practice. This response distribution is the result of weather variability, generated in our study by the stochastic weather generator WGEN (Richardson and Wright, 1984; Soltani and Hoogenboom, 2003), which was parameterized using the 47-year weather records from the N’Tarla agricultural research station of the Institute of Rural Economics (12°35’ N, 5°42’ W, 302 m.a.s.l.), about 30 km from the city of Koutiala (Ripoche et al., 2015). The ‘Sotubaka’ maize cultivar (original name ‘Suwan 1 SR’, from the DSSAT default cultivar list) was used for all model simulations as a representative of the maize varieties grown in southern Mali. This cultivar was parameterized by the DSSAT team for the conditions of southern Mali (Jones et al., 1998). Planting date was defined by an automatic rule (see Table A.2 in Supplementary Materials) depending on soil water conditions. At the start of the simulations, the initial soil mineral nitrogen content was set to a fixed value depending on the soil type, as in Adam et al. (2020). Still, the variability of the weather from the beginning of the simulation to the occurrence of the automatic planting (itself dynamic) induced a variable initial soil mineral nitrogen content at the planting date for each simulation. Water and nitrogen stresses were simulated, but yield reductions caused by pests, diseases and weed competition were not considered.

In the model simulations, WGEN generated a different weather time series for each growing season, and also for each farmer (and thus each fertilizer recommendation) within a growing season, inducing sets of independent simulated maize yield responses to nitrogen fertilization. The variability introduced by weather randomness was indeed the main source of uncertainty. Had identical weather data been applied to all farmers during a single growing season, farmers implementing the same fertilizer practice under identical soil conditions would have provided redundant information. The modeling approach we adopted can be seen as distant farms encountering distinct weather patterns within the same year. Section A of Supplementary Materials gives further details of the DSSAT simulation settings.

For each soil type, we simulated the maize grain yield response to a given fertilizer practice 105 times, which corresponds to 105 hypothetical growing seasons. These samples were used i) to ensure that simulated maize yield responses were in realistic ranges, ii) to evaluate the complexity of the decision problem, and iii) to determine best nitrogen fertilizer practices when analyzing the performance of the crop management identification strategies. The samples were not provided to the algorithms prior to their learning (i.e. there was no prior knowledge of the problem).

2.2. Performance indicators of fertilizer practices

An indicator to evaluate both the economic and environmental performance of a nitrogen fertilizer practice 𝜋 is Agronomic Nitrogen use Efficiency (ANE), as defined by Vanlauwe et al. (2011):

$$
\mathrm{ANE}_\pi := \frac{Y_\pi - Y_0}{N_\pi} \tag{1}
$$

where Y𝜋 is the crop yield obtained with the nitrogen fertilizer practice 𝜋 with a quantity N𝜋 of nitrogen, and Y0 is the yield of the control obtained in the same conditions without nitrogen fertilization. Maximizing ANE is a proxy for minimizing the quantity of nitrogen losses, e.g. through nitrate leaching.

However, there are certain limitations associated with using ANE as an indicator for optimizing fertilizer rates. For example, an ANE value of 25 kg grain/kg N can be achieved with a fertilizer input of 20 kg N/ha resulting in a total yield gain of 500 kg/ha, or with an input of 60 kg N/ha resulting in a total gain of 1500 kg/ha. For the same ANE, a farmer is likely to prefer the fertilizer practice that provides the greatest crop yield gain, i.e. with 60 kg N/ha. Similarly, selecting fertilizer practices only based on the associated crop yield gains is not satisfactory. A similar yield gain can be achieved with different nitrogen fertilizer rates which result in different ANEs: the practice with the highest ANE must be preferred as it requires less nitrogen fertilizer to achieve the same yield gain.

We introduced the Yield Excess (YE) indicator, which favors the nitrogen fertilizer practice with the highest yield gain among practices having the same ANE, and favors the practice with the highest ANE among practices having the same yield gain. The YE of a nitrogen fertilizer practice 𝜋 with respect to the reference practice 𝜋ref of fixed efficiency ANEref using the same quantity of nitrogen fertilizer as practice 𝜋, denoted N𝜋, is computed as follows:

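$$
\begin{aligned}
\mathrm{YE}_\pi &:= Y_\pi - Y_{\pi_{\mathrm{ref}}} && (2)\\
&= \underbrace{Y_\pi - Y_0}_{\text{yield gain of } \pi \text{ w.r.t. control}} - \underbrace{\left(Y_{\pi_{\mathrm{ref}}} - Y_0\right)}_{\text{yield gain of } \pi_{\mathrm{ref}} \text{ w.r.t. control}} && (3)\\
&= Y_\pi - Y_0 - N_\pi \times \mathrm{ANE}_{\mathrm{ref}} && (4)\\
&= \left(Y_\pi - Y_0\right) \times \underbrace{\left(1 - \frac{\mathrm{ANE}_{\mathrm{ref}}}{\mathrm{ANE}_\pi}\right)}_{\text{penalization factor}} && (5)
\end{aligned}
$$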

The YE of practice 𝜋 with respect to the reference practice 𝜋ref corresponds to the yield difference between the practice 𝜋 and a reference practice that has a fixed ANE equal to ANEref and which uses the same quantity N𝜋 of nitrogen fertilizer as 𝜋. For a given yield gain, YE𝜋 increases with ANE𝜋 (Figure 3). For a given ANE𝜋 and a positive yield gain, YE𝜋 is negative and decreases with Y𝜋 − Y0 when ANE𝜋 < ANEref, is zero when ANE𝜋 = ANEref, and is positive and increases with Y𝜋 − Y0 when ANE𝜋 > ANEref. Fertilizer practices with an efficiency below ANEref are thus penalized by this metric. We chose ANEref = 15 kg grain/kg N for our study, i.e. the average ANE currently achieved by farmers across sub-Saharan Africa (Ten Berge et al., 2019; Vanlauwe et al., 2011).
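The two indicators can be illustrated with the numerical example given earlier in this section (an ANE of 25 kg grain/kg N reached with either 20 or 60 kg N/ha). The Python sketch below applies Equations 1 and 4 with ANEref = 15 kg grain/kg N; the function names and the control yield of 1000 kg/ha are hypothetical and chosen for illustration only.

```python
def agronomic_n_efficiency(yield_kg_ha, control_yield_kg_ha, n_rate_kg_ha):
    """ANE (kg grain / kg N), Equation 1."""
    if n_rate_kg_ha <= 0:
        raise ValueError("ANE is undefined without nitrogen input")
    return (yield_kg_ha - control_yield_kg_ha) / n_rate_kg_ha


def yield_excess(yield_kg_ha, control_yield_kg_ha, n_rate_kg_ha, ane_ref=15.0):
    """YE (kg/ha), Equation 4: yield gain minus the gain of the reference practice."""
    return yield_kg_ha - control_yield_kg_ha - n_rate_kg_ha * ane_ref


CONTROL = 1000.0  # hypothetical unfertilized yield (kg/ha), for illustration only
for n_rate, gain in [(20.0, 500.0), (60.0, 1500.0)]:
    ane = agronomic_n_efficiency(CONTROL + gain, CONTROL, n_rate)
    ye = yield_excess(CONTROL + gain, CONTROL, n_rate)
    print(f"N = {n_rate:.0f} kg/ha: ANE = {ane:.0f} kg grain/kg N, YE = {ye:.0f} kg/ha")
```

Both practices have the same ANE, but the YE is 200 kg/ha with 20 kg N/ha and 600 kg/ha with 60 kg N/ha, so the YE ranks first the practice with the larger yield gain, as intended.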

[Figure 3: two plots of YE𝜋 (kg/ha, from −10000 to 4000) as a function of Y𝜋 − Y0 (kg/ha, from 0 to 5000) and ANE𝜋 (kg grain/kg N, from 10 to 70). Left panel: ANEref = 15 kg grain/kg N. Right panel: ANEref = 30 kg grain/kg N.]

Figure 3: Yield Excess (YE𝜋, Equation 5) for ANEref = 15 kg grain/kg N (left) and ANEref = 30 kg grain/kg N (right) as a function of ANE𝜋 and Y𝜋 − Y0. ANE𝜋 is the Agronomic Nitrogen use Efficiency of the nitrogen fertilizer practice 𝜋 (Equation 1). Y𝜋 is the maize grain yield obtained with nitrogen fertilizer practice 𝜋, and Y0 is the yield obtained with no nitrogen fertilization (control).

Because farmers are usually risk averse (e.g. Cerf and Sebillotte, 1997; Menapace et al., 2013; Jourdain et al., 2020), they are likely to prefer a stable maize grain yield of, for example, 3000 kg/ha rather than a yield of 5000 kg/ha in half of the years and of 1000 kg/ha in the other half, although both distributions have the same expectation. To account for risk aversion, we computed the Conditional Value-at-Risk (CVaR; Mandelbrot, 1997; Acerbi and Tasche, 2002), a risk-aware measure that originated in the finance sector. Two definitions of the CVaR coexist in the literature, depending on whether an outcome is considered as a gain or a cost (Dowd, 2007). We adopted the gain point of view, in which the CVaR puts emphasis on the lower tail of a distribution. For a (continuous) random variable X with cumulative distribution function 𝐹𝑋, we call Value-at-Risk (VaR) of level 𝛼 the quantile of probability 𝛼 ∈ (0, 1] of X, defined as:

$$
\mathrm{VaR}_\alpha(X) := \inf \{x \in \mathbb{R} : F_X(x) > \alpha\} \tag{6}
$$

Then the CVaR of X of level 𝛼 ∈ (0, 1] is the mean value of the left tail of X of probability 𝛼, defined as:

$$
\mathrm{CVaR}_\alpha(X) := \mathbb{E}\left[X \mid X \le \mathrm{VaR}_\alpha(X)\right] \tag{7}
$$

A farmer is likely to choose the practice with the highest CVaR for the considered level 𝛼. As 𝛼 → 0+, the measure puts more emphasis on the worst observable yields. Conversely, as 𝛼 → 1, the CVaR becomes less risk averse. When 𝛼 = 1, the CVaR equals the usual expectation 𝔼 [𝑋], which is risk neutral (Figure 4). In our study, we chose 𝛼 = 30%. The CVaR30% represents the mean crop yield of the 30% lowest observable yields.
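The following Python sketch illustrates the CVaR on the stable versus variable yield example given above. It uses a simple empirical estimator, the mean of the ⌈𝛼𝑛⌉ lowest of 𝑛 samples, which is an assumption made for illustration and not necessarily the estimator used in this study; the function name and sample sizes are hypothetical.

```python
import math


def empirical_cvar(samples, alpha):
    """Mean of the ceil(alpha * n) lowest samples, an estimate of CVaR_alpha."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    ordered = sorted(samples)
    k = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[:k]) / k


# Ten seasons of yields (kg/ha): a stable practice and a variable one.
stable = [3000.0] * 10
risky = [5000.0, 1000.0] * 5

for name, yields in [("stable", stable), ("risky", risky)]:
    print(name, empirical_cvar(yields, 1.0), empirical_cvar(yields, 0.3))
```

With 𝛼 = 1 both practices score 3000 kg/ha (the mean), whereas with 𝛼 = 30% the stable practice keeps 3000 kg/ha and the variable one drops to 1000 kg/ha, so a risk-averse farmer would prefer the stable practice.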

2.3. Identification of the best fertilizer practices239

2.3.1. A special type of bandit problem240

The identification of the best crop management with the constraint of minimizing farmers’ crop yield losses241

occurring during the process (Section 2.1) can be modeled as a special type of bandit problems. The canonical bandit242

Gautron et al.: Preprint submitted to Elsevier Page 10 of 39


Bandits for best crop management identification

α

C
V
aR

α

V
aR

α µ

yield

density

(a) High risk aversion (𝛼 ≈ 20%)

α

C
V
aR

α

V
aR

αµ

yield

density

(b) Low risk aversion (𝛼 ≈ 80%)
Figure 4: Conditional Value-at-Risk (CVaR) of level 𝛼 in the case of high (a) and low (b) risk aversion. CVaR is the mean value of the blue area of the distribution, of probability 0 < 𝛼 ≤ 1. VaR𝛼 stands for Value-at-Risk of level 𝛼 and is the quantile of probability 𝛼 of the distribution. The closer 𝛼 is to 1, the more risk neutral the CVaR is. 𝜇 represents the mean value of the distribution, which equals the CVaR of level 𝛼 = 100%.

problem assumes that at each time step a single trial is made, followed by a single observation of its result, in a purely sequential mode. In contrast, the batched bandit setting (Perchet et al., 2015) assumes that at each time step an ensemble of trials is conducted in parallel, followed by the observation of an ensemble of results. Figure 2 illustrates both settings: the ensemble identification process of best crop fertilizer practices, modeled as a batched-bandit problem (Figure 2a), and the canonical bandit problem (Figure 2b). In risk-aware bandit problems (Cassel et al., 2018), the agent maximizes a risk-aware measure of the collected rewards, such as the CVaR (Section 2.2), instead of the expectation of rewards. Our ensemble fertilizer decision problem can thus be described as a risk-aware batched-bandit decision problem.

Formally, in our virtual experiment, in each season $t \in \{1, 2, \dots, T\}$, researchers assign a nitrogen fertilizer practice $\pi \in \{1, 2, \dots, K\}$ to each of the $n_t$ farmers who volunteered for season $t$. Each farmer belongs to a cohort $c \in \{1, 2, \dots, C\}$. At the end of season $t$, researchers collect the rewards $Y_t = \{y_t^1, \dots, y_t^{n_t}\}$ resulting from the fertilizer practices of all farmers for season $t$. For each cohort $c \in \{1, \dots, C\}$, rewards are independently and identically distributed from unknown stationary distributions $\{\nu_1^c, \dots, \nu_K^c\}$. These reward distributions are the YE with ANEref = 15 kg grain/kg N associated with each of the 10 recommended nitrogen fertilizer practices, for a given soil type. We denote by $\mathcal{Y}_T = \bigcup_{t=1}^{T} Y_t$ the set of all rewards obtained by all farmers between $t = 1$ and $t = T$. The objective of an identification strategy is to maximize, for a given CVaR level $\alpha$ and any time horizon $T \ge 1$:

$$\mathbb{E}[\mathrm{CVaR}_\alpha(\mathcal{Y}_T)] \tag{8}$$


For each cohort $c \in \{1, \dots, C\}$, the best performing nitrogen fertilizer practice $\pi_*^c$ is given by:

$$\pi_*^c = \operatorname*{argmax}_{k} \mathrm{CVaR}_\alpha(\nu_k^c) \tag{9}$$

Consequently, an optimal identification strategy always assigns nitrogen fertilizer practice $\pi_*^c$ to all farmers belonging to cohort $c$.

2.3.2. Identification strategies

We expected fertilizer practices to perform differently within each cohort, i.e. for each soil type. For example, the best performing nitrogen fertilizer practices were expected to differ between cohorts of farmers growing maize on a shallow sandy soil and those growing maize on a deep clayey soil. Consequently, the results of one cohort were not assumed to be directly relevant for another cohort. Each cohort was therefore treated as an independent identification problem, i.e. it had its own identification strategy, which did not share information with the identification strategies of other cohorts. Within a given cohort, however, the identification strategy kept memory from one season to the next of all results obtained during past seasons. In our study, we considered two types of identification strategies: the standard ETC (Explore-Then-Commit) strategy, previously referred to as the “intuitive strategy”, and the risk-aware bandit-based BCB (Bounded-CVaR-Thompson-Sampling Batch) strategy. Across the seven soil types in Table 1, the identification strategies were either all ETC or all BCB, never a mix of both.

Intuitive identification strategy (ETC). ETC provides a simple and intuitive solution to the exploration-exploitation dilemma (Lattimore and Szepesvári, 2020). During an initial exploration phase of an arbitrary number of years, ETC tests all nitrogen fertilizer practices in equal proportions: practices are randomly assigned in equal proportions to the farmers within the cohort. Thereafter, the exploitation phase starts and, for the remaining time, ETC chooses the fertilizer practice that showed the best performance during the exploration phase. In Section B.2 of Supplementary Materials, we provide a simple adaptation of ETC to the batch setting (see Section 2.1) that uses the CVaR of rewards rather than the classical expectation of rewards. We considered ETC-3 and ETC-5, with exploration phases of three and five years, respectively.
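The two phases of ETC in the batch setting can be sketched as follows. This is a minimal illustration for a single cohort, not the adaptation given in Section B.2 of Supplementary Materials: the function names, the plug-in CVaR estimator and the data layout (a dictionary mapping each practice to the rewards collected during exploration) are all assumptions made for the example.

```python
import math
import random


def cvar(samples, alpha):
    """Plug-in CVaR: mean of the ceil(alpha * n) lowest observations."""
    if not samples:
        return float("-inf")  # a practice never tested cannot be selected
    ordered = sorted(samples)
    k = max(1, math.ceil(alpha * len(ordered) - 1e-9))
    return sum(ordered[:k]) / k


def etc_assign(practices, exploration_rewards, n_farmers, season,
               n_explore, alpha, rng):
    """Return one practice per farmer for `season` (seasons start at 1)."""
    if season <= n_explore:
        # Exploration: cycle through the practices so that they appear in
        # (near-)equal proportions, then shuffle to randomize the allocation.
        batch = [practices[i % len(practices)] for i in range(n_farmers)]
        rng.shuffle(batch)
        return batch
    # Exploitation: commit to the practice with the highest empirical CVaR,
    # computed only from the rewards collected during exploration.
    best = max(practices, key=lambda p: cvar(exploration_rewards[p], alpha))
    return [best] * n_farmers


rng = random.Random(0)
explore = etc_assign(list(range(10)), {}, 20, season=1,
                     n_explore=3, alpha=0.3, rng=rng)
# Hypothetical exploration results: practice 0 has the higher mean,
# practice 1 the higher CVaR at level 50%.
rewards = {0: [100.0, 900.0], 1: [400.0, 500.0]}
commit = etc_assign([0, 1], rewards, 5, season=4,
                    n_explore=3, alpha=0.5, rng=rng)
print(sorted(explore), commit)
```

In the hypothetical data, a mean-based ETC would commit to practice 0, whereas the CVaR-based variant commits to practice 1, whose worst outcomes are less severe.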

Bandit-based identification strategy (BCB). BCB is a risk-aware bandit algorithm (Cassel et al., 2018) that uses the CVaR of rewards as its decision criterion, in the batched bandit setting. BCB is an extension of the algorithm presented in Baudry et al. (2021a). We specifically designed BCB for the identification of best management practices with a group of farmers. The general idea of this bandit algorithm is to use, in each crop growing season, the rewards acquired during all past growing seasons, so that the algorithm learns to manage the exploration-exploitation dilemma optimally.


An overview of the execution of BCB is shown in Algorithm 1. More detailed information can be found in Supplementary Materials, Section B.1. For the execution of BCB (see first step of Algorithm 1), we set the maximum obtainable maize YE at 4000 kg/ha (Figure 3) for ANEref = 15 kg grain/kg N, for all fertilizer practices. The statistical performance guarantees are demonstrated in Supplementary Materials, Section C.

Algorithm 1 Simplified pseudo-code of BCB (Bounded-CVaR-Thompson-Sampling Batch).
for fertilizer practice 𝑘 ∈ {1,⋯ , 𝐾} do
    Add maximum observable value to the results of fertilizer practice 𝑘 // prior to any experiments
end
for season 𝑡 ∈ {1,⋯ , 𝑇 } do
    for farmer 𝑓 ∈ {1,⋯ , 𝑛} do
        for fertilizer practice 𝑘 ∈ {1,⋯ , 𝐾} do
            Re-weight the rewards of the fertilizer practice 𝑘 with random weights sampled from a Dirichlet distribution (Everitt and Skrondal, 2002)
            Score practice 𝑘 with a noisy empirical measure of the CVaR at level 𝛼 of practice 𝑘 from the re-weighted rewards
        end
        Recommend to the farmer 𝑓 the fertilizer practice with the maximum score
    end
    Collect and store all results of the season for all fertilizer practices
end
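The inner loops of Algorithm 1 can be illustrated with the short sketch below, for one season and one cohort. It is not the authors' implementation (see Supplementary Materials, Section B.1, for that): the flat Dirichlet(1, …, 1) parameter, the way the maximum observable value is appended at scoring time rather than stored once, the function names and the reward values are all assumptions made for the example.

```python
import random


def weighted_cvar(values, weights, alpha):
    """CVaR at level alpha of a discrete distribution: mean of its lowest alpha mass."""
    remaining, total = alpha, 0.0
    for value, weight in sorted(zip(values, weights)):
        take = min(weight, remaining)  # mass taken from this atom
        total += take * value
        remaining -= take
        if remaining <= 0.0:
            break
    return total / alpha


def dirichlet_weights(n, rng):
    """Sample from a flat Dirichlet(1, ..., 1) by normalizing exponential draws."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    norm = sum(draws)
    return [d / norm for d in draws]


def bcb_recommend(rewards, upper_bound, n_farmers, alpha, rng):
    """Recommend one practice per farmer from the rewards of past seasons."""
    recommendations = []
    for _ in range(n_farmers):
        scores = {}
        for practice, observed in rewards.items():
            # The maximum observable value acts as an optimistic extra observation.
            support = list(observed) + [upper_bound]
            weights = dirichlet_weights(len(support), rng)
            scores[practice] = weighted_cvar(support, weights, alpha)
        recommendations.append(max(scores, key=scores.get))
    return recommendations


rng = random.Random(0)
# Hypothetical past rewards (kg/ha): practice 0 is clearly better than practice 1.
past = {0: [1000.0] * 30, 1: [100.0] * 30}
recs = bcb_recommend(past, upper_bound=4000.0, n_farmers=50, alpha=0.3, rng=rng)
print(recs.count(0), recs.count(1))
```

Because the weights are redrawn for every farmer, practices with few observations receive widely varying scores and keep being tested occasionally, while practices with many consistent observations receive stable scores. This is how the random re-weighting trades off exploration and exploitation within a batch.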

2.4. Performance measures of the identification strategies

In order to compare the identification strategies, for each season 𝑡, we computed:

1. An empirical measure of the objective defined in Equation 8. This is an estimate of the CVaR at 30% of the YE obtained by an identification strategy; it should be maximized. See Section D.1 of Supplementary Materials.

2. The cumulated regret, which mirrors the empirical measure of the objective, expressed in terms of cumulated maize yield loss; it should be minimized. Yield loss is measured as the difference in yield performance between an omniscient identification strategy (one that always chooses the best available fertilization practice for each soil type) and the strategy being evaluated. The cumulated regret is a convenient statistic for theoretical performance analysis, and it represents the performance of an algorithm with less noise than the empirical measure of the objective. See Section D.2 of Supplementary Materials.
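As an illustration of the second measure, the sketch below accumulates, season after season, the average gap between the true CVaR of the best practice and the true CVaR of the practices actually assigned. This is a common way of defining a CVaR regret and is only assumed here; the exact definition used in the study is the one given in Section D.2 of Supplementary Materials. The function name and the CVaR value of practice 1 are hypothetical.

```python
def cumulated_regret(true_cvar, assignments):
    """Cumulated regret curve for one cohort.

    true_cvar: dict mapping each practice to its true CVaR (kg/ha).
    assignments: one list of assigned practices per season.
    """
    best = max(true_cvar.values())  # what an omniscient strategy would obtain
    regret, curve = 0.0, []
    for season in assignments:
        gaps = [best - true_cvar[practice] for practice in season]
        regret += sum(gaps) / len(gaps)  # averaged over the farmers of the season
        curve.append(regret)
    return curve


# Two practices, three seasons with two farmers each.
true_cvar = {0: 1032.0, 1: 800.0}
curve = cumulated_regret(true_cvar, [[0, 1], [0, 0], [1, 1]])
print(curve)
```

A season in which every farmer receives the best practice adds nothing to the curve, so the curve of a strategy that has identified the best practice flattens over time.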


3. Results

3.1. Simulated maize yield responses to nitrogen fertilizer practices

All simulated maize yield responses to nitrogen fertilization fell within the expected ranges for the growing conditions in Koutiala, with an average grain yield varying from 3125 kg/ha for a sandy soil with low fertility (ITML840105) up to 3945 kg/ha for a loamy soil (ITML840106). When applying the most promising fertilization practices, YE (i.e. yield gain compared to the reference) ranged from 1200 kg/ha to 1800 kg/ha, and CVaR30%(YE) (i.e. the mean of the 30% lowest YE values) from 500 kg/ha to 1032 kg/ha. Table 3 provides the statistics of the best available nitrogen fertilizer practices for each soil type (Table 1), and Figure A.1 in Supplementary Materials shows the distributions of maize grain yield, ANE and YE responses.

No simple parametric assumption, such as a Gaussian probability distribution, could be made about YE (e.g., for fertilizer Practice 5, Figure A.1e). The fat left tails (e.g. Practices 0 and 4) and the bi-modality of YE (e.g. Practices 6 and 7, Figure A.1e) further supported the use of CVaR as a relevant risk measure: CVaR is most relevant for asymmetric and irregularly shaped distributions, such as fat-tailed or multi-modal distributions (e.g. Rockafellar et al., 2000). For all soil types, the best available nitrogen fertilizer practice was either Practice 0 or Practice 8, i.e. practices with a single nitrogen top-dressing application that is not threshold dependent (Table 3).

Yet, the fertilizer practices responded differently across soil types in terms of grain yield and ANE (and consequently YE), and the ranking of the practices was inconsistent across the soil types (Figure A.1). For instance, for the soil ITML840104 (silt clay loam of medium fertility), Practices 0 to 4 all had similar YE values (Figure A.1e), whilst for the soil ITML840105 (silt clay loam of low fertility), Practices 0, 1 and 4 had substantially higher YE values than Practices 2 and 3 (Figure A.1f).

Threshold-based fertilizer practices behaved inconsistently across the soil types. For example, in the case of the bi-modal YE distribution of Practice 1, the YE probability density was predominantly concentrated around 0 kg/ha for the soil ITML840104 (Figure A.1e) and around 1800 kg/ha for the soil ITML840105 (Figure A.1f). The low to zero YE values for soil ITML840104 with Practice 1 can be attributed to the fact that the nitrogen-stress factor threshold of 0.2 was not reached in most seasons, and consequently no top-dressing occurred (Table 2). In such cases, only a basal dressing of 15 kg N/ha was applied, instead of the total of 135 kg N/ha applied when the top-dressing was triggered. The associated probability density of grain yield was concentrated around 1000 kg/ha (Figure A.1a). On the other hand, for the soil ITML840105, the nitrogen-stress threshold of 0.2 was reached in most seasons and Practice 1 involved both basal and top-dressing fertilization. This corresponded to YE values of around 1800 kg/ha (Figure A.1f), and the corresponding grain yields were generally around 4000 kg/ha (Figure A.1b).


Table 3
Statistics of the best available nitrogen fertilizer practices for each of the soil types presented in Table 1.

soil        𝜋∗  mean N       CVaR30%(Y)  mean Y      mean ANE    CVaR30%(YE)  mean YE
                (kg/ha)      (kg/ha)     (kg/ha)     (kg/kg)     (kg/ha)      (kg/ha)
ITML840101  0   120.0 (1.0)  3091        3874 (666)  30.0 (5.4)  1032         1795 (651)
ITML840102  8    69.8 (4.0)  2391        3150 (653)  33.2 (7.5)   652         1270 (529)
ITML840103  8    70.0 (0.4)  2539        3152 (526)  34.4 (6.8)   808         1356 (475)
ITML840104  8    69.9 (2.7)  2533        3339 (682)  31.7 (8.1)   500         1169 (565)
ITML840105  8    70.0 (1.2)  2467        3127 (570)  34.2 (7.3)   757         1346 (508)
ITML840106  0   120.0 (1.2)  3132        3945 (695)  28.9 (5.5)   900         1667 (660)
ITML840107  8    69.9 (2.7)  2472        3247 (659)  32.5 (8.0)   565         1226 (559)

All statistics refer to the corresponding best available nitrogen fertilizer practice 𝜋∗. N: quantity of nitrogen fertilizer applied; CVaR30%(𝑋): conditional Value-at-Risk of 𝑋 of level 30% (Section 2.2); mean 𝑋: mean value of 𝑋; Y: maize grain yield; ANE: Agronomic Nitrogen use Efficiency; YE: Yield Excess (Section 2.2); values in parentheses indicate standard deviations.

3.2. Identification of best fertilizer practices

Section 3.2.1 provides a visual comparison of nitrogen fertilizer recommendations following the BCB and ETC-5 identification strategies. In Section 3.2.2, we present a direct measure of the empirical performance of the nitrogen fertilizer practice identification strategies (see also Section D.1 in Supplementary Materials), and in Section 3.2.3, we illustrate the regret as a proxy measure (see also Section D.2).

3.2.1. Visualization of identification strategies

Figure 5 provides the average proportions at which the fertilizer practices were selected by the identification strategies, from the beginning of the experiment to time 𝑇, exemplified for soil types ITML840105 and ITML840101. For the soil ITML840105 (silt clay loam of low fertility), after 20 years of experimentation, BCB had selected Practice 8, the best available one for this soil type (see Table 3), with an average proportion of 50%. ETC-5 also settled on the same practice, with an average proportion of 31%. For the soil ITML840101, BCB and ETC-5 performed similarly after 20 years of experimentation: BCB sampled the best available Practice 0 (Table 3) with an average proportion of 27%, and ETC-5 selected the same practice with an average proportion of 26%. In the case of ETC-5, the constant and equal proportions of each management practice during the first five years, seen in Figures 5b and 5d, illustrate the equiproportional initial exploration phase used by the strategy.

3.2.2. Empirical measure of the objective

On average, from the second year of the experiment onwards, farmers following the nitrogen fertilizer recommendations based on the BCB identification strategy had a higher empirical CVaR at 30% of YE than farmers following the recommendations from the ETC strategies. Figure 6 shows the evolution of the CVaR at 30% of the YE for all cohorts


[Figure 5: four stacked-proportion plots, 960 replications each. x-axis: time step (year), 2 to 20; y-axis: proportion in sampling, 0% to 100%; legend: practice index.
(a) BCB sampling proportions for soil ITML840105 (legend order: 8, 0, 4, 1, 5, 9, 7, 3, 2, 6).
(b) ETC-5 sampling proportions for soil ITML840105 (legend order: 8, 0, 4, 1, 5, 9, 7, 3, 2, 6).
(c) BCB sampling proportions for soil ITML840101 (legend order: 0, 1, 4, 8, 9, 5, 7, 6, 2, 3).
(d) ETC-5 sampling proportions for soil ITML840101 (legend order: 0, 1, 4, 8, 9, 5, 7, 6, 2, 3).]
Figure 5: Averaged sampling proportions for soils ITML840105 and ITML840101, 𝑇 = 20 years. The whole experiment was replicated 960 times with a different random generator initialization each time. The fertilizer practices are ordered according to the true Conditional Value-at-Risk (CVaR) at level 30% of their Yield Excess (YE) with ANEref = 15 kg grain/kg N; the greener the color, the better a fertilizer practice is. Close colors indicate similar practice performances. BCB: Bounded-CVaR-Thompson-Sampling Batch; ETC-5: Explore-Then-Commit strategy with an exploration phase of 5 years.

(soil types) throughout the years (Equation D.2). The difference in performance between BCB and ETC is relatively high during the initial years. For instance, at year 4, farmers following recommendations from the BCB identification strategy had a CVaR at 30% of YE of 318 kg/ha, compared to 168 kg/ha (47% less than BCB) and 74 kg/ha (77% less than BCB) for farmers following the recommendations from the ETC-3 and the ETC-5 strategies, respectively. Thus, BCB identified the best available fertilizer practices sooner and consequently avoided more low crop yield outcomes than the ETC strategies. ETC strategies were adversely affected by their exploration phases during which


[Figure 6: line plot of the empirical CVaR of YE (kg/ha, 100 to 500) against time step T (year, 2 to 20) for BCB, ETC_3 and ETC_5, with 90% confidence intervals. Plot title: Empirical CVaR @ alpha=30%; mean batch size: 299; 960 replications.]

Figure 6: Empirical Conditional Value-at-Risk (CVaR) at level 30% of maize Yield Excess (YE) between 𝑇 = 0 and the considered 𝑇; ANEref = 15 kg grain/kg N. The whole experiment was replicated 960 times with a different random generator initialization each time. One time step 𝑇 is one year; ‘mean batch size’ is the number of virtual farmers who have volunteered to participate in the experiment, averaged over all years and all replications. Confidence intervals were computed following Thomas and Learned-Miller (2019).

all fertilizer practices were equiproportionally tested. In contrast, BCB had a continuously increasing empirical CVaR during the whole duration of the experiment.

3.2.3. Cumulated regret

For 𝛼 = 30%, the BCB identification strategy outperformed the ETC strategies, regardless of the number of years during which the strategy was applied. Figure 7 shows the evolution of the mean cumulated regret for all cohorts throughout the years of the simulated experiment (Equation D.5). The difference in performance between BCB and ETC increased over the whole duration of the experiment. After 20 years, farmers following recommendations from the BCB identification strategy experienced a mean cumulated regret of 2400 kg/ha, compared to 3385 kg/ha (41% more than BCB) and 3701 kg/ha (54% more than BCB) for farmers following the recommendations from the ETC-3 and ETC-5 strategies, respectively. Consequently, farmers following BCB recommendations accumulated less regret than farmers following ETC recommendations. Furthermore, the variance of the cumulated regret (due to the different weather series in the experiments, for each season and each field trial, and the variability in cohorts each year) was smaller for BCB than for ETC, indicating that the BCB strategy was more robust for this decision problem (see quantile ranges in Figure 7).
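The percentages quoted for the cumulated regret after 20 years can be re-derived from the three reported values. The short check below is only an arithmetic illustration (the helper name is hypothetical); it also expresses the same gap relative to ETC rather than to BCB, which corresponds to the reduction of up to 35% stated in the Conclusion.

```python
def relative_change(value, reference):
    """Relative difference of `value` with respect to `reference`."""
    return (value - reference) / reference


# Mean cumulated regret after 20 years (kg/ha), as reported in the text.
regret_bcb, regret_etc3, regret_etc5 = 2400.0, 3385.0, 3701.0

# Extra regret of ETC, relative to BCB.
extra_etc3 = relative_change(regret_etc3, regret_bcb)
extra_etc5 = relative_change(regret_etc5, regret_bcb)

# Regret avoided by BCB, relative to each ETC variant.
saved_vs_etc3 = -relative_change(regret_bcb, regret_etc3)
saved_vs_etc5 = -relative_change(regret_bcb, regret_etc5)

print(f"ETC-3: {extra_etc3:.0%} more than BCB; BCB avoids {saved_vs_etc3:.0%}")
print(f"ETC-5: {extra_etc5:.0%} more than BCB; BCB avoids {saved_vs_etc5:.0%}")
```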


[Figure 7: line plot of the cumulated YE CVaR regret (kg/ha, 0 to 5000) against time step T (year, 0 to 20) for BCB, ETC_3 and ETC_5, with 0.05 to 0.95 quantile ranges. Plot title: Averaged over 960 replications for alpha=30%; mean batch size: 299.]

Figure 7: Mean cumulated regret of population, for the Conditional Value-at-Risk (CVaR) at level 30% of Yield Excess

(YE); ANEref = 15 kg grain/kg N. The cumulated regret is averaged over the virtual farmers’ population, between 𝑇 = 0

and the considered 𝑇 . The whole experiment was replicated 960 times with a different random generator initialization each

time. One time step 𝑇 is one year, ‘mean batch size’ is the number of virtual farmers who have volunteered to participate

in the experiment, averaged over all years and all replicates.

3.2.4. Sensitivity analysis

In Section E of Supplementary Materials, we present the same analyses as in Sections 3.2.2 and 3.2.3, but for higher CVaR levels of 𝛼 = 50% and 𝛼 = 100%. The CVaR with 𝛼 = 100% recovers the usual expectation 𝔼[𝑋]. For 𝛼 = 50%, the difference between BCB on the one side and ETC-3 and ETC-5 on the other side is similar to what was observed for 𝛼 = 30%. For 𝛼 = 100%, ETC-3 was the best performer, and BCB and ETC-5 performed equally. Nonetheless, BCB showed a smaller variance than both ETC-3 and ETC-5.

4. Discussion

4.1. Benefits from an adaptive identification strategy

Practical perspective. In multi-year multi-location on-farm trials, participating farmers simultaneously conduct field experiments with crops over multiple seasons to compare, for example, crop management practices (e.g. Naudin et al., 2010; Baudron et al., 2012; Falconnier et al., 2016). After a given number of years, the results, i.e. crop yields, are typically analyzed using mixed linear models (Laird and Ware, 1982) to account for random effects associated with fields and farms. Researchers then identify the best crop management practices based on the statistical analyses. In our simulated nitrogen fertilization practice decision problem, we adopted the intuitive ETC identification strategy as a substitute for the traditional approach of designing and analyzing multi-year, multi-location on-farm trials. Both replicated on-farm trials and ETC consist of an exploration phase of fixed duration (data collection), followed by an exploitation phase (application of the best identified practice(s) after analysis of the collected data). Consequently, both can be considered non-adaptive identification strategies: before the end of the exploration phase, the intermediary results are not exploited to gradually refine the experimental setup. In contrast, bandit-based identification strategies, such as BCB, refine the recommendations every year, based on the results observed in the previous years. The better a crop management practice, the more its share among the tested practices grows over time. From a farmer’s perspective, this means that the probability of testing recommendations that perform worse than the best available practice decreases over time. This is in contrast with non-adaptive identification strategies, which recommend all crop management practices in equal proportions during the exploration phase. The cost of identifying the best management practices is therefore likely to be lower for farmers when using bandit-based approaches. Another common method to generate crop management recommendations consists of using calibrated crop simulation models and scenario analyses (e.g. Huet et al., 2022). Although this method has its limitations due to model uncertainty (Yin et al., 2017), it can be complementary to the bandit-based approach. For example, a set of candidate crop management practices can first be determined based on outcomes from crop modeling; the true best option among those can then be identified in field trials whose experimental set-up is driven by a bandit algorithm.

Theoretical perspective. ETC is theoretically proven to be a sub-optimal identification strategy unless the duration of the exploration phase is calibrated, which would require strong prior knowledge of the complexity of the decision problem that is usually unavailable (Lattimore and Szepesvári, 2020, Chapter 6). In the numerical experiments, for 𝛼 = 100%, ETC-3, with its three-year exploration phase, performed best. This is likely associated with the particular YE distributions and the size of the farmer groups in question. A relatively minor change in the decision problem (e.g. changing 𝛼 to 30% or 50%) may, however, mean that three years of exploration is no longer optimal. More generally, prior to an experiment, there is no guarantee that an arbitrary number of years of exploration in the ETC strategy will be optimal, and consequently there are no guarantees about the performance of ETC, as opposed to BCB (see theoretical results in Section C). The main benefit of the BCB strategy over the ETC strategy is that it eliminates the need to select parameter values that require knowledge that is not available a priori. Nor does BCB require strong assumptions about the probability laws of reward distributions, as opposed to other common bandit algorithms. The only prerequisite for BCB is knowledge of the maximum observable reward. In agronomy, such knowledge is usually available through expert knowledge. For instance, considering crop yield as the reward, an expert can usually estimate the crop yield potential under the given crop growing conditions, e.g., through crop growth modeling or from field experiments that are conducted under optimal growing conditions (Affholder et al., 2013).

4.2. Adaptations to real field conditions

Definition of fertilization practices. In our study, the mean simulated maize yield with a total mineral nitrogen fertilizer application of 70 kg N/ha (Practice 8) ranged from 2.6 to 3.2 t/ha depending on soil type. In a set of on-farm experiments in the region of Koutiala, Falconnier et al. (2016) observed an interquartile range of maize grain yield of 1.2 to 3.1 t/ha for a local variety and a total mineral nitrogen fertilizer application of 80 kg N/ha (86 on-farm trials), which indicates that our model simulations were realistic. As reported by Falconnier et al. (2016), yield was mostly influenced by seasonal weather, soil type (water holding capacity) and the effect of the previous crop. For all soil types, none of the best performing nitrogen fertilizer practices identified in our simulated experiment was based on thresholds linked to rainfall or plant nitrogen status, or involved split top-dressing. This does not, however, rule out a potential benefit from threshold-based fertilizer practices or split top-dressing. In semi-arid conditions such as in Koutiala, crop nitrogen demand can differ greatly between seasons and should therefore be evaluated as the season develops (Piha, 1993). It is, however, not certain that models such as DSSAT reproduce the crop responses to split applications if no nitrogen losses occur through, e.g., nitrate leaching. It is important that the set of candidate fertilizer practices to explore is carefully selected from the vast number of possible combinations of practice attributes, e.g., application time, rate and number of splits. In our study, the values of the fertilizer practice attributes were most likely not optimal, because our objective was to establish an improved generic identification method for crop management rather than to design refined fertilizer recommendations. For an application in real field conditions, we recommend first estimating these attributes using existing expert knowledge and/or crop growth model simulations (see Section 4.1). The set of candidate practices can also comprise practices that are based on advanced methods, such as refined balance-based methods or machine learning-based methods for nitrogen fertilization (e.g., Morris et al., 2018; Timsina et al., 2021). More generally, the design of fertilizer recommendations must include experts, local extensionists and farmers themselves (Cerf and Meynard, 2006; Hochman and Carberry, 2011). Finally, it is important to take into account that the quantity of mineral fertilizer a farmer can apply often depends on access to financial resources and markets (Jayne et al., 2003).

Objective to maximize. We defined the farmers’ objective as maximizing the CVaR at level 𝛼 = 30% of the YE with ANEref = 15 kg grain/kg N. The YE indicator is meaningful as it represents the yield gain compared to a reference fertilizer practice, and it can be easily calculated. The value of 𝛼 makes it possible to adjust the risk aversion level for a cohort of farmers. The value of ANEref defines an invariant economic and environmental trade-off that sets the boundaries of the performance of nitrogen fertilizer use. Considering a set of pre-defined nitrogen fertilizer practices, yield losses were defined as the expected performance difference between the best available fertilizer practice and the other, less well performing practices, in the face of seasonal weather uncertainty.

However, in our study we did not evaluate fertilizer practices by their economic return, which depends on many factors, such as fertilizer subsidies, fertilizer market price, application costs, and grain selling prices. Including those factors dramatically increases the complexity of the identification problem, and hence the amount of data required to identify the best practices (we provide more details in Supplementary Materials, Section F). In this context, we must keep in mind the inherent constraints of modeling farmers’ objectives and decisions, which always remains a proxy for real-life situations and choices (McCown, 2002). Farmers should play an active role in the formulation and validation of the objective to maximize, ensuring that mathematical terms are meaningfully translated into practical cases of crop management decision problems.

Evaluation of the method. Statistical performance guarantees of the identification strategies hold across the many possible decision problems, i.e., in many different real field conditions. In the simulated experiment, we quantified the performance of the identification strategies with thousands of replications in order to be as confident as possible in their evaluation. However, conducting the identification strategies in real conditions poses challenges, and is not meaningful when done on a single experiment. Rather, the identification strategies should be tested in as many real-world situations as possible. Such large-scale experiments are costly, but necessary to objectively evaluate identification strategies, including the "intuitive strategy".

4.3. Limits and possible improvements

In our simulated crop management decision problem, we largely simplified the experimental structure of multi-location, multi-year replicated field trials. First, the weather time series were independent and identically distributed random variables for all model simulations. Such an assumption is unlikely to hold in the real world, because spatial correlations in weather can be high, for instance in the case of extreme weather events (Tack and Holt, 2016). Second, within the same cohort, farmers were assumed to have identical soil types and maize cultivars, and to adhere closely to the assigned fertilizer practices. For the application of our methodology in real field conditions, variations in site conditions and other potential random effects should be properly considered. This means that the bandit identification strategy we introduced should be extended to account for the experimental structure and the multiple factors at stake. Here, contextual bandits (Lattimore and Szepesvári, 2020) could potentially provide solutions by enabling the sharing of information between similar decision contexts (the cohorts) and similar fertilizer practices (i.e., the data collected about a given context/fertilizer practice also provides insights for other “close” contexts/fertilizer practices).

Finally, in the model simulations, we assumed average weather (rainfall, temperature) in southern Mali to remain the same throughout the 20 years of the experiment. Such a hypothesis is unlikely to hold in real conditions given climate change (e.g., Traore et al., 2017). From a decision problem perspective, best available management practices are likely to change over time under climate change, as a response to the increasing occurrences of heat and water stress (Adam et al., 2020). Such a problem can be formalized as a non-stationary bandit problem (Lattimore and Szepesvári, 2020). To handle this, the BCB strategy can be equipped with a sliding window approach where the algorithm primarily focuses on the most recent rewards, discarding older ones over time (Garivier and Moulines, 2011; Baudry et al., 2021b).
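The sliding-window idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the class name `SlidingWindowHistory` and the `window` parameter are hypothetical, and the sketch only shows how older rewards of one practice would be discarded before any score (such as an empirical CVaR) is computed.

```python
from collections import deque


class SlidingWindowHistory:
    """Keep only the most recent `window` rewards of one fertilizer practice.

    Hypothetical helper: the name and the fixed-size window are assumptions
    made for illustration; they are not taken from the BCB code base.
    """

    def __init__(self, window: int):
        if window <= 0:
            raise ValueError("window must be a positive integer")
        # A deque with maxlen silently drops its oldest element when full.
        self._rewards = deque(maxlen=window)

    def add(self, reward: float) -> None:
        """Record a new reward, discarding the oldest one if the window is full."""
        self._rewards.append(reward)

    def rewards(self) -> list:
        """Return the rewards currently inside the window, oldest first."""
        return list(self._rewards)


history = SlidingWindowHistory(window=3)
for yield_excess in [100.0, 250.0, 50.0, 400.0]:
    history.add(yield_excess)
recent = history.rewards()  # the first reward (100.0) has been discarded
```

A score computed on `recent` instead of the full history would then track the current reward distribution more closely, at the price of using fewer observations.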

5. Conclusion

Bandit learning algorithms aim at optimally balancing exploration (gathering information) and exploitation (using the information) in uncertain decision problems with repeated choices between contending actions. In a simulated problem of the identification of best fertilizer practices with virtual farmers, we compared our bandit-based algorithm to the “intuitive strategy” of Explore-Then-Commit (ETC), in which the set of pre-defined practices is tested in an equiproportional way during a fixed number of years. During simulated field trials in southern Mali, the bandit-based identification strategy reduced the maize yield losses incurred by testing fertilizer practices that perform worse than the true best available practice by up to 35% after 20 years. This novel approach opens up new perspectives as an alternative to the usual multi-year, multi-location on-farm trials. The bandit-based identification strategy can be employed to identify best management practices in real field conditions, if variability in site conditions, possible correlations between site conditions, and other potential random effects are further considered.

Software availability

All the numerical experiments in this paper are meant to be as reproducible as possible, and the code is open source. The Python code with the necessary packages, instructions and experimental data are provided in the following public GitLab repository: https://gitlab.inria.fr/rgautron/batch-cvts/-/tree/master. The simulations are performed with gym-DSSAT (https://gitlab.inria.fr/rgautron/gym_dssat_pdi), a modified version of the Decision Support System for Agrotechnology Transfer (DSSAT) software (https://dssat.net/).

Declaration of competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work has been supported by:

• The French Agricultural Research Centre for International Development (CIRAD).

• The Consultative Group for International Agricultural Research (CGIAR) Platform for Big Data in Agriculture.

• The French Ministry of Higher Education and Research, Hauts-de-France region, Inria within the Scool team project and MEL.

Experiments presented in this paper were carried out using the Grid’5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations (see https://www.grid5000.fr).

References

Acerbi, C., Tasche, D., 2002. On the coherence of expected shortfall. Journal of Banking & Finance 26, 1487–1503. doi:10.1016/S0378-4266(02)00283-2.

Adam, M., MacCarthy, D.S., Traoré, P.C.S., Nenkam, A., Freduah, B.S., Ly, M., Adiku, S.G., 2020. Which is more important to sorghum production systems in the Sudano-Sahelian zone of West Africa: Climate change or improved management practices? Agricultural Systems 185, 102920.

Affholder, F., 1995. Effect of organic matter input on the water balance and yield of millet under tropical dryland condition. Field Crops Research 41, 109–121.

Affholder, F., Poeydebat, C., Corbeels, M., Scopel, E., Tittonell, P., 2013. The yield gap of major food crops in family agriculture in the tropics: Assessment and analysis through field surveys and modelling. Field Crops Research 143, 106–118.

Agrawal, S., Koolen, W.M., Juneja, S., 2021. Optimal best-arm identification methods for tail-risk measures, in: Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual.

Baudron, F., Tittonell, P., Corbeels, M., Letourmy, P., Giller, K.E., 2012. Comparative performance of conservation agriculture and current smallholder farming practices in semi-arid Zimbabwe. Field Crops Research 132, 117–128.

Baudry, D., Gautron, R., Kaufmann, E., Maillard, O., 2021a. Optimal Thompson sampling strategies for support-aware CVaR bandits, in: International Conference on Machine Learning, PMLR. pp. 716–726.

Baudry, D., Russac, Y., Cappé, O., 2021b. On Limited-Memory Subsampling Strategies for Bandits, in: ICML 2021 - International Conference on Machine Learning, Vienna / Virtual, Austria.

Cassel, A., Mannor, S., Zeevi, A., 2018. A general approach to multi-armed bandits under risk criteria, in: Conference On Learning Theory, PMLR. pp. 1295–1306.

Cerf, M., Meynard, J.M., 2006. Les outils de pilotage des cultures: diversité de leurs usages et enseignements pour leur conception. Natures Sciences Sociétés 14, 19–29.

Cerf, M., Sebillotte, M., 1997. Approche cognitive des décisions de production dans l’exploitation agricole [confrontation aux théories de la décision]. Economie rurale 239, 11–18.

Coulibaly, D., Sissoko, F., Doumbia, S., Ba, A., Dembele, B., 2017. Evaluation de l’effet de la fertilisation minerale sur la production de varietes ameliorees de mais et le disponible fourrager en zone cotonniere du Mali-Sud (Mali). Agronomie Africaine 29, 109–117.

Dowd, K., 2007. Measuring market risk. John Wiley & Sons.

Evans, K.J., Terhorst, A., Kang, B.H., 2017. From data to decisions: helping crop producers build their actionable knowledge. Critical Reviews in Plant Sciences 36, 71–88.

Everitt, B., Skrondal, A., 2002. The Cambridge dictionary of statistics. volume 106. Cambridge University Press Cambridge.


Falconnier, G.N., Corbeels, M., Boote, K.J., Affholder, F., Adam, M., MacCarthy, D.S., Ruane, A.C., Nendel, C., Whitbread, A.M., Justes, É., et al., 2020. Modelling climate change impacts on maize yields under low nitrogen input conditions in Sub-Saharan Africa. Global Change Biology 26, 5942–5964.

Falconnier, G.N., Descheemaeker, K., Van Mourik, T.A., Giller, K.E., 2016. Unravelling the causes of variability in crop yields and treatment responses for better tailoring of options for sustainable intensification in southern Mali. Field Crops Research 187, 113–126.

Fosu-Mensah, B., MacCarthy, D., Vlek, P., Safo, E., 2012. Simulating impact of seasonal climatic variation on the response of maize (Zea mays L.) to inorganic fertilizer in sub-humid Ghana. Nutrient Cycling in Agroecosystems 94, 255–271.

Garivier, A., Moulines, E., 2011. On upper-confidence bound policies for switching bandit problems, in: International Conference on Algorithmic Learning Theory, Springer. pp. 174–188.

Gautron, R., Maillard, O.A., Preux, P., Corbeels, M., Sabbadin, R., 2022. Reinforcement learning for crop management support: Review, prospects and challenges. Computers and Electronics in Agriculture 200, 107182.

Gautron, R., Padrón González, E.J., 2022. gym-DSSAT - A crop model turned into a Reinforcement Learning environment. URL: https://gitlab.inria.fr/rgautron/gym_dssat_pdi.

Getnet, M., Van Ittersum, M., Hengsdijk, H., Descheemaeker, K., 2016. Yield gaps and resource use across farming zones in the Central Rift Valley of Ethiopia. Experimental Agriculture 52, 493–517.

Hochman, Z., Carberry, P., 2011. Emerging consensus on desirable characteristics of tools to support farmers’ management of climate risk in Australia. Agricultural Systems 104, 441–450.

Hoogenboom, G., Porter, C., Boote, K., Shelia, V., Wilkens, P., Singh, U., White, J., Asseng, S., Lizaso, J., Moreno, L., et al., 2019. The DSSAT crop modeling ecosystem. Advances in crop modelling for a sustainable agriculture, 173–216.

Huet, E., Adam, M., Traore, B., Giller, K., Descheemaeker, K., 2022. Coping with cereal production risks due to the vagaries of weather, labour shortages and input markets through management in southern Mali. European Journal of Agronomy 140, 126587.

Jayne, T.S., Govereh, J., Wanzala, M., Demeke, M., 2003. Fertilizer market development: a comparative analysis of Ethiopia, Kenya, and Zambia. Food Policy 28, 293–316.

Jones, J., Tsuji, G., Hoogenboom, G., Hunt, L., Thornton, P.K., Wilkens, P., Imamura, D., Bowen, W., Singh, U., 1998. Decision support system for agrotechnology transfer: DSSAT v3. Understanding options for agricultural production, 157–177.

Jourdain, D., Lairez, J., Striffler, B., Affholder, F., 2020. Farmers’ preference for cropping systems and the development of sustainable intensification: a choice experiment approach. Review of Agricultural, Food and Environmental Studies 101, 417–437.

Kalaji, H.M., Dabrowski, P., Cetner, M.D., Samborska, I.A., Lukasik, I., Brestic, M., Zivcak, M., Tomasz, H., Mojski, J., Kociel, H., et al., 2017. A comparison between different chlorophyll content meters under nutrient deficiency conditions. Journal of Plant Nutrition 40, 1024–1034.

Laird, N.M., Ware, J.H., 1982. Random-effects models for longitudinal data. Biometrics, 963–974.

Lattimore, T., Szepesvári, C., 2020. Bandit algorithms. Cambridge University Press.

Mandelbrot, B.B., 1997. The variation of certain speculative prices, in: Fractals and scaling in finance. Springer, pp. 371–418.

Massart, P., 1990. The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality. Annals of Probability 18.

McCown, R.L., 2002. Changing systems for supporting farmers’ decisions: problems, paradigms, and prospects. Agricultural Systems 74, 179–220.

Menapace, L., Colson, G., Raffaelli, R., 2013. Risk aversion, subjective beliefs, and farmer risk management strategies. American Journal of Agricultural Economics 95, 384–389.

Morris, T.F., Murrell, T.S., Beegle, D.B., Camberato, J.J., Ferguson, R.B., Grove, J., Ketterings, Q., Kyveryga, P.M., Laboski, C.A., McGrath, J.M., et al., 2018. Strengths and limitations of nitrogen rate recommendations for corn and opportunities for improvement. Agronomy Journal 110, 1.


Naudin, K., Gozé, E., Balarabe, O., Giller, K.E., Scopel, E., 2010. Impact of no tillage and mulching practices on cotton production in North Cameroon: a multi-locational on-farm assessment. Soil and Tillage Research 108, 68–76.

Perchet, V., Rigollet, P., Chassang, S., Snowberg, E., 2015. Batched bandit problems, in: Grünwald, P., Hazan, E., Kale, S. (Eds.), Proceedings of The 28th Conference on Learning Theory, COLT 2015, Paris, France, July 3-6, 2015, JMLR.org. p. 1456. URL: http://proceedings.mlr.press/v40/Perchet15.html.

Piha, M., 1993. Optimizing fertilizer use and practical rainfall capture in a semi-arid environment with variable rainfall. Experimental Agriculture 29, 405–415.

Richardson, C.W., Wright, D.A., 1984. WGEN: A model for generating daily weather variables. ARS (USA).

Ripoche, A., Crétenet, M., Corbeels, M., Affholder, F., Naudin, K., Sissoko, F., Douzet, J.M., Tittonell, P., 2015. Cotton as an entry point for soil fertility maintenance and food crop productivity in savannah agroecosystems–evidence from a long-term experiment in southern Mali. Field Crops Research 177, 37–48.

Robbins, H., 1952. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society 58, 527–535.

Rockafellar, R.T., Uryasev, S., et al., 2000. Optimization of conditional value-at-risk. Journal of Risk 2, 21–42.

Soltani, A., Hoogenboom, G., 2003. A statistical comparison of the stochastic weather generators WGEN and SIMMETEO. Climate Research 24, 215–230.

Tack, J.B., Holt, M.T., 2016. The influence of weather extremes on the spatial correlation of corn yields. Climatic Change 134, 299–309.

Tamkin, A., Keramati, R., Dann, C., Brunskill, E., 2020. Distributionally-aware exploration for CVaR bandits, in: NeurIPS 2019 Workshop on Safety and Robustness in Decision Making; RLDM 2019.

Ten Berge, H.F., Hijbeek, R., Van Loon, M., Rurinda, J., Tesfaye, K., Zingore, S., Craufurd, P., van Heerwaarden, J., Brentrup, F., Schröder, J.J., et al., 2019. Maize crop nutrient input requirements for food security in Sub-Saharan Africa. Global Food Security 23, 9–21.

Thomas, P., Learned-Miller, E., 2019. Concentration inequalities for conditional value at risk, in: International Conference on Machine Learning, PMLR. pp. 6225–6233.

Thompson, W.R., 1933. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25, 285–294.

Tilman, D., Cassman, K.G., Matson, P.A., Naylor, R., Polasky, S., 2002. Agricultural sustainability and intensive production practices. Nature 418, 671–677.

Timsina, J., Dutta, S., Devkota, K.P., Chakraborty, S., Neupane, R.K., Bishta, S., Amgain, L.P., Singh, V.K., Islam, S., Majumdar, K., 2021. Improved nutrient management in cereals using nutrient expert and machine learning tools: Productivity, profitability and nutrient use efficiency. Agricultural Systems 192, 103181.

Traore, B., Descheemaeker, K., Van Wijk, M.T., Corbeels, M., Supit, I., Giller, K.E., 2017. Modelling cereal crops to assess future climate risk for family food self-sufficiency in southern Mali. Field Crops Research 201, 133–145.

Vanlauwe, B., Kihara, J., Chivenge, P., Pypers, P., Coe, R., Six, J., 2011. Agronomic use efficiency of N fertilizer in maize-based systems in Sub-Saharan Africa within the context of integrated soil fertility management. Plant and Soil 339, 35–50.

Yin, X., Kersebaum, K.C., Kollas, C., Baby, S., Beaudoin, N., Manevski, K., Palosuo, T., Nendel, C., Wu, L., Hoffmann, M., Hoffmann, H., Sharif, B., Armas-Herrera, C.M., Bindi, M., Charfeddine, M., Conradt, T., Constantin, J., Ewert, F., Ferrise, R., Gaiser, T., de Cortazar-Atauri, I.G., Giglio, L., Hlavinka, P., Lana, M., Launay, M., Louarn, G., Manderscheid, R., Mary, B., Mirschel, W., Moriondo, M., Öztürk, I., Pacholski, A., Ripoche-Wachter, D., Rötter, R.P., Ruget, F., Trnka, M., Ventrella, D., Weigel, H.J., Olesen, J.E., 2017. Multi-model uncertainty analysis in predicting grain N for crop rotations in Europe. European Journal of Agronomy 84, 152–165. URL: https://www.sciencedirect.com/science/article/pii/S1161030116302532, doi:https://doi.org/10.1016/j.eja.2016.12.009.


Supplementary Materials

A. Maize simulations

The cultivation scenarios were based on the conditions found in southern Mali. The soils came from Adam et al. (2020), who compiled the soils found in the literature for the location of Koutiala, Mali, and supplemented them with survey data. The data of Adam et al. (2020) included soil depth, texture, water capacity, bulk density, organic matter content, pH and initial mineral nitrogen content. Soil characteristics and proportions in the population are summarized in Table 1, based on Adam et al. (2020). During the simulations, the weather time series were generated using the WGEN weather model (see Richardson and Wright, 1984; Soltani and Hoogenboom, 2003). WGEN had been parameterized on 47-year-long historical daily weather records, found in Ripoche et al. (2015), from a weather station located in N’Tarla, about 20 km from Koutiala; these historical weather records were the best available. The cultivar used in the simulations and its parameterization in DSSAT are presented in Table A.1; this cultivar comes with the DSSAT default data and was representative of the cultivars used in Mali. The cultivar was already calibrated based on experiments carried out in Mali. The simulations were initiated on Day Of Year (DOY) 140 and planting was automatically performed in a window ranging from DOY 155 to 185; the parameters of the automatic planting are given in Table A.2. For each soil, the initial soil nitrogen content was set according to the values found in Adam et al. (2020). The soil water content was set to the crop lower limit, as a result of the end of the dry season at the usual planting dates. Because the simulations were initiated prior to the planting date and because the weather was stochastically generated, the soil mineral nitrogen and water contents were uncertain at planting time. Each simulation was performed independently from the previous ones. At the beginning of the experiment, all the soils described in Table 1 were randomly distributed amongst the initial group of farmers following the proportions provided in Table 1. Figure A.1 shows the simulated yield distributions for the ITML840104 and ITML840105 soils.

B. Algorithms

B.1. Details about BCB

In Algorithm B.1, we provide the detailed pseudo-code of BCB. As shown by Figure B.1, the higher the number of collected rewards, the less variance the weights sampled from Dirichlet distributions exhibit. This variance directly relates to the noise introduced in the computation of the score of the different available actions.
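The shrinking variance of the Dirichlet weights can be checked with a short sketch. It relies on two standard facts rather than on the authors' code: a sample from a Dirichlet distribution with concentration parameter (1, ..., 1) can be obtained by normalizing independent Exp(1) draws, and each of its n weights has variance (n - 1) / (n^2 (n + 1)). The function names are hypothetical.

```python
import random


def sample_flat_dirichlet(n, rng):
    """Sample n weights from Dirichlet(1, ..., 1) by normalizing Exp(1) draws."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def flat_dirichlet_weight_variance(n):
    """Exact variance of a single weight of Dirichlet(1, ..., 1) with n components."""
    return (n - 1) / (n ** 2 * (n + 1))


rng = random.Random(0)
weights = sample_flat_dirichlet(10, rng)  # one random re-weighting of 10 rewards
# Variance of one weight for 10 and for 100 collected rewards.
variances = {n: flat_dirichlet_weight_variance(n) for n in (10, 100)}
```

With 10 rewards the variance of a weight is 9/1100, and it is roughly two orders of magnitude smaller with 100 rewards, which matches the qualitative behaviour described for Figure B.1.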

Remark B.1 (First season). Algorithm B.1 is well defined for the first season: without data, all CVaRs are equal to the maximum observable result, so the algorithm chooses each option uniformly at random. On average, each option will be equally explored. Note that we could replace this step by an equi-proportional exploration step (similar to Explore-Then-Commit, see B.2) without changing the theoretical properties of our algorithm. Furthermore, the decision maker could also include any additional results collected before the experiment (if the practices have already been tested for some time) in the initialization of the algorithm.

Table A.1
Maize cultivar parametrization in DSSAT

name      ecotype  P1     P2     P5     G2     G3    PHINT
Sotubaka  IB0001   300.0  0.520  930.0  500.0  6.00  38.90

Table A.2
Automatic planting parametrization in DSSAT. PFRST: Starting date of the planting window; PLAST: End date of the planting window; PH2OL: Lower limit on soil moisture for automatic planting; PH2OU: Upper limit on soil moisture for automatic planting; PH2OD: Depth to which average soil moisture is determined for automatic planting; PSTMX: Maximum temperature of planting; PSTMN: Minimum temperature of planting.

Parameter    Value
PFRST (DOY)  155
PLAST (DOY)  185
PH2OL (%)    40
PH2OU (%)    100
PH2OD (cm)   30
PSTMX (°C)   40
PSTMN (°C)   10

B.2. Explore-Then-Commit (ETC)

We provide the pseudo-code of the Explore-Then-Commit (ETC) strategy in Algorithm B.2.

In BCB, by contrast, the noise introduced by random weights and the presence of the maximum observable results in the histories manage the exploration/exploitation dilemma. BCB will favor fertilizer practices with a higher CVaR compared to the others. However, the algorithm will still prevent the under-exploration of fertilizer practices by choosing them with a proper probability, even if, e.g., poor YE values have been observed due to rare unfavorable weather events. Indeed, with the extra randomness introduced by the random weighting of rewards, poor rewards may receive smaller weights than higher rewards, yielding a good score. The amount of noise introduced by the random weights sampled from the Dirichlet distribution is related to the variance of these random weights. The greater the number of rewards, the lower the variance and consequently the lower the noise (Figure B.1). Thereby, the more a fertilizer practice has been tried by the algorithm, the closer its score gets to the true CVaR of rewards. The presence of the maximum observable YE acts as an “optimistic bonus” in the computation of the scores, encouraging exploration even for sub-optimal practices, as it raises their initial values when few rewards have been observed.
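The scoring mechanism described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, and the CVaR is computed with the variational form of Rockafellar and Uryasev (maximum over the support points of x - E[max(x - X, 0)] / alpha) rather than with the index search of Algorithm B.1. The history of a practice is augmented with its upper bound, re-weighted with a flat Dirichlet sample, and scored with the weighted CVaR.

```python
import random


def weighted_cvar(values, weights, alpha):
    """CVaR at level alpha of a discrete distribution (lower tail, higher is better).

    Uses the variational form sup_x { x - E[max(x - X, 0)] / alpha }, whose
    supremum is attained at one of the support points.
    """
    best = float("-inf")
    for x in values:
        shortfall = sum(w * max(x - v, 0.0) for v, w in zip(values, weights))
        best = max(best, x - shortfall / alpha)
    return best


def noisy_cvar_score(rewards, upper_bound, alpha, rng):
    """One randomized score of a practice: re-weight its history, then take the CVaR."""
    history = list(rewards) + [upper_bound]  # the upper bound is the optimistic bonus
    draws = [rng.expovariate(1.0) for _ in history]  # flat Dirichlet via Exp(1) draws
    total = sum(draws)
    weights = [d / total for d in draws]
    return weighted_cvar(history, weights, alpha)


rng = random.Random(0)
# Hypothetical yield-excess rewards (kg/ha) and upper bound, for illustration only.
score = noisy_cvar_score([100.0, 200.0, 300.0], upper_bound=1000.0, alpha=0.3, rng=rng)
untested = noisy_cvar_score([], upper_bound=1000.0, alpha=0.3, rng=rng)
```

A practice without any observed reward gets a score equal to its upper bound, which is the situation described in Remark B.1 for the first season.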


[Figure A.1: six panels; practice indexes 0 to 9 on the vertical axis of each panel.
(a) Yield distributions for soil ITML840104; horizontal axis: dry grain yield (kg/ha). Stars represent the CVaR at level 30%.
(b) Yield distributions for soil ITML840105; horizontal axis: dry grain yield (kg/ha). Stars represent the CVaR at level 30%.
(c) Agronomic Nitrogen Efficiency (ANE) distributions for soil ITML840104; horizontal axis: nitrogen use efficiency (kg/kg). Stars represent the mean value.
(d) Agronomic Nitrogen Efficiency (ANE) distributions for soil ITML840105; horizontal axis: nitrogen use efficiency (kg/kg). Stars represent the mean value.
(e) Yield Excess (YE) distributions for soil ITML840104 with ANEref=15 kg grain/kg N; horizontal axis: yield excess (kg/ha). Stars represent the CVaR at level 30%.
(f) Yield Excess (YE) distributions for soil ITML840105 with ANEref=15 kg grain/kg N; horizontal axis: yield excess (kg/ha). Stars represent the CVaR at level 30%.]

Figure A.1: Simulated impact of maize fertilizer practices on grain yield, Agronomic Nitrogen use Efficiency (ANE), Yield Excess (YE) for 105 hypothetical years using a weather generator. Maize cultivar was the same for all simulations. Practices indexes are indicated on the left-hand side of each sub-figure.

C. Theoretical Analysis

This section is devoted to the theoretical analysis of the BCB algorithm. We will mostly adapt the analysis of Baudry et al. (2021a), and show that the problem of learning with batched data of finite upper bounded size is no harder than the pure online learning problem considered in the original paper.


Algorithm B.1 BCB: identification strategy at cohort level (detailed)
Input: level $\alpha$, horizon $T$, $K$ options, upper bounds $B_1, \dots, B_K$, $\mathcal{F}^c$ the set of all farmers in the cohort
Init.: $\forall k \in \{1, \dots, K\}$: $\mathcal{X}_k = \{B_k\}$, $N_k = 0$; $\mathcal{F}^c_1 = \{f_1, \dots, f_{n_1}\}$; $t = 1$; $\mathcal{A}_1 = \{\emptyset\}$
// Beginning of first season
for $f \in \mathcal{F}^c_1$ do
    Randomly assign a crop management option $a \in \{1, \dots, K\}$ to the farmer $f$
    $\mathcal{A}_1 = \mathcal{A}_1 \cup \{a\}$
end
// End of first season
for $(a, f) \in (\mathcal{A}_1, \mathcal{F}^c_1)$ do
    Receive the result of the option $a$ from farmer $f$: $r_{f,a}$
    Update $\mathcal{X}_a = \mathcal{X}_a \cup \{r_{f,a}\}$, $N_a = N_a + 1$
end
for $t \in \{2, \dots, T\}$ do
    // Beginning of season $t$
    Get $\mathcal{F}^c_t = \{f_1, \dots, f_{n_t}\}$ ; // the set of farmers of the same cohort to provide recommendations
    for $k \in \{1, \dots, K\}$ do
        Update the empirical CVaR of action $k$: $c_{k,t-1} = C_\alpha(\mathcal{X}_k)$
    end
    for $f \in \mathcal{F}^c_t$ do
        Update the empirical regret of farmer $f$: $l_{f,t-1} = R^\alpha_f(t - 1)$
    end
    $\mathcal{A}_t = \{\emptyset\}$ ; // the set of recommendations to provide to the farmers
    for $f \in \mathcal{F}^c_t$ do
        for $k \in \{1, \dots, K\}$ do
            Draw $\omega_k = \{w_1, \dots, w_{N_k}\} \sim \mathcal{D}_{N_k}$ ; // Dirichlet of concentration parameter $(1, \dots, 1)$ ($N_k$ times)
            Search $j$ the maximum index such that $\sum_{i=1}^{j} w_i \le \alpha$
            Sort $\mathcal{X}_k$ in increasing order
            Compute $c_k = x_j - \frac{1}{\alpha} \sum_{i=1}^{N_k} w_i \max(x_j - x_i, 0)$ ; // assign a score to action $k$
        end
        $a = \operatorname{argmax}_{k \in \{1, \dots, K\}} c_k$
        $\mathcal{A}_t = \mathcal{A}_t \cup \{a\}$
    end
    for $(a, f) \in (\mathcal{A}_t, \mathcal{F}^c_t)$ do
        Assign action $a$ to farmer $f$
    end
    // End of season $t$
    for $(a, f) \in (\mathcal{A}_t, \mathcal{F}^c_t)$ do
        Receive result of action $a$ from farmer $f$: $r_{f,a}$
        Update $\mathcal{X}_a = \mathcal{X}_a \cup \{r_{f,a}\}$, $N_a = N_a + 1$
    end
end

[Figure B.1: plot titled "Random weights from Dirichlet distributions"; horizontal axis: weights (0.00 to 0.25); vertical axis: reward number (10 and 100).]

Figure B.1: Examples of weights sampled from Dirichlet distributions during BCB execution, respectively for 10 and 100 rewards. The greater the number of rewards, the less variance the weights show. The variance of weights is related to the noise level in the computation of the empirical CVaR of BCB.

Theorem C.1 ($\alpha$-CVaR Regret of BCB). Consider a bandit problem $(F_1, \dots, F_K) \in \mathcal{C}^K$, with respective CVaR$_\alpha$ denoted by $(c_1, \dots, c_K)$ with $c_1 = \max_{k=1,\dots,K} c_k$. Assume that BCB runs for $T$ seasons, and that at each season $t$ the size of the batch is $n_t \le F \in \mathbb{N}$. Then, for any $\varepsilon > 0$ small enough there exist some $\varepsilon_1 > 0$, $\varepsilon_2 > 0$ such that the regret of BCB satisfies:

\[
\mathcal{R}^\alpha_T \le \sum_{k=2}^{K} \Delta^\alpha_k \left( m^k_T + F + 2F \, \frac{e^{-2 m^k_T \varepsilon_1^2}}{1 - e^{-2 \varepsilon_1^2}} + C^\alpha_{1,\varepsilon_2} \right),
\]

where $m^k_T = \frac{\log(T) + \log(F)}{\mathcal{K}^{\alpha,\mathcal{C}}_{\inf}(F_k, c_1) - \varepsilon}$ and $C^\alpha_{1,\varepsilon_2}$ is a constant depending only on the distribution $F_1$, the family $\mathcal{C}$ and $\varepsilon_2$.

It is interesting to compare this regret upper bound to the one obtained in the purely sequential setting, which we recall in Theorem C.2.

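Theorem C.2 ($\alpha$-CVaR Regret of B-CVTS with time horizon $S_T$ (adapted from Theorem 3 in Baudry et al. (2021a))). Consider a bandit problem $(F_1, \dots, F_K) \in \mathcal{C}^K$, with respective CVaR$_\alpha$ denoted by $(c_1, \dots, c_K)$ with $c_1 = \max_{k=1,\dots,K} c_k$. Consider a number of data collected $S_T$. Then, for any $\varepsilon > 0$ small enough there exist some $\varepsilon_1 > 0$, $\varepsilon_2 > 0$ such that the CVaR-regret of B-CVTS satisfies

\[
\mathcal{R}^\alpha_T \le \sum_{k=2}^{K} \Delta^\alpha_k \left( n^k_{S_T} + 2 \, \frac{e^{-2 n^k_{S_T} \varepsilon_1^2}}{1 - e^{-2 \varepsilon_1^2}} + C^\alpha_{1,\varepsilon_2} \right),
\]

where $n^k_{S_T} = \frac{\log(S_T)}{\mathcal{K}^{\alpha,\mathcal{C}}_{\inf}(F_k, c_1) - \varepsilon}$ and $C^\alpha_{1,\varepsilon_2}$ is a constant depending only on the distribution $F_1$, the family $\mathcal{C}$ and $\varepsilon_2$.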

First, we see that if $F$ is indeed a constant (i.e., does not depend on time), then $F$ has no impact on the scaling of the regret when $T$ is large enough. In our proof, the main impact of the batch setting is an additive term $F$ for each arm, hence the regret becomes close to that of the sequential setting once $m^k_T \gg F$. Finally, if the number of farmers in each batch is exactly $F$ at each step, then $S_T = FT$ and $m^k_T = n^k_{S_T}$, hence the asymptotically dominant (logarithmic) term is the same in the two settings.

These theoretical results show that learning with batch feedback does not introduce theoretical limitations in our setting, and so the BCB algorithm is theoretically grounded.
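The equality of the dominant terms can be written out in one line. The sketch below assumes exactly $F$ farmers per season, so that $S_T = FT$, and uses the shorthand $D_k$ (introduced here only for readability) for the denominator shared by the two thresholds of Theorems C.1 and C.2:

```latex
\[
m^k_T
= \frac{\log(T) + \log(F)}{D_k}
= \frac{\log(FT)}{D_k}
= \frac{\log(S_T)}{D_k}
= n^k_{S_T}.
\]
```

The only remaining difference between the two bounds is then the additive term $F$ and the factor $F$ in front of the exponential term, neither of which grows with $T$.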


Algorithm B.2 ETC: identification strategy at cohort level
Input: level $\alpha$, horizon $T$, $K$ options, $\mathcal{F}^c$ the set of all farmers in the cohort, $t_{\text{trials}}$ the number of years of trials
Init.: $\forall k \in \{1, \dots, K\}$: $N_k = 0$
// Do trials during $t_{\text{trials}}$ years
for $t \in \{1, \dots, t_{\text{trials}}\}$ do
    // Beginning of the season $t$
    Get $\mathcal{F}^c_t = \{f_1, \dots, f_{n_t}\}$ ; // get the farmers willing to participate
    $\mathcal{A}_t = \{\emptyset\}$
    Fill $\mathcal{A}_t$ by uniformly distributing the $K$ options to the farmers in $\mathcal{F}^c_t$
    // End of the season $t$
    for $(a, f) \in (\mathcal{A}_t, \mathcal{F}^c_t)$ do
        Receive the result of the option $a$ from farmer $f$: $r_{f,a}$
        Update $\mathcal{X}_a = \mathcal{X}_a \cup \{r_{f,a}\}$, $N_a = N_a + 1$
    end
end
for $k \in \{1, \dots, K\}$ do
    Compute the empirical CVaR of action $k$: $c_{k,t-1} = C_\alpha(\mathcal{X}_k)$
end
$a_{\max} = \operatorname{argmax}_{k \in \{1, \dots, K\}} c_k$ ; // get the action that best performed during trials
// After trial phase, always recommend the action that best performed during trials
for $t \in \{t_{\text{trials}} + 1, \dots, T\}$ do
    // Beginning of the season $t$
    Get $\mathcal{F}^c_t = \{f_1, \dots, f_{n_t}\}$
    for $f \in \mathcal{F}^c_t$ do
        Assign option $a_{\max}$ to the farmer $f$
    end
    // End of the season $t$
    for $f \in \mathcal{F}^c_t$ do
        Receive the result of the option $a_{\max}$ from farmer $f$: $r_{f,a_{\max}}$
        Update $\mathcal{X}_{a_{\max}} = \mathcal{X}_{a_{\max}} \cup \{r_{f,a_{\max}}\}$, $N_{a_{\max}} = N_{a_{\max}} + 1$
    end
end
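The structure of Algorithm B.2 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' code: the number of farmers is the same every season, options are distributed in a round-robin fashion during the trial phase, and `sample_reward` and `toy_reward` are hypothetical stand-ins for the crop model.

```python
import random


def empirical_cvar(rewards, alpha):
    """Empirical CVaR at level alpha (lower tail) of equally weighted rewards."""
    if not rewards:
        raise ValueError("at least one reward is needed")
    n = len(rewards)
    # Variational form: the supremum is attained at one of the observed rewards.
    return max(
        x - sum(max(x - r, 0.0) for r in rewards) / (alpha * n) for x in rewards
    )


def explore_then_commit(sample_reward, n_options, n_farmers, t_trials, horizon, alpha, rng):
    """Run ETC and return the committed option with the per-season assignments."""
    histories = [[] for _ in range(n_options)]
    assignments = []
    # Trial phase: distribute the options evenly among farmers (round-robin).
    for t in range(t_trials):
        season = [(f + t) % n_options for f in range(n_farmers)]
        for option in season:
            histories[option].append(sample_reward(option, rng))
        assignments.append(season)
    # Commit to the option with the best empirical CVaR observed during trials.
    scores = [empirical_cvar(h, alpha) for h in histories]
    best = max(range(n_options), key=lambda k: scores[k])
    for _ in range(t_trials, horizon):
        season = [best] * n_farmers
        for option in season:
            histories[option].append(sample_reward(option, rng))
        assignments.append(season)
    return best, assignments


def toy_reward(option, rng):
    """Hypothetical reward model: options with a larger index perform better."""
    return 100.0 * option + rng.gauss(0.0, 1.0)


best, assignments = explore_then_commit(
    toy_reward, n_options=3, n_farmers=6, t_trials=2, horizon=5, alpha=0.3,
    rng=random.Random(0),
)
```

In this toy run every option is tested four times during the two trial seasons, after which all farmers receive the committed option for the remaining three seasons.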

Proof of Theorem C.1. As in the proof of Baudry et al. (2021a), we decompose the expected number of pulls of each sub-optimal arm inside the cohort according to several possible events, corresponding to "good" scenarios (the empirical distributions accurately reflect the true distributions) and "bad" ones (the empirical distributions give a wrong idea of the true performance of some arms) for the trajectory of the bandit algorithm. We denote by $T$ the number of seasons in the experiment, by $n_t$ the number of farmers at each season $t$ for this cohort, and by $F$ the total number of farmers available for the experiment. Then, the expected number of pulls of arm $k$ during the total duration of the experiment inside the cohort is

\[
\mathbb{E}[N_k(T)] = \mathbb{E}\left[ \sum_{t=1}^{T} \sum_{f=1}^{n_t} \mathbb{1}(A_{t,f} = k) \right],
\]

where $A_{t,f}$ denotes the recommendation to farmer $f$ at season $t$.


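The first step of the proof consists in considering the number of pulls of $k$ when its sample size is larger (resp. smaller) than some fixed threshold $m_T$, which we will specify later.

\[
\begin{aligned}
\mathbb{E}[N_k(T)] &= \mathbb{E}\left[ \sum_{t=1}^{T} \sum_{f=1}^{n_t} \mathbb{1}(A_{t,f} = k) \right] \\
&\le \mathbb{E}\left[ \sum_{t=1}^{T} \sum_{f=1}^{n_t} \mathbb{1}(A_{t,f} = k, N_k(t-1) \le m_T) \right] + \mathbb{E}\left[ \sum_{t=1}^{T} \sum_{f=1}^{n_t} \mathbb{1}(A_{t,f} = k, N_k(t-1) \ge m_T) \right].
\end{aligned}
\]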

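We now consider the first term and introduce the random variable $\tau = \sup\{t \le T : N_k(t-1) \le m_T\}$. By construction, $\tau$ is the last season for which the total number of observations for arm $k$ is smaller than $m_T$. Since $N_k$ is non-decreasing, the indicators vanish for $t > \tau$, and using the basic properties of $\tau$ we obtain that:

\[
\begin{aligned}
\sum_{t=1}^{T} \sum_{f=1}^{n_t} \mathbb{1}(A_{t,f} = k, N_k(t-1) \le m_T)
&\le \sum_{t=1}^{\tau} \sum_{f=1}^{n_t} \mathbb{1}(A_{t,f} = k, N_k(t-1) \le m_T) + \sum_{t=\tau+1}^{T} \sum_{f=1}^{n_t} \mathbb{1}(A_{t,f} = k, N_k(t-1) \le m_T) \\
&\le N_k(\tau - 1) + \sum_{f=1}^{n_\tau} \mathbb{1}(A_{\tau,f} = k) \\
&\le m_T + F.
\end{aligned}
\]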

As this result does not depend on the value of $\tau$, we can then obtain:

\[
\mathbb{E}[N_k(T)] \le m_T + F + \underbrace{\mathbb{E}\left[ \sum_{t=1}^{T} \sum_{f=1}^{n_t} \mathbb{1}(A_{t,f} = k, N_k(t-1) \ge m_T) \right]}_{A}.
\]

At this step, the only difference with the purely sequential bandit problem is the additional $F$. We now consider the term $A$, which we further analyze according to three events: (1) the empirical distribution of arm $k$ is not close to its true distribution, (2) the empirical distribution of arm $k$ is close to its true distribution but the "noisy" CVaR computed for arm $k$ over-estimates its true CVaR, and (3) the "noisy" CVaR computed for the optimal arm 1 under-estimates its true CVaR. As is classical in bandit analysis, we decompose the number of pulls of arm $k$ according to these three events, as at least one of them must be true when $A_{t,f} = k$ holds, that is

\[
\{A_{t,f} = k\} \subset \{F_{k,t-1} \notin \mathcal{B}_{\varepsilon_1}(F_k)\} \cup \{F_{k,t-1} \in \mathcal{B}_{\varepsilon_1}(F_k),\ c_{k,t,f} \ge c_1 - \varepsilon_2\} \cup \{c_{1,t,f} \le c_1 - \varepsilon_2\},
\]


where $\varepsilon_1(F_k)$ is an $\varepsilon_1$-Lévy ball around $F_k$, and $\varepsilon_1, \varepsilon_2$ are two small positive constants. This leads to

\[
\begin{aligned}
A \le{} & \underbrace{\mathbb{E}\left[\sum_{t=1}^{T}\sum_{f=1}^{n_t}\mathbb{1}(A_{t,f}=k,\ N_k(t-1)\ge m_T,\ F_{k,t-1}\notin\varepsilon_1(F_k))\right]}_{A_1} \\
& + \underbrace{\mathbb{E}\left[\sum_{t=1}^{T}\sum_{f=1}^{n_t}\mathbb{1}(A_{t,f}=k,\ N_k(t-1)\ge m_T,\ F_{k,t-1}\in\varepsilon_1(F_k),\ c_{k,t,f}\ge c_1-\varepsilon_2)\right]}_{A_2} \\
& + \underbrace{\mathbb{E}\left[\sum_{t=1}^{T}\sum_{f=1}^{n_t}\mathbb{1}(A_{t,f}=k,\ N_k(t-1)\ge m_T,\ c_{1,t,f}\le c_1-\varepsilon_2)\right]}_{A_3}.
\end{aligned}
\]

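Upper bounding $A_1$. Denoting by $F_{k,n}$ the empirical distribution of arm $k$ after a total number of pulls $n$ (instead of after season $t$), we obtain

\[
\begin{aligned}
A_1 &:= \mathbb{E}\left[\sum_{t=1}^{T}\sum_{f=1}^{n_t}\mathbb{1}(A_{t,f}=k,\ N_k(t-1)\ge m_T,\ F_{k,t-1}\notin\varepsilon_1(F_k))\right] \\
&\le \mathbb{E}\left[\sum_{t=1}^{T}\mathbb{1}(N_k(t-1)\ge m_T,\ F_{k,t-1}\notin\varepsilon_1(F_k))\sum_{f=1}^{n_t}\mathbb{1}(A_{t,f}=k)\right] \\
&\le \mathbb{E}\left[\sum_{t=1}^{T}\sum_{n=m_T}^{T}\mathbb{1}(N_k(t-1)=n,\ F_{k,t-1}\notin\varepsilon_1(F_k))\sum_{f=1}^{n_t}\mathbb{1}(A_{t,f}=k)\right],
\end{aligned}
\]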

with a union bound on the number of pulls. Under $N_k(t-1) = n$ it holds that $F_{k,t-1} = F_{k,n}$, and so we can further write that

\[
\begin{aligned}
A_1 &\le \mathbb{E}\left[\sum_{t=1}^{T}\sum_{n=m_T}^{T}\mathbb{1}(N_k(t-1)=n,\ F_{k,n}\notin\varepsilon_1(F_k))\sum_{f=1}^{n_t}\mathbb{1}(A_{t,f}=k)\right] \\
&\le \mathbb{E}\left[\sum_{n=m_T}^{T}\mathbb{1}(F_{k,n}\notin\varepsilon_1(F_k))\sum_{t=1}^{T}\sum_{f=1}^{n_t}\mathbb{1}(A_{t,f}=k,\ N_k(t-1)=n)\right] \\
&\le F\,\mathbb{E}\left[\sum_{n=m_T}^{T}\mathbb{1}(F_{k,n}\notin\varepsilon_1(F_k))\right] \\
&\le F\sum_{n=m_T}^{+\infty}\mathbb{P}(F_{k,n}\notin\varepsilon_1(F_k)).
\end{aligned}
\]

Finally, using the Dvoretzky–Kiefer–Wolfowitz inequality (Massart, 1990) we obtain:

\[
A_1 \le F\sum_{n=m_T}^{+\infty} 2e^{-2n\varepsilon_1^2} \le \frac{2Fe^{-2m_T\varepsilon_1^2}}{1-e^{-2\varepsilon_1^2}}.
\]

This upper bound holds for any choice of $m_T$, $\varepsilon_1$, and we remark that if $m_T \to +\infty$ then $A_1 \to 0$.
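As a sanity check on the last step, the geometric series $\sum_{n \ge m_T} 2e^{-2n\varepsilon_1^2}$ can be summed numerically and compared with the closed form. The sketch below is only an illustration: the function names `a1_bound` and `a1_bound_partial_sum`, the truncation length `n_terms`, and the values $F = 15$ and $\varepsilon_1 = 0.1$ are assumptions chosen for the example, not quantities taken from the paper.

```python
import math


def a1_bound(F, m_T, eps1):
    """Closed form 2 F exp(-2 m_T eps1^2) / (1 - exp(-2 eps1^2))."""
    return 2 * F * math.exp(-2 * m_T * eps1**2) / (1 - math.exp(-2 * eps1**2))


def a1_bound_partial_sum(F, m_T, eps1, n_terms=20000):
    """Truncated series F * sum_{n >= m_T} 2 exp(-2 n eps1^2), for comparison."""
    return F * sum(
        2 * math.exp(-2 * n * eps1**2) for n in range(m_T, m_T + n_terms)
    )


F, eps1 = 15, 0.1  # illustrative values only
for m_T in (10, 100, 500):
    # The two columns should agree, and both should shrink as m_T grows.
    print(m_T, a1_bound(F, m_T, eps1), a1_bound_partial_sum(F, m_T, eps1))
```

The bound decays exponentially in $m_T$ at rate $2\varepsilon_1^2$, so a smaller $\varepsilon_1$ requires a larger threshold $m_T$ before the term becomes negligible.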

Upper bounding $A_2$. The term $A_2$ is then handled with similar tricks, and the arguments used in Baudry et al. (2021a).

\[
A_2 := \mathbb{E}\left[\sum_{t=1}^{T}\sum_{f=1}^{n_t}\mathbb{1}(A_{t,f}=k,\ N_k(t-1)\ge m_T,\ F_{k,t-1}\in\varepsilon_1(F_k),\ c_{k,t,f}\ge c_1-\varepsilon_2)\right]
\]

≤ 𝔼

[ 𝑇
∑

𝑡=1

𝐹
∑

𝑓=1
1(𝑁𝑘(𝑡 − 1) ≥ 𝑚𝑇 , 𝐹𝑘,𝑡−1 ∈ 𝜀1 (