MACHINE LEARNING AND BIG DATA TECHNIQUES FOR SATELLITE-BASED RICE PHENOLOGY MONITORING A thesis submitted to the University of Manchester for the degree of Master of Philosophy in the Faculty of Science & Engineering August 2019 By Andre´s Aguilar Ariza School of Physics and Astronomy Contents Abstract 9 Declaration 10 Copyright 11 Acknowledgements 12 1 General Introduction 13 1.1 Remote sensing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 1.2 Remote sensing in agriculture . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 1.3 Rice phenological detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 1.4 Crop diseases detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 2 Rice Growth Phases Detection 25 2.1 Materials and methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 2.1.1 Study areas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 2.1.2 Ground data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 2.1.3 Satellite data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 2.1.4 Satellite images processing . . . . . . . . . . . . . . . . . . . . . . . . . . 29 2.1.5 Vegetation index time series processing . . . . . . . . . . . . . . . . . . . 31 2.1.6 Supervised classification algorithms . . . . . . . . . . . . . . . . . . . . . 32 2.1.7 Evaluation metrics and scoring . . . . . . . . . . . . . . . . . . . . . . . 36 2.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 2.2.1 Optical images pre-processing . . . . . . . . . . . . . . . . . . . . . . . . 38 2.2.2 NDVI time profiles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 2 2.2.3 Training features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 2.2.4 Training and validation sets . . . . . . . . . . . . . . . . . . . . . . . . . 41 2.2.5 Parameters grid search . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 2.2.6 Machine learning models testing in Cesar . . . . . . . . . . . . . . . . . . 45 2.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 3 Rice Disease Detection 52 3.1 Materials and methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 3.1.1 Study area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 3.1.2 Data sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 3.1.3 Satellite data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 3.1.4 Rice fields manual digitalization . . . . . . . . . . . . . . . . . . . . . . . 54 3.1.5 Spectral reference patterns . . . . . . . . . . . . . . . . . . . . . . . . . . 54 3.1.6 Phenological changes detection . . . . . . . . . . . . . . . . . . . . . . . 56 3.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 3.2.1 Spatial fields preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . 59 3.2.2 Healthy and unhealthy rice canopy spectral reference . . . . . . . . . . . 59 3.2.3 Rice field of interest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 3.2.4 Rice growth phases detection . . . . . . . . . . . . . . . . . . . . . . . . 61 3.2.5 Heading and maturity stages identification . . . . . . . . . . . . . . . . . 62 3.2.6 Rice disease detection within the rice fields . . . . . . . . . . . . . . . . . 66 3.2.7 Application in other fields . . . . . . . . . . . . . . . . . . . . . . . . . . 68 3.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 4 Summary and Further Research 73 3 List of Tables 1.1 List of standard vegetation indices, and their mathematical equation. . . . . . . 18 2.1 Corresponding central wavelength and spatial resolution available at each band for each instrument. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 2.2 List of features used in the models. . . . . . . . . . . . . . . . . . . . . . . . . . 32 2.3 Confusion matrix for binary classification. . . . . . . . . . . . . . . . . . . . . . 36 2.4 List of Sentinel-2 and Landsat images used in Saldan˜a . . . . . . . . . . . . . . . 38 2.5 Model inputs per fold and growth phase. . . . . . . . . . . . . . . . . . . . . . . 42 2.6 Percentage of cases that the model classifications are within the growth-phase period. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 3.1 Rice diseases incidence score for rice fields cultivated with Fedearroz 67. The rice field size is the field area at hectares. . . . . . . . . . . . . . . . . . . . . . . 61 4 List of Figures 1.1 Rice plant cycle with phase and froth stages. the curve represents the temporal profile for NDVI (modified after Kuenzer and Knauer (2013); Mosleh et al. (2015)). 19 1.2 Rice-canopy spectral profiles for four severity levels of Helminthosporium Oryzae (modified after Zhao et al. (2012)). . . . . . . . . . . . . . . . . . . . . . . . . . 23 2.1 Location map of the rice fields used for training and validationin the Saldaa region. 27 2.2 Location map of the rice fields used for testing in the Cesar region. . . . . . . . 27 2.3 Overview of the methodology for processing Sentinel-2 and Landsat images. . . . 31 2.4 Example of the separable case for two classes (blue and red dots). The solid black line represents the decision boundary. . . . . . . . . . . . . . . . . . . . . 33 2.5 Location of the study area with Landsat-7 true-color image (blue, green and red bands) acquired on 7 July 2015. The black lines are regions in which there is no data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 2.6 Geometric registration results. The left and central images are the NIR re- flectance band at 10 m measured by Landsat-8 and Sentinel-2, respectively. The right image is the cross-correlation matrix. . . . . . . . . . . . . . . . . . . . . . 39 2.7 Boxplots of the captured NDVI reflectance distributions in each rice fields pixel in Saldan˜a at each date of interest. The lines represent the NDVI trend along the time. The colors refer to each rice growth phase. . . . . . . . . . . . . . . . 40 2.8 single pixel NDVI time series comparison between initial time series (red), after smoothing (green), and after regression (blue). . . . . . . . . . . . . . . . . . . . 41 2.9 Rice growth phase that was registered in each visit date for each field. The colors represent the growth phases. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 5 2.10 Machine learning models results for different parameters combinations. Left: F1-score values obtained from Random Forest at a different number of trees and number of features. Right: F1-score obtained from SVM radial function that was trained with several combinations of γ and C values. The grey color represents combinations that are below 0.9. . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 2.11 Results of the parameter grid search for SVM polynomial basis kernel. (a) The f1-scores for each of four parameters; (b) The f1-scores for cost and coef0 for and degree constants. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 2.12 Results of the parameter grid search for XGBboost approach. (a) The f1-scores for each of five parameters; (b) The f1-scores for cost and coef0 keeping subsam- ple, eta, and maxdepthasconstants. . . . . . . . . . . . . . . . . . . . . . . . . . 44 2.13 f1-score per growth phase for each classification model approach. . . . . . . . . . 45 2.14 Satellite data availability for the Cesar region. . . . . . . . . . . . . . . . . . . . 45 2.15 Rice growth phase detection results for two fields in Cesar at different times. The stacked bars compare the cumulative classification obtained per pixel. These bars are obtained at each date in which the model was computed. The ground observation data is exhibited below the bars. The lines are the technician’s registers, and the contour represents the rice field growth phase. The colors are the different classes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 2.16 Comparison between the ground observation and the estimates growth phases per field. The points and shapes refer to the results obtained from the models at each date. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 2.17 Map of rice growth estimation in Saldan˜a region . . . . . . . . . . . . . . . . . . 50 2.18 Multi-temporal characterisation for a field affected by drought. The red line represents the date at which the technicians stopped the field monitoring. . . . . 51 3.1 GEE platform. The left panel contains two plots that show the time profiles for NDVI and green, NIR, red bands. The spectral data was extracted from the Sentinel-2 and Landsat missions, these are freely available on the platform for querying. The right panel points out a true color high resolution image, which was used to draw the polygon, represented in yellow color. . . . . . . . . . . . . 55 6 3.2 Spectral response of Sentinel-2 in the visible, near infrared and short infrared range. The spectral signatures that characterise rice canopy affected by several brown spot incidence levels (blue, yellow, green and red refers to D0, D1, D2, D3, and D4, respectively) are also shown. . . . . . . . . . . . . . . . . . . . . . . 56 3.3 Location map of the rice fields located in the northern region of Tolima. . . . . . 59 3.4 Simulated spectral reflectance response to different Brown Spot infection levels across the Sentinel-2 spectral bands. . . . . . . . . . . . . . . . . . . . . . . . . 60 3.5 Results of Growth phase detection for “healthy” and “unhealthy” fields. . . . . . 62 3.6 Sentinel-2 true color images during the reproductive and ripening periods for the “healthy” and “unhealthy” fields. The red line is the field contour. . . . . . . . . 63 3.7 Spectral profiles. The red lines are the spectral information per pixel. The black line shows the averaged spectral profiles per time. . . . . . . . . . . . . . . . . . 64 3.8 Comparison between the simulated spectral reference and the spectral profiles for each date and for each field (“healthy” and “unhealthy”). The colorbar indicates Euclidean distance value, which its low values are colored in dark purple, whereas high values are colored as yellow. . . . . . . . . . . . . . . . . . . . . . . . . . . 65 3.9 Comparison between the spectral reference profiles (blue and red lines) and the rice fields averaged spectral profiles (green and orange lines) computed for both fields. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 3.10 Euclidean distances result from comparing the rice fields reflectance spectral profiles during the ripening phase. . . . . . . . . . . . . . . . . . . . . . . . . . . 66 3.11 Comparison between the reflectance spectral profiles per pixel for each field (i.e., “healthy” and “unhealthy” ) during the maturity stage. . . . . . . . . . . . . . . 66 3.12 Boxplots of the normalised bi-stage distributions by rice field disease status. . . 67 3.13 Number of cluster selected for the “healthy” and “unhealthy” fields. The red points highlight which cluster were picked. . . . . . . . . . . . . . . . . . . . . . 67 3.14 Map of the clusters calculated for the “healthy” field. The true color image is the date in which the rice field was in the heading stage. . . . . . . . . . . . . . 68 3.15 Averaged spectral profiles for each cluster that characterised the “Healthy” field. 68 7 3.16 Map of the clusters calculated for the “unhealthy” field. The true color image is the date in which the rice field was in the heading stage. . . . . . . . . . . . . . 69 3.17 Averaged spectral profiles for each cluster that characterised the “Unhealthy” field. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 3.18 Results for three different rice fields. The map for each field is exhibited in the left panel; The disease incidence score is pointed out above the map. In the right, the averaged spectral profile for each cluster and field. . . . . . . . . . . . 70 3.19 Rice canopy spectral profiles at various nitrogen rates (Gnyp et al. (2013)). . . . 72 8 Abstract New sources of information are required to support rice production decisions. To cope with this challenge, studies have found practical applications on mapping rice through the use of remote sensing techniques. This study attempts to implement a methodology aimed at mon- itoring rice phenology using optical satellite data. The relationship between rice phenology and reflectance metrics was explored at two levels: growth stages and biophysical modifications caused by diseases. Two optical moderate-resolution missions were combined to detect growth phases. Three machine learning approaches (random forest, support vector machine, and gra- dient boosting trees) were trained with multitemporal NDVI data. Analytics from validation showed that the algorithms were able to estimate rice phases with performances above 0.94 in f-1 score. Tested models yielded an overall accuracy of 71.8%, 71.2%, 60.9% and 94.7% for vegetative, reproductive, ripening and harvested categories. A second exploration was carried out by combining Sentinel-2 data and ground-based information about rice disease incidence. K-means clustering was used to map rice biophysical changes across reproductive and ripening phases. The findings ascertained the remote sensing capabilities to create new information about rice for Colombia’s conditions. 9 Declaration No portion of the work referred to in this thesis has been submitted in support of an application for another degree or qualification of this or any other university or other institution of learning. 10 Copyright i. The author of this thesis (including any appendices and/or schedules to this thesis) owns certain copyright or related rights in it (the Copyright) and s/he has given The University of Manchester certain rights to use such Copyright, including for administrative purposes. ii. Copies of this thesis, either in full or in extracts and whether in hard or electronic copy, may be made only in accordance with the Copyright, Designs and Patents Act 1988 (as amended) and regulations issued under it or, where appropriate, in accordance with licensing agreements which the University has from time to time. This page must form part of any such copies made. iii. The ownership of certain Copyright, patents, designs, trade marks and other intellectual property (the Intellectual Property) and any reproductions of copyright works in the thesis, for example graphs and tables (Reproductions), which may be described in this thesis, may not be owned by the author and may be owned by third parties. Such Intellectual Property and Reproductions cannot and must not be made available for use without the prior written permission of the owner(s) of the relevant Intellectual Property and/or Reproductions. iv. Further information on the conditions under which disclosure, publication and commer- cialisation of this thesis, the Copyright and any Intellectual Property and/or Reproductions described in it may take place is available in the University IP Policy (see http://www.campus. manchester.ac.uk/medialibrary/policies/intellectual- property.pdf), in any relevant Thesis re- striction declarations deposited in the University Library, The University Librarys regulations (see http://www.manchester.ac.uk/library/ aboutus/regulations) and in The University’s pol- icy on presentation of Theses. 11 Acknowledgements I would like to express my profound gratitude: to the RADA project, which sponsored my MPhil. To my supervisors Joseph Fennell and Sarah Bridle, who guided me with their valuable knowledge during this year. To the Colombian National Rice Growers Federation (Fedearroz) who made this study possible by providing the ground-based data. To the International Centre for Tropical Agricultura (CIAT), especially the data-driven agronomy team, which help to discover the data analysis impact on agriculture. To my beloved parents, Elcy Ariza and Jose` Roberto Aguilar, and my brothers, Ernesto, Alex y Felipe, who are my daily motivation. To my friends that despite all, they still have support words. 12 Chapter 1 General Introduction Many have contributed their knowledge and efforts for the consolidation of rice as one of the essential short-cycle crops in Colombia during the last decades. Government entities (Ministry of Agriculture and Rural Development), the National Rice Growers Federation (Fedearroz) and international research institutes (e.g., International Center for Tropical Agriculture) have played a key role in promoting rice research and improving rice production conditions. As a con- sequence, rice production in Colombia has increased from 2 tons.ha−1 in 1960 to 5.7 tons.ha−1 in 2016 (UNEP (2005)), and the rice production area has increased in 23 out of 32 departments with a total harvested area of 570,802 hectares (Fedearroz (2017)). Such expansion has turned rice into a vital crop for Colombia society (McLean et al. (2013)). Nevertheless, rice crop is ex- posed to several factors that affect its profitability (Amaya Montoya (2011)). High production cost, fluctuation on rice prices, international trades, disease outbreaks, and overproduction are some of the challenges that Colombian rice growers must face. Rice information is therefore essential in monitoring factors that may affect crop production. Rice crop data is valuable in diverse aspects to bring farmers agronomical decisions tools (Delerce et al. (2016); Jime´nez et al. (2016)), to serve as a base in political decisions related with food security (Cihlar (2000); Dong and Xiao (2016)), and to monitor pest and disease outbreaks (Gnanamanickam et al. (2010)). Due to its importance, many efforts have been carried out to get reliable information about rice-ecosystems. One of the most common methods is through surveys. Regardless of its high accuracy, this methodology is time-consuming and hard to implement at a large scale (Mosleh et al. (2015)). After that, new sources of information are required to obtain high-frequency data at low cost. Remote sensing has gained interest due 13 CHAPTER 1. GENERAL INTRODUCTION 14 to its capacity to monitor crops at different scales in a cost-effective manner (Dong and Xiao (2016); Kuenzer and Knauer (2013)). During the last decade, several studies have successfully proved the remote sensing capability on monitoring rice, creating valuable data for characterizing rice-crop conditions. Although the method has been widely implemented in countries which rice is extensively sowed (e.g., China, India, Vietnam) (Mosleh et al. (2015)), in Colombia there are few reported studies that have explored the relationship between rice and spectral satellite-derived metrics (Mart´ınez (2017)). The potential for growth phase detection can be further explored to characterise the intra- variability on rice fields. New metrics can be derived in order to monitor rice biophysical changes across phenological stages. A comprehensive assessment may link those changes to rice stresses factors such as diseases. This technology can be therefore used to identify and treat rice damages in fields (Gonza´lez-Betancourt and Mayorga-Ru´ız (2018); Wu et al. (2018)). The purpose of this study is to establish new methods for monitoring rice growth phase and disease incidence in Colombia. The objective was focused on creating remote sensing metrics to better characterise the rice morphological and physiological changes through its growth cycle. Using ground-based information shared by Fedearroz, a methodology was developed to detect rice growth phases using optical satellite data. Using this to control for the phenological stage of the crop, a method for detecting zones affected by diseases within the rice fields was developed. The remainder of this chapter contains the background information related to remote sens- ing techniques and general aspects of rice phenology. Chapter two explains the methodology developed for rice growth phase detection by blending Landsat and Sentinel-2 images. Chapter three explores a method for detecting phenological changes caused by diseases during repro- ductive and ripening phases. Chapter 4 gives a synthesis and future works that must be done in order to validate methodologies. CHAPTER 1. GENERAL INTRODUCTION 15 1.1 Remote sensing Remote Sensing is a method that measures properties from an object using devices which are not in contact with (Khorram et al. (2012); Mulla (2013)). These devices can be mounted on aircrafts, earth orbiting spacecrafts, or even held by hand (ground-based). The electromagnetic energy emitted or reflected by the object is captured by the devices or sensors. Any object at a given temperature should naturally emit electromagnetic radiation. The amount of energy is proportional to the source temperature (Joshi and Kumar (2008)). The electromagnetic (EM) radiation is the energy that moves at a speed of light in a harmonic wave pattern. The EM waves comprise an electromagnetic spectrum. One property of the electromagnetic radiation is the wavelength; this is a spatial distance from one wave position to the next wave in the same point (Khorram et al. (2012)). This attribute is used to split the spectrum into seven categories: gamma rays, x-rays, ultraviolet, visible rays, infrared rays, microwaves, and radio waves (Joshi and Kumar (2008)). One of the principal energy sources on earth is the sun. Various types of interactions can occur between EM and matter, such as absorption, reflection, scattering, emission, and transmission. Thus, a material has a characteristic absorbance and/or reflectance to the original EM radiation; this phenomenon is known as a spectral profile. The key of remote sensing is that the devices used can record the spectral profile (Khorram et al. (2012)). The sensors that detect electromagnetic radiation from natural sources are known as passive sensors. In contrast, active systems rely on illuminating the subject with a pulse or beam of radiation and measuring the backscatter. Airborne photography and optical satellites belong to the first group, while Light Detection and Ranging (LIDAR) and Radio Detection and Ranging (RADAR) are examples for the second one. Likewise, remote sensing information can be obtained in two ways: non-imaging, and imag- ing. Sensors such as spectroradiometers which are commonly used in ground-based applications are in the first category. For the imaging methods, one example is the earth observation data that is captured by the satellites (Martinelli et al. (2015)). The digital images can be stored at different spatial, temporal, spectral, and radiometric resolutions. Spatial resolution is the image grid size. Cadence is the revisiting time on a specific geographical location. Spectral CHAPTER 1. GENERAL INTRODUCTION 16 resolution is the sensor ability for registering wavelength intervals. Finally, radiometric resolu- tion characterises the sensor sensitivity (Khorram et al. (2012)). These parameters constrain the applicability of the product. Currently, close to 200 Earth Observation satellites are continuously registering information (Ma et al. (2015)). This amount of data has facilitated the development of applications in many fields such as oceanography, atmospheric, weather forecasting, environmental monitoring, urban structuration, and agriculture. CHAPTER 1. GENERAL INTRODUCTION 17 1.2 Remote sensing in agriculture The first studies using remote sensing techniques were focused on characterizing leaf chlorophyll content (Benedict and Swidler (1961); Thomas and Gausman (1977)). This pigment absorbs energy centered in blue and red, while reflects the wavelength in green, yet, these spectral regions are not the only ones related to vegetation. Baret et al. (1987), found that the red edge spectrum provides information related to the leaf area and the percentage of ground coverage. Bannari et al. (1995), mentioned that near-infrared (NIR) is affected by leaf cellular structure. To overall the plant-spectral relationship, the reflectance in the visible spectrum is low mainly caused by photosynthetic pigments. In the red-edge region, retrieval is influenced by chlorophyll content (Ramoelo et al. (2012)). The leaf cellular structure controls the absorbance in the near infrared region. Finally, for short-wavelength infrared, the reflectance is influenced by water, proteins, and other carbon constituents content (Huang et al. (2012); Pen˜uelas and Filella (1998)). Since the first Earth Observation satellite mission, studies have focused on characterizing radiometric response and vegetation cover using broadband sensors. Among spectral bands, red and NIR especially showed high sensitivity to vegetation conditions. The red band is absorbed by chlorophyll that is essential for the plant photosynthetic process, while NIR is reflected by leaf cellular structures. Thus, the combination of both bands allows identifying vegetation covers from others (e.g., soil, and water) and quantify the vigor of the plant (Bannari et al. (1995)). The quantification metrics derived from spectral band combinations are known as vegeta- tion indices. The first indices were computed as ratios of the green and NIR bands. Although these ratios are used to measure relative greenness, these indices may be affected by atmo- spheric effects, vegetative density, and study location (Bannari et al. (1995); Tucker (1979)). Rouse et al. (1973), found a way to reduce the negative aspect of existing vegetation indices by calculating the normalised difference between red and NIR bands. Thus, the normalised dif- ference vegetation Index (NDVI) gained notorious importance for plant biophysical properties monitoring. However, the NDVI is sensitive to soil background, atmospheric conditions, and heterogeneous canopies (Rondeaux et al. (1996); Xue and Su (2017)). To adequately assess the plant response using band combinations, many studies have developed different vegetation CHAPTER 1. GENERAL INTRODUCTION 18 indices to tackle specific problems (table 1.1). Table 1.1: List of standard vegetation indices, and their mathematical equation. Vegetation Index Equation Reference Radio Vegetation Index (RVI) (red) (nir) Pearson and Miller (1972) normalised Difference Vegetation Index (NDVI) (nir−red) (nir+red) Rouse et al. (1973) Enhanced Vegetation Index (EVI) 2 ∗ (nir−red) (nir+6∗red−7.5∗blue+1) Liu and Huete (1995) Enhanced Vegetation Index 2 (EVI2) 2 ∗ (nir−red) (nir+2.4∗red+1) Qiu et al. (2015) Land Surface Water Index (LSWI) (nir−swir1) (nir+swir1) Ceccato et al. (2002) normalised Difference Red Edge (NDRE) (nir−red−edge) (nir+red−edge) Liu et al. (2019) CHAPTER 1. GENERAL INTRODUCTION 19 1.3 Rice phenological detection Rice phenology can be divided into three main phenological phases: vegetative (germination to panicle initiation), reproductive (panicle initiation to flowering), and maturity or ripening (grain filling to maturity)(Moldenhauer and Slaton (2001)). Changes in the plant morphology characterise each agronomic stage. The vegetative phase initiates with the plant emergence; during this phase, the main characteristics are the gradual increase of plant height, leaf area, and an active tillering. The phase transition takes place when the tiller number per plant reaches a maximum. The reproductive stage follows maximum tillering. During this phase, the panicle primordia stage starts its initiation. The panicle needs approximately 25 days to achieve the heading stage, which refers to when 50% of the panicles have flowered. After 100% of flowering, the ripening stage takes place. The maturity stage is characterised by leaf senescence and grain filling (Yoshida (1981)). The rice cycle takes from 3 to 5 months; the difference is subject to the conditions in which the crop is developed (Kuenzer and Knauer (2013)). Figure 1.1: Rice plant cycle with phase and froth stages. the curve represents the temporal profile for NDVI (modified after Kuenzer and Knauer (2013); Mosleh et al. (2015)). As the rice plant change its growth phenology, the reflectance is affected at different spectral wavelength. Studies have adequately described the rice cycle as a function of vegetation indices. Several indices have shown a strong correlation with rice growth. For example, NDVI is affected by the low percentage of vegetation cover at the early vegetative phase, due to that its values are close to zero (figure 1.1). While the plant is growing, the chlorophyll quantity increases, CHAPTER 1. GENERAL INTRODUCTION 20 which exerts considerable influence in the absorbance of light in the red domain. Although the absorbance in the blue region also increments due to chlorophyll content, this domain is also influenced by carotenoids absorption (Pen˜uelas and Filella (1998)). In NIR region, the reflectance is increased by foliar and tillering development. The NDVI characterises this process as a gradual increment in its values. Finally, NDVI starts to decrease due to biomass reduction (decay and loss of leaves), greenness diminishing (chlorophyll content), and yellowness increase (rice grains filling)(Kuenzer and Knauer (2013); Mosleh et al. (2015)). The growth stage can, therefore, be predicted by the NDVI trajectory in time. Several stud- ies have shown the feasibility in detecting rice phenological changes using time-series NDVI reg- istered by satellite missions. Frequently studies referred to MODIS vegetation indices products that can produce NDVI and EVI on 16 days intervals at multiple spectral resolutions (NASA (2018)). Thus, Sakamoto et al. (2005) and Shihua et al. (2014) used EVI multi-temporal profiles to characterise the rice cycle. Three phenological dates (planting, heading, and harvest) were calculated based on the inflection points that were found in the time-series. Although MODIS products have a high temporal resolution (the products are staked from a daily measurement), their spatial resolution is low (>250 m), which hampers a precise identification (Onojeghuo et al. (2018)). Other studies have focused their analysis on satellites with higher spatial resolution than MODIS, such as Landsat or Sentinel-2. Nevertheless, their low temporal resolution compromises data availability. Recent methodologies merge multiple mission to overcome this inconvenience. For instance, Wang et al. (2015) created smoothed vegetation indices time-series by combining two optical satellites, Landsat-8, and HJ-1 CCD. They found reliable performances when both satellites were combined, instead of using only one. They assigned the rice phenological shift for those days when the vegetation index time-series reached a maximum. Wu et al. (2018) used radar and optical data to detect rice in early middle, and late phases for areas located in China. Thus, they summarized Sentinel-1s VH backscattering images taken during the rice plantation period in three layers minimum, difference, and maximum. The imagery stack was compared with the minimum and maximum coefficients, which defined the rice growth stages. Finally, they calculated the Landsat-8’s NDVI values for which areas filtered from the Sentinel-1. The NDVI was classified using the k-means approach to differentiate each rice stage. Although the CHAPTER 1. GENERAL INTRODUCTION 21 study reported a high level of overall accuracy (98.1 %) in their methodology, they worked under local rice system conditions, for example, assuming a general crop calendar and considering only irrigation rice CHAPTER 1. GENERAL INTRODUCTION 22 1.4 Crop diseases detection Plant pathogens are significant causes of yield loss (Huang et al. (2012)). The primary or- ganisms that cause damage to rice plants are bacteria, viruses, fungi, and nematodes (Ahn and Jennings (1982)). Hence, many efforts have risen in order to understand the symptoms caused by the pathogens, as well as to better characterise the specific environmental condi- tions that facilitate their outbreaks. For example, The Centre for Tropical Agriculture (CIAT) and Fedearroz have identified five significant pathogens for Colombia (Correa-Victoria and Zei- gler (1993); Fedearroz (2014)). These are Pyricularia Orizae, Helminthosporium, Rhizoctonia Solani, Rhynchosporium, Gaemannomyces. Rice blast disease is caused by the fungal pathogen Pyricularia Orizae (teleomorph Mag- naporthe Orizae); this is of considerable importance to rice producers (Dean et al. (2012)). The fungus affects all foliar tissues, and the major diagnostic symptoms are lesions on leaves, nodes, and different parts of the panicle (Ou (1985)). Although the disease is reported world- wide, symptoms depend on the climatic conditions, especially relative humidity (Ou (1985)). For example, in Colombia, the rice blast is mainly located in the eastern part of the country due to a suitable temperature and humidity conditions (Correa-Victoria and Zeigler (1993); Fedearroz (2014)). The establishes during early plant stages, tillering and neck emerge stages are susceptible to this pathogen (Venkatarao and Muralidharan (1982)) and is immediately controlled by farmers when symptoms present. Different factors affect the incidence of Brown Spot disease (Helminthosporium Orizae), but, environments with a scarce water supply and lacking in soil nutrient minerals, or with accumulated toxic substances, facilitated the diseases development (Ou (1985); Barnwal et al. (2013)). Indeed, Ou (1985) reported that Brown Spot disease is caused by a deficiency of one or more nutrient elements (nitrogen, silica, potassium manganese, and magnesium), and in practice, disease and deficiency symptoms are often inseparable. Thereby, the Brown Spot is sometimes used as a reference to mineral deficiencies (Ahn and Jennings (1982); Daytnoff et al. (1991); Barnwal et al. (2013)). Plant age is also relevant for disease growing. Although Brown Spot can attack at early plant stages, its damage is commonly noted in dough and maturity stages (Singh (2016)). Diseases modify plant physiology. These changes mostly occur in leaf structure and pigments CHAPTER 1. GENERAL INTRODUCTION 23 content. Consequently, the electromagnetic spectrum absorption at different wavelengths varies depending on the level of disease. For instance, one symptom of Magnaporthe Orizae is leaf cell necrosis, which causes pigments degradation; these changes can be therefore spotted by the visible spectral region (Zhou et al. (2019)). Kobayashi et al. (2001) reported that blue and red regions reflectance increases at dough stage, because of the carotenoids and chlorophyll content decrement. Pathogens such as Rhizoctonia and Helminthosporium also have been detected using the visible and NIR regions (Qin and Zhang (2005)). For instance, Liu et al. (2008) and Zhao et al. (2012) characterised rice infected with Helminthosporium Oryzae using hyperspectral ground-measurements taken at a laboratory and field-canopy level, respectively. They found differences in the near-infrared spectrum, specifically at the ranges of 740 nm to 790 nm and 1550 nm to 1750 nm, comparing healthy and diseased rice (figure 1.2). Zheng et al. (2018) proposed a new index vegetation index, red-edge disease stress index (REDSI), for detecting yellow wheat rust at canopy and plant level using Sentinel-2. Zhihao Qin et al. (2003) identified rice incidence on Rhizoctonia Solani combining four broad bands that were captured by airborne. Thus, they pointed out that the disease incidence correlated to indexes calculated from blue, red, red-edge, and NIR band values. Figure 1.2: Rice-canopy spectral profiles for four severity levels of Helminthosporium Oryzae (modified after Zhao et al. (2012)). The practical implementation of remote sensing has been hampered by differentiating the disease from other stress factors (e.g., lack of nutrients or drought) which can trigger similar changes in reflectance (Huang et al. (2012); Liu et al. (2019)). Recent studies have blended CHAPTER 1. GENERAL INTRODUCTION 24 hyperspectral and multi-temporal satellite data. Shi et al. (2018) mapped rice damage from diseases using two different acquisition dates from the high spatial resolution PlanetScope. Huang et al. (2012) firstly characterised yellow rust on wheat using hyperspectral ground- based measurements that then was used as a reference for satellite implementation. Although PlanetScope offers useful information at high spectral and temporal resolutions, their images ac- quisition prices constrain regional application. These studies show the potential use of spectral reflectance measurements in quantifying the incidence or severity of rice diseases. Chapter 2 Rice Growth Phases Detection In this section, a novel approach for identifying rice growth phases using optical satellite im- ages and supervised classification approaches is presented. Sentinel-2 and Landsat data were downloaded and processed for two localities in Colombia. Three state-of-the-art machine learn- ing supervised classification algorithms (random forest, support vector machine, and gradient boosting trees) were trained, validated, and tested using ground-based data shared by The Colombian National Rice Growers Federation (Fedearroz). The analysis showed: 1) clear re- lationship in various rice growth phases with multi-temporal NDVI data; 2) feasible to obtain a considerable number of optical satellite data by blending Landsat and Sentinel-2 missions; 3) training supervised classification approaches allowed applying the methodology in another locality. 25 CHAPTER 2. RICE GROWTH PHASES DETECTION 26 2.1 Materials and methods 2.1.1 Study areas Two sites in Colombia were chosen to train and test the spatial phenological detection approach using optical images. Saldan˜a and Purificacio´n are two Tolima municipalities that cover nearly 21,400 hectares from 4°N and 75°6.01'W to 3°48.6'N and 74°54.475'W. The zones are in the central part of Colombia. The municipalities temperature ranges from 27°C to 28.5°C over the year. Precipitation has a bimodal distribution spanning from March to May and from September to November. The climatic conditions are suitable for rice production, which has influenced the local economic activity (Delerce et al. (2016)). Thus, the agricultural sector is responsible for almost 77% of the municipalities’ Gross Value Added (Fedearroz (2010)). The rice annual planting area is nearly 15,390 hectares in both localities. Hereafter the name Saldan˜a is used to refer to both municipalities. The second site is in the Cesar department. The Cesars climate is classified as tropical wet and dry. The wet season begins in March, and dry season starts in December, with an average annual temperature that ranges from 27°C to 34°C. This region has specific crop management differences compared with Saldan˜a. The most notable is the farming system. While lowland irrigated rice is the only system implemented in Saldan˜a, in Cesar the 5% of the system is rainfed rice (DANE (2017)), which means that irrigation only depends on the rainfall duration. 2.1.2 Ground data The Colombian National Rice Growers Federation (Fedearroz) collected the ground data as part of a separate initiative. Twenty rice fields plantations were monitored in the second semester of 2015 (figure 2.1). Rice fields were visited on five dates from November 2015 to January 2016. Rice growth stages, field geospatial location, plant height, soil water content, emergence date, field extension, and an overall appreciation over rice cycle conditions were variables included in the survey. A second campaign was undertaken in 2018 in the Cesar department. Nineteen rice fields were surveyed (figure 2.2). The new data allowed testing models on a new geographic area with slightly different climatic conditions. In addition to the parameters measured above, any external incidence, such as drought or agronomic practices, was also included. CHAPTER 2. RICE GROWTH PHASES DETECTION 27 Figure 2.1: Location map of the rice fields used for training and validationin the Saldaa region. Figure 2.2: Location map of the rice fields used for testing in the Cesar region. CHAPTER 2. RICE GROWTH PHASES DETECTION 28 2.1.3 Satellite data Sentinel-2 The Sentinel-2 mission is part of the Global Measurement for Earth and Security (GMES) pro- gram, which is responsible for delivering data products for environmental and security services (Drusch et al. (2012)). Sentinel-2 reflectance measurements provide essential data for stud- ies of land management, agriculture, and disaster monitoring (Aschbacher and Milagro-Pe´rez (2012)). Sentinel-2 has two identical satellites with revisit cycle of 10 days. Thus, the mission captures information globally every five days. Each satellite has a Multispectral Instrument (MSI), which is used to monitor the earth surface in 13 spectral bands (table 2.1). The instru- ment has a spectral range from the visible to the short wave infrared (Drusch et al. (2012)). The spatial resolution varies regarding the wavelength, thus visible and NIR are available at 10 m, while red-edge and SWIR are recorded at 20 m. All data were downloaded directly from the Sentinels Scientific Data Hub (https://scihub.copernicus.eu/ ). Landsat-7 and Landsat-8 The first Landsat mission was launched in 1972. Since then, six satellites were put into orbit to monitor the global earth surface. This program has been led by the National Aeronautics and Space Administration (NASA) and the United States Geological Survey (USGS). The Enhanced Thematic Mapper Plus (Landsat-7 ETM+) and the Operation Land Imager (Landsat-8 OLI) are two satellites which are still capturing data every 16 days. Both can scan surface reflectance and land surface temperature as a result of measuring the visible, near infrared, and short wave infrared portions of the spectrum (Claverie et al. (2018))(table 2.1). USGS processes Landsat- 8 and Landsat-7 surface reflectance products that are then distributed in the EarthExplorer repository (https://earthexplorer.usgs.gov). The available data has the highest-level quality and is achieved through two processing softwares, Landsat Ecosystem Disturbance Adaptive Processing System (LEDAPS) and the Landsat Surface Reflectance Code (LaSRC), designed for Landsat-7 and Landsat-8 respectively. Although both products are processed using different data sources, their inputs are water vapor, ozone, geopotential height, aerosol optical thickness, and digital elevation (USGS (2018a,b)). Unfortunately, the Landsat-7 scan line corrector (SLC) failed in 2003. The SLC controls the forward motion of the satellite. Therefore, close to 22% CHAPTER 2. RICE GROWTH PHASES DETECTION 29 of the pixels of the data are missing since then (Chen et al. (2011)). Table 2.1: Corresponding central wavelength and spatial resolution available at each band for each instrument. Band Central wavelength (µm) Spatial Resolution (m) S2A S2B Landsat-7 Landsat-8 Sentinel Landsat coastal 0.44 0.44 0.44 60 30 blue 0.49 0.49 0.49 0.48 10 30 green 0.56 0.56 0.56 0.56 10 30 red 0.66 0.66 0.66 0.65 10 30 red edge 1 0.7 0.7 20 30 red edge 2 0.74 0.74 20 30 red edge 3 0.78 0.78 20 30 near in- frared (NIR) 0.83 0.83 0.84 0.86 10 30 narrow nir 0.86 0.86 20 30 water vapor 0.95 0.94 60 30 cirrus 1.37 1.38 1.37 60 30 short in- frared 1 (swir1) 1.61 1.61 1.65 1.61 20 30 short in- frared 2 (swir2) 2.2 2.19 2.22 2.2 20 30 Thermal Infrared (TIRS) 1 11.34 10.9 60 - 100 Thermal Infrared (TIRS) 2 12 100 2.1.4 Satellite images processing Surface reflectance Atmospheric variability (e.g. from clouds, water vapor, dust, etc.) must be corrected in order to recover the Bottom of Atmosphere Reflectance (Chen and Cheng (2012); Tian et al. (2018)). In Sentinel-2, such process is carried through the state version of sen2cor. The module uses two CHAPTER 2. RICE GROWTH PHASES DETECTION 30 external sources, radiative transfer tables and a digital elevation model (DEM). The tables con- tain absorbance as a function of the frequency, pressure, temperature, and water vapor (Buehler et al. (2011)). The atmospherically corrected product also contains a Scene Classification Layer, providing a broad classification of the ground cover at a 400 m2 resolution. Sentinel-2 and Landsat data work with different spatial resolution levels. An image resam- pling was applied to compare the reflectance of data sources. Considering that 10 meters are the highest spatial resolution available for the data, this was selected as a reference for all products. There are several techniques developed for image resampling purposes. This study used two: bilinear spatial interpolation and nearest neighborhood. The first algorithm takes the nearest cells values and applies them an average. Hence, the new value will be the mean of their neighborhoods. This model was applied on images with continuous numeric data, and for categorical layers (e.g., the scene classification layer) were processed using the nearest neigh- borhood method. This model, as its name refers, assigns the pixel of interest value equal to its most adjacent pixels. Landsat geometric registration Studies have revealed the existence of misregistration between Landsat and Sentinel-2 products. Some have reported a shift of up to 1.6 pixels at 30 meters resolution. Thus it is advisable to apply a co-registration method to the images (Skakun et al. (2017)). For this reason, many methodologies have been developed for geographic correction. This study uses phase correlation, which was reported in Skakun et al. (2017). The phase correlation algorithm is based on the cross-correlation algorithm that transforms both images into a Fourier space. The peak of the cross-correlation represents the shift that must be applied to the register image (Guizar-Sicairos et al. (2008)). One Sentinel-2 image and one Landsat-8 were chosen considering two conditions: 1) the acquisition dates are the same for both images, and 2) the cloud cover over the region is close to zero. Both images were subset for the corresponding area of interest (i.e., Saldan˜a extension). The final registered imagery had a 10 m spatial resolution. CHAPTER 2. RICE GROWTH PHASES DETECTION 31 Figure 2.3: Overview of the methodology for processing Sentinel-2 and Landsat images. 2.1.5 Vegetation index time series processing The multitemporal stacked layers were used as input for the vegetation time series creation. The red and NIR bands were extracted from each adquisition image; these bands were used to compute the NDVI layer. Each pixel had information about the vegetation index throughout time. However, values from a multi-temporal sequence can be affected by the presence of noise (e.g., atmospheric, surface reflectance methods, shadows). Therefore, a filtering technique is necessary for handling these errors. The Savitzky-Golay algorithm was chosen, which uses a moving window to smooth values that span into it (Savitzky and Golay (1964); Cao et al. (2018)). The smooth approach has two parameters: the window size and the polynomial fitting degree. To summarize the NDVI time series, five features were calculated. The rice growth phenological phase detection problem has been addressed through charac- terising in which dates the NDVI time profile has inflection points, these changes may describe the shift into a new growth phase (Shihua et al. (2014)). Thus, the inflection points are obtained from derivatives methods. In this study, two derivatives values were computed at beginning and ending of the smoothed time series. Additionally, five features were obtained aimed at summarizing the NDVI time series. CHAPTER 2. RICE GROWTH PHASES DETECTION 32 The kernel regression method was applied in order to estimate new values across the time series. The simplest kernel weights average estimation is the Nadaraya Watson form (Hastie et al. (2009)): f (̂x0 ) = ∑n i=1Kλ(xi, x0)y0∑n i=1Kλ(xi, x0) (2.1) Where, Kλ is a gaussian kernel function for λ which controls the variance of the gaussian density. This regression was performed for every pixel in the multi-temporal stack. The result- ing NDVI time series is composed of seven points that are separated by 16 days. Finally, 14 features were created for classification (table 2.2) Table 2.2: List of features used in the models. Feature Name Meaning NDVI 1 NDVI value for 96 days before the date of interest. NDVI 2 NDVI value for 80 days before the date of interest. NDVI 3 NDVI value for 64 days before the date of interest. NDVI 4 NDVI value for 48 days before the date of interest. NDVI 5 NDVI value for 32 days before the date of interest. NDVI 6 NDVI value for 16 days before the date of interest. NDVI 7 NDVI value for 0 days before the date of interest. NDVI sd ts Standard Deviation of the NDVI time series. NDVI max ts Minimum NDVI value. NDVI min ts Maximum NDVI value. maximum NDVI day day in which the NDVI time series reached a maxi- mum. minimum NDVI day day in which the NDVI time series reached a mini- mum. first derivative tsending value on the date of interest for a first derivative of the NDVI time series. first derivative tsstarting value on 80 days prior the date of interest for a first derivative of the NDVI time series. 2.1.6 Supervised classification algorithms Herein, three classification approaches were used and compared in this study: support vec- tor machines, random forest, and gradient boosting machine. The implementation of each algorithm was done through packages that were developed for the software R. Support vector machine, and random forest modes have been widely used in remote sensing studies (Mountrakis et al. (2011); Belgiu and Drgu (2016)). Recently, gradient boosting machine has gained noto- rious popularity because of its superior performances compared to more traditional machine CHAPTER 2. RICE GROWTH PHASES DETECTION 33 learning techniques (Georganos et al. (2018)). Support Vector Machines The support vector machines (SVM) is based on the linear model classification, which uses a hyperplane for class separation. A linear function is suitable for the cases where the data is linearly separable. The optimal hyperplane is one that maximizes the margin width between the nearest classes (figure 2.4). This margin depends on one vector (β) which is perpendicular to the linear function. Therefore, the unknown point classification depends on its location, considering the hyperplane. Thus, to maximize the margin, the algorithm finds the minimum distance from the origin vector (β) Min 1 2 ‖β‖ subject to yi(xiT. β + β0) ≥ 1, i = 1, 2, ...,N (2.2) Figure 2.4: Example of the separable case for two classes (blue and red dots). The solid black line represents the decision boundary. Although the hyperplane was conceived to separate the classes as best as possible, there are cases where categories are overlapping in the feature spaces. Thus, this scenario is tackled by allowing errors when data are separated (Cortes and Vapnik (1995)). The errors can be expressed as εi which are greater or equal to 0. The total of errors is also constrained by a constant (C), aimed to regularise the function ∑N i εi ≤ C. Thus, large C values allow the εi increment and, as a consequence, the model will tend to overfit, while with small values, the CHAPTER 2. RICE GROWTH PHASES DETECTION 34 model will be restricted (Hastie et al. (2009)). Adding cost values to all points that violated the constraints, the optimization problem is defined as: Min 1 2 β2 + C N∑ i εi subject to εi ≥ 0, yi(xiT . β + β0) ≥ 1 − εi, i = 1, 2, ..., N(2.3) Linear separation models work accurately under scenarios where a linear function can sep- arate the data, but for data with a more complex distribution, it is necessary to add new non-linear features. However, the transformation and selection of new features require con- siderable computational resources that make it challenging to find new solutions (Muller and Guido (2017)). Cortes and Vapnik (1995), found a way to deal with this problem, they demon- strated that it is not necessary to transform the new feature. It is only required to obtain the inner products (e.g., dot product) from a small amount of training data. After that, a non-linear transformation is applied to the vectors. The training points are known as support vectors. The inner product in the new space is found through a kernel function. Two of the more commonly used functions, in support vector machines, are polynomial kernel and radial basis function. Polynomial kernel basis : K (xi, x) = ( xi Tx + coef )d (2.4) Radial basis function : K (xi, x) = exp− γ || xi − x ||2 (2.5) SVM is sensitive to feature magnitudes, so all features must have the same scale. Although NDVI data ranges from 0 to 1, the other features values oscillate on another scale. For this reason, the min-max normalisation method was used to transform all data on the same scale [0,1]. Random Forest Random Forest is an ensemble model of a finite number of decision trees. The tree models recursively split the data into subsets through a serial of rules. However, when a tree has many splits, it can overfit the data. On the one hand, this model has a straightforward interpretation of the results. On the other hand, the model is unstable, and the final pruning will depend on CHAPTER 2. RICE GROWTH PHASES DETECTION 35 the primary partition. Breiman (2001) showed significant improvements in reducing overfitting and variance scenario on tree based approaches, through training many trees with random data; thus, each tree casts a class, and the final decision depends on the number of votes. The new training sets are drawn from randomising the original data with replacements (the same point can be picked more than one time). This method is known as bootstrap (Breiman (2001); Muller and Guido (2017)). Using bootstrap, we can obtain multiple datasets that have the same length than the original dataset. The bootstrap aggregation or bagging is the results of averaging the models’ predictions fitted with the bootstrap samples. This technique reduces high-variances and low-biased obtained from approaches such as trees (Hastie et al. (2009)). Additionally, to ensure that each tree grows independently, a random subset of the features is used in each node. Consequently, the trees can predict based on different features combinations. Therefore, the grown trees are not pruned. Two parameters are needed to construct a random forest model: the number of trees and the maximum number of features used at each node. Gradient Boosting Trees Gradient boosting is another ensemble model based on trees approaches. However, instead of using bootstrap as a sampling method, it uses the boosting method. Contrasting with random forest, with which predictions are derived from multiple bootstrap samples, gradient boosting machine is a way of combining the performance of many weak classifiers to improve the general performance additively (Friedman et al. (2000); Muller and Guido (2017)). Thus, for the tree based approach, the trees grow in an adaptative way to reduce bias (Hastie et al. (2009)). In general, the learning model aimed to train a function F (x) using a training sample ({xi, yi}N1 ) that minimises the expected value (E) of some specified loss function Friedman (2001) is: ϕ(F (x)) = arg min F Ex[EyL(y, F (x)))|x] (2.6) In order to find a solution for ϕ(F (x)), Friedman (2001) proposed an alternative where the model is trained in an additive way. Thus, the solution is obtained from the previous approximation, t−1. This strategy is known as stagewise-greedy or gradient boosting procedure. CHAPTER 2. RICE GROWTH PHASES DETECTION 36 F (x)t = F (x)t−1 + αh(x)t (2.7) ϕ(F (x))= arg min F Ex[EyL(y, F (x)t−1 + αh(x)t)|x], (2.8) where α is a step size and the function h(x) is a base predictor (or also known as a weak learner) that belongs to the family functions F , which usually are classification trees. One inconvenience in this model is the tendency to overfit the data. This problem has been tackled with multiple algorithms. Among the gradient boosting machine implementations, XGBoost has proved to have good performances in many studies. The method implements stronger regularization parameters that constrain the model (Georganos et al. (2018)). For the purpose of the study, five parameters were modified. The learning rate (eta), col sample (percentage of columns using in each tree), gamma (the loss reduction), max depth (maximum depth of the tree), and sub sample (ratio of the training set) (Chen and Guestrin (2016)). 2.1.7 Evaluation metrics and scoring The performance of each supervised classification approaches was assessed through metrics that compare predictions to real values. The confusion matrix method is commonly used for binary classification (table 2.3). Although the confusion matrix is an efficient tool to visualise performances, the matrix by itself lacks a metric to evaluate the classification. However, some techniques allow summarising the confusion matrix. Precision, recall, and f1-score are evalua- tion metrics that are commonly used not only for binary classification but also for multi-class problems (Sokolova and Lapalme (2009)). Table 2.3: Confusion matrix for binary classification. Real Positive Real Negative Predicted Positive true positive (TP) false positive (FP) Predicted Negative false negative (FN) true negative (TN) Precision is the proportion of samples classified as positive by the model, which are truly positive. Recall measures the ratio of the amount of correct positive classifications to all possible positive values in the data (Lantz (2013); Muller and Guido (2017)). F1-score is the harmonic mean of precision and recall. Micro averaging, macro averaging, and per-instance averaging CHAPTER 2. RICE GROWTH PHASES DETECTION 37 are variations of the three mentioned metrics that are applied to multi-class problems (Lipton et al. (2014)). Micro is the sum of all true positive (TP), false positive (FP), true negative (TN), and false-negative (FN) quantities for all classes. Whereas, macro averaging calculates the parameters per class, and then averages the results. Finally, per-instance averaging assigns weights to each class. Macro averaging is used when the methodology requires to treat all categories as equal, yet in cases when bigger classes must be favored, the micro averaging is computed (Sokolova and Lapalme (2009)). In this study, the macro averaging metric is applied to gauge the model performances. Precision macro = C∑ i=1 TP i TP i + TP i (2.9) Recall macro = C∑ i=1 TP i TP i + FN i (2.10) f1 Score macro = 2 Precision ∗ Recall (Precision+Recall) ∗ C (2.11) Training machine learning approaches involves tuning several parameters that may impact the final robustness of the model (Verrelst et al. (2012)). A cross-validation method was com- puted to select the best tuning parameters values. Once the model classified the validation data, the f-1 score macro metric was computed to measure the model performance. Finally, the parameters configuration for which f1-score was the highest was selected. Each supervised classification approach varies in the number of parameters to optimize. For instance, ran- dom forest only depends on the trees numbers and features used as input at each node. On the other hand, SVM polynomial kernel has four parameters to modify: regularization (cost), polynomial-degree, γ, and initial coefficient (coef0). CHAPTER 2. RICE GROWTH PHASES DETECTION 38 2.2 Results 2.2.1 Optical images pre-processing Twelve Landsat and four Sentinel-2 images were downloaded (table 2.4). The MSI images were atmospherically corrected using the Sen2Cor module, while for Landsat, the surface reflectance product was directly obtained from USGS. All images were projected to Universal Trans Merca- tor projection (UTM Zone 18N) and then clipped using a mask that encompassed the Saldan˜a region. A noise percentage index was calculated using the scene layer classification to indicate the percentage of clouds and shadows over the interested region. Although for Landsat-7 im- ages, this index shows relatively low values, it must be considered the error caused by the SLC control, which adds another 22% of unusable pixels in the image (figure 2.5) Table 2.4: List of Sentinel-2 and Landsat images used in Saldan˜a Mission Date Cloud Percentage Landsat-7 07-07-2015 21.68 Landsat-7 23-07-2015 6.5 Landsat-8 31-07-2015 12.27 Landsat-8 01-09-2015 5.75 Landsat-7 09-09-2015 10.12 Landsat-7 25-09-2015 3.59 Landsat-8 19-10-2015 4.18 Sentinel-2 22-10-2015 12.03 Landsat-8 22-11-2015 33.24 Landsat-8 06-12-2015 50.77 Sentinel-2 11-12-2015 3.08 Sentinel-2 21-12-2015 2.21 Landsat-7 30-12-2015 27.24 Sentinel-2 10-01-2016 15.27 Landsat-7 15-01-2016 17.29 Landsat-8 23-01-2016 31.97 The geometric registration was achieved using one Landsat-8 image that was taken on 22 of December, and one Sentinel-2 product captured on 21 of December. The NIR reflectance band from both images was used because this band is less affected by atmospheric effects (Skakun et al. (2017)). The OLI band image has a spatial resolution of 30-meter; thus, a bilinear interpolation was applied to obtain a 10-meter. Once both layers shared similar spatial resolution and extension. To calculate the shift, the phase correlation method was used and implemented with the Scikit-learn python package (figure 2.6). As a result, the shift in longitude CHAPTER 2. RICE GROWTH PHASES DETECTION 39 Figure 2.5: Location of the study area with Landsat-7 true-color image (blue, green and red bands) acquired on 7 July 2015. The black lines are regions in which there is no data. and latitude between both images were 10 and 40 meters, respectively. Due to ETM+ and OLI products share the same registration grid, the same shift is thus applied for both. Finally, the NDVI layer was calculated for each image. Figure 2.6: Geometric registration results. The left and central images are the NIR reflectance band at 10 m measured by Landsat-8 and Sentinel-2, respectively. The right image is the cross-correlation matrix. 2.2.2 NDVI time profiles The five ground survey days were chosen as a reference to extract the NDVI time profile of each field. Each rice field was drawn as a polygon and used to extract the NDVI information per date. the NDVI time profiles were extracted from 130 days prior to the date in which the survey CHAPTER 2. RICE GROWTH PHASES DETECTION 40 to the survey date. The NDVI profiles was thereafter assigned to each rice field and labeled with the growth-phase stage that was registered by the technician (figure 2.7). Although the rice fields were monitored using growth-stages notation, this phenological characterisation may not be adequately identified by the satellite approach used in this study (Wang et al. (2014)). Therefore, the growth stages were grouped into three major phenological phases: vegetative, reproductive, and ripening. Figure 2.7: Boxplots of the captured NDVI reflectance distributions in each rice fields pixel in Saldan˜a at each date of interest. The lines represent the NDVI trend along the time. The colors refer to each rice growth phase. Two additional classes, soil and other, were added to the analysis. The first characterises the rice fields cover during two scenarios: 1) one month after being harvested, and 2) one month before the sowing date. The second category aimed at representing pixels that do not have enough information to construct a proper NDVI time series and to encompass values with an anomalous tendency. Two additional steps were applied to the NDVI time profile: smoothing and fitting. 2.2.3 Training features The Savitzky Golay smoothing method (Savitzky and Golay (1964)) was used to smooth the NDVI time profiles (figure 2.8). The method depends on two parameters: window size and CHAPTER 2. RICE GROWTH PHASES DETECTION 41 polynomials degree. Both parameters were modified according to the number of time points available per pixel. The kernel regression was computed on the NDVI data available in each ranged time. The cross-correlation method was applied to find the bandwidth parameter; the regression was calculated with 70% of the data; the remaining points were used to validate each estimation. Seven different values of the NDVI that describe the pixels multi-temporal reflectance were obtained. Figure 2.8 shows the three different NDVI profiles obtained from the smoothing and fitting steps applied to one pixel data. Figure 2.8: single pixel NDVI time series comparison between initial time series (red), after smoothing (green), and after regression (blue). 2.2.4 Training and validation sets Although all rice fields were monitored during one crop season, not all of them were sowed at the same time. Thus, some of them started to be monitored at the end of their crop cycle (e.g., rice field 52B002, figure 2.9). This condition affected the number of observations per growth phase. Other limitations were: the number of cloud-free images during the evaluation time, and the rice fields extension. The data were randomly divided into six subsets/folds. The number of groups was chosen considering the total amount of rice fields per stage. Data were split based on the number of rice fields per growth phase, instead of the number of pixels per class. Thus, the pixels within each rice field may share similar NDVI features. The information per fold at each growth-phase was split into two sets: 70% for training, and 30% for validation. Table 2.5 summarizes the number of pixels from each group in each label as well as the pixel amount destinated for validation and training. CHAPTER 2. RICE GROWTH PHASES DETECTION 42 Figure 2.9: Rice growth phase that was registered in each visit date for each field. The colors represent the growth phases. Table 2.5: Model inputs per fold and growth phase. Set Folds Growth Phase vegetative reproductive ripening harvested soil other Training 1 3642 3787 2339 2155 2425 2170 2 3310 3760 2800 2194 2827 2170 3 3175 3457 2954 2530 2505 2170 4 3399 3937 2655 2294 2521 2170 5 3730 3686 2617 2230 2060 2170 6 3656 4038 2684 2155 2661 2170 Validation 1 753 1419 1278 814 877 930 2 688 1446 817 775 475 930 3 870 1749 663 439 797 930 4 646 1269 962 675 781 930 5 665 1520 1000 739 1242 930 6 739 1168 933 814 641 930 2.2.5 Parameters grid search The random forest model was computed on 108 different parameters combinations when the number of features sampled at each split, was four (figure 2.10). The increment in the number of trees had no significant relevance in the f1-score. From the SVM radial basis kernel, the best results were obtained using a γ value equal to 0.0001 and a regularization parameter equal to 3600 (figure 2.10). SVM polynomial basis kernel and XGBoost have more than two parameters to optimise; hence, to gauge the influence that each parameter exerts on the model, the f1-scores results were grouped for each parameter, and then were averaged. CHAPTER 2. RICE GROWTH PHASES DETECTION 43 Figure 2.10: Machine learning models results for different parameters combinations. Left: F1- score values obtained from Random Forest at a different number of trees and number of features. Right: F1-score obtained from SVM radial function that was trained with several combinations of γ and C values. The grey color represents combinations that are below 0.9. Figure 2.11: Results of the parameter grid search for SVM polynomial basis kernel. (a) The f1-scores for each of four parameters; (b) The f1-scores for cost and coef0 for and degree constants. The SVM polynomial kernel showed scores over 0.915 when γ parameter was equal to 0.05. The second-degree polynomial had better results than three-grade (figure 2.11-a). For the cost and coef0 parameters, there was not a long difference in the f-1score; these values were choosen CHAPTER 2. RICE GROWTH PHASES DETECTION 44 using a second plot (figure 2.11-b) where polynomial-degree and γ parameters equal to 2 and 0.05, respectively. Thus, the grid search revealed that the parameters for SVM polynomial kernel basis were: γ = 0.05, polynomial-degree = 2, cost = 2, and coef0 = 4. The XGBoost results showed that the score reached a peak for a maximum depth greater than 8 (figure 2.12-a). This value was chosen, considering that at increasing the max depth may cause a model overfitting. For learning rate (eta) and subsample, the best results were obtained at 0.01 and 0.7, respectively. Similarly to SVM polynomial, a second plot was created to choose the best results for colsample and minimum split loss (λ) (figure 2.12-b). The highest f-1 score was found for max depth = 8, eta = 0.01, sub sample = 0.7, colsample = 0.7, and minimum split loss= 4. Figure 2.12: Results of the parameter grid search for XGBboost approach. (a) The f1-scores for each of five parameters; (b) The f1-scores for cost and coef0 keeping subsample, eta, and maxdepthasconstants. After parameter grid search, the final classification models obtained 0.94, 0.943, 0.946, CHAPTER 2. RICE GROWTH PHASES DETECTION 45 and 0.966 in f1-score for the random forest, SVM radial kernel, SVM polynomial kernel, and XGBoost, respectively. Figure 2.13 shows the performances achieved per class in which the best category performance was achieved for vegetative, followed by reproductive. The “other” class had the lowest scores. Figure 2.13: f1-score per growth phase for each classification model approach. 2.2.6 Machine learning models testing in Cesar The Cesar dataset was used to test the machine learning approaches. Fedearroz technicians monitored 19 rice field from March to September of 2018, but four rice fields were affected by high temperatures that caused total production loss in two fields and partial loss in the remaining two. The same pre-processing procedure used for Saldan˜a was applied for Cesar's satellite images (figure 2.3). A 10 meters shift across longitude and latitude was applied in geometric registration. A total of 29 Sentinel-2 and 6 Landsat-8 images were used. The images acquisition dates ranged from January to September 2018 (figure 2.14). The number of cloud- free images, from April to May is low, due to the start of precipitation season. Figure 2.14: Satellite data availability for the Cesar region. The NDVI time series were extracted using the spatial points for each rice field. The CHAPTER 2. RICE GROWTH PHASES DETECTION 46 series were preprocessed and finally classified using the trained models. The rice growth phase detection approach was applied at different times, aimed at characterising the whole rice cycle. The dates of interests, in which the model was applied, were those acquisition dates where the image cloud percentage was close to zero. In order to compare the model classification results with ground based data, the Fedearroz growth stage notation was transformed into growth phase notation. Therefore, vegetative phase was assigned for the time between emergence and maximum tillering. Reproductive phase was referred to as the interval from flowering initiation to 100% flowering. The interval from ten days previous harvest date until harvest day was assigned as ripening phase. Figure 2.15 shows the classification results obtained from applying the XGBoost model on two different rice fields (DIAM25 and TRANQ11). The NDVI time profiles from nine different dates were classified. The model classified 75% of the rice field pixels as a vegetative phase on 24 April; the remaining pixels were classified as soil. Technicians reported that the emergence and maximum tillering dates were on 6th April and 14 May, which means that in April the rice field was in the vegetative phase. During June 2018, mode classified the rice field in reproductive stage, while ground-based registers noted that the 100% flowering occurred on 10 June. Finally, according to technicians, the rice field was harvested on 23 July, whereas the model estimate that the field was already harvested on 7th August. The second rice field (TRANQ11) was classified at ten different times. The model found that on 29 May, close to one third parts of the field was at bare soil, and the remaining field was at vegetative phase. After ten days, half of the field reached the vegetative phase. However, the ground data reported that the plant emerged on 26 April and reached maximum tillering on 5th June; thus, model was not able to wholly classify the entire field in this stage. Likewise, the rice field was in reproductive at the beginning of July, but most of the pixels were classified as a vegetative phase. Yet, this rice field suffered a harvest delay due to drought. The rice field growth phase class at a given date was chosen considering the percentage of pixels. Thus, the category with a cumulative pixel classification higher than 50% was assigned to each field. Some fields were harvested in October (e.g., SNIC4), yet there were not cloud- free images for this time; for that reason, the classification approach was calculated until 31st September. CHAPTER 2. RICE GROWTH PHASES DETECTION 47 Figure 2.15: Rice growth phase detection results for two fields in Cesar at different times. The stacked bars compare the cumulative classification obtained per pixel. These bars are obtained at each date in which the model was computed. The ground observation data is exhibited below the bars. The lines are the technician’s registers, and the contour represents the rice field growth phase. The colors are the different classes. To assess the models’ performances, the number of cases in which the classification ac- curately classified the phase period was estimated (figure 2.16). Lastly, the number of true positive classifications were divided by the total of observations (Sakamoto et al. (2005)) (table 2.6). CHAPTER 2. RICE GROWTH PHASES DETECTION 48 Figure 2.16: Comparison between the ground observation and the estimates growth phases per field. The points and shapes refer to the results obtained from the models at each date. Table 2.6: Percentage of cases that the model classifications are within the growth-phase period. Growth Phase Supervised Classification Models SVM radial (%) SVM polyno- mial (%) random forest (%) XGBoost vegetative 77.3 73.9 64 72 reproductive 64.5 65.6 74.2 80.6 ripening 55.6 58.8 61.1 68.4 harvested 89.5 89.5 100 100 CHAPTER 2. RICE GROWTH PHASES DETECTION 49 2.3 Discussion In this study, a feasibility method for detecting rice stages at 10 meters by blending different optical satellite missions (Landsat-7, Landsat-8, and Sentinel-2) was shown. Although similar studies, Sakamoto et al. (2005) reported classification performances above of 0.8 (using accuracy metric), these results were achieved though a coarser resolution (250 m). The rice phenolog- ical identification using low resolution is suitable for areas where the rice is planted on large extensions, but considering Colombia'conditions, where close to 70% of the rice production is taking place on fields in which area extension is less than 10 hectares (Fedearroz (2017)). The low spatial resolution approaches may not be the best solution. The methodology developed in this chapter is highly dependent on cloud conditions. How- ever, recent studies have combined radar and optical derived features in order to map rice fields (Onojeghuo et al. (2018)).These methodologies have proved to improve the rice phenological detection, especially for the early stage. For example, Tian et al. (2018) found a high correlation between rice backscattering and the field when this is inundated by water. The classification process used VH and VV polarisations data to create RADAR features. The Colombia Government and Fedearroz have been delivering bi-semester information re- lated to rice harvested area. This information has been useful for planning decision making. This study aimed to complement national surveys, offering spatial data more detail and addi- tional crop characteristic. Thus, the model could estimate which is the rice phenological area for a given time at a regional scale. For example, for Saldan˜a, the extension area for each growth phase detection on 21st December of 2015 was mapped (figure2.17). This exercise was done using a rice map and the XGBoost model. Among the state-of-the-art machine learning algorithms, the XGboost model has been re- ported as a powerful technique for classification tasks. This study was able to validate its poten- tial. However, SVM showed slightly better performances classifying vegetative phase. Previous studies have reported differences among tree-based and SVM approaches (Feyisa et al. (2016)). They referred to classifiers’ robustness as the main factor for efficiency in both approaches. From the tuning parameters section, it was possible to note a high range of performances. Using NDVI time series, optical data from two satellites were merged. This approach allows estimating the growth phases at any time. In this sense, the rice fields cycle was progressively CHAPTER 2. RICE GROWTH PHASES DETECTION 50 Figure 2.17: Map of rice growth estimation in Saldan˜a region monitored. This advantage may offer a tool to warn those fields in which the classification shows an irregular pattern. For example, one of the fields located in Cesar (figure 2.18), that lost its rice production by drought, had normal phenological evolution until 12th July of 2018, after that the technicians stopped monitoring labor due to the plant damage. Similarly, the model classification estimated different classes within the field after the mentioned date. CHAPTER 2. RICE GROWTH PHASES DETECTION 51 Figure 2.18: Multi-temporal characterisation for a field affected by drought. The red line represents the date at which the technicians stopped the field monitoring. Chapter 3 Rice Disease Detection This section aims to present a methodology that uses Sentinel-2 data for detecting diseases areas within productive rice fields. The processes are described as follows: 1) To detect rice canopy reflectance changes during two growth stages. 2) To group pixels that share similar reflectance characteristics. 3) To compare the reflectance profiles with “healthy” and “unhealthy” rice canopy profiles reported in previous studies. The study uses ground-based information from the northern region of Tolima. 52 CHAPTER 3. RICE DISEASE DETECTION 53 3.1 Materials and methods 3.1.1 Study area Tolima has the second largest area sowed with rice in Colombia. The department has tradi- tionally led the rice production. The rice crop plays an essential role in the local agricultural sector. Thus, nearly 27% of the Tolima agriculture area is planted with rice (MADR (2014)). Lowland irrigated rice is mainly planted in the flatlands at an altitude that ranges from 300 to 1200 meters (Delerce et al. (2016)). Tolima precipitation is distributed in two rainy seasons, the first occurs from March to May, and the second one is from September to November. The annual rainfall is 1450 mm, while the mean temperature varies from 23 to 27°C (Sarmiento (2018)). The department is split into four major regions in terms of rice production. Those regions are characterised by different geographical, soil, and agronomic conditions. One of these is the northern region that comprises the municipalities of Lerida, Ambalema, and Armero. The region annually produces 111,000 tons of rice, which is close to 15% of total department pro- duction (MADR (2016)). The municipalities mainly plant irrigated rice in flatlands in which altitude ranges from 200 m to 400 m. 3.1.2 Data sources Fedearroz implemented the phytosanitary brigade program in 2012. This initiative aims to monitor disease distribution on rice fields across Colombia. Such data comes from registers taken by Fedearroz technicians, who annually visited the major Colombian rice production regions. The information has been used for understanding the disease impact in rice production (Fedearroz (2016)). Disease monitoring was carried out in commercial rice fields during the ripening stage, where in each field, the technicians chose at least 25 control points. In each sampling location, the person took a handful of rice stems. The evaluator registered the number of diseased plants. This metric is used to gauge disease incidence. Finally, the disease severity score is evaluated for a single plant. The person scored from 0 to 100, where 0 means no disease affection, and 100 represents total damage over the plant. The breeder registered four features per evaluation CHAPTER 3. RICE DISEASE DETECTION 54 point, disease name, number of stems, incidence, and severity (Fedearroz (2014)). The eval- uated diseases are Rhizoctonia, Piricularia, Gaemannomyces, Helmintosporiosis, Sarocladium. Additionally, the technician registered the rice field geographic position using a GPS. 3.1.3 Satellite data Sentinel-2 data was used to detect the diseased areas within the rice fields. The growth phase identification was tackled using Landsat-8 and Sentinel-2. For more information, please refer to 2.1.3 section. 3.1.4 Rice fields manual digitalization The rice fields geolocation was referenced as a spatial point which would have constrained the study scope to one pixel. Thus, a polygon delimitation method was implemented per each evaluated field. To effectively assert the polygon with the real field extension, a methodology which involves the Google Earth Engine (GEE) platform was used. GEE is a cloud-based platform for global geospatial analysis that brings high-performance computing resources to studies with high impact social issues (Gorelick et al. (2017)). The GEE data catalog offers free information sources like Sentinel and Landsat, among others. The tool access is controlled by an application programming interface (API), this combined with an interactive development environment (IDE), offers a quick visualization of analysis. Each rice field contour was defined using a high-resolution image as a reference. The multi- temporal spectral profile for each polygon was visualized from April to November (according to Fedarroz the rice season was held this period) through the GEE platform (figure 3.1). The polygon was assigned as rice field if its NDVI profile had a typical characteristic pattern for rice canopy) otherwise, a new polygon was created or the rice field was removed from the study. 3.1.5 Spectral reference patterns The ground base data, that was shared by Fedearroz, offers information about rice disease incidence. However, it does not pin down which zones were affected by the pathogen. Lack of knowledge about evaluation points hampers a correct grasp of the relationship between reflectance metrics with damage caused by diseases. In order to detect those reflectance profiles CHAPTER 3. RICE DISEASE DETECTION 55 Figure 3.1: GEE platform. The left panel contains two plots that show the time profiles for NDVI and green, NIR, red bands. The spectral data was extracted from the Sentinel-2 and Landsat missions, these are freely available on the platform for querying. The right panel points out a true color high resolution image, which was used to draw the polygon, represented in yellow color. . linked to unhealthy rice, this study used spectral profiles previously reported for healthy and diseased rice canopies. Although there are no reported spectral studies for rice in Tolima, some studies have suc- cessfully assigned spectral patterns for healthy rice conditions in other countries. Zhao et al. (2012) published a spectral characterisation of rice at a different brown spot (Helminthospo- rium Oryzae) incidence levels. The evaluations were taken on 25 rice fields located in China, in which they characterised the infested rice canopy by using hyperspectral measurements. Regis- ters were thereafter grouped into four categories: D0, healthy canopy; D1, severity index from 0% to 27%; D2, index from 27% to 44%; D3, severity index greater than 44%. The evaluation was carried out during the heading stage. The narrowband measurements were transformed into the Sentinel-2 broadbands. This mission was chosen over Landsat because of its MSI sensor offers information that covers the red-edge spectrum, this spectral region is strongly affected by chlorophyll content (Kobayashi et al. (2001)). The transformation was achieved by integrating the hyperspectral measurements into the Sentinel-2’s relative spectral response domain (Zheng et al. (2018)). Rsb = ∫ λend λstart f (x) dx (3.1) where, Rsb is the simulated reflectance value for the band b, and λ is the wavelength range for the band b. f(x) is the spectral response function of the Sentinel-2 mission. The simulated CHAPTER 3. RICE DISEASE DETECTION 56 Figure 3.2: Spectral response of Sentinel-2 in the visible, near infrared and short infrared range. The spectral signatures that characterise rice canopy affected by several brown spot incidence levels (blue, yellow, green and red refers to D0, D1, D2, D3, and D4, respectively) are also shown. reference was constructed for nine broadbands, swir2 region was removed because of its high susceptibility to water absorption (figure 3.2). 3.1.6 Phenological changes detection Diseases damage may accelerate the plant senescence process (Martinelli et al. (2015)). For example, diseases such as Sarocladium and Helminthosporium can cause panicles rotting and the emergence of spots on the leaves tissues. Early physiological changes can be spotted by a continuous monitoring method. In this sense, the goal is to detect spectral changes across the reproductive phase (from heading to ripening stage). A Bi-stage normalised band metric is calculated to identify different spectral patterns within the field. The metric is computed per pixel and is then clustered. As a consequence, groups with similar spectral changes pat- terns were detected. Finally, the averaged spectral reflectance per group is compared with the simulated reference spectral profile that describes a plant affected by a disease. Normalised band reflectance Rice diseases can affect a plant in different ways regarding the plant growth stage. Thus some symptoms, such as lesions, are not visible until the late stages. The canopy morphological changes can be observed using reflectance when the leaf has reached a specific size (Martinelli et al. (2015)). Likewise, the disease incidence increment can speed up during specific phase stages. For instance, the Rhizoctonia incidence rapidly increases from early heading to grain CHAPTER 3. RICE DISEASE DETECTION 57 filling stages, due to chemicals released by plant cells during reproductive phase (Gnanaman- ickam et al. (2010)). Hence, to correctly characterise the rice reflectance profile for healthy and unhealthy scenarios, the rice fields phenological stage was first detected. The methodology developed in chapter 2 was used to detect the growth phases. Shi et al. (2018) proposed normalised two-stage vegetation indices for characterising rice diseases development. They computed the metric over six different vegetation indices which were selected based on PlanetScope multi-spectral resolution (i.e., blue, red, green, and nir). However, Sentinel-2 mission offers a broader number of spectral bands than PlanetScope. This study used the bi-stage normalised difference (NDRB) metric to characterise the rice phenology changes across the reproductive phase. Thus, the metric was computed over each Sentinel-2 reflectance band, aimed at exploring the whole spectral range, and was fed with the plant reflectance data captured at the beginning and end of the reproductive phase. NDRbi = Rbi heading −Rbi ripening Rbi heading +Rbi ripening (3.2) Clustering An unsupervised classification approach was used to find which pixels shared similar spectral reflectance characteristics. The unsupervised transformation methods are applied to understand data better. Among the algorithms that belong to this approach, clustering methods are frequently used in remote sensing applications, e.g., land-cover determination (Duda and Canty (2002)), crop phenology identification (Tian et al. (2018)), or plant stresses detection (Zheng et al. (2019)). One of the most popular clustering algorithms is k-means. The k-means clustering randomly initialises with a given number of centres for which are assigned points closest to them. The algorithm calculates the means of the data that represents each centre, this mean vector then becomes the new centre for the cluster. These two steps are iteratively repeated to minimise the cluster variance (Gareth James, Daniela Witten and Tibshirani (2006)). One common practice is first to adjust the features to the same scale, before applying a clustering algorithm. This procedure was achieved through the robust-scalar algorithm. The method computes interquartile range and median absolute deviation for each feature. This methodology aims to be less susceptible to extreme values (Muller and Guido CHAPTER 3. RICE DISEASE DETECTION 58 (2017)). The clusters number were decided based on elbow algorithm, which measures the heterogeneity for each cluster, so k-means is calculated at a different number of groups. Hence, heterogeneity tends to decrease as more clusters are created. The final goal is to choose the number of groups before the reduction reached a minimum plateau (Lantz (2013)). Spectral profiles matching The Euclidean distance norm was used to quantify similarity among the spectral profiles. ED = √√√√ N∑ i=1 ‖Rsi −Ri‖2 (3.3) CHAPTER 3. RICE DISEASE DETECTION 59 3.2 Results 3.2.1 Spatial fields preprocessing Through the rice field digitalization was possible to obtain 57 polygons that are located in the northern region of Tolima were visually evaluated (figure 3.3). The remaining points were removed due to inconsistencies such as geographical positions errors, or low quantity of cloud- free images for the evaluation period. Figure 3.3: Location map of the rice fields located in the northern region of Tolima. 3.2.2 Healthy and unhealthy rice canopy spectral reference Figure 1.2 exhibits the simulated multispectral reference for rice canopies that were infested by Brown Spot. Comparing healthy (D0) with unhealthy(D3) profile, the most significant differences are presented in the green, red-edge 3, nir, and swir1 regions. This spectral response is occasioned by leaf tissue damages, which derivates in pigments degradation, intercellular space filled with cellular debris, and water content reduction CHAPTER 3. RICE DISEASE DETECTION 60 Figure 3.4: Simulated spectral reflectance response to different Brown Spot infection levels across the Sentinel-2 spectral bands. 3.2.3 Rice field of interest Rice canopy reflectance is not only influenced by stress factors, but also by agricultural prac- tices such as cultivar, fertilization, irrigation, or by external phenomena such as weather, or soil (Thenkabail et al. (2000); Gnyp et al. (2013); Huang et al. (2012)). Although the knowledge of these factors is essential to explain the rice spectral variations, only rice variety was available in the dataset, thus, the study was only focused on one cultivar. Fedearroz 67 was the most frequently sowed cultivar in the region during 2016 and 2017. To assign a disease incidence value to each rice field, the disease incidence scores that were registered for Rhizoctonia, Gae- mannomyces, Helmintosporiosis, and Sarocladium Oryzae. The Piricularia registers were not considered in the analysis because of its low values. The technicians only evaluated disease severity for Sarocladium, and therefore, this score was used as a reference for rice damage. Two rice fields were first characterised in order to explore the relationship between spectral reflectance and ground-based observations (table 3.1). Rice fields were selected based on their disease incidence and severity scores. Field 1 was registered with a disease incidence of 20.9% and severity of 9.6%, while Field 8 had an incidence of 44% and severity of 69.6%. Both rice fields were monitored at different years 2016 and 2017, respectively. Hereafter the name “healthy” is used to refer Field 1 and “unhealthy” field to name Field 8. CHAPTER 3. RICE DISEASE DETECTION 61 Table 3.1: Rice diseases incidence score for rice fields cultivated with Fedearroz 67. The rice field size is the field area at hectares. Rice field reference Sarocladium Severity Score (%) Diseases Incidence Score (%) Rice Field size (ha) Field 1 9.6 20.9 0.4 Field 2 12.6 13.2 11.9 Field 3 50 22.2 0.6 Field 4 53 20.4 7.7 Field 5 57 18.6 5.6 Field 6 58 34.4 2.5 Field 7 64 54.5 0.6 Field 8 69.6 44 2.8 Field 9 70 40.6 2.8 Field 10 71 50.9 5.9 Field 11 75 45.2 0.5 Field 12 75 50.6 3.2 Field 13 75 44.6 0.9 3.2.4 Rice growth phases detection Rice phenological phases were detected using the methodology described in section 2. Unlike Saldan˜a region where Landsat-7 data were included in the analysis, this mission was not in- cluded in this chapter. In total, 27 optical images were used, in which 10 and 17 were Landsat-8 and Sentinel-2, respectively. XGBoost model was used to classy the NDVI multitemporal sig- natures belonging to each rice field. The rice cycle was progressively characterised from July to September. The dates of interest were those in which the image cloud percentage was close to 0. The “healthy” field growth phase crop was estimated on five different dates. The rice field was in reproductive phase during July 2016 and then changed to ripening phase one month later. For “unhealthy” field, the method was applied on eight dates. The field reached reproductive phase on the middle of July, changing to ripening phase at the end of August, and finally, being harvested in October 2017 (figure 3.5). Sentinel-2 images that were taken during reproductive and ripening phases were selected. Therefore, Five and nine Sentinel-2 images were selected for healthy and unhealthy fields, respectively. An additional step was used to select the images. Each image was visualized as a true color composite (i.e., blue, green, and red) aimed at spotting surfaces with noise. As a result, the image acquired on 7 August of 2016 was removed, because a thin cloud partially covered the rice field (figure 3.6) CHAPTER 3. RICE DISEASE DETECTION 62 Figure 3.5: Results of Growth phase detection for “healthy” and “unhealthy” fields. 3.2.5 Heading and maturity stages identification The spectral information per pixel for each time and field was extracted (figure 3.7). The rice fields were classified into growth phases, but Zhao et al. (2012) characterised rice canopy during heading stage. Hence it was necessary to find which was the time where rice fields were in heading stage. So, the spectral profiles for each image were compared with the simulated spectral references. Euclidean distance metric was used to measure the similarity of both profiles. As a result, the most similar spectral profiles were found for those images captured on 28 of July of 2016 and 18 of July of 2017 for “healthy” and “unhealthy” field, respectively (figure 3.8). This result is consistent with growth phase detection, because the dates were within the estimated reproductive phase. Besides the comparison between simulated spectral reference and spectral profiles per time, these were also compared with the spectral characteristics for each disease severity level (figure 3.8). Thus, “healthy” field profile was found to be more similar to the spectral profile that char- acterises a rice canopy without disease damages. Likewise, “unhealthy” field profile registered a similar pattern to the spectral profile for the rice canopy with a high level of disease severity. CHAPTER 3. RICE DISEASE DETECTION 63 Figure 3.6: Sentinel-2 true color images during the reproductive and ripening periods for the “healthy” and “unhealthy” fields. The red line is the field contour. CHAPTER 3. RICE DISEASE DETECTION 64 Figure 3.7: Spectral profiles. The red lines are the spectral information per pixel. The black line shows the averaged spectral profiles per time. CHAPTER 3. RICE DISEASE DETECTION 65 Figure 3.9 shows the averaged spectral profiles that characterised each rice field, “unhealthy” field exhibited a similar trend than the D3 severity level reported by Zhao et al. (2012). Figure 3.8: Comparison between the simulated spectral reference and the spectral profiles for each date and for each field (“healthy” and “unhealthy”). The colorbar indicates Euclidean distance value, which its low values are colored in dark purple, whereas high values are colored as yellow. Figure 3.9: Comparison between the spectral reference profiles (blue and red lines) and the rice fields averaged spectral profiles (green and orange lines) computed for both fields. To find the date for maturity stage in both fields, the “healthy” field images during ripening phase were used as a reference. These images were compared with the images of “unhealthy” field ripening phase. The goal was to find in which date both fields had similar spectral profiles in ripening phase. As a result, images were taken on sixth of September and first of September, of 2016 and 2017, respectively, captured a similar spectral pattern for both fields (figure 3.10) Figure 3.11 shows the reflectance profile per pixel for each rice field. The spectral profiles exhibited a similar tendency in each band. Plant senescence process is mainly the cause. The spectral responses found in both stages for each field suggest that the biophysical changes caused by the diseases were mainly located during the reproductive phase. Although no information accurately pinpoints the rice phenological stage, hereafter maturity stage term is used to refer CHAPTER 3. RICE DISEASE DETECTION 66 Figure 3.10: Euclidean distances result from comparing the rice fields reflectance spectral pro- files during the ripening phase. to the highlighted spectral profiles. Figure 3.11: Comparison between the reflectance spectral profiles per pixel for each field (i.e., “healthy” and “unhealthy” ) during the maturity stage. 3.2.6 Rice disease detection within the rice fields The bi-stage NDRb was computed, aimed at gauging the spectral variation within the fields during the transition from heading to maturity stages. Figure 3.12 shows the NDRb values at each spectral band for each field. The most significant differences among both fields occurred in the blue, red, red-edge2, red-edge3, NIR, and narrow-nir bands. The NDRb values were used as inputs for the clustering algorithm. The number of clus- ters were determined using Elbow method 3.13; thus four and five clusters were selected for CHAPTER 3. RICE DISEASE DETECTION 67 Figure 3.12: Boxplots of the normalised bi-stage distributions by rice field disease status. “healthy” and “unhealthy” fields, respectively. Figure 3.13: Number of cluster selected for the “healthy” and “unhealthy” fields. The red points highlight which cluster were picked. To determine which group shared similar spectral characteristics to the healthy or diseased references, the pixels spectral profiles belonging to each cluster at heading stage were averaged. Averaged profiles were compared with the simulated spectral reference. For “healthy” field, the averaged groups were not shown significant difference across the whole spectrum (figures 3.14 3.15) For “unhealthy” field, it is to point out that there is a tree in the middle of the field (figure 3.16). Pixels belonging to cluster 5 characterise tree reflectance response. For that reason, this cluster was labeled as noise and then removed. Figure 3.17 shows how well the averaged reflectance profiles fitted with the reference profiles that characterise plants with 0 and 44% severity of Helminthosporium. The pixels, which comprise the first cluster, shared similar reflectance characteristics in the infrared portion that the healthy one reported by Zhao et al. (2012). Likewise, second and third clusters registered values close to 0.43 in the red-edge3 band, while the simulated reference reported a value of 0.42 in the same band. The fourth cluster CHAPTER 3. RICE DISEASE DETECTION 68 Figure 3.14: Map of the clusters calculated for the “healthy” field. The true color image is the date in which the rice field was in the heading stage. Figure 3.15: Averaged spectral profiles for each cluster that characterised the “Healthy” field. presented reflectance in the infrared region below of the diseased reference. These findings suggest that grouping the pixels based on the two-stage NDRb can discriminate reflectance responses from unhealthy to healthy rice canopy. 3.2.7 Application in other fields In order to gain more in-depth insight into the methodology, other rice fields were exposed to the steps before mentioned. Field 2, Field 9, and Field 12 were selected to calculate the zones affected by diseases (figure 3.18). For Field 2, five clusters were picked. For Field 9 and Field 12, four clusters were chosen. Spectral profiles were extracted from the images in which each field was in heading stage. CHAPTER 3. RICE DISEASE DETECTION 69 Figure 3.16: Map of the clusters calculated for the “unhealthy” field. The true color image is the date in which the rice field was in the heading stage. Figure 3.17: Averaged spectral profiles for each cluster that characterised the “Unhealthy” field. The three fields showed similar pattern in the near-infrared spectrum, their reflectance values were close to 0.4 (figure 3.18 b,d,f). Unlike, the rice fields exposed in the previous section; the reflectance profiles for the new fields did not present a similar pattern that the described as healthy. Although the unsupervised classification showed differences into the rice fields (figure 3.18 a,c,e), the clusters averaged spectral profiles differ from the simulated reference profile; thus, it was not possible to link which areas were affected by disease presence instead of others stresses factor. CHAPTER 3. RICE DISEASE DETECTION 70 Figure 3.18: Results for three different rice fields. The map for each field is exhibited in the left panel; The disease incidence score is pointed out above the map. In the right, the averaged spectral profile for each cluster and field. CHAPTER 3. RICE DISEASE DETECTION 71 3.3 Discussion The damage produced by diseases may change plant structure at the physiological level (Yang (2010); Martinelli et al. (2015); Zhang et al. (2018)). Degradation in pigments production and cellular structure can modify the light absorption in specific regions. This study explored the feasibility of using Sentinel-2 data for detecting diseased zones within rice fields. It was pos- sible to distinguish different rice canopy reflectances within fields by integrating hyperspectral characterisation, ground-based, and Sentinel-2 data. It was noted that the most significant differences were located from red-edge to near-infrared region. Red-edge portion has been highlighted as a good indicator of chlorophyll content (Pen˜uelas and Filella (1998)), whereas near-infrared has often linked to being sensitive to rice canopy morphology status (Shi et al. (2018)). The methodology used in this study demonstrated the applicability of detecting rice canopy changes by computing reflectance metrics across rice cycle. To identify the growth stages, the rice phase detection model developed in chapter 2 was used. It was possible to assign reflectances responses to diseases damages by using hyperspectral measurements previously reported (Zhao et al. (2012)). Thus, the narrow band characterisation was transformed into the Sentinel-2 spectral response function. Although, k-means can distinguish different spectral profiles in the fields, it was not possible to precisely pin down, which was the cause of the different rice canopy reflectance. However, the pixels groups shapes showed similar patterns than the spread of pathogens in fields (circular and rectangular) (Zhang et al. (2018)). The rice physiological structure can be modified by several factors such as soil, agricultural practices, weather. The changes can trigger a similar spectral response than damages by dis- eases. For instance, Gnyp et al. (2013), found that bands located in visible and near-infrared regions were highly sensitive to varies rates of nitrogen fertilizer (figure 3.19). To effectively assert rice detection using remote sensing tools, it is necessary to isolate factors that can influ- ence the spectral response. In this sense, rigorous ground-field monitoring must be held. Plant density, fertilizer applications, weather conditions, and irrigation application are parameters that must be part of the study. Regardless, the methodology was evaluated only in 5 rice fields. The remaining polygons, that were manual digitalized, were shared to Fedearroz. This information is a good source to CHAPTER 3. RICE DISEASE DETECTION 72 Figure 3.19: Rice canopy spectral profiles at various nitrogen rates (Gnyp et al. (2013)). locate the rice growers fields quickly. Besides, the validation opens a debate about the way that technicians georeferenced fields. This suggestion was transmitted to Fedearroz, which will advise using GPS devices when a worker is inside of fields. Chapter 4 Summary and Further Research The study aimed to explore the benefits of using remote sensing data at monitoring rice growers fields. In this sense, Chapter 2 showed a feasible way to detect rice phenological phases by blending two optical satellite missions. Performances of three machine learning models were compared, where XGBoost algorithms obtained slightly better f1-scores (figure 2.13). Although some signals were more challenging to detect than others (figure 2.15), the proposed method exhibited acceptable performances at implementing it in other localities (figure 2.16). This result allows creating national statics of rice phenological conditions (figure 2.17). Further exploration can derivate on identifying regions that are affected by external factors such as weather (figure 2.18). Chapter 3 explored a method to detect diseased rice zones within fields through Sentinel-2 data; however, these analyses need to be further validated. The rice-field referenced as “un- healthy” exposed almost identical spectral characteristics than the created by Zhao et al. (2012) (figure 3.17). But, this result can seem a coincidence without a comprehensive ground data observation. This appreciation is validated when the methodology was applied in other fields, though the method found an intra-variability signal, the spectral profiles exhibited different magnitudes in the near-infrared region (Figure 3.18). Thus, it was not possible to assign the clusters to rice zones that were affected by diseases. In future studies, ground-based data will need to provide agricultural practices, soil and weather conditions, evaluations points with georeferencing in order to get a better idea about the influence that each factor have in the rice-canopy reflectance. 73 CHAPTER 4. SUMMARY AND FURTHER RESEARCH 74 Some studies have created an integrated framework between airborne, ground hyperspec- tral, and satellite data. Thus, using a methodology to recollect multitemporal information at different spectral resolutions, would allow at monitoring pilot rice-fields. Fedearroz has started to explore airborne data scope. For futures works, this information would be included as well as rigorous agricultural practices registers at ground level. This information would help to grasp the reflectance metrics derivative from satellite missions. Bibliography Ahn, S.-W. and Jennings, P. R. (1982). Enfermedades del arroz en Ame´rica Latina y su control. CIAT, Cali, Colombia. Amaya Montoya, J. D. (2011). The Colombian Rice Sector Scenarios and Strategic Options for Increasing the Competitiveness Considering International Free Trade Agreements. PhD thesis, Wageningen University. Aschbacher, J. and Milagro-Pe´rez, M. P. (2012). The European Earth monitoring (GMES) programme: Status and perspectives. Remote Sensing of Environment, 120:3–8. Bannari, A., Morin, D., Bonn, F., and Huete, A. R. (1995). A review of vegetation indices. Remote Sensing Reviews, 13(1-2):95–120. Baret, F., Champion, I., Guyot, G., and Podaire, A. (1987). Monitoring wheat canopies with a high spectral resolution radiometer. Remote Sensing of Environment, 22(3):367–378. Barnwal, M. K., Kotasthane, A., Magculia, N., Mukherjee, P. K., Savary, S., Sharma, A. K., Singh, H. B., Singh, U. S., Sparks, A. H., Variar, M., and Zaidi, N. (2013). A review on crop losses, epidemiology and disease management of rice brown spot to identify research priorities and knowledge gaps. European Journal of Plant Pathology, 136(3):443–457. Belgiu, M. and Drgu, L. (2016). Random forest in remote sensing: A review of applications and future directions. Benedict, H. M. and Swidler, R. (1961). Nondestructive method for estimating chlorophyll content of leaves. Science. Breiman, L. (2001). Random Forests. pages 1–33. 75 BIBLIOGRAPHY 76 Buehler, S. A., Eriksson, P., and Lemke, O. (2011). Absorption lookup tables in the radia- tive transfer model ARTS. Journal of Quantitative Spectroscopy and Radiative Transfer, 112(10):1559–1567. Cao, R., Chen, Y., Shen, M., Chen, J., Zhou, J., Wang, C., and Yang, W. (2018). A simple method to improve the quality of NDVI time-series data by integrating spatiotemporal infor- mation with the Savitzky-Golay filter. Remote Sensing of Environment, 217(August):244– 257. Ceccato, P., Gobron, N., Flasse, S. S., Pinty, B., and Tarantola, S. (2002). Designing a spectral index to estimate vegetation water content from remote sensing data: Part 2. Validation and applications. Remote Sensing of Environment, 82(2-3):188–197. Chen, H.-W. and Cheng, K.-S. (2012). A Conceptual Model of Surface Reflectance Estimation for Satellite Remote Sensing Images Using in situ Reference Data. Remote Sensing, 4(12):934– 949. Chen, J., Zhu, X., Vogelmann, J. E., Gao, F., and Jin, S. (2011). A simple and effective method for filling gaps in Landsat ETM+ SLC-off images. Remote Sensing of Environment, 115(4):1053–1064. Chen, T. and Guestrin, C. (2016). XGBoost : A Scalable Tree Boosting System. In Conference on Knowledge Discovery and Data Mining, pages 1–6. Cihlar, J. (2000). Land cover mapping of large areas from satellites: Status and research priorities. International Journal of Remote Sensing, 21(7):1093–1114. Claverie, M., Ju, J., Masek, J. G., Dungan, J. L., Vermote, E. F., Roger, J.-C., Skakun, S. V., and Justice, C. (2018). The Harmonized Landsat and Sentinel-2 surface reflectance data set. Remote Sensing of Environment, 219(August 2017):145–161. Correa-Victoria, F. J. and Zeigler, R. (1993). Pathogenic variability in Pyricularia grisae at a rice blast ”hot spot” breeding site in Eastern Colombia. Cortes, C. and Vapnik, V. (1995). Support-Vector Networks. Kluwer Academic Publishers, (20):273–297. BIBLIOGRAPHY 77 DANE (2017). 4 Censo Nacional Arrocero. Technical report. Daytnoff, L., Raid, R., Snyder, G., and Jones, D. (1991). Effect of Calcium Silicate on Blast and Brown Spot Intensities and Yields of Rice. Plant Disease, 75. Dean, R., Van Kan, J., Pretorius, Z., Hammond-Kosack, K., Di Pietro, A., Spanu, P., Rudd, J., Dickman, M., Kahmann, R., Ellis, J., and GD, F. (2012). The Top 10 fungal pathogens in molecular plant pathology. Usa, 13(4):414–430. Delerce, S., Dorado, H., Grillon, A., Rebolledo, M. C., Prager, S. D., Patin˜o, V. H., Varo´n, G. G., and Jime´nez, D. (2016). Assessing weather-yield relationships in rice at local scale using data mining approaches. PLoS ONE, 11(8). Dong, J. and Xiao, X. (2016). Evolution of regional to global paddy rice mapping methods: A review. ISPRS Journal of Photogrammetry and Remote Sensing, 119(August):214–227. Drusch, M., Del Bello, U., Carlier, S., Colin, O., Fernandez, V., Gascon, F., Hoersch, B., Isola, C., Laberinti, P., Martimort, P., Meygret, A., Spoto, F., Sy, O., Marchese, F., and Bargellini, P. (2012). Sentinel-2: ESA’s Optical High-Resolution Mission for GMES Operational Ser- vices. Remote Sensing of Environment, 120:25–36. Duda, T. and Canty, M. (2002). Unsupervised classification of satellite imagery: Choosing a good algorithm. International Journal of Remote Sensing, 23(11):2193–2212. Fedearroz (2010). Evaluacio´n socio Econo´mica de la cadena productiva del arroz en Colombia. Technical report, Fedearroz, Bogota´, DC. Fedearroz (2014). Manejo Integrado del Cultivo del arroz. Fedearroz - Fondo Nacional del Arroz, Bogota´, D. C. Fedearroz (2016). INFORME DE GESTIO´N FONDO NACIONAL DEL ARROZ. Technical report. Fedearroz (2017). IV Censo nacional arrocero 2016. Division de investigaciones econo´micas. Feyisa, G. L., Palao, L., Nelson, A., Win, K. T., Htar, K. N., Gumma, M. K., and Johnson, D. E. (2016). A Participatory Iterative Mapping Approach and Evaluation of Three Machine BIBLIOGRAPHY 78 Learning Algorithms for Accurate Mapping of Cropping Patterns in a Complex. (March):1– 17. Friedman, J., Hastie, T., and Tibshirani, R. (2000). Additive logistic regression: A statistical view of boosting. Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5):1189–1232. Gareth James, Daniela Witten, T. H. and Tibshirani, R. (2006). An Introduction to Statistical Learning, volume 102. Georganos, S., Grippa, T., Vanhuysse, S., Lennert, M., Shimoni, M., and Wolff, E. (2018). Very High Resolution Object-Based Land Use-Land Cover Urban Classification Using Extreme Gradient Boosting. IEEE Geoscience and Remote Sensing Letters, 15(4):607–611. Gnanamanickam, S., Kavitha, P. V., Babujee, L., and Brindha Priyadarisini, V. (2010). Bio- logical Control of Rice Diseases. In Biological Control of Crop Diseases. Gnyp, M. L., Yu, K., Aasen, H., Yao, Y., Huang, S., Miao, Y., and Bareth, G. (2013). Analysis of crop reflectance for estimating biomass in rice canopies at different phenological stages. Photogrammetrie - Fernerkundung - Geoinformation, 4:351–365. Gonza´lez-Betancourt, M. and Mayorga-Ru´ız, Z. L. (2018). Normalized difference vegetation index for rice management in El Espinal , Colombia. DYNA, 85(205):47–56. Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D., and Moore, R. (2017). Remote Sensing of Environment Google Earth Engine : Planetary-scale geospatial analysis for everyone. Remote Sensing of Environment, 202(2016):18–27. Guizar-Sicairos, M., Thurman, S. T., and Fienup, J. R. (2008). Efficient subpixel image regis- tration algorithms. Optics Letters, 33(2):156. Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning, volume 1. second edition. BIBLIOGRAPHY 79 Huang, W., Luo, J., Zhang, J., Zhao, J., Zhao, C., Wang, J., Yang, G., Huang, M., Huang, L., and Du, S. (2012). Crop Disease and Pest Monitoring by Remote Sensing. Remote Sensing - Applications, pages 31–76. Jime´nez, D., Dorado, H., Cock, J., Prager, S. D., Delerce, S., Grillon, A., Bejarano, M. A., Benavides, H., and Jarvis, A. (2016). From observation to information: Data-driven under- standing of on farm yield variation. PLoS ONE, 11(3):1–20. Joshi, A. W. and Kumar, A. (2008). What can we learn from the electromagnetic spectrum? Resonance, 8(3):8–25. Khorram, S., Koch, F. H., F. van derWiele, C., and Nelson, S. A. (2012). Remote Sensing. Springer. Kobayashi, T., Kanda, E., Kitada, K., Ishiguro, K., and Torigoe, Y. (2001). Detection of Rice Panicle Blast with Multispectral Radiometer and the Potential of Using Airborne Multispec- tral Scanners. Phytopathology, 91(3):316–323. Kuenzer, C. and Knauer, K. (2013). Remote sensing of rice crop areas. International Journal of Remote Sensing, 34(6):2101–2139. Lantz, B. (2013). Machine Learning with R. Lipton, Z. C., Elkan, C., and Naryanaswamy, B. (2014). Optimal thresholding of classifiers to maximize F1 measure. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), volume 8725 LNAI, pages 225–239. Springer Verlag. Liu, H. Q. and Huete, A. (1995). Feedback based modification of the NDVI to minimize canopy background and atmospheric noise. IEEE Transactions on Geoscience and Remote Sensing, 33(2):457–465. Liu, M., Wang, T., Skidmore, A. K., Liu, X., and Li, M. (2019). Identifying rice stress on a regional scale from multi-temporal satellite images using a Bayesian method. Environmental Pollution, 247:488–498. BIBLIOGRAPHY 80 Liu, Z.-y., Huang, J.-f., and Rong-xiang, T. (2008). Characterizing and Estimating Fungal Disease Severity of Rice Brown Spot with Hyperspectral Reflectance Data. Rice Science, 15(3):232–242. Ma, Y., Wu, H., Wang, L., Huang, B., Ranjan, R., Zomaya, A., and Jie, W. (2015). Remote sensing big data computing: Challenges and opportunities. Future Generation Computer Systems, 51:47–60. MADR (2014). Tolima: principales cultivos por a´rea embrada en el an˜o 2014. MADR (2016). Tolima: Principales Cultivos por A´rea Sembrada en 2016. Martinelli, F., Scalenghe, R., Davino, S., Panno, S., Scuderi, G., Ruisi, P., Villa, P., Stroppiana, D., Boschetti, M., Goulart, L. R., Davis, C. E., and Dandekar, A. M. (2015). Advanced meth- ods of plant disease detection. A review. Agronomy for Sustainable Development, 35(1):1–25. Mart´ınez, L. (2017). Relationship between crop nutritional status , spectral measurements and Sentinel 2 images. Agronomia Colombiana, 35(2):205–215. McLean, J., Hardy, B., and Hettel, G. (2013). Rice Almanac, 4th edition. Moldenhauer, K. and Slaton, N. (2001). Rice Growth and Development. In Rice Production Handbook, chapter 1. Mosleh, M. K., Hassan, Q. K., and Chowdhury, E. H. (2015). Application of remote sensors in mapping rice area and forecasting its production: A review. Sensors (Switzerland), 15(1):769– 791. Mountrakis, G., Im, J., and Ogole, C. (2011). Support vector machines in remote sensing: A review. Mulla, D. J. (2013). Twenty five years of remote sensing in precision agriculture: Key advances and remaining knowledge gaps. Biosystems Engineering, 114(4):358–371. Muller, A. C. and Guido, S. (2017). Introduction to Machine Learning with Python: A guide for data scientists. O’Reilly Media. NASA (2018). MODIS Vegetation Index Products (NDVI and EVI). BIBLIOGRAPHY 81 Onojeghuo, A. O., Blackburn, G. A., Wang, Q., Atkinson, P. M., Kindred, D., and Miao, Y. (2018). Mapping paddy rice fields by applying machine learning algorithms to multi-temporal sentinel-1A and landsat data. International Journal of Remote Sensing, 39(4):1042–1067. Ou, S. (1985). Rice Diseases. Commonwealth Mycological Institute, Kew, 2nd edition. Pearson, R. L. and Miller, L. D. (1972). Remote mapping of standing crop biomass for es- timation of the productivity of shortgrass prairie, Pawnee National Grasslands, Colorado. In Proceedings of the 8th International Symposium on Remote Sensing of the Environment, pages 1355–1379, Ann Arbor, MI. Pen˜uelas, J. and Filella, L. (1998). Technical focus: Visible and near-infrared reflectance techniques for diagnosing plant physiological status. Trends in Plant Science, 3(4):151–156. Qin, Z. and Zhang, M. (2005). Detection of rice sheath blight for in-season disease management using multispectral remote sensing. International Journal of Applied Earth Observation and Geoinformation, 7(2):115–128. Qiu, B., Li, W., Tang, Z., Chen, C., and Qi, W. (2015). Mapping paddy rice areas based on vegetation phenology and surface moisture conditions. Ecological Indicators, 56(August 2016):79–86. Ramoelo, A., Skidmore, A. K., Schlerf, M., Heitko¨nig, I. M., Mathieu, R., and Cho, M. A. (2012). Using the red-edge bands on Sentinel-2 for retrieving canopy chlorophyll and nitrogen content. International Journal of Applied Earth Observation and Geoinformation, 23(1):334– 343. Rondeaux, G., Steven, M., and Baret, F. (1996). Optimization of soil-adjusted vegetation indices. Remote Sensing of Environment, 55(2):95–107. Rouse, J. W., Haas, R. H., Schell, J. A., and Deering, D. W. (1973). Monitoring Vegetation Systems in the Great Plains with ERTS. Sakamoto, T., Yokozawa, M., Toritani, H., Shibayama, M., Ishitsuka, N., and Ohno, H. (2005). A crop phenology detection method using time-series MODIS data. Remote Sensing of Environment, 96(3-4):366–374. BIBLIOGRAPHY 82 Sarmiento, N. G. (2018). Identificacio´n de zonas de manejo con base en rendimiento del cultivo de ma´ız. Savitzky, A. and Golay, M. J. (1964). Smoothing and Differentiation of Data by Simplified Least Squares Procedures. Analytical Chemistry, 36(8):1627–1639. Shi, Y., Huang, W., Ye, H., Ruan, C., Xing, N., Geng, Y., Dong, Y., and Peng, D. (2018). Par- tial Least Square Discriminant Analysis Based on Normalized Two-Stage Vegetation Indices for Mapping Damage from Rice Diseases Using PlanetScope Datasets. Sensors (Switzerland), 18(6):1–16. Shihua, L., Jiangtao, X., Ping, N., Jing, Z., and Hongshu, W. (2014). Monitoring paddy rice phenology using time series modis data over Jiangxi province, China. International Journal of Agricultural and Biological Engineering, 7(6):28–36. Singh, R. (2016). Brown spot of rice : an overview. (January 2014). Skakun, S., Roger, J. C., Vermote, E. F., Masek, J. G., and Justice, C. O. (2017). Au- tomatic sub-pixel co-registration of Landsat-8 Operational Land Imager and Sentinel-2A Multi-Spectral Instrument images using phase correlation and machine learning based map- ping. International Journal of Digital Earth, 10(12):1253–1269. Sokolova, M. and Lapalme, G. (2009). A systematic analysis of performance measures for classification tasks. Information Processing and Management, 45(4):427–437. Thenkabail, P. S., Smith, R. B., and De Pauw, E. (2000). Hyperspectral vegetation indices and their relationships with agricultural crop characteristics. Remote Sensing of Environment, 71(2):158–182. Thomas, J. R. and Gausman, H. W. (1977). Leaf Reflectance vs. Leaf Chlorophyll and Carotenoid Concentrations for Eight Crops1. Agronomy Journal, 69(5):799. Tian, H., Wu, M., Wang, L., and Niu, Z. (2018). Mapping early, middle and late rice extent using Sentinel-1A and Landsat-8 data in the poyang lake plain, China. Sensors (Switzerland). Tucker, C. J. (1979). Red and Photographic Infrared linear Combinations for Monitoring Vegetation. Technical report. BIBLIOGRAPHY 83 UNEP (2005). Integrated Assessment of the Impact of Trade Liberalization. USGS (2018a). Landsat 4-7 Surface Reflectance (Ledaps) Product Guide. Technical report. USGS (2018b). LANDSAT 8 Surface reflectance code (LASRC) Product Guide. Technical report. Venkatarao, G. and Muralidharan, K. (1982). Effect of meteorological conditions on the in- cidence and progress of blast disease on rice. Journal of Plant Diseases and Protection, 89(4). Verrelst, J., Mun˜oz, J., Alonso, L., Delegido, J., Rivera, J. P., Camps-Valls, G., and Moreno, J. (2012). Machine learning regression algorithms for biophysical parameter retrieval: Op- portunities for Sentinel-2 and -3. Remote Sensing of Environment, 118:127–139. Wang, J., Huang, J.-f., Wang, X.-z., Jin, M.-t., Zhou, Z., Guo, Q.-y., Zhao, Z.-w., Huang, W.-j., Zhang, Y., and Song, X.-d. (2015). Estimation of rice phenology date using integrated HJ-1 CCD and Landsat-8 OLI vegetation indices time-series images. Journal of Zhejiang University-SCIENCE B, 16(10):832–844. Wang, L., Zhang, F. C., Jing, Y. S., Jiang, X. D., Yang, S. B., and Han, X. M. (2014). Multi- temporal detection of rice phenological stages using canopy stagespectrum. Rice Science, 21(2):108–115. Wu, M., Yang, C., Song, X., Hoffmann, W. C., Huang, W., Niu, Z., Wang, C., Li, W., and Yu, B. (2018). Monitoring cotton root rot by synthetic Sentinel-2 NDVI time series using improved spatial and temporal data fusion. Scientific Reports, 8(1):1–12. Xue, J. and Su, B. (2017). Significant Remote Sensing Vegetation Indices: A Review of Devel- opments and Applications. Journal of Sensors, 2017:1–17. Yang, C. M. (2010). Assessment of the severity of bacterial leaf blight in rice using canopy hyperspectral reflectance. Precision Agriculture, 11(1):61–81. Yoshida, S. (1981). Climatic Environment and its influence. Fundamentals of rice crop science, pages 65–109. BIBLIOGRAPHY 84 Zhang, D., Zhou, X., Zhang, J., Lan, Y., Xu, C., and Liang, D. (2018). Detection of rice sheath blight using an unmanned aerial system with high-resolution color and multispectral imaging. PLoS ONE. Zhao, J., Zhang, D., Luo, J., Dong, Y., Yang, H., and Huang, W. (2012). Characterization of the rice canopy infested with brown spot disease using field hyperspectral data. Wuhan University Journal of Natural Sciences, 17(1):86–92. Zheng, Q., Huang, W., Cui, X., Dong, Y., Shi, Y., Ma, H., and Liu, L. (2019). Identification of wheat yellow rust using optimal three-band spectral indices in different growth stages. Sensors (Switzerland), 19(1). Zheng, Q., Huang, W., Cui, X., Shi, Y., and Liu, L. (2018). New spectral index for detecting wheat yellow rust using sentinel-2 multispectral imagery. Sensors (Switzerland), 18(3):1–19. Zhihao Qin, Minghua Zhang, Christensen, T., Wenjuan Li, and Huajun Tang (2003). Remote sensing analysis of rice disease stresses for farm pest management using wide-band airborne data. In IGARSS 2003. 2003 IEEE International Geoscience and Remote Sensing Sympo- sium. Proceedings (IEEE Cat. No.03CH37477). Zhou, R.-Q., Jin, J.-J., Li, Q.-M., Su, Z.-Z., Yu, X.-J., Tang, Y., Luo, S.-M., He, Y., and Li, X.-L. (2019). Early Detection of Magnaporthe oryzae-Infected Barley Leaves and Lesion Visualization Based on Hyperspectral Imaging. Frontiers in Plant Science, 9.