SenCYF was a 12-month project supported by ESA from July 2020 that aimed to estimate and forecast the yield of wheat at the parcel level. SenCYF proposed an innovative crop yield forecasting model based on Sentinel-2 data, validated with a France-wide in situ yield data set.
Crop yield forecasts are crucial information that structures the global commodities trade market, affecting both the public and private sectors. Major agricultural agencies have developed Crop Yield Forecasting Systems (CYFS) to monitor crop production and thus to cope with the consequences of production variability in terms of food security, market development and trade. These CYFS generally rely on expert knowledge and, in some developed countries, on nation-wide surveys. The exploitation of Earth Observation (EO) data is rare and often limited to coarse spatial resolution sensors, which are unable to provide crop-specific observations. The Sentinel-2 constellation has started changing this perspective, as it provides observations at high spatial and temporal resolution that may revolutionize the use of EO data for crop monitoring.
Crop production of a given region is calculated as the product of the cultivated area and the average yield of the crop of interest over the same area. Forecasting crop production requires deriving both components separately, and a within-season crop-specific map is required to derive the yield from EO. In the past years, ESA has supported activities to develop operational tools to better identify crop types and estimate crop areas (e.g. ESA Sen2Agri, ESA Sen4CAP, ESA Sen4STAT). During the outbreak of COVID-19 and the first phase of the health crisis, the wheat harvest was monitored in real time during the 2020 growing season and reported on the RACE dashboard. Despite great expectations clearly expressed in the context of climate change and the resulting production uncertainty, as of early 2019 there was no operational method to estimate crop yield from Sentinel-2 time series. Building a crop yield estimation and forecasting model at field and administrative levels is of major interest for most stakeholders involved in the agriculture sector.
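As a minimal sketch, the production relation described above (cultivated area multiplied by average yield) can be written in a few lines of Python. The function name, the units (hectares and tonnes per hectare) and the figures are assumptions chosen for illustration only; they do not come from the SenCYF project.

```python
def regional_production(area_ha, mean_yield_t_per_ha):
    """Return production in tonnes as cultivated area (ha) times average yield (t/ha)."""
    return area_ha * mean_yield_t_per_ha


# Hypothetical region: 1,000 ha of wheat with an average yield of 7.5 t/ha.
production_t = regional_production(1000.0, 7.5)
print(production_t)
```

The sketch makes the dependency explicit: an error in either the crop area map or the yield estimate propagates directly into the production figure, which is why both components have to be derived carefully.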
The UCLouvain-Geomatics research group developed the necessary framework to calibrate its crop yield forecasting model from Sentinel-2 time series in order to estimate and forecast winter wheat yield in developed countries (Europe, USA). However, the validation of this calibrated yield model had been performed only at sub-national level (e.g., county level for the USA, NUTS-3 for France). These preliminary results were already very promising, with yield estimation errors as low as 11% at this aggregation level.
While validation is limited to the sub-national scale due to the shortage of accessible yield observations, the elementary unit of the simulation is the agricultural field, so there is potential to monitor yield at that scale. In 2018, UCLouvain-Geomatics gained access to a data set composed of around 11,500 winter wheat yields declared by farmers each year. This offers the opportunity to confront high-resolution satellite data such as Sentinel-2 with a very large yield data set.
The proposed innovative approach aimed to develop and validate a Sentinel-2-based crop yield forecasting model based on this very large in situ yield data set. It was a great opportunity to contribute to filling the gap for a complete EO-based crop monitoring system (areas and yield), and to demonstrate the performance of the Sentinel-2 time series for crop yield estimates. As the Sentinel era has just begun, the interest in S2-based CYFS is expected to increase rapidly worldwide. The proposed pioneering method will be replicable for other crops and will affect a wide variety of actors, from national and international agricultural agencies to the private sector involved in the commodities trade market.
The overall objective of the project was to develop a regression model using a set of explanatory variables (mostly developed from Earth Observation and meteorological time series) to estimate the yield. The project was composed of two main parts. The first one was the selection and the extraction of those explanatory variables (yield features or YFs). The second one was the modeling part aiming to train a model on those variables to estimate the yield value at field level.
A benchmarking step was needed to identify the model design providing the best performance. Multiple tests were completed during the benchmarking phase, using different regression algorithms and different combinations of calibration steps on the dataset (with or without stratification, variable selection methods, etc.). Testing these choices at each level allowed the selection of the best model for the final yield estimation and yield forecast.
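The benchmarking idea can be sketched as follows: fit several candidate models on a calibration set, score each on held-out fields, and keep the one with the lowest error. This is only an illustration under stated assumptions. The two candidates (a constant mean model and a one-variable linear regression), the RMSE criterion, the single yield feature and all numbers are hypothetical and are not the algorithms or data actually benchmarked in SenCYF.

```python
from math import sqrt


def fit_mean(xs, ys):
    """Baseline candidate: always predict the mean calibration yield."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_linear(xs, ys):
    """Candidate: ordinary least squares with one explanatory variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def rmse(model, xs, ys):
    """Root mean square error of a fitted model on (xs, ys)."""
    return sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def benchmark(candidates, calibration, validation):
    """Fit every candidate on the calibration set and score it on the validation set."""
    scores = {name: rmse(fit(*calibration), *validation)
              for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical yield feature values and yields (t/ha), for illustration only.
calibration = ([0.5, 0.6, 0.7, 0.8], [5.0, 6.0, 7.0, 8.0])
validation = ([0.55, 0.75], [5.5, 7.5])

best, scores = benchmark({"mean": fit_mean, "linear": fit_linear},
                         calibration, validation)
print(best, scores)
```

Adding a further design choice (another algorithm, a stratified calibration, a variable selection step) only requires registering another entry in the `candidates` dictionary, which keeps the comparison across choices consistent.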
The best SenCYF model was designed to estimate field yields at three periods in the season. The estimation model (calibrated with the reference yield dataset of the current year) provides an estimation once the yield reference surveys are available. The yield estimation computed right after the harvest is based on the pre-existing model (PEM), which is calibrated with the reference yield dataset of the previous year(s). The forecast model provides yields as early as mid-June, about one month before the harvest.
The estimation aggregated from field to sub-national level (NUTS-3) shows high performance for each method. The large gap between the performance of the estimation and that of the PEM estimation highlights how difficult it is to apply a model to other places and years. The inter-annual yield variability is not well captured. Looking at specific years, we observe a bias for the PEM estimation and the forecast in 2018 and 2019: the 2018 results (orange) are systematically overestimated, whereas the 2019 results (blue) are underestimated. More results will be displayed after the publication of the scientific manuscript relying on the SenCYF project.
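The aggregation and bias check mentioned above can be sketched in Python. The sketch assumes an area-weighted mean to move from field to regional yield and a mean signed error as the bias measure; the source does not state which aggregation rule or metric SenCYF used, and the region labels and values below are hypothetical.

```python
from collections import defaultdict


def aggregate_yield(fields):
    """Area-weighted mean yield per region from (region, area_ha, yield_t_ha) records."""
    area = defaultdict(float)
    production = defaultdict(float)
    for region, area_ha, yield_t_ha in fields:
        area[region] += area_ha
        production[region] += area_ha * yield_t_ha
    return {region: production[region] / area[region] for region in area}


def mean_bias(estimated, observed):
    """Mean signed error over shared regions; a positive value means overestimation."""
    regions = estimated.keys() & observed.keys()
    return sum(estimated[r] - observed[r] for r in regions) / len(regions)


# Hypothetical field-level estimates: (region, area in ha, yield in t/ha).
fields = [
    ("region_A", 10.0, 8.0),
    ("region_A", 30.0, 6.0),
    ("region_B", 20.0, 7.0),
]
estimated = aggregate_yield(fields)

# Hypothetical regional reference yields (t/ha).
observed = {"region_A": 6.0, "region_B": 6.5}
bias = mean_bias(estimated, observed)
print(estimated, bias)
```

Computing the bias separately for each year would expose the kind of systematic over- or underestimation described for 2018 and 2019.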