NASA SERVIR - yield_forecasting

The focus of the Project/Model:

Using the ground data from rice farms in the Katavi region of Tanzania, we build a machine learning model that predicts the end-of-season rice yield from different earth observation features.

Test Data Boundaries

A shapefile was created to represent the geographical boundaries of each surveyed field. The GEOCIF data for January through June comes with multiple features that describe each field's location, area, geometry, and harvest. Given a shapefile with one polygon per field and an attribute for the harvest weight, we calculate a yield value for each field. Geopandas is used to reproject the file to the UTM Zone 36S coordinate reference system, the area of each polygon is then calculated in square meters, and the yield is stored in a new attribute as the harvest weight divided by the area.
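The yield calculation described above can be sketched as follows. This is a minimal illustration, not the project's actual script: the column names (`harvest_kg`, `area_m2`, `yield_kg_m2`), the kilogram unit, and the demo polygon are all assumptions made for the example. EPSG:32736 is the code for WGS 84 / UTM Zone 36S.

```python
import geopandas as gpd
from shapely.geometry import Polygon


def add_yield(fields: gpd.GeoDataFrame, weight_col: str = "harvest_kg") -> gpd.GeoDataFrame:
    """Return a reprojected copy of `fields` with area and yield columns added."""
    # Reproject to UTM Zone 36S (EPSG:32736) so that areas come out in square meters.
    projected = fields.to_crs(epsg=32736)
    projected["area_m2"] = projected.geometry.area
    # Yield = harvest weight divided by field area (kg per square meter here).
    projected["yield_kg_m2"] = projected[weight_col] / projected["area_m2"]
    return projected


# Hypothetical field: a square of 0.001 degrees per side, in lon/lat (EPSG:4326).
demo = gpd.GeoDataFrame(
    {"harvest_kg": [2500.0]},
    geometry=[Polygon([(31.0, -6.5), (31.001, -6.5), (31.001, -6.499), (31.0, -6.499)])],
    crs="EPSG:4326",
)
result = add_yield(demo)
print(result[["harvest_kg", "area_m2", "yield_kg_m2"]])
```

For a real shapefile, `gpd.read_file(path)` would replace the hand-built `demo` frame.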

Crop Mask Generation:

A single seasonal mosaic was created from Level-2A Sentinel-2 imagery in Google Earth Engine (GEE), with cloudy and cloud-shadow pixels excluded using the Sentinel-2 cloud probability masks.

Yield Mapping:

Multiple time-series satellite datasets used for this task were imported from GEE. For this model, the Sentinel-2A image collection was imported. The shapefile for the region of interest (Katavi, administrative level 1) was added as a GEE asset. The time-series image collection was filtered to the region covered by the shapefile and to the period from January to June 2022.

Features:

Normalized Difference Vegetation Index:
“Normalized Difference Vegetation Index (NDVI) quantifies vegetation by measuring the difference between near-infrared (which vegetation strongly reflects) and red light (which vegetation absorbs).” When NDVI is close to zero, however, there are likely no green leaves, and the pixel may even be a built-up or impervious area.

    NDVI = (NIR - Red) / (NIR + Red)

Equation 1: Calculation for NDVI from Sentinel 2 bands 8 and 4

Green Chlorophyll Vegetation Index:
The Green Chlorophyll Vegetation Index (GCVI), also called the Chlorophyll Index - Green (CIg), is a vegetation index used to estimate leaf chlorophyll content in plants from the near-infrared and green bands. In general, chlorophyll content is a direct indicator of vegetation condition.

    GCVI = (NIR / Green) - 1

Equation 2: Calculation for GCVI from Sentinel 2 bands 8 and 3
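As a small worked example of Equations 1 and 2, the sketch below computes both indices with NumPy. The function names and the sample reflectance values are made up for illustration; returning NaN where the denominator is zero is a choice of this sketch, not something the project specifies.

```python
import numpy as np


def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); NaN where NIR + Red == 0."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


def gcvi(nir, green):
    """GCVI = NIR / Green - 1; NaN where Green == 0."""
    nir = np.asarray(nir, dtype=float)
    green = np.asarray(green, dtype=float)
    ratio = np.full(np.broadcast(nir, green).shape, np.nan)
    np.divide(nir, green, out=ratio, where=green != 0)
    return ratio - 1.0


# Hypothetical reflectances for three pixels (Sentinel-2 B8, B4, B3).
b8 = np.array([0.5, 0.4, 0.0])
b4 = np.array([0.1, 0.4, 0.0])
b3 = np.array([0.1, 0.2, 0.0])

ndvi_vals = ndvi(b8, b4)  # dense vegetation, bare/built-up, no data
gcvi_vals = gcvi(b8, b3)
print(ndvi_vals, gcvi_vals)
```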

Sentinel-2 derived NDVI and GCVI are extracted from Google Earth Engine for the area of interest at biweekly intervals from January through June.

Extraction of NDVI time series data:

Using bands B4 (Red) and B8 (NIR) of the Sentinel-2 image collection and the NDVI formula above, we obtain maximum NDVI values for each image. These values are generated every two weeks over the date range, and the resulting NDVI images are exported as TIFF files at 10 m resolution. The files can be read and visualized in Python using Geopandas/Rasterio.

Extraction of GCVI time series data:

Using bands B3 (Green) and B8 (NIR) of the Sentinel-2 image collection and the GCVI formula above, we obtain maximum GCVI values for each image. These values are generated every two weeks over the date range, and the resulting GCVI images are exported as TIFF files at 10 m resolution. The files can be read and visualized in Python using Geopandas/Rasterio.
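The biweekly maximum step can be illustrated offline with NumPy. This is only a sketch of the compositing logic under stated assumptions: the project performs this step in GEE, and the function name `biweekly_max`, the 14-day window length, and the tiny 2x2 demo images are hypothetical.

```python
from datetime import date, timedelta

import numpy as np


def biweekly_max(images, dates, start, end, step_days=14):
    """Per-pixel maximum of an index within consecutive windows of `step_days`.

    images: array of shape (n_images, rows, cols)
    dates:  one acquisition date per image
    Returns (window_starts, composites); windows with no image are all NaN.
    """
    images = np.asarray(images, dtype=float)
    window_starts, composites = [], []
    current = start
    while current < end:
        nxt = current + timedelta(days=step_days)
        idx = [i for i, d in enumerate(dates) if current <= d < nxt]
        if idx:
            composites.append(np.nanmax(images[idx], axis=0))
        else:
            composites.append(np.full(images.shape[1:], np.nan))
        window_starts.append(current)
        current = nxt
    return window_starts, np.stack(composites)


# Three hypothetical 2x2 NDVI images acquired in January 2022.
imgs = np.array([
    [[0.2, 0.5], [0.1, 0.3]],
    [[0.4, 0.1], [0.6, 0.2]],
    [[0.7, 0.7], [0.7, 0.7]],
])
acq = [date(2022, 1, 3), date(2022, 1, 10), date(2022, 1, 20)]

window_starts, composites = biweekly_max(imgs, acq, date(2022, 1, 1), date(2022, 7, 1))
print(len(window_starts), composites.shape)
```

With 14-day steps from 1 January to 1 July 2022, this windowing produces 13 windows.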

Building the Regional Model:

Machine Learning using Pyspatialml and Scikit-learn

With the Python library Pyspatialml, scikit-learn machine learning models can be applied to raster-based datasets. Here the input features were monthly maximum NDVI rasters extracted from the GEE products and stacked as bands with the tsraster library. The target data, field-scale yield values, were in shapefile format and could be overlaid on the rasters to extract the corresponding feature values and create the training data set. Pyspatialml also includes methods to plot the results and modify rasters.
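To make the overlay step concrete, here is a simplified NumPy stand-in for extracting per-field feature values from a raster stack. It does not use the Pyspatialml API. It assumes the field polygons have already been rasterized to an integer ID grid (0 meaning no field) and that each field's feature is the mean of its pixels; both are assumptions of this sketch, as are all names and values.

```python
import numpy as np


def field_features(stack, field_ids):
    """Build a training matrix from a raster stack and a field-ID raster.

    stack:     array of shape (n_bands, rows, cols), one band per time step
    field_ids: integer array of shape (rows, cols), 0 where there is no field
    Returns (ids, X) where X[i, b] is the mean of band b over field ids[i].
    """
    stack = np.asarray(stack, dtype=float)
    field_ids = np.asarray(field_ids)
    ids = np.unique(field_ids[field_ids > 0])
    # stack[:, mask] selects the field's pixels in every band at once.
    X = np.array([np.nanmean(stack[:, field_ids == fid], axis=1) for fid in ids])
    return ids, X


# Two hypothetical bands over a 2x3 grid containing two fields.
band0 = np.array([[0.2, 0.4, 0.9], [0.6, 0.8, 0.9]])
band1 = band0 + 0.1
grid = np.array([[1, 1, 0], [2, 2, 0]])

ids, X = field_features(np.stack([band0, band1]), grid)
print(ids, X)
```

Each row of `X` can then be paired with that field's yield value to form the training set.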

Time-Series Visualization

NDVI time series plot: the NDVI value at each biweekly interval is the average of the values across rice field locations.

GCVI time series plot: the GCVI value at each biweekly interval is the average of the values across rice field locations.

Model Experimentation:

NDVI Data Modeling:

The preprocessed data was run through multiple prediction models to compare their results, performance metrics, and visualizations. The regression models were assessed by plotting model-predicted yield values against the maximum biweekly NDVI values during June.

From the above performance metrics, we note that the Linear Regression algorithm performed best on the maximum NDVI data for the 13 biweekly time periods from January through June, with the lowest Root Mean Squared Error (RMSE) of 0.47 and an R-squared value of 0.015.

GCVI Data Modeling:

The preprocessed data was run through multiple prediction models to compare their results, performance metrics, and visualizations. The regression models were assessed by plotting model-predicted yield values against the maximum biweekly GCVI values during June.

From the above performance metrics, we note that the Linear Regression algorithm performed best on the maximum GCVI data for the 13 biweekly time periods from January through June, with the lowest Root Mean Squared Error (RMSE) of 0.47 and an R-squared value of 0.015.

Other models

We also experimented with the following models to compare the performance metrics and choose the best fit for the Max NDVI data and max GCVI data values individually.

Random Forest
Support Vector Machine - RBF, Linear, Polynomial, Sigmoid kernels
Polynomial Regression
Lasso Regression
Ridge Regression
Decision Tree
XGBoost

Performance Metrics Used - RMSE, R-squared, CV accuracy
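A comparison of this kind can be sketched with scikit-learn as below. The data here is synthetic (random values shaped like 13 biweekly features per field), so the scores say nothing about the project's results. The hyperparameters, the 5-fold split, and the subset of models are assumptions of the sketch; Polynomial Regression and XGBoost are left out to keep it short and free of extra dependencies.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in: 60 fields x 13 biweekly index values, with a noisy yield.
rng = np.random.default_rng(0)
X = rng.uniform(0.1, 0.9, size=(60, 13))
y = 0.5 + X[:, 8:].mean(axis=1) + rng.normal(0.0, 0.2, size=60)

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.01),
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "svr_rbf": SVR(kernel="rbf"),
}

cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = {}
for name, model in models.items():
    # Out-of-fold predictions, so every field is scored by a model that never saw it.
    pred = cross_val_predict(model, X, y, cv=cv)
    scores[name] = {
        "rmse": float(np.sqrt(mean_squared_error(y, pred))),
        "r2": float(r2_score(y, pred)),
    }

best = min(scores, key=lambda n: scores[n]["rmse"])
for name, s in scores.items():
    print(f"{name:14s} RMSE={s['rmse']:.3f} R2={s['r2']:.3f}")
print("lowest RMSE:", best)
```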

Team

Dr. Catherine Nakalembe

Assistant Research Professor, Dept of Geographical Sciences, University of Maryland

Dr. Ritvik Sahajpal

Associate Research Professor, Dept of Geographical Sciences, University of Maryland

Alana Ginsburg

Undergrad Research Assistant, Dept of Geographical Sciences, University of Maryland
