Crop Predictor

The Predictive product offers Crop Productivity and Quality Estimates.

This information is available in different formats depending on the needs of each entity.

Currently, HEMAV has an AI infrastructure that works for SUGAR CANE, CORN, SOYBEAN, COTTON, BEET, VINEYARD, and PALM crops.

The steps to access this service are as follows.

Input data to the model

The models must have as much input data as possible. These are divided into different blocks. With a total of 140 possible variables, associating the most important ones to each model according to the input data.

Customer data (real data)

This is the most important block. Models reflect the quality of the data. If there isn’t a sufficient amount of data, or if it lacks certain quality standards, when training the models they may not achieve the expected results.

In this section we refer to the data that we want the model to learn (production, quality…) and it is the starting point for data management (campaign start date, planting and harvest time).

This section has 40 possible data points to fill in, with the most important being:

external_reference_id: Unique identifier per plot.
season_label: Management grouping.
latitude: Georeferencing of the plot.
longitude: Georeferencing of the plot.
type_id: Crop
sub_type_id: Variety.
init_date: Campaign start date (start_date or plantation_date depending on the crop).
harvest_or_estimation_date: Date.
cut_number: Cut number for crops such as sugarcane or palm.
production_per_hectare_real: Actual production obtained. Key variable for estimating the PRODUCTION model.
prodph_lastseason: Variable calculated by us, evaluating for that plot the production obtained in the previous season.
atr_real: Real ATR obtained. Key variable for ATR model estimation.
atr_lastseason: Variable calculated by us, always evaluating the production obtained in the previous season for that plot.
pol_real: Actual polarization obtained. Key variable for POL model estimation
sac_real: Real sucrose obtained. Key variable for SAC model estimation.
sac_lastseason: Variable calculated by us, always evaluating for that plot the production obtained in the previous season.
n_bunches_real: No. of bunches. Key variable for the N_BUNCHES model estimation.
plants_per_hectare: Number of plants per hectare. Important variable.
irrigation_type: Irrigation system used in the plot.
days: Growing days.
week: Growing week.

All these variables will define the quality of the requested model. This is why a series of reviews are carried out, which will be seen in the data review section.

Spectral data + radar

To the model’s dataset, the average value of the plot for the following spectral and radar values is incorporated for each date at a weekly level:

cloudcoverage: % of plot cloud cover on the day of visit. Only applicable for spectral parameters.
sigma0: Variable radar.
sigma0_std: Standard deviation of the radar variable. Shows the uniformity of the plot.
ndre: Nitrogen - chlorophyll index.
ndvi: Vegetation index.
ndvi_std: Standard deviation of the vegetation index.
b1: Sentinel 2 spectral bands.
b2: Sentinel 2 spectral bands.
b3: Sentinel 2 spectral bands.
b4: Sentinel 2 spectral bands.
b5: Sentinel 2 spectral bands.
b6: Sentinel 2 spectral bands.
b7: Sentinel 2 spectral bands.
b8: Sentinel 2 spectral bands.
b8a: Sentinel 2 spectral bands.
b9: Sentinel 2 spectral bands.
b11: Sentinel 2 spectral bands.
b12: Sentinel 2 spectral bands.
tcari_osavi: Soil-adjusted vegetation index that removes soil influence.
gndvi: Green-normalized difference vegetation index.
ccci: Chlorophyll index.
ndwi: Water status index (NDMI).
tcari: Index related to chlorophyll absorption.
OSAVI: The OSAVI vegetation index is a modified SAVI that also uses near-infrared and red spectral reflectance.

Spectral + radar data (temporal)

These are combinations of the previous variables processed to avoid cloud influence, indicating temporality or sudden data changes, since we work with accumulated data from the start of the campaign/planting, helping us to extract important indicators.

ndvi_smoothed: Ndvi working with a smoothing function removing the influence of cloud effects.
ndwi_smoothed: Ndwi (NDMI) working with a smoothing function removing the influence due to cloud effects.
ndvi_smoothed_temporal_max_diff: Maximum difference between weeks for the NDVI index.
ndwi_smoothed_temporal_max_diff: Maximum difference between weeks for the NDWI (NDMI) index.
ndvi_smoothed_max: Maximum NDVI value reached during the campaign.
ndwi_smoothed_max: Maximum NDWI value reached during the campaign.
ndvi_smoothed_temporal_mean_diff: Average difference between weeks for the NDVI index.
ndwi_smoothed_temporal_mean_diff: Mean difference between weeks for the NDWI (NDMI) index.
ndvi_std_temporal_max_diff: Maximum variability difference within the season.
sigma0_temporal_max_diff: Maximum difference of radar value within the campaign.
sigma0_max: Maximum radar value reached in the campaign.
sigma0_min: Minimum radar value reached in the campaign.
sigma0_temporal_mean_diff: Mean radar difference during the campaign.
sigma0_std_temporal_max_diff: Maximum radar difference during the campaign.
ndvi_growth_first_month: Maximum NDVI reached in the first month of the campaign.
ndwi_growth_first_month: Maximum NDWI (NDMI) reached in the first month of the campaign.

Climatological data

Climatological data is very important in the model as it indicates what the crop has been exposed to during the campaign. The variables we measure are the following:

pres: Mean pressure (mb).
slp: Mean sea level pressure (mb).
wind_spd: Average wind speed (Default m/s).
wind_gust_spd: Wind gust speed (m/s).
max_wind_spd: 2-minute maximum wind speed (m/s).
wind_dir: Average wind direction (degrees).
max_wind_dir: Direction of maximum 2-minute wind gust (degrees).
max_wind_ts: Time of maximum wind gust UTC (Unix Timestamp).
temp: Average temperature (Celsius by default).
max_temp: Maximum temperature (Celsius by default).
min_temp: Minimum temperature (Celsius by default).
max_temp_ts: Daily maximum temperature time UTC (Unix Timestamp).
min_temp_ts: Daily minimum temperature time UTC (Unix Timestamp).
rh: Mean relative humidity (%).
dewpt: Average dew point (Celsius by default).
clouds: Average cloud cover [satellite-based] (%).
precip: Accumulated precipitation (default mm).
precip_gpm: Accumulated precipitation [estimated by satellite/radar] (default in mm).
solar_rad: Average solar radiation (W/M^2)
t_solar_rad: Total solar radiation (W/M^2)
ghi: Average global horizontal solar irradiance (W/m^2).
t_ghi: Total daily global horizontal solar irradiance (W/m^2) [Clear sky]
max_ghi: Maximum value of global horizontal solar irradiance during the day (W/m^2) [Clear sky]
dni: Average direct normal solar irradiance (W/m^2) [Clear sky]
t_dni: Total direct normal solar irradiance for the day (W/m^2) [Clear sky]
max_dni: Maximum direct normal solar radiation value for the day (W/m^2) [Clear sky]
dhi: Average diffuse horizontal solar irradiance (W/m^2) [Clear sky]
t_dhi: Total daily diffuse horizontal solar irradiance (W/m^2) [Clear sky]
max_dhi: Maximum diffuse horizontal solar irradiance value during the day (W/m^2) [Clear sky]
max_uv: Maximum UV index (0-11+)

Agro-climatic

We incorporate agro-climatic data due to their importance in the field of agriculture.

bulk_soil_density: Bulk soil density (kg/m^3).
skin_temp_max: Maximum skin temperature (C).
skin_temp_avg: Average skin temperature (C).
skin_temp_min: Minimum skin temperature (C).
temp_2m_avg: Average temperature at 2 meters (C).
precip: Accumulated precipitation (mm).
specific_humidity: Mean specific humidity (kg/kg).
evapotranspiration: Reference evapotranspiration - ET0 (mm).
pres_avg: Average surface pressure (mb).
wind_10m_spd_avg: Average wind speed at 10 meters (m/s).
dlwrf_avg: Average hourly downward longwave solar radiation (W/m^2 · H).
dlwrf_max: Maximum hourly downward long-wave solar radiation (W/m^2 · H).
dswrf_avg: Average hourly downward shortwave solar radiation (W/m^2 · H).
dswrf_max: Maximum hourly downward shortwave solar radiation (W/m^2 · H).
dlwrf_net: Net longwave solar radiation (W/m^2 · D)
dswrf_net: Net shortwave solar radiation (W/m^2 · D).
soilm_0_10cm: Average Soil moisture content 0 to 10 cm depth (mm).
soilm_10_40cm: Average Soil moisture content 10 to 40 cm depth (mm).
soilm_40_100cm: Average Soil moisture content 40 to 100 cm depth (mm).
soilm_100_200cm: Average Soil moisture content 100 to 200 cm depth (mm).
v_soilm_0_10cm: Average volumetric soil moisture content from 0 to 10 cm depth (fraction).
v_soilm_10_40cm: Average volumetric soil moisture content from 10 to 40 cm depth (fraction).
v_soilm_40_100cm: Average volumetric soil moisture content from 40 to 100 cm depth (fraction).
v_soilm_100_200cm: Average volumetric soil moisture content from 100 to 200 cm depth (fraction)
soilt_0_10cm: Average soil temperature at 0 to 10 cm depth (C).
soilt_10_40cm: Average soil temperature at 10 to 40 cm depth (C).
soilt_40_100cm: Average soil temperature at 40 to 100 cm depth (C).
soilt_100_200cm: Average soil temperature at 100 to 200 cm depth (C).

Processed climatological and agro-climatic data

gdd: Growing degree days accumulated with base temperature according to crop.
precip_temporal_max_diff: Maximum difference between weeks in precipitation during the campaign.
precip_max: Maximum precipitation value in a week.
evapotranspiration_max_diff: Maximum evapotranspiration difference between weeks during the campaign.
rh_max_diff: Maximum humidity difference between weeks during the campaign.
skin_temporal_max_min_diff: Maximum difference in maximum soil temperature between weeks during the campaign.
skin_temporal_min_min_diff: Maximum temperature difference in minimum soil temperature between weeks during the campaign.
gdd_min_diff: GDD difference between weeks during the campaign.
precip_first_month: Maximum precipitation reached during the first month of the campaign.
rh_first_month: Maximum humidity reached during the first month of the campaign.
skin_temp_max_first_month: Maximum soil temperature reached during the first month of the season.
solar_rad_first_month: Maximum solar radiation reached during the first month of campaign.

Data Review

The first step is calculating statistics for all the variables explained in the previous section. Once calculated, an automatic review of them is performed.

Currently, a summary report of this data review is generated, which can detect:

Temporal outliers: Such as problems with planting/harvest dates. Problems with predictor variables (anomalous estimations for a crop)
Global outliers: Detects any calculation issues across all explained variables.

All of this is represented in a summary report like the following:

REPORT

Enter the name of the report, it indicates that it is a preprocessing of the PROD variable.
Gives us information about the USER ID (be careful not to confuse user_id with customer_id or agrouser_id).
Shows the size of the dataframe.
Shows number of fields and seasons available for that client.
Number of actual data points for training (very important).
Number of data points that could be considered outliers.
Number of seasons with incorrect planting dates. Currently this is recorded if the NDVI in the first 30 days is greater than 0.4 (for “sugarcane”, “beetroot”, “soybean”, “corn”, “cotton”). (File attached in S3).
Number of records where the harvest date is considered incorrect (File attached in S3) taking into account the planting date (e.g., seasons that are too long or too short).
Number of real production data points that are considered outlier candidates, using two standard deviations above the moving average of the data on the days axis. (File attached in S3).
Graph showing the number of columns with missing data and % of missing data per variable.
NDVI evolution graph by cultivation days for season_ids with errors.
Evolution of production over time series. The upper graph shows the amount of data with actual production (green) compared to data without reported actual production (red). The lower graph shows the evolution of reported actual production.
Yield by cultivation days with limit axes to detect outliers based on time series data using population mean and two-thirds (2/3) of standard deviation over the moving average.
NDVI evolution by cultivation days smoothed by plant day-cycle.
Isolation forest: Shows the distribution of scores in a histogram given by the algorithm, and the number of observations considered outliers. (https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html)

All outlier candidates warn us about possible values that should not be introduced into the model, but this step is performed in the next section.

Model generation

Model generation is also performed automatically.

Currently, we have the automatic generation of 3 simultaneous models performing hyperparametrization techniques with the aim of obtaining the best model for the provided data.

After automatic training, a summary report of results evaluation is also generated:

MODEL REPORT

Report title.
Indicate the user of the model.
Data that finally enters the training.
Outliers detected with isolation forest
Number of outliers in preprocessing, as in “faulty_seasons_dict”
Number of variables that entered the model.
Training date.
The outlier_threshold that was set in the config.
Model user.
Model type.
Model training.
Model type with the best accuracy (RFR -> Random Forest Regressor (helps prevent overfitting); GBR -> Gradient Boosting Regressor (intermediate, but overfits more); XGBR -> eXtreme Gradient Boosting Regressor (overfits the most, but usually gives the best results)).
Summary table of accuracies.
Graph to check actual vs predicted data distinguishing between train and test.
Evolution of actual production over time.
Most important variables of the trained model.
Distribution of errors in train and test. If train and test are very different, then you should review what happened, and why the test data does not represent the training data.
Error distribution over time.

In addition to this summary report, you can also consult with your KAM/CPM interesting graphs for more in-depth data analysis within our production model manager (MLFLOW)

Prediction Generation

Once the data has been reviewed and trained, we use the generated model to make predictions in the campaign.

Predictions are updated on a weekly basis. To internally validate the results, we also generate reports to ensure everything is running correctly and we don’t detect strong anomalies.

FORECASTECH REPORT

Report title.
Indicates the user that the prediction has been made.
Model type.
If any season_label filter has been used.
If it has been applied to update only future or all.
Prediction date.
Number of predictions generated (for each season_id there are multiple dates, hence the large number).
Plots that should have their planting date reviewed.
Summary table. Shows the distributions of the data used for training and the predicted data. It is normal that there will always be some deviation between the training mean and the forecast, since the training only takes the harvest date and the forecast takes everything.
Graph rescued from preprocessing to detect planting problems.
Distribution graph of the most important variables in the model, to observe the distribution of the model and the data to be predicted
Data frequency according to the value to predict, segmented by trained and predicted
Evolution of estimates over time, along with the data used to train the model (if there is data from the last year used to train it).
Variables with greater importance, showing how their values affect the prediction. The vertical line indicates the expected value (E(X)). Each observation has a value for each variable. The graph above shows how high values of that variable (in red) modify the observation with respect to the expected value. For example, the most important variable shows that red values increase the prediction value relative to the expected value or mean. Conversely, blue colors represent low values of that variable, and in the most important variable, you can see how low values (blue) make the prediction lower.

API de Crop Predictor

Crop Predictor dispone de una API REST que permite consultar las predicciones generadas por los modelos de forma programática. La documentación interactiva (Swagger UI) está disponible en:

Producción: https://agropred.layers.hemav.com/docs

Autenticación

Todos los endpoints de la API requieren una API key que se pasa como parámetro de consulta (query parameter) en la URL:

?api_key=TU_API_KEY

Esta API key es la misma que se utiliza para cualquier interacción con las APIs de la plataforma Layers. Para obtenerla, contacta con tu KAM/CPM o utiliza el endpoint /publicapi/getApiKey de la API principal de Layers.

Si la API key no se proporciona, la API devolverá un error 401 Not authenticated. Si la API key no es válida o no se encuentra en el sistema, devolverá un error 403 API key not found.

Roles y permisos

El acceso a los datos está controlado por el rol asociado a la API key:

Rol	Acceso
Admin	Acceso a todos los campos y usuarios
Agrouser	Campos de los clientes y cooperativas asignados
Cooperativa	Campos de los clientes miembros
Cliente (Farmer)	Solo sus propios campos

Si se solicitan campos a los que la API key no tiene acceso, la API devolverá un error 403.

Endpoints

Health Check

Método	Endpoint	Descripción
GET / HEAD	`/`	Comprobación de disponibilidad del servicio

POST `/forecast`

Endpoint principal para obtener las predicciones de productividad y calidad generadas por los modelos.

Parámetros de consulta (query):

Parámetro	Tipo	Requerido	Descripción
`api_key`	string	Sí	API key de autenticación

Cuerpo de la petición (JSON):

Campo	Tipo	Requerido	Descripción
`field_reference`	lista de strings	Sí	Lista de identificadores propios del cliente para sus campos (external reference). Cada cliente define sus propias referencias para identificar sus parcelas
`from_date`	string (YYYY-MM-DD)	Sí	Fecha de inicio del rango de consulta
`to_date`	string (YYYY-MM-DD)	Sí	Fecha de fin del rango de consulta

Ejemplo de petición:

curl -X 'POST' \
  'https://agropred.layers.hemav.com/forecast?api_key=TU_API_KEY' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "field_reference": [
    "10001_500123"
  ],
  "from_date": "2025-01-01",
  "to_date": "2025-12-31"
}'

Ejemplo de respuesta exitosa (200):

La respuesta es un objeto JSON donde cada clave es una field_reference solicitada, y el valor es una lista de registros de predicción semanales.

{
  "10001_500123": [
    {
      "date": "2025-01-05",
      "season_id": "3012345",
      "season_label": "SAFRA 2024-25",
      "user_id": "10001",
      "crop_type": "sugarcane",
      "field_id": 500123,
      "forecasts_production_per_hectare_real": 78.54,
      "forecasts_atr_real": 132.17,
      "update_date_production_per_hectare_real": "2025-04-10",
      "update_date_atr_real": "2025-04-10",
      "model.metadata.run_id_production_per_hectare_real": "abc123def456",
      "model.metadata.run_id_atr_real": "789ghi012jkl"
    },
    {
      "date": "2025-01-12",
      "season_id": "3012345",
      "season_label": "SAFRA 2024-25",
      "user_id": "10001",
      "crop_type": "sugarcane",
      "field_id": 500123,
      "forecasts_production_per_hectare_real": 79.12,
      "forecasts_atr_real": 131.85,
      "update_date_production_per_hectare_real": "2025-04-10",
      "update_date_atr_real": "2025-04-10",
      "model.metadata.run_id_production_per_hectare_real": "abc123def456",
      "model.metadata.run_id_atr_real": "789ghi012jkl"
    }
  ]
}

Descripción de los campos de respuesta:

Campo	Tipo	Descripción
`date`	string	Fecha de la predicción (resolución semanal)
`season_id`	string	Identificador de la campaña/zafra
`season_label`	string	Etiqueta legible de la campaña (ej: “SAFRA 2024-25”)
`user_id`	string	Identificador del usuario propietario
`crop_type`	string	Tipo de cultivo (ej: “sugarcane”, “corn”, “soybean”)
`field_id`	integer	Identificador numérico de la parcela
`forecasts_production_per_hectare_real`	float	Predicción de producción por hectárea (t/ha)
`forecasts_atr_real`	float	Predicción de ATR (Azúcares Totales Recuperables, kg/t)
`update_date_production_per_hectare_real`	string	Fecha de la última actualización del modelo de producción
`update_date_atr_real`	string	Fecha de la última actualización del modelo de ATR
`model.metadata.run_id_production_per_hectare_real`	string	Identificador de la ejecución del modelo de producción (MLflow)
`model.metadata.run_id_atr_real`	string	Identificador de la ejecución del modelo de ATR (MLflow)

Note

Los campos de predicción disponibles dependen del cultivo y de los modelos entrenados para cada cliente. Los campos más comunes son forecasts_production_per_hectare_real (producción) y forecasts_atr_real (ATR), pero pueden incluirse otros como forecasts_pol_real, forecasts_sac_real o forecasts_n_bunches_real según la configuración del modelo.

Note

Los valores de update_date_* y model.metadata.run_id_* pueden estar vacíos ("") si la predicción aún no ha sido generada para ese modelo o fecha concreta.

Códigos de error:

Código	Descripción
`401`	API key no proporcionada
`403`	API key no válida o sin permisos para los campos solicitados

POST `/s3/file_paths`

Endpoint para obtener las rutas de los archivos generados por el sistema (reports de preprocessing, entrenamiento y predicción) almacenados en S3.

Parámetros de consulta (query):

Parámetro	Tipo	Requerido	Descripción
`api_key`	string	Sí	API key de autenticación

Cuerpo de la petición (JSON):

Campo	Tipo	Requerido	Descripción
`user_id`	integer	Sí	Identificador del usuario cuyos archivos se desean consultar

Ejemplo de petición:

curl -X 'POST' \
  'https://agropred.layers.hemav.com/s3/file_paths?api_key=TU_API_KEY' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "user_id": 10001
}'

Ejemplo de respuesta exitosa (200):

[
  "preprocessing/10001/report_prod_2025-03.pdf",
  "training/10001/model_report_prod_2025-03.pdf",
  "forecast/10001/forecast_report_prod_2025-04.pdf"
]

Códigos de error:

Código	Descripción
`401`	API key no proporcionada o sin permisos para el usuario solicitado
`403`	API key no válida
`404`	No hay archivos disponibles para el usuario

Ejemplo de uso completo

A continuación se muestra un ejemplo completo de integración con la API utilizando Python:

import requests

API_URL = "https://agropred.layers.hemav.com"
API_KEY = "TU_API_KEY"

# Obtener predicciones para una parcela
response = requests.post(
    f"{API_URL}/forecast",
    params={"api_key": API_KEY},
    json={
        "field_reference": ["10001_500123"],
        "from_date": "2025-01-01",
        "to_date": "2025-12-31"
    }
)

if response.status_code == 200:
    data = response.json()
    for field_ref, forecasts in data.items():
        print(f"Parcela: {field_ref}")
        for forecast in forecasts:
            print(f"  Fecha: {forecast['date']}")
            print(f"  Producción: {forecast.get('forecasts_production_per_hectare_real', 'N/A')} t/ha")
            print(f"  ATR: {forecast.get('forecasts_atr_real', 'N/A')} kg/t")
else:
    print(f"Error {response.status_code}: {response.text}")

Tip

Puedes consultar múltiples parcelas en una sola petición pasando varias referencias en el array field_reference. La API procesará las consultas en paralelo para optimizar el tiempo de respuesta.

Crop Predictor

Input data to the model

Customer data (real data)

Spectral data + radar

Spectral + radar data (temporal)

Climatological data

Agro-climatic

Processed climatological and agro-climatic data

Data Review

Model generation

Prediction Generation

API de Crop Predictor

Autenticación

Roles y permisos

Endpoints

Health Check

POST /forecast

POST /s3/file_paths

Ejemplo de uso completo

POST `/forecast`

POST `/s3/file_paths`