Applying Numerical Weather Prediction model data for machine learning

When forecasting weather-dependent processes, such as wind and solar power generation, numerical weather prediction (NWP) models are the most important time-variant input, because they cover the upcoming days that are of particular interest in many applications. Selecting the right models, parameters and time periods helps to achieve high forecast accuracy when applying machine learning methods to NWP model data.

Choosing the right NWP models

The NWP models available from meteorological services differ greatly in geographic coverage, spatial and temporal resolution, and forecast horizon. What’s more, each NWP model has certain strengths and weaknesses depending on conditions such as region and terrain type. For this reason, the selection of NWP models is an important step in the process of creating a forecast model.

Global weather models form a solid basis: The GFS model from NOAA is available free of charge, while the ECMWF HRES model provides forecasts with high resolution and accuracy. These models, as well as a dozen global NWP models from other meteorological services, typically cover a forecast horizon of one to two weeks. This makes them a good foundation for any forecast.

Regional weather models cover much smaller areas, such as a single continent or a few countries. In many cases, smaller geographical coverage comes with higher spatial resolution, but a shorter forecast horizon. The limited domain and horizon make the models faster to compute, allowing the meteorological services to run them more often and provide updates very quickly.

The North American Mesoscale Forecast System (NAM) by NOAA, EURO4 by the UK Met Office and ICON EU-Nest by the German DWD are three examples of regional NWP models.

Organizations may also run their own local weather models, assimilating their own weather observation data into the model to compute a forecast that is tailored to the specific application.

Depending on the required forecast horizon and the site to be forecast, only certain NWP models may be applicable. However, the highest forecast accuracy can be achieved through a smart combination of multiple NWP models. The combination of models should be specific to the site and reference data, taking into account the strengths and weaknesses of each model. This can be achieved through a machine learning approach.
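
As a rough illustration of a site-specific combination, the following sketch fits one weight per NWP model by ordinary least squares against reference measurements. This is only one simple option and not necessarily the method used in practice; the function names and all numbers are hypothetical and chosen for illustration.

```python
def fit_blend_weights(forecasts, observed):
    """Least-squares weights for blending forecasts from several NWP models.

    forecasts: list of rows, one row per timestamp, one column per NWP model.
    observed:  measured reference values for the same timestamps.
    """
    n_models = len(forecasts[0])
    # Build the normal equations (X^T X) w = X^T y.
    a = [[sum(row[i] * row[j] for row in forecasts) for j in range(n_models)]
         for i in range(n_models)]
    b = [sum(row[i] * y for row, y in zip(forecasts, observed))
         for i in range(n_models)]
    # Forward elimination with partial pivoting.
    for col in range(n_models):
        pivot = max(range(col, n_models), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n_models):
            factor = a[r][col] / a[col][col]
            for c in range(col, n_models):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    weights = [0.0] * n_models
    for i in reversed(range(n_models)):
        tail = sum(a[i][j] * weights[j] for j in range(i + 1, n_models))
        weights[i] = (b[i] - tail) / a[i][i]
    return weights


def blend(row, weights):
    """Combine one timestamp's per-model forecasts into a single value."""
    return sum(w * x for w, x in zip(weights, row))


# Invented example data: two NWP-based power forecasts (MW) and measurements.
history = [[5.0, 7.0], [8.0, 6.0], [3.0, 4.0], [10.0, 12.0]]
measured = [5.4, 7.6, 3.2, 10.5]
weights = fit_blend_weights(history, measured)
print(weights, blend([6.0, 6.5], weights))
```

Because the weights are fitted per site, a model that performs well in a given region or terrain type automatically receives more influence there. A production setup would likely use a richer learner and guard against highly correlated inputs.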

Applying NWP data in machine learning

When designing a machine learning model, it is important to use the same data in training as would be used in operational forecasting. As a first step, the relevant forecast range has to be identified. A typical case is the day-ahead forecast, in which a forecast for the following day is computed once daily. This means that although the NWP model may be updated four times per day and have a forecast horizon of several days, only one of the four daily model runs and only a one-day subset of its forecast data are actually used.
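
The day-ahead selection described above can be sketched as a simple filter. The record layout (run time, valid time, value) and the choice of the 06:00 run are assumptions made for this example; the actual run used depends on delivery times and the application's deadline.

```python
from datetime import datetime, timedelta


def day_ahead_subset(records, run_hour=6):
    """Keep only the records a day-ahead forecast would actually use.

    records:  iterable of (run_time, valid_time, value) tuples, where run_time
              is the initialization time of the NWP model run and valid_time
              is the time the forecast value refers to.
    run_hour: hypothetical hour of the single daily run that is used.
    """
    selected = []
    for run_time, valid_time, value in records:
        if run_time.hour != run_hour:
            continue  # skip the other model runs of the day
        next_day = run_time.date() + timedelta(days=1)
        if valid_time.date() == next_day:
            selected.append((run_time, valid_time, value))
    return selected


# Invented example: four runs on one day, hourly steps, 72-hour horizon.
records = []
for hour in (0, 6, 12, 18):
    run = datetime(2024, 1, 1, hour)
    for lead in range(73):
        records.append((run, run + timedelta(hours=lead), 0.0))

subset = day_ahead_subset(records)
print(len(records), len(subset))
```

Applying the same filter to both training data and operational input keeps the lead times consistent between the two.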

However, for most NWP models the meteorological services do not provide forecasts for past time periods, which makes it impossible to train models on data with the required lead time. For this reason, we maintain our own archive of historical NWP model data in our enercast MeteoStore X repository, holding 1 PB of fast-access weather data. This enables us to train the forecast models, compute backcasts with the exact required forecast horizon, and assess historical forecast performance.
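
Once a backcast is available, its historical performance can be summarized with standard error metrics. The sketch below computes the mean absolute error and the root mean square error; the choice of metrics and the sample values are assumptions for illustration, not a description of a specific evaluation procedure.

```python
import math


def mae(forecast, observed):
    """Mean absolute error between backcast and measured values."""
    return sum(abs(f - o) for f, o in zip(forecast, observed)) / len(observed)


def rmse(forecast, observed):
    """Root mean square error; penalizes large deviations more strongly."""
    squared = sum((f - o) ** 2 for f, o in zip(forecast, observed))
    return math.sqrt(squared / len(observed))


# Invented backcast and measurement values (MW).
backcast = [12.0, 15.5, 9.0, 0.0]
measured = [11.0, 17.0, 9.5, 0.5]
print(mae(backcast, measured), rmse(backcast, measured))
```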
