BrightHub: Handling difficult wind resource data
Welcome to part 2 of our 5-part series on BrightHub, BrightWind’s wind resource data platform. In the previous article, we explained why BrightWind needed a centralised data platform like BrightHub. If you haven’t read it yet, check out BrightHub: BrightWind’s wind resource data hub. This article will delve into the first major problem that BrightHub needed to solve: handling the difficult metadata associated with each met mast sensor.
Met mast data vs LiDAR/SoDAR data
When talking about “difficult wind resource data”, we will mostly be talking about met mast data as opposed to LiDAR/SoDAR data. While LiDAR and SoDAR units are not without their difficulties (for example, knowing how the unit has been set up), their metadata is relatively straightforward to handle. Almost all of the metadata we require, such as the height of the measurements, is contained in the daily files themselves, making it easy for BrightHub’s file ingestion system to extract.
Why is met mast data difficult?
When processing daily files from a met mast, however, things become a lot more complicated.
Firstly, from the files alone, we can’t always determine the position of a sensor on the mast. The height of a sensor is an important thing to know, but most logger manufacturers don’t include it in their daily files. Even if the height were included in the file’s metadata, it may not be up to date. For example, a technician may have programmed the correct height into the logger when the mast was being set up, but six months later they may have moved the sensor to a different position on the mast without updating the logger configuration. The height recorded in the daily file’s metadata would then be incorrect.
Secondly, we can’t always trust the measurement data in the daily file, even when a sensor is functioning correctly. This surprising fact stems from the flexibility loggers offer in storing and aggregating data from a variety of sensors.
Let’s imagine we have a met mast with an anemometer (which measures wind speed) at 80 meters above ground level. This anemometer is connected to the logger at the base of the mast. An anemometer has three cups that rotate when the wind blows, turning a central shaft. The logger records a frequency related to the rotations of this shaft. This frequency data isn’t very useful on its own, so the logger applies a slope and offset to it, converting it into a wind speed in meters per second. When the logger produces the daily data file, it contains the calculated wind speed values, not the raw frequency values.
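To make this concrete, here is a minimal sketch of that conversion, using made-up slope and offset values (a real logger takes these from whatever was programmed into it):

```sql
-- Hypothetical example: a logger programmed with slope 0.046 and offset 0.26
-- turns a raw anemometer frequency of 100 Hz into a wind speed in m/s.
SELECT 100.0 * 0.046 + 0.26 AS wind_speed_mps;  -- 4.86 m/s
```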
A crucial thing to note here is that the slope and offset values used by the logger to calculate the wind speed are manually programmed into the logger when it is being set up. This leaves quite a lot of room for human error. Anemometers are usually calibrated in a wind tunnel before being mounted on a met mast, so the correct slope and offset values for each anemometer are known. However, if the slope and offset are accidentally programmed into the logger incorrectly, then the wind speed measurements we receive in the daily file will be wrong, even if the anemometer is in perfect working order.
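Continuing the invented numbers from the sketch above, suppose the calibration certificate specifies a slope of 0.046 and an offset of 0.26, but slightly different values were keyed into the logger:

```sql
-- Hypothetical mis-programming: the logger was set up with slope 0.05 and
-- offset 0.20 instead of the calibrated slope 0.046 and offset 0.26.
-- At a raw frequency of 100 Hz the daily file reports 5.20 m/s,
-- while the true wind speed is 4.86 m/s, even though the sensor is fine.
SELECT 100.0 * 0.05  + 0.20 AS logged_wind_speed_mps,   -- 5.20
       100.0 * 0.046 + 0.26 AS correct_wind_speed_mps;  -- 4.86
```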
How BrightHub handles metadata
When we were designing the database for BrightHub, it became clear that we needed a schema flexible enough to capture the complexities mentioned above. The data model also needed to be intuitive enough for analysts to easily interpret and update, seeing as a large portion of the metadata would have to be added manually by the analyst.
We decided to split our metadata into three major sections and handle them separately. The sections were:
- Metadata related to the physical position of the sensor on the mast, e.g. its height.
- Metadata related to how the sensor’s slope and offset had been programmed into the logger.
- Metadata related to the calibration slope and offset, i.e. how the sensor should have been programmed into the logger.
This is a simplified version of the schema we came up with, using PostgreSQL as our database engine (a rough SQL sketch follows the list):
- The measurement_point table would store information about the position of the sensor, such as its height.
- The sensor table would store information about the sensor’s calibration, and any other required information about the physical sensor itself.
- The sensor_config table would store information about how the sensor was programmed on the logger.
- The column_name table would store information about each of the columns in the daily data file that are associated with the sensor, seeing as most sensors measure multiple statistics such as average, min, max, etc.
- The date_from fields on the sensor and sensor_config tables would allow us to add multiple entries to these tables for the same measurement_point. This would reflect how the metadata changed over the measurement period.
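To give a flavour of how these tables fit together, here is a minimal PostgreSQL sketch of the simplified schema. The table names come from the list above, but the individual columns are illustrative guesses, not our production DDL:

```sql
-- Illustrative sketch of the simplified schema, not the production DDL.
CREATE TABLE measurement_point (
    id        serial PRIMARY KEY,
    name      text NOT NULL,     -- e.g. 'Spd_80m_N'
    height_m  numeric            -- physical position of the sensor on the mast
);

CREATE TABLE sensor (
    id                   serial PRIMARY KEY,
    measurement_point_id integer REFERENCES measurement_point (id),
    serial_number        text,
    calibration_slope    numeric,  -- how the logger *should* be programmed
    calibration_offset   numeric,
    date_from            timestamptz NOT NULL  -- when this sensor took effect
);

CREATE TABLE sensor_config (
    id                   serial PRIMARY KEY,
    measurement_point_id integer REFERENCES measurement_point (id),
    slope_in_logger      numeric,  -- how the logger *was* actually programmed
    offset_in_logger     numeric,
    date_from            timestamptz NOT NULL  -- when this config took effect
);

CREATE TABLE column_name (
    id               serial PRIMARY KEY,
    sensor_config_id integer REFERENCES sensor_config (id),
    column_name      text NOT NULL,  -- e.g. 'Spd_80m_N_Avg'
    statistic        text            -- e.g. 'avg', 'min', 'max', 'sd'
);
```

The date_from columns are what allow several sensor or sensor_config rows to share one measurement_point over time.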
In this schema, each aspect of measurement metadata was stored in a separate table. This allowed us to make changes to individual sections independently of each other, which not only reflected how the metadata changed in the real world, but also made it easy for an analyst to apply updates. For example, if a faulty sensor on a mast was swapped out with a new one at the same height and no changes were made to the logger slope and offset, then an analyst could just add a new entry to the sensor table and link it to the correct measurement_point.
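Using the sketch schema above, recording such a sensor swap could be as simple as a single insert (the id, serial number and calibration values here are invented):

```sql
-- Hypothetical sensor swap: a new anemometer replaces a faulty one at the
-- same height, with no change to the logger programming, so only the
-- sensor table gains a row. date_from marks when the swap happened.
INSERT INTO sensor (measurement_point_id, serial_number,
                    calibration_slope, calibration_offset, date_from)
VALUES (42, 'SN-10931', 0.04595, 0.24, '2021-06-14T00:00:00Z');
```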
This separation of concerns also made any potential issues with the measurement data easier to remedy. For example, if an analyst wanted to determine whether the data being recorded for a measurement was correct, they would compare the slope and offset values of its sensor_configs with the calibration slope and offset values of its sensors. If the values differed, the analyst could reverse the incorrect sensor_config slope and offset that had been applied to the measurement data and apply the correct sensor slope and offset values instead.
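Sticking with the sketch schema, the fix could be expressed along these lines, where daily_reading and its columns are invented names for the ingested time series:

```sql
-- Recover the raw signal by reversing the slope and offset the logger
-- actually applied, then apply the calibrated values it should have used.
SELECT d.ts,
       ((d.wind_speed - sc.offset_in_logger) / sc.slope_in_logger)
           * s.calibration_slope + s.calibration_offset AS corrected_wind_speed
FROM daily_reading d
JOIN sensor_config sc ON sc.id = d.sensor_config_id
JOIN sensor s ON s.measurement_point_id = sc.measurement_point_id;
-- For brevity this assumes one sensor row per measurement point; in
-- practice the date_from fields pick the row that applies to each timestamp.
```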
We continually expanded upon this schema, adding more and more useful fields. Eventually it formed the basis of the excellent International Energy Agency (IEA) Wind Task 43 Data Model, which aims to standardise wind resource metadata. BrightHub uses the IEA Wind Task 43 Data Model as the part of its database schema that deals with sensor metadata.
Next Steps
So now we had a flexible data model that would be able to handle metadata from met masts, LiDARs and SoDARs. An analyst could use installation reports, calibration certificates and any other external sources to find information about a sensor’s position and calibration, and then manually enter it into the measurement_point and sensor tables using our UI.
However, for the sensor_config and column_name tables, we felt there was room for automation. Most loggers include the slope and offset programmed into them (our slope_in_logger and offset_in_logger fields) in the metadata section of their daily files, and also group the column names based on their sensor configuration. We wanted to build an automated file processing system that would extract all of the useful metadata related to the sensor_config and column_name tables.
The next article in the series will cover how we designed this automatic file processing system, and how we ensured that it integrated seamlessly with the manually entered metadata in our new data model.