11/15/2022
All trading decisions in short-term power markets are data-driven.
Energy market participants have therefore been stepping up their efforts to build data processing pipelines that cover the sourcing, collection, preparation, input, analysis and storage of data points.
Let’s zoom in on the six crucial stages of data processing.
Market participants predominantly obtain large amounts of data from a mix of third-party providers. The unavoidable result is a potpourri of heterogeneous data points that need to be transformed in a series of data processing stages so that insights for intelligent trading decisions can be derived.
The data processing cycle starts with the selection and validation of data sources to fit trading requirements. Usually, one looks at the following characteristics:
Once data providers have been selected and integrated, all available historical and live data points have to be fed into the market participants’ existing systems. This collection process often involves the development of applications to connect to the data providers’ APIs (if available) and stream incoming data.
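As a rough illustration of the collection step, the sketch below backfills historical data points from a provider page by page. It assumes a cursor-paginated API, which is only one of several possible designs; `fetch_page` and the cursor convention are hypothetical and stand in for whatever client a given provider actually offers.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical contract: fetch_page(cursor) returns (data_points, next_cursor),
# where next_cursor is None once the provider has no more pages.
Page = Tuple[List[dict], Optional[str]]


def collect(fetch_page: Callable[[Optional[str]], Page],
            start_cursor: Optional[str] = None) -> Iterator[dict]:
    """Yield data points one by one until the provider reports no further page."""
    cursor = start_cursor
    while True:
        points, cursor = fetch_page(cursor)
        yield from points
        if cursor is None:
            return
```

Injecting `fetch_page` keeps the collection loop independent of any specific provider, so the same loop can be reused once a second data source is integrated.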
Since the accuracy of data outputs is reliant on the quality of incoming data, the third data processing stage comprises cleaning up received data points by applying the following common techniques:
At this stage, data is input into corresponding data processing applications, primarily databases or message queues. Typically, one would use databases for historical data analysis or batch processing (mostly relevant for auction markets) and message queues with event-driven architectures for online processing (mostly relevant for continuous markets).
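The two input paths can be sketched with Python's standard library alone. This is a minimal illustration rather than a production setup: an in-memory SQLite database stands in for the historical store, an in-process `queue.Queue` stands in for a real message broker, and the table layout and field names are assumptions.

```python
import queue
import sqlite3


def make_history_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical price table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE prices (area TEXT, delivery_start TEXT, price REAL)"
    )
    return conn


def store_historical(conn: sqlite3.Connection, rows) -> None:
    """Batch path: insert many historical rows in one go."""
    conn.executemany("INSERT INTO prices VALUES (?, ?, ?)", rows)
    conn.commit()


def average_price(conn: sqlite3.Connection, area: str) -> float:
    """Example of a historical analysis query over the stored rows."""
    row = conn.execute(
        "SELECT AVG(price) FROM prices WHERE area = ?", (area,)
    ).fetchone()
    return row[0]


# Online path: an in-process queue standing in for a message broker.
updates: "queue.Queue[dict]" = queue.Queue()


def publish(update: dict) -> None:
    """Put a live update on the queue."""
    updates.put(update)


def drain(handler) -> int:
    """Hand every queued update to the handler; return how many were handled."""
    handled = 0
    while True:
        try:
            update = updates.get_nowait()
        except queue.Empty:
            return handled
        handler(update)
        handled += 1
```

The database path suits queries over the full history, while the queue path hands each update to a handler as soon as it arrives.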
During the fifth stage, multiple data manipulation techniques (sorting, summarization, aggregation, transformation, normalization) are applied to the collected data points to produce the output from which trading decisions are derived.
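To make three of these techniques concrete, here is a small sketch of sorting, aggregation and normalization on a handful of trades. The trade tuples, their values and the choice of a volume-weighted average price as the aggregate are illustrative assumptions, not a prescribed method.

```python
from collections import defaultdict

# Hypothetical trades: (delivery hour, price in EUR/MWh, volume in MW)
trades = [
    (11, 120.0, 5.0),
    (10, 100.0, 10.0),
    (10, 110.0, 10.0),
    (11, 130.0, 15.0),
]


def sort_by_hour(rows):
    """Sorting: order trades by delivery hour (stable for equal hours)."""
    return sorted(rows, key=lambda r: r[0])


def vwap_per_hour(rows):
    """Aggregation: volume-weighted average price per delivery hour."""
    notional = defaultdict(float)
    volume = defaultdict(float)
    for hour, price, vol in rows:
        notional[hour] += price * vol
        volume[hour] += vol
    return {hour: notional[hour] / volume[hour] for hour in volume}


def min_max_normalize(values):
    """Normalization: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

With the sample data, `vwap_per_hour(trades)` gives 105.0 for hour 10 and 127.5 for hour 11.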
Batch processing applications process data points in batches, each time a pre-specified amount of data has been collected. In contrast, online processing applications run autonomously and react to every data update in real time.
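The difference between the two modes can be shown in a few lines. The class names and callback signatures below are hypothetical; the sketch only assumes that a batch is triggered by a fixed number of collected data points, whereas an online processor forwards every update immediately.

```python
from typing import Callable, List


class BatchProcessor:
    """Buffers data points and processes them once batch_size is reached."""

    def __init__(self, batch_size: int,
                 process: Callable[[List[float]], None]) -> None:
        self.batch_size = batch_size
        self.process = process
        self._buffer: List[float] = []

    def add(self, point: float) -> None:
        self._buffer.append(point)
        if len(self._buffer) >= self.batch_size:
            # Hand over the full batch and start a fresh buffer.
            batch, self._buffer = self._buffer, []
            self.process(batch)


class OnlineProcessor:
    """Reacts to every single data update as soon as it arrives."""

    def __init__(self, on_update: Callable[[float], None]) -> None:
        self.on_update = on_update

    def add(self, point: float) -> None:
        self.on_update(point)
```

Feeding five data points into a `BatchProcessor` with `batch_size=3` triggers one batch of three and leaves two points waiting in the buffer; the `OnlineProcessor` would have invoked its callback five times.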
While data processing in short-term power trading mostly follows generic data processing principles, it still comes with several industry-specific challenges.
Once the output is available, it needs to be stored: historical data storage and live caching are the common options for accommodating different use cases.
Regardless of how data is stored, the unique identifier of traded products (i.e. the combination of market area, delivery start and delivery end) needs to be applied when saving data points. Consistency in area naming, product size nomenclature and granularity therefore makes data processing pipelines more scalable across products and markets.
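One possible way to encode such an identifier is an immutable key object that works equally well as a dictionary key in a live cache and as a composite key in a database. The sketch below is an assumption about how this could look: the `ProductKey` name, the upper-casing of area names and the requirement for timezone-aware timestamps are illustrative choices.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ProductKey:
    """Identifies a traded product by market area and delivery period."""

    market_area: str
    delivery_start: datetime
    delivery_end: datetime

    def __post_init__(self) -> None:
        if self.delivery_start.tzinfo is None or self.delivery_end.tzinfo is None:
            raise ValueError("delivery times must be timezone-aware")
        if self.delivery_end <= self.delivery_start:
            raise ValueError("delivery_end must be after delivery_start")
        # Normalize area naming so that "de" and "DE" identify the same product.
        object.__setattr__(self, "market_area", self.market_area.strip().upper())

    @property
    def duration(self) -> timedelta:
        """Product granularity, e.g. 15 minutes or one hour."""
        return self.delivery_end - self.delivery_start
```

Because the dataclass is frozen and therefore hashable, a lookup such as `cache[ProductKey("de", start, end)]` finds the entry stored under `ProductKey("DE", start, end)`, which keeps keys consistent across data sources that spell area names differently.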