You need to agree to share your contact information to access this dataset

This repository is publicly accessible, but you have to accept the conditions to access its files and content.


UK PV dataset

PV solar generation data from the UK. This dataset contains data from 1311 PV systems from 2018 to 2021. Time granularity varies from 2 minutes to 30 minutes.

This data is collected from live PV systems in the UK. We have obfuscated the location of the PV systems for privacy. If you are the owner of a PV system in the dataset, and do not want this data to be shared, please do get in contact with [email protected].

Files

  • metadata.csv: Data about the PV systems, e.g. location
  • 2min.parquet: Power output for PV systems every 2 minutes.
  • 5min.parquet: Power output for PV systems every 5 minutes.
  • 30min.parquet: Power output for PV systems every 30 minutes.
  • pv.netcdf: (legacy) Time series of PV solar generation every 5 minutes
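Because each parquet file corresponds to a nominal granularity, a quick sanity check is to measure the actual gap between consecutive readings per system. The sketch below assumes pandas, a file already loaded into a DataFrame with the `ss_id` and `timestamp` columns described further down, and a datetime-typed `timestamp` column; the function name `median_interval_minutes` is hypothetical.

```python
import pandas as pd


def median_interval_minutes(generation: pd.DataFrame) -> pd.Series:
    """Median gap between consecutive readings, in minutes, for each ss_id."""
    # Sort so diff() compares each reading with the previous reading of the same system
    ordered = generation.sort_values(["ss_id", "timestamp"])
    gaps = ordered.groupby("ss_id")["timestamp"].diff()
    # The first reading of each system has no predecessor (NaT -> NaN); median() skips it
    minutes = gaps.dt.total_seconds() / 60
    return minutes.groupby(ordered["ss_id"]).median()
```

For example, `median_interval_minutes(pd.read_parquet("5min.parquet"))` should report values close to 5 for most systems, assuming the file is available locally and a parquet engine such as pyarrow is installed.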

metadata.csv

Metadata of the different PV systems.

Note that there are extra PV systems in this metadata that do not appear in the PV time-series data.

The csv columns are:

  • ss_id: the id of the system
  • latitude_rounded: latitude of the PV system, but rounded to approximately the nearest km
  • longitude_rounded: longitude of the PV system, but rounded to approximately the nearest km
  • llsoacd: TODO
  • orientation: The orientation of the PV system
  • tilt: The tilt of the PV system
  • kwp: The capacity of the PV system, in kWp
  • operational_at: the datetime the PV system started working
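Since metadata.csv lists extra systems that have no time series, it is usually worth filtering it down to the systems present in the generation data before joining the two. This is a minimal sketch assuming pandas and that `operational_at` is parseable by `pd.to_datetime`; the function name `systems_with_timeseries` is hypothetical.

```python
import pandas as pd


def systems_with_timeseries(metadata: pd.DataFrame, generation: pd.DataFrame) -> pd.DataFrame:
    """Keep only metadata rows whose ss_id also appears in the generation data."""
    # metadata.csv contains systems with no time series, so filter on ss_id
    present = set(generation["ss_id"].unique())
    out = metadata[metadata["ss_id"].isin(present)].copy()
    # Parse operational_at so it can be compared with generation timestamps
    out["operational_at"] = pd.to_datetime(out["operational_at"])
    return out.reset_index(drop=True)
```

With local copies of the files, a typical call would be `systems_with_timeseries(pd.read_csv("metadata.csv"), pd.read_parquet("30min.parquet"))`.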

{2,5,30}min.parquet

Time series of solar generation for a number of systems. Each file includes only the systems whose data is available at that granularity. In particular, the systems in 2min.parquet and 5min.parquet also appear in 30min.parquet.
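The statement that finer-grained systems also appear in 30min.parquet can be checked directly on local copies of the files. A minimal sketch, assuming pandas; the function name `missing_from_coarser` is hypothetical.

```python
import pandas as pd


def missing_from_coarser(fine: pd.DataFrame, coarse: pd.DataFrame) -> set:
    """Return ss_ids present in the finer-grained data but absent from the coarser data."""
    # An empty result means every system in `fine` also appears in `coarse`
    return set(fine["ss_id"].unique()) - set(coarse["ss_id"].unique())
```

Reading only the `ss_id` column keeps this cheap, e.g. `missing_from_coarser(pd.read_parquet("2min.parquet", columns=["ss_id"]), pd.read_parquet("30min.parquet", columns=["ss_id"]))`, which should return an empty set if the files match the description above.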

The files contain 3 columns:

  • ss_id: the id of the system
  • timestamp: the time of the reading
  • generation_wh: the generated energy (in Wh) at the given timestamp for the given system
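The parquet files are in long format (one row per system and timestamp). For many analyses a wide table with one column per `ss_id`, similar to the layout of the legacy pv.netcdf file, is more convenient. This sketch assumes pandas and at most one row per (`ss_id`, `timestamp`) pair; the function name `to_wide` is hypothetical.

```python
import pandas as pd


def to_wide(generation: pd.DataFrame) -> pd.DataFrame:
    """Reshape long (ss_id, timestamp, generation_wh) rows into one column per system."""
    # pivot() raises if a (timestamp, ss_id) pair is duplicated, which surfaces bad rows early
    wide = generation.pivot(index="timestamp", columns="ss_id", values="generation_wh")
    # Systems without a reading at a given timestamp are left as NaN
    return wide.sort_index()
```

On the full 2-minute file the wide table can be large, so it may help to filter to the systems or date range of interest before pivoting.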

pv.netcdf (legacy)

Time series of PV solar generation, stored in xarray (NetCDF) format.

Each data variable is named after an 'ss_id' from the metadata and contains the solar generation (in kW) for that PV system. These ss_ids are a subset of all the ss_ids in the metadata. The time coordinate is named 'datetime' and gives the datetime of each solar generation reading.

This is a subset of the more recent 5min.parquet file.
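To compare the legacy file with 5min.parquet, it can help to bring it into the same long layout. Assuming xarray is installed, `xr.open_dataset("pv.netcdf").to_dataframe()` should give a DataFrame indexed by `datetime` with one column per data variable; the sketch below melts such a frame into (`ss_id`, `timestamp`, `generation_kw`) rows. The function name `netcdf_frame_to_long`, the output column name `generation_kw`, and the assumption that variable names are numeric ss_id strings are all hypothetical.

```python
import pandas as pd


def netcdf_frame_to_long(wide: pd.DataFrame) -> pd.DataFrame:
    """Melt a wide frame (index 'datetime', one column per ss_id) into long rows."""
    long = wide.reset_index().melt(
        id_vars="datetime", var_name="ss_id", value_name="generation_kw"
    )
    long = long.rename(columns={"datetime": "timestamp"})
    # Drop timestamps at which a system has no reading
    long = long.dropna(subset=["generation_kw"])
    # Assumption: variable names are numeric ss_id strings such as "2405"
    long["ss_id"] = long["ss_id"].astype(int)
    return long[["ss_id", "timestamp", "generation_kw"]].reset_index(drop=True)
```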

example

using Hugging Face Datasets

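```python
from datasets import load_dataset

# The dataset is gated: accept the conditions on the Hub and log in first (e.g. `huggingface-cli login`)
dataset = load_dataset("openclimatefix/uk_pv")
```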

useful links

https://huggingface.co/docs/datasets/share - this repo was made by following this tutorial
