Dataset Card for Electricity Load Diagrams
Dataset Summary
This dataset contains hourly kW electricity consumption time series of 370 Portuguese clients from 2011 to 2014.
Dataset Usage
The dataset has the following configuration parameters:
- freq: the time series frequency at which we resample (default: "1H")
- prediction_length: the forecast horizon for this task, used to make the validation and test splits (default: 24)
- rolling_evaluations: the number of rolling window time series in the test split for evaluation purposes (default: 7)
For example, you can specify your own configuration, different from those used in the papers, as follows:
from datasets import load_dataset
dataset = load_dataset("electricity_load_diagrams", "uci", rolling_evaluations=10)
Notes:
- The dataset has no missing values.
- Values are in kW per 15-minute interval, resampled to hourly. To convert values to kWh, divide them by 4.
- All time labels refer to Portuguese local time; however, every day contains 96 measurements (24*4).
- Every year, on the March time-change day (which has only 23 hours), the values between 1:00 am and 2:00 am are zero for all points.
- Every year, on the October time-change day (which has 25 hours), the values between 1:00 am and 2:00 am aggregate the consumption of two hours.
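The kW-to-kWh rule in the notes can be sketched in a few lines of Python. This is a minimal illustration, not part of the dataset's loading script: the helper names are hypothetical and the input is assumed to be raw 15-minute readings. The card does not state whether the resampled hourly target values are sums or means of the quarter-hour readings, so check that before applying the factor to them.

```python
from typing import List


def quarter_hour_kw_to_kwh(readings_kw: List[float]) -> List[float]:
    """Energy per 15-minute interval: average kW * 0.25 h, i.e. kW / 4."""
    return [kw / 4 for kw in readings_kw]


def hourly_kwh(readings_kw: List[float]) -> List[float]:
    """Sum each block of four quarter-hour energies into one hourly kWh value."""
    if len(readings_kw) % 4 != 0:
        raise ValueError("expected a multiple of 4 quarter-hour readings")
    kwh = quarter_hour_kw_to_kwh(readings_kw)
    return [sum(kwh[i:i + 4]) for i in range(0, len(kwh), 4)]


# Hypothetical raw readings for one hour (four 15-minute values in kW).
print(hourly_kwh([14.0, 18.0, 21.0, 20.0]))  # [18.25]
```

A full day of 96 readings yields 24 hourly values, including on the time-change days, since those days also carry 96 measurements.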
Supported Tasks and Leaderboards
univariate-time-series-forecasting: The time series forecasting task involves learning the future target values of the time series in a dataset for the next prediction_length time steps. The forecasts can then be validated against the ground truth in the validation split and tested on the test split.
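To make the task concrete, the sketch below holds out the last prediction_length values of a series and scores a forecast with mean absolute error. The seasonal-naive baseline (repeat the last 24 hours) and the MAE metric are illustrative choices, not methods prescribed by this card, and the toy series is synthetic.

```python
from typing import List, Sequence


def seasonal_naive_forecast(
    history: Sequence[float], prediction_length: int = 24, season: int = 24
) -> List[float]:
    """Forecast by repeating the last observed season (illustrative baseline)."""
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    last_season = list(history[-season:])
    return [last_season[i % season] for i in range(prediction_length)]


def mean_absolute_error(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Average absolute difference between ground truth and forecast."""
    if len(actual) != len(forecast):
        raise ValueError("length mismatch")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


# Synthetic daily-periodic series: the model sees all but the last 24 values.
target = [float(h % 24) for h in range(24 * 3)]
context, future = target[:-24], target[-24:]
forecast = seasonal_naive_forecast(context)
print(mean_absolute_error(future, forecast))  # 0.0
```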
Languages
Dataset Structure
The dataset has no missing values. The raw values are in kW per 15-minute interval and are resampled to hourly frequency. Each time series represents one client. Some clients were created after 2011; in these cases, their consumption before creation was considered zero. All time labels refer to Portuguese local time; however, every day contains 96 measurements (24*4). Every year, on the March time-change day (which has only 23 hours), the values between 1:00 am and 2:00 am are zero for all points. Every year, on the October time-change day (which has 25 hours), the values between 1:00 am and 2:00 am aggregate the consumption of two hours.
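Because clients created after 2011 are padded with zeros, a user may want to locate where each series actually becomes active. The heuristic below is a hypothetical helper, not part of the dataset: it treats leading zeros as "not yet a client" and leaves interior zeros, such as the March time-change hour, untouched. A genuine zero reading at the very start of a series would be misclassified.

```python
from typing import Sequence


def first_active_index(target: Sequence[float]) -> int:
    """Index of the first non-zero value, or len(target) if all values are zero."""
    for i, value in enumerate(target):
        if value != 0:
            return i
    return len(target)


series = [0.0, 0.0, 0.0, 5.0, 0.0, 7.0]
start = first_active_index(series)
print(start, series[start:])  # 3 [5.0, 0.0, 7.0]
```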
Data Instances
A sample from the training set is provided below:
{
'start': datetime.datetime(2012, 1, 1, 0, 0),
'target': [14.0, 18.0, 21.0, 20.0, 22.0, 20.0, 20.0, 20.0, 13.0, 11.0], # <= truncated sample; the full target array is much longer
'feat_static_cat': [0],
'item_id': '0'
}
We have two configurations, uci and lstnet, which are specified as follows.
The time series are resampled to hourly frequency. We test on 7 rolling windows, each with a prediction length of 24.
For the uci configuration, the validation split therefore ends 24*7 time steps before the end of each time series, and the training split ends 24 time steps before the end of the validation split.
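The uci split arithmetic can be written out as end indices. This is a sketch under the assumption that the 7 rolling test series each extend the previous one by one prediction_length, so that the last one covers the whole series; the function name is hypothetical and the loading script may lay the windows out differently.

```python
from typing import List, Tuple


def uci_split_ends(
    n: int, prediction_length: int = 24, rolling_evaluations: int = 7
) -> Tuple[int, int, List[int]]:
    """Return (train_end, validation_end, test_ends) as exclusive end indices."""
    validation_end = n - prediction_length * rolling_evaluations
    train_end = validation_end - prediction_length
    if train_end <= 0:
        raise ValueError("time series too short for this configuration")
    # Assumption: each rolling test series is one horizon longer than the previous.
    test_ends = [
        validation_end + prediction_length * (k + 1) for k in range(rolling_evaluations)
    ]
    return train_end, validation_end, test_ends


train_end, validation_end, test_ends = uci_split_ends(n=1000)
print(train_end, validation_end, test_ends)
# 808 832 [856, 880, 904, 928, 952, 976, 1000]
```

Slicing target[:train_end] and target[:validation_end] would then give the training and validation series, and target[:end] for each end in test_ends one rolling test series.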
For the lstnet configuration, the training window covers the first 60% of each full time series, the validation window ends at 80% of the full time series, and the last 20% is used as the test set of 7 rolling windows of 24 time steps each. Finally, as in the LSTNet paper, we only consider time series that are active during 2012 to 2014, which leaves us with 320 time series.
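The lstnet fractions translate into split indices as follows. This is a hypothetical helper that assumes truncating integer division when turning 60% and 80% into positions; the exact rounding, and how the 7 rolling windows are placed inside the test period, are not specified in this card.

```python
from typing import Tuple


def lstnet_split_ends(
    n: int, train_pct: int = 60, validation_pct: int = 80
) -> Tuple[int, int]:
    """Exclusive end indices of the train and validation windows; the rest is test."""
    # Integer arithmetic avoids floating-point surprises such as int(n * 0.6).
    train_end = n * train_pct // 100
    validation_end = n * validation_pct // 100
    return train_end, validation_end


print(lstnet_split_ends(1000))  # (600, 800)
```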
Data Fields
For this univariate regular time series we have:
- start: a datetime of the first entry of each time series in the dataset
- target: an array[float32] of the actual target values
- feat_static_cat: an array[uint64] which contains a categorical identifier of each time series in the dataset
- item_id: a string identifier of each time series in the dataset for reference
Given the freq and the start datetime, we can assign a datetime to each entry in the target array.
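Assigning a datetime to each entry only needs the start value and the step implied by freq. The sketch assumes the default "1H" frequency maps to a one-hour timedelta and uses naive datetimes on a regular grid, matching the note that every day has the same number of measurements; with pandas, pd.date_range(start=start, periods=len(target), freq=freq) is a common alternative. The helper name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def entry_timestamps(
    start: datetime, length: int, step: timedelta = timedelta(hours=1)
) -> List[datetime]:
    """Timestamp of entry i is start + i * step (regular grid, no DST shifts)."""
    return [start + i * step for i in range(length)]


stamps = entry_timestamps(datetime(2012, 1, 1, 0, 0), length=3)
print(stamps[-1])  # 2012-01-01 02:00:00
```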
Data Splits
name | train | unsupervised | test |
---|---|---|---|
uci | 370 | 2590 | 370 |
lstnet | 320 | 2240 | 320 |
Dataset Creation
The Electricity Load Diagrams 2011–2014 Dataset was developed by Artur Trindade and shared in the UCI Machine Learning Repository. This dataset covers the electricity load of 370 substations in Portugal from the start of 2011 to the end of 2014 with a sampling period of 15 min. We resample it to hourly time series.
Curation Rationale
Research and development of load forecasting methods, in particular short-term electricity load forecasting.
Source Data
This dataset covers the electricity load of 370 substations in Portugal from the start of 2011 to the end of 2014 with a sampling period of 15 min.
Initial Data Collection and Normalization
[More Information Needed]
Who are the source language producers?
[More Information Needed]
Annotations
Annotation process
[More Information Needed]
Who are the annotators?
[More Information Needed]
Personal and Sensitive Information
[More Information Needed]
Considerations for Using the Data
Social Impact of Dataset
[More Information Needed]
Discussion of Biases
[More Information Needed]
Other Known Limitations
[More Information Needed]
Additional Information
Dataset Curators
[More Information Needed]
Licensing Information
[More Information Needed]
Citation Information
@inproceedings{10.1145/3209978.3210006,
author = {Lai, Guokun and Chang, Wei-Cheng and Yang, Yiming and Liu, Hanxiao},
title = {Modeling Long- and Short-Term Temporal Patterns with Deep Neural Networks},
year = {2018},
isbn = {9781450356572},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3209978.3210006},
doi = {10.1145/3209978.3210006},
booktitle = {The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval},
pages = {95--104},
numpages = {10},
location = {Ann Arbor, MI, USA},
series = {SIGIR '18}
}
Contributions
Thanks to @kashif for adding this dataset.