Dataset Card for Electricity Transformer Temperature

Dataset Summary

Electric power distribution means supplying electricity to different areas according to their sequential usage. Predicting the future demand of a specific area is difficult, however, because it varies with weekdays, holidays, seasons, weather, temperature, and other factors. According to the authors, no existing method can make high-precision long-term predictions from very long real-world series. False predictions can damage the electrical transformer, so, lacking an efficient method to predict future electricity usage, managers currently base their decisions on empirical estimates that are much higher than real-world demand. This causes unnecessary waste of electricity and equipment depreciation. The oil temperature, on the other hand, reflects the condition of the transformer, so one of the most efficient strategies is to predict whether the transformer's oil temperature will remain in a safe range and thereby avoid unnecessary waste. To address this problem, the authors and Beijing Guowang Fuda Science & Technology Development Company have provided two years' worth of data.

Specifically, the dataset combines short-term periodic patterns, long-term periodic patterns, long-term trends, and many irregular patterns. The data were obtained from 2 electricity transformers at 2 stations and are available at either 1H (hourly) or 15T (15-minute) frequency, giving 2 years * 365 days * 24 hours (* 4 for 15T) = 17,520 (70,080 for 15T) data points.
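The data-point counts above follow from simple arithmetic, reproduced in the short check below. It uses the card's idealized 365-day years, so it is a sanity check of the stated figures rather than a count of rows in the released files; the names `STEPS_PER_HOUR` and `expected_points` are illustrative.

```python
# Steps per hour for the two sampling frequencies used in the card.
STEPS_PER_HOUR = {"1H": 1, "15T": 4}


def expected_points(freq: str, years: int = 2, days_per_year: int = 365) -> int:
    """Idealized observations per series: years * days * 24 hours * steps per hour."""
    return years * days_per_year * 24 * STEPS_PER_HOUR[freq]


if __name__ == "__main__":
    print(expected_points("1H"))   # 17520
    print(expected_points("15T"))  # 70080
```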

The target time series is the oil temperature (OT), and in the univariate setup the dataset comes with the following 6 covariates:

  • High UseFul Load (HUFL)
  • High UseLess Load (HULL)
  • Middle UseFul Load (MUFL)
  • Middle UseLess Load (MULL)
  • Low UseFul Load (LUFL)
  • Low UseLess Load (LULL)
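As a rough illustration of the univariate setup, the sketch below turns column-oriented readings into one entry with the oil temperature as `target` and the six loads as `feat_dynamic_real`. The column abbreviations follow the original ETT CSV headers; the helper `to_univariate_entry`, the one-row-per-covariate layout, and the toy numbers are assumptions for illustration, not the loading script's actual code.

```python
from datetime import datetime
from typing import Dict, List

# Covariate columns: High/Middle/Low UseFul Load and UseLess Load, in the order listed above.
COVARIATES = ["HUFL", "HULL", "MUFL", "MULL", "LUFL", "LULL"]


def to_univariate_entry(start: datetime, columns: Dict[str, List[float]]) -> dict:
    """Hypothetical helper: build one univariate entry from per-column value lists."""
    length = len(columns["OT"])
    for name in COVARIATES:
        if len(columns[name]) != length:
            raise ValueError(f"column {name} has {len(columns[name])} values, expected {length}")
    return {
        "start": start,
        "target": list(columns["OT"]),  # oil temperature is the forecasting target
        "feat_static_cat": [0],
        # Assumed layout: one list per covariate, aligned with the target time axis.
        "feat_dynamic_real": [list(columns[name]) for name in COVARIATES],
        "item_id": "OT",
    }


if __name__ == "__main__":
    toy = {name: [1.0, 2.0, 3.0] for name in COVARIATES}  # made-up values
    toy["OT"] = [30.5, 27.8, 27.8]
    entry = to_univariate_entry(datetime(2016, 7, 1), toy)
    print(len(entry["feat_dynamic_real"]), len(entry["target"]))  # 6 3
```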

Dataset Usage

To load a particular variant of the dataset, pass its configuration name (h1, h2, m1 or m2), e.g.:

from datasets import load_dataset

load_dataset("ett", "m1", multivariate=False)  # univariate 15-min frequency dataset from first transformer

or to specify a prediction length:

load_dataset("ett", "h2", multivariate=True, prediction_length=48)  # multivariate hourly dataset from second transformer with prediction length of 48 (hours)

Supported Tasks and Leaderboards

The time series data are split into train/validation/test sets covering 12/4/4 months, respectively. Given the prediction length (default: 1 day, i.e. 24 steps at 1H frequency or 24*4 = 96 steps at 15T frequency), rolling windows of this size are created for the validation and test sets.
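A minimal sketch of such rolling windows, assuming each validation or test example is the full history up to a cutoff plus the next `prediction_length` values, with cutoffs advancing by `prediction_length` through the split. The function `rolling_windows` and this exact cutoff rule are assumptions for illustration, not necessarily what the loading script does.

```python
from typing import List


def rolling_windows(target: List[float], split_start: int, split_end: int,
                    prediction_length: int) -> List[List[float]]:
    """Hypothetical helper: one truncated series per evaluation window.

    Cutoffs start at split_start and advance by prediction_length; a window is kept
    only if its forecast horizon ends at or before split_end (exclusive bound).
    """
    windows = []
    for cutoff in range(split_start, split_end - prediction_length + 1, prediction_length):
        # History up to the cutoff plus the ground-truth horizon to be forecast.
        windows.append(target[: cutoff + prediction_length])
    return windows


if __name__ == "__main__":
    series = [float(i) for i in range(10)]
    for w in rolling_windows(series, split_start=4, split_end=10, prediction_length=3):
        print(len(w), w[-3:])
```

Because every window keeps the full history, a model can condition on all earlier data while being scored only on the last `prediction_length` values.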

time-series-forecasting

univariate-time-series-forecasting

The univariate time series forecasting task involves learning the future one-dimensional target values of a time series in a dataset for some prediction_length time steps. The performance of the forecast models can then be validated against the ground truth in the validation split and tested on the test split. The covariates are stored in the feat_dynamic_real key of each time series.
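To make the validate/test loop concrete, the hedged sketch below forecasts the last `prediction_length` target values of a window with a seasonal-naive baseline (repeat the value from one season earlier) and scores them against the ground truth with mean absolute error. The baseline, the metric, and the season length of 24 (which assumes hourly data) are choices made for illustration; the card does not prescribe any of them.

```python
from typing import List


def seasonal_naive(history: List[float], prediction_length: int, season: int = 24) -> List[float]:
    """Forecast each step by repeating the value observed one season earlier."""
    if len(history) < season:
        raise ValueError("history must contain at least one full season")
    extended = list(history)
    forecast = []
    for _ in range(prediction_length):
        value = extended[-season]
        forecast.append(value)
        extended.append(value)  # lets horizons longer than one season keep repeating
    return forecast


def mae(y_true: List[float], y_pred: List[float]) -> float:
    """Mean absolute error between ground truth and forecast."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate_window(window: List[float], prediction_length: int, season: int = 24) -> float:
    """Split a window into history and ground-truth horizon, then score the baseline."""
    history, truth = window[:-prediction_length], window[-prediction_length:]
    return mae(truth, seasonal_naive(history, prediction_length, season))


if __name__ == "__main__":
    daily_cycle = [float(h) for h in range(24)] * 3  # perfectly periodic toy series
    print(evaluate_window(daily_cycle, prediction_length=24))  # 0.0
```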

multivariate-time-series-forecasting

The multivariate time series forecasting task involves learning the future vector of target values of a time series in a dataset for some prediction_length time steps. As in the univariate setting, the performance of a multivariate model can be validated against the ground truth in the validation split and tested on the test split.
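In the multivariate case the same idea applies per dimension. This sketch assumes a dimension-major layout (`target[d][t]`, one list per variable), which is an assumption of the example rather than a documented guarantee, and scores a last-value (naive) forecast with mean squared error averaged over all dimensions and horizon steps.

```python
from typing import List

Matrix = List[List[float]]  # assumed layout: target[d][t] is dimension d at time step t


def naive_multivariate(history: Matrix, prediction_length: int) -> Matrix:
    """Repeat each dimension's last observed value over the horizon."""
    return [[series[-1]] * prediction_length for series in history]


def multivariate_mse(truth: Matrix, forecast: Matrix) -> float:
    """Mean squared error over every (dimension, time step) pair."""
    errors = [
        (t - f) ** 2
        for true_row, pred_row in zip(truth, forecast)
        for t, f in zip(true_row, pred_row)
    ]
    return sum(errors) / len(errors)


def evaluate_multivariate(window: Matrix, prediction_length: int) -> float:
    """Hold out the last prediction_length steps of every dimension and score the baseline."""
    history = [row[:-prediction_length] for row in window]
    truth = [row[-prediction_length:] for row in window]
    return multivariate_mse(truth, naive_multivariate(history, prediction_length))


if __name__ == "__main__":
    window = [[1.0, 1.0, 2.0, 3.0], [5.0, 5.0, 5.0, 5.0]]  # 2 dimensions, 4 time steps
    print(evaluate_multivariate(window, prediction_length=2))  # 1.25
```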

Languages

Dataset Structure

Data Instances

A sample from the training set is provided below:

{
  'start': datetime.datetime(2012, 1, 1, 0, 0),
  'target': [14.0, 18.0, 21.0, 20.0, 22.0, 20.0, ...],
  'feat_static_cat': [0], 
  'feat_dynamic_real': [[0.3, 0.4], [0.1, 0.6], ...],
  'item_id': 'OT'
}

Data Fields

For the univariate regular time series, each series has the following keys:

  • start: a datetime of the first entry of each time series in the dataset
  • target: an array[float32] of the actual target values
  • feat_static_cat: an array[uint64] which contains a categorical identifier of each time series in the dataset
  • feat_dynamic_real: optional array of covariate features
  • item_id: a string identifier of each time series in a dataset for reference
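Because each entry stores only its `start`, the timestamps of later values must be reconstructed from the sampling frequency. A minimal sketch, assuming regularly spaced observations with no gaps; the helper `timestamps` and the `FREQ_TO_DELTA` mapping are illustrative names, and the example start date is taken from the sample instance above.

```python
from datetime import datetime, timedelta
from typing import List

# Spacing between consecutive observations for the two frequencies used by the dataset.
FREQ_TO_DELTA = {"1H": timedelta(hours=1), "15T": timedelta(minutes=15)}


def timestamps(start: datetime, length: int, freq: str) -> List[datetime]:
    """Hypothetical helper: timestamp of each of the first `length` target values."""
    step = FREQ_TO_DELTA[freq]
    return [start + i * step for i in range(length)]


if __name__ == "__main__":
    for ts in timestamps(datetime(2012, 1, 1, 0, 0), 3, "15T"):
        print(ts.isoformat())
```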

For the multivariate time series, the target holds a vector with one value per dimension at each time point.

Data Splits

The time series data are split into train/validation/test sets covering 12/4/4 months, respectively.
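Turning the 12/4/4-month split into index boundaries requires a month length. The sketch below assumes 30-day months (an assumption of this example, not a statement about the loading script) and returns half-open `[start, end)` index ranges for each split; `split_boundaries` is a hypothetical name.

```python
from typing import Dict, Tuple

STEPS_PER_HOUR = {"1H": 1, "15T": 4}


def split_boundaries(freq: str, months: Tuple[int, int, int] = (12, 4, 4),
                     days_per_month: int = 30) -> Dict[str, Tuple[int, int]]:
    """Hypothetical helper: consecutive [start, end) index ranges for train/validation/test."""
    steps_per_month = days_per_month * 24 * STEPS_PER_HOUR[freq]  # assumes 30-day months
    bounds = {}
    start = 0
    for name, n_months in zip(("train", "validation", "test"), months):
        end = start + n_months * steps_per_month
        bounds[name] = (start, end)
        start = end
    return bounds


if __name__ == "__main__":
    print(split_boundaries("1H"))
    # {'train': (0, 8640), 'validation': (8640, 11520), 'test': (11520, 14400)}
```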

Dataset Creation

Curation Rationale

To develop time series methods that can make high-precision long-term predictions from very long real-world series.

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

Licensing Information

Creative Commons Attribution 4.0 International

Citation Information

@inproceedings{haoyietal-informer-2021,
  author    = {Haoyi Zhou and
               Shanghang Zhang and
               Jieqi Peng and
               Shuai Zhang and
               Jianxin Li and
               Hui Xiong and
               Wancai Zhang},
  title     = {Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting},
  booktitle = {The Thirty-Fifth {AAAI} Conference on Artificial Intelligence, {AAAI} 2021, Virtual Conference},
  volume    = {35},
  number    = {12},
  pages     = {11106--11115},
  publisher = {{AAAI} Press},
  year      = {2021},
}

Contributions

Thanks to @kashif for adding this dataset.
