Dataset: DarthReca/california_burned_areas

Tasks: Image Segmentation

Size Categories: n<1K

Tags: climate

California Burned Areas Dataset

Working on adding more data

Dataset Summary

This dataset contains images from Sentinel-2 satellites taken before and after a wildfire. The ground truth masks are provided by the California Department of Forestry and Fire Protection and are mapped onto the images.

Supported Tasks

The dataset is designed for binary semantic segmentation of burned vs. unburned areas.

Dataset Structure

We opted to use HDF5 because it offers better portability and a smaller file size than GeoTIFF.

Dataset opening

Using the datasets library, you download only the pre-patched raw version, for simplicity.

from datasets import load_dataset

# There are two available configurations: "post-fire" and "pre-post-fire".
dataset = load_dataset("DarthReca/california_burned_areas", name="post-fire")

The dataset was compressed using h5py and the BZip2 filter from hdf5plugin. WARNING: hdf5plugin must be installed to extract the data.

Data Instances

Each matrix has a shape of 5490x5490xC, where C is 12 for pre-fire and post-fire images, while it is 0 for binary masks. A pre-patched version is also provided, with matrices of size 512x512xC. In this version, only patches whose mask has at least one positive pixel are present.
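
The patch filter described above can be sketched as follows. This is a minimal illustration and not the authors' patching code: it assumes non-overlapping tiles and skips the border that does not fill a complete tile, neither of which is specified here. The helper name `burned_patches` is hypothetical.

```python
from typing import Iterator, Tuple

import numpy as np


def burned_patches(
    image: np.ndarray, mask: np.ndarray, patch: int = 512
) -> Iterator[Tuple[int, int, np.ndarray, np.ndarray]]:
    """Yield (row, col, image_patch, mask_patch) for tiles with a burned pixel.

    Assumption: tiles do not overlap and incomplete border tiles are skipped.
    """
    height, width = mask.shape[:2]
    for row in range(0, height - patch + 1, patch):
        for col in range(0, width - patch + 1, patch):
            mask_patch = mask[row:row + patch, col:col + patch]
            # Keep the tile only if its mask has at least one positive pixel.
            if (mask_patch > 0).any():
                yield row, col, image[row:row + patch, col:col + patch], mask_patch
```

Under these assumptions, a 5490x5490 image yields at most 10x10 tiles of 512x512 pixels, since 5490 // 512 = 10.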

Two versions of the dataset are available: raw (without any transformation) and normalized (with values scaled to the range 0-255). We suggest using the raw version so that you can apply any pre-processing step you want.

Data Fields

In each standard HDF5 file, you can find post-fire images, pre-fire images (when available), and binary masks. The file is structured as follows:

├── foldn
│   ├── uid0
│   │   ├── pre_fire
│   │   ├── post_fire
│   │   ├── mask 
│   ├── uid1
│       ├── post_fire
│       ├── mask
│  
├── foldm
    ├── uid2
    │   ├── post_fire
    │   ├── mask 
    ├── uid3
        ├── pre_fire
        ├── post_fire
        ├── mask
...

where foldn and foldm are fold names and uidn is a unique identifier for the wildfire.
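
A minimal sketch of walking this hierarchy is shown below. The function names `open_archive` and `iter_wildfires` are hypothetical, and the file path is whatever location you stored the file in. The traversal only relies on the mapping interface (`items()` and `in`), which h5py groups provide, so it also works on plain nested dictionaries.

```python
from typing import Iterator, Mapping, Tuple


def open_archive(path: str):
    """Open one of the HDF5 files read-only.

    The imports are local so that iter_wildfires can be used without the
    HDF5 stack installed. hdf5plugin must be imported, even though it is
    not referenced, so that h5py can decode BZip2-compressed datasets.
    """
    import hdf5plugin  # noqa: F401
    import h5py

    return h5py.File(path, "r")


def iter_wildfires(root: Mapping) -> Iterator[Tuple[str, str, bool]]:
    """Yield (fold, uid, has_pre_fire) for each wildfire group under root."""
    for fold_name, fold in root.items():
        for uid, group in fold.items():
            # pre_fire is optional, so report whether this wildfire has it.
            yield fold_name, uid, "pre_fire" in group
```

With a real file, the object returned by `open_archive` can be used in a `with` statement and passed as `root`. An individual array can then be read with, for example, `f[fold][uid]["post_fire"][...]`.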

For the pre-patched version, the structure is:

root
|
|-- uid0_x: {post_fire, pre_fire, mask}
|
|-- uid0_y: {post_fire, pre_fire, mask}
|
|-- uid1_x: {post_fire, mask}
|
...

The fold name is stored as an attribute.
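
As an illustration of working with the flat pre-patched layout, the sketch below groups patch keys by wildfire. It assumes that the part of each key after the last underscore is the patch index, as the `uid0_x` naming suggests. The helper name `patches_by_wildfire` is hypothetical. Reading the fold is not shown because the attribute key is not given here (h5py exposes group attributes through `.attrs`).

```python
from collections import defaultdict
from typing import Dict, List, Mapping


def patches_by_wildfire(
    root: Mapping, require_pre_fire: bool = False
) -> Dict[str, List[str]]:
    """Group patch keys such as 'uid0_3' under their wildfire uid ('uid0').

    Assumption: the text after the last underscore is the patch index.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for key, patch in root.items():
        # Optionally skip patches that have no pre-fire image.
        if require_pre_fire and "pre_fire" not in patch:
            continue
        uid = key.rsplit("_", 1)[0]
        grouped[uid].append(key)
    return dict(grouped)
```

Setting `require_pre_fire=True` keeps only patches that can be used when both pre-fire and post-fire images are needed.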

Data Splits

There are 5 random splits whose names are: 0, 1, 2, 3, and 4.
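
One common way to use five named folds is k-fold cross-validation, holding out one fold for testing and training on the other four. The sketch below only selects fold names. This protocol is an assumption rather than something prescribed here, and `train_test_folds` is a hypothetical helper.

```python
from typing import List, Tuple

# Fold names as listed above.
FOLDS: Tuple[str, ...] = ("0", "1", "2", "3", "4")


def train_test_folds(test_fold: str) -> Tuple[List[str], str]:
    """Return (training fold names, test fold name) for one round."""
    if test_fold not in FOLDS:
        raise ValueError(f"unknown fold {test_fold!r}; expected one of {FOLDS}")
    train = [fold for fold in FOLDS if fold != test_fold]
    return train, test_fold
```

The returned names could then be passed as the `split` argument of `load_dataset`, assuming the loader exposes each fold as a split of that name.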

Source Data

Data are collected directly from the Copernicus Open Access Hub through its API. The band files are aggregated into a single matrix.

Additional Information

Licensing Information

This work is released under the OpenRAIL license.

Citation Information

If you plan to use this dataset in your work, please credit the Sentinel-2 mission and the California Department of Forestry and Fire Protection, and cite it using this BibTeX entry:

@article{cabuar,
  title={Ca{B}u{A}r: California {B}urned {A}reas dataset for delineation},
  author={Rege Cambrin, Daniele and Colomba, Luca and Garza, Paolo},
  journal={IEEE Geoscience and Remote Sensing Magazine},
  doi={10.1109/MGRS.2023.3292467},
  year={2023} 
}
