Dataset Card for Crello
Dataset Summary
The Crello dataset is compiled for the study of vector graphic documents. The dataset contains document metadata such as canvas size, as well as pre-rendered elements such as images or text boxes. The original templates were collected from crello.com (now create.vista.com) and converted to a low-resolution format suitable for machine learning analysis.
Supported Tasks and Leaderboards
The dataset is used in CanvasVAE to study unsupervised document generation.
Languages
Almost all design templates are in English.
Dataset Structure
Data Instances
Each instance has scalar attributes (canvas) and sequence attributes (elements). Categorical values are stored as integers; check the ClassLabel features of the dataset for the list of categorical labels.
{'id': '592d6c2c95a7a863ddcda140',
'length': 8,
'group': 4,
'format': 20,
'canvas_width': 3,
'canvas_height': 1,
'category': 0,
'title': 'Beauty Blog Ad Woman with Unusual Hairstyle',
'type': [1, 3, 3, 3, 3, 4, 4, 4],
'left': [0.0,
-0.0009259259095415473,
0.24444444477558136,
0.5712962746620178,
0.2657407522201538,
0.369228333234787,
0.2739444375038147,
0.44776931405067444],
'top': [0.0,
-0.0009259259095415473,
0.37037035822868347,
0.41296297311782837,
0.41296297311782837,
0.8946287035942078,
0.4549448788166046,
0.40591198205947876],
'width': [1.0,
1.0018517971038818,
0.510185182094574,
0.16296295821666718,
0.16296295821666718,
0.30000001192092896,
0.4990740716457367,
0.11388888955116272],
'height': [1.0,
1.0018517971038818,
0.25833332538604736,
0.004629629664123058,
0.004629629664123058,
0.016611294820904732,
0.12458471953868866,
0.02657807245850563],
'opacity': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
'text': ['', '', '', '', '', 'STAY WITH US', 'FOLLOW', 'PRESS'],
'font': [0, 0, 0, 0, 0, 152, 172, 152],
'font_size': [0.0, 0.0, 0.0, 0.0, 0.0, 18.0, 135.0, 30.0],
'text_align': [0, 0, 0, 0, 0, 2, 2, 2],
'angle': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
'capitalize': [0, 0, 0, 0, 0, 0, 0, 0],
'line_height': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
'letter_spacing': [0.0, 0.0, 0.0, 0.0, 0.0, 14.0, 12.55813980102539, 3.0],
'suitability': [0],
'keywords': ['beautiful',
'beauty',
'blog',
'blogging',
'caucasian',
'cute',
'elegance',
'elegant',
'fashion',
'fashionable',
'femininity',
'glamour',
'hairstyle',
'luxury',
'model',
'stylish',
'vogue',
'website',
'woman',
'post',
'instagram',
'ig',
'insta',
'fashion',
'purple'],
'industries': [1, 8, 13],
'color': [[153.0, 118.0, 96.0],
[34.0, 23.0, 61.0],
[34.0, 23.0, 61.0],
[255.0, 255.0, 255.0],
[255.0, 255.0, 255.0],
[255.0, 255.0, 255.0],
[255.0, 255.0, 255.0],
[255.0, 255.0, 255.0]],
'image': [<PIL.PngImagePlugin.PngImageFile image mode=RGBA size=256x256>,
<PIL.PngImagePlugin.PngImageFile image mode=RGBA size=256x256>,
<PIL.PngImagePlugin.PngImageFile image mode=RGBA size=256x256>,
<PIL.PngImagePlugin.PngImageFile image mode=RGBA size=256x256>,
<PIL.PngImagePlugin.PngImageFile image mode=RGBA size=256x256>,
<PIL.PngImagePlugin.PngImageFile image mode=RGBA size=256x256>,
<PIL.PngImagePlugin.PngImageFile image mode=RGBA size=256x256>,
<PIL.PngImagePlugin.PngImageFile image mode=RGBA size=256x256>]}
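Element attributes are parallel lists with one entry per element (`length` entries each), so it can be convenient to regroup them into one record per element. The following is a minimal sketch; `split_elements` and `ELEMENT_KEYS` are hypothetical helpers (not part of the dataset API), with the key list taken from the Element attributes table.

```python
from typing import Any, Dict, List

# Element-level fields listed in the Element attributes table.
ELEMENT_KEYS = ("type", "left", "top", "width", "height", "color", "opacity",
                "image", "text", "font", "font_size", "text_align", "angle",
                "capitalize", "line_height", "letter_spacing")


def split_elements(example: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn the parallel per-element lists into one dict per element."""
    # Only keep element fields that are actually present in this example.
    keys = [key for key in ELEMENT_KEYS if key in example]
    return [{key: example[key][i] for key in keys} for i in range(example["length"])]
```

Canvas-level fields such as `title` or `canvas_width` are intentionally left out of the per-element records.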
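To get the label of a categorical value, use the int2str method of the corresponding ClassLabel feature. Element-level fields have shape (None,), so their ClassLabel may be wrapped in a sequence feature; the snippet unwraps it first:
```python
key = "font"
example = dataset[0]  # `dataset` is a loaded split of the Crello dataset
feature = dataset.features[key]
# Sequence fields wrap the ClassLabel in `.feature`; scalar fields are ClassLabel already.
class_label = getattr(feature, "feature", feature)
class_label.int2str(example[key])
```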
Data Fields
In the following tables, categorical fields are listed with the categorical type, but they are actually stored as int64.
Canvas attributes
Field | Type | Shape | Description |
---|---|---|---|
id | string | () | Template ID from crello.com |
title | string | () | Title of the template |
group | categorical | () | Broad design groups, such as social media posts or blog headers |
format | categorical | () | Detailed design formats, such as Instagram post or postcard |
category | categorical | () | Topic category of the design, such as holiday celebration |
canvas_width | categorical | () | Canvas pixel width |
canvas_height | categorical | () | Canvas pixel height |
length | int64 | () | Number of elements |
suitability | categorical | (None,) | List of display tags; only the mobile tag exists |
keywords | string | (None,) | List of keywords associated with this template |
industries | categorical | (None,) | List of industry tags, such as marketingAds |
Element attributes
Field | Type | Shape | Description |
---|---|---|---|
type | categorical | (None,) | Element type, such as vector shape, image, or text |
left | float32 | (None,) | Element left position normalized to [0, 1] range w.r.t. canvas_width |
top | float32 | (None,) | Element top position normalized to [0, 1] range w.r.t. canvas_height |
width | float32 | (None,) | Element width normalized to [0, 1] range w.r.t. canvas_width |
height | float32 | (None,) | Element height normalized to [0, 1] range w.r.t. canvas_height |
color | int64 | (None, 3) | Extracted main RGB color of the element |
opacity | float32 | (None,) | Opacity in [0, 1] range |
image | image | (None,) | Pre-rendered 256x256 preview of the element encoded in PNG format |
text | string | (None,) | Text content in UTF-8 encoding for text elements |
font | categorical | (None,) | Font family name for text elements |
font_size | float32 | (None,) | Font size (height) in pixels |
text_align | categorical | (None,) | Horizontal text alignment (left, center, or right) for text elements |
angle | float32 | (None,) | Element rotation angle in radians around the center of the element |
capitalize | categorical | (None,) | Binary flag to capitalize letters |
line_height | float32 | (None,) | Scaling factor for line height; default is 1.0 |
letter_spacing | float32 | (None,) | Adjustment parameter for letter spacing, default is 0.0 |
Note that the extracted colors and pre-rendered images may not accurately reproduce the original design templates. The original template may still be accessible at the following URL:
https://create.vista.com/artboard/?template=<template_id>
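Assuming `<template_id>` is the value of the `id` field (the template ID from crello.com), the URL can be built as in this minimal sketch; `template_url` is a hypothetical helper, and the page may no longer exist.

```python
from urllib.parse import quote


def template_url(template_id: str) -> str:
    """Build the VistaCreate artboard URL for a template ID (the `id` field)."""
    # Percent-encode defensively; hex IDs like the example are left unchanged.
    return "https://create.vista.com/artboard/?template=" + quote(template_id)
```

For the example instance above, this yields `https://create.vista.com/artboard/?template=592d6c2c95a7a863ddcda140`.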
left and top can be negative because elements can extend beyond the canvas; likewise, width and height can exceed 1.
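Because left, top, width, and height are normalized by the canvas size and may fall outside [0, 1], a pixel box is obtained by rescaling, optionally followed by clipping to the visible canvas. This is a minimal sketch; the helper names are hypothetical, and the canvas pixel size is assumed to come from decoding canvas_width and canvas_height with int2str.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (left, top, width, height)


def to_pixel_box(left: float, top: float, width: float, height: float,
                 canvas_width: int, canvas_height: int) -> Box:
    """Scale a normalized element box to canvas pixel units."""
    # Horizontal values are relative to canvas_width, vertical ones to canvas_height.
    return (left * canvas_width, top * canvas_height,
            width * canvas_width, height * canvas_height)


def clip_to_canvas(box: Box, canvas_width: int, canvas_height: int) -> Box:
    """Clip a pixel box to the visible canvas area (left/top may be negative)."""
    left, top, width, height = box
    x0, y0 = max(left, 0.0), max(top, 0.0)
    x1 = min(left + width, float(canvas_width))
    y1 = min(top + height, float(canvas_height))
    # Boxes entirely outside the canvas collapse to zero width/height.
    return (x0, y0, max(x1 - x0, 0.0), max(y1 - y0, 0.0))
```

Clipping is only appropriate for analysis of visible regions; rendering should use the unclipped box so that oversized elements are drawn correctly.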
Data Splits
The Crello dataset has 3 splits: train, validation, and test. The splits are generated so that all templates sharing the same title appear in only one split.
Split | Count |
---|---|
train | 18659 |
validation | 2391 |
test | 2371 |
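The title-based split criterion can be checked after loading the splits. This is a minimal sketch; `overlapping_titles` is a hypothetical helper. With a loaded DatasetDict `ds`, it would be called as `overlapping_titles({name: ds[name]["title"] for name in ds})`, which should return an empty list if the criterion holds.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Tuple


def overlapping_titles(splits: Dict[str, Iterable[str]]) -> List[Tuple[str, str, str]]:
    """Return (split_a, split_b, title) for every title found in two different splits."""
    title_sets = {name: set(titles) for name, titles in splits.items()}
    overlaps = []
    # Compare every unordered pair of splits once, in a deterministic order.
    for a, b in combinations(sorted(title_sets), 2):
        for title in sorted(title_sets[a] & title_sets[b]):
            overlaps.append((a, b, title))
    return overlaps
```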
Visualization
Each example can be visualized with skia-python as follows. Note that the result is not guaranteed to look like the original template; in particular, the quality of text rendering is currently far from perfect.
```python
import io
from typing import Any, Dict

import datasets
import numpy as np
import skia


def render(features: datasets.Features, example: Dict[str, Any], max_size: float = 512.) -> bytes:
    """Render parsed sequence example onto an image and return as PNG bytes."""
    # Canvas sizes are categorical; decode the label string to get pixels.
    canvas_width = int(features["canvas_width"].int2str(example["canvas_width"]))
    canvas_height = int(features["canvas_height"].int2str(example["canvas_height"]))

    # Downscale so that neither side exceeds max_size (never upscale).
    scale = min(1.0, max_size / canvas_width, max_size / canvas_height)
    surface = skia.Surface(int(scale * canvas_width), int(scale * canvas_height))
    with surface as canvas:
        canvas.scale(scale, scale)
        for index in range(example["length"]):
            # Convert the pre-rendered PIL element image to a skia image.
            pil_image = example["image"][index]
            image = skia.Image.frombytes(
                pil_image.convert('RGBA').tobytes(),
                pil_image.size,
                skia.kRGBA_8888_ColorType)

            # Positions and sizes are normalized; map them back to canvas pixels.
            left = example["left"][index] * canvas_width
            top = example["top"][index] * canvas_height
            width = example["width"][index] * canvas_width
            height = example["height"][index] * canvas_height
            rect = skia.Rect.MakeXYWH(left, top, width, height)

            paint = skia.Paint(Alphaf=example["opacity"][index], AntiAlias=True)
            angle = example["angle"][index]
            with skia.AutoCanvasRestore(canvas):
                # Rotate about the element center; skia expects degrees.
                if angle != 0:
                    degree = 180. * angle / np.pi
                    canvas.rotate(degree, left + width / 2., top + height / 2.)
                canvas.drawImageRect(image, rect, paint=paint)

    image = surface.makeImageSnapshot()
    with io.BytesIO() as f:
        image.save(f, skia.kPNG)
        return f.getvalue()
```
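The render function rotates each element about its own center after converting the stored angle from radians to degrees. To reason about the rotated geometry without skia (for example, to compute an element's on-canvas bounding box), a minimal NumPy sketch could look like the following; `rotated_corners` is a hypothetical helper and assumes the same convention as skia's `Canvas.rotate` in y-down image coordinates, where a positive angle turns clockwise on screen.

```python
import numpy as np


def rotated_corners(left: float, top: float, width: float, height: float,
                    angle: float) -> np.ndarray:
    """Return the 4 corners (x, y) of a box rotated by `angle` radians about its center."""
    cx, cy = left + width / 2.0, top + height / 2.0
    # Corners in order: top-left, top-right, bottom-right, bottom-left.
    corners = np.array([[left, top], [left + width, top],
                        [left + width, top + height], [left, top + height]], dtype=float)
    c, s = np.cos(angle), np.sin(angle)
    rotation = np.array([[c, -s], [s, c]])
    # Move the center to the origin, rotate each row vector, then move back.
    return (corners - [cx, cy]) @ rotation.T + [cx, cy]
```

`corners.min(axis=0)` and `corners.max(axis=0)` then give the axis-aligned bounding box of the rotated element.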
Dataset Creation
Curation Rationale
The Crello dataset is compiled for the general study of vector graphic documents, with the goal of producing a dataset that offers complete vector graphic information suitable for neural methodologies.
Source Data
Initial Data Collection and Normalization
The dataset was scraped from the former crello.com and pre-processed into the format described above.
Who are the source language producers?
While create.vista.com owns these templates, they appear to have been originally created by a specific group of design studios.
Personal and Sensitive Information
The dataset does not contain any personal information about the creators, but templates may contain pictures of people.
Considerations for Using the Data
Social Impact of Dataset
This dataset was developed to advance the general study of vector graphic documents, especially generative systems for graphic design. Successful use of the dataset might enable the automation of creative workflows that currently involve human designers.
Discussion of Biases
The templates in the dataset reflect biases present in the source data, which may include gender biases in specific design categories.
Other Known Limitations
Because the specification of the source data is unknown, the extracted colors and pre-rendered images may not accurately reproduce the original design templates. The original template may still be accessible at the following URL:
https://create.vista.com/artboard/?template=<template_id>
Additional Information
Dataset Curators
The Crello dataset was developed by Kota Yamaguchi.
Licensing Information
The origin of the dataset is create.vista.com (formerly crello.com).
The distributor ("We") does not own the copyrights of the original design templates.
By using the Crello dataset, the user of this dataset ("You") must agree to the VistaCreate License Agreements.
The dataset is distributed under CDLA-Permissive-2.0 license.
Note
We do not redistribute the original files, as the terms do not allow it.
Citation Information
@article{yamaguchi2021canvasvae,
title={CanvasVAE: Learning to Generate Vector Graphic Documents},
author={Yamaguchi, Kota},
journal={ICCV},
year={2021}
}
Releases
3.1: bugfix release (Feb 16, 2023)
- Fix a bug that ignores newline characters in some of the texts
3.0: v3 release (Feb 13, 2023)
- Migrate to Model Database Hub.
- Fix various text rendering bugs.
- Change split generation criteria for avoiding near-duplicates: no compatibility with v2 splits.
- Incorporate a motion picture thumbnail in templates.
- Add title, keywords, suitability, and industries canvas attributes.
- Add capitalize, line_height, and letter_spacing element attributes.
2.0: v2 release (May 26, 2022)
- Add text, font, font_size, text_align, and angle element attributes.
- Include rendered text elements in image_bytes.
1.0: v1 release (Aug 24, 2021)
Contributions
Thanks to @kyamagu for adding this dataset.