What models can I use for Text-to-Image?

The CompVis/stable-diffusion-v1-4, dalle-mini/dalle-mega, DeepFloyd/IF-I-XL-v1.0, and kakaobrain/karlo-v1-alpha models can be used for Text-to-Image.

What datasets can I use for Text-to-Image?

The red_caps and conceptual_captions datasets can be used for Text-to-Image.

What metrics can I use for Text-to-Image?

The IS, FID, and R-Precision metrics can be used for Text-to-Image.

Text-to-Image

Generates images from input text. These models can be used to generate and modify images based on text prompts.

Example

Input: A city above clouds, pastel colors, Victorian style

Model: Text-to-Image Model

Output: generated image

About Text-to-Image

Use Cases

Data Generation

Businesses can generate image data for their use cases by providing text prompts and collecting the resulting images.

Immersive Conversational Chatbots

Chatbots can be made more immersive if they provide contextual images based on the input provided by the user.

Creative Ideas for Fashion Industry

Different patterns can be generated to obtain unique pieces of fashion. Text-to-image models make it easier for designers to conceptualize a design before actually implementing it.

Architecture Industry

Architects can use the models to construct an environment based on the requirements of a floor plan. This can also include the furniture to be placed in that environment.

Task Variants

You can contribute variants of this task here.

Inference

You can use diffusers pipelines to run inference with text-to-image models.

import torch
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

model_id = "stabilityai/stable-diffusion-2"

# Load the Euler scheduler configuration stored in the model repository
scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
# Load the pipeline in half precision and move it to the GPU
pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# Generate one image from the prompt
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]

You can use huggingface.js to run inference with text-to-image models on the Hugging Face Hub.

import { HfInference } from "@huggingface/inference";

// HF_ACCESS_TOKEN must hold your own access token
const inference = new HfInference(HF_ACCESS_TOKEN);

// Request an image for the prompt, steering away from blurry results
const image = await inference.textToImage({
  model: 'stabilityai/stable-diffusion-2',
  inputs: 'award winning high resolution photo of a giant tortoise/((ladybird)) hybrid, [trending on artstation]',
  parameters: {
    negative_prompt: 'blurry',
  }
})

Useful Resources

This page was made possible thanks to the efforts of Ishan Dutta, Enrique Elias Ubaldo and Oğuz Akif.

Models for Text-to-Image

Browse Models (9,710)

CompVis/stable-diffusion-v1-4

Text-to-Image • Updated 28 days ago • 608k • 5.93k

Note A latent text-to-image diffusion model capable of generating photo-realistic images given any text input.

dalle-mini/dalle-mega

Text-to-Image • Updated Jan 11 • 20 • 136

Note A model that can be used to generate images based on text prompts. The DALL·E Mega model is the largest version of DALL·E Mini.

DeepFloyd/IF-I-XL-v1.0

Text-to-Image • Updated Jun 2 • 64.7k • 502

Note A text-to-image model that can generate coherent text inside the image.

kakaobrain/karlo-v1-alpha

Text-to-Image • Updated Feb 6 • 2.3k • 76

Note A powerful text-to-image model.

Datasets for Text-to-Image

Browse Datasets (2,428)

red_caps

Viewer • Updated Jan 25 • 294k • 43

Note RedCaps is a large-scale dataset of 12M image-text pairs collected from Reddit.

conceptual_captions

Viewer • Updated Nov 3, 2022 • 1.02k • 35

Note Conceptual Captions is a dataset consisting of ~3.3M images annotated with captions.

Spaces using Text-to-Image

🔥

stabilityai/stable-diffusion

Note A powerful text-to-image application.

🔥

DeepFloyd/IF

Note A text-to-image application that can generate coherent text inside the image.

🖌️🎨

kakaobrain/karlo

Note A powerful text-to-image application that can generate images.

🧢

hysts/Shap-E

Note A powerful text-to-image application that can generate 3D representations.

Metrics for Text-to-Image

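IS: The Inception Score (IS) assesses the diversity and meaningfulness of generated images. It feeds generated images to a pretrained Inception classifier and examines the predicted label distributions: confident predictions for individual images indicate meaningful content, and varied predictions across images indicate diversity. A higher score signifies more diverse and meaningful images.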
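As a rough illustration of the score described above, the sketch below computes IS from class probabilities that are assumed to have already been produced by a classifier. The function name `inception_score` and the input layout (one probability list per generated image) are assumptions made for this example, and no Inception network is run here.

```python
import math


def inception_score(probs):
    """Compute exp(mean KL(p(y|x) || p(y))) from per-image class probabilities.

    probs: list of lists, one per generated image, each summing to 1.
    (Hypothetical input layout; a real pipeline would get these from a
    pretrained Inception classifier.)
    """
    n = len(probs)
    num_classes = len(probs[0])
    # Marginal label distribution p(y): average of the per-image predictions.
    marginal = [sum(p[k] for p in probs) / n for k in range(num_classes)]
    # Mean KL divergence between each p(y|x) and the marginal p(y).
    mean_kl = 0.0
    for p in probs:
        mean_kl += sum(
            pk * math.log(pk / marginal[k]) for k, pk in enumerate(p) if pk > 0
        )
    mean_kl /= n
    return math.exp(mean_kl)


# Confident and diverse predictions score higher than collapsed ones.
diverse = inception_score([[1.0, 0.0], [0.0, 1.0]])
collapsed = inception_score([[1.0, 0.0], [1.0, 0.0]])
```

With two classes, the best achievable score in this toy setting is 2 (every image confidently assigned, both classes equally represented), while a generator that always produces the same class scores 1.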

FID: The Fréchet Inception Distance (FID) measures the distance between the feature distributions of synthetic and real samples. A lower FID score indicates closer similarity between the distributions of real and generated images.
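The distance above is usually computed by fitting a Gaussian to each set of image features and comparing the two Gaussians. The sketch below is a simplified version that assumes the feature dimensions are independent (diagonal covariance), which avoids the matrix square root needed in the full formula. The function name `frechet_distance_diagonal`, the use of population variance, and the tiny feature vectors are assumptions for illustration only; real FID uses full covariance matrices of Inception features.

```python
import math
from statistics import fmean, pvariance


def frechet_distance_diagonal(real_feats, gen_feats):
    """Squared Fréchet distance between two Gaussians with diagonal covariance.

    Per dimension: (mu_r - mu_g)^2 + var_r + var_g - 2 * sqrt(var_r * var_g).
    This is a simplification of FID, not the full-covariance metric.
    """
    dims = len(real_feats[0])
    total = 0.0
    for k in range(dims):
        real_col = [f[k] for f in real_feats]
        gen_col = [f[k] for f in gen_feats]
        mu_r, mu_g = fmean(real_col), fmean(gen_col)
        var_r, var_g = pvariance(real_col), pvariance(gen_col)
        total += (mu_r - mu_g) ** 2 + var_r + var_g - 2.0 * math.sqrt(var_r * var_g)
    return total


# Hypothetical 1-D features: same spread, generated mean shifted by 3.
real = [[0.0], [2.0]]
generated = [[3.0], [5.0]]
distance = frechet_distance_diagonal(real, generated)
```

Identical feature sets give a distance of zero, and the distance grows as the means or variances of the two sets drift apart.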

R-Precision: R-precision assesses how well the generated image aligns with the provided text description. The generated image is used as a query to rank candidate text descriptions. If r of the top R retrieved descriptions are relevant, where R is the number of ground-truth descriptions associated with the image, R-precision is r/R. A higher R-precision value indicates better alignment between images and text.
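The retrieval step described above can be sketched as follows, assuming that images and descriptions have already been embedded into a shared vector space and are ranked by cosine similarity. The function names, the embedding vectors, and the choice of cosine similarity are assumptions for this example rather than part of a specific library.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def r_precision(image_emb, text_embs, relevant_ids):
    """Return r / R for one generated image.

    image_emb: embedding of the generated image (the query).
    text_embs: embeddings of all candidate descriptions.
    relevant_ids: set of indices of the ground-truth descriptions (size R).
    """
    big_r = len(relevant_ids)
    # Rank candidate descriptions from most to least similar to the image.
    ranked = sorted(
        range(len(text_embs)),
        key=lambda i: cosine(image_emb, text_embs[i]),
        reverse=True,
    )
    # r = number of ground-truth descriptions among the top R results.
    r = sum(1 for i in ranked[:big_r] if i in relevant_ids)
    return r / big_r


# Hypothetical 2-D embeddings: candidates 0 and 2 are closest to the image.
image = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
score = r_precision(image, candidates, {0, 2})
```

In practice the value is averaged over many generated images, and the result depends on the encoder used to produce the embeddings.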