VQDiffusionScheduler
VQDiffusionScheduler converts the transformer model’s output into a sample for the unnoised image at the previous diffusion timestep. It was introduced in Vector Quantized Diffusion Model for Text-to-Image Synthesis by Shuyang Gu, Dong Chen, Jianmin Bao, Fang Wen, Bo Zhang, Dongdong Chen, Lu Yuan, Baining Guo.
The abstract from the paper is:
We present the vector quantized diffusion (VQ-Diffusion) model for text-to-image generation. This method is based on a vector quantized variational autoencoder (VQ-VAE) whose latent space is modeled by a conditional variant of the recently developed Denoising Diffusion Probabilistic Model (DDPM). We find that this latent-space method is well-suited for text-to-image generation tasks because it not only eliminates the unidirectional bias with existing methods but also allows us to incorporate a mask-and-replace diffusion strategy to avoid the accumulation of errors, which is a serious problem with existing methods. Our experiments show that the VQ-Diffusion produces significantly better text-to-image generation results when compared with conventional autoregressive (AR) models with similar numbers of parameters. Compared with previous GAN-based text-to-image methods, our VQ-Diffusion can handle more complex scenes and improve the synthesized image quality by a large margin. Finally, we show that the image generation computation in our method can be made highly efficient by reparameterization. With traditional AR methods, the text-to-image generation time increases linearly with the output image resolution and hence is quite time consuming even for normal size images. The VQ-Diffusion allows us to achieve a better trade-off between quality and speed. Our experiments indicate that the VQ-Diffusion model with the reparameterization is fifteen times faster than traditional AR methods while achieving a better image quality.
VQDiffusionScheduler
class diffusers.VQDiffusionScheduler
( num_vec_classes: int, num_train_timesteps: int = 100, alpha_cum_start: float = 0.99999, alpha_cum_end: float = 9e-06, gamma_cum_start: float = 9e-06, gamma_cum_end: float = 0.99999 )
Parameters

num_vec_classes (int) — The number of classes of the vector embeddings of the latent pixels. Includes the class for the masked latent pixel.
num_train_timesteps (int, defaults to 100) — The number of diffusion steps to train the model.
alpha_cum_start (float, defaults to 0.99999) — The starting cumulative alpha value.
alpha_cum_end (float, defaults to 9e-06) — The ending cumulative alpha value.
gamma_cum_start (float, defaults to 9e-06) — The starting cumulative gamma value.
gamma_cum_end (float, defaults to 0.99999) — The ending cumulative gamma value.
A scheduler for vector quantized diffusion.
This scheduler inherits from SchedulerMixin and ConfigMixin. Check the superclass documentation for the generic methods the library implements for all schedulers, such as loading and saving.
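The four cumulative start and end values above describe how the keep (alpha) and mask (gamma) probabilities evolve over training timesteps. The following is a minimal sketch, not the library's code: it assumes each cumulative value is interpolated linearly between its start and end, and that the remaining probability mass is spread uniformly over the unmasked classes as a replace probability. The function names and the `beta_cum` quantity are hypothetical and used only for illustration.

```python
def linear_cumulative_schedule(start, end, num_steps):
    """Linearly interpolate a cumulative value from `start` to `end` (assumed schedule shape)."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    return [start + (end - start) * i / (num_steps - 1) for i in range(num_steps)]


def cumulative_transition_probs(
    num_vec_classes,
    num_train_timesteps=100,
    alpha_cum_start=0.99999,
    alpha_cum_end=9e-06,
    gamma_cum_start=9e-06,
    gamma_cum_end=0.99999,
):
    """Return per-timestep cumulative (alpha, beta, gamma) lists.

    alpha: extra probability that an unmasked pixel keeps its class
    beta:  probability of landing on any one given unmasked class (assumption)
    gamma: probability that an unmasked pixel has become masked
    """
    # num_vec_classes includes the mask class, so one fewer class is "unmasked".
    num_unmasked = num_vec_classes - 1
    alpha_cum = linear_cumulative_schedule(alpha_cum_start, alpha_cum_end, num_train_timesteps)
    gamma_cum = linear_cumulative_schedule(gamma_cum_start, gamma_cum_end, num_train_timesteps)
    # Whatever mass is left after keeping and masking is shared by the unmasked classes.
    beta_cum = [(1.0 - a - g) / num_unmasked for a, g in zip(alpha_cum, gamma_cum)]
    return alpha_cum, beta_cum, gamma_cum
```

Under these assumptions, `alpha + num_unmasked * beta + gamma` equals 1 at every timestep, so each row of the implied transition matrix is a valid probability distribution.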
log_Q_t_transitioning_to_known_class
( t: torch.int32, x_t: LongTensor, log_onehot_x_t: FloatTensor, cumulative: bool ) → torch.FloatTensor of shape (batch size, num classes - 1, num latent pixels)
Parameters

t (torch.Long) — The timestep that determines which transition matrix is used.
x_t (torch.LongTensor of shape (batch size, num latent pixels)) — The classes of each latent pixel at time t.
log_onehot_x_t (torch.FloatTensor of shape (batch size, num classes, num latent pixels)) — The log one-hot vectors of x_t.
cumulative (bool) — If cumulative is False, the single-step transition matrix t-1->t is used. If cumulative is True, the cumulative transition matrix 0->t is used.
Returns
torch.FloatTensor of shape (batch size, num classes - 1, num latent pixels)
Each column of the returned matrix is a row of log probabilities of the complete probability transition matrix.
When cumulative, returns self.num_classes - 1 rows because the initial latent pixel (x_0) cannot be masked.
Where:
q_n is the probability distribution for the forward process of the n-th latent pixel.
C_0 is a class of a latent pixel embedding.
C_k is the class of the masked latent pixel.
Non-cumulative result (omitting logarithms):
Cumulative result (omitting logarithms):
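The two result layouts named here can be written entrywise. This is a hedged reconstruction from the definitions of q_n and the classes C_i, not a copy of the library's docstring. The symbols M and \bar{M} and the row index i are introduced only for illustration: row i ranges over the conditioning classes C_i, and column n over the latent pixels.

```latex
% Non-cumulative result (single step t-1 -> t), entry in row i, column n:
M_{i,n} = q_n\left(x_t \mid x_{t-1} = C_i\right)

% Cumulative result (0 -> t), entry in row i, column n:
\bar{M}_{i,n} = q_n\left(x_t \mid x_0 = C_i\right)
```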
Calculates the log probabilities of the rows from the (cumulative or non-cumulative) transition matrix for each latent pixel in x_t.
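To make the row selection concrete, here is a small pure-Python sketch for a single batch element. It assumes a mask-and-replace transition matrix in which an unmasked pixel keeps its class, is replaced by another class, or becomes masked, and a masked pixel stays masked. The helper names, the probability values, and the `1e-30` floor used in place of log(0) are assumptions for illustration; the library operates on batched log-space tensors.

```python
import math


def mask_and_replace_matrix(num_vec_classes, alpha, beta, gamma):
    """Build Q with Q[i][j] = q(x_t = j | x_{t-1} = i); the last index is the mask class."""
    mask = num_vec_classes - 1
    Q = [[0.0] * num_vec_classes for _ in range(num_vec_classes)]
    for i in range(mask):
        for j in range(mask):
            # Keep the class with probability alpha + beta, switch with probability beta.
            Q[i][j] = alpha + beta if i == j else beta
        Q[i][mask] = gamma  # an unmasked pixel becomes masked
    Q[mask][mask] = 1.0  # a masked pixel stays masked
    return Q


def log_transition_to_known_class(Q, x_t):
    """Return a (num classes, num latent pixels) table of log q(x_t[n] | previous class = C_i)."""
    floor = 1e-30  # stand-in for log(0); an assumption of this sketch
    return [[math.log(max(Q[i][x], floor)) for x in x_t] for i in range(len(Q))]
```

For example, with `Q = mask_and_replace_matrix(4, 0.7, 0.05, 0.15)` and `x_t = [0, 3, 2]`, column `n` of the result lists, for every possible previous class, the log probability of arriving at the observed class `x_t[n]`.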
q_posterior
( log_p_x_0, x_t, t ) → torch.FloatTensor of shape (batch size, num classes, num latent pixels)
Parameters

log_p_x_0 (torch.FloatTensor of shape (batch size, num classes - 1, num latent pixels)) — The log probabilities for the predicted classes of the initial latent pixels. Does not include a prediction for the masked class as the initial unnoised image cannot be masked.
x_t (torch.LongTensor of shape (batch size, num latent pixels)) — The classes of each latent pixel at time t.
t (torch.Long) — The timestep that determines which transition matrix is used.
Returns
torch.FloatTensor of shape (batch size, num classes, num latent pixels)
The log probabilities for the predicted classes of the image at timestep t-1.
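One way to read this method is as a Bayes-rule marginalization over the predicted initial class: p(x_{t-1} | x_t) = sum over x_0 of q(x_{t-1} | x_t, x_0) p(x_0 | x_t), with q(x_{t-1} | x_t, x_0) proportional to q(x_t | x_{t-1}) q(x_{t-1} | x_0). The sketch below works on a single latent pixel in plain probability space rather than log space. All helper names and numeric values are hypothetical, and the mask-and-replace matrix form is an assumption of the example rather than a statement about the library internals.

```python
def mask_and_replace_matrix(num_vec_classes, alpha, beta, gamma):
    """Build Q with Q[i][j] = q(x_t = j | x_{t-1} = i); the last index is the mask class."""
    mask = num_vec_classes - 1
    Q = [[0.0] * num_vec_classes for _ in range(num_vec_classes)]
    for i in range(mask):
        for j in range(mask):
            Q[i][j] = alpha + beta if i == j else beta
        Q[i][mask] = gamma
    Q[mask][mask] = 1.0  # a masked pixel stays masked
    return Q


def matmul(A, B):
    """Plain nested-list matrix product."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def posterior_previous_step(p_x0, x_t, Q_t, Qbar_prev):
    """Return p(x_{t-1} = j | x_t) for every class j, including the mask class.

    p_x0:      predicted probabilities over the unmasked classes (length num classes - 1)
    x_t:       observed class index of the pixel at time t
    Q_t:       single-step matrix for t-1 -> t
    Qbar_prev: cumulative matrix for 0 -> t-1
    """
    num_classes = len(Q_t)
    Qbar_t = matmul(Qbar_prev, Q_t)  # cumulative matrix for 0 -> t
    posterior = [0.0] * num_classes
    for k, p_k in enumerate(p_x0):
        denom = Qbar_t[k][x_t]  # q(x_t | x_0 = k)
        if denom == 0.0:
            continue  # this x_0 cannot have produced the observed x_t
        for j in range(num_classes):
            # q(x_t | x_{t-1} = j) * q(x_{t-1} = j | x_0 = k) / q(x_t | x_0 = k)
            posterior[j] += p_k * Q_t[j][x_t] * Qbar_prev[k][j] / denom
    return posterior
```

Because the cumulative matrix for 0 -> t is the product of the cumulative matrix for 0 -> t-1 and the single-step matrix, the returned values sum to 1 whenever `p_x0` does.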
set_timesteps
( num_inference_steps: int, device: typing.Union[str, torch.device] = None )
Sets the discrete timesteps used for the diffusion chain (to be run before inference).
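As a rough illustration of what a discrete timestep list for the reverse chain looks like, the sketch below assumes the timesteps simply count down from num_inference_steps - 1 to 0. The function name is hypothetical, and the device placement performed by the real method is omitted.

```python
def inference_timesteps(num_inference_steps):
    """Return timesteps in the order the reverse chain visits them (assumed: descending to 0)."""
    if num_inference_steps < 1:
        raise ValueError("num_inference_steps must be positive")
    return list(range(num_inference_steps - 1, -1, -1))
```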
step
( model_output: FloatTensor, timestep: torch.int64, sample: LongTensor, generator: typing.Optional[torch._C.Generator] = None, return_dict: bool = True ) → VQDiffusionSchedulerOutput or tuple
Parameters

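model_output (torch.FloatTensor of shape (batch size, num classes - 1, num latent pixels)) — The log probabilities for the predicted classes of the initial latent pixels. Does not include a prediction for the masked class as the initial unnoised image cannot be masked.
timestep (torch.long) — The timestep t that determines which transition matrices are used.
sample (torch.LongTensor of shape (batch size, num latent pixels)) — The classes of each latent pixel at time t (x_t).
generator (torch.Generator, or None) — A random number generator for the noise applied to p(x_{t-1} | x_t) before it is sampled from.
return_dict (bool, optional, defaults to True) — Whether or not to return a VQDiffusionSchedulerOutput or tuple.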
Returns
VQDiffusionSchedulerOutput or tuple
If return_dict is True, VQDiffusionSchedulerOutput is returned, otherwise a tuple is returned where the first element is the sample tensor.
Predict the sample from the previous timestep by the reverse transition distribution. See q_posterior() for more details about how the distribution is computed.
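The generator parameter refers to noise applied to the log probabilities before sampling. A common way to do this is the Gumbel-max trick: add independent Gumbel noise to each log probability and take the argmax. The sketch below shows the trick for one latent pixel using Python's standard library. The function name and the small `1e-30` stabilizers are assumptions of the example, and whether the library uses exactly this form is not confirmed here.

```python
import math
import random


def sample_with_gumbel_noise(log_probs, rng):
    """Draw one class index from `log_probs` by adding Gumbel noise and taking the argmax."""
    best_index, best_value = 0, -math.inf
    for index, log_p in enumerate(log_probs):
        u = rng.random()  # uniform in [0, 1)
        # Inverse-CDF transform of a uniform draw into a Gumbel(0, 1) draw.
        gumbel = -math.log(-math.log(u + 1e-30) + 1e-30)
        value = log_p + gumbel
        if value > best_value:
            best_index, best_value = index, value
    return best_index
```

Passing a seeded `random.Random` plays the role of the generator argument: the same seed and the same log probabilities give the same sampled class.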
VQDiffusionSchedulerOutput
class diffusers.schedulers.scheduling_vq_diffusion.VQDiffusionSchedulerOutput
( prev_sample: LongTensor )
Output class for the scheduler’s step function output.