Models
Generic model classes
NeuronBaseModel
The NeuronBaseModel class is available for instantiating a base Neuron model without a specific head.
It is used as the base class for all tasks but text generation.
class optimum.neuron.NeuronBaseModel
< source >( model: ScriptModule config: PretrainedConfig model_save_dir: Union = None model_file_name: Optional = None preprocessors: Optional = None neuron_config: Optional = None **kwargs )
Base class running compiled and optimized models on Neuron devices.
It implements generic methods for interacting with the Model Database Hub as well as compiling vanilla
transformers models to Neuron-optimized TorchScript modules and exporting them using the optimum.exporters.neuron
toolchain.
Class attributes:
- model_type (str, optional, defaults to "neuron_model") — The name of the model type to use when registering the NeuronBaseModel classes.
- auto_model_class (Type, optional, defaults to AutoModel) — The AutoModel class to be represented by the current NeuronBaseModel class.
Common attributes:
- model (torch.jit._script.ScriptModule) — The loaded ScriptModule compiled for Neuron devices.
- config (PretrainedConfig) — The configuration of the model.
- model_save_dir (Path) — The directory where a Neuron compiled model is saved. By default, if the loaded model is local, the directory of the original model is used. Otherwise, the cache directory is used.
Gets a dictionary of inputs with their valid static shapes.
load_model
< source >( path: Union )
Loads a TorchScript module compiled by the neuron(x)-cc compiler. It is first loaded onto the CPU and then moved to one or more NeuronCores.
remove_padding
< source >( outputs: List dims: List indices: List )
Removes padding from output tensors.
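As a rough illustration of why padding removal is needed: a compiled Neuron model has static input shapes, so a smaller input is padded before inference and the extra entries are trimmed from the outputs afterwards. The sketch below uses plain Python lists and hypothetical helpers (pad_batch, strip_padding, fake_model). It is not the library's implementation and does not reproduce the exact semantics of the dims and indices arguments.

```python
from typing import List


def pad_batch(batch: List[List[int]], static_batch_size: int, pad_row: List[int]) -> List[List[int]]:
    """Pad a batch with copies of pad_row until it reaches the static batch size."""
    if len(batch) > static_batch_size:
        raise ValueError("batch is larger than the compiled static batch size")
    return batch + [list(pad_row) for _ in range(static_batch_size - len(batch))]


def strip_padding(outputs: List[List[float]], real_batch_size: int) -> List[List[float]]:
    """Keep only the output rows that correspond to real (non-padded) inputs."""
    return outputs[:real_batch_size]


def fake_model(batch: List[List[int]]) -> List[List[float]]:
    """Stand-in for a compiled model: returns one output row per input row."""
    return [[float(sum(row))] for row in batch]


batch = [[1, 2, 3], [4, 5, 6]]  # 2 real inputs
padded = pad_batch(batch, static_batch_size=4, pad_row=[0, 0, 0])  # compiled for batch size 4
outputs = strip_padding(fake_model(padded), real_batch_size=len(batch))
```

Only the first two output rows survive, matching the two real inputs.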
NeuronDecoderModel
The NeuronDecoderModel class is the base class for text generation models.
class optimum.neuron.NeuronDecoderModel
< source >( model: Module config: PretrainedConfig model_path: Union generation_config: Optional = None )
Base class to convert and run pre-trained transformers decoder models on Neuron devices.
It implements the methods to convert a pre-trained transformers decoder model into a Neuron transformer model by:
- transferring the checkpoint weights of the original into an optimized neuron graph,
- compiling the resulting graph using the Neuron compiler.
Common attributes:
- model (torch.nn.Module) — The decoder model with a graph optimized for Neuron devices.
- config (PretrainedConfig) — The configuration of the original model.
- generation_config (GenerationConfig) — The generation configuration used by default when calling generate().
Natural Language Processing
The following Neuron model classes are available for natural language processing tasks.
NeuronModelForFeatureExtraction
class optimum.neuron.NeuronModelForFeatureExtraction
< source >( model: ScriptModule config: PretrainedConfig model_save_dir: Union = None model_file_name: Optional = None preprocessors: Optional = None neuron_config: Optional = None **kwargs )
Parameters
- config (transformers.PretrainedConfig) — PretrainedConfig is the model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the optimum.neuron.modeling.NeuronBaseModel.from_pretrained method to load the model weights.
- model (torch.jit._script.ScriptModule) — torch.jit._script.ScriptModule is the TorchScript graph compiled by the neuron(x) compiler.
Neuron Model with a BaseModelOutput for feature-extraction tasks.
This model inherits from NeuronBaseModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).
Feature Extraction model on Neuron devices.
forward
< source >( input_ids: Tensor attention_mask: Tensor token_type_ids: Optional = None **kwargs )
Parameters
- input_ids (torch.Tensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode and PreTrainedTokenizer.__call__ for details. What are input IDs?
- attention_mask (Union[torch.Tensor, None] of shape (batch_size, sequence_length), defaults to None) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
  - 1 for tokens that are not masked,
  - 0 for tokens that are masked. What are attention masks?
- token_type_ids (Union[torch.Tensor, None] of shape (batch_size, sequence_length), defaults to None) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
  - 0 for tokens that belong to sentence A,
  - 1 for tokens that belong to sentence B. What are token type IDs?
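To make the attention mask convention concrete, the following sketch pads two token-ID sequences to a common length and builds the matching mask by hand. The token IDs, the pad ID, and the helper pad_and_mask are made up for illustration. In practice the tokenizer returns these values.

```python
def pad_and_mask(sequences, pad_id=0):
    """Right-pad every sequence to the longest length and build the attention mask."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks a padding position to be ignored by attention
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


input_ids, attention_mask = pad_and_mask([[101, 7592, 102], [101, 7592, 2088, 999, 102]])
```

The shorter sequence ends up with two trailing zeros in its mask row, one per padding position.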
The NeuronModelForFeatureExtraction forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this function, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of feature extraction: (The following model is compiled with the neuronx compiler and can only be run on INF2. Replace "neuronx" with "neuron" if you are using INF1.)
>>> from transformers import AutoTokenizer
>>> from optimum.neuron import NeuronModelForFeatureExtraction
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/all-MiniLM-L6-v2-neuronx")
>>> model = NeuronModelForFeatureExtraction.from_pretrained("optimum/all-MiniLM-L6-v2-neuronx")
>>> inputs = tokenizer("Dear Evan Hansen is the winner of six Tony Awards.", return_tensors="pt")
>>> outputs = model(**inputs)
>>> last_hidden_state = outputs.last_hidden_state
>>> list(last_hidden_state.shape)
[1, 13, 384]
NeuronModelForMaskedLM
class optimum.neuron.NeuronModelForMaskedLM
< source >( model: ScriptModule config: PretrainedConfig model_save_dir: Union = None model_file_name: Optional = None preprocessors: Optional = None neuron_config: Optional = None **kwargs )
Parameters
- config (transformers.PretrainedConfig) — PretrainedConfig is the model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the optimum.neuron.modeling.NeuronBaseModel.from_pretrained method to load the model weights.
- model (torch.jit._script.ScriptModule) — torch.jit._script.ScriptModule is the TorchScript graph compiled by the neuron(x) compiler.
Neuron Model with a MaskedLMOutput for masked language modeling tasks.
This model inherits from NeuronBaseModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).
Masked language model on Neuron devices.
forward
< source >( input_ids: Tensor attention_mask: Tensor token_type_ids: Optional = None **kwargs )
Parameters
- input_ids (torch.Tensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode and PreTrainedTokenizer.__call__ for details. What are input IDs?
- attention_mask (Union[torch.Tensor, None] of shape (batch_size, sequence_length), defaults to None) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
  - 1 for tokens that are not masked,
  - 0 for tokens that are masked. What are attention masks?
- token_type_ids (Union[torch.Tensor, None] of shape (batch_size, sequence_length), defaults to None) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
  - 0 for tokens that belong to sentence A,
  - 1 for tokens that belong to sentence B. What are token type IDs?
The NeuronModelForMaskedLM forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this function, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of fill mask: (The following model is compiled with the neuronx compiler and can only be run on INF2. Replace "neuronx" with "neuron" if you are using INF1.)
>>> from transformers import AutoTokenizer
>>> from optimum.neuron import NeuronModelForMaskedLM
>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/legal-bert-base-uncased-neuronx")
>>> model = NeuronModelForMaskedLM.from_pretrained("optimum/legal-bert-base-uncased-neuronx")
>>> inputs = tokenizer("This [MASK] Agreement is between General Motors and John Murray.", return_tensors="pt")
>>> outputs = model(**inputs)
>>> logits = outputs.logits
>>> list(logits.shape)
[1, 13, 30522]
NeuronModelForSequenceClassification
class optimum.neuron.NeuronModelForSequenceClassification
< source >( model: ScriptModule config: PretrainedConfig model_save_dir: Union = None model_file_name: Optional = None preprocessors: Optional = None neuron_config: Optional = None **kwargs )
Parameters
- config (transformers.PretrainedConfig) — PretrainedConfig is the model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the optimum.neuron.modeling.NeuronBaseModel.from_pretrained method to load the model weights.
- model (torch.jit._script.ScriptModule) — torch.jit._script.ScriptModule is the TorchScript graph compiled by the neuron(x) compiler.
Neuron Model with a sequence classification/regression head on top (a linear layer on top of the pooled output), e.g. for GLUE tasks.
This model inherits from NeuronBaseModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).
Sequence Classification model on Neuron devices.
forward
< source >( input_ids: Tensor attention_mask: Tensor token_type_ids: Optional = None **kwargs )
Parameters
- input_ids (torch.Tensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode and PreTrainedTokenizer.__call__ for details. What are input IDs?
- attention_mask (Union[torch.Tensor, None] of shape (batch_size, sequence_length), defaults to None) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
  - 1 for tokens that are not masked,
  - 0 for tokens that are masked. What are attention masks?
- token_type_ids (Union[torch.Tensor, None] of shape (batch_size, sequence_length), defaults to None) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
  - 0 for tokens that belong to sentence A,
  - 1 for tokens that belong to sentence B. What are token type IDs?
The NeuronModelForSequenceClassification forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this function, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of single-label classification: (The following model is compiled with the neuronx compiler and can only be run on INF2.)
>>> from transformers import AutoTokenizer
>>> from optimum.neuron import NeuronModelForSequenceClassification
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/distilbert-base-uncased-finetuned-sst-2-english-neuronx")
>>> model = NeuronModelForSequenceClassification.from_pretrained("optimum/distilbert-base-uncased-finetuned-sst-2-english-neuronx")
>>> inputs = tokenizer("Hamilton is considered to be the best musical of human history.", return_tensors="pt")
>>> outputs = model(**inputs)
>>> logits = outputs.logits
>>> list(logits.shape)
[1, 2]
NeuronModelForQuestionAnswering
class optimum.neuron.NeuronModelForQuestionAnswering
< source >( model: ScriptModule config: PretrainedConfig model_save_dir: Union = None model_file_name: Optional = None preprocessors: Optional = None neuron_config: Optional = None **kwargs )
Parameters
- config (transformers.PretrainedConfig) — PretrainedConfig is the model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the optimum.neuron.modeling.NeuronBaseModel.from_pretrained method to load the model weights.
- model (torch.jit._script.ScriptModule) — torch.jit._script.ScriptModule is the TorchScript graph compiled by the neuron(x) compiler.
Neuron Model with a QuestionAnsweringModelOutput for extractive question-answering tasks like SQuAD.
This model inherits from NeuronBaseModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).
Question Answering model on Neuron devices.
forward
< source >( input_ids: Tensor attention_mask: Tensor token_type_ids: Optional = None **kwargs )
Parameters
- input_ids (torch.Tensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode and PreTrainedTokenizer.__call__ for details. What are input IDs?
- attention_mask (Union[torch.Tensor, None] of shape (batch_size, sequence_length), defaults to None) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
  - 1 for tokens that are not masked,
  - 0 for tokens that are masked. What are attention masks?
- token_type_ids (Union[torch.Tensor, None] of shape (batch_size, sequence_length), defaults to None) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
  - 0 for tokens that belong to sentence A,
  - 1 for tokens that belong to sentence B. What are token type IDs?
The NeuronModelForQuestionAnswering forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this function, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of question answering: (The following model is compiled with the neuronx compiler and can only be run on INF2.)
>>> import torch
>>> from transformers import AutoTokenizer
>>> from optimum.neuron import NeuronModelForQuestionAnswering
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/roberta-base-squad2-neuronx")
>>> model = NeuronModelForQuestionAnswering.from_pretrained("optimum/roberta-base-squad2-neuronx")
>>> question, text = "Are there wheelchair spaces in the theatres?", "Yes, we have reserved wheelchair spaces with a good view."
>>> inputs = tokenizer(question, text, return_tensors="pt")
>>> start_positions = torch.tensor([1])
>>> end_positions = torch.tensor([12])
>>> outputs = model(**inputs, start_positions=start_positions, end_positions=end_positions)
>>> start_scores = outputs.start_logits
>>> end_scores = outputs.end_logits
NeuronModelForTokenClassification
class optimum.neuron.NeuronModelForTokenClassification
< source >( model: ScriptModule config: PretrainedConfig model_save_dir: Union = None model_file_name: Optional = None preprocessors: Optional = None neuron_config: Optional = None **kwargs )
Parameters
- config (transformers.PretrainedConfig) — PretrainedConfig is the model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the optimum.neuron.modeling.NeuronBaseModel.from_pretrained method to load the model weights.
- model (torch.jit._script.ScriptModule) — torch.jit._script.ScriptModule is the TorchScript graph compiled by the neuron(x) compiler.
Neuron Model with a token classification head on top (a linear layer on top of the hidden-states output), e.g. for Named-Entity-Recognition (NER) tasks.
This model inherits from NeuronBaseModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).
Token Classification model on Neuron devices.
forward
< source >( input_ids: Tensor attention_mask: Tensor token_type_ids: Optional = None **kwargs )
Parameters
- input_ids (torch.Tensor of shape (batch_size, sequence_length)) — Indices of input sequence tokens in the vocabulary. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode and PreTrainedTokenizer.__call__ for details. What are input IDs?
- attention_mask (Union[torch.Tensor, None] of shape (batch_size, sequence_length), defaults to None) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
  - 1 for tokens that are not masked,
  - 0 for tokens that are masked. What are attention masks?
- token_type_ids (Union[torch.Tensor, None] of shape (batch_size, sequence_length), defaults to None) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
  - 0 for tokens that belong to sentence A,
  - 1 for tokens that belong to sentence B. What are token type IDs?
The NeuronModelForTokenClassification forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this function, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of token classification: (The following model is compiled with the neuronx compiler and can only be run on INF2.)
>>> from transformers import AutoTokenizer
>>> from optimum.neuron import NeuronModelForTokenClassification
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/bert-base-NER-neuronx")
>>> model = NeuronModelForTokenClassification.from_pretrained("optimum/bert-base-NER-neuronx")
>>> inputs = tokenizer("Lin-Manuel Miranda is an American songwriter, actor, singer, filmmaker, and playwright.", return_tensors="pt")
>>> outputs = model(**inputs)
>>> logits = outputs.logits
>>> list(logits.shape)
[1, 20, 9]
NeuronModelForMultipleChoice
class optimum.neuron.NeuronModelForMultipleChoice
< source >( model: ScriptModule config: PretrainedConfig model_save_dir: Union = None model_file_name: Optional = None preprocessors: Optional = None neuron_config: Optional = None **kwargs )
Parameters
- config (transformers.PretrainedConfig) — PretrainedConfig is the model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the optimum.neuron.modeling.NeuronBaseModel.from_pretrained method to load the model weights.
- model (torch.jit._script.ScriptModule) — torch.jit._script.ScriptModule is the TorchScript graph compiled by the neuron(x) compiler.
Neuron Model with a multiple choice classification head on top (a linear layer on top of the pooled output and a softmax), e.g. for RocStories/SWAG tasks.
This model inherits from NeuronBaseModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).
Multiple choice model on Neuron devices.
forward
< source >( input_ids: Tensor attention_mask: Tensor token_type_ids: Optional = None **kwargs )
Parameters
- input_ids (torch.Tensor of shape (batch_size, num_choices, sequence_length)) — Indices of input sequence tokens in the vocabulary. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode and PreTrainedTokenizer.__call__ for details. What are input IDs?
- attention_mask (Union[torch.Tensor, None] of shape (batch_size, num_choices, sequence_length), defaults to None) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
  - 1 for tokens that are not masked,
  - 0 for tokens that are masked. What are attention masks?
- token_type_ids (Union[torch.Tensor, None] of shape (batch_size, num_choices, sequence_length), defaults to None) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
  - 0 for tokens that belong to sentence A,
  - 1 for tokens that belong to sentence B. What are token type IDs?
The NeuronModelForMultipleChoice forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this function, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of multiple choice: (The following model is compiled with the neuronx compiler and can only be run on INF2.)
>>> from transformers import AutoTokenizer
>>> from optimum.neuron import NeuronModelForMultipleChoice
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/bert-base-uncased_SWAG-neuronx")
>>> model = NeuronModelForMultipleChoice.from_pretrained("optimum/bert-base-uncased_SWAG-neuronx", export=True)
>>> num_choices = 4
>>> first_sentence = ["Members of the procession walk down the street holding small horn brass instruments."] * num_choices
>>> second_sentence = [
...     "A drum line passes by walking down the street playing their instruments.",
...     "A drum line has heard approaching them.",
...     "A drum line arrives and they're outside dancing and asleep.",
...     "A drum line turns the lead singer watches the performance.",
... ]
>>> inputs = tokenizer(first_sentence, second_sentence, truncation=True, padding=True)
>>> # Unflatten the input values, expanding them to the shape [batch_size, num_choices, seq_length]
>>> for k, v in inputs.items():
...     inputs[k] = [v[i: i + num_choices] for i in range(0, len(v), num_choices)]
>>> inputs = dict(inputs.convert_to_tensors(tensor_type="pt"))
>>> outputs = model(**inputs)
>>> logits = outputs.logits
>>> list(logits.shape)
[1, 4]
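The unflattening step in the example above can be checked in isolation with plain lists. The sketch below is only an illustration of the regrouping logic: unflatten is a hypothetical helper and the token IDs are made up.

```python
def unflatten(rows, num_choices):
    """Group consecutive rows into chunks of num_choices rows each."""
    if len(rows) % num_choices != 0:
        raise ValueError("number of rows must be a multiple of num_choices")
    return [rows[i: i + num_choices] for i in range(0, len(rows), num_choices)]


# Four encoded (first_sentence, second_sentence) pairs, each of sequence length 2
flat_input_ids = [[1, 2], [3, 4], [5, 6], [7, 8]]
grouped = unflatten(flat_input_ids, num_choices=4)

# Nested dimensions: [batch_size, num_choices, seq_length]
shape = (len(grouped), len(grouped[0]), len(grouped[0][0]))
```

With one question and four candidate endings, the four flat rows collapse into a single batch entry holding four choices.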
NeuronModelForCausalLM
class optimum.neuron.NeuronModelForCausalLM
< source >( model: Module config: PretrainedConfig model_path: Union generation_config: Optional = None )
Parameters
- model (torch.nn.Module) — torch.nn.Module is the Neuron decoder graph.
- config (transformers.PretrainedConfig) — PretrainedConfig is the model configuration class with all the parameters of the model.
- model_path (Path) — The directory where the compiled artifacts for the model are stored. It can be a temporary directory if the model has never been saved locally before.
- generation_config (transformers.GenerationConfig) — GenerationConfig holds the configuration for the model generation task.
Neuron model with a causal language modeling head for inference on Neuron devices.
This model inherits from NeuronDecoderModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).
Returns True to validate the check made in GenerationMixin.generate().
forward
< source >( input_ids: Tensor cache_ids: Tensor start_ids: Tensor = None return_dict: bool = True )
Parameters
- input_ids (torch.LongTensor) — Indices of decoder input sequence tokens in the vocabulary of shape (batch_size, sequence_length).
- cache_ids (torch.LongTensor) — The indices at which the cached key and value for the current inputs need to be stored.
- start_ids (torch.LongTensor) — The indices of the first tokens to be processed, deduced from the attention masks.
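One way to picture how start indices could be deduced from an attention mask: assuming left-padded inputs, the first real token of each sequence sits at the index of the first 1 in its mask row. The helper below is a hypothetical pure-Python sketch of that idea, not the library's code.

```python
def first_token_indices(attention_mask):
    """Return, for each row, the index of the first position whose mask value is 1."""
    start_ids = []
    for row in attention_mask:
        # Fall back to 0 for a row with no real token (an arbitrary choice made for this sketch)
        start_ids.append(row.index(1) if 1 in row else 0)
    return start_ids


# First sequence has two left-padding positions, second has none
start_ids = first_token_indices([[0, 0, 1, 1], [1, 1, 1, 1]])
```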
The NeuronModelForCausalLM forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this function, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of text generation:
>>> from transformers import AutoTokenizer
>>> from optimum.neuron import NeuronModelForCausalLM
>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("gpt2")
>>> model = NeuronModelForCausalLM.from_pretrained("gpt2", export=True)
>>> inputs = tokenizer("My favorite moment of the day is", return_tensors="pt")
>>> gen_tokens = model.generate(**inputs, do_sample=True, temperature=0.9, min_length=20, max_length=20)
>>> tokenizer.batch_decode(gen_tokens)
generate
< source >(
input_ids: Tensor
attention_mask: Optional = None
generation_config: Optional = None
**kwargs
)
→ torch.Tensor
Parameters
- input_ids (torch.Tensor of shape (batch_size, sequence_length)) — The sequence used as a prompt for the generation.
- attention_mask (torch.Tensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices.
- generation_config (~transformers.generation.GenerationConfig, optional) — The generation configuration to be used as base parametrization for the generation call. **kwargs passed to generate matching the attributes of generation_config will override them. If generation_config is not provided, the default will be used, which has the following loading priority: 1) from the generation_config.json model file, if it exists; 2) from the model configuration. Please note that unspecified parameters will inherit GenerationConfig's default values, whose documentation should be checked to parameterize generation.
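The override behaviour described for generation_config can be pictured with a small stand-in. ToyGenerationConfig and resolve_config below are hypothetical (this is not transformers.GenerationConfig). The sketch only shows keyword arguments that match a config attribute replacing its value while the rest are set aside, and unspecified attributes keeping their defaults.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class ToyGenerationConfig:
    """Hypothetical stand-in with three generation attributes and their defaults."""
    max_length: int = 20
    do_sample: bool = False
    temperature: float = 1.0


def resolve_config(base, **kwargs):
    """Split kwargs into config overrides and leftover keyword arguments."""
    known = {f.name for f in fields(base)}
    overrides = {k: v for k, v in kwargs.items() if k in known}
    leftover = {k: v for k, v in kwargs.items() if k not in known}
    # replace() returns a new config; attributes not overridden keep their values
    return replace(base, **overrides), leftover


config, other_kwargs = resolve_config(
    ToyGenerationConfig(), do_sample=True, temperature=0.9, attention_mask=[[1, 1]]
)
```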
Returns
torch.Tensor
A torch.FloatTensor.
A streamlined generate() method overriding the transformers.GenerationMixin.generate() method.
This method uses the same logits processors/warpers and stopping criteria as the transformers library generate() method but restricts the generation to greedy search and sampling.
It does not support the advanced options of the transformers generate() method.
Please refer to https://huggingface.co/docs/transformers/en/main_classes/text_generation#transformers.GenerationMixin.generate for details on generation configuration.
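For readers unfamiliar with the two supported strategies, the sketch below contrasts greedy search and temperature sampling on a single vector of next-token logits. It uses only the standard library. The function names and the logits are made up, and the real method additionally applies logits processors and stopping criteria.

```python
import math
import random


def greedy_select(logits):
    """Greedy search: pick the index of the highest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_select(logits, temperature=1.0, rng=None):
    """Sampling: draw an index from the softmax of the temperature-scaled logits."""
    if rng is None:
        rng = random.Random()
    scaled = [value / temperature for value in logits]
    top = max(scaled)
    # Subtract the maximum before exponentiating for numerical stability
    weights = [math.exp(value - top) for value in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [0.1, 2.5, -1.0, 0.3]
greedy_token = greedy_select(logits)
sampled_token = sample_select(logits, temperature=0.9, rng=random.Random(0))
```

Greedy search always returns the same token for the same logits, while sampling can return any token with non-zero probability. A lower temperature concentrates the distribution on the highest logits.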
generate_tokens
< source >(
input_ids: LongTensor
selector: TokenSelector
batch_size: int
attention_mask: Optional = None
**model_kwargs
)
→ torch.LongTensor
Parameters
- input_ids (torch.LongTensor of shape (batch_size, sequence_length)) — The sequence used as a prompt for the generation.
- selector (TokenSelector) — The object implementing the generation logic based on transformers processors and stopping criteria.
- batch_size (int) — The actual input batch size. Used to avoid generating tokens for padded inputs.
- attention_mask (torch.Tensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices.
- model_kwargs — Additional model-specific kwargs that will be forwarded to the forward function of the model.
Returns
torch.LongTensor
A torch.LongTensor containing the generated tokens.
Generate tokens using sampling or greedy search.
Stable Diffusion
NeuronStableDiffusionPipelineBase
class optimum.neuron.modeling_diffusion.NeuronStableDiffusionPipelineBase
< source >( text_encoder: ScriptModule unet: ScriptModule vae_decoder: ScriptModule config: Dict tokenizer: CLIPTokenizer scheduler: Union vae_encoder: Optional = None text_encoder_2: Optional = None tokenizer_2: Optional = None feature_extractor: Optional = None device_ids: Optional = None configs: Optional = None neuron_configs: Optional = None model_save_dir: Union = None model_and_config_save_paths: Optional = None )
load_model
< source >( text_encoder_path: Union unet_path: Union vae_decoder_path: Union vae_encoder_path: Union = None text_encoder_2_path: Union = None device_ids: Optional = None dynamic_batch_size: bool = False )
Parameters
- text_encoder_path (Union[str, Path]) — Path of the compiled text encoder.
- unet_path (Union[str, Path]) — Path of the compiled U-NET.
- vae_decoder_path (Union[str, Path]) — Path of the compiled VAE decoder.
- vae_encoder_path (Optional[Union[str, Path]], defaults to None) — Path of the compiled VAE encoder. It is optional and only used for tasks taking images as input.
- text_encoder_2_path (Optional[Union[str, Path]], defaults to None) — Path of the compiled second frozen text encoder. SDXL only.
- device_ids (Optional[List[int]], defaults to None) — The IDs of the Neuron cores onto which a model is loaded. In the case of Stable Diffusion, it is only used for loading the unet, and by default the unet is loaded onto both Neuron cores of a device.
- dynamic_batch_size (bool, defaults to False) — Whether to enable dynamic batch size for the Neuron compiled model. If True, the input batch size can be a multiple of the batch size used during the compilation.
Loads Stable Diffusion TorchScript modules compiled by the neuron(x)-cc compiler. They are first loaded onto the CPU and then moved to one or more NeuronCores.
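A rough mental model for the dynamic_batch_size option, assuming the input batch is simply cut into chunks of the compiled batch size and each chunk is run through the static model. split_into_static_batches is a hypothetical helper, not part of the library, and the actual mechanism may differ.

```python
def split_into_static_batches(inputs, compiled_batch_size):
    """Cut a batch into consecutive chunks of exactly compiled_batch_size items."""
    if len(inputs) % compiled_batch_size != 0:
        raise ValueError(
            f"batch size {len(inputs)} is not a multiple of the compiled batch size {compiled_batch_size}"
        )
    return [
        inputs[i: i + compiled_batch_size]
        for i in range(0, len(inputs), compiled_batch_size)
    ]


# A model compiled for batch size 2 receiving 4 prompts
chunks = split_into_static_batches(["a", "b", "c", "d"], compiled_batch_size=2)
```

A batch of three prompts would be rejected in this sketch, since three is not a multiple of the compiled batch size.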
NeuronStableDiffusionPipeline
class optimum.neuron.NeuronStableDiffusionPipeline
< source >( text_encoder: ScriptModule unet: ScriptModule vae_decoder: ScriptModule config: Dict tokenizer: CLIPTokenizer scheduler: Union vae_encoder: Optional = None text_encoder_2: Optional = None tokenizer_2: Optional = None feature_extractor: Optional = None device_ids: Optional = None configs: Optional = None neuron_configs: Optional = None model_save_dir: Union = None model_and_config_save_paths: Optional = None )
__call__
< source >(
prompt: Union = None
num_inference_steps: int = 50
guidance_scale: float = 7.5
negative_prompt: Union = None
num_images_per_prompt: int = 1
eta: float = 0.0
generator: Union = None
latents: Optional = None
prompt_embeds: Optional = None
negative_prompt_embeds: Optional = None
output_type: Optional = 'pil'
return_dict: bool = True
callback: Optional = None
callback_steps: int = 1
cross_attention_kwargs: Optional = None
guidance_rescale: float = 0.0
)
→ diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput or tuple
Parameters
-
prompt (
Optional[Union[str, List[str]]], defaults to None) — The prompt or prompts to guide image generation. If not defined, you need to pass prompt_embeds.
- num_inference_steps (int, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
- guidance_scale (float, defaults to 7.5) — A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. Guidance scale is enabled when guidance_scale > 1.
- negative_prompt (Optional[Union[str, List[str]]], defaults to None) — The prompt or prompts describing what to exclude from the generated image. If not defined, you need to pass negative_prompt_embeds instead. Ignored when not using guidance (guidance_scale < 1).
- num_images_per_prompt (int, defaults to 1) — The number of images to generate per prompt. If it differs from the batch size used for compilation, it is overridden by the static batch size of the Neuron model (except with dynamic batching).
- eta (float, defaults to 0.0) — Corresponds to parameter eta (η) from the DDIM paper. Only applies to the diffusers.schedulers.DDIMScheduler, and is ignored in other schedulers.
- generator (Optional[Union[torch.Generator, List[torch.Generator]]], defaults to None) — A torch.Generator to make generation deterministic.
- latents (Optional[torch.FloatTensor], defaults to None) — Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor is generated by sampling using the supplied random generator.
- prompt_embeds (Optional[torch.FloatTensor], defaults to None) — Pre-generated text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided, text embeddings are generated from the prompt input argument.
- negative_prompt_embeds (Optional[torch.FloatTensor], defaults to None) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided, negative_prompt_embeds are generated from the negative_prompt input argument.
- output_type (Optional[str], defaults to "pil") — The output format of the generated image. Choose between PIL.Image or np.array.
- return_dict (bool, defaults to True) — Whether or not to return a diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput instead of a plain tuple.
- callback (Optional[Callable], defaults to None) — A function that is called every callback_steps steps during inference. The function is called with the following arguments: callback(step: int, timestep: int, latents: torch.FloatTensor).
- callback_steps (int, defaults to 1) — The frequency at which the callback function is called. If not specified, the callback is called at every step.
- cross_attention_kwargs (dict, defaults to None) — A kwargs dictionary that, if specified, is passed along to the AttentionProcessor as defined in self.processor.
- guidance_rescale (float, defaults to 0.0) — Guidance rescale factor from "Common Diffusion Noise Schedules and Sample Steps are Flawed". Guidance rescale factor should fix overexposure when using zero terminal SNR.
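The interaction between callback and callback_steps described above can be sketched in plain Python. This is a toy loop, not the pipeline's implementation: run_denoising_loop and log_progress are hypothetical names, the "denoising step" is a placeholder multiplication, and latents are plain lists rather than tensors. Only the scheduling of the callback is meant to be illustrative.

```python
def run_denoising_loop(timesteps, latents, callback=None, callback_steps=1):
    """Toy stand-in for a denoising loop; only the callback scheduling is modelled."""
    for i, t in enumerate(timesteps):
        # Placeholder for a real denoising step (assumption, not the real update rule).
        latents = [0.9 * x for x in latents]
        # The callback fires on every `callback_steps`-th iteration, starting at 0.
        if callback is not None and i % callback_steps == 0:
            callback(i, t, latents)
    return latents


seen_steps = []


def log_progress(step, timestep, latents):
    """Callback with the documented (step, timestep, latents) argument order."""
    seen_steps.append((step, timestep))


run_denoising_loop([999, 749, 499, 249], [1.0, -1.0], log_progress, callback_steps=2)
print(seen_steps)  # [(0, 999), (2, 499)]
```

With callback_steps=1 the callback would run at every one of the four iterations.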
Returns
diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput or tuple
If return_dict is True, diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput is returned, otherwise a tuple is returned where the first element is a list with the generated images and the second element is a list of bools indicating whether the corresponding generated image contains "not-safe-for-work" (nsfw) content.
The call function to the pipeline for generation.
Examples:
>>> from optimum.neuron import NeuronStableDiffusionPipeline
>>> compiler_args = {"auto_cast": "matmul", "auto_cast_type": "bf16"}
>>> input_shapes = {"batch_size": 1, "height": 512, "width": 512}
>>> stable_diffusion = NeuronStableDiffusionPipeline.from_pretrained(
... "runwayml/stable-diffusion-v1-5", export=True, **compiler_args, **input_shapes
... )
>>> stable_diffusion.save_pretrained("sd_neuron/")
>>> prompt = "a photo of an astronaut riding a horse on mars"
>>> image = stable_diffusion(prompt).images[0]
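To make the guidance_scale parameter more concrete, here is a minimal sketch of the usual classifier-free guidance combination, written with plain Python lists instead of tensors. The function name apply_guidance and the sample values are illustrative assumptions and are not part of the optimum.neuron API; the compiled pipeline performs the equivalent arithmetic on tensors.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Combine unconditional and text-conditioned noise predictions.

    Uses the standard classifier-free guidance formula:
    uncond + guidance_scale * (text - uncond).
    """
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]


uncond = [0.0, 1.0]  # hypothetical unconditional prediction
text = [1.0, 3.0]    # hypothetical text-conditioned prediction

# With a scale of 1 the result equals the text-conditioned prediction,
# which is why guidance only has an effect when guidance_scale > 1.
print(apply_guidance(uncond, text, 1.0))  # [1.0, 3.0]
print(apply_guidance(uncond, text, 7.5))  # [7.5, 16.0]
```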
NeuronStableDiffusionImg2ImgPipeline
class optimum.neuron.NeuronStableDiffusionImg2ImgPipeline
< source >( text_encoder: ScriptModule unet: ScriptModule vae_decoder: ScriptModule config: Dict tokenizer: CLIPTokenizer scheduler: Union vae_encoder: Optional = None text_encoder_2: Optional = None tokenizer_2: Optional = None feature_extractor: Optional = None device_ids: Optional = None configs: Optional = None neuron_configs: Optional = None model_save_dir: Union = None model_and_config_save_paths: Optional = None )
__call__
< source >(
prompt: Union = None
image: Union = None
strength: float = 0.8
num_inference_steps: int = 50
guidance_scale: float = 7.5
negative_prompt: Union = None
num_images_per_prompt: int = 1
eta: float = 0.0
generator: Optional = None
prompt_embeds: Optional = None
negative_prompt_embeds: Optional = None
output_type: str = 'pil'
return_dict: bool = True
callback: Optional = None
callback_steps: int = 1
cross_attention_kwargs: Optional = None
)
→ diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput or tuple
Parameters
- prompt (Optional[Union[str, List[str]]], defaults to None) — The prompt or prompts to guide image generation. If not defined, you need to pass prompt_embeds.
- image (torch.FloatTensor, PIL.Image.Image, np.ndarray, List[torch.FloatTensor], List[PIL.Image.Image], or List[np.ndarray]) — Image, numpy array or tensor representing an image batch to be used as the starting point. For both numpy arrays and PyTorch tensors, the expected value range is [0, 1]. If it’s a tensor or a list of tensors, the expected shape should be (B, C, H, W) or (C, H, W). If it is a numpy array or a list of arrays, the expected shape should be (B, H, W, C) or (H, W, C). It can also accept image latents as image, but if passing latents directly it is not encoded again.
- strength (float, defaults to 0.8) — Indicates the extent to which the reference image is transformed. Must be between 0 and 1. image is used as a starting point and more noise is added the higher the strength. The number of denoising steps depends on the amount of noise initially added. When strength is 1, added noise is maximum and the denoising process runs for the full number of iterations specified in num_inference_steps. A value of 1 essentially ignores image.
- num_inference_steps (int, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. This parameter is modulated by strength.
- guidance_scale (float, defaults to 7.5) — A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. Guidance scale is enabled when guidance_scale > 1.
- negative_prompt (Optional[Union[str, List[str]]], defaults to None) — The prompt or prompts describing what to exclude from the generated image. If not defined, you need to pass negative_prompt_embeds instead. Ignored when not using guidance (guidance_scale < 1).
- num_images_per_prompt (int, defaults to 1) — The number of images to generate per prompt. If it differs from the batch size used for compilation, it is overridden by the static batch size of the Neuron model (except with dynamic batching).
- eta (float, defaults to 0.0) — Corresponds to parameter eta (η) from the DDIM paper. Only applies to the diffusers.schedulers.DDIMScheduler, and is ignored in other schedulers.
- generator (Optional[Union[torch.Generator, List[torch.Generator]]], defaults to None) — A torch.Generator to make generation deterministic.
- prompt_embeds (Optional[torch.FloatTensor], defaults to None) — Pre-generated text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided, text embeddings are generated from the prompt input argument.
- negative_prompt_embeds (Optional[torch.FloatTensor], defaults to None) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided, negative_prompt_embeds are generated from the negative_prompt input argument.
- output_type (Optional[str], defaults to "pil") — The output format of the generated image. Choose between PIL.Image or np.array.
- return_dict (bool, defaults to True) — Whether or not to return a diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput instead of a plain tuple.
- callback (Optional[Callable], defaults to None) — A function that is called every callback_steps steps during inference. The function is called with the following arguments: callback(step: int, timestep: int, latents: torch.FloatTensor).
- callback_steps (int, defaults to 1) — The frequency at which the callback function is called. If not specified, the callback is called at every step.
- cross_attention_kwargs (dict, defaults to None) — A kwargs dictionary that, if specified, is passed along to the AttentionProcessor as defined in self.processor.
Returns
diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput or tuple
If return_dict is True, diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput is returned, otherwise a tuple is returned where the first element is a list with the generated images and the second element is a list of bools indicating whether the corresponding generated image contains "not-safe-for-work" (nsfw) content.
The call function to the pipeline for generation.
Examples:
>>> from PIL import Image
>>> import requests
>>> from io import BytesIO
>>> from optimum.neuron import NeuronStableDiffusionImg2ImgPipeline
>>> compiler_args = {"auto_cast": "matmul", "auto_cast_type": "bf16"}
>>> input_shapes = {"batch_size": 1, "height": 512, "width": 512}
>>> url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
>>> response = requests.get(url)
>>> init_image = Image.open(BytesIO(response.content)).convert("RGB")
>>> init_image = init_image.resize((512, 512))
>>> pipeline = NeuronStableDiffusionImg2ImgPipeline.from_pretrained(
... "nitrosocke/Ghibli-Diffusion", export=True, **input_shapes, device_ids=[0, 1]
... )
>>> pipeline.save_pretrained("sd_img2img/")
>>> prompt = "ghibli style, a fantasy landscape with snowcapped mountains, trees, lake with detailed reflection."
>>> image = pipeline(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5).images[0]
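The statement that num_inference_steps is modulated by strength can be illustrated with a short calculation. The sketch below follows the step-trimming arithmetic used by the upstream diffusers image-to-image pipeline as far as I know; effective_denoising_steps is a hypothetical helper name, and the Neuron pipeline may differ in detail.

```python
def effective_denoising_steps(num_inference_steps, strength):
    """Return how many denoising iterations run for a given strength.

    Assumption: the schedule is trimmed as in upstream diffusers, i.e. the
    first (1 - strength) fraction of the timesteps is skipped.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError(f"strength must be between 0 and 1, got {strength}")
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return num_inference_steps - t_start


print(effective_denoising_steps(50, 1.0))   # 50: full schedule, image mostly ignored
print(effective_denoising_steps(50, 0.75))  # 37
print(effective_denoising_steps(50, 0.0))   # 0: no noise added, no denoising
```

Under this assumption, the example above (strength=0.75 with the default 50 steps) would run 37 denoising iterations.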
NeuronStableDiffusionInpaintPipeline
class optimum.neuron.NeuronStableDiffusionInpaintPipeline
< source >( text_encoder: ScriptModule unet: ScriptModule vae_decoder: ScriptModule config: Dict tokenizer: CLIPTokenizer scheduler: Union vae_encoder: Optional = None text_encoder_2: Optional = None tokenizer_2: Optional = None feature_extractor: Optional = None device_ids: Optional = None configs: Optional = None neuron_configs: Optional = None model_save_dir: Union = None model_and_config_save_paths: Optional = None )
__call__
< source >(
prompt: Union = None
image: Union = None
mask_image: Union = None
masked_image_latents: FloatTensor = None
strength: float = 1.0
num_inference_steps: int = 50
guidance_scale: float = 7.5
negative_prompt: Union = None
num_images_per_prompt: Optional = 1
eta: float = 0.0
generator: Union = None
latents: Optional = None
prompt_embeds: Optional = None
negative_prompt_embeds: Optional = None
output_type: Optional = 'pil'
return_dict: bool = True
callback: Optional = None
callback_steps: int = 1
cross_attention_kwargs: Optional = None
clip_skip: int = None
)
→ diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput or tuple
Parameters
- prompt (Optional[Union[str, List[str]]], defaults to None) — The prompt or prompts to guide image generation. If not defined, you need to pass prompt_embeds.
- image (torch.FloatTensor, PIL.Image.Image, np.ndarray, List[torch.FloatTensor], List[PIL.Image.Image], or List[np.ndarray]) — Image, numpy array or tensor representing an image batch to be inpainted (the parts of the image masked out with mask_image are repainted according to prompt). For both numpy arrays and PyTorch tensors, the expected value range is [0, 1]. If it’s a tensor or a list of tensors, the expected shape should be (B, C, H, W) or (C, H, W). If it is a numpy array or a list of arrays, the expected shape should be (B, H, W, C) or (H, W, C). It can also accept image latents as image, but if passing latents directly it is not encoded again.
- mask_image (torch.FloatTensor, PIL.Image.Image, np.ndarray, List[torch.FloatTensor], List[PIL.Image.Image], or List[np.ndarray]) — Image, numpy array or tensor representing an image batch to mask image. White pixels in the mask are repainted while black pixels are preserved. If mask_image is a PIL image, it is converted to a single channel (luminance) before use. If it’s a numpy array or PyTorch tensor, it should contain one color channel (L) instead of 3, so the expected shape for a PyTorch tensor is (B, 1, H, W), (B, H, W), (1, H, W), or (H, W), and for a numpy array it is (B, H, W, 1), (B, H, W), (H, W, 1), or (H, W).
- strength (float, defaults to 1.0) — Indicates the extent to which the reference image is transformed. Must be between 0 and 1. image is used as a starting point and more noise is added the higher the strength. The number of denoising steps depends on the amount of noise initially added. When strength is 1, added noise is maximum and the denoising process runs for the full number of iterations specified in num_inference_steps. A value of 1 essentially ignores image.
- num_inference_steps (int, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. This parameter is modulated by strength.
- guidance_scale (float, defaults to 7.5) — A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. Guidance scale is enabled when guidance_scale > 1.
- negative_prompt (Optional[Union[str, List[str]]], defaults to None) — The prompt or prompts describing what to exclude from the generated image. If not defined, you need to pass negative_prompt_embeds instead. Ignored when not using guidance (guidance_scale < 1).
- num_images_per_prompt (int, defaults to 1) — The number of images to generate per prompt. If it differs from the batch size used for compilation, it is overridden by the static batch size of the Neuron model (except with dynamic batching).
- eta (float, defaults to 0.0) — Corresponds to parameter eta (η) from the DDIM paper. Only applies to the diffusers.schedulers.DDIMScheduler, and is ignored in other schedulers.
- generator (Optional[Union[torch.Generator, List[torch.Generator]]], defaults to None) — A torch.Generator to make generation deterministic.
- latents (Optional[torch.FloatTensor], defaults to None) — Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor is generated by sampling using the supplied random generator.
- prompt_embeds (Optional[torch.FloatTensor], defaults to None) — Pre-generated text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided, text embeddings are generated from the prompt input argument.
- negative_prompt_embeds (Optional[torch.FloatTensor], defaults to None) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided, negative_prompt_embeds are generated from the negative_prompt input argument.
- output_type (Optional[str], defaults to "pil") — The output format of the generated image. Choose between PIL.Image or np.array.
- return_dict (bool, defaults to True) — Whether or not to return a diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput instead of a plain tuple.
- callback (Optional[Callable], defaults to None) — A function that is called every callback_steps steps during inference. The function is called with the following arguments: callback(step: int, timestep: int, latents: torch.FloatTensor).
- callback_steps (int, defaults to 1) — The frequency at which the callback function is called. If not specified, the callback is called at every step.
- cross_attention_kwargs (dict, defaults to None) — A kwargs dictionary that, if specified, is passed along to the AttentionProcessor as defined in self.processor.
- clip_skip (int, defaults to None) — Number of layers to be skipped from CLIP while computing the prompt embeddings. A value of 1 means that the output of the pre-final layer will be used for computing the prompt embeddings.
Returns
diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput or tuple
If return_dict is True, diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput is returned, otherwise a tuple is returned where the first element is a list with the generated images and the second element is a list of bools indicating whether the corresponding generated image contains "not-safe-for-work" (nsfw) content.
The call function to the pipeline for generation.
Examples:
>>> import PIL.Image
>>> import requests
>>> from io import BytesIO
>>> from optimum.neuron import NeuronStableDiffusionInpaintPipeline
>>> def download_image(url):
...     response = requests.get(url)
...     return PIL.Image.open(BytesIO(response.content)).convert("RGB")
>>> img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
>>> mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
>>> init_image = download_image(img_url).resize((512, 512))
>>> mask_image = download_image(mask_url).resize((512, 512))
>>> input_shapes = {"batch_size": 1, "height": 512, "width": 512}
>>> pipeline = NeuronStableDiffusionInpaintPipeline.from_pretrained(
...     "runwayml/stable-diffusion-inpainting", export=True, **input_shapes, device_ids=[0, 1]
... )
>>> pipeline.save_pretrained("sd_inpaint/")
>>> prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
>>> image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image).images[0]
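The mask convention (white pixels repainted, black pixels preserved) can be illustrated with a small pixel-space blend. This is only a conceptual sketch on nested Python lists: blend_with_mask and the 0.5 threshold are assumptions made for illustration, and the actual pipeline applies the mask in latent space on tensors rather than on pixels.

```python
def blend_with_mask(original, generated, mask, threshold=0.5):
    """Keep `original` where the mask is black, take `generated` where it is white.

    All inputs are 2D lists of floats in [0, 1] with identical shapes (H, W).
    Mask values at or above `threshold` are treated as white (repaint).
    """
    blended = []
    for orig_row, gen_row, mask_row in zip(original, generated, mask):
        blended.append(
            [g if m >= threshold else o for o, g, m in zip(orig_row, gen_row, mask_row)]
        )
    return blended


original = [[0.1, 0.2], [0.3, 0.4]]   # hypothetical input pixels
generated = [[0.9, 0.8], [0.7, 0.6]]  # hypothetical repainted pixels
mask = [[1.0, 0.0], [0.0, 1.0]]       # 1.0 = white (repaint), 0.0 = black (keep)

print(blend_with_mask(original, generated, mask))  # [[0.9, 0.2], [0.3, 0.6]]
```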
NeuronStableDiffusionXLPipeline
class optimum.neuron.NeuronStableDiffusionXLPipeline
< source >( text_encoder: ScriptModule unet: ScriptModule vae_decoder: ScriptModule config: Dict tokenizer: CLIPTokenizer scheduler: Union vae_encoder: Optional = None text_encoder_2: Optional = None tokenizer_2: Optional = None feature_extractor: Optional = None device_ids: Optional = None configs: Optional = None neuron_configs: Optional = None model_save_dir: Union = None model_and_config_save_paths: Optional = None add_watermarker: Optional = None )
__call__
< source >(
prompt: Union = None
prompt_2: Union = None
num_inference_steps: int = 50
denoising_end: Optional = None
guidance_scale: float = 5.0
negative_prompt: Union = None
negative_prompt_2: Union = None
num_images_per_prompt: int = 1
eta: float = 0.0
generator: Union = None
latents: Optional = None
prompt_embeds: Optional = None
negative_prompt_embeds: Optional = None
pooled_prompt_embeds: Optional = None
negative_pooled_prompt_embeds: Optional = None
output_type: Optional = 'pil'
return_dict: bool = True
callback: Optional = None
callback_steps: int = 1
cross_attention_kwargs: Optional = None
guidance_rescale: float = 0.0
original_size: Optional = None
crops_coords_top_left: Tuple = (0, 0)
target_size: Optional = None
)
→ diffusers.pipelines.stable_diffusion_xl.StableDiffusionXLPipelineOutput or tuple
Parameters
- prompt (Optional[Union[str, List[str]]], defaults to None) — The prompt or prompts to guide the image generation. If not defined, one has to pass prompt_embeds instead.
- prompt_2 (Optional[Union[str, List[str]]], defaults to None) — The prompt or prompts to be sent to the tokenizer_2 and text_encoder_2. If not defined, prompt is used in both text-encoders.
- num_inference_steps (int, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
- denoising_end (Optional[float], defaults to None) — When specified, determines the fraction (between 0.0 and 1.0) of the total denoising process to be completed before it is intentionally prematurely terminated. As a result, the returned sample will still retain a substantial amount of noise as determined by the discrete timesteps selected by the scheduler. The denoising_end parameter should ideally be utilized when this pipeline forms a part of a “Mixture of Denoisers” multi-pipeline setup, as elaborated in Refining the Image Output.
- guidance_scale (float, defaults to 5.0) — Guidance scale as defined in Classifier-Free Diffusion Guidance. guidance_scale is defined as w of equation 2 of the Imagen paper. Guidance scale is enabled by setting guidance_scale > 1. A higher guidance scale encourages the model to generate images that are closely linked to the text prompt, usually at the expense of lower image quality.
- negative_prompt (Optional[Union[str, List[str]]], defaults to None) — The prompt or prompts not to guide the image generation. If not defined, one has to pass negative_prompt_embeds instead. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1).
- negative_prompt_2 (Optional[Union[str, List[str]]], defaults to None) — The prompt or prompts not to guide the image generation, to be sent to tokenizer_2 and text_encoder_2. If not defined, negative_prompt is used in both text-encoders.
- num_images_per_prompt (int, defaults to 1) — The number of images to generate per prompt. If it differs from the batch size used for compilation, it is overridden by the static batch size of the Neuron model (except with dynamic batching).
- eta (float, defaults to 0.0) — Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to schedulers.DDIMScheduler, and is ignored for others.
- generator (Optional[Union[torch.Generator, List[torch.Generator]]], defaults to None) — One or a list of torch generator(s) to make generation deterministic.
- latents (Optional[torch.FloatTensor], defaults to None) — Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random generator.
- prompt_embeds (Optional[torch.FloatTensor], defaults to None) — Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings will be generated from the prompt input argument.
- negative_prompt_embeds (Optional[torch.FloatTensor], defaults to None) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds will be generated from the negative_prompt input argument.
- pooled_prompt_embeds (Optional[torch.FloatTensor], defaults to None) — Pre-generated pooled text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, pooled text embeddings will be generated from the prompt input argument.
- negative_pooled_prompt_embeds (Optional[torch.FloatTensor], defaults to None) — Pre-generated negative pooled text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, pooled negative_prompt_embeds will be generated from the negative_prompt input argument.
- output_type (Optional[str], defaults to "pil") — The output format of the generated image. Choose between PIL.Image.Image or np.array.
- return_dict (bool, defaults to True) — Whether or not to return a diffusers.pipelines.stable_diffusion_xl.StableDiffusionXLPipelineOutput instead of a plain tuple.
- callback (Optional[Callable], defaults to None) — A function that will be called every callback_steps steps during inference. The function will be called with the following arguments: callback(step: int, timestep: int, latents: torch.FloatTensor).
- callback_steps (int, defaults to 1) — The frequency at which the callback function will be called. If not specified, the callback will be called at every step.
- cross_attention_kwargs (dict, defaults to None) — A kwargs dictionary that, if specified, is passed along to the AttentionProcessor as defined under self.processor in diffusers.models.attention_processor.
- guidance_rescale (float, optional, defaults to 0.0) — Guidance rescale factor proposed by "Common Diffusion Noise Schedules and Sample Steps are Flawed". guidance_rescale is defined as φ in equation 16 of that paper. Guidance rescale factor should fix overexposure when using zero terminal SNR.
- original_size (Optional[Tuple[int, int]], defaults to (1024, 1024)) — If original_size is not the same as target_size the image will appear to be down- or upsampled. original_size defaults to (width, height) if not specified. Part of SDXL’s micro-conditioning as explained in section 2.2 of https://huggingface.co/papers/2307.01952.
- crops_coords_top_left (Tuple[int], defaults to (0, 0)) — crops_coords_top_left can be used to generate an image that appears to be “cropped” from the position crops_coords_top_left downwards. Favorable, well-centered images are usually achieved by setting crops_coords_top_left to (0, 0). Part of SDXL’s micro-conditioning as explained in section 2.2 of https://huggingface.co/papers/2307.01952.
- target_size (Tuple[int], defaults to (1024, 1024)) — For most cases, target_size should be set to the desired height and width of the generated image. If not specified it will default to (width, height). Part of SDXL’s micro-conditioning as explained in section 2.2 of https://huggingface.co/papers/2307.01952.
Returns
diffusers.pipelines.stable_diffusion_xl.StableDiffusionXLPipelineOutput or tuple
diffusers.pipelines.stable_diffusion_xl.StableDiffusionXLPipelineOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is a list with the generated images.
Function invoked when calling the pipeline for generation.
Examples:
>>> from optimum.neuron import NeuronStableDiffusionXLPipeline
>>> compiler_args = {"auto_cast": "matmul", "auto_cast_type": "bf16"}
>>> input_shapes = {"batch_size": 1, "height": 1024, "width": 1024}
>>> stable_diffusion_xl = NeuronStableDiffusionXLPipeline.from_pretrained(
...     "stabilityai/stable-diffusion-xl-base-1.0", export=True, **compiler_args, **input_shapes
... )
>>> stable_diffusion_xl.save_pretrained("sd_neuron_xl/")
>>> prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
>>> image = stable_diffusion_xl(prompt).images[0]
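The three micro-conditioning parameters (original_size, crops_coords_top_left, target_size) are, as far as I know, concatenated into a single six-element vector that is passed to the UNet as additional conditioning in upstream diffusers. The sketch below shows that concatenation only; build_time_ids is a hypothetical helper name, and the exact handling inside the Neuron pipeline is an assumption.

```python
def build_time_ids(original_size, crops_coords_top_left, target_size):
    """Concatenate SDXL micro-conditioning tuples into one flat list of six ints."""
    for name, value in (
        ("original_size", original_size),
        ("crops_coords_top_left", crops_coords_top_left),
        ("target_size", target_size),
    ):
        if len(value) != 2:
            raise ValueError(f"{name} must contain exactly two integers, got {value}")
    return list(original_size + crops_coords_top_left + target_size)


# Defaults from the parameter list above: no crop, original and target size equal.
print(build_time_ids((1024, 1024), (0, 0), (1024, 1024)))
# [1024, 1024, 0, 0, 1024, 1024]
```

A smaller original_size than target_size would signal an upsampled-looking image, matching the parameter description.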