Models
Generic model classes
The following ORT classes are available for instantiating a base model class without a specific head.
ORTModel
class optimum.onnxruntime.ORTModel
< source >( model: InferenceSession config: PretrainedConfig use_io_binding: typing.Optional[bool] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None preprocessors: typing.Optional[typing.List] = None **kwargs )
Base class for implementing models using ONNX Runtime.
The ORTModel implements generic methods for interacting with the Hugging Face Hub as well as exporting vanilla transformers models to ONNX using the optimum.exporters.onnx toolchain.
Class attributes:
- model_type (str, optional, defaults to "onnx_model") — The name of the model type to use when registering the ORTModel classes.
- auto_model_class (Type, optional, defaults to AutoModel) — The "AutoModel" class represented by the current ORTModel class.
Common attributes:
- model (ort.InferenceSession) — The ONNX Runtime InferenceSession that is running the model.
- config (PretrainedConfig) — The configuration of the model.
- use_io_binding (bool, optional, defaults to True) — Whether to use I/O bindings with ONNX Runtime with the CUDAExecutionProvider; this can significantly speed up inference depending on the task.
- model_save_dir (Path) — The directory where the model exported to ONNX is saved. By default, if the loaded model is local, the directory of the original model is used. Otherwise, the cache directory is used.
- providers (List[str]) — The list of execution providers available to ONNX Runtime.
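For instance, once any ORTModel subclass has been loaded (a minimal sketch reusing the optimum/distilbert-base-uncased-finetuned-sst-2-english checkpoint from the examples below), these common attributes can be inspected directly:
>>> from optimum.onnxruntime import ORTModelForSequenceClassification
>>> model = ORTModelForSequenceClassification.from_pretrained("optimum/distilbert-base-uncased-finetuned-sst-2-english")
>>> print(model.providers)  # e.g. ['CPUExecutionProvider']
>>> print(model.model_save_dir)  # directory holding the exported ONNX file
>>> print(model.config.model_type)  # 'distilbert'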
can_generate
Returns whether this model can generate sequences with .generate().
from_pretrained
< source >( model_id: typing.Union[str, pathlib.Path] export: bool = False force_download: bool = False use_auth_token: typing.Union[str, bool, NoneType] = None token: typing.Union[str, bool, NoneType] = None cache_dir: str = '/root/.cache/huggingface/hub' subfolder: str = '' config: typing.Optional[ForwardRef('PretrainedConfig')] = None local_files_only: bool = False provider: str = 'CPUExecutionProvider' session_options: typing.Optional[onnxruntime.capi.onnxruntime_pybind11_state.SessionOptions] = None provider_options: typing.Union[typing.Dict[str, typing.Any], NoneType] = None use_io_binding: typing.Optional[bool] = None **kwargs ) → ORTModel
Parameters
- model_id (Union[str, Path]) — Can be either:
  - A string, the model id of a pretrained model hosted inside a model repo on huggingface.co. Valid model ids can be located at the root-level, like bert-base-uncased, or namespaced under a user or organization name, like dbmdz/bert-base-german-cased.
  - A path to a directory containing a model saved using ~OptimizedModel.save_pretrained, e.g., ./my_model_directory/.
- export (bool, defaults to False) — Defines whether the provided model_id needs to be exported to the targeted format.
- force_download (bool, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- use_auth_token (Optional[Union[bool, str]], defaults to None) — Deprecated. Please use the token argument instead.
- token (Optional[Union[bool, str]], defaults to None) — The token to use as HTTP bearer authorization for remote files. If True, will use the token generated when running huggingface-cli login (stored in huggingface_hub.constants.HF_TOKEN_PATH).
- cache_dir (Optional[str], defaults to None) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
- subfolder (str, defaults to "") — In case the relevant files are located inside a subfolder of the model repo either locally or on huggingface.co, you can specify the folder name here.
- config (Optional[transformers.PretrainedConfig], defaults to None) — The model configuration.
- local_files_only (Optional[bool], defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
- trust_remote_code (bool, defaults to False) — Whether or not to allow for custom code defined on the Hub in their own modeling. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
- revision (Optional[str], defaults to None) — The specific model version to use. It can be a branch name, a tag name, or a commit id. Since we use a git-based system for storing models and other artifacts on huggingface.co, revision can be any identifier allowed by git.
- provider (str, defaults to "CPUExecutionProvider") — ONNX Runtime provider to use for loading the model. See https://onnxruntime.ai/docs/execution-providers/ for possible providers.
- session_options (Optional[onnxruntime.SessionOptions], defaults to None) — ONNX Runtime session options to use for loading the model.
- provider_options (Optional[Dict[str, Any]], defaults to None) — Provider option dictionaries corresponding to the provider used. See available options for each provider: https://onnxruntime.ai/docs/api/c/group___global.html .
- use_io_binding (Optional[bool], defaults to None) — Whether to use IOBinding during inference to avoid memory copy between the host and device, or between numpy/torch tensors and ONNX Runtime ORTValue. Defaults to True if the execution provider is CUDAExecutionProvider. For ORTModelForCausalLM, defaults to True on CPUExecutionProvider; in all other cases defaults to False.
- kwargs (Dict[str, Any]) — Will be passed to the underlying model loading methods.
Parameters for decoder models (ORTModelForCausalLM, ORTModelForSeq2SeqLM, ORTModelForSpeechSeq2Seq, ORTModelForVision2Seq)
- use_cache (Optional[bool], defaults to True) — Whether or not the past key/values cache should be used. Defaults to True.
Parameters for ORTModelForCausalLM
- use_merged (Optional[bool], defaults to None) — Whether or not to use a single ONNX that handles both the decoding without and with past key values reuse. This option defaults to True if loading from a local repository and a merged decoder is found. When exporting with export=True, it defaults to False. This option should be set to True to minimize memory usage.
Returns
ORTModel
The loaded ORTModel.
Instantiate a pretrained model from a pre-trained model configuration.
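For example (a minimal sketch reusing checkpoints from the examples below: the first call exports the vanilla PyTorch checkpoint to ONNX on the fly, while the second loads an already-converted ONNX model on GPU and assumes onnxruntime-gpu is installed):
>>> from optimum.onnxruntime import ORTModelForSequenceClassification
>>> # Export a vanilla Transformers checkpoint to ONNX and load it on CPU
>>> model = ORTModelForSequenceClassification.from_pretrained(
...     "distilbert-base-uncased-finetuned-sst-2-english", export=True
... )
>>> # Load an already-converted ONNX model on the CUDA execution provider
>>> model = ORTModelForSequenceClassification.from_pretrained(
...     "optimum/distilbert-base-uncased-finetuned-sst-2-english", provider="CUDAExecutionProvider"
... )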
load_model
< source >( path: typing.Union[str, pathlib.Path] provider: str = 'CPUExecutionProvider' session_options: typing.Optional[onnxruntime.capi.onnxruntime_pybind11_state.SessionOptions] = None provider_options: typing.Union[typing.Dict[str, typing.Any], NoneType] = None )
Parameters
- path (Union[str, Path]) — Path of the ONNX model.
- provider (str, defaults to "CPUExecutionProvider") — ONNX Runtime provider to use for loading the model. See https://onnxruntime.ai/docs/execution-providers/ for possible providers.
- session_options (Optional[onnxruntime.SessionOptions], defaults to None) — ONNX Runtime session options to use for loading the model.
- provider_options (Optional[Dict[str, Any]], defaults to None) — Provider option dictionary corresponding to the provider used. See available options for each provider: https://onnxruntime.ai/docs/api/c/group___global.html .
Loads an ONNX Inference session with a given provider. The default provider is CPUExecutionProvider to match the default behaviour in PyTorch/TensorFlow/JAX.
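As a sketch ("model.onnx" below is a placeholder path to a local ONNX file), a session can be created with custom ONNX Runtime session options:
>>> import onnxruntime
>>> from optimum.onnxruntime import ORTModel
>>> session_options = onnxruntime.SessionOptions()
>>> session_options.intra_op_num_threads = 4  # limit the number of intra-op threads
>>> session = ORTModel.load_model(
...     "model.onnx", provider="CPUExecutionProvider", session_options=session_options
... )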
raise_on_numpy_input_io_binding
< source >( use_torch: bool )
Raises an error if I/O binding is requested although the tensors used are NumPy arrays.
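In practice, this means that when use_io_binding is enabled the model expects PyTorch tensors rather than NumPy arrays. A minimal sketch (assuming a CUDA device and the optimum/gpt2 ONNX model used in the examples below):
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForCausalLM
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/gpt2")
>>> model = ORTModelForCausalLM.from_pretrained("optimum/gpt2", provider="CUDAExecutionProvider", use_io_binding=True)
>>> # return_tensors="pt" yields torch tensors; numpy arrays ("np") would trigger the error described above
>>> inputs = tokenizer("My name is Arthur and I live in", return_tensors="pt").to("cuda")
>>> outputs = model(**inputs)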
( model: InferenceSession use_io_binding: typing.Optional[bool] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None preprocessors: typing.Optional[typing.List] = None **kwargs )
Initializes attributes that may be shared among several ONNX Runtime inference sessions.
to
< source >( device: typing.Union[torch.device, str, int] ) → ORTModel
Changes the ONNX Runtime provider according to the device.
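For instance (a sketch assuming onnxruntime-gpu is installed so that CUDAExecutionProvider is available):
>>> from optimum.onnxruntime import ORTModelForSequenceClassification
>>> model = ORTModelForSequenceClassification.from_pretrained("optimum/distilbert-base-uncased-finetuned-sst-2-english")
>>> model = model.to("cuda")  # switches the session to CUDAExecutionProvider
>>> print(model.providers)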
Natural Language Processing
The following ORT classes are available for the following natural language processing tasks.
ORTModelForCausalLM
class optimum.onnxruntime.ORTModelForCausalLM
< source >( model: InferenceSession config: PretrainedConfig use_io_binding: typing.Optional[bool] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None preprocessors: typing.Optional[typing.List] = None generation_config: typing.Optional[transformers.generation.configuration_utils.GenerationConfig] = None use_cache: typing.Optional[bool] = None **kwargs )
ONNX model with a causal language modeling head for ONNX Runtime inference. This class officially supports bloom, codegen, falcon, gpt2, gpt_bigcode, gpt_neo, gpt_neox, gptj, llama.
This model inherits from ORTModel. Check its documentation for the generic methods the library implements for all its models (such as downloading or saving).
This class should be initialized using the onnxruntime.modeling_ort.ORTModel.from_pretrained() method.
forward
< source >( input_ids: LongTensor attention_mask: typing.Optional[torch.FloatTensor] = None position_ids: typing.Optional[torch.LongTensor] = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.Tensor]]] = None labels: typing.Optional[torch.LongTensor] = None use_cache_branch: bool = None **kwargs )
Parameters
- input_ids (torch.LongTensor) — Indices of decoder input sequence tokens in the vocabulary, of shape (batch_size, sequence_length).
- attention_mask (torch.LongTensor) — Mask to avoid performing attention on padding token indices, of shape (batch_size, sequence_length). Mask values selected in [0, 1].
- past_key_values (tuple(tuple(torch.FloatTensor)), optional, defaults to None) — Contains the precomputed key and value hidden states of the attention blocks, used to speed up decoding. The tuple is of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head).
The ORTModelForCausalLM forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of text generation:
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForCausalLM
>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/gpt2")
>>> model = ORTModelForCausalLM.from_pretrained("optimum/gpt2")
>>> inputs = tokenizer("My name is Arthur and I live in", return_tensors="pt")
>>> gen_tokens = model.generate(**inputs, do_sample=True, temperature=0.9, min_length=20, max_length=20)
>>> tokenizer.batch_decode(gen_tokens)
Example using transformers.pipeline:
>>> from transformers import AutoTokenizer, pipeline
>>> from optimum.onnxruntime import ORTModelForCausalLM
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/gpt2")
>>> model = ORTModelForCausalLM.from_pretrained("optimum/gpt2")
>>> onnx_gen = pipeline("text-generation", model=model, tokenizer=tokenizer)
>>> text = "My name is Arthur and I live in"
>>> gen = onnx_gen(text)
ORTModelForMaskedLM
class optimum.onnxruntime.ORTModelForMaskedLM
< source >( model: InferenceSession config: PretrainedConfig use_io_binding: typing.Optional[bool] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None preprocessors: typing.Optional[typing.List] = None **kwargs )
ONNX Model with a MaskedLMOutput for masked language modeling tasks. This class officially supports albert, bert, camembert, convbert, data2vec_text, deberta, deberta_v2, distilbert, electra, flaubert, ibert, mobilebert, roberta, roformer, squeezebert, xlm, xlm_roberta.
This model inherits from ORTModel. Check its documentation for the generic methods the library implements for all its models (such as downloading or saving).
This class should be initialized using the onnxruntime.modeling_ort.ORTModel.from_pretrained() method.
forward
< source >( input_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None attention_mask: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None token_type_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None **kwargs )
Parameters
- input_ids (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) — Indices of input sequence tokens in the vocabulary. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode and PreTrainedTokenizer.__call__ for details. What are input IDs?
- attention_mask (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
  - 1 for tokens that are not masked,
  - 0 for tokens that are masked.
  What are attention masks?
- token_type_ids (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
  - 1 for tokens that are sentence A,
  - 0 for tokens that are sentence B.
  What are token type IDs?
The ORTModelForMaskedLM forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of masked language modeling:
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForMaskedLM
>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/bert-base-uncased-for-fill-mask")
>>> model = ORTModelForMaskedLM.from_pretrained("optimum/bert-base-uncased-for-fill-mask")
>>> inputs = tokenizer("The capital of France is [MASK].", return_tensors="np")
>>> outputs = model(**inputs)
>>> logits = outputs.logits
>>> list(logits.shape)
[1, 8, 28996]
Example using transformers.pipeline:
>>> from transformers import AutoTokenizer, pipeline
>>> from optimum.onnxruntime import ORTModelForMaskedLM
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/bert-base-uncased-for-fill-mask")
>>> model = ORTModelForMaskedLM.from_pretrained("optimum/bert-base-uncased-for-fill-mask")
>>> fill_masker = pipeline("fill-mask", model=model, tokenizer=tokenizer)
>>> text = "The capital of France is [MASK]."
>>> pred = fill_masker(text)
ORTModelForSeq2SeqLM
class optimum.onnxruntime.ORTModelForSeq2SeqLM
< source >( encoder_session: InferenceSession decoder_session: InferenceSession config: PretrainedConfig onnx_paths: typing.List[str] decoder_with_past_session: typing.Optional[onnxruntime.capi.onnxruntime_inference_collection.InferenceSession] = None use_cache: bool = True use_io_binding: typing.Optional[bool] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None preprocessors: typing.Optional[typing.List] = None generation_config: typing.Optional[transformers.generation.configuration_utils.GenerationConfig] = None **kwargs )
Sequence-to-sequence model with a language modeling head for ONNX Runtime inference. This class officially supports bart, blenderbot, blenderbot_small, longt5, m2m_100, marian, mbart, mt5, pegasus, t5.
This model inherits from ORTModel. Check its documentation for the generic methods the library implements for all its models (such as downloading or saving).
This class should be initialized using the onnxruntime.modeling_ort.ORTModel.from_pretrained() method.
forward
< source >( input_ids: LongTensor = None attention_mask: typing.Optional[torch.FloatTensor] = None decoder_input_ids: typing.Optional[torch.LongTensor] = None encoder_outputs: typing.Optional[typing.Tuple[typing.Tuple[torch.Tensor]]] = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.Tensor]]] = None labels: typing.Optional[torch.LongTensor] = None **kwargs )
Parameters
- input_ids (torch.LongTensor) — Indices of input sequence tokens in the vocabulary, of shape (batch_size, encoder_sequence_length).
- attention_mask (torch.LongTensor) — Mask to avoid performing attention on padding token indices, of shape (batch_size, encoder_sequence_length). Mask values selected in [0, 1].
- decoder_input_ids (torch.LongTensor) — Indices of decoder input sequence tokens in the vocabulary, of shape (batch_size, decoder_sequence_length).
- encoder_outputs (torch.FloatTensor) — The encoder last_hidden_state of shape (batch_size, encoder_sequence_length, hidden_size).
- past_key_values (tuple(tuple(torch.FloatTensor)), optional, defaults to None) — Contains the precomputed key and value hidden states of the attention blocks, used to speed up decoding. The tuple is of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, decoder_sequence_length, embed_size_per_head) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head).
The ORTModelForSeq2SeqLM forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of text generation:
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForSeq2SeqLM
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/t5-small")
>>> model = ORTModelForSeq2SeqLM.from_pretrained("optimum/t5-small")
>>> inputs = tokenizer("My name is Eustache and I like to", return_tensors="pt")
>>> gen_tokens = model.generate(**inputs)
>>> outputs = tokenizer.batch_decode(gen_tokens)
Example using transformers.pipeline:
>>> from transformers import AutoTokenizer, pipeline
>>> from optimum.onnxruntime import ORTModelForSeq2SeqLM
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/t5-small")
>>> model = ORTModelForSeq2SeqLM.from_pretrained("optimum/t5-small")
>>> onnx_translation = pipeline("translation_en_to_de", model=model, tokenizer=tokenizer)
>>> text = "My name is Eustache."
>>> pred = onnx_translation(text)
ORTModelForSequenceClassification
class optimum.onnxruntime.ORTModelForSequenceClassification
< source >( model: InferenceSession config: PretrainedConfig use_io_binding: typing.Optional[bool] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None preprocessors: typing.Optional[typing.List] = None **kwargs )
ONNX Model with a sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for GLUE tasks. This class officially supports albert, bart, bert, camembert, convbert, data2vec_text, deberta, deberta_v2, distilbert, electra, flaubert, ibert, mbart, mobilebert, nystromformer, roberta, roformer, squeezebert, xlm, xlm_roberta.
This model inherits from ORTModel. Check its documentation for the generic methods the library implements for all its models (such as downloading or saving).
This class should be initialized using the onnxruntime.modeling_ort.ORTModel.from_pretrained() method.
forward
< source >( input_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None attention_mask: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None token_type_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None **kwargs )
Parameters
- input_ids (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) — Indices of input sequence tokens in the vocabulary. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode and PreTrainedTokenizer.__call__ for details. What are input IDs?
- attention_mask (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
  - 1 for tokens that are not masked,
  - 0 for tokens that are masked.
  What are attention masks?
- token_type_ids (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
  - 1 for tokens that are sentence A,
  - 0 for tokens that are sentence B.
  What are token type IDs?
The ORTModelForSequenceClassification forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of single-label classification:
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForSequenceClassification
>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/distilbert-base-uncased-finetuned-sst-2-english")
>>> model = ORTModelForSequenceClassification.from_pretrained("optimum/distilbert-base-uncased-finetuned-sst-2-english")
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="np")
>>> outputs = model(**inputs)
>>> logits = outputs.logits
>>> list(logits.shape)
[1, 2]
Example using transformers.pipeline:
>>> from transformers import AutoTokenizer, pipeline
>>> from optimum.onnxruntime import ORTModelForSequenceClassification
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/distilbert-base-uncased-finetuned-sst-2-english")
>>> model = ORTModelForSequenceClassification.from_pretrained("optimum/distilbert-base-uncased-finetuned-sst-2-english")
>>> onnx_classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
>>> text = "Hello, my dog is cute"
>>> pred = onnx_classifier(text)
Example using zero-shot-classification with transformers.pipeline:
>>> from transformers import AutoTokenizer, pipeline
>>> from optimum.onnxruntime import ORTModelForSequenceClassification
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/distilbert-base-uncased-mnli")
>>> model = ORTModelForSequenceClassification.from_pretrained("optimum/distilbert-base-uncased-mnli")
>>> onnx_z0 = pipeline("zero-shot-classification", model=model, tokenizer=tokenizer)
>>> sequence_to_classify = "Who are you voting for in 2020?"
>>> candidate_labels = ["Europe", "public health", "politics", "elections"]
>>> pred = onnx_z0(sequence_to_classify, candidate_labels, multi_label=True)
ORTModelForTokenClassification
class optimum.onnxruntime.ORTModelForTokenClassification
< source >( model: InferenceSession config: PretrainedConfig use_io_binding: typing.Optional[bool] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None preprocessors: typing.Optional[typing.List] = None **kwargs )
ONNX Model with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks. This class officially supports albert, bert, bloom, camembert, convbert, data2vec_text, deberta, deberta_v2, distilbert, electra, flaubert, gpt2, ibert, mobilebert, roberta, roformer, squeezebert, xlm, xlm_roberta.
This model inherits from ORTModel. Check its documentation for the generic methods the library implements for all its models (such as downloading or saving).
This class should be initialized using the onnxruntime.modeling_ort.ORTModel.from_pretrained() method.
forward
< source >( input_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None attention_mask: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None token_type_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None **kwargs )
Parameters
- input_ids (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) — Indices of input sequence tokens in the vocabulary. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode and PreTrainedTokenizer.__call__ for details. What are input IDs?
- attention_mask (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
  - 1 for tokens that are not masked,
  - 0 for tokens that are masked.
  What are attention masks?
- token_type_ids (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
  - 1 for tokens that are sentence A,
  - 0 for tokens that are sentence B.
  What are token type IDs?
The ORTModelForTokenClassification forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of token classification:
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForTokenClassification
>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/bert-base-NER")
>>> model = ORTModelForTokenClassification.from_pretrained("optimum/bert-base-NER")
>>> inputs = tokenizer("My name is Philipp and I live in Germany.", return_tensors="np")
>>> outputs = model(**inputs)
>>> logits = outputs.logits
>>> list(logits.shape)
[1, 12, 9]
Example using transformers.pipeline:
>>> from transformers import AutoTokenizer, pipeline
>>> from optimum.onnxruntime import ORTModelForTokenClassification
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/bert-base-NER")
>>> model = ORTModelForTokenClassification.from_pretrained("optimum/bert-base-NER")
>>> onnx_ner = pipeline("token-classification", model=model, tokenizer=tokenizer)
>>> text = "My name is Philipp and I live in Germany."
>>> pred = onnx_ner(text)
ORTModelForMultipleChoice
class optimum.onnxruntime.ORTModelForMultipleChoice
< source >( model: InferenceSession config: PretrainedConfig use_io_binding: typing.Optional[bool] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None preprocessors: typing.Optional[typing.List] = None **kwargs )
ONNX Model with a multiple choice classification head on top (a linear layer on top of the pooled output and a softmax) e.g. for RocStories/SWAG tasks. This class officially supports albert, bert, camembert, convbert, data2vec_text, deberta_v2, distilbert, electra, flaubert, ibert, mobilebert, nystromformer, roberta, roformer, squeezebert, xlm, xlm_roberta.
This model inherits from ORTModel. Check its documentation for the generic methods the library implements for all its models (such as downloading or saving).
This class should be initialized using the onnxruntime.modeling_ort.ORTModel.from_pretrained() method.
forward
< source >( input_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None attention_mask: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None token_type_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None **kwargs )
Parameters
- input_ids (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) — Indices of input sequence tokens in the vocabulary. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode and PreTrainedTokenizer.__call__ for details. What are input IDs?
- attention_mask (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
  - 1 for tokens that are not masked,
  - 0 for tokens that are masked.
  What are attention masks?
- token_type_ids (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
  - 1 for tokens that are sentence A,
  - 0 for tokens that are sentence B.
  What are token type IDs?
The ORTModelForMultipleChoice forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of multiple choice:
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForMultipleChoice
>>> tokenizer = AutoTokenizer.from_pretrained("ehdwns1516/bert-base-uncased_SWAG")
>>> model = ORTModelForMultipleChoice.from_pretrained("ehdwns1516/bert-base-uncased_SWAG", export=True)
>>> num_choices = 4
>>> first_sentence = ["Members of the procession walk down the street holding small horn brass instruments."] * num_choices
>>> second_sentence = [
... "A drum line passes by walking down the street playing their instruments.",
... "A drum line has heard approaching them.",
... "A drum line arrives and they're outside dancing and asleep.",
... "A drum line turns the lead singer watches the performance."
... ]
>>> inputs = tokenizer(first_sentence, second_sentence, truncation=True, padding=True)
# Unflatten the input values, expanding them to the shape [batch_size, num_choices, seq_length]
>>> for k, v in inputs.items():
... inputs[k] = [v[i: i + num_choices] for i in range(0, len(v), num_choices)]
>>> inputs = dict(inputs.convert_to_tensors(tensor_type="pt"))
>>> outputs = model(**inputs)
>>> logits = outputs.logits
ORTModelForQuestionAnswering
class optimum.onnxruntime.ORTModelForQuestionAnswering
< source >( model: InferenceSession config: PretrainedConfig use_io_binding: typing.Optional[bool] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None preprocessors: typing.Optional[typing.List] = None **kwargs )
ONNX Model with a QuestionAnsweringModelOutput for extractive question-answering tasks like SQuAD. This class officially supports albert, bart, bert, camembert, convbert, data2vec_text, deberta, deberta_v2, distilbert, electra, flaubert, gptj, ibert, mbart, mobilebert, nystromformer, roberta, roformer, squeezebert, xlm, xlm_roberta.
This model inherits from ORTModel. Check its documentation for the generic methods the library implements for all its models (such as downloading or saving).
This class should be initialized using the onnxruntime.modeling_ort.ORTModel.from_pretrained() method.
forward
< source >( input_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None attention_mask: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None token_type_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None **kwargs )
Parameters
- input_ids (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) — Indices of input sequence tokens in the vocabulary. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode and PreTrainedTokenizer.__call__ for details. What are input IDs?
- attention_mask (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
  - 1 for tokens that are not masked,
  - 0 for tokens that are masked.
  What are attention masks?
- token_type_ids (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
  - 1 for tokens that are sentence A,
  - 0 for tokens that are sentence B.
  What are token type IDs?
The ORTModelForQuestionAnswering forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of question answering:
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForQuestionAnswering
>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/roberta-base-squad2")
>>> model = ORTModelForQuestionAnswering.from_pretrained("optimum/roberta-base-squad2")
>>> question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
>>> inputs = tokenizer(question, text, return_tensors="np")
>>> start_positions = torch.tensor([1])
>>> end_positions = torch.tensor([3])
>>> outputs = model(**inputs, start_positions=start_positions, end_positions=end_positions)
>>> start_scores = outputs.start_logits
>>> end_scores = outputs.end_logits
Example using transformers.pipeline:
>>> from transformers import AutoTokenizer, pipeline
>>> from optimum.onnxruntime import ORTModelForQuestionAnswering
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/roberta-base-squad2")
>>> model = ORTModelForQuestionAnswering.from_pretrained("optimum/roberta-base-squad2")
>>> onnx_qa = pipeline("question-answering", model=model, tokenizer=tokenizer)
>>> question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
>>> pred = onnx_qa(question, text)
Computer vision
The following ORT classes are available for the following computer vision tasks.
ORTModelForImageClassification
class optimum.onnxruntime.ORTModelForImageClassification
< source >( model: InferenceSession config: PretrainedConfig use_io_binding: typing.Optional[bool] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None preprocessors: typing.Optional[typing.List] = None **kwargs )
ONNX Model for image-classification tasks. This class officially supports beit, convnext, convnextv2, data2vec_vision, deit, levit, mobilenet_v1, mobilenet_v2, mobilevit, poolformer, resnet, segformer, swin, vit.
This model inherits from ORTModel. Check its documentation for the generic methods the library implements for all its models (such as downloading or saving).
This class should be initialized using the onnxruntime.modeling_ort.ORTModel.from_pretrained() method.
forward
< source >( pixel_values: typing.Union[torch.Tensor, numpy.ndarray] **kwargs )
Parameters
- pixel_values (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, num_channels, height, width), defaults to None) — Pixel values corresponding to the images in the current batch. Pixel values can be obtained from encoded images using AutoFeatureExtractor.
The ORTModelForImageClassification forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of image classification:
>>> import requests
>>> from PIL import Image
>>> from optimum.onnxruntime import ORTModelForImageClassification
>>> from transformers import AutoFeatureExtractor
>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)
>>> preprocessor = AutoFeatureExtractor.from_pretrained("optimum/vit-base-patch16-224")
>>> model = ORTModelForImageClassification.from_pretrained("optimum/vit-base-patch16-224")
>>> inputs = preprocessor(images=image, return_tensors="np")
>>> outputs = model(**inputs)
>>> logits = outputs.logits
Example using transformers.pipeline:
>>> import requests
>>> from PIL import Image
>>> from transformers import AutoFeatureExtractor, pipeline
>>> from optimum.onnxruntime import ORTModelForImageClassification
>>> preprocessor = AutoFeatureExtractor.from_pretrained("optimum/vit-base-patch16-224")
>>> model = ORTModelForImageClassification.from_pretrained("optimum/vit-base-patch16-224")
>>> onnx_image_classifier = pipeline("image-classification", model=model, feature_extractor=preprocessor)
>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> pred = onnx_image_classifier(url)
ORTModelForSemanticSegmentation
class optimum.onnxruntime.ORTModelForSemanticSegmentation
< source >( model: InferenceSession config: PretrainedConfig use_io_binding: typing.Optional[bool] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None preprocessors: typing.Optional[typing.List] = None **kwargs )
ONNX Model for semantic-segmentation, with an all-MLP decode head on top e.g. for ADE20k, CityScapes. This class officially supports segformer.
This model inherits from ORTModel. Check its documentation for the generic methods the library implements for all its models (such as downloading or saving).
This class should be initialized using the onnxruntime.modeling_ort.ORTModel.from_pretrained() method.
forward
< source >( pixel_values: typing.Union[torch.Tensor, numpy.ndarray] **kwargs )
Parameters
- pixel_values (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, num_channels, height, width), defaults to None) — Pixel values corresponding to the images in the current batch. Pixel values can be obtained from encoded images using AutoFeatureExtractor.
The ORTModelForSemanticSegmentation forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of semantic segmentation:
>>> import requests
>>> from PIL import Image
>>> from optimum.onnxruntime import ORTModelForSemanticSegmentation
>>> from transformers import AutoFeatureExtractor
>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)
>>> preprocessor = AutoFeatureExtractor.from_pretrained("optimum/segformer-b0-finetuned-ade-512-512")
>>> model = ORTModelForSemanticSegmentation.from_pretrained("optimum/segformer-b0-finetuned-ade-512-512")
>>> inputs = preprocessor(images=image, return_tensors="np")
>>> outputs = model(**inputs)
>>> logits = outputs.logits
Example using transformers.pipeline:
>>> import requests
>>> from PIL import Image
>>> from transformers import AutoFeatureExtractor, pipeline
>>> from optimum.onnxruntime import ORTModelForSemanticSegmentation
>>> preprocessor = AutoFeatureExtractor.from_pretrained("optimum/segformer-b0-finetuned-ade-512-512")
>>> model = ORTModelForSemanticSegmentation.from_pretrained("optimum/segformer-b0-finetuned-ade-512-512")
>>> onnx_image_segmenter = pipeline("image-segmentation", model=model, feature_extractor=preprocessor)
>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> pred = onnx_image_segmenter(url)
Audio
The following ORT classes are available for the following audio tasks.
ORTModelForAudioClassification
class optimum.onnxruntime.ORTModelForAudioClassification
< source >( model: InferenceSession config: PretrainedConfig use_io_binding: typing.Optional[bool] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None preprocessors: typing.Optional[typing.List] = None **kwargs )
ONNX Model for audio-classification, with a sequence classification head on top (a linear layer over the pooled output) for tasks like SUPERB Keyword Spotting. This class officially supports audio_spectrogram_transformer, data2vec_audio, hubert, sew, sew_d, unispeech, unispeech_sat, wavlm, wav2vec2, wav2vec2-conformer.
This model inherits from ORTModel. Check its documentation for the generic methods the library implements for all its models (such as downloading or saving).
This class should be initialized using the onnxruntime.modeling_ort.ORTModel.from_pretrained() method.
forward
< source >( input_values: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None attention_mask: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None input_features: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None **kwargs )
Parameters
- input_values (torch.Tensor of shape (batch_size, sequence_length)) — Float values of the input raw speech waveform. Input values can be obtained from an audio file loaded into an array using AutoFeatureExtractor.
The ORTModelForAudioClassification forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of audio classification:
>>> from transformers import AutoFeatureExtractor
>>> from optimum.onnxruntime import ORTModelForAudioClassification
>>> from datasets import load_dataset
>>> import torch
>>> dataset = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
>>> dataset = dataset.sort("id")
>>> sampling_rate = dataset.features["audio"].sampling_rate
>>> feature_extractor = AutoFeatureExtractor.from_pretrained("optimum/hubert-base-superb-ks")
>>> model = ORTModelForAudioClassification.from_pretrained("optimum/hubert-base-superb-ks")
>>> # audio file is decoded on the fly
>>> inputs = feature_extractor(dataset[0]["audio"]["array"], sampling_rate=sampling_rate, return_tensors="pt")
>>> with torch.no_grad():
... logits = model(**inputs).logits
>>> predicted_class_ids = torch.argmax(logits, dim=-1).item()
>>> predicted_label = model.config.id2label[predicted_class_ids]
Example using transformers.pipeline:
>>> from transformers import AutoFeatureExtractor, pipeline
>>> from optimum.onnxruntime import ORTModelForAudioClassification
>>> from datasets import load_dataset
>>> feature_extractor = AutoFeatureExtractor.from_pretrained("optimum/hubert-base-superb-ks")
>>> dataset = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
>>> dataset = dataset.sort("id")
>>> model = ORTModelForAudioClassification.from_pretrained("optimum/hubert-base-superb-ks")
>>> onnx_ac = pipeline("audio-classification", model=model, feature_extractor=feature_extractor)
>>> pred = onnx_ac(dataset[0]["audio"]["array"])
ORTModelForAudioFrameClassification
class optimum.onnxruntime.ORTModelForAudioFrameClassification
< source >( model: InferenceSession config: PretrainedConfig use_io_binding: typing.Optional[bool] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None preprocessors: typing.Optional[typing.List] = None **kwargs )
ONNX Model with a frame classification head on top for tasks like Speaker Diarization. This class officially supports data2vec_audio, unispeech_sat, wavlm, wav2vec2, wav2vec2-conformer.
This model inherits from ORTModel. Check its documentation for the generic methods the library implements for all its models (such as downloading or saving).
This class should be initialized using the onnxruntime.modeling_ort.ORTModel.from_pretrained() method.
forward
< source >( input_values: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None **kwargs )
Parameters
- input_values (torch.Tensor of shape (batch_size, sequence_length)) — Float values of the input raw speech waveform. Input values can be obtained from an audio file loaded into an array using AutoFeatureExtractor.
The ORTModelForAudioFrameClassification forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of audio frame classification:
>>> from transformers import AutoFeatureExtractor
>>> from optimum.onnxruntime import ORTModelForAudioFrameClassification
>>> from datasets import load_dataset
>>> import torch
>>> dataset = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
>>> dataset = dataset.sort("id")
>>> sampling_rate = dataset.features["audio"].sampling_rate
>>> feature_extractor = AutoFeatureExtractor.from_pretrained("optimum/wav2vec2-base-superb-sd")
>>> model = ORTModelForAudioFrameClassification.from_pretrained("optimum/wav2vec2-base-superb-sd")
>>> inputs = feature_extractor(dataset[0]["audio"]["array"], return_tensors="pt", sampling_rate=sampling_rate)
>>> with torch.no_grad():
... logits = model(**inputs).logits
>>> probabilities = torch.sigmoid(logits[0])
>>> labels = (probabilities > 0.5).long()
>>> labels[0].tolist()
ORTModelForCTC
class optimum.onnxruntime.ORTModelForCTC
< source >( model: InferenceSession config: PretrainedConfig use_io_binding: typing.Optional[bool] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None preprocessors: typing.Optional[typing.List] = None **kwargs )
ONNX Model with a language modeling head on top for Connectionist Temporal Classification (CTC). This class officially supports data2vec_audio, hubert, sew, sew_d, unispeech, unispeech_sat, wavlm, wav2vec2, wav2vec2-conformer.
This model inherits from ORTModel. Check its documentation for the generic methods the library implements for all its models (such as downloading or saving).
This class should be initialized using the onnxruntime.modeling_ort.ORTModel.from_pretrained() method.
forward
< source >( input_values: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None **kwargs )
Parameters
- input_values (torch.Tensor of shape (batch_size, sequence_length)) — Float values of the input raw speech waveform. Input values can be obtained from an audio file loaded into an array using AutoFeatureExtractor.
The ORTModelForCTC forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of CTC:
>>> from transformers import AutoProcessor
>>> from optimum.onnxruntime import ORTModelForCTC
>>> from datasets import load_dataset
>>> import torch
>>> dataset = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
>>> dataset = dataset.sort("id")
>>> sampling_rate = dataset.features["audio"].sampling_rate
>>> processor = AutoProcessor.from_pretrained("optimum/hubert-large-ls960-ft")
>>> model = ORTModelForCTC.from_pretrained("optimum/hubert-large-ls960-ft")
>>> # audio file is decoded on the fly
>>> inputs = processor(dataset[0]["audio"]["array"], sampling_rate=sampling_rate, return_tensors="pt")
>>> with torch.no_grad():
... logits = model(**inputs).logits
>>> predicted_ids = torch.argmax(logits, dim=-1)
>>> transcription = processor.batch_decode(predicted_ids)
ORTModelForSpeechSeq2Seq
class optimum.onnxruntime.ORTModelForSpeechSeq2Seq
< source >( encoder_session: InferenceSession decoder_session: InferenceSession config: PretrainedConfig onnx_paths: typing.List[str] decoder_with_past_session: typing.Optional[onnxruntime.capi.onnxruntime_inference_collection.InferenceSession] = None use_cache: bool = True use_io_binding: typing.Optional[bool] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None preprocessors: typing.Optional[typing.List] = None generation_config: typing.Optional[transformers.generation.configuration_utils.GenerationConfig] = None **kwargs )
Speech Sequence-to-sequence model with a language modeling head for ONNX Runtime inference. This class officially supports whisper, speech_to_text.
This model inherits from ORTModel. Check its documentation for the generic methods the library implements for all its models (such as downloading or saving).
This class should be initialized using the onnxruntime.modeling_ort.ORTModel.from_pretrained() method.
forward
< source >( input_features: typing.Optional[torch.FloatTensor] = None attention_mask: typing.Optional[torch.LongTensor] = None decoder_input_ids: typing.Optional[torch.LongTensor] = None encoder_outputs: typing.Optional[typing.Tuple[typing.Tuple[torch.Tensor]]] = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.Tensor]]] = None labels: typing.Optional[torch.LongTensor] = None cache_position: typing.Optional[torch.Tensor] = None **kwargs )
Parameters
- input_features (torch.FloatTensor) — Mel features extracted from the raw speech waveform, of shape (batch_size, feature_size, encoder_sequence_length).
- decoder_input_ids (torch.LongTensor) — Indices of decoder input sequence tokens in the vocabulary, of shape (batch_size, decoder_sequence_length).
- encoder_outputs (torch.FloatTensor) — The encoder last_hidden_state of shape (batch_size, encoder_sequence_length, hidden_size).
- past_key_values (tuple(tuple(torch.FloatTensor)), optional, defaults to None) — Contains the precomputed key and value hidden states of the attention blocks, used to speed up decoding. The tuple is of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, decoder_sequence_length, embed_size_per_head) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head).
The ORTModelForSpeechSeq2Seq forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of text generation:
>>> from transformers import AutoProcessor
>>> from optimum.onnxruntime import ORTModelForSpeechSeq2Seq
>>> from datasets import load_dataset
>>> processor = AutoProcessor.from_pretrained("optimum/whisper-tiny.en")
>>> model = ORTModelForSpeechSeq2Seq.from_pretrained("optimum/whisper-tiny.en")
>>> ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
>>> inputs = processor.feature_extractor(ds[0]["audio"]["array"], return_tensors="pt")
>>> gen_tokens = model.generate(inputs=inputs.input_features)
>>> outputs = processor.tokenizer.batch_decode(gen_tokens)
Example using transformers.pipeline:
>>> from transformers import AutoProcessor, pipeline
>>> from optimum.onnxruntime import ORTModelForSpeechSeq2Seq
>>> from datasets import load_dataset
>>> processor = AutoProcessor.from_pretrained("optimum/whisper-tiny.en")
>>> model = ORTModelForSpeechSeq2Seq.from_pretrained("optimum/whisper-tiny.en")
>>> speech_recognition = pipeline("automatic-speech-recognition", model=model, tokenizer=processor.tokenizer, feature_extractor=processor.feature_extractor)
>>> ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
>>> pred = speech_recognition(ds[0]["audio"]["array"])
ORTModelForAudioXVector
class optimum.onnxruntime.ORTModelForAudioXVector
< source >( model: InferenceSession config: PretrainedConfig use_io_binding: typing.Optional[bool] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None preprocessors: typing.Optional[typing.List] = None **kwargs )
ONNX Model with an XVector feature extraction head on top for tasks like Speaker Verification. This class officially supports data2vec_audio, unispeech_sat, wavlm, wav2vec2, wav2vec2-conformer.
This model inherits from ORTModel. Check its documentation for the generic methods the library implements for all its models (such as downloading or saving).
This class should be initialized using the onnxruntime.modeling_ort.ORTModel.from_pretrained() method.
forward
< source >( input_values: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None **kwargs )
Parameters
- input_values (torch.Tensor of shape (batch_size, sequence_length)) — Float values of the input raw speech waveform. Input values can be obtained from an audio file loaded into an array using AutoFeatureExtractor.
The ORTModelForAudioXVector forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of Audio XVector:
>>> from transformers import AutoFeatureExtractor
>>> from optimum.onnxruntime import ORTModelForAudioXVector
>>> from datasets import load_dataset
>>> import torch
>>> dataset = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
>>> dataset = dataset.sort("id")
>>> sampling_rate = dataset.features["audio"].sampling_rate
>>> feature_extractor = AutoFeatureExtractor.from_pretrained("optimum/wav2vec2-base-superb-sv")
>>> model = ORTModelForAudioXVector.from_pretrained("optimum/wav2vec2-base-superb-sv")
>>> # audio file is decoded on the fly
>>> inputs = feature_extractor(
... [d["array"] for d in dataset[:2]["audio"]], sampling_rate=sampling_rate, return_tensors="pt", padding=True
... )
>>> with torch.no_grad():
... embeddings = model(**inputs).embeddings
>>> embeddings = torch.nn.functional.normalize(embeddings, dim=-1).cpu()
>>> cosine_sim = torch.nn.CosineSimilarity(dim=-1)
>>> similarity = cosine_sim(embeddings[0], embeddings[1])
>>> threshold = 0.7
>>> if similarity < threshold:
... print("Speakers are not the same!")
>>> round(similarity.item(), 2)
Multimodal
The following ORT classes are available for the following multimodal tasks.
ORTModelForVision2Seq
class optimum.onnxruntime.ORTModelForVision2Seq
< source >( encoder_session: InferenceSession decoder_session: InferenceSession config: PretrainedConfig onnx_paths: typing.List[str] decoder_with_past_session: typing.Optional[onnxruntime.capi.onnxruntime_inference_collection.InferenceSession] = None use_cache: bool = True use_io_binding: typing.Optional[bool] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None preprocessors: typing.Optional[typing.List] = None generation_config: typing.Optional[transformers.generation.configuration_utils.GenerationConfig] = None **kwargs )
VisionEncoderDecoder Sequence-to-sequence model with a language modeling head for ONNX Runtime inference. This class officially supports trocr and vision-encoder-decoder.
This model inherits from ORTModel. Check its documentation for the generic methods the library implements for all its models (such as downloading or saving).
This class should be initialized using the onnxruntime.modeling_ort.ORTModel.from_pretrained() method.
forward
< source >( pixel_values: typing.Optional[torch.FloatTensor] = None decoder_input_ids: typing.Optional[torch.LongTensor] = None encoder_outputs: typing.Optional[typing.Tuple[typing.Tuple[torch.Tensor]]] = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.Tensor]]] = None labels: typing.Optional[torch.LongTensor] = None **kwargs )
Parameters
- pixel_values (torch.FloatTensor) — Features extracted from an image. This tensor should be of shape (batch_size, num_channels, height, width).
- decoder_input_ids (torch.LongTensor) — Indices of decoder input sequence tokens in the vocabulary, of shape (batch_size, decoder_sequence_length).
- encoder_outputs (torch.FloatTensor) — The encoder last_hidden_state of shape (batch_size, encoder_sequence_length, hidden_size).
- past_key_values (tuple(tuple(torch.FloatTensor)), optional, defaults to None) — Contains the precomputed key and value hidden states of the attention blocks, used to speed up decoding. The tuple is of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, decoder_sequence_length, embed_size_per_head) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head).
The ORTModelForVision2Seq forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of text generation:
>>> from transformers import AutoImageProcessor, AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForVision2Seq
>>> from PIL import Image
>>> import requests
>>> processor = AutoImageProcessor.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
>>> tokenizer = AutoTokenizer.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
>>> model = ORTModelForVision2Seq.from_pretrained("nlpconnect/vit-gpt2-image-captioning", export=True)
>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)
>>> inputs = processor(image, return_tensors="pt")
>>> gen_tokens = model.generate(**inputs)
>>> outputs = tokenizer.batch_decode(gen_tokens, skip_special_tokens=True)
Example using `transformers.pipeline`:
>>> from transformers import AutoImageProcessor, AutoTokenizer, pipeline
>>> from optimum.onnxruntime import ORTModelForVision2Seq
>>> from PIL import Image
>>> import requests
>>> processor = AutoImageProcessor.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
>>> tokenizer = AutoTokenizer.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
>>> model = ORTModelForVision2Seq.from_pretrained("nlpconnect/vit-gpt2-image-captioning", export=True)
>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)
>>> image_to_text = pipeline("image-to-text", model=model, tokenizer=tokenizer, feature_extractor=processor, image_processor=processor)
>>> pred = image_to_text(image)
ORTModelForPix2Struct
class optimum.onnxruntime.ORTModelForPix2Struct
< source >( encoder_session: InferenceSession decoder_session: InferenceSession config: PretrainedConfig onnx_paths: typing.List[str] decoder_with_past_session: typing.Optional[onnxruntime.capi.onnxruntime_inference_collection.InferenceSession] = None use_cache: bool = True use_io_binding: typing.Optional[bool] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None preprocessors: typing.Optional[typing.List] = None generation_config: typing.Optional[transformers.generation.configuration_utils.GenerationConfig] = None **kwargs )
Pix2Struct model with a language modeling head for ONNX Runtime inference. This class officially supports pix2struct.
This model inherits from ORTModel. Check its documentation for the generic methods the library implements for all its models (such as downloading or saving).
This class should be initialized using the onnxruntime.modeling_ort.ORTModel.from_pretrained() method.
forward
< source >( flattened_patches: typing.Optional[torch.FloatTensor] = None attention_mask: typing.Optional[torch.LongTensor] = None decoder_input_ids: typing.Optional[torch.LongTensor] = None decoder_attention_mask: typing.Optional[torch.BoolTensor] = None encoder_outputs: typing.Optional[typing.Tuple[typing.Tuple[torch.Tensor]]] = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.Tensor]]] = None labels: typing.Optional[torch.LongTensor] = None **kwargs )
Parameters
- flattened_patches (`torch.FloatTensor` of shape `(batch_size, seq_length, hidden_size)`) — Flattened pixel patches. The `hidden_size` is obtained by the following formula: `hidden_size = num_channels * patch_size * patch_size`. The process of flattening the pixel patches is done by `Pix2StructProcessor`.
- attention_mask (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*) — Mask to avoid performing attention on padding token indices.
- decoder_input_ids (`torch.LongTensor` of shape `(batch_size, target_sequence_length)`, *optional*) — Indices of decoder input sequence tokens in the vocabulary. Pix2StructText uses the `pad_token_id` as the starting token for `decoder_input_ids` generation. If `past_key_values` is used, optionally only the last `decoder_input_ids` have to be input (see `past_key_values`).
- decoder_attention_mask (`torch.BoolTensor` of shape `(batch_size, target_sequence_length)`, *optional*) — Default behavior: generate a tensor that ignores pad tokens in `decoder_input_ids`. A causal mask will also be used by default.
- encoder_outputs (`tuple(tuple(torch.FloatTensor))`, *optional*) — Tuple consisting of (`last_hidden_state`, *optional*: hidden_states, *optional*: attentions). `last_hidden_state`, of shape `(batch_size, sequence_length, hidden_size)`, is a sequence of hidden states at the output of the last layer of the encoder, used in the cross-attention of the decoder.
- past_key_values (`tuple(tuple(torch.FloatTensor))`, *optional*, defaults to `None`) — Contains the precomputed key and value hidden states of the attention blocks used to speed up decoding. The tuple is of length `config.n_layers`, with each tuple having 2 tensors of shape `(batch_size, num_heads, decoder_sequence_length, embed_size_per_head)` and 2 additional tensors of shape `(batch_size, num_heads, encoder_sequence_length, embed_size_per_head)`.
The `ORTModelForPix2Struct` forward method overrides the `__call__` special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the `Module` instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of pix2struct:
>>> from transformers import AutoProcessor
>>> from optimum.onnxruntime import ORTModelForPix2Struct
>>> from PIL import Image
>>> import requests
>>> processor = AutoProcessor.from_pretrained("google/pix2struct-ai2d-base")
>>> model = ORTModelForPix2Struct.from_pretrained("google/pix2struct-ai2d-base", export=True, use_io_binding=True)
>>> url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)
>>> question = "What does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud"
>>> inputs = processor(images=image, text=question, return_tensors="pt")
>>> gen_tokens = model.generate(**inputs)
>>> outputs = processor.batch_decode(gen_tokens, skip_special_tokens=True)
Custom Tasks
The following ORT classes are available for custom tasks.
ORTModelForCustomTasks
class optimum.onnxruntime.ORTModelForCustomTasks
< source >( model: InferenceSession config: PretrainedConfig use_io_binding: typing.Optional[bool] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None preprocessors: typing.Optional[typing.List] = None **kwargs )
ONNX Model for any custom task. It can be used to leverage inference acceleration for any single-file ONNX model that may use custom inputs and outputs.
This model inherits from ORTModel. Check its documentation for the generic methods the library implements for all its models (such as downloading or saving).
This class should be initialized using the onnxruntime.modeling_ort.ORTModel.from_pretrained() method.
The `ORTModelForCustomTasks` forward method overrides the `__call__` special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the `Module` instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of custom tasks (e.g. a sentence transformers model taking `pooler_output` as output):
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForCustomTasks
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/sbert-all-MiniLM-L6-with-pooler")
>>> model = ORTModelForCustomTasks.from_pretrained("optimum/sbert-all-MiniLM-L6-with-pooler")
>>> inputs = tokenizer("I love burritos!", return_tensors="np")
>>> outputs = model(**inputs)
>>> last_hidden_state = outputs.last_hidden_state
>>> pooler_output = outputs.pooler_output
Example using `transformers.pipelines` (only if the task is supported):
>>> from transformers import AutoTokenizer, pipeline
>>> from optimum.onnxruntime import ORTModelForCustomTasks
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/sbert-all-MiniLM-L6-with-pooler")
>>> model = ORTModelForCustomTasks.from_pretrained("optimum/sbert-all-MiniLM-L6-with-pooler")
>>> onnx_extractor = pipeline("feature-extraction", model=model, tokenizer=tokenizer)
>>> text = "I love burritos!"
>>> pred = onnx_extractor(text)
ORTModelForFeatureExtraction
class optimum.onnxruntime.ORTModelForFeatureExtraction
< source >( model: InferenceSession config: PretrainedConfig use_io_binding: typing.Optional[bool] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None preprocessors: typing.Optional[typing.List] = None **kwargs )
ONNX Model for the feature-extraction task.
This model inherits from ORTModel. Check its documentation for the generic methods the library implements for all its models (such as downloading or saving).
This class should be initialized using the onnxruntime.modeling_ort.ORTModel.from_pretrained() method.
forward
< source >( input_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None attention_mask: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None token_type_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None **kwargs )
Parameters
- input_ids (`Union[torch.Tensor, np.ndarray, None]` of shape `(batch_size, sequence_length)`, defaults to `None`) — Indices of input sequence tokens in the vocabulary. Indices can be obtained using `AutoTokenizer`. See `PreTrainedTokenizer.encode` and `PreTrainedTokenizer.__call__` for details. What are input IDs?
- attention_mask (`Union[torch.Tensor, np.ndarray, None]` of shape `(batch_size, sequence_length)`, defaults to `None`) — Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
  - 1 for tokens that are not masked,
  - 0 for tokens that are masked. What are attention masks?
- token_type_ids (`Union[torch.Tensor, np.ndarray, None]` of shape `(batch_size, sequence_length)`, defaults to `None`) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`:
  - 1 for tokens that are sentence A,
  - 0 for tokens that are sentence B. What are token type IDs?
The `ORTModelForFeatureExtraction` forward method overrides the `__call__` special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the `Module` instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of feature extraction:
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForFeatureExtraction
>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/all-MiniLM-L6-v2")
>>> model = ORTModelForFeatureExtraction.from_pretrained("optimum/all-MiniLM-L6-v2")
>>> inputs = tokenizer("My name is Philipp and I live in Germany.", return_tensors="np")
>>> outputs = model(**inputs)
>>> last_hidden_state = outputs.last_hidden_state
>>> list(last_hidden_state.shape)
[1, 12, 384]
Example using `transformers.pipeline`:
>>> from transformers import AutoTokenizer, pipeline
>>> from optimum.onnxruntime import ORTModelForFeatureExtraction
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/all-MiniLM-L6-v2")
>>> model = ORTModelForFeatureExtraction.from_pretrained("optimum/all-MiniLM-L6-v2")
>>> onnx_extractor = pipeline("feature-extraction", model=model, tokenizer=tokenizer)
>>> text = "My name is Philipp and I live in Germany."
>>> pred = onnx_extractor(text)
Stable Diffusion
ORTStableDiffusionPipeline
class optimum.onnxruntime.ORTStableDiffusionPipeline
< source >( vae_decoder_session: InferenceSession text_encoder_session: InferenceSession unet_session: InferenceSession config: typing.Dict[str, typing.Any] tokenizer: CLIPTokenizer scheduler: typing.Union[diffusers.schedulers.scheduling_ddim.DDIMScheduler, diffusers.schedulers.scheduling_pndm.PNDMScheduler, diffusers.schedulers.scheduling_lms_discrete.LMSDiscreteScheduler] feature_extractor: typing.Optional[transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor] = None vae_encoder_session: typing.Optional[onnxruntime.capi.onnxruntime_inference_collection.InferenceSession] = None text_encoder_2_session: typing.Optional[onnxruntime.capi.onnxruntime_inference_collection.InferenceSession] = None tokenizer_2: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = None use_io_binding: typing.Optional[bool] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None )
ONNX Runtime-powered stable diffusion pipeline corresponding to diffusers.StableDiffusionPipeline.
This model inherits from ORTModel. Check its documentation for the generic methods the library implements for all its models (such as downloading or saving).
This class should be initialized using the onnxruntime.modeling_ort.ORTModel.from_pretrained() method.
__call__
< source >( prompt: typing.Union[str, typing.List[str], NoneType] = None height: typing.Optional[int] = None width: typing.Optional[int] = None num_inference_steps: int = 50 guidance_scale: float = 7.5 negative_prompt: typing.Union[str, typing.List[str], NoneType] = None num_images_per_prompt: int = 1 eta: float = 0.0 generator: typing.Optional[numpy.random.mtrand.RandomState] = None latents: typing.Optional[numpy.ndarray] = None prompt_embeds: typing.Optional[numpy.ndarray] = None negative_prompt_embeds: typing.Optional[numpy.ndarray] = None output_type: str = 'pil' return_dict: bool = True callback: typing.Union[typing.Callable[[int, int, numpy.ndarray], NoneType], NoneType] = None callback_steps: int = 1 guidance_rescale: float = 0.0 ) → ~pipelines.stable_diffusion.StableDiffusionPipelineOutput
or tuple
Parameters
- prompt (`Optional[Union[str, List[str]]]`, defaults to `None`) — The prompt or prompts to guide the image generation. If not defined, one has to pass `prompt_embeds` instead.
- height (`Optional[int]`, defaults to `None`) — The height in pixels of the generated image.
- width (`Optional[int]`, defaults to `None`) — The width in pixels of the generated image.
- num_inference_steps (`int`, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
- guidance_scale (`float`, defaults to 7.5) — Guidance scale as defined in Classifier-Free Diffusion Guidance. `guidance_scale` is defined as `w` in equation 2 of the Imagen paper. Guidance scale is enabled by setting `guidance_scale > 1`. A higher guidance scale encourages generating images closely linked to the text `prompt`, usually at the expense of lower image quality.
- negative_prompt (`Optional[Union[str, list]]`) — The prompt or prompts not to guide the image generation. If not defined, one has to pass `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is less than 1).
- num_images_per_prompt (`int`, defaults to 1) — The number of images to generate per prompt.
- eta (`float`, defaults to 0.0) — Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to `schedulers.DDIMScheduler` and will be ignored for others.
- generator (`Optional[np.random.RandomState]`, defaults to `None`) — A `np.random.RandomState` to make generation deterministic.
- latents (`Optional[np.ndarray]`, defaults to `None`) — Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random `generator`.
- prompt_embeds (`Optional[np.ndarray]`, defaults to `None`) — Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings will be generated from the `prompt` input argument.
- negative_prompt_embeds (`Optional[np.ndarray]`, defaults to `None`) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, `negative_prompt_embeds` will be generated from the `negative_prompt` input argument.
- output_type (`str`, defaults to `"pil"`) — The output format of the generated image. Choose between PIL (`PIL.Image.Image`) or `np.array`.
- return_dict (`bool`, defaults to `True`) — Whether or not to return a `~pipelines.stable_diffusion.StableDiffusionPipelineOutput` instead of a plain tuple.
- callback (`Optional[Callable]`, defaults to `None`) — A function that will be called every `callback_steps` steps during inference. The function will be called with the following arguments: `callback(step: int, timestep: int, latents: torch.FloatTensor)`.
- callback_steps (`int`, defaults to 1) — The frequency at which the `callback` function will be called. If not specified, the callback will be called at every step.
- guidance_rescale (`float`, defaults to 0.0) — Guidance rescale factor proposed by Common Diffusion Noise Schedules and Sample Steps are Flawed. `guidance_scale` is defined as `φ` in equation 16 of that paper. The guidance rescale factor should fix overexposure when using zero terminal SNR.
Returns
`~pipelines.stable_diffusion.StableDiffusionPipelineOutput` or `tuple`
A `~pipelines.stable_diffusion.StableDiffusionPipelineOutput` if `return_dict` is True, otherwise a `tuple`. When returning a tuple, the first element is a list with the generated images, and the second element is a list of bools denoting whether the corresponding generated image likely represents "not-safe-for-work" (nsfw) content, according to the `safety_checker`.
Function invoked when calling the pipeline for generation.
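Example of text-to-image generation (a minimal sketch; the checkpoint id and prompt below are illustrative assumptions, and any Stable Diffusion checkpoint that can be exported to ONNX should work):
>>> from optimum.onnxruntime import ORTStableDiffusionPipeline
>>> # "runwayml/stable-diffusion-v1-5" is an illustrative checkpoint; export=True converts it to ONNX on the fly
>>> pipeline = ORTStableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", export=True)
>>> prompt = "sailing ship in storm by Leonardo da Vinci"
>>> image = pipeline(prompt).images[0]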
ORTStableDiffusionImg2ImgPipeline
class optimum.onnxruntime.ORTStableDiffusionImg2ImgPipeline
< source >( vae_decoder_session: InferenceSession text_encoder_session: InferenceSession unet_session: InferenceSession config: typing.Dict[str, typing.Any] tokenizer: CLIPTokenizer scheduler: typing.Union[diffusers.schedulers.scheduling_ddim.DDIMScheduler, diffusers.schedulers.scheduling_pndm.PNDMScheduler, diffusers.schedulers.scheduling_lms_discrete.LMSDiscreteScheduler] feature_extractor: typing.Optional[transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor] = None vae_encoder_session: typing.Optional[onnxruntime.capi.onnxruntime_inference_collection.InferenceSession] = None text_encoder_2_session: typing.Optional[onnxruntime.capi.onnxruntime_inference_collection.InferenceSession] = None tokenizer_2: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = None use_io_binding: typing.Optional[bool] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None )
ONNX Runtime-powered stable diffusion pipeline corresponding to diffusers.StableDiffusionImg2ImgPipeline.
This model inherits from ORTModel. Check its documentation for the generic methods the library implements for all its models (such as downloading or saving).
This class should be initialized using the onnxruntime.modeling_ort.ORTModel.from_pretrained() method.
__call__
< source >( prompt: typing.Union[str, typing.List[str], NoneType] = None image: typing.Union[numpy.ndarray, PIL.Image.Image] = None strength: float = 0.8 num_inference_steps: int = 50 guidance_scale: float = 7.5 negative_prompt: typing.Union[str, typing.List[str], NoneType] = None num_images_per_prompt: int = 1 eta: float = 0.0 generator: typing.Optional[numpy.random.mtrand.RandomState] = None prompt_embeds: typing.Optional[numpy.ndarray] = None negative_prompt_embeds: typing.Optional[numpy.ndarray] = None output_type: str = 'pil' return_dict: bool = True callback: typing.Union[typing.Callable[[int, int, numpy.ndarray], NoneType], NoneType] = None callback_steps: int = 1 ) → ~pipelines.stable_diffusion.StableDiffusionPipelineOutput
or tuple
Parameters
- prompt (`Optional[Union[str, List[str]]]`, defaults to `None`) — The prompt or prompts to guide the image generation. If not defined, one has to pass `prompt_embeds` instead.
- image (`Union[np.ndarray, PIL.Image.Image]`) — `Image`, or tensor representing an image batch, which will be used as the starting point for the process.
- strength (`float`, defaults to 0.8) — Conceptually, indicates how much to transform the reference `image`. Must be between 0 and 1. `image` will be used as a starting point, adding more noise to it the larger the `strength`. The number of denoising steps depends on the amount of noise initially added. When `strength` is 1, added noise will be maximum and the denoising process will run for the full number of iterations specified in `num_inference_steps`. A value of 1, therefore, essentially ignores `image`.
- num_inference_steps (`int`, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
- guidance_scale (`float`, defaults to 7.5) — Guidance scale as defined in Classifier-Free Diffusion Guidance. `guidance_scale` is defined as `w` in equation 2 of the Imagen paper. Guidance scale is enabled by setting `guidance_scale > 1`. A higher guidance scale encourages generating images closely linked to the text `prompt`, usually at the expense of lower image quality.
- negative_prompt (`Optional[Union[str, list]]`) — The prompt or prompts not to guide the image generation. If not defined, one has to pass `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is less than 1).
- num_images_per_prompt (`int`, defaults to 1) — The number of images to generate per prompt.
- eta (`float`, defaults to 0.0) — Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to `schedulers.DDIMScheduler` and will be ignored for others.
- generator (`Optional[np.random.RandomState]`, defaults to `None`) — A `np.random.RandomState` to make generation deterministic.
- prompt_embeds (`Optional[np.ndarray]`, defaults to `None`) — Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings will be generated from the `prompt` input argument.
- negative_prompt_embeds (`Optional[np.ndarray]`, defaults to `None`) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, `negative_prompt_embeds` will be generated from the `negative_prompt` input argument.
- output_type (`str`, defaults to `"pil"`) — The output format of the generated image. Choose between PIL (`PIL.Image.Image`) or `np.array`.
- return_dict (`bool`, defaults to `True`) — Whether or not to return a `~pipelines.stable_diffusion.StableDiffusionPipelineOutput` instead of a plain tuple.
- callback (`Optional[Callable]`, defaults to `None`) — A function that will be called every `callback_steps` steps during inference. The function will be called with the following arguments: `callback(step: int, timestep: int, latents: torch.FloatTensor)`.
- callback_steps (`int`, defaults to 1) — The frequency at which the `callback` function will be called. If not specified, the callback will be called at every step.
Returns
`~pipelines.stable_diffusion.StableDiffusionPipelineOutput` or `tuple`
A `~pipelines.stable_diffusion.StableDiffusionPipelineOutput` if `return_dict` is True, otherwise a `tuple`. When returning a tuple, the first element is a list with the generated images, and the second element is a list of bools denoting whether the corresponding generated image likely represents "not-safe-for-work" (nsfw) content, according to the `safety_checker`.
Function invoked when calling the pipeline for generation.
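Example of image-to-image generation (a minimal sketch; the checkpoint id, input image URL, and prompt are illustrative assumptions):
>>> from optimum.onnxruntime import ORTStableDiffusionImg2ImgPipeline
>>> from diffusers.utils import load_image
>>> pipeline = ORTStableDiffusionImg2ImgPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", export=True)
>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # illustrative input image
>>> init_image = load_image(url).convert("RGB")
>>> prompt = "two cats sleeping on a pink blanket, oil painting"
>>> # strength controls how much noise is added to init_image before denoising
>>> image = pipeline(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5).images[0]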
ORTStableDiffusionInpaintPipeline
class optimum.onnxruntime.ORTStableDiffusionInpaintPipeline
< source >( vae_decoder_session: InferenceSession text_encoder_session: InferenceSession unet_session: InferenceSession config: typing.Dict[str, typing.Any] tokenizer: CLIPTokenizer scheduler: typing.Union[diffusers.schedulers.scheduling_ddim.DDIMScheduler, diffusers.schedulers.scheduling_pndm.PNDMScheduler, diffusers.schedulers.scheduling_lms_discrete.LMSDiscreteScheduler] feature_extractor: typing.Optional[transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor] = None vae_encoder_session: typing.Optional[onnxruntime.capi.onnxruntime_inference_collection.InferenceSession] = None text_encoder_2_session: typing.Optional[onnxruntime.capi.onnxruntime_inference_collection.InferenceSession] = None tokenizer_2: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = None use_io_binding: typing.Optional[bool] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None )
ONNX Runtime-powered stable diffusion pipeline corresponding to diffusers.StableDiffusionInpaintPipeline.
This model inherits from ORTModel. Check its documentation for the generic methods the library implements for all its models (such as downloading or saving).
This class should be initialized using the onnxruntime.modeling_ort.ORTModel.from_pretrained() method.
__call__
< source >( prompt: typing.Union[str, typing.List[str]] image: Image mask_image: Image height: typing.Optional[int] = None width: typing.Optional[int] = None num_inference_steps: int = 50 guidance_scale: float = 7.5 negative_prompt: typing.Union[str, typing.List[str], NoneType] = None num_images_per_prompt: int = 1 eta: float = 0.0 generator: typing.Optional[numpy.random.mtrand.RandomState] = None latents: typing.Optional[numpy.ndarray] = None prompt_embeds: typing.Optional[numpy.ndarray] = None negative_prompt_embeds: typing.Optional[numpy.ndarray] = None output_type: str = 'pil' return_dict: bool = True callback: typing.Union[typing.Callable[[int, int, numpy.ndarray], NoneType], NoneType] = None callback_steps: int = 1 ) → ~pipelines.stable_diffusion.StableDiffusionPipelineOutput
or tuple
Parameters
- prompt (`Union[str, List[str]]`) — The prompt or prompts to guide the image generation. If not defined, one has to pass `prompt_embeds` instead.
- image (`PIL.Image.Image`) — `Image`, or tensor representing an image batch, to be used as the starting point for the inpainting process.
- mask_image (`PIL.Image.Image`) — `Image`, or tensor representing a mask image batch, indicating the region of `image` to inpaint.
- height (`Optional[int]`, defaults to `None`) — The height in pixels of the generated image.
- width (`Optional[int]`, defaults to `None`) — The width in pixels of the generated image.
- num_inference_steps (`int`, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
- guidance_scale (`float`, defaults to 7.5) — Guidance scale as defined in Classifier-Free Diffusion Guidance. `guidance_scale` is defined as `w` in equation 2 of the Imagen paper. Guidance scale is enabled by setting `guidance_scale > 1`. A higher guidance scale encourages generating images closely linked to the text `prompt`, usually at the expense of lower image quality.
- negative_prompt (`Optional[Union[str, list]]`) — The prompt or prompts not to guide the image generation. If not defined, one has to pass `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is less than 1).
- num_images_per_prompt (`int`, defaults to 1) — The number of images to generate per prompt.
- eta (`float`, defaults to 0.0) — Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to `schedulers.DDIMScheduler` and will be ignored for others.
- generator (`Optional[np.random.RandomState]`, defaults to `None`) — A `np.random.RandomState` to make generation deterministic.
- latents (`Optional[np.ndarray]`, defaults to `None`) — Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random `generator`.
- prompt_embeds (`Optional[np.ndarray]`, defaults to `None`) — Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings will be generated from the `prompt` input argument.
- negative_prompt_embeds (`Optional[np.ndarray]`, defaults to `None`) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, `negative_prompt_embeds` will be generated from the `negative_prompt` input argument.
- output_type (`str`, defaults to `"pil"`) — The output format of the generated image. Choose between PIL (`PIL.Image.Image`) or `np.array`.
- return_dict (`bool`, defaults to `True`) — Whether or not to return a `~pipelines.stable_diffusion.StableDiffusionPipelineOutput` instead of a plain tuple.
- callback (`Optional[Callable]`, defaults to `None`) — A function that will be called every `callback_steps` steps during inference. The function will be called with the following arguments: `callback(step: int, timestep: int, latents: torch.FloatTensor)`.
- callback_steps (`int`, defaults to 1) — The frequency at which the `callback` function will be called. If not specified, the callback will be called at every step.
Returns
`~pipelines.stable_diffusion.StableDiffusionPipelineOutput` or `tuple`
A `~pipelines.stable_diffusion.StableDiffusionPipelineOutput` if `return_dict` is True, otherwise a `tuple`. When returning a tuple, the first element is a list with the generated images, and the second element is a list of bools denoting whether the corresponding generated image likely represents "not-safe-for-work" (nsfw) content, according to the `safety_checker`.
Function invoked when calling the pipeline for generation.
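Example of inpainting (a minimal sketch; the checkpoint id, prompt, and the image and mask URLs are illustrative assumptions, borrowed from common diffusers examples):
>>> from optimum.onnxruntime import ORTStableDiffusionInpaintPipeline
>>> from diffusers.utils import load_image
>>> pipeline = ORTStableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-inpainting", export=True)
>>> img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
>>> mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
>>> init_image = load_image(img_url).resize((512, 512))
>>> mask_image = load_image(mask_url).resize((512, 512))
>>> prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
>>> # white pixels in mask_image mark the region to repaint
>>> image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image).images[0]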
ORTStableDiffusionXLPipeline
class optimum.onnxruntime.ORTStableDiffusionXLPipeline
< source >( vae_decoder_session: InferenceSession text_encoder_session: InferenceSession unet_session: InferenceSession config: typing.Dict[str, typing.Any] tokenizer: CLIPTokenizer scheduler: typing.Union[diffusers.schedulers.scheduling_ddim.DDIMScheduler, diffusers.schedulers.scheduling_pndm.PNDMScheduler, diffusers.schedulers.scheduling_lms_discrete.LMSDiscreteScheduler] feature_extractor: typing.Optional[transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor] = None vae_encoder_session: typing.Optional[onnxruntime.capi.onnxruntime_inference_collection.InferenceSession] = None text_encoder_2_session: typing.Optional[onnxruntime.capi.onnxruntime_inference_collection.InferenceSession] = None tokenizer_2: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = None use_io_binding: typing.Optional[bool] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None add_watermarker: typing.Optional[bool] = None )
ONNX Runtime-powered stable diffusion pipeline corresponding to diffusers.StableDiffusionXLPipeline.
This model inherits from ORTModel. Check its documentation for the generic methods the library implements for all its models (such as downloading or saving).
This class should be initialized using the onnxruntime.modeling_ort.ORTModel.from_pretrained() method.
__call__
< source >( prompt: typing.Union[str, typing.List[str], NoneType] = None height: typing.Optional[int] = None width: typing.Optional[int] = None num_inference_steps: int = 50 guidance_scale: float = 5.0 negative_prompt: typing.Union[str, typing.List[str], NoneType] = None num_images_per_prompt: int = 1 eta: float = 0.0 generator: typing.Optional[numpy.random.mtrand.RandomState] = None latents: typing.Optional[numpy.ndarray] = None prompt_embeds: typing.Optional[numpy.ndarray] = None negative_prompt_embeds: typing.Optional[numpy.ndarray] = None pooled_prompt_embeds: typing.Optional[numpy.ndarray] = None negative_pooled_prompt_embeds: typing.Optional[numpy.ndarray] = None output_type: str = 'pil' return_dict: bool = True callback: typing.Union[typing.Callable[[int, int, numpy.ndarray], NoneType], NoneType] = None callback_steps: int = 1 cross_attention_kwargs: typing.Union[typing.Dict[str, typing.Any], NoneType] = None guidance_rescale: float = 0.0 original_size: typing.Union[typing.Tuple[int, int], NoneType] = None crops_coords_top_left: typing.Tuple[int, int] = (0, 0) target_size: typing.Union[typing.Tuple[int, int], NoneType] = None ) → ~pipelines.stable_diffusion.StableDiffusionXLPipelineOutput
or tuple
Parameters
- prompt (`Optional[Union[str, List[str]]]`, defaults to `None`) — The prompt or prompts to guide the image generation. If not defined, one has to pass `prompt_embeds` instead.
- height (`Optional[int]`, defaults to `None`) — The height in pixels of the generated image.
- width (`Optional[int]`, defaults to `None`) — The width in pixels of the generated image.
- num_inference_steps (`int`, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
- guidance_scale (`float`, defaults to 5.0) — Guidance scale as defined in Classifier-Free Diffusion Guidance. `guidance_scale` is defined as `w` in equation 2 of the Imagen paper. Guidance scale is enabled by setting `guidance_scale > 1`. A higher guidance scale encourages generating images closely linked to the text `prompt`, usually at the expense of lower image quality.
- negative_prompt (`Optional[Union[str, list]]`) — The prompt or prompts not to guide the image generation. If not defined, one has to pass `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is less than 1).
- num_images_per_prompt (`int`, defaults to 1) — The number of images to generate per prompt.
- eta (`float`, defaults to 0.0) — Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to `schedulers.DDIMScheduler` and will be ignored for others.
- generator (`Optional[np.random.RandomState]`, defaults to `None`) — A `np.random.RandomState` to make generation deterministic.
- latents (`Optional[np.ndarray]`, defaults to `None`) — Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random `generator`.
- prompt_embeds (`Optional[np.ndarray]`, defaults to `None`) — Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings will be generated from the `prompt` input argument.
- negative_prompt_embeds (`Optional[np.ndarray]`, defaults to `None`) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, `negative_prompt_embeds` will be generated from the `negative_prompt` input argument.
- output_type (`str`, defaults to `"pil"`) — The output format of the generated image. Choose between PIL (`PIL.Image.Image`) or `np.array`.
- return_dict (`bool`, defaults to `True`) — Whether or not to return a `~pipelines.stable_diffusion.StableDiffusionXLPipelineOutput` instead of a plain tuple.
- callback (`Optional[Callable]`, defaults to `None`) — A function that will be called every `callback_steps` steps during inference. The function will be called with the following arguments: `callback(step: int, timestep: int, latents: torch.FloatTensor)`.
- callback_steps (`int`, defaults to 1) — The frequency at which the `callback` function will be called. If not specified, the callback will be called at every step.
- guidance_rescale (`float`, defaults to 0.0) — Guidance rescale factor proposed by Common Diffusion Noise Schedules and Sample Steps are Flawed. `guidance_scale` is defined as `φ` in equation 16 of that paper. The guidance rescale factor should fix overexposure when using zero terminal SNR.
Returns
`~pipelines.stable_diffusion.StableDiffusionXLPipelineOutput` or `tuple`
A `~pipelines.stable_diffusion.StableDiffusionXLPipelineOutput` if `return_dict` is True, otherwise a `tuple`. When returning a tuple, the first element is a list with the generated images, and the second element is a list of bools denoting whether the corresponding generated image likely represents "not-safe-for-work" (nsfw) content, according to the `safety_checker`.
Function invoked when calling the pipeline for generation.
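Example of text-to-image generation with SDXL (a minimal sketch; the checkpoint id and prompt are illustrative assumptions):
>>> from optimum.onnxruntime import ORTStableDiffusionXLPipeline
>>> # "stabilityai/stable-diffusion-xl-base-1.0" is an illustrative SDXL base checkpoint
>>> pipeline = ORTStableDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", export=True)
>>> prompt = "sailing ship in storm by Leonardo da Vinci"
>>> image = pipeline(prompt).images[0]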
ORTStableDiffusionXLImg2ImgPipeline
class optimum.onnxruntime.ORTStableDiffusionXLImg2ImgPipeline
< source >( vae_decoder_session: InferenceSession text_encoder_session: InferenceSession unet_session: InferenceSession config: typing.Dict[str, typing.Any] tokenizer: CLIPTokenizer scheduler: typing.Union[diffusers.schedulers.scheduling_ddim.DDIMScheduler, diffusers.schedulers.scheduling_pndm.PNDMScheduler, diffusers.schedulers.scheduling_lms_discrete.LMSDiscreteScheduler] feature_extractor: typing.Optional[transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor] = None vae_encoder_session: typing.Optional[onnxruntime.capi.onnxruntime_inference_collection.InferenceSession] = None text_encoder_2_session: typing.Optional[onnxruntime.capi.onnxruntime_inference_collection.InferenceSession] = None tokenizer_2: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = None use_io_binding: typing.Optional[bool] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None add_watermarker: typing.Optional[bool] = None )
ONNX Runtime-powered stable diffusion pipeline corresponding to diffusers.StableDiffusionXLImg2ImgPipeline.
This model inherits from ORTModel. Check its documentation for the generic methods the library implements for all its models (such as downloading or saving).
This class should be initialized using the onnxruntime.modeling_ort.ORTModel.from_pretrained() method.
__call__
< source >( prompt: typing.Union[str, typing.List[str], NoneType] = None image: typing.Union[numpy.ndarray, PIL.Image.Image] = None strength: float = 0.3 num_inference_steps: int = 50 guidance_scale: float = 5.0 negative_prompt: typing.Union[str, typing.List[str], NoneType] = None num_images_per_prompt: int = 1 eta: float = 0.0 generator: typing.Optional[numpy.random.mtrand.RandomState] = None latents: typing.Optional[numpy.ndarray] = None prompt_embeds: typing.Optional[numpy.ndarray] = None negative_prompt_embeds: typing.Optional[numpy.ndarray] = None pooled_prompt_embeds: typing.Optional[numpy.ndarray] = None negative_pooled_prompt_embeds: typing.Optional[numpy.ndarray] = None output_type: str = 'pil' return_dict: bool = True callback: typing.Union[typing.Callable[[int, int, numpy.ndarray], NoneType], NoneType] = None callback_steps: int = 1 cross_attention_kwargs: typing.Union[typing.Dict[str, typing.Any], NoneType] = None guidance_rescale: float = 0.0 original_size: typing.Union[typing.Tuple[int, int], NoneType] = None crops_coords_top_left: typing.Tuple[int, int] = (0, 0) target_size: typing.Union[typing.Tuple[int, int], NoneType] = None aesthetic_score: float = 6.0 negative_aesthetic_score: float = 2.5 ) → ~pipelines.stable_diffusion.StableDiffusionXLPipelineOutput
or tuple
Parameters
- prompt (`Optional[Union[str, List[str]]]`, defaults to `None`) — The prompt or prompts to guide the image generation. If not defined, one has to pass `prompt_embeds` instead.
- image (`Union[np.ndarray, PIL.Image.Image]`) — `Image`, or tensor representing an image batch, which will be used as the starting point for the process.
- strength (`float`, defaults to 0.3) — Conceptually, indicates how much to transform the reference `image`. Must be between 0 and 1. `image` will be used as a starting point, adding more noise to it the larger the `strength`. The number of denoising steps depends on the amount of noise initially added. When `strength` is 1, added noise will be maximum and the denoising process will run for the full number of iterations specified in `num_inference_steps`. A value of 1, therefore, essentially ignores `image`.
- num_inference_steps (`int`, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
- guidance_scale (`float`, defaults to 5.0) — Guidance scale as defined in Classifier-Free Diffusion Guidance. `guidance_scale` is defined as `w` in equation 2 of the Imagen paper. Guidance scale is enabled by setting `guidance_scale > 1`. A higher guidance scale encourages generating images closely linked to the text `prompt`, usually at the expense of lower image quality.
- negative_prompt (`Optional[Union[str, list]]`) — The prompt or prompts not to guide the image generation. If not defined, one has to pass `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is less than 1).
- num_images_per_prompt (`int`, defaults to 1) — The number of images to generate per prompt.
- eta (`float`, defaults to 0.0) — Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to `schedulers.DDIMScheduler` and will be ignored for others.
- generator (`Optional[np.random.RandomState]`, defaults to `None`) — A `np.random.RandomState` to make generation deterministic.
- latents (`Optional[np.ndarray]`, defaults to `None`) — Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random `generator`.
- prompt_embeds (`Optional[np.ndarray]`, defaults to `None`) — Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings will be generated from the `prompt` input argument.
- negative_prompt_embeds (`Optional[np.ndarray]`, defaults to `None`) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, `negative_prompt_embeds` will be generated from the `negative_prompt` input argument.
- output_type (`str`, defaults to `"pil"`) — The output format of the generated image. Choose between PIL (`PIL.Image.Image`) or `np.array`.
- return_dict (`bool`, defaults to `True`) — Whether or not to return a `~pipelines.stable_diffusion.StableDiffusionXLPipelineOutput` instead of a plain tuple.
- callback (`Optional[Callable]`, defaults to `None`) — A function that will be called every `callback_steps` steps during inference. The function will be called with the following arguments: `callback(step: int, timestep: int, latents: torch.FloatTensor)`.
- callback_steps (`int`, defaults to 1) — The frequency at which the `callback` function will be called. If not specified, the callback will be called at every step.
- guidance_rescale (`float`, defaults to 0.0) — Guidance rescale factor proposed by Common Diffusion Noise Schedules and Sample Steps are Flawed. `guidance_scale` is defined as `φ` in equation 16 of that paper. The guidance rescale factor should fix overexposure when using zero terminal SNR.
Returns
`~pipelines.stable_diffusion.StableDiffusionXLPipelineOutput` or `tuple`
A `~pipelines.stable_diffusion.StableDiffusionXLPipelineOutput` if `return_dict` is True, otherwise a `tuple`. When returning a tuple, the first element is a list with the generated images, and the second element is a list of bools denoting whether the corresponding generated image likely represents "not-safe-for-work" (nsfw) content, according to the `safety_checker`.
Function invoked when calling the pipeline for generation.
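Example of image-to-image refinement with SDXL (a minimal sketch; the refiner checkpoint id, input image URL, and prompt are illustrative assumptions):
>>> from optimum.onnxruntime import ORTStableDiffusionXLImg2ImgPipeline
>>> from diffusers.utils import load_image
>>> pipeline = ORTStableDiffusionXLImg2ImgPipeline.from_pretrained("stabilityai/stable-diffusion-xl-refiner-1.0", export=True)
>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # illustrative input image
>>> init_image = load_image(url).convert("RGB")
>>> prompt = "two cats on a sofa, cinematic lighting"
>>> # the low default strength (0.3) keeps most of the input image and only refines details
>>> image = pipeline(prompt=prompt, image=init_image).images[0]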
ORTLatentConsistencyModelPipeline
class optimum.onnxruntime.ORTLatentConsistencyModelPipeline
< source >( vae_decoder_session: InferenceSession text_encoder_session: InferenceSession unet_session: InferenceSession config: typing.Dict[str, typing.Any] tokenizer: CLIPTokenizer scheduler: typing.Union[diffusers.schedulers.scheduling_ddim.DDIMScheduler, diffusers.schedulers.scheduling_pndm.PNDMScheduler, diffusers.schedulers.scheduling_lms_discrete.LMSDiscreteScheduler] feature_extractor: typing.Optional[transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor] = None vae_encoder_session: typing.Optional[onnxruntime.capi.onnxruntime_inference_collection.InferenceSession] = None text_encoder_2_session: typing.Optional[onnxruntime.capi.onnxruntime_inference_collection.InferenceSession] = None tokenizer_2: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = None use_io_binding: typing.Optional[bool] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None )
ONNX Runtime-powered stable diffusion pipeline corresponding to diffusers.LatentConsistencyModelPipeline.
This model inherits from ORTModel. Check its documentation for the generic methods the library implements for all its models (such as downloading or saving).
This class should be initialized using the onnxruntime.modeling_ort.ORTModel.from_pretrained() method.
__call__
< source >( prompt: typing.Union[str, typing.List[str], NoneType] = None height: typing.Optional[int] = None width: typing.Optional[int] = None num_inference_steps: int = 4 original_inference_steps: int = None guidance_scale: float = 8.5 num_images_per_prompt: int = 1 generator: typing.Optional[numpy.random.mtrand.RandomState] = None latents: typing.Optional[numpy.ndarray] = None prompt_embeds: typing.Optional[numpy.ndarray] = None output_type: str = 'pil' return_dict: bool = True callback: typing.Union[typing.Callable[[int, int, numpy.ndarray], NoneType], NoneType] = None callback_steps: int = 1 ) → ~pipelines.stable_diffusion.StableDiffusionPipelineOutput
or tuple
Parameters
- prompt (`Optional[Union[str, List[str]]]`, defaults to `None`) — The prompt or prompts to guide the image generation. If not defined, one has to pass `prompt_embeds` instead.
- height (`Optional[int]`, defaults to `None`) — The height in pixels of the generated image.
- width (`Optional[int]`, defaults to `None`) — The width in pixels of the generated image.
- num_inference_steps (`int`, defaults to 4) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
- guidance_scale (`float`, defaults to 8.5) — Guidance scale as defined in Classifier-Free Diffusion Guidance. `guidance_scale` is defined as `w` in equation 2 of the Imagen paper. Guidance scale is enabled by setting `guidance_scale > 1`. A higher guidance scale encourages generating images closely linked to the text `prompt`, usually at the expense of lower image quality.
- num_images_per_prompt (`int`, defaults to 1) — The number of images to generate per prompt.
- generator (`Optional[np.random.RandomState]`, defaults to `None`) — A `np.random.RandomState` to make generation deterministic.
- latents (`Optional[np.ndarray]`, defaults to `None`) — Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random `generator`.
- prompt_embeds (`Optional[np.ndarray]`, defaults to `None`) — Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings will be generated from the `prompt` input argument.
- output_type (`str`, defaults to `"pil"`) — The output format of the generated image. Choose between PIL (`PIL.Image.Image`) or `np.array`.
- return_dict (`bool`, defaults to `True`) — Whether or not to return a `~pipelines.stable_diffusion.StableDiffusionPipelineOutput` instead of a plain tuple.
- callback (`Optional[Callable]`, defaults to `None`) — A function that will be called every `callback_steps` steps during inference. The function will be called with the following arguments: `callback(step: int, timestep: int, latents: torch.FloatTensor)`.
- callback_steps (`int`, defaults to 1) — The frequency at which the `callback` function will be called. If not specified, the callback will be called at every step.
- guidance_rescale (`float`, defaults to 0.0) — Guidance rescale factor proposed by Common Diffusion Noise Schedules and Sample Steps are Flawed. `guidance_scale` is defined as `φ` in equation 16 of that paper. The guidance rescale factor should fix overexposure when using zero terminal SNR.
Returns
`~pipelines.stable_diffusion.StableDiffusionPipelineOutput` or `tuple`
A `~pipelines.stable_diffusion.StableDiffusionPipelineOutput` if `return_dict` is True, otherwise a `tuple`. When returning a tuple, the first element is a list with the generated images, and the second element is a list of bools denoting whether the corresponding generated image likely represents "not-safe-for-work" (nsfw) content, according to the `safety_checker`.
Function invoked when calling the pipeline for generation.
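Example of text-to-image generation with a latent consistency model (a minimal sketch; the checkpoint id and prompt are illustrative assumptions):
>>> from optimum.onnxruntime import ORTLatentConsistencyModelPipeline
>>> pipeline = ORTLatentConsistencyModelPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7", export=True)
>>> prompt = "sailing ship in storm by Leonardo da Vinci"
>>> # latent consistency models need only a few denoising steps
>>> image = pipeline(prompt, num_inference_steps=4, guidance_scale=8.0).images[0]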