Quantization
🤗 Optimum provides an optimum.furiosa
package that enables you to apply quantization on many models hosted on
the Hugging Face Hub using the Furiosa
quantization tool.
The quantization process is abstracted via the FuriosaAIConfig
and
the FuriosaAIQuantizer
classes. The former allows you to specify how quantization should be done,
while the latter effectively handles quantization.
Static Quantization example
The FuriosaAIQuantizer
class can be used to quantize statically your ONNX model. Below you will find
an easy end-to-end example on how to quantize statically
eugenecamus/resnet-50-base-beans-demo.
>>> from functools import partial
>>> from pathlib import Path
>>> from transformers import AutoFeatureExtractor
>>> from optimum.furiosa import FuriosaAIQuantizer, FuriosaAIModelForImageClassification
>>> from optimum.furiosa.configuration import AutoCalibrationConfig
>>> from optimum.furiosa.utils import export_model_to_onnx
>>> model_id = "eugenecamus/resnet-50-base-beans-demo"
# Convert PyTorch model convert to ONNX and create Quantizer and setup config
>>> feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
>>> batch_size = 1
>>> image_size = feature_extractor.size["shortest_edge"]
>>> num_labels = 3
>>> onnx_model_name = "model.onnx"
>>> output_dir = "output"
>>> onnx_model_path = Path(output_dir) / onnx_model_name
>>> export_model_to_onnx(
... model_id,
... save_dir=output_dir,
... input_shape_dict={"pixel_values": [batch_size, 3, image_size, image_size]},
... output_shape_dict={"logits": [batch_size, num_labels]},
... file_name=onnx_model_name,
)
>>> quantizer = FuriosaAIQuantizer.from_pretrained(output_dir, file_name=onnx_model_name)
>>> qconfig = QuantizationConfig()
# Create the calibration dataset
>>> def preprocess_fn(ex, feature_extractor):
... return feature_extractor(ex["image"])
>>> calibration_dataset = quantizer.get_calibration_dataset(
... "beans",
... preprocess_function=partial(preprocess_fn, feature_extractor=feature_extractor),
... num_samples=50,
... dataset_split="train",
... )
# Create the calibration configuration containing the parameters related to calibration.
>>> calibration_config = AutoCalibrationConfig.mse_asym(calibration_dataset)
# Perform the calibration step: computes the activations quantization ranges
>>> ranges = quantizer.fit(
... dataset=calibration_dataset,
... calibration_config=calibration_config,
... )
# Apply static quantization on the model
>>> model_quantized_path = quantizer.quantize(
... save_dir=output,
... calibration_tensors_range=ranges,
... quantization_config=qconfig,
... )