Transformers.js documentation


processors

Processors are used to prepare non-textual inputs (e.g., image or audio) for a model.

Example: Using a WhisperProcessor to prepare an audio input for a model.

import { AutoProcessor, read_audio } from '@huggingface/transformers';

let processor = await AutoProcessor.from_pretrained('openai/whisper-tiny.en');
let audio = await read_audio('https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac', 16000);
let { input_features } = await processor(audio);
// Tensor {
//   data: Float32Array(240000) [0.4752984642982483, 0.5597258806228638, 0.56434166431427, ...],
//   dims: [1, 80, 3000],
//   type: 'float32',
//   size: 240000,
// }

processors.FeatureExtractor ⇐ Callable

Base class for feature extractors.

Kind: static class of processors
Extends: Callable


new FeatureExtractor(config)

Constructs a new FeatureExtractor instance.

Parameters:

  • config (Object): The configuration for the feature extractor.


featureExtractor._call(...args)

This method should be implemented in subclasses to provide the functionality of the callable object.

Kind: instance method of FeatureExtractor
Overrides: _call
Throws:

  • Error: If the subclass does not implement the `_call` method.

Parameters:

  • ...args (Array.<any>)
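
Example (illustrative sketch, not from the library): a subclass supplies the callable behaviour by overriding _call. Whether FeatureExtractor can be imported directly from the package root is an assumption; in practice, processors are usually obtained via AutoProcessor.from_pretrained.

// Sketch of a custom subclass; the body is a placeholder for real preprocessing.
class MyFeatureExtractor extends FeatureExtractor {
    async _call(input) {
        // Convert `input` into model-ready tensors here.
        // Calling the base class implementation would throw an Error instead.
        return { processed_input: input };
    }
}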

processors.ImageFeatureExtractor ⇐ FeatureExtractor

Feature extractor for image models.

Kind: static class of processors
Extends: FeatureExtractor


new ImageFeatureExtractor(config)

Constructs a new ImageFeatureExtractor instance.

Parameters:

  • config (Object): The configuration for the feature extractor.
  • config.image_mean (Array.<number>): The mean values for image normalization.
  • config.image_std (Array.<number>): The standard deviation values for image normalization.
  • config.do_rescale (boolean): Whether to rescale the image pixel values to the [0, 1] range.
  • config.rescale_factor (number): The factor to use for rescaling the image pixel values.
  • config.do_normalize (boolean): Whether to normalize the image pixel values.
  • config.do_resize (boolean): Whether to resize the image.
  • config.resample (number): What method to use for resampling.
  • config.size (number | Object): The size to resize the image to.
  • [config.do_flip_channel_order] (boolean, default: false): Whether to flip the color channels from RGB to BGR. Can be overridden by the do_flip_channel_order parameter in the preprocess method.
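
Example (illustrative sketch): a configuration object with the fields listed above, as it would typically appear in a model's preprocessor_config.json. The concrete values are placeholders rather than those of any particular model, and constructing the extractor directly (instead of via AutoProcessor.from_pretrained) is shown only for illustration.

// Placeholder configuration values; not taken from a real model.
const config = {
    do_resize: true,
    size: 224,                   // may also be an object such as { height: 224, width: 224 }
    resample: 2,                 // resampling method to use when resizing
    do_rescale: true,
    rescale_factor: 1 / 255,     // maps pixel values into the [0, 1] range
    do_normalize: true,
    image_mean: [0.5, 0.5, 0.5],
    image_std: [0.5, 0.5, 0.5],
    do_flip_channel_order: false,
};
const extractor = new ImageFeatureExtractor(config);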


imageFeatureExtractor.thumbnail(image, size, [resample]) ⇒ Promise.<RawImage>

Resize the image to make a thumbnail. The image is resized so that no dimension is larger than any corresponding dimension of the specified size.

Kind: instance method of ImageFeatureExtractor
Returns: Promise.<RawImage> - The resized image.

Parameters:

  • image (RawImage): The image to be resized.
  • size (Object): The size {"height": h, "width": w} to resize the image to.
  • [resample] (string | 0 | 1 | 2 | 3 | 4 | 5, default: 2): The resampling filter to use.
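
Example (hypothetical usage sketch): shrink an image so that neither dimension exceeds 224 pixels. Here, extractor is assumed to be an ImageFeatureExtractor instance and image a RawImage.

const thumb = await extractor.thumbnail(image, { height: 224, width: 224 });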


imageFeatureExtractor.crop_margin(image, gray_threshold) ⇒ Promise.<RawImage>

Crops the margin of the image. Gray pixels are considered margin (i.e., pixels with a value below the threshold).

Kind: instance method of ImageFeatureExtractor
Returns: Promise.<RawImage> - The cropped image.

Parameters:

  • image (RawImage): The image to be cropped.
  • gray_threshold (number, default: 200): Value below which pixels are considered to be gray.


imageFeatureExtractor.pad_image(pixelData, imgDims, padSize, options) ⇒ *

Pad the image by a certain amount.

Kind: instance method of ImageFeatureExtractor
Returns: * - The padded pixel data and image dimensions.

Parameters:

  • pixelData (Float32Array): The pixel data to pad.
  • imgDims (Array.<number>): The dimensions of the image (height, width, channels).
  • padSize (*): The dimensions of the padded image.
  • options (Object): The options for padding.
  • [options.mode] ('constant' | 'symmetric', default: 'constant'): The type of padding to add.
  • [options.center] (boolean, default: false): Whether to center the image.
  • [options.constant_values] (number, default: 0): The constant value to use for padding.
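
Example (hypothetical sketch): pad preprocessed pixel data onto a larger canvas. The exact shape expected for padSize and the shape of the return value are assumptions based on the parameter descriptions above; pixelData, height and width are placeholders.

const [paddedData, paddedDims] = extractor.pad_image(
    pixelData,                       // Float32Array of length height * width * channels
    [height, width, 3],              // current image dimensions
    { height: 640, width: 640 },     // dimensions of the padded image (assumed format)
    { mode: 'constant', center: true, constant_values: 0 },
);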


imageFeatureExtractor.rescale(pixelData) ⇒ void

Rescale the image's pixel values by this.rescale_factor.

Kind: instance method of ImageFeatureExtractor

Parameters:

  • pixelData (Float32Array): The pixel data to rescale.
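
Example (illustrative sketch of the equivalent operation, not the library source): every pixel value is multiplied in place by the configured rescale_factor (commonly 1/255, mapping 0-255 pixel values into [0, 1]).

for (let i = 0; i < pixelData.length; ++i) {
    pixelData[i] *= rescale_factor;   // in-place rescaling
}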


imageFeatureExtractor.get_resize_output_image_size(image, size) ⇒ *

Find the target (width, height) dimension of the output image after resizing given the input image and the desired size.

Kind: instance method of ImageFeatureExtractor
Returns: * - The target (width, height) dimension of the output image after resizing.

Parameters:

  • image (RawImage): The image to resize.
  • size (any): The size to use for resizing the image.


imageFeatureExtractor.resize(image) ⇒ Promise.<RawImage>

Resizes the image.

Kind: instance method of ImageFeatureExtractor
Returns: Promise.<RawImage> - The resized image.

Parameters:

  • image (RawImage): The image to resize.


imageFeatureExtractor.preprocess(image, overrides) ⇒ Promise.<PreprocessedImage>

Preprocesses the given image.

Kind: instance method of ImageFeatureExtractor
Returns: Promise.<PreprocessedImage> - The preprocessed image.

Parameters:

  • image (RawImage): The image to preprocess.
  • overrides (Object): The overrides for the preprocessing options.
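
Example (hypothetical usage sketch): preprocess a single RawImage while overriding one option for this call. That overrides accepts the same keys as the constructor options is an assumption based on the do_flip_channel_order description above.

const { pixel_values, original_size, reshaped_input_size } =
    await extractor.preprocess(image, { do_flip_channel_order: true });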


imageFeatureExtractor._call(images, ...args) ⇒ Promise.<ImageFeatureExtractorResult>

Calls the feature extraction process on an array of images, preprocesses each image, and concatenates the resulting features into a single Tensor.

Kind: instance method of ImageFeatureExtractor
Returns: Promise.<ImageFeatureExtractorResult> - An object containing the concatenated pixel values (and other metadata) of the preprocessed images.

Parameters:

  • images (Array.<RawImage>): The image(s) to extract features from.
  • ...args (any): Additional arguments.
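
Example (hypothetical usage sketch): batch two images through the feature extractor. The image URLs are placeholders; RawImage.read is the same helper used in the AutoProcessor example further down this page.

const images = await Promise.all([
    RawImage.read('https://example.com/image1.jpg'),
    RawImage.read('https://example.com/image2.jpg'),
]);
const { pixel_values, original_sizes, reshaped_input_sizes } = await extractor(images);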


processors.DetrFeatureExtractor ⇐ ImageFeatureExtractor

Detr Feature Extractor.

Kind: static class of processors
Extends: ImageFeatureExtractor


detrFeatureExtractor._call(images) ⇒ Promise.<DetrFeatureExtractorResult>

Calls the feature extraction process on an array of images, preprocesses each image, and concatenates the resulting features into a single Tensor.

Kind: instance method of DetrFeatureExtractor
Returns: Promise.<DetrFeatureExtractorResult> - An object containing the concatenated pixel values of the preprocessed images.

Parameters:

  • images (Array.<RawImage>): The image(s) to extract features from.


detrFeatureExtractor.post_process_object_detection() : *

Kind: instance method of DetrFeatureExtractor
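
Example (hedged sketch): the parameters are not listed here, so the argument order below (raw model outputs, score threshold, target sizes) is an assumption modelled on the corresponding Python helper; consult the source for the exact signature. model and image are placeholders.

// Argument order is an assumption; check the source before relying on it.
const image_inputs = await processor(image);
const outputs = await model(image_inputs);
const detections = processor.feature_extractor.post_process_object_detection(
    outputs,                            // raw model outputs
    0.9,                                // score threshold for keeping boxes
    image_inputs.reshaped_input_sizes,  // sizes used to scale the boxes (assumption)
);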


detrFeatureExtractor.post_process_panoptic_segmentation() : *

Kind: instance method of DetrFeatureExtractor


processors.Processor ⇐ Callable

Represents a Processor that extracts features from an input.

Kind: static class of processors
Extends: Callable


new Processor(feature_extractor)

Creates a new Processor with the given feature extractor.

Parameters:

  • feature_extractor (FeatureExtractor): The function used to extract features from the input.


processor._call(input, ...args) ⇒ Promise.<any>

Calls the feature_extractor function with the given input.

Kind: instance method of Processor
Overrides: _call
Returns: Promise.<any> - A Promise that resolves with the extracted features.

Parameters:

  • input (any): The input to extract features from.
  • ...args (any): Additional arguments.
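
Example: a Processor instance is itself callable and simply forwards the input to its feature extractor. Here, processor and audio are assumed to come from the WhisperProcessor example at the top of this page.

const features = await processor(audio);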


processors.WhisperProcessor ⇐ Processor

Represents a WhisperProcessor that extracts features from an audio input.

Kind: static class of processors
Extends: Processor


whisperProcessor._call(audio) ⇒ Promise.<any>

Calls the feature_extractor function with the given audio input.

Kind: instance method of WhisperProcessor
Returns: Promise.<any> - A Promise that resolves with the extracted features.

Parameters:

  • audio (any): The audio input to extract features from.


processors.AutoProcessor

Helper class which is used to instantiate pretrained processors with the from_pretrained function. The chosen processor class is determined by the type specified in the processor config.

Example: Load a processor using from_pretrained.

let processor = await AutoProcessor.from_pretrained('openai/whisper-tiny.en');

Example: Run an image through a processor.

import { AutoProcessor, RawImage } from '@huggingface/transformers';

let processor = await AutoProcessor.from_pretrained('Xenova/clip-vit-base-patch16');
let image = await RawImage.read('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/football-match.jpg');
let image_inputs = await processor(image);
// {
//   "pixel_values": {
//     "dims": [ 1, 3, 224, 224 ],
//     "type": "float32",
//     "data": Float32Array [ -1.558687686920166, -1.558687686920166, -1.5440893173217773, ... ],
//     "size": 150528
//   },
//   "original_sizes": [
//     [ 533, 800 ]
//   ],
//   "reshaped_input_sizes": [
//     [ 224, 224 ]
//   ]
// }

Kind: static class of processors


AutoProcessor.from_pretrained(pretrained_model_name_or_path, options) ⇒ Promise.<Processor>

Instantiate one of the processor classes of the library from a pretrained model.

The processor class to instantiate is selected based on the feature_extractor_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path, if possible).

Kind: static method of AutoProcessor
Returns: Promise.<Processor> - A new instance of the Processor class.

Parameters:

  • pretrained_model_name_or_path (string): The name or path of the pretrained model. Can be either:
      • A string, the model id of a pretrained processor hosted inside a model repo on huggingface.co. Valid model ids can be located at the root-level, like bert-base-uncased, or namespaced under a user or organization name, like dbmdz/bert-base-german-cased.
      • A path to a directory containing processor files, e.g., ./my_model_directory/.
  • options (*): Additional options for loading the processor.


processors.data : Float32Array

Kind: static property of processors


processors~center_to_corners_format(arr) ⇒ Array.<number>

Converts bounding boxes from center format to corners format.

Kind: inner method of processors
Returns: Array.<number> - The coordinates of the top-left and bottom-right corners of the box (top_left_x, top_left_y, bottom_right_x, bottom_right_y).

Parameters:

  • arr (Array.<number>): The coordinates of the box center and its width and height (center_x, center_y, width, height).
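
Example (illustrative sketch of the conversion, not the library source): a box centered at (50, 40) with width 20 and height 10 becomes [40, 35, 60, 45].

function centerToCornersFormat([centerX, centerY, width, height]) {
    return [
        centerX - width / 2,    // top_left_x = 40
        centerY - height / 2,   // top_left_y = 35
        centerX + width / 2,    // bottom_right_x = 60
        centerY + height / 2,   // bottom_right_y = 45
    ];
}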


processors~post_process_semantic_segmentation(outputs, [target_sizes]) ⇒ *

Post-processes the outputs of the model (for semantic segmentation).

Kind: inner method of processors
Returns: * - The semantic segmentation maps.

Parameters:

  • outputs (*): Raw outputs of the model.
  • [target_sizes] (*): List of tuples corresponding to the requested final size (height, width) of each prediction. If unset, predictions will not be resized.


post_process_semantic_segmentation~labels : Array.<number>

The unique list of labels that were detected.

Kind: inner constant of post_process_semantic_segmentation


processors~post_process_panoptic_segmentation(outputs, [threshold], [mask_threshold], [overlap_mask_area_threshold], [label_ids_to_fuse], [target_sizes]) ⇒ Array.<{segmentation: Tensor, segments_info: Array<{id: number, label_id: number, score: number}>}>

Post-process the model output to generate the final panoptic segmentation.

Kind: inner method of processors

Parameters:

  • outputs (*): The model output to post-process.
  • [threshold] (number, default: 0.5): The probability score threshold to keep predicted instance masks.
  • [mask_threshold] (number, default: 0.5): Threshold to use when turning the predicted masks into binary values.
  • [overlap_mask_area_threshold] (number, default: 0.8): The overlap mask area threshold to merge or discard small disconnected parts within each binary instance mask.
  • [label_ids_to_fuse] (Set.<number>): The labels in this set will have all their instances fused together.
  • [target_sizes] (*): The target sizes to resize the masks to.


processors~post_process_instance_segmentation(outputs, [threshold], [target_sizes]) ⇒ Array.<{segmentation: Tensor, segments_info: Array<{id: number, label_id: number, score: number}>}>

Post-processes the outputs of the model (for instance segmentation).

Kind: inner method of processors

Parameters:

  • outputs (*): Raw outputs of the model.
  • [threshold] (number, default: 0.5): The probability score threshold to keep predicted instance masks.
  • [target_sizes] (*): List of tuples corresponding to the requested final size (height, width) of each prediction. If unset, predictions will not be resized.


processors~enforce_size_divisibility(size, divisor) ⇒ *

Rounds the height and width down to the closest multiple of size_divisibility.

Kind: inner method of processors
Returns: * - The rounded size.

Parameters:

  • size (*): The size of the image.
  • divisor (number): The divisor to use.
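
Example (illustrative sketch of the rounding, not the library source): with a divisor of 32, a 1023 x 767 image rounds down to 992 x 736. The (width, height) ordering of size is an assumption, since its type is not documented above.

function enforceSizeDivisibility([width, height], divisor) {
    return [
        Math.floor(width / divisor) * divisor,    // 1023 -> 992
        Math.floor(height / divisor) * divisor,   //  767 -> 736
    ];
}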


processors~HeightWidth : *

Named tuple to indicate that the order used is (height x width), even though the graphics industry standard is (width x height).

Kind: inner typedef of processors


processors~ImageFeatureExtractorResult : object

Kind: inner typedef of processors
Properties

  • pixel_values (Tensor): The pixel values of the batched preprocessed images.
  • original_sizes (Array.<HeightWidth>): Array of two-dimensional tuples like [[480, 640]].
  • reshaped_input_sizes (Array.<HeightWidth>): Array of two-dimensional tuples like [[1000, 1330]].


processors~PreprocessedImage : object

Kind: inner typedef of processors
Properties

  • original_size (HeightWidth): The original size of the image.
  • reshaped_input_size (HeightWidth): The reshaped input size of the image.
  • pixel_values (Tensor): The pixel values of the preprocessed image.


processors~DetrFeatureExtractorResult : object

Kind: inner typedef of processors
Properties

  • pixel_mask (Tensor)

processors~SamImageProcessorResult : object

Kind: inner typedef of processors
Properties

  • pixel_values (Tensor)
  • original_sizes (Array.<HeightWidth>)
  • reshaped_input_sizes (Array.<HeightWidth>)
  • [input_points] (Tensor)
  • [input_labels] (Tensor)
  • [input_boxes] (Tensor)
