Transformers.js documentation

processors

Processors are used to prepare non-textual inputs (e.g., image or audio) for a model.

Example: Using a WhisperProcessor to prepare an audio input for a model.

import { AutoProcessor, read_audio } from '@xenova/transformers';

let processor = await AutoProcessor.from_pretrained('openai/whisper-tiny.en');
let audio = await read_audio('https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac', 16000);
let { input_features } = await processor(audio);
// Tensor {
//   data: Float32Array(240000) [0.4752984642982483, 0.5597258806228638, 0.56434166431427, ...],
//   dims: [1, 80, 3000],
//   type: 'float32',
//   size: 240000,
// }
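
The `dims` shown above can be sanity-checked by hand. The sketch below assumes Whisper's usual feature-extraction settings (80 mel bins, a hop length of 160 samples, and audio padded or truncated to 30 seconds). These values are not stated on this page, and `expectedWhisperDims` is a hypothetical helper, not part of the library.

```javascript
// Hypothetical helper: estimates the shape of Whisper's log-mel input features.
// Assumed defaults (not taken from this page): 80 mel bins, hop length 160, 30 s chunks.
function expectedWhisperDims(samplingRate, { nMels = 80, hopLength = 160, chunkSeconds = 30 } = {}) {
  const numSamples = samplingRate * chunkSeconds; // audio padded/truncated to a fixed length
  const numFrames = Math.floor(numSamples / hopLength); // one frame per hop
  return [1, nMels, numFrames]; // [batch, mel bins, frames]
}

const dims = expectedWhisperDims(16000);
const size = dims.reduce((a, b) => a * b, 1);
console.log(dims, size); // [ 1, 80, 3000 ] 240000
```

Under these assumptions the result agrees with the `dims` and `size` printed in the example output.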

processors
- static
  - .FeatureExtractor ⇐ Callable
    - new FeatureExtractor(config)
  - .ImageFeatureExtractor ⇐ FeatureExtractor
    - new ImageFeatureExtractor(config)
    - .preprocess(image) ⇒ Promise.<any>
    - ._call(images) ⇒ Promise.<Object>
  - .DetrFeatureExtractor ⇐ ImageFeatureExtractor
    - ._call(urls) ⇒ Promise.<Object>
    - .post_process_object_detection() : post_process_object_detection
    - .remove_low_and_no_objects(class_logits, mask_logits, object_mask_threshold, num_labels) ⇒ *
    - .check_segment_validity(mask_labels, mask_probs, k, mask_threshold, overlap_mask_area_threshold) ⇒ *
    - .compute_segments(mask_probs, pred_scores, pred_labels, mask_threshold, overlap_mask_area_threshold, label_ids_to_fuse, target_size) ⇒ *
    - .post_process_panoptic_segmentation(outputs, [threshold], [mask_threshold], [overlap_mask_area_threshold], [label_ids_to_fuse], [target_sizes]) ⇒ Array.<{segmentation: Tensor, segments_info: Array<{id: number, label_id: number, score: number}>}>
  - .Processor ⇐ Callable
    - new Processor(feature_extractor)
    - ._call(input) ⇒ Promise.<any>
  - .WhisperProcessor ⇐ Processor
    - ._call(audio) ⇒ Promise.<any>
  - .AutoProcessor
    - .from_pretrained(pretrained_model_name_or_path, options) ⇒ Promise.<Processor>
- inner
  - ~center_to_corners_format(arr) ⇒ Array.<number>
  - ~post_process_object_detection(outputs) ⇒ Array.<Object>
    - ~box : Array.<number>
  - ~PretrainedOptions : *

processors.FeatureExtractor ⇐ `Callable`

Base class for feature extractors.

Kind: static class of processors
Extends: Callable

`new FeatureExtractor(config)`

Constructs a new FeatureExtractor instance.

Param	Type	Description
config	`Object`	The configuration for the feature extractor.

processors.ImageFeatureExtractor ⇐ `FeatureExtractor`

Feature extractor for image models.

Kind: static class of processors
Extends: FeatureExtractor

.ImageFeatureExtractor ⇐ FeatureExtractor
- new ImageFeatureExtractor(config)
- .preprocess(image) ⇒ Promise.<any>
- ._call(images) ⇒ Promise.<Object>

`new ImageFeatureExtractor(config)`

Constructs a new ImageFeatureExtractor instance.

Param	Type	Description
config	`Object`	The configuration for the feature extractor.
config.image_mean	`Array.<number>`	The mean values for image normalization.
config.image_std	`Array.<number>`	The standard deviation values for image normalization.
config.do_rescale	`boolean`	Whether to rescale the image pixel values to the [0,1] range.
config.rescale_factor	`number`	The factor to use for rescaling the image pixel values.
config.do_normalize	`boolean`	Whether to normalize the image pixel values.
config.do_resize	`boolean`	Whether to resize the image.
config.resample	`number`	What method to use for resampling.
config.size	`number`	The size to resize the image to.
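
To make the interaction between `do_rescale`, `rescale_factor`, `do_normalize`, `image_mean` and `image_std` concrete, here is a minimal sketch. It assumes rescaling is applied before normalization, that pixels are stored channel-last, and that the defaults are `true`, `1 / 255` and `true`; none of this is confirmed by this page, and `rescaleAndNormalize` is a hypothetical helper rather than the library's implementation.

```javascript
// Hypothetical helper, not the library's implementation.
// `pixels` is a flat array in channel-last order: [r, g, b, r, g, b, ...].
function rescaleAndNormalize(pixels, config) {
  const {
    do_rescale = true, // assumed default
    rescale_factor = 1 / 255, // assumed default
    do_normalize = true, // assumed default
    image_mean,
    image_std,
  } = config;
  const numChannels = image_mean.length;
  const out = new Float32Array(pixels.length);
  for (let i = 0; i < pixels.length; ++i) {
    const c = i % numChannels; // channel this value belongs to
    let value = pixels[i];
    if (do_rescale) value *= rescale_factor; // e.g. 0..255 -> 0..1
    if (do_normalize) value = (value - image_mean[c]) / image_std[c];
    out[i] = value;
  }
  return out;
}

const normalized = rescaleAndNormalize(
  [0, 255, 127.5],
  { image_mean: [0.5, 0.5, 0.5], image_std: [0.5, 0.5, 0.5] },
);
console.log(Array.from(normalized)); // [ -1, 1, 0 ]
```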

`imageFeatureExtractor.preprocess(image)` ⇒ `Promise.<any>`

Preprocesses the given image.

Kind: instance method of ImageFeatureExtractor
Returns: Promise.<any> - The preprocessed image as a Tensor.

Param	Type	Description
image	`RawImage`	The image to preprocess.

`imageFeatureExtractor._call(images)` ⇒ `Promise.<Object>`

Calls the feature extraction process on an array of image URLs, preprocesses each image, and concatenates the resulting features into a single Tensor.

Kind: instance method of ImageFeatureExtractor
Returns: Promise.<Object> - An object containing the concatenated pixel values (and other metadata) of the preprocessed images.

Param	Type	Description
images	`any`	The URL(s) of the image(s) to extract features from.

processors.DetrFeatureExtractor ⇐ `ImageFeatureExtractor`

Feature extractor for DETR models.

Kind: static class of processors
Extends: ImageFeatureExtractor

.DetrFeatureExtractor ⇐ ImageFeatureExtractor
- ._call(urls) ⇒ Promise.<Object>
- .post_process_object_detection() : post_process_object_detection
- .remove_low_and_no_objects(class_logits, mask_logits, object_mask_threshold, num_labels) ⇒ *
- .check_segment_validity(mask_labels, mask_probs, k, mask_threshold, overlap_mask_area_threshold) ⇒ *
- .compute_segments(mask_probs, pred_scores, pred_labels, mask_threshold, overlap_mask_area_threshold, label_ids_to_fuse, target_size) ⇒ *
- .post_process_panoptic_segmentation(outputs, [threshold], [mask_threshold], [overlap_mask_area_threshold], [label_ids_to_fuse], [target_sizes]) ⇒ Array.<{segmentation: Tensor, segments_info: Array<{id: number, label_id: number, score: number}>}>

`detrFeatureExtractor._call(urls)` ⇒ `Promise.<Object>`

Calls the feature extraction process on an array of image URLs, preprocesses each image, and concatenates the resulting features into a single Tensor.

Kind: instance method of DetrFeatureExtractor
Returns: Promise.<Object> - An object containing the concatenated pixel values of the preprocessed images.

Param	Type	Description
urls	`any`	The URL(s) of the image(s) to extract features from.

`detrFeatureExtractor.post_process_object_detection()` : `post_process_object_detection`

Kind: instance method of DetrFeatureExtractor

`detrFeatureExtractor.remove_low_and_no_objects(class_logits, mask_logits, object_mask_threshold, num_labels)` ⇒ `*`

Binarizes the given masks using object_mask_threshold and returns the associated masks, scores, and labels.

Kind: instance method of DetrFeatureExtractor
Returns: * - The binarized masks, the scores, and the labels.

Param	Type	Description
class_logits	`Tensor`	The class logits.
mask_logits	`Tensor`	The mask logits.
object_mask_threshold	`number`	A number between 0 and 1 used to binarize the masks.
num_labels	`number`	The number of labels.
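
One plausible reading of this step, modelled on how DETR-style post-processing commonly filters predictions, is sketched below on plain arrays. It assumes each row of class logits has `num_labels + 1` entries with the last one meaning "no object"; `softmax` and `keepConfidentObjects` are hypothetical helpers, and the library method itself works on Tensors.

```javascript
// Hypothetical sketch on plain arrays; the library method operates on Tensors.
function softmax(logits) {
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max)); // subtract max for stability
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((x) => x / sum);
}

// classLogits: one row per predicted mask, with numLabels + 1 entries
// (the extra, last entry is assumed to be the "no object" class).
function keepConfidentObjects(classLogits, objectMaskThreshold, numLabels) {
  const kept = [];
  classLogits.forEach((row, maskIndex) => {
    const probs = softmax(row);
    const score = Math.max(...probs);
    const label = probs.indexOf(score);
    // Drop "no object" predictions and predictions at or below the threshold.
    if (label !== numLabels && score > objectMaskThreshold) {
      kept.push({ maskIndex, score, label });
    }
  });
  return kept;
}

const kept = keepConfidentObjects(
  [
    [4.0, 0.0, 0.0], // confident class 0
    [0.0, 0.0, 5.0], // "no object"
    [0.1, 0.2, 0.0], // low confidence
  ],
  0.5,
  2,
);
console.log(kept.map((k) => [k.maskIndex, k.label])); // [ [ 0, 0 ] ]
```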

`detrFeatureExtractor.check_segment_validity(mask_labels, mask_probs, k, mask_threshold, overlap_mask_area_threshold)` ⇒ `*`

Checks whether the segment is valid or not.

Kind: instance method of DetrFeatureExtractor
Returns: * - Whether the segment is valid or not, and the indices of the valid labels.

Param	Type	Default	Description
mask_labels	`Int32Array`		Labels for each pixel in the mask.
mask_probs	`Array.<Tensor>`		Probabilities for each pixel in the masks.
k	`number`		The class id of the segment.
mask_threshold	`number`	`0.5`	The mask threshold.
overlap_mask_area_threshold	`number`	`0.8`	The overlap mask area threshold.
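
The validity check can be sketched on plain arrays as follows. This is a hedged illustration of the idea (compare the area a segment still owns against the area its mask would cover on its own), not the library's implementation; the camelCase function name and the array-based inputs are assumptions.

```javascript
// Hypothetical sketch on plain arrays, not the library's implementation.
// maskLabels[i]: class id currently assigned to pixel i.
// maskProbs[k][i]: probability that pixel i belongs to mask k.
function checkSegmentValidity(maskLabels, maskProbs, k, maskThreshold = 0.5, overlapMaskAreaThreshold = 0.8) {
  const maskK = []; // indices of pixels assigned to k
  let originalArea = 0; // pixels where mask k alone passes the threshold
  for (let i = 0; i < maskLabels.length; ++i) {
    if (maskLabels[i] === k) maskK.push(i);
    if (maskProbs[k][i] >= maskThreshold) ++originalArea;
  }
  let maskExists = maskK.length > 0 && originalArea > 0;
  if (maskExists) {
    // Reject segments that lost too much of their area to overlapping masks.
    const areaRatio = maskK.length / originalArea;
    if (!(areaRatio > overlapMaskAreaThreshold)) maskExists = false;
  }
  return [maskExists, maskK];
}

const maskProbs = [[0.9, 0.8, 0.7, 0.6, 0.1]];
const validCase = checkSegmentValidity([0, 0, 0, 0, 1], maskProbs, 0);
const invalidCase = checkSegmentValidity([0, 1, 1, 1, 1], maskProbs, 0);
console.log(validCase); // [ true, [ 0, 1, 2, 3 ] ]
console.log(invalidCase); // [ false, [ 0 ] ]
```

In the second call the segment keeps only one of the four pixels its mask would cover alone, so the area ratio (0.25) falls below the 0.8 threshold.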

`detrFeatureExtractor.compute_segments(mask_probs, pred_scores, pred_labels, mask_threshold, overlap_mask_area_threshold, label_ids_to_fuse, target_size)` ⇒ `*`

Computes the segments.

Kind: instance method of DetrFeatureExtractor
Returns: * - The computed segments.

Param	Type	Description
mask_probs	`Array.<Tensor>`	The mask probabilities.
pred_scores	`Array.<number>`	The predicted scores.
pred_labels	`Array.<number>`	The predicted labels.
mask_threshold	`number`	The mask threshold.
overlap_mask_area_threshold	`number`	The overlap mask area threshold.
label_ids_to_fuse	`Set.<number>`	The label ids to fuse.
target_size	`Array.<number>`	The target size of the image.

`detrFeatureExtractor.post_process_panoptic_segmentation(outputs, [threshold], [mask_threshold], [overlap_mask_area_threshold], [label_ids_to_fuse], [target_sizes])` ⇒ `Array.<{segmentation: Tensor, segments_info: Array<{id: number, label_id: number, score: number}>}>`

Post-process the model output to generate the final panoptic segmentation.

Kind: instance method of DetrFeatureExtractor

Param	Type	Default	Description
outputs	`*`		The model output to post-process.
[threshold]	`number`	`0.5`	The probability score threshold to keep predicted instance masks.
[mask_threshold]	`number`	`0.5`	Threshold to use when turning the predicted masks into binary values.
[overlap_mask_area_threshold]	`number`	`0.8`	The overlap mask area threshold to merge or discard small disconnected parts within each binary instance mask.
[label_ids_to_fuse]	`Set.<number>`		The labels in this set will have all their instances fused together.
[target_sizes]	`Array.<Array<number>>`		The target sizes to resize the masks to.
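
As an illustration of how the returned structure might be consumed, the sketch below counts the pixels belonging to each segment. It assumes `segmentation.data` stores one segment id per pixel, which this page does not confirm; `segmentAreas` and the stand-in `result` object are hypothetical.

```javascript
// Hypothetical helper: counts pixels per segment in one panoptic result.
// Assumes `segmentation.data` stores one segment id per pixel.
function segmentAreas({ segmentation, segments_info }) {
  const areas = new Map(segments_info.map((s) => [s.id, 0]));
  for (const id of segmentation.data) {
    if (areas.has(id)) areas.set(id, areas.get(id) + 1);
  }
  return segments_info.map((s) => ({ ...s, area: areas.get(s.id) }));
}

// Stand-in for one element of the returned array (a 2x3 image).
const result = {
  segmentation: { dims: [2, 3], data: Int32Array.from([1, 1, 2, 2, 2, 2]) },
  segments_info: [
    { id: 1, label_id: 17, score: 0.98 },
    { id: 2, label_id: 3, score: 0.91 },
  ],
};
const areas = segmentAreas(result);
console.log(areas.map((s) => [s.id, s.area])); // [ [ 1, 2 ], [ 2, 4 ] ]
```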

processors.Processor ⇐ `Callable`

Represents a Processor that extracts features from an input.

Kind: static class of processors
Extends: Callable

.Processor ⇐ Callable
- new Processor(feature_extractor)
- ._call(input) ⇒ Promise.<any>

`new Processor(feature_extractor)`

Creates a new Processor with the given feature extractor.

Param	Type	Description
feature_extractor	`FeatureExtractor`	The function used to extract features from the input.

`processor._call(input)` ⇒ `Promise.<any>`

Calls the feature_extractor function with the given input.

Kind: instance method of Processor
Returns: Promise.<any> - A Promise that resolves with the extracted features.

Param	Type	Description
input	`any`	The input to extract features from.

processors.WhisperProcessor ⇐ `Processor`

Represents a WhisperProcessor that extracts features from an audio input.

Kind: static class of processors
Extends: Processor

`whisperProcessor._call(audio)` ⇒ `Promise.<any>`

Calls the feature_extractor function with the given audio input.

Kind: instance method of WhisperProcessor
Returns: Promise.<any> - A Promise that resolves with the extracted features.

Param	Type	Description
audio	`any`	The audio input to extract features from.

processors.AutoProcessor

Helper class which is used to instantiate pretrained processors with the from_pretrained function. The chosen processor class is determined by the type specified in the processor config.

Example: Load a processor using from_pretrained.

import { AutoProcessor } from '@xenova/transformers';
let processor = await AutoProcessor.from_pretrained('openai/whisper-tiny.en');

Example: Run an image through a processor.

import { AutoProcessor, RawImage } from '@xenova/transformers';

let processor = await AutoProcessor.from_pretrained('Xenova/clip-vit-base-patch16');
let image = await RawImage.read('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/football-match.jpg');
let image_inputs = await processor(image);
// {
//   "pixel_values": {
//     "dims": [ 1, 3, 224, 224 ],
//     "type": "float32",
//     "data": Float32Array [ -1.558687686920166, -1.558687686920166, -1.5440893173217773, ... ],
//     "size": 150528
//   },
//   "original_sizes": [
//     [ 533, 800 ]
//   ],
//   "reshaped_input_sizes": [
//     [ 224, 224 ]
//   ]
// }
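
In the output above, `size` is the product of `dims` (1 × 3 × 224 × 224 = 150528), and `data` is a flat buffer. The sketch below shows one way to read a single value from such a buffer, assuming a row-major layout with `dims` ordered as [batch, channel, height, width]; `valueAt` is a hypothetical helper and the tiny `pixel_values` object is a stand-in, not real processor output.

```javascript
// Hypothetical helper: reads one value from a flat, row-major tensor buffer.
function valueAt(tensor, indices) {
  let offset = 0;
  for (let i = 0; i < tensor.dims.length; ++i) {
    offset = offset * tensor.dims[i] + indices[i];
  }
  return tensor.data[offset];
}

// Stand-in with the same fields as `pixel_values`, but tiny: [1, 3, 2, 2].
const pixel_values = {
  dims: [1, 3, 2, 2],
  data: Float32Array.from({ length: 12 }, (_, i) => i),
};
const size = pixel_values.dims.reduce((a, b) => a * b, 1);
const picked = valueAt(pixel_values, [0, 2, 1, 0]); // batch 0, channel 2, row 1, column 0
console.log(size); // 12
console.log(picked); // 10
```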

Kind: static class of processors

`AutoProcessor.from_pretrained(pretrained_model_name_or_path, options)` ⇒ `Promise.<Processor>`

Instantiate one of the processor classes of the library from a pretrained model.

The processor class to instantiate is selected based on the feature_extractor_type property of the config object (either passed as an argument or, if possible, loaded from pretrained_model_name_or_path).

Kind: static method of AutoProcessor
Returns: Promise.<Processor> - A new instance of the Processor class.

Param	Type	Description
pretrained_model_name_or_path	`string`	The name or path of the pretrained model. Can be either: A string, the model id of a pretrained processor hosted inside a model repo on huggingface.co. Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced under a user or organization name, like `dbmdz/bert-base-german-cased`. A path to a directory containing processor files, e.g., `./my_model_directory/`.
options	`PretrainedOptions`	Additional options for loading the processor.
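
The selection by `feature_extractor_type` can be pictured as a lookup from a type string to a class. The sketch below is only an illustration of that idea: the registry, the `Sketch*` classes, `createFeatureExtractor`, and the type strings used as keys are all hypothetical and do not reflect the library's internals.

```javascript
// Hypothetical stand-ins; the real classes live in the library.
class SketchFeatureExtractor {
  constructor(config) {
    this.config = config;
  }
}
class SketchWhisperFeatureExtractor extends SketchFeatureExtractor {}
class SketchImageFeatureExtractor extends SketchFeatureExtractor {}

// Hypothetical registry keyed by the config's `feature_extractor_type`.
const EXTRACTOR_CLASSES = new Map([
  ['WhisperFeatureExtractor', SketchWhisperFeatureExtractor],
  ['ImageFeatureExtractor', SketchImageFeatureExtractor],
]);

function createFeatureExtractor(config) {
  const cls = EXTRACTOR_CLASSES.get(config.feature_extractor_type);
  if (!cls) {
    throw new Error(`Unknown feature_extractor_type: ${config.feature_extractor_type}`);
  }
  return new cls(config);
}

const extractor = createFeatureExtractor({ feature_extractor_type: 'WhisperFeatureExtractor' });
console.log(extractor.constructor.name); // SketchWhisperFeatureExtractor
```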

`processors~center_to_corners_format(arr)` ⇒ `Array.<number>`

Converts bounding boxes from center format to corners format.

Kind: inner method of processors
Returns: Array.<number> - The coordinates of the top-left and bottom-right corners of the box (top_left_x, top_left_y, bottom_right_x, bottom_right_y).

Param	Type	Description
arr	`Array.<number>`	The coordinates of the box center followed by the box width and height (center_x, center_y, width, height).
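
The conversion itself is simple arithmetic: subtract half the width and height from the center to get the top-left corner, and add them to get the bottom-right corner. A minimal sketch follows; the camelCase name is hypothetical and chosen to avoid suggesting this is the library's own function.

```javascript
// Sketch of the center-to-corners conversion (hypothetical camelCase name).
function centerToCornersFormat([centerX, centerY, width, height]) {
  return [
    centerX - width / 2, // top_left_x
    centerY - height / 2, // top_left_y
    centerX + width / 2, // bottom_right_x
    centerY + height / 2, // bottom_right_y
  ];
}

const corners = centerToCornersFormat([50, 40, 20, 10]);
console.log(corners); // [ 40, 35, 60, 45 ]
```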

`processors~post_process_object_detection(outputs)` ⇒ `Array.<Object>`

Post-processes the outputs of the model (for object detection).

Kind: inner method of processors
Returns: Array.<Object> - An array of objects containing the post-processed outputs.

Param	Type	Description
outputs	`Object`	The outputs of the model that must be post-processed.
outputs.logits	`Tensor`	The logits.
outputs.pred_boxes	`Tensor`	The predicted boxes.

`post_process_object_detection~box` : `Array.<number>`

Kind: inner property of post_process_object_detection

`processors~PretrainedOptions` : `*`

Kind: inner typedef of processors
