utils/generation
Classes, functions, and utilities for generation.
Todo
- Describe how to create a custom GenerationConfig.
- utils/generation
    - static
        - .LogitsProcessorList ⇐ Callable
        - .LogitsProcessor ⇐ Callable
        - .ForceTokensLogitsProcessor ⇐ LogitsProcessor
        - .ForcedBOSTokenLogitsProcessor ⇐ LogitsProcessor
        - .ForcedEOSTokenLogitsProcessor ⇐ LogitsProcessor
        - .SuppressTokensAtBeginLogitsProcessor ⇐ LogitsProcessor
        - .WhisperTimeStampLogitsProcessor ⇐ LogitsProcessor
        - .NoRepeatNGramLogitsProcessor ⇐ LogitsProcessor
            - new NoRepeatNGramLogitsProcessor(no_repeat_ngram_size)
            - .getNgrams(prevInputIds) ⇒ Map.<string, Array<number>>
            - .getGeneratedNgrams(bannedNgrams, prevInputIds) ⇒ Array.<number>
            - .calcBannedNgramTokens(prevInputIds) ⇒ Array.<number>
            - ._call(input_ids, logits) ⇒ Object
        - .RepetitionPenaltyLogitsProcessor ⇐ LogitsProcessor
        - .MinLengthLogitsProcessor ⇐ LogitsProcessor
        - .MinNewTokensLengthLogitsProcessor ⇐ LogitsProcessor
        - .NoBadWordsLogitsProcessor
        - .Sampler
            - new Sampler(generation_config)
            - instance
                - ._call(logits, index) ⇒ void
                - .sample(logits, index)
                - .getLogits(logits, index) ⇒ Float32Array
                - .randomSelect(probabilities) ⇒ number
            - static
                - .getSampler(generation_config) ⇒ Sampler
        - .GenerationConfig : *
    - inner
        - ~GenerationConfig
        - ~GreedySampler ⇐ Sampler
            - .sample(logits, [index]) ⇒ Array
        - ~MultinomialSampler ⇐ Sampler
            - .sample(logits, index) ⇒ Array
        - ~BeamSearchSampler ⇐ Sampler
            - .sample(logits, index) ⇒ Array
        - ~GenerationConfigType : Object
utils/generation.LogitsProcessorList ⇐ <code>Callable</code>
A class representing a list of logits processors. A logits processor is a function that modifies the logits output of a language model. This class provides methods for adding new processors and applying all processors to a batch of logits.
Kind: static class of utils/generation
Extends: Callable
- .LogitsProcessorList ⇐ Callable
new LogitsProcessorList()
Constructs a new instance of LogitsProcessorList.
logitsProcessorList.push(item)
Adds a new logits processor to the list.
Kind: instance method of LogitsProcessorList
Param | Type | Description |
---|---|---|
item | LogitsProcessor | The logits processor function to add. |
logitsProcessorList.extend(items)
Adds multiple logits processors to the list.
Kind: instance method of LogitsProcessorList
Param | Type | Description |
---|---|---|
items | Array.<LogitsProcessor> | The logits processor functions to add. |
logitsProcessorList._call(input_ids, batchedLogits)
Applies all logits processors in the list to a batch of logits, modifying them in-place.
Kind: instance method of LogitsProcessorList
Param | Type | Description |
---|---|---|
input_ids | Array.<number> | The input IDs for the language model. |
batchedLogits | Array.<Array<number>> | A 2D array of logits, where each row corresponds to a single input sequence in the batch. |
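A minimal sketch of how a processor list might be assembled and applied. The import path is an assumption (adjust it to wherever `utils/generation` is exposed in your setup), and the token ids and penalty values are placeholders.

```js
// Sketch only: import path and token ids are assumptions, not part of this module's docs.
import {
    LogitsProcessorList,
    MinLengthLogitsProcessor,
    RepetitionPenaltyLogitsProcessor,
} from './utils/generation.js';

const processors = new LogitsProcessorList();
processors.push(new MinLengthLogitsProcessor(10, /* eos_token_id */ 2));
processors.extend([new RepetitionPenaltyLogitsProcessor(1.2)]);

// Inside a generation loop, apply every processor to the current batch of logits in-place:
// processors._call(input_ids, batchedLogits);
```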
utils/generation.LogitsProcessor ⇐ <code>Callable</code>
Base class for processing logits.
Kind: static class of utils/generation
Extends: Callable
logitsProcessor._call(input_ids, logits)
Apply the processor to the input logits.
Kind: instance abstract method of LogitsProcessor
Throws:
- Error Throws an error if `_call` is not implemented in the subclass.
Param | Type | Description |
---|---|---|
input_ids | Array | The input ids. |
logits | Tensor | The logits to process. |
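Since `_call` is abstract, a custom processor only needs to subclass `LogitsProcessor` and implement it. The sketch below is an illustration, not part of the library: the class name and token id are hypothetical, and it assumes the logits `Tensor` exposes its raw values through a `data` typed array, as the built-in processors described on this page do.

```js
// Hypothetical processor that forbids a single token id from ever being sampled.
class SuppressSingleTokenLogitsProcessor extends LogitsProcessor {
    constructor(token_id) {
        super();
        this.token_id = token_id; // placeholder id supplied by the caller
    }

    _call(input_ids, logits) {
        // Assumes `logits.data` is the underlying typed array of scores.
        logits.data[this.token_id] = -Infinity;
        return logits;
    }
}
```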
utils/generation.ForceTokensLogitsProcessor ⇐ <code>LogitsProcessor</code>
A logits processor that forces a specific token to be generated by the decoder.
Kind: static class of utils/generation
Extends: LogitsProcessor
- .ForceTokensLogitsProcessor ⇐ LogitsProcessor
new ForceTokensLogitsProcessor(forced_decoder_ids)
Constructs a new instance of ForceTokensLogitsProcessor.
Param | Type | Description |
---|---|---|
forced_decoder_ids | Array | The ids of tokens that should be forced. |
forceTokensLogitsProcessor._call(input_ids, logits) ⇒ <code>Tensor</code>
Apply the processor to the input logits.
Kind: instance method of ForceTokensLogitsProcessor
Returns: Tensor - The processed logits.
Param | Type | Description |
---|---|---|
input_ids | Array | The input ids. |
logits | Tensor | The logits to process. |
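A construction sketch. The `[generation_index, token_id]` pair format mirrors the `forced_decoder_ids` entry of `GenerationConfigType` documented below; the ids themselves are placeholders.

```js
// Force token 123 at generation step 1 and token 456 at step 2 (placeholder ids).
const forceTokens = new ForceTokensLogitsProcessor([[1, 123], [2, 456]]);
// logits = forceTokens._call(input_ids, logits);
```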
utils/generation.ForcedBOSTokenLogitsProcessor ⇐ <code>LogitsProcessor</code>
A LogitsProcessor that forces a BOS token at the beginning of the generated sequence.
Kind: static class of utils/generation
Extends: LogitsProcessor
- .ForcedBOSTokenLogitsProcessor ⇐ LogitsProcessor
new ForcedBOSTokenLogitsProcessor(bos_token_id)
Create a ForcedBOSTokenLogitsProcessor.
Param | Type | Description |
---|---|---|
bos_token_id | number | The ID of the beginning-of-sequence token to be forced. |
forcedBOSTokenLogitsProcessor._call(input_ids, logits) ⇒ <code>Object</code>
Apply the BOS token forcing to the logits.
Kind: instance method of ForcedBOSTokenLogitsProcessor
Returns: Object - The logits with BOS token forcing.
Param | Type | Description |
---|---|---|
input_ids | Array | The input IDs. |
logits | Object | The logits. |
utils/generation.ForcedEOSTokenLogitsProcessor ⇐ <code>LogitsProcessor</code>
A logits processor that forces end-of-sequence token probability to 1.
Kind: static class of utils/generation
Extends: LogitsProcessor
- .ForcedEOSTokenLogitsProcessor ⇐ LogitsProcessor
new ForcedEOSTokenLogitsProcessor(max_length, forced_eos_token_id)
Create a ForcedEOSTokenLogitsProcessor.
Param | Type | Description |
---|---|---|
max_length | number | Max length of the sequence. |
forced_eos_token_id | number \| Array<number> | The ID of the end-of-sequence token to be forced. |
forcedEOSTokenLogitsProcessor._call(input_ids, logits)
Apply the processor to input_ids and logits.
Kind: instance method of ForcedEOSTokenLogitsProcessor
Param | Type | Description |
---|---|---|
input_ids | Array.<number> | The input ids. |
logits | Tensor | The logits tensor. |
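A construction sketch with placeholder values: the processor forces the given end-of-sequence token as the sequence approaches `max_length`.

```js
// Force EOS (placeholder id 2) for sequences of up to 128 tokens.
const forceEOS = new ForcedEOSTokenLogitsProcessor(128, 2);
// forceEOS._call(input_ids, logits);
```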
utils/generation.SuppressTokensAtBeginLogitsProcessor ⇐ <code>LogitsProcessor</code>
A LogitsProcessor that suppresses a list of tokens as soon as the `generate` function starts generating using `begin_index` tokens. This should ensure that the tokens defined by `begin_suppress_tokens` are not sampled at the beginning of the generation.
Kind: static class of utils/generation
Extends: LogitsProcessor
- .SuppressTokensAtBeginLogitsProcessor ⇐ LogitsProcessor
new SuppressTokensAtBeginLogitsProcessor(begin_suppress_tokens, begin_index)
Create a SuppressTokensAtBeginLogitsProcessor.
Param | Type | Description |
---|---|---|
begin_suppress_tokens | Array.<number> | The IDs of the tokens to suppress. |
begin_index | number | The number of tokens to generate before suppressing tokens. |
suppressTokensAtBeginLogitsProcessor._call(input_ids, logits) ⇒ <code>Object</code>
Apply the token suppression to the logits.
Kind: instance method of SuppressTokensAtBeginLogitsProcessor
Returns: Object - The logits with the beginning tokens suppressed.
Param | Type | Description |
---|---|---|
input_ids | Array | The input IDs. |
logits | Object | The logits. |
utils/generation.WhisperTimeStampLogitsProcessor ⇐ <code>LogitsProcessor</code>
A LogitsProcessor that handles adding timestamps to generated text.
Kind: static class of utils/generation
Extends: LogitsProcessor
- .WhisperTimeStampLogitsProcessor ⇐ LogitsProcessor
new WhisperTimeStampLogitsProcessor(generate_config)
Constructs a new WhisperTimeStampLogitsProcessor.
Param | Type | Description |
---|---|---|
generate_config | Object | The config object passed to the `generate()` function. |
generate_config.eos_token_id | number | The ID of the end-of-sequence token. |
generate_config.no_timestamps_token_id | number | The ID of the token used to indicate that a token should not have a timestamp. |
[generate_config.forced_decoder_ids] | Array.<Array<number>> | An array of two-element arrays representing decoder IDs that are forced to appear in the output. The second element of each array indicates whether the token is a timestamp. |
[generate_config.max_initial_timestamp_index] | number | The maximum index at which an initial timestamp can appear. |
whisperTimeStampLogitsProcessor._call(input_ids, logits) ⇒ <code>Tensor</code>
Modify the logits to handle timestamp tokens.
Kind: instance method of WhisperTimeStampLogitsProcessor
Returns: Tensor - The modified logits.
Param | Type | Description |
---|---|---|
input_ids | Array | The input sequence of tokens. |
logits | Tensor | The logits output by the model. |
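A construction sketch. The ids below are placeholders rather than real Whisper vocabulary ids; in practice they would come from the model's generation config.

```js
// Sketch: build the Whisper timestamp processor from a generation config object.
const timestampProcessor = new WhisperTimeStampLogitsProcessor({
    eos_token_id: 50257,             // placeholder end-of-sequence id
    no_timestamps_token_id: 50363,   // placeholder <|notimestamps|> id
    max_initial_timestamp_index: 50, // optional cap on the first timestamp position
});
// logits = timestampProcessor._call(input_ids, logits);
```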
utils/generation.NoRepeatNGramLogitsProcessor ⇐ <code>LogitsProcessor</code>
A logits processor that disallows ngrams of a certain size to be repeated.
Kind: static class of utils/generation
Extends: LogitsProcessor
- .NoRepeatNGramLogitsProcessor ⇐ LogitsProcessor
    - new NoRepeatNGramLogitsProcessor(no_repeat_ngram_size)
    - .getNgrams(prevInputIds) ⇒ Map.<string, Array<number>>
    - .getGeneratedNgrams(bannedNgrams, prevInputIds) ⇒ Array.<number>
    - .calcBannedNgramTokens(prevInputIds) ⇒ Array.<number>
    - ._call(input_ids, logits) ⇒ Object
new NoRepeatNGramLogitsProcessor(no_repeat_ngram_size)
Create a NoRepeatNGramLogitsProcessor.
Param | Type | Description |
---|---|---|
no_repeat_ngram_size | number | The no-repeat-ngram size. All ngrams of this size can only occur once. |
noRepeatNGramLogitsProcessor.getNgrams(prevInputIds) ⇒ <code>Map.<string, Array<number>></code>
Generate n-grams from a sequence of token ids.
Kind: instance method of NoRepeatNGramLogitsProcessor
Returns: Map.<string, Array<number>> - Map of generated n-grams
Param | Type | Description |
---|---|---|
prevInputIds | Array.<number> | List of previous input ids |
noRepeatNGramLogitsProcessor.getGeneratedNgrams(bannedNgrams, prevInputIds) ⇒ <code>Array.<number></code>
Retrieve the banned tokens for the current step, given the map of banned n-grams and the previously generated ids.
Kind: instance method of NoRepeatNGramLogitsProcessor
Returns: Array.<number> - List of banned token ids
Param | Type | Description |
---|---|---|
bannedNgrams | Map.<string, Array<number>> | Map of banned n-grams |
prevInputIds | Array.<number> | List of previous input ids |
noRepeatNGramLogitsProcessor.calcBannedNgramTokens(prevInputIds) ⇒ <code>Array.<number></code>
Calculate banned n-gram tokens.
Kind: instance method of NoRepeatNGramLogitsProcessor
Returns: Array.<number> - List of banned token ids
Param | Type | Description |
---|---|---|
prevInputIds | Array.<number> | List of previous input ids |
noRepeatNGramLogitsProcessor._call(input_ids, logits) ⇒ <code>Object</code>
Apply the no-repeat-ngram processor to the logits.
Kind: instance method of NoRepeatNGramLogitsProcessor
Returns: Object - The logits with no-repeat-ngram processing.
Param | Type | Description |
---|---|---|
input_ids | Array | The input IDs. |
logits | Object | The logits. |
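A usage sketch: ban any 3-gram from repeating in the generated sequence. The variables in the commented call are hypothetical.

```js
// Any 3-gram that has already appeared cannot be generated again.
const noRepeat = new NoRepeatNGramLogitsProcessor(3);
// Inside the generation loop (hypothetical variables):
// logits = noRepeat._call(previouslyGeneratedIds, logits);
```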
utils/generation.RepetitionPenaltyLogitsProcessor ⇐ <code>LogitsProcessor</code>
A logits processor that penalises repeated output tokens.
Kind: static class of utils/generation
Extends: LogitsProcessor
- .RepetitionPenaltyLogitsProcessor ⇐ LogitsProcessor
new RepetitionPenaltyLogitsProcessor(penalty)
Create a RepetitionPenaltyLogitsProcessor.
Param | Type | Description |
---|---|---|
penalty | number | The penalty to apply for repeated tokens. |
repetitionPenaltyLogitsProcessor._call(input_ids, logits) ⇒ <code>Object</code>
Apply the repetition penalty to the logits.
Kind: instance method of RepetitionPenaltyLogitsProcessor
Returns: Object - The logits with repetition penalty processing.
Param | Type | Description |
---|---|---|
input_ids | Array | The input IDs. |
logits | Object | The logits. |
utils/generation.MinLengthLogitsProcessor ⇐ <code>LogitsProcessor</code>
A logits processor that enforces a minimum number of tokens.
Kind: static class of utils/generation
Extends: LogitsProcessor
- .MinLengthLogitsProcessor ⇐ LogitsProcessor
new MinLengthLogitsProcessor(min_length, eos_token_id)
Create a MinLengthLogitsProcessor.
Param | Type | Description |
---|---|---|
min_length | number | The minimum length below which the score of `eos_token_id` is set to negative infinity. |
eos_token_id | number \| Array<number> | The ID/IDs of the end-of-sequence token. |
minLengthLogitsProcessor._call(input_ids, logits) ⇒ <code>Object</code>
Apply logit processor.
Kind: instance method of MinLengthLogitsProcessor
Returns: Object - The processed logits.
Param | Type | Description |
---|---|---|
input_ids | Array | The input IDs. |
logits | Object | The logits. |
utils/generation.MinNewTokensLengthLogitsProcessor ⇐ <code>LogitsProcessor</code>
A logits processor that enforces a minimum number of new tokens.
Kind: static class of utils/generation
Extends: LogitsProcessor
- .MinNewTokensLengthLogitsProcessor ⇐ LogitsProcessor
new MinNewTokensLengthLogitsProcessor(prompt_length_to_skip, min_new_tokens, eos_token_id)
Create a MinNewTokensLengthLogitsProcessor.
Param | Type | Description |
---|---|---|
prompt_length_to_skip | number | The input tokens length. |
min_new_tokens | number | The minimum new tokens length below which the score of `eos_token_id` is set to negative infinity. |
eos_token_id | number \| Array<number> | The ID/IDs of the end-of-sequence token. |
minNewTokensLengthLogitsProcessor._call(input_ids, logits) ⇒ <code>Object</code>
Apply logit processor.
Kind: instance method of MinNewTokensLengthLogitsProcessor
Returns: Object - The processed logits.
Param | Type | Description |
---|---|---|
input_ids | Array | The input IDs. |
logits | Object | The logits. |
utils/generation.NoBadWordsLogitsProcessor
Kind: static class of utils/generation
new NoBadWordsLogitsProcessor(bad_words_ids, eos_token_id)
Create a NoBadWordsLogitsProcessor.
Param | Type | Description |
---|---|---|
bad_words_ids | Array.<Array<number>> | List of lists of token ids that are not allowed to be generated. |
eos_token_id | number \| Array<number> | The id of the end-of-sequence token. Optionally, use a list to set multiple end-of-sequence tokens. |
noBadWordsLogitsProcessor._call(input_ids, logits) ⇒ <code>Object</code>
Apply logit processor.
Kind: instance method of NoBadWordsLogitsProcessor
Returns: Object - The processed logits.
Param | Type | Description |
---|---|---|
input_ids | Array | The input IDs. |
logits | Object | The logits. |
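A construction sketch. The token id sequences are placeholders; in practice they would come from a tokenizer.

```js
// Disallow two (placeholder) token sequences from appearing in the output.
const badWordIds = [[1234], [5678, 910]];
const noBadWords = new NoBadWordsLogitsProcessor(badWordIds, /* eos_token_id */ 2);
// logits = noBadWords._call(input_ids, logits);
```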
utils/generation.Sampler
Sampler is a base class for all sampling methods used for text generation.
Kind: static class of utils/generation
- .Sampler
    - new Sampler(generation_config)
    - instance
        - ._call(logits, index) ⇒ void
        - .sample(logits, index)
        - .getLogits(logits, index) ⇒ Float32Array
        - .randomSelect(probabilities) ⇒ number
    - static
        - .getSampler(generation_config) ⇒ Sampler
new Sampler(generation_config)
Creates a new Sampler object with the specified generation config.
Param | Type | Description |
---|---|---|
generation_config | GenerationConfigType | The generation config. |
sampler._call(logits, index) ⇒ <code>void</code>
Executes the sampler, using the specified logits.
Kind: instance method of Sampler
Param | Type |
---|---|
logits | Tensor |
index | number |
sampler.sample(logits, index)
Abstract method for sampling the logits.
Kind: instance method of Sampler
Throws:
- Error
Param | Type |
---|---|
logits | Tensor |
index | number |
sampler.getLogits(logits, index) ⇒ <code>Float32Array</code>
Returns the specified logits as an array, with temperature applied.
Kind: instance method of Sampler
Param | Type |
---|---|
logits | Tensor |
index | number |
sampler.randomSelect(probabilities) ⇒ <code>number</code>
Selects an item randomly based on the specified probabilities.
Kind: instance method of Sampler
Returns: number - The index of the selected item.
Param | Type | Description |
---|---|---|
probabilities | Array | An array of probabilities to use for selection. |
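A sketch of a custom sampler built only on the instance methods documented above. The class is hypothetical and mirrors the `[tokenId, score]` tuple shape that the built-in samplers below return.

```js
// Hypothetical sampler that always picks the highest-scoring token.
class ArgmaxSampler extends Sampler {
    sample(logits, index) {
        const scores = this.getLogits(logits, index); // Float32Array, temperature applied
        let best = 0;
        for (let i = 1; i < scores.length; ++i) {
            if (scores[i] > scores[best]) best = i;
        }
        return [[best, 0]]; // single [tokenId, score] tuple, like GreedySampler
    }
}
```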
Sampler.getSampler(generation_config) ⇒ <code>Sampler</code>
Returns a Sampler object based on the specified options.
Kind: static method of Sampler
Returns: Sampler - A Sampler object.
Param | Type | Description |
---|---|---|
generation_config | GenerationConfigType | An object containing options for the sampler. |
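A usage sketch. Since `GenerationConfigType` is a plain object type, an object literal can be passed directly; the specific fields shown are just one possible combination, and the commented call uses hypothetical variables.

```js
// Pick a sampler from a (minimal) sampling-oriented configuration.
const sampler = Sampler.getSampler({
    do_sample: true,
    temperature: 0.8,
    top_k: 50,
});
// const [[tokenId, score]] = sampler.sample(logits, index);
```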
utils/generation.GenerationConfig : <code> * </code>
Class that holds a configuration for a generation task.
Kind: static constant of utils/generation
utils/generation~GenerationConfig
Kind: inner class of utils/generation
new GenerationConfig(kwargs)
Create a new GenerationConfig object.
Param | Type |
---|---|
kwargs | GenerationConfigType |
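A sketch of creating a custom configuration, as requested in the Todo above. Any field from the `GenerationConfigType` typedef documented below may be passed; the values shown here are illustrative, and unspecified fields fall back to their documented defaults.

```js
// Create a custom generation config from a plain kwargs object.
const generation_config = new GenerationConfig({
    max_new_tokens: 100,
    do_sample: true,
    temperature: 0.7,
    top_k: 50,
    repetition_penalty: 1.1,
});
```

Such a config can then be handed to `Sampler.getSampler`, as shown earlier on this page.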
utils/generation~GreedySampler ⇐ <code>Sampler</code>
Class representing a Greedy Sampler.
Kind: inner class of utils/generation
Extends: Sampler
greedySampler.sample(logits, [index]) ⇒ <code>Array</code>
Sample the maximum probability of a given logits tensor.
Kind: instance method of GreedySampler
Returns: Array - An array with a single tuple, containing the index of the maximum value and a meaningless score (since this is a greedy search).
Param | Type | Default |
---|---|---|
logits | Tensor | |
[index] | number | -1 |
utils/generation~MultinomialSampler ⇐ <code>Sampler</code>
Class representing a MultinomialSampler.
Kind: inner class of utils/generation
Extends: Sampler
multinomialSampler.sample(logits, index) ⇒ <code>Array</code>
Sample from the logits.
Kind: instance method of MultinomialSampler
Param | Type |
---|---|
logits | Tensor |
index | number |
utils/generation~BeamSearchSampler ⇐ <code>Sampler</code>
Class representing a BeamSearchSampler.
Kind: inner class of utils/generation
Extends: Sampler
beamSearchSampler.sample(logits, index) ⇒ <code>Array</code>
Sample from the logits.
Kind: instance method of BeamSearchSampler
Param | Type |
---|---|
logits | Tensor |
index | number |
utils/generation~GenerationConfigType : <code> Object </code>
The default configuration parameters.
Kind: inner typedef of utils/generation
Properties
Name | Type | Default | Description |
---|---|---|---|
[max_length] | number | 20 | The maximum length the generated tokens can have. Corresponds to the length of the input prompt + `max_new_tokens`. Its effect is overridden by `max_new_tokens`, if also set. |
[max_new_tokens] | number | | The maximum numbers of tokens to generate, ignoring the number of tokens in the prompt. |
[min_length] | number | 0 | The minimum length of the sequence to be generated. Corresponds to the length of the input prompt + `min_new_tokens`. Its effect is overridden by `min_new_tokens`, if also set. |
[min_new_tokens] | number | | The minimum numbers of tokens to generate, ignoring the number of tokens in the prompt. |
[early_stopping] | boolean \| "never" | false | Controls the stopping condition for beam-based methods, like beam-search. It accepts the following values: `true`, where the generation stops as soon as there are `num_beams` complete candidates; `false`, where a heuristic is applied and the generation stops when it is very unlikely to find better candidates; `"never"`, where the beam search procedure only stops when there cannot be better candidates (canonical beam search algorithm). |
[max_time] | number | | The maximum amount of time you allow the computation to run for in seconds. Generation will still finish the current pass after allocated time has been passed. |
[do_sample] | boolean | false | Whether or not to use sampling; use greedy decoding otherwise. |
[num_beams] | number | 1 | Number of beams for beam search. 1 means no beam search. |
[num_beam_groups] | number | 1 | Number of groups to divide `num_beams` into in order to ensure diversity among different groups of beams. |
[penalty_alpha] | number | | The values balance the model confidence and the degeneration penalty in contrastive search decoding. |
[use_cache] | boolean | true | Whether or not the model should use the past last key/values attentions (if applicable to the model) to speed up decoding. |
[temperature] | number | 1.0 | The value used to modulate the next token probabilities. |
[top_k] | number | 50 | The number of highest probability vocabulary tokens to keep for top-k-filtering. |
[top_p] | number | 1.0 | If set to float < 1, only the smallest set of most probable tokens with probabilities that add up to `top_p` or higher are kept for generation. |
[typical_p] | number | 1.0 | Local typicality measures how similar the conditional probability of predicting a target token next is to the expected conditional probability of predicting a random token next, given the partial text already generated. If set to float < 1, the smallest set of the most locally typical tokens with probabilities that add up to `typical_p` or higher are kept for generation. |
[epsilon_cutoff] | number | 0.0 | If set to float strictly between 0 and 1, only tokens with a conditional probability greater than `epsilon_cutoff` will be sampled. |
[eta_cutoff] | number | 0.0 | Eta sampling is a hybrid of locally typical sampling and epsilon sampling. If set to float strictly between 0 and 1, a token is only considered if it is greater than either `eta_cutoff` or `sqrt(eta_cutoff) * exp(-entropy(softmax(next_token_logits)))`. |
[diversity_penalty] | number | 0.0 | This value is subtracted from a beam's score if it generates a token same as any beam from other group at a particular time. Note that `diversity_penalty` is only effective if group beam search is enabled. |
[repetition_penalty] | number | 1.0 | The parameter for repetition penalty. 1.0 means no penalty. See this paper for more details. |
[encoder_repetition_penalty] | number | 1.0 | The parameter for encoder_repetition_penalty. An exponential penalty on sequences that are not in the original input. 1.0 means no penalty. |
[length_penalty] | number | 1.0 | Exponential penalty to the length that is used with beam-based generation. It is applied as an exponent to the sequence length, which in turn is used to divide the score of the sequence. Since the score is the log likelihood of the sequence (i.e. negative), `length_penalty` > 0.0 promotes longer sequences, while `length_penalty` < 0.0 encourages shorter sequences. |
[no_repeat_ngram_size] | number | 0 | If set to int > 0, all ngrams of that size can only occur once. |
[bad_words_ids] | Array.<Array<number>> | | List of token ids that are not allowed to be generated. In order to get the token ids of the words that should not appear in the generated text, use the tokenizer to encode those words without adding special tokens. |
[force_words_ids] | Array<Array<number>> \| Array<Array<Array<number>>> | | List of token ids that must be generated. If given an `Array<Array<number>>`, this is treated as a simple list of words that must be included, the opposite of `bad_words_ids`. If given an `Array<Array<Array<number>>>`, this triggers a disjunctive constraint, where one can allow different forms of each word. |
[renormalize_logits] | boolean | false | Whether to renormalize the logits after applying all the logits processors or warpers (including the custom ones). It's highly recommended to set this flag to `true` as the search algorithms suppose the score logits are normalized, but some logit processors or warpers break the normalization. |
[constraints] | Array.<Object> | | Custom constraints that can be added to the generation to ensure that the output will contain the use of certain tokens as defined by the given constraint objects, in the most sensible way possible. |
[forced_bos_token_id] | number | | The id of the token to force as the first generated token after the `decoder_start_token_id`. Useful for multilingual models where the first generated token needs to be the target language token. |
[forced_eos_token_id] | number \| Array<number> | | The id of the token to force as the last generated token when `max_length` is reached. Optionally, use a list to set multiple end-of-sequence tokens. |
[remove_invalid_values] | boolean | false | Whether to remove possible nan and inf outputs of the model to prevent the generation method to crash. Note that using `remove_invalid_values` can slow down generation. |
[exponential_decay_length_penalty] | Array.<number> | | This Tuple adds an exponentially increasing length penalty, after a certain amount of tokens have been generated. The tuple shall consist of `(start_index, decay_factor)`, where `start_index` indicates where the penalty starts and `decay_factor` represents the factor of exponential decay. |
[suppress_tokens] | Array.<number> | | A list of tokens that will be suppressed at generation. The suppress-tokens logit processor will set their log probs to `-inf` so that they are not sampled. |
[begin_suppress_tokens] | Array.<number> | | A list of tokens that will be suppressed at the beginning of the generation. The begin-suppress-tokens logit processor will set their log probs to `-inf` so that they are not sampled. |
[forced_decoder_ids] | Array.<Array<number>> | | A list of pairs of integers which indicates a mapping from generation indices to token indices that will be forced before sampling. For example, `[[1, 123]]` means the second generated token will always be a token of index 123. |
[num_return_sequences] | number | 1 | The number of independently computed returned sequences for each element in the batch. |
[output_attentions] | boolean | false | Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned tensors for more details. |
[output_hidden_states] | boolean | false | Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for more details. |
[output_scores] | boolean | false | Whether or not to return the prediction scores. See `scores` under returned tensors for more details. |
[return_dict_in_generate] | boolean | false | Whether or not to return a model output object instead of a plain tuple. |
[pad_token_id] | number | | The id of the padding token. |
[bos_token_id] | number | | The id of the beginning-of-sequence token. |
[eos_token_id] | number \| Array<number> | | The id of the end-of-sequence token. Optionally, use a list to set multiple end-of-sequence tokens. |
[encoder_no_repeat_ngram_size] | number | 0 | If set to int > 0, all ngrams of that size that occur in the `encoder_input_ids` cannot occur in the `decoder_input_ids`. |
[decoder_start_token_id] | number | | If an encoder-decoder model starts decoding with a different token than bos, the id of that token. |
[generation_kwargs] | Object | {} | Additional generation kwargs will be forwarded to the `generate` function of the model. |