
BOFT

Orthogonal Butterfly (BOFT) is a generic method designed for finetuning foundation models. It improves the parameter efficiency of the Orthogonal Finetuning (OFT) paradigm by taking inspiration from the Cooley-Tukey fast Fourier transform, and shows favorable results when finetuning different foundation models, including large vision transformers, large language models, and text-to-image diffusion models.
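
Concretely (a paraphrase of the paper's construction, not a quotation): OFT adapts a frozen weight matrix $W_0$ by multiplying it with a learned orthogonal matrix $R$, and BOFT parameterizes $R$ as a product of $m$ sparse, butterfly-structured orthogonal factors:

$$W = (B_m B_{m-1} \cdots B_1)\, W_0$$

Because each butterfly factor $B_i$ is sparse, the product can express dense orthogonal rotations with far fewer trainable parameters than a single dense $R$; with $m = 1$, BOFT reduces to block-diagonal OFT.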

The abstract from the paper is:

Large foundation models are becoming ubiquitous, but training them from scratch is prohibitively expensive. Thus, efficiently adapting these powerful models to downstream tasks is increasingly important. In this paper, we study a principled finetuning paradigm — Orthogonal Finetuning (OFT) — for downstream task adaptation. Despite demonstrating good generalizability, OFT still uses a fairly large number of trainable parameters due to the high dimensionality of orthogonal matrices. To address this, we start by examining OFT from an information transmission perspective, and then identify a few key desiderata that enable better parameter-efficiency. Inspired by how the Cooley-Tukey fast Fourier transform algorithm enables efficient information transmission, we propose an efficient orthogonal parameterization using butterfly structures. We apply this parameterization to OFT, creating a novel parameter-efficient finetuning method, called Orthogonal Butterfly (BOFT). By subsuming OFT as a special case, BOFT introduces a generalized orthogonal finetuning framework. Finally, we conduct an extensive empirical study of adapting large vision transformers, large language models, and text-to-image diffusion models to various downstream tasks in vision and language.

BOFTConfig

class peft.BOFTConfig


( task_type: typing.Union[str, peft.utils.peft_types.TaskType, NoneType] = None peft_type: typing.Union[str, peft.utils.peft_types.PeftType, NoneType] = None auto_mapping: typing.Optional[dict] = None base_model_name_or_path: typing.Optional[str] = None revision: typing.Optional[str] = None inference_mode: bool = False boft_block_size: int = 4 boft_block_num: int = 0 boft_n_butterfly_factor: int = 1 target_modules: Optional[Union[list[str], str]] = None exclude_modules: Optional[Union[list[str], str]] = None boft_dropout: float = 0.0 fan_in_fan_out: bool = False bias: str = 'none' modules_to_save: Optional[list[str]] = None init_weights: bool = True layers_to_transform: Optional[Union[list[int], int]] = None layers_pattern: Optional[Union[list[str], str]] = None )

Parameters

  • boft_block_size (int) — BOFT block size across different layers.
  • boft_block_num (int) — Number of BOFT blocks per injected layer.
  • boft_n_butterfly_factor (int) — Number of butterfly factors across different layers.
  • target_modules (Union[List[str],str]) — The names of the modules to apply the adapter to.
  • exclude_modules (Optional[Union[List[str], str]]) — The names of the modules not to apply the adapter to. When passing a string, a regex match will be performed. When passing a list of strings, either an exact match will be performed or it is checked if the name of the module ends with any of the passed strings.
  • boft_dropout (float) — The multiplicative dropout probability: during training, OFT blocks are randomly set to the identity with this probability, similar to the dropout layer in LoRA.
  • fan_in_fan_out (bool) — Set this to True if the layer to replace stores weight like (fan_in, fan_out). For example, gpt-2 uses Conv1D which stores weights like (fan_in, fan_out) and hence this should be set to True.
  • bias (str) — Bias type for BOFT. Can be ‘none’, ‘all’ or ‘boft_only’. If ‘all’ or ‘boft_only’, the corresponding biases will be updated during training. Be aware that this means that, even when disabling the adapters, the model will not produce the same output as the base model would have without adaptation.
  • modules_to_save (List[str]) — List of modules apart from BOFT layers to be set as trainable and saved in the final checkpoint.
  • layers_to_transform (Union[List[int],int]) — The layer indexes to transform. If this argument is specified, the BOFT transformations will only be applied to the layer indexes in this list. If a single integer is passed, the BOFT transformations will be applied to the layer at that index.
  • layers_pattern (Optional[Union[List[str], str]]) — The layer pattern name, used only if layers_to_transform is different from None and if the layer pattern is not in the common layers pattern. This should target the nn.ModuleList of the model, which is often called 'layers' or 'h'.

This is the configuration class to store the configuration of a BOFTModel.
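
As a brief sketch of typical usage (the target module names below are illustrative and assume a BERT-style model): one usually sets either boft_block_size or boft_block_num, but not both, since together they determine how each weight matrix is partitioned into orthogonal blocks:

>>> from peft import BOFTConfig

>>> config = BOFTConfig(
...     boft_block_size=4,  # size of each orthogonal block; boft_block_num is then inferred
...     boft_n_butterfly_factor=1,  # a single butterfly factor recovers plain block-diagonal OFT
...     target_modules=["query", "key", "value"],  # assumed BERT-style attention module names
...     boft_dropout=0.1,
... )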

BOFTModel

class peft.BOFTModel


( model config adapter_name low_cpu_mem_usage: bool = False ) torch.nn.Module

Parameters

  • model ([transformers.PreTrainedModel]) — The model to be adapted.
  • config ([BOFTConfig]) — The configuration of the BOFT model.
  • adapter_name (str) — The name of the adapter, defaults to “default”.
  • low_cpu_mem_usage (bool, optional, defaults to False) — Create empty adapter weights on meta device. Useful to speed up the loading process.

Returns

torch.nn.Module

The BOFT model.

Creates a BOFT or OFT model from a pretrained transformers model. Papers: https://arxiv.org/abs/2311.06243, https://arxiv.org/abs/2306.07280

Example:

>>> import transformers
>>> from peft import BOFTConfig, get_peft_model

>>> config = BOFTConfig(
...     boft_block_size=8,
...     boft_n_butterfly_factor=1,
...     target_modules=["query", "value", "key", "output.dense", "mlp.fc1", "mlp.fc2"],
...     boft_dropout=0.1,
...     bias="boft_only",
...     modules_to_save=["classifier"],
... )

>>> model = transformers.Dinov2ForImageClassification.from_pretrained(
...     "facebook/dinov2-large",
...     num_labels=100,
... )
>>> boft_model = get_peft_model(model, config)
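
Once wrapped, the standard PEFT model utilities apply; for example, the number of trainable parameters relative to the base model can be inspected with:

>>> boft_model.print_trainable_parameters()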

Attributes:

  • model ([transformers.PreTrainedModel]) — The model to be adapted.
  • peft_config ([BOFTConfig]) — The configuration of the BOFT model.

delete_adapter


( adapter_name: str )

Parameters

  • adapter_name (str) — Name of the adapter to be deleted.

Deletes an existing adapter.
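
A one-line sketch (the adapter name is illustrative and assumes a second adapter "boft_2" was added beforehand):

>>> boft_model.delete_adapter("boft_2")  # removes the adapter "boft_2" by name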

merge_and_unload


( progressbar: bool = False safe_merge: bool = False adapter_names: typing.Optional[typing.List[str]] = None )

Parameters

  • progressbar (bool) — Whether to show a progress bar indicating the unload and merge process.
  • safe_merge (bool) — Whether to activate the safe merging check, which verifies that the adapter weights do not contain any NaNs.
  • adapter_names (List[str], optional) — The list of adapter names that should be merged. If None, all active adapters will be merged. Defaults to None.

This method merges the BOFT layers into the base model. This is needed if someone wants to use the base model as a standalone model.
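
For example, continuing from the example above (the save path is illustrative):

>>> merged_model = boft_model.merge_and_unload(safe_merge=True)
>>> merged_model.save_pretrained("dinov2-large-boft-merged")  # illustrative path; a plain transformers checkpoint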

unload


( )

Gets back the base model by removing all the BOFT modules without merging, returning the original base model.
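
For example, to discard the BOFT modules and recover the untouched base model:

>>> base_model = boft_model.unload()  # original weights, adapters discarded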
