BOFT
Orthogonal Butterfly (BOFT) is a generic method designed for finetuning foundation models. It improves the paramter efficiency of the finetuning paradigm — Orthogonal Finetuning (OFT), by taking inspiration from Cooley-Tukey fast Fourier transform, showing favorable results across finetuning different foundation models, including large vision transformers, large language models and text-to-image diffusion models.
The abstract from the paper is:
Large foundation models are becoming ubiquitous, but training them from scratch is prohibitively expensive. Thus, efficiently adapting these powerful models to downstream tasks is increasingly important. In this paper, we study a principled finetuning paradigm — Orthogonal Finetuning (OFT) — for downstream task adaptation. Despite demonstrating good generalizability, OFT still uses a fairly large number of trainable parameters due to the high dimensionality of orthogonal matrices. To address this, we start by examining OFT from an information transmission perspective, and then identify a few key desiderata that enable better parameter-efficiency. Inspired by how the Cooley-Tukey fast Fourier transform algorithm enables efficient information transmission, we propose an efficient orthogonal parameterization using butterfly structures. We apply this parameterization to OFT, creating a novel parameter-efficient finetuning method, called Orthogonal Butterfly (BOFT). By subsuming OFT as a special case, BOFT introduces a generalized orthogonal finetuning framework. Finally, we conduct an extensive empirical study of adapting large vision transformers, large language models, and text-to-image diffusion models to various downstream tasks in vision and language.
BOFTConfig
class peft.BOFTConfig
< source >( task_type: typing.Union[str, peft.utils.peft_types.TaskType, NoneType] = None peft_type: typing.Union[str, peft.utils.peft_types.PeftType, NoneType] = None auto_mapping: typing.Optional[dict] = None base_model_name_or_path: typing.Optional[str] = None revision: typing.Optional[str] = None inference_mode: bool = False boft_block_size: int = 4 boft_block_num: int = 0 boft_n_butterfly_factor: int = 1 target_modules: Optional[Union[list[str], str]] = None exclude_modules: Optional[Union[list[str], str]] = None boft_dropout: float = 0.0 fan_in_fan_out: bool = False bias: str = 'none' modules_to_save: Optional[list[str]] = None init_weights: bool = True layers_to_transform: Optional[Union[list[int], int]] = None layers_pattern: Optional[Union[list[str], str]] = None )
Parameters
- boft_block_size (
int
) — BOFT block size across different layers. - boft_block_num (
int
) — Number of BOFT blocks per injected layer. - boft_n_butterfly_factor (
int
) — Number of butterfly factors across different layers. - target_modules (
Union[List[str],str]
) — The names of the modules to apply the adapter to. - exclude_modules (
Optional[Union[List[str], str]]
) — The names of the modules to not apply the adapter. When passing a string, a regex match will be performed. When passing a list of strings, either an exact match will be performed or it is checked if the name of the module ends with any of the passed strings. - boft_dropout (
float
) — The multiplicative dropout probability, by setting OFT blocks to identity during training, similar to the dropout layer in LoRA. - fan_in_fan_out (
bool
) — Set this to True if the layer to replace stores weight like (fan_in, fan_out). For example, gpt-2 usesConv1D
which stores weights like (fan_in, fan_out) and hence this should be set toTrue
. - bias (
str
) — Bias type for BOFT. Can be ‘none’, ‘all’ or ‘boft_only’. If ‘all’ or ‘boft_only’, the corresponding biases will be updated during training. Be aware that this means that, even when disabling the adapters, the model will not produce the same output as the base model would have without adaptation. - modules_to_save (
List[str]
) —List of modules apart from BOFT layers to be set as trainable and saved in the final checkpoint. - layers_to_transform (
Union[List[int],int]
) — The layer indexes to transform, if this argument is specified, it will apply the BOFT transformations on the layer indexes that are specified in this list. If a single integer is passed, it will apply the BOFT transformations on the layer at this index. - layers_pattern (
Optional[Union[List[str], str]]
) — The layer pattern name, used only iflayers_to_transform
is different fromNone
and if the layer pattern is not in the common layers pattern. This should target thenn.ModuleList
of the model, which is often called'layers'
or'h'
.
This is the configuration class to store the configuration of a BOFTModel.
BOFTModel
class peft.BOFTModel
< source >( model config adapter_name low_cpu_mem_usage: bool = False ) → torch.nn.Module
Parameters
- model ([transformers.PreTrainedModel]) — The model to be adapted.
- config ([BOFTConfig]) — The configuration of the BOFT model.
- adapter_name (str) — The name of the adapter, defaults to “default”.
- low_cpu_mem_usage (bool, optional, defaults to False) — Create empty adapter weights on meta device. Useful to speed up the loading process.
Returns
torch.nn.Module
The BOFT model.
Creates BOFT and OFT model from a pretrained transformers model. Paper: https://arxiv.org/abs/2311.06243 https://arxiv.org/abs/2306.07280
Example:
>>> import transformers >>> from transformers import AutoModelForSeq2SeqLM, BOFTConfig >>> from peft import
BOFTConfig, get_peft_model
>>> config = BOFTConfig( ... boft_block_size=8, ... boft_n_butterfly_factor=1, ... target_modules=["query",
"value", "key", "output.dense", "mlp.fc1", "mlp.fc2"], ... boft_dropout=0.1, ... bias="boft_only", ...
modules_to_save=["classifier"], ... )
>>> model = transformers.Dinov2ForImageClassification.from_pretrained( ... "facebook/dinov2-large", ...
num_labels=100, ... ) >>> boft_model = get_peft_model(model, config)
Attributes:
- model ([transformers.PreTrainedModel]) — The model to be adapted.
- peft_config ([BOFTConfig]): The configuration of the BOFT model.
delete_adapter
< source >( adapter_name: str )
Deletes an existing adapter.
merge_and_unload
< source >( progressbar: bool = False safe_merge: bool = False adapter_names: typing.Optional[typing.List[str]] = None )
Parameters
- progressbar (
bool
) — whether to show a progressbar indicating the unload and merge process - safe_merge (
bool
) — whether to activate the safe merging check to check if there is any potential Nan in the adapter weights - adapter_names (
List[str]
, optional) — The list of adapter names that should be merged. If None, all active adapters will be merged. Defaults toNone
.
This method merges the BOFT layers into the base model. This is needed if someone wants to use the base model as a standalone model.
Gets back the base model by removing all the boft modules without merging. This gives back the original base model.