AdaLoRA
AdaLoRA is a method for optimizing the number of trainable parameters to assign to weight matrices and layers, unlike LoRA, which distributes parameters evenly across all modules. More parameters are budgeted for important weight matrices and layers while less important ones receive fewer parameters.
The abstract from the paper is:
Fine-tuning large pre-trained language models on downstream tasks has become an important paradigm in NLP. However, common practice fine-tunes all of the parameters in a pre-trained model, which becomes prohibitive when a large number of downstream tasks are present. Therefore, many fine-tuning methods are proposed to learn incremental updates of pre-trained weights in a parameter efficient way, e.g., low-rank increments. These methods often evenly distribute the budget of incremental updates across all pre-trained weight matrices, and overlook the varying importance of different weight parameters. As a consequence, the fine-tuning performance is suboptimal. To bridge this gap, we propose AdaLoRA, which adaptively allocates the parameter budget among weight matrices according to their importance score. In particular, AdaLoRA parameterizes the incremental updates in the form of singular value decomposition. Such a novel approach allows us to effectively prune the singular values of unimportant updates, which is essentially to reduce their parameter budget but circumvent intensive exact SVD computations. We conduct extensive experiments with several pre-trained models on natural language processing, question answering, and natural language generation to validate the effectiveness of AdaLoRA. Results demonstrate that AdaLoRA manifests notable improvement over baselines, especially in the low budget settings. Our code is publicly available at https://github.com/QingruZhang/AdaLoRA.
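The SVD-form parameterization described in the abstract can be illustrated with a small sketch. This is not the PEFT implementation: the helper names (`matmul`, `top_k_mask`, `svd_form_update`) are hypothetical, and the magnitude of each singular value is used as a stand-in importance score, whereas the paper derives its score from sensitivity statistics. The point is only to show that pruning a rank amounts to zeroing one entry of the diagonal, with no exact SVD computed.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def top_k_mask(scores, k):
    """Return a boolean mask keeping the k highest-scoring entries."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:k])
    return [i in keep for i in range(len(scores))]


def svd_form_update(P, lam, Q, keep_mask):
    """Compute delta_W = P @ diag(lam * mask) @ Q.

    P has shape (d_out, r), lam has length r, Q has shape (r, d_in).
    Masked-out singular values contribute nothing to the update.
    """
    masked = [value if keep else 0.0 for value, keep in zip(lam, keep_mask)]
    scaled_Q = [[m * q for q in row] for m, row in zip(masked, Q)]
    return matmul(P, scaled_Q)


# Toy example: rank-2 update where only the most important rank is kept.
P = [[1, 0], [0, 1]]
Q = [[1, 0], [0, 1]]
lam = [2.0, 0.5]
mask = top_k_mask([abs(v) for v in lam], k=1)  # assumption: |lambda| as importance
delta_W = svd_form_update(P, lam, Q, mask)
```

Because the mask only touches the diagonal, a pruned rank can in principle be restored later, which is what makes the budget adjustable during training.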
AdaLoraConfig
class peft.AdaLoraConfig
( task_type: typing.Union[str, peft.utils.peft_types.TaskType, NoneType] = None peft_type: typing.Union[str, peft.utils.peft_types.PeftType, NoneType] = None auto_mapping: typing.Optional[dict] = None base_model_name_or_path: typing.Optional[str] = None revision: typing.Optional[str] = None inference_mode: bool = False r: int = 8 target_modules: Optional[Union[list[str], str]] = None exclude_modules: Optional[Union[list[str], str]] = None lora_alpha: int = 8 lora_dropout: float = 0.0 fan_in_fan_out: bool = False bias: Literal['none', 'all', 'lora_only'] = 'none' use_rslora: bool = False modules_to_save: Optional[list[str]] = None init_lora_weights: bool | Literal['gaussian', 'eva', 'olora', 'pissa', 'pissa_niter_[number of iters]', 'loftq'] = True layers_to_transform: Optional[Union[list[int], int]] = None layers_pattern: Optional[Union[list[str], str]] = None rank_pattern: typing.Optional[dict] = None alpha_pattern: Optional[dict] = <factory> megatron_config: Optional[dict] = None megatron_core: Optional[str] = 'megatron.core' loftq_config: Union[LoftQConfig, dict] = <factory> eva_config: Optional[EvaConfig] = None use_dora: bool = False layer_replication: Optional[list[tuple[int, int]]] = None runtime_config: LoraRuntimeConfig = <factory> target_r: int = 8 init_r: int = 12 tinit: int = 0 tfinal: int = 0 deltaT: int = 1 beta1: float = 0.85 beta2: float = 0.85 orth_reg_weight: float = 0.5 total_step: typing.Optional[int] = None )
Parameters
- target_r (int) — The target average rank of the incremental matrices.
- init_r (int) — The initial rank of each incremental matrix.
- tinit (int) — The number of initial fine-tuning warmup steps.
- tfinal (int) — The number of final fine-tuning steps.
- deltaT (int) — The time interval, in steps, between two budget allocations.
- beta1 (float) — The EMA hyperparameter for sensitivity smoothing.
- beta2 (float) — The EMA hyperparameter for uncertainty quantification.
- orth_reg_weight (float) — The coefficient of the orthogonal regularization.
- total_step (int) — The total number of training steps, which should be specified before training.
- rank_pattern (list) — The rank allocated to each weight matrix by RankAllocator.
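To make the roles of beta1 and beta2 concrete, here is a minimal sketch of the exponential moving averages described in the AdaLoRA paper, written for a single scalar parameter. The function name `update_importance` and the `state` dictionary are illustrative assumptions, not part of the PEFT API; the library applies the same idea to whole tensors.

```python
def update_importance(weight, grad, state, beta1=0.85, beta2=0.85):
    """Update the smoothed sensitivity and uncertainty, then return the score.

    `state` holds the running averages under the keys "sensitivity" and
    "uncertainty" (hypothetical names chosen for this sketch).
    """
    # Raw sensitivity of the loss to this parameter: |w * dL/dw|.
    sensitivity = abs(weight * grad)
    # beta1 controls the EMA that smooths the sensitivity.
    smoothed = beta1 * state["sensitivity"] + (1 - beta1) * sensitivity
    # beta2 controls the EMA of how far the raw value deviates from it.
    uncertainty = beta2 * state["uncertainty"] + (1 - beta2) * abs(sensitivity - smoothed)
    state["sensitivity"] = smoothed
    state["uncertainty"] = uncertainty
    # The importance score is the product of the two running statistics.
    return smoothed * uncertainty


state = {"sensitivity": 0.0, "uncertainty": 0.0}
score = update_importance(weight=2.0, grad=0.5, state=state)
```

Values of beta1 and beta2 closer to 1 make the score react more slowly to a single noisy mini-batch.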
This is the configuration class to store the configuration of an AdaLoraModel.
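How tinit, tfinal, deltaT and total_step interact is easiest to see as a schedule. The sketch below follows the cubic budget schedule described in the AdaLoRA paper; the function `budget_schedule` and its argument names are assumptions made for illustration, and the budgets are total ranks summed over all adapted matrices rather than values read from a real config.

```python
def budget_schedule(step, init_budget, target_budget, tinit, tfinal, total_step, deltaT):
    """Return (budget, should_mask) for a given training step.

    - Up to `tinit`: warmup, keep the full initial budget and do not prune.
    - After `total_step - tfinal`: final fine-tuning at the target budget.
    - In between: decay the budget cubically and prune every `deltaT` steps.
    """
    if step <= tinit:
        return init_budget, False
    if step > total_step - tfinal:
        return target_budget, True
    progress = (step - tinit) / (total_step - tfinal - tinit)
    budget = int((init_budget - target_budget) * (1 - progress) ** 3 + target_budget)
    return budget, step % deltaT == 0


# Hypothetical run: 10 matrices, init_r=12 and target_r=8 per matrix.
settings = dict(init_budget=120, target_budget=80, tinit=100, tfinal=200,
                total_step=1000, deltaT=10)
warmup = budget_schedule(50, **settings)
midway = budget_schedule(450, **settings)
final = budget_schedule(900, **settings)
```

With these numbers the schedule requires `total_step` to be larger than `tinit + tfinal`, otherwise there is no room for the decay phase.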
AdaLoraModel
class peft.AdaLoraModel
( model config adapter_name ) → torch.nn.Module
Parameters
- model ([transformers.PreTrainedModel]) — The model to be adapted.
- config ([AdaLoraConfig]) — The configuration of the AdaLora model.
- adapter_name (str) — The name of the adapter, defaults to “default”.
- low_cpu_mem_usage (bool, optional, defaults to False) — Create empty adapter weights on meta device. Useful to speed up the loading process.
Returns
torch.nn.Module
The AdaLora model.
Creates an AdaLoRA (Adaptive LoRA) model from a pretrained transformers model. Paper: https://openreview.net/forum?id=lq62uWRJjiY
Example:
>>> from transformers import AutoModelForSeq2SeqLM
>>> from peft import LoraConfig, AdaLoraModel, AdaLoraConfig
>>> config = AdaLoraConfig(
...     peft_type="ADALORA", task_type="SEQ_2_SEQ_LM", init_r=12, lora_alpha=32, target_modules=["q", "v"],
...     lora_dropout=0.01,
... )
>>> model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
>>> model = AdaLoraModel(model, config, "default")
Attributes:
- model ([transformers.PreTrainedModel]) — The model to be adapted.
- peft_config ([AdaLoraConfig]) — The configuration of the AdaLora model.
This method is not supported for AdaLoRA, use LoRA instead.
update_and_allocate
( global_step )
This method updates the AdaLoRA budget and mask.
This should be called in every training step after loss.backward() and before zero_grad().
tinit, tfinal and deltaT are handled within the method.
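The call order can be sketched as a single training step. This is a minimal illustration, not library code: the function name `adalora_training_step` is hypothetical, and it assumes the model was wrapped so that the AdaLoraModel is reachable as `peft_model.base_model` (as is the case after `get_peft_model`). Placing `optimizer.step()` before the budget update is one ordering that satisfies the constraint above.

```python
def adalora_training_step(peft_model, optimizer, batch, global_step):
    """Run one optimization step and update the AdaLoRA rank budget.

    Assumes `peft_model(**batch)` returns an object with a `.loss` attribute
    and that `peft_model.base_model` exposes `update_and_allocate`.
    """
    loss = peft_model(**batch).loss
    loss.backward()  # gradients are needed for the importance scores
    optimizer.step()
    # Must run after backward() and before zero_grad(), while grads still exist.
    peft_model.base_model.update_and_allocate(global_step)
    optimizer.zero_grad()
    return loss
```

In a real loop, `global_step` would be a counter that increases by one per optimizer step and stays below the `total_step` set in the config.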