Utilities for Megatron-LM

class accelerate.utils.MegatronLMPlugin

( tp_degree: int = None pp_degree: int = None num_micro_batches: int = None gradient_clipping: float = None sequence_parallelism: bool = None recompute_activations: bool = None use_distributed_optimizer: bool = None pipeline_model_parallel_split_rank: int = None num_layers_per_virtual_pipeline_stage: int = None is_train_batch_min: str = True train_iters: int = None train_samples: int = None weight_decay_incr_style: str = 'constant' start_weight_decay: float = None end_weight_decay: float = None lr_decay_style: str = 'linear' lr_decay_iters: int = None lr_decay_samples: int = None lr_warmup_iters: int = None lr_warmup_samples: int = None lr_warmup_fraction: float = None min_lr: float = 0 consumed_samples: List = None no_wd_decay_cond: Optional = None scale_lr_cond: Optional = None lr_mult: float = 1.0 megatron_dataset_flag: bool = False seq_length: int = None encoder_seq_length: int = None decoder_seq_length: int = None tensorboard_dir: str = None set_all_logging_options: bool = False eval_iters: int = 100 eval_interval: int = 1000 return_logits: bool = False custom_train_step_class: Optional = None custom_train_step_kwargs: Optional = None custom_model_provider_function: Optional = None custom_prepare_model_function: Optional = None custom_megatron_datasets_provider_function: Optional = None custom_get_batch_function: Optional = None custom_loss_function: Optional = None other_megatron_args: Optional = None )

Plugin for Megatron-LM to enable tensor, pipeline, sequence and data parallelism. Also to enable selective activation recomputation and optimized fused kernels.

class accelerate.utils.MegatronLMDummyScheduler

< source >

( optimizer total_num_steps = None warmup_num_steps = 0 **kwargs )

Parameters

optimizer (torch.optim.optimizer.Optimizer) — The optimizer to wrap.
total_num_steps (int) — Total number of steps.
warmup_num_steps (int) — Number of steps for warmup.
**kwargs (additional keyword arguments, optional) — Other arguments.

Dummy scheduler presents model parameters or param groups, this is primarily used to follow conventional training loop when scheduler config is specified in the deepspeed config file.