Utilities for Megatron-LM
class accelerate.utils.MegatronLMPlugin
< source >( tp_degree: int = None pp_degree: int = None num_micro_batches: int = None gradient_clipping: float = None sequence_parallelism: bool = None recompute_activations: bool = None use_distributed_optimizer: bool = None pipeline_model_parallel_split_rank: int = None num_layers_per_virtual_pipeline_stage: int = None is_train_batch_min: str = True train_iters: int = None train_samples: int = None weight_decay_incr_style: str = 'constant' start_weight_decay: float = None end_weight_decay: float = None lr_decay_style: str = 'linear' lr_decay_iters: int = None lr_decay_samples: int = None lr_warmup_iters: int = None lr_warmup_samples: int = None lr_warmup_fraction: float = None min_lr: float = 0 consumed_samples: List = None no_wd_decay_cond: Optional = None scale_lr_cond: Optional = None lr_mult: float = 1.0 megatron_dataset_flag: bool = False seq_length: int = None encoder_seq_length: int = None decoder_seq_length: int = None tensorboard_dir: str = None set_all_logging_options: bool = False eval_iters: int = 100 eval_interval: int = 1000 return_logits: bool = False custom_train_step_class: Optional = None custom_train_step_kwargs: Optional = None custom_model_provider_function: Optional = None custom_prepare_model_function: Optional = None custom_megatron_datasets_provider_function: Optional = None custom_get_batch_function: Optional = None custom_loss_function: Optional = None other_megatron_args: Optional = None )
Plugin for Megatron-LM to enable tensor, pipeline, sequence and data parallelism. Also to enable selective activation recomputation and optimized fused kernels.
class accelerate.utils.MegatronLMDummyScheduler
< source >( optimizer total_num_steps = None warmup_num_steps = 0 **kwargs )
Dummy scheduler presents model parameters or param groups, this is primarily used to follow conventional training loop when scheduler config is specified in the deepspeed config file.
Dummy dataloader presents model parameters or param groups, this is primarily used to follow conventional training
Abstract class for batching, forward pass and loss handler.
class accelerate.utils.GPTTrainStep
< source >( accelerator args )
GPT train step class.
class accelerate.utils.BertTrainStep
< source >( accelerator args )
Bert train step class.
class accelerate.utils.T5TrainStep
< source >( accelerator args )
T5 train step class.
accelerate.utils.avg_losses_across_data_parallel_group
< source >( losses )
Average losses across data parallel group.