
Training Hyperparameters

The following hyperparameters were used during training; a code sketch reconstructing them follows the list.

  • evaluation_strategy: epoch
  • prediction_loss_only: False
  • per_device_train_batch_size: 2
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 2
  • eval_accumulation_steps: 1
  • eval_delay: 0
  • learning_rate: 0.0004
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 0.3
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_strategy: steps
  • logging_first_step: False
  • logging_steps: 500
  • logging_nan_inf_filter: True
  • save_strategy: epoch
  • save_steps: 500
  • save_total_limit: 5
  • save_safetensors: True
  • save_on_each_node: False
  • no_cuda: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • eval_steps: None
  • dataloader_num_workers: 0
  • past_index: -1
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • metric_for_best_model: eval_loss
  • greater_is_better: False
  • ignore_data_skip: False
  • sharded_ddp: []
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'fsdp_min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: None
  • dataloader_pin_memory: True
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_strategy: all_checkpoints
  • gradient_checkpointing: True
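
As a rough sketch, the settings above map onto a transformers.TrainingArguments roughly as follows. The output directory is a placeholder, and any option not shown is left at its default:

```python
from transformers import TrainingArguments

# Sketch only: reconstructed from the hyperparameter list above.
training_args = TrainingArguments(
    output_dir="output",                 # placeholder, not from the card
    evaluation_strategy="epoch",
    per_device_train_batch_size=2,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,       # effective per-device batch size of 4
    eval_accumulation_steps=1,
    learning_rate=4e-4,
    weight_decay=0.01,
    max_grad_norm=0.3,
    num_train_epochs=10,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    logging_steps=500,
    save_strategy="epoch",
    save_total_limit=5,
    seed=42,
    fp16=True,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    optim="adamw_torch",
    gradient_checkpointing=True,
    push_to_hub=True,
    hub_strategy="all_checkpoints",
)
```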

Training procedure

The following bitsandbytes quantization config was used during training:

  • load_in_8bit: False
  • load_in_4bit: True
  • llm_int8_threshold: 6.0
  • llm_int8_skip_modules: None
  • llm_int8_enable_fp32_cpu_offload: False
  • llm_int8_has_fp16_weight: False
  • bnb_4bit_quant_type: nf4
  • bnb_4bit_use_double_quant: True
  • bnb_4bit_compute_dtype: bfloat16
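
Expressed as code, this corresponds to a transformers BitsAndBytesConfig along these lines (a minimal sketch; options at their defaults are omitted):

```python
import torch
from transformers import BitsAndBytesConfig

# Sketch only: the 4-bit NF4 quantization config listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,       # double quantization of the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```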

Framework versions

  • PEFT 0.4.0
Inference Examples
The serverless Inference API does not yet support peft models for this pipeline type; the adapter can instead be loaded locally, as sketched below.
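
A minimal loading sketch, assuming the Llama-2-13B base model implied by the adapter name (the base model id is an assumption, not stated on the card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel  # trained with PEFT 0.4.0 per the card

base_id = "meta-llama/Llama-2-13b-hf"  # assumption: base implied by the adapter name
adapter_id = "Weni/ZeroShot-2.2.1-Llama2-13b-Multilanguage-3.0.3"

# Reuse the 4-bit quantization config the adapter was trained with.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)
```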
