# pythia410m-sft-tldr/code/configs/dpo2_costa_1b_20k_fp16.yml

## dpo 2
pseudo_dataset_name: mnoukhov/summarize_from_feedback_tldr3_generated_relabel_20k_dpo_costa_1b_fp16.yml_3d94f50_b9ff2
train_split: train[:1]
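# Note: `train[:1]` is Hugging Face `datasets` split-slicing syntax and selects
# only the first example of the train split. Presumably the training pairs come
# from pseudo_dataset_name instead. This is an assumption: this file does not
# show how the training script combines the two datasets.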
max_prompt_length: 512
max_target_length: 131
max_length: 640
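# Length budget: max_prompt_length + max_target_length = 512 + 131 = 643 tokens,
# which is 3 more than max_length (640). The training script (not shown here)
# decides how a prompt/response pair longer than max_length gets truncated.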
lr_scheduler_type: cosine
## costa stuff
model_name: vwxyzjn/EleutherAI_pythia-1b-deduped__sft__tldr
model_revision: sft__55513__1706646024
dataset_name: vwxyzjn/summarize_from_feedback_oai_preprocessing_1706381144
tokenizer_name: EleutherAI/pythia-1b-deduped
prompt_field: query
eval_split: validation
## hub stuff
push_to_hub: True
push_to_hub_organization: mnoukhov
## training stuff
gold_eval: ppl
eval_steps: 0.2
save_steps: 0.2
beta: 0.05
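# Assuming `beta` is the standard DPO temperature, it scales the margin between
# policy/reference log-ratios inside the loss:
#   L = -log sigmoid( beta * [ (log pi(y_w|x) - log pi_ref(y_w|x))
#                             - (log pi(y_l|x) - log pi_ref(y_l|x)) ] )
# Here y_w / y_l are the preferred / rejected summaries. pi_ref is presumably the
# SFT checkpoint given by model_name. A small beta such as 0.05 weakens the
# implicit pull that keeps the policy close to pi_ref.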
max_steps: -1
num_train_epochs: 2
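# Assuming these values are passed to Hugging Face `TrainingArguments`:
# - max_steps: -1 lets num_train_epochs (2) determine the length of the run.
# - eval_steps / save_steps below 1 are read as a fraction of total training
#   steps, so 0.2 means roughly every 20% of the run.
# The fractions only take effect under a "steps" strategy. save_strategy defaults
# to "steps". evaluation_strategy is not set in this file, so the training script
# may set it.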
load_in_8bit: False
bf16: False
fp16: True
learning_rate: 1.0e-5
use_peft: True
lora_r: 16
lora_alpha: 32
lora_dropout: 0.
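# Assuming these keys map onto PEFT's `LoraConfig` with its default (non-rsLoRA)
# scaling, the low-rank update is scaled by lora_alpha / lora_r = 32 / 16 = 2.0.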
gradient_accumulation_steps: 4
per_device_train_batch_size: 4
per_device_eval_batch_size: 4
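# Effective train batch size per optimizer step:
#   per_device_train_batch_size * gradient_accumulation_steps * num_devices
#   = 4 * 4 * num_devices = 16 * num_devices
# The device count comes from the launcher, not from this file.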