Mahou-1.1-llama3-8B
This model is flammenai/Mahou-1.1-llama3-8B finetuned on a Japanese DPO dataset.
Chat Format
This model has been trained to use ChatML format.
<|im_start|>system
{{system}}<|im_end|>
<|im_start|>{{char}}
{{message}}<|im_end|>
<|im_start|>{{user}}
{{message}}<|im_end|>
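As a minimal sketch of the template above, the following helper renders a conversation as a ChatML string. The function name `build_chatml_prompt`, the `next_speaker` argument, and the speaker names "Mahou" and "Alice" are hypothetical and only illustrate how the `{{system}}`, `{{char}}`, `{{user}}`, and `{{message}}` placeholders might be filled in; they are not part of the model card.

```python
def build_chatml_prompt(system, turns, next_speaker=None):
    """Render a conversation in the ChatML layout shown above.

    `turns` is a list of (speaker_name, message) pairs. The speaker names
    stand in for the {{char}} and {{user}} placeholders.
    """
    parts = [f"<|im_start|>system\n{system}<|im_end|>"]
    for speaker, message in turns:
        parts.append(f"<|im_start|>{speaker}\n{message}<|im_end|>")
    prompt = "\n".join(parts)
    if next_speaker is not None:
        # Open a new turn so the model continues as this speaker.
        prompt += f"\n<|im_start|>{next_speaker}\n"
    return prompt


# Hypothetical character and user names, for illustration only.
prompt = build_chatml_prompt(
    system="You are Mahou, a friendly companion.",
    turns=[("Mahou", "Hello!"), ("Alice", "How are you today?")],
    next_speaker="Mahou",
)
print(prompt)
```

Leaving the final turn open (no closing `<|im_end|>`) is a common way to ask a ChatML-style model to reply as a given speaker; whether your front end does this for you depends on the tool.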
SillyTavern (ST) Settings
- Use ChatML for the Context Template.
- Turn on Instruct Mode for ChatML.
- Use the following stopping strings:
["<", "|", "<|", "\n"]
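Outside of a front end that handles stopping strings itself, the same effect can be approximated by truncating the generated text at the earliest stop string. The sketch below is an assumption about how such post-processing might look, not the front end's actual implementation; `truncate_at_stop` and the sample text are hypothetical.

```python
# Stopping strings copied from the settings above.
STOP_STRINGS = ["<", "|", "<|", "\n"]


def truncate_at_stop(text, stop_strings=STOP_STRINGS):
    """Cut `text` at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stop_strings:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


# Hypothetical raw model output that runs past the end of its turn.
raw = "Good morning!<|im_end|>\n<|im_start|>Alice"
print(truncate_at_stop(raw))  # prints "Good morning!"
```

Because "\n" is one of the stopping strings, this setup keeps each reply to a single line, which suits short chat-style messages.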
License
This model is based on Meta Llama-3-8B and is governed by the META LLAMA 3 COMMUNITY LICENSE AGREEMENT.
Method
Finetuned using an A100 on Google Colab.
Based on the guide "Fine-tune a Mistral-7b model with Direct Preference Optimization" by Maxime Labonne.
Configuration
LoRA, model, and training settings:
import torch
from peft import LoraConfig
from transformers import AutoModelForCausalLM, TrainingArguments
from trl import DPOTrainer

# LoRA configuration
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
)

# Model to fine-tune
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True
)
model.config.use_cache = False

# Reference model
ref_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True
)

# Training arguments
training_args = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=1000,
    save_strategy="no",
    logging_steps=1,
    output_dir=new_model,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    bf16=True,
    report_to="wandb",
)

# Create DPO trainer
dpo_trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    force_use_ref_model=True
)
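To make the numbers in this configuration more concrete, here is a small dependency-free sketch of what they imply. It assumes a single GPU (the card mentions one A100), the usual linear-warmup-then-cosine-decay-to-zero shape for the "cosine" scheduler, and the standard sigmoid form of the DPO loss. The function names and the sample log-probabilities are hypothetical and were not taken from the training run.

```python
import math

# Values copied from the configuration above.
PER_DEVICE_BATCH = 4
GRAD_ACCUM_STEPS = 4
MAX_STEPS = 1000
WARMUP_STEPS = 100
PEAK_LR = 5e-5
BETA = 0.1


def effective_batch_size(num_devices=1):
    """Preference pairs consumed per optimizer step."""
    return PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * num_devices


def learning_rate(step):
    """Linear warmup to PEAK_LR, then cosine decay to zero at MAX_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=BETA):
    """Sigmoid DPO loss for one pair, given summed log-probabilities."""
    margin = (policy_chosen - policy_rejected) - (ref_chosen - ref_rejected)
    # -log(sigmoid(beta * margin)) written as log(1 + exp(-beta * margin)).
    return math.log1p(math.exp(-beta * margin))


print(effective_batch_size())              # 16 pairs per optimizer step
print(effective_batch_size() * MAX_STEPS)  # 16000 pairs over the whole run
print(learning_rate(50), learning_rate(100), learning_rate(550))
# Hypothetical log-probabilities: the policy prefers the chosen answer
# more strongly than the reference model does, so the loss is below log(2).
print(dpo_loss(-12.0, -20.0, -14.0, -18.0))
```

A smaller `beta` makes the loss less sensitive to the margin between policy and reference, which lets the policy drift further from the reference model for the same loss value.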
Model tree for nbeerbower/KawaiiMahou-llama3-8B
- Base model: nbeerbower/llama-3-stella-8B
- Finetuned: nbeerbower/llama-3-stella-truthy-dpo-8B
- Finetuned: flammenai/Mahou-1.0-llama3-8B
- Finetuned: flammenai/Mahou-1.1-llama3-8B