library_name: transformers
datasets:
- nooynoos/M.O.M_Dataset_GemmaSprint
language:
- ko
base_model:
- unsloth/gemma-2-2b
Model Card for gemma2-2b-M.O.M-gemma-sprint
This model is fine-tuned from the base unsloth/gemma-2-2b model using the M.O.M dataset.
What is the M.O.M Project?
The Motivational Organizer & Mentor (M.O.M.) project is designed to replicate the familiar and persistent encouragement that a caring parent might provide. By leveraging large language models (LLMs), M.O.M. delivers timely reminders, motivational "nags," and personalized feedback to keep users focused and productive. This service helps users manage their tasks by offering gentle yet persistent nudges, task prioritization, and empathetic guidance, ultimately reducing procrastination and boosting accountability.
Model Details
Model Description
The M.O.M model uses the "nagging mom" concept to motivate users with warm but persistent reminders about the tasks they need to accomplish.
This model receives keywords representing the user's daily tasks and turns them into motivational messages delivered in the tone of a loving yet slightly exasperated mother. The model skillfully weaves four provided keywords into a cohesive story, ensuring the tone is warm while also urging the user to take action.
Key Features:
- Input: Keywords representing the tasks the user needs to do.
- Output: Motivational "mom nagging" messages.
- Tone: Warm but persistently urging action.
- Purpose: To motivate users to manage their time effectively and take responsibility for their tasks.
This model helps users stop procrastinating by giving them structured yet loving reminders, encouraging them to be more productive in their daily lives.
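As a rough illustration of the input side, the sketch below joins a list of task keywords into the comma-separated string format used by the example questions in this card. The helper name `build_instruction` is hypothetical and not part of the project.

```python
def build_instruction(keywords):
    """Join task keywords into one comma-separated instruction string."""
    # Drop empty entries and surrounding whitespace before joining
    cleaned = [kw.strip() for kw in keywords if kw and kw.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty keyword is required")
    return ", ".join(cleaned)


instruction = build_instruction(["exercise", " coding ", "assignment"])
print(instruction)
```

The resulting string is what would be placed in the instruction slot of the prompt before generation.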
Training Procedure
Make Q/A Pairs
First, to fine-tune the Gemma 2 2B model as M.O.M, a Q/A pair dataset is required. Such a dataset can either be created manually or generated by prompting a capable model with clear instructions. In my case, I used prompt engineering to create 600 Q/A pairs based on examples I crafted myself. The resulting dataset can be found at: nooynoos/M.O.M_Dataset_GemmaSprint.
Below is the code for generating the Q/A Pairs.
import json
from langchain_openai import ChatOpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain_core.prompts import PromptTemplate
# Set OpenAI API key
openai_api_key = "" # Enter your API key here
# Define the prompt template (rendered here in English; the original prompt is written in Korean)
prompt = PromptTemplate.from_template(
    """You are the mom who loves her child more than anyone on Earth, but who nags a lot.
The keywords are things a young person in their 20s needs to do in daily life.
Based on those keywords, write a reply in which the mom motivates them with loving but slightly exasperated nagging.
The mom's nagging must naturally weave the 4 keywords into one connected story.
The nagging must be written in a warm tone that persistently urges action. Write the reply in the following format:
Following the Format, generate 5 new QA PAIRs with new keywords.
#Format:
```json
{{
    "QUESTION": "art history assignment, coding study",
    "ANSWER": "When are you going to do both your art history assignment and your coding study? There's no sign of the assignment getting finished, and you haven't even started coding! Instead of fiddling with your phone all day, spend that time doing at least a little coding. And get your assignment done ahead of time so it isn't so hard later! I'm worried sick that you'll cram it all in and end up pulling an all-nighter."
}},
{{
    "QUESTION": "tidying the room, writing a cover letter",
    "ANSWER": "With your room this messy, your thoughts won't get organized either! Hurry up and clean your room first, and then write that cover letter! The deadline is almost here, and honestly, the state of your room looks just like the state of your cover letter. Clean up right now and write the cover letter a little at a time, and you'll feel much lighter."
}},
{{
    "QUESTION": "Coursera lectures, LLM Fine Tuning",
    "ANSWER": "You need to hurry up and watch your Coursera lectures. The deadline isn't far off! You have to wrap up the Coursera lectures quickly to be able to finish the LLM Fine Tuning too, don't you? Focus a little more and get it done!"
}}
```
"""
)
# Custom JSON parser function
def custom_json_parser(response):
    # Strip the surrounding ```json fence, then wrap the objects in a list
    json_string = response.content.strip().removeprefix("```json\n").removesuffix("\n```").strip()
    json_string = f'[{json_string}]'
    return json.loads(json_string)
# Configure the chain
chain = (
    prompt
    | ChatOpenAI(
        model="gpt-4o",
        temperature=0,
        streaming=True,
        callbacks=[StreamingStdOutCallbackHandler()],
        openai_api_key=openai_api_key,  # Use the API key set directly
    )
    | custom_json_parser
)
# List to store QA pairs
qa_pairs = []
# Repeat 60 times, 5 QA pairs per call, to generate a total of 300 QA pairs
for i in range(60):
    # The prompt template has no input variables, so an empty dict is passed
    response = chain.invoke({})
    # Add the results to qa_pairs
    qa_pairs.extend(response)
# Finally, 300 QA pairs are stored in the qa_pairs list.
print(f"A total of {len(qa_pairs)} QA pairs have been generated.")
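To see what the custom parser does without calling the API, here is a minimal sketch that runs the same fence-stripping idea on a hand-written reply. `FakeResponse`, `parse_fenced_json_objects`, and the sample text are made up for illustration; `str.removeprefix` and `str.removesuffix` require Python 3.9 or later.

```python
import json
from dataclasses import dataclass

FENCE = "`" * 3  # three backticks, built this way to keep the example readable


@dataclass
class FakeResponse:
    """Stand-in for a chat model message; only `.content` is used."""
    content: str


def parse_fenced_json_objects(response):
    """Strip a json code fence and parse comma-separated objects as a list."""
    text = response.content.strip()
    text = text.removeprefix(FENCE + "json").removesuffix(FENCE).strip()
    # The reply holds several objects separated by commas, so wrap them in brackets
    return json.loads(f"[{text}]")


body = '{"QUESTION": "a, b", "ANSWER": "x"},\n{"QUESTION": "c, d", "ANSWER": "y"}'
sample = FakeResponse(content=FENCE + "json\n" + body + "\n" + FENCE)
pairs = parse_fenced_json_objects(sample)
print(len(pairs), pairs[0]["QUESTION"])
```

If the model ever replies without the fence or with extra prose around it, `json.loads` raises an error, so wrapping the call in a retry may be worthwhile in practice.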
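Then save this dataset as a JSONL file and load it back as a Dataset.
import json
from datasets import load_dataset
# Path to the JSONL file
jsonl_file = "qa_pair.jsonl"
# Save the QA pairs to a JSONL file, one JSON object per line
with open(jsonl_file, "w", encoding="utf-8") as f:
    for pair in qa_pairs:
        f.write(json.dumps(pair, ensure_ascii=False) + "\n")
# Load the JSONL file as a Dataset
dataset = load_dataset("json", data_files=jsonl_file)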
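The formatting code later in this card reads `instruction` and `output` columns, while the generated pairs use `QUESTION` and `ANSWER` keys. One way to bridge the two, shown as a sketch only, is to rename the keys while writing the JSONL file. The key mapping and the helper name `write_jsonl` are assumptions, not the author's confirmed procedure.

```python
import json
import os
import tempfile


def write_jsonl(qa_pairs, path):
    """Write QA pairs as JSONL, renaming keys to instruction/output."""
    with open(path, "w", encoding="utf-8") as f:
        for pair in qa_pairs:
            record = {"instruction": pair["QUESTION"], "output": pair["ANSWER"]}
            # ensure_ascii=False keeps non-ASCII text (e.g. Korean) readable in the file
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(qa_pairs)


# Invented sample data, used only to exercise the helper
sample_pairs = [
    {"QUESTION": "exercise, coding", "ANSWER": "Go work out, then open your editor!"},
    {"QUESTION": "laundry, reading", "ANSWER": "Laundry first, book second."},
]
path = os.path.join(tempfile.mkdtemp(), "qa_pair.jsonl")
written = write_jsonl(sample_pairs, path)

with open(path, encoding="utf-8") as f:
    records = [json.loads(line) for line in f]
print(written, records[0])
```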
Loading/Preparing Training Data
The dataset uploaded to Hugging Face is loaded, and a formatting function combines each instruction and response into a single Alpaca-style training prompt.
from datasets import load_dataset
# EOS_TOKEN marks the end of a sequence and must be appended to every example.
# `tokenizer` is the one returned by FastLanguageModel.from_pretrained in the Unsloth section.
EOS_TOKEN = tokenizer.eos_token
# Alpaca-style prompt template used to format each example.
alpaca_prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{}
### Response:
{}"""
# Function to format the given examples.
def formatting_prompts_func(examples):
    instructions = examples["instruction"]  # Get the instructions.
    outputs = examples["output"]  # Get the outputs.
    texts = []  # List to store the formatted texts.
    for instruction, output in zip(instructions, outputs):
        # The EOS_TOKEN must be added; otherwise, generation may continue indefinitely.
        text = alpaca_prompt.format(instruction, output) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,  # Return the formatted texts.
    }
# Load the dataset from the specified source.
dataset = load_dataset("nooynoos/M.O.M_Dataset_GemmaSprint", split="train")
# Apply the formatting_prompts_func to the dataset with batch processing enabled.
dataset = dataset.map(
    formatting_prompts_func,
    batched=True,
)
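To check the formatting logic without downloading the dataset or loading a tokenizer, the sketch below applies the same idea to a small in-memory batch. The `<eos>` string is a placeholder assumption standing in for `tokenizer.eos_token`, the function name `format_batch` is hypothetical, and the sample rows are invented.

```python
ALPACA_PROMPT = """Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{}
### Response:
{}"""


def format_batch(examples, eos_token):
    """Format a batch (dict of column lists) into Alpaca-style training texts."""
    texts = []
    for instruction, output in zip(examples["instruction"], examples["output"]):
        # Append the end-of-sequence marker so the model learns where to stop
        texts.append(ALPACA_PROMPT.format(instruction, output) + eos_token)
    return {"text": texts}


batch = {
    "instruction": ["exercise, coding", "laundry, reading"],
    "output": ["Go work out, then open your editor!", "Laundry first, book second."],
}
formatted = format_batch(batch, eos_token="<eos>")
print(formatted["text"][0])
```

With `batched=True`, `Dataset.map` passes each column as a list in exactly this shape, which is why the function iterates with `zip`.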
Training the Model
Unsloth
Fine-tuning is done with Unsloth, which supports 16-bit LoRA and 4-bit QLoRA and speeds up fine-tuning.
First, use the FastLanguageModel.from_pretrained function to load the pre-trained Gemma 2 2B model.
from unsloth import FastLanguageModel
import torch
max_seq_length = 1024 # Set the maximum sequence length
dtype = None
# Use 4-bit quantization to reduce memory usage
load_in_4bit = True
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-2-2b",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # Use if working with gated models like meta-llama/Llama-2-7b-hf
)
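As a back-of-the-envelope illustration of why 4-bit loading reduces memory, the sketch below estimates weight storage at different bit widths. The figure of roughly 2.6 billion parameters for Gemma 2 2B is an approximation assumed here, and the estimate ignores quantization metadata, activations, gradients, and optimizer state.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


approx_params = 2.6e9  # assumed approximate parameter count, not an exact value
for bits in (16, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gib(approx_params, bits):.1f} GiB")
```

Whatever the exact parameter count, the ratio between the two settings is 4x for the weights themselves.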
Additionally, use the LoRA adapter to update only 1 to 10% of all parameters.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none", # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False, # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)
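For intuition about how many parameters LoRA trains: an adapter of rank r on a linear layer with input size d_in and output size d_out adds r * (d_in + d_out) weights. The sketch below sums this over the seven target modules. The layer shapes and layer count are my assumption of the Gemma 2 2B configuration and should be checked against the model's `config.json`; the function names are hypothetical.

```python
def lora_param_count(rank, d_in, d_out):
    """Parameters added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Assumed (d_in, d_out) shapes per decoder layer; verify against config.json
ASSUMED_SHAPES = {
    "q_proj": (2304, 2048),
    "k_proj": (2304, 1024),
    "v_proj": (2304, 1024),
    "o_proj": (2048, 2304),
    "gate_proj": (2304, 9216),
    "up_proj": (2304, 9216),
    "down_proj": (9216, 2304),
}
ASSUMED_NUM_LAYERS = 26  # assumption, see note above


def total_lora_params(rank, shapes, num_layers):
    """Sum adapter parameters over all target modules and all layers."""
    per_layer = sum(lora_param_count(rank, d_in, d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


total = total_lora_params(16, ASSUMED_SHAPES, ASSUMED_NUM_LAYERS)
print(f"{total:,} trainable LoRA parameters under the assumed shapes")
```

Doubling `r` doubles the adapter size, which is the main lever when trading adapter capacity against memory.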
Training the Model
Train the model. If you want to reduce VRAM usage, you can adjust the batch size.
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        # num_train_epochs = 1, # Set this for 1 full training run.
        max_steps = 100,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)
trainer_stats = trainer.train()
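The training arguments imply a few derived quantities. The sketch below computes the effective batch size and the number of examples processed, and evaluates a linear warmup-then-decay learning rate of the kind that `lr_scheduler_type = "linear"` with `warmup_steps` describes. It assumes a single GPU, takes the 600-example dataset size from the Q/A pair count mentioned earlier, and uses hypothetical helper names.

```python
def effective_batch_size(per_device, grad_accum, num_devices=1):
    """Examples contributing to each optimizer step."""
    return per_device * grad_accum * num_devices


def linear_lr(step, base_lr, warmup_steps, total_steps):
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * (step / max(1, warmup_steps))
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))


batch = effective_batch_size(per_device=2, grad_accum=4)  # single GPU assumed
examples_seen = batch * 100  # max_steps = 100
epochs = examples_seen / 600  # dataset size taken from the card's 600 Q/A pairs
print(batch, examples_seen, round(epochs, 2))

for step in (0, 5, 50, 100):
    print(step, linear_lr(step, base_lr=2e-4, warmup_steps=5, total_steps=100))
```

Lowering `per_device_train_batch_size` while raising `gradient_accumulation_steps` by the same factor keeps the effective batch size unchanged and reduces VRAM use.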
Testing the Model
Let's check if it has become the 'nagging LLM' we wanted.
from transformers import StoppingCriteria, StoppingCriteriaList
class StopOnToken(StoppingCriteria):
    def __init__(self, stop_token_id):
        self.stop_token_id = stop_token_id  # Initialize the stop token ID.

    def __call__(self, input_ids, scores, **kwargs):
        return (
            self.stop_token_id in input_ids[0]
        )  # Stop if the stop token ID is present in the input IDs.
from transformers import TextStreamer
# Enable Unsloth's faster inference mode (roughly 2x faster).
FastLanguageModel.for_inference(model)
# Stop generation as soon as the EOS token is produced.
stopping_criteria = StoppingCriteriaList([StopOnToken(tokenizer.eos_token_id)])
inputs = tokenizer(
    [
        alpaca_prompt.format(
            "운동, 코딩, 과제",  # Instruction: "exercise, coding, assignment"
            "",  # Output - leave this blank for generation!
        )
    ],
    return_tensors="pt",
).to("cuda")
text_streamer = TextStreamer(tokenizer)
_ = model.generate(
    **inputs,
    streamer=text_streamer,
    max_new_tokens=4096,  # Set the maximum number of tokens to generate.
    stopping_criteria=stopping_criteria,  # Set the criteria to stop generation.
)
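Because the generated text echoes the whole prompt, it can help to keep only the part after the `### Response:` marker. A minimal sketch, assuming the Alpaca-style template shown earlier and `<eos>` as a placeholder end-of-sequence string; `extract_response` is a hypothetical helper, not part of Unsloth or transformers.

```python
def extract_response(generated_text, marker="### Response:", eos_token="<eos>"):
    """Return the text after the response marker, without the EOS string."""
    _, found, tail = generated_text.partition(marker)
    if not found:
        # Marker missing: fall back to the full text
        tail = generated_text
    if eos_token:
        # Keep only what precedes the first end-of-sequence marker
        tail = tail.split(eos_token, 1)[0]
    return tail.strip()


decoded = (
    "Below is an instruction that describes a task.\n"
    "### Instruction:\nexercise, coding\n"
    "### Response:\nGo work out, then open your editor!<eos>"
)
print(extract_response(decoded))
```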
The detailed results are as follows.
Save the merged model
base_model = "unsloth/gemma-2-2b" # Base model to be merged.
huggingface_token = "" # HuggingFace token.
huggingface_repo = "gemma2-2b-M.O.M-gemma-sprint" # Repository to upload the model.
save_method = (
    "merged_16bit"  # Options: "merged_4bit", "merged_4bit_forced", "merged_16bit", "lora".
)
model.save_pretrained_merged(
    base_model,
    tokenizer,
    save_method=save_method,  # Set the save method to 16-bit merged.
)
Push the merged model to the Hugging Face Hub
# Upload to the Hub
model.push_to_hub_merged(
    huggingface_repo,
    tokenizer,
    save_method=save_method,
    token=huggingface_token,
)
Performance
Fine-tuned Model(gemma2-2b-M.O.M)