EEVE-Korean-Instruct-10.8B-v1.0-Grade-Retrieval
About the Model
This model is fine-tuned to grade retrieval in a RAG pipeline: given a question and a retrieved context, it answers yes ("예") or no ("아니오") to indicate whether the context contains enough information to answer the question.
The base model is yanolja/EEVE-Korean-Instruct-10.8B-v1.0.
Prompt Template
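The model expects the Korean prompt below. In English, the instruction reads: "Given a question and a piece of information, evaluate whether the information is sufficient to answer the question. To evaluate whether the information is sufficient, answer '예' (yes) or '아니오' (no)." The section labels 질문, 정보, and 평가 mean question, information, and evaluation.

주어진 질문과 정보가 주어졌을 때 질문에 답하기에 충분한 정보인지 평가해줘.
정보가 충분한지를 평가하기 위해 "예" 또는 "아니오"로 답해줘.

### 질문:
{question}

### 정보:
{context}

### 평가: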
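In a RAG pipeline the retriever usually returns several candidate contexts, and each one needs its own prompt. The following is a minimal sketch of filling the template once per context. The helper name `build_prompts` and the second example context are hypothetical and not part of the original model card.

```python
from typing import List

# Prompt template the model expects; the layout follows the usage example in this card.
PROMPT_TEMPLATE = (
    "주어진 질문과 정보가 주어졌을 때 질문에 답하기에 충분한 정보인지 평가해줘.\n"
    '정보가 충분한지를 평가하기 위해 "예" 또는 "아니오"로 답해줘.\n\n'
    "### 질문:\n{question}\n\n"
    "### 정보:\n{context}\n\n"
    "### 평가:\n"
)


def build_prompts(question: str, contexts: List[str]) -> List[str]:
    """Fill the template once per retrieved context (hypothetical helper)."""
    return [PROMPT_TEMPLATE.format(question=question, context=c) for c in contexts]


prompts = build_prompts(
    "동아리 종강총회가 언제인가요?",
    [
        "종강총회 날짜는 6월 21일입니다.",
        "동아리 회비는 만 원입니다.",  # hypothetical, unrelated context
    ],
)
print(prompts[0])
```

Each resulting string can be tokenized and passed to `model.generate` in the same way as the single prompt in the usage example.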
How to Use it
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)

model_path = "sinjy1203/EEVE-Korean-Instruct-10.8B-v1.0-Grade-Retrieval"

# Load the model in 4-bit NF4 precision to reduce GPU memory use.
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, quantization_config=nf4_config, device_map={"": "cuda:0"}
)

prompt_template = '주어진 질문과 정보가 주어졌을 때 질문에 답하기에 충분한 정보인지 평가해줘.\n정보가 충분한지를 평가하기 위해 "예" 또는 "아니오"로 답해줘.\n\n### 질문:\n{question}\n\n### 정보:\n{context}\n\n### 평가:\n'

query = {
    # "When is the club's end-of-semester general meeting?"
    "question": "동아리 종강총회가 언제인가요?",
    # "The end-of-semester general meeting is on June 21."
    "context": "종강총회 날짜는 6월 21일입니다.",
}

# Move the inputs to the same device as the model before generating.
model_inputs = tokenizer(
    prompt_template.format_map(query), return_tensors="pt"
).to(model.device)
# max_new_tokens alone bounds the generation; setting max_length as well is redundant.
output = model.generate(**model_inputs, max_new_tokens=100)
# Decode the generated token IDs back to text.
print(tokenizer.decode(output[0]))
Example Output
주어진 질문과 정보가 주어졌을 때 질문에 답하기에 충분한 정보인지 평가해줘.
정보가 충분한지를 평가하기 위해 "예" 또는 "아니오"로 답해줘.

### 질문:
동아리 종강총회가 언제인가요?

### 정보:
종강총회 날짜는 6월 21일입니다.

### 평가:
예<|end_of_text|>
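Because the decoded output repeats the whole prompt before the answer, downstream code typically needs to extract only the final verdict. The sketch below shows one way to do that, assuming the answer is whatever follows the last `### 평가:` marker. The function name `parse_grade` is hypothetical and not part of the original model card.

```python
from typing import Optional

EVAL_MARKER = "### 평가:"  # last section header in the prompt template


def parse_grade(decoded: str) -> Optional[bool]:
    """Return True for "예" (yes), False for "아니오" (no), None otherwise."""
    # Keep only the text that follows the final evaluation marker.
    answer = decoded.rsplit(EVAL_MARKER, 1)[-1].strip()
    if answer.startswith("아니오"):
        return False
    if answer.startswith("예"):
        return True
    return None  # the model produced something other than the two expected answers


example = "### 질문:\n동아리 종강총회가 언제인가요?\n\n### 평가:\n예<|end_of_text|>"
print(parse_grade(example))
```

Returning `None` for unexpected text lets the caller decide how to treat an ungraded context, for example by discarding it or retrying.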
Training Data
- The dataset was generated with reference to generated_instruction from stanford_alpaca.
- yanolja/EEVE-Korean-Instruct-10.8B-v1.0 was used as the model for question generation.
Metrics
Korean LLM Benchmark
Model | Average | Ko-ARC | Ko-HellaSwag | Ko-MMLU | Ko-TruthfulQA | Ko-CommonGen V2 |
---|---|---|---|---|---|---|
EEVE-Korean-Instruct-10.8B-v1.0 | 56.08 | 55.2 | 66.11 | 56.48 | 49.14 | 53.48 |
EEVE-Korean-Instruct-10.8B-v1.0-Grade-Retrieval | 56.1 | 55.55 | 65.95 | 56.24 | 48.66 | 54.07 |
Generated Dataset
Model | Accuracy | F1 | Precision | Recall |
---|---|---|---|---|
EEVE-Korean-Instruct-10.8B-v1.0 | 0.824 | 0.800 | 0.885 | 0.697 |
EEVE-Korean-Instruct-10.8B-v1.0-Grade-Retrieval | 0.892 | 0.875 | 0.903 | 0.848 |
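For readers who want to compute the same kind of scores on their own graded examples, here is a minimal sketch of accuracy, precision, recall, and F1 for binary yes/no predictions. It assumes "yes" is the positive class, which the model card does not state. The function name and the toy labels are hypothetical and are not the evaluation data behind the table.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1, treating True ("yes") as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only (assumption, not the real evaluation set).
y_true = [True, True, True, False, False]
y_pred = [True, True, False, False, True]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```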