
EEVE-Korean-Instruct-10.8B-v1.0-Grade-Retrieval

About the Model

This model is fine-tuned to judge whether the context retrieved for a question in a RAG (retrieval-augmented generation) pipeline is sufficient to answer that question, replying with a simple yes (예) or no (아니오).

The base model for this model is yanolja/EEVE-Korean-Instruct-10.8B-v1.0.
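One way to use a yes/no grader like this in a RAG pipeline is to drop retrieved passages it judges insufficient before they reach the generator. The sketch below assumes a hypothetical `grade(question, context) -> bool` callable that wraps the model (for example, by running the prompt and mapping "예" to True); `filter_retrieved`, the keyword-based stand-in grader, and the second sample passage are illustrative only and not part of this model's release.

```python
from typing import Callable, List

# Hypothetical grader interface: returns True when the model answers "예" (yes)
# for the given (question, context) pair, False otherwise.
GradeFn = Callable[[str, str], bool]


def filter_retrieved(question: str, contexts: List[str], grade: GradeFn) -> List[str]:
    """Keep only the retrieved contexts that the grader judges sufficient."""
    return [ctx for ctx in contexts if grade(question, ctx)]


if __name__ == "__main__":
    # Stand-in grader for illustration only; a real one would prompt the
    # fine-tuned model and parse its "예"/"아니오" answer.
    def keyword_grade(question: str, context: str) -> bool:
        return "종강총회" in context

    docs = ["종강총회 날짜는 6월 21일입니다.", "동아리방은 3층에 있습니다."]
    kept = filter_retrieved("동아리 종강총회가 언제인가요?", docs, keyword_grade)
    print(kept)  # ['종강총회 날짜는 6월 21일입니다.']
```

Passing the grader in as a parameter keeps the filtering logic independent of how the model is loaded, so it can be tested with a stub as above.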

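Prompt Template

The prompt is in Korean. In English it reads: "Given a question and some information, evaluate whether the information is sufficient to answer the question. To evaluate whether the information is sufficient, answer "예" (yes) or "아니오" (no)." The section headers are 질문 (question), 정보 (information, i.e. the retrieved context), and 평가 (evaluation).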

주어진 질문과 정보가 주어졌을 때 질문에 답하기에 충분한 정보인지 평가해줘.
정보가 충분한지를 평가하기 위해 "예" 또는 "아니오"로 답해줘.

### 질문:
{question}

### 정보:
{context}

### 평가:

How to Use it

import torch
from transformers import (
    BitsAndBytesConfig,
    AutoModelForCausalLM,
    AutoTokenizer,
)

model_path = "sinjy1203/EEVE-Korean-Instruct-10.8B-v1.0-Grade-Retrieval"

# Load the weights in 4-bit NF4 (requires the bitsandbytes package and a CUDA GPU).
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, quantization_config=nf4_config, device_map={'': 'cuda:0'}
)

prompt_template = '주어진 질문과 정보가 주어졌을 때 질문에 답하기에 충분한 정보인지 평가해줘.\n정보가 충분한지를 평가하기 위해 "예" 또는 "아니오"로 답해줘.\n\n### 질문:\n{question}\n\n### 정보:\n{context}\n\n### 평가:\n'
query = {
    "question": "동아리 종강총회가 언제인가요?",  # "When is the club's end-of-semester general meeting?"
    "context": "종강총회 날짜는 6월 21일입니다."  # "The end-of-semester general meeting is on June 21."
}

# Tokenize the filled-in prompt and move the tensors to the model's device.
model_inputs = tokenizer(prompt_template.format_map(query), return_tensors='pt').to(model.device)
# max_new_tokens caps the length of the generated answer.
output = model.generate(**model_inputs, max_new_tokens=100)
# Decode the token IDs back to text; special tokens such as <|end_of_text|> are kept.
print(tokenizer.decode(output[0]))

Example Output

주어진 질문과 정보가 주어졌을 때 질문에 답하기에 충분한 정보인지 평가해줘.
정보가 충분한지를 평가하기 위해 "예" 또는 "아니오"로 답해줘.

### 질문:
동아리 종강총회가 언제인가요?

### 정보:
종강총회 날짜는 6월 21일입니다.

### 평가:
예<|end_of_text|>
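Because the decoded output echoes the whole prompt, the verdict has to be read from the text after the final evaluation header. A minimal parser, assuming the answer starts with either "예" or "아니오" and may be followed by the `<|end_of_text|>` marker seen in the example; the name `parse_verdict` is illustrative, not part of the model's release:

```python
from typing import Optional

ANSWER_HEADER = "### 평가:"        # evaluation header that precedes the answer
END_MARKER = "<|end_of_text|>"    # end marker seen in the example output


def parse_verdict(generated: str) -> Optional[bool]:
    """Return True for "예" (yes), False for "아니오" (no), None otherwise."""
    # The decoded text echoes the prompt, so keep only what follows the last header.
    answer = generated.rsplit(ANSWER_HEADER, 1)[-1]
    answer = answer.replace(END_MARKER, "").strip()
    if answer.startswith("아니오"):
        return False
    if answer.startswith("예"):
        return True
    return None  # unexpected answer: leave it ungraded


if __name__ == "__main__":
    sample = "### 정보:\n종강총회 날짜는 6월 21일입니다.\n\n### 평가:\n예<|end_of_text|>"
    print(parse_verdict(sample))  # True
```

Returning `None` for anything other than the two expected answers lets the caller decide how to treat malformed generations instead of silently counting them as "no".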

Training Data

Metrics

Korean LLM Benchmark

Model                                           | Average | Ko-ARC | Ko-HellaSwag | Ko-MMLU | Ko-TruthfulQA | Ko-CommonGen V2
EEVE-Korean-Instruct-10.8B-v1.0                 | 56.08   | 55.2   | 66.11        | 56.48   | 49.14         | 53.48
EEVE-Korean-Instruct-10.8B-v1.0-Grade-Retrieval | 56.1    | 55.55  | 65.95        | 56.24   | 48.66         | 54.07

Generated Dataset

Model                                           | Accuracy | F1    | Precision | Recall
EEVE-Korean-Instruct-10.8B-v1.0                 | 0.824    | 0.800 | 0.885     | 0.697
EEVE-Korean-Instruct-10.8B-v1.0-Grade-Retrieval | 0.892    | 0.875 | 0.903     | 0.848
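The card does not say how these scores were computed or which answer counts as the positive class. The sketch below assumes "예" (sufficient) is the positive class and uses the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 as their harmonic mean. `binary_metrics` is a hypothetical helper, not the author's evaluation script, and the toy labels are for illustration only.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Dict[str, float]:
    """Accuracy, F1, precision, and recall with True ("예") as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t and p for t, p in pairs)          # said "예", gold is "예"
    fp = sum((not t) and p for t, p in pairs)    # said "예", gold is "아니오"
    fn = sum(t and (not p) for t, p in pairs)    # said "아니오", gold is "예"
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "f1": f1, "precision": precision, "recall": recall}


if __name__ == "__main__":
    # Toy labels: True = "예" (sufficient), False = "아니오" (insufficient).
    gold = [True, True, False, False]
    pred = [True, False, True, False]
    print(binary_metrics(gold, pred))
    # {'accuracy': 0.5, 'f1': 0.5, 'precision': 0.5, 'recall': 0.5}
```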
Model size: 10.8B params, tensor type FP16 (Safetensors)