---
license: apache-2.0
---

# Model Card for MediaTek Research Breeze-7B-FC-v1_0

## 🏆 Performance

| Models | #Parameters | Organization | License | 🧰 Function Calling? | 💬 Instruction Following? |
|---|---|---|---|---|---|
| Breeze-7B-Instruct-v1_0 | 7B | MediaTek Research | Apache 2.0 | ❌ | ✅ |
| Breeze-7B-FC-v1_0 | 7B | MediaTek Research | Apache 2.0 | ✅ | ✅ |
| Gorilla-OpenFunctions-v2 | 7B | Gorilla LLM | Apache 2.0 | ✅ | ❌ |
| GPT-3.5-Turbo-0125 | – | OpenAI | Proprietary | ✅ | ✅ |

### Evaluate function calling on the EN benchmark

Benchmark: Berkeley function-calling leaderboard

| Models | ↑ Overall | Irrelevance Detection | AST/Simple | AST/Multiple | AST/Parallel | AST/Parallel-Multiple | Exec/Simple | Exec/Multiple | Exec/Parallel | Exec/Parallel-Multiple |
|---|---|---|---|---|---|---|---|---|---|---|
| Breeze-7B-FC-v1_0 (FC) | 86.01 | 74.58 | 90.00 | 93.00 | 82.00 | 83.00 | 98.00 | 92.00 | 88.00 | 75.00 |
| Gorilla-OpenFunctions-v2 (FC) | 85.95 | 60.00 | 94.25 | 95.50 | 86.50 | 86.00 | 97.00 | 96.00 | 80.00 | 75.00 |
| GPT-3.5-Turbo-0125 (FC) | 72.77 | 4.58 | 87.75 | 90.50 | 88.50 | 82.50 | 91.00 | 82.00 | 78.00 | 52.50 |

### Evaluate function calling on the ZHTW benchmark

Benchmark: function-calling-leaderboard-for-zhtw

| Models | ↑ Overall | Irrelevance Detection | AST/Simple | AST/Multiple | AST/Parallel | AST/Parallel-Multiple | Exec/Simple | Exec/Multiple | Exec/Parallel | Exec/Parallel-Multiple |
|---|---|---|---|---|---|---|---|---|---|---|
| Breeze-7B-FC-v1_0 (FC) | 77.70 | 71.67 | 82.00 | 86.50 | 76.00 | 65.50 | 87.00 | 88.00 | 80.00 | 57.50 |
| Gorilla-OpenFunctions-v2 (FC) | 75.68 | 53.75 | 84.75 | 86.50 | 72.50 | 68.00 | 92.00 | 92.00 | 62.00 | 72.50 |
| GPT-3.5-Turbo-0125 (FC) | 66.15 | 7.50 | 83.75 | 83.50 | 73.00 | 65.50 | 88.00 | 84.00 | 72.00 | 40.00 |

### Evaluate instruction following on the EN benchmark

Benchmark: MT-Bench

| | Win | Tie | Lose |
|---|---|---|---|
| Breeze-7B-FC-v1_0 vs. Breeze-7B-Instruct-v1_0 | 25 (15.6%) | 72 (45.0%) | 63 (39.4%) |

### Evaluate instruction following on the ZHTW benchmark

Benchmark: MT-Bench-TC

| | Win | Tie | Lose |
|---|---|---|---|
| Breeze-7B-FC-v1_0 vs. Breeze-7B-Instruct-v1_0 | 36 (22.5%) | 81 (50.6%) | 43 (26.9%) |

## 👩‍💻 How to use

### Dependency

Install the `mtkresearch` package:

```bash
git clone https://github.com/mtkresearch/mtkresearch.git
cd mtkresearch
pip install -e .
```

### Hosting with vLLM

```python
from vllm import LLM, SamplingParams

num_gpu = 1  # set to the number of available GPUs

llm = LLM(
    model='MediaTek-Research/Breeze-7B-FC-v1_0',
    tensor_parallel_size=num_gpu,
    gpu_memory_utilization=0.7
)

instance_end_token_id = llm.get_tokenizer().convert_tokens_to_ids('<|im_end|>')
params = SamplingParams(
    temperature=0.01,
    top_p=0.01,
    max_tokens=4096,
    repetition_penalty=1.1,
    stop_token_ids=[instance_end_token_id]
)

def _inference(prompt, llm, params):
    return llm.generate(prompt, params)[0].outputs[0].text
```

### Instruction following

```python
from mtkresearch.llm.prompt import MRPromptV2

sys_prompt = 'You are a helpful AI assistant built by MediaTek Research. The user you are helping speaks Traditional Chinese and comes from Taiwan.'

prompt_engine = MRPromptV2()

conversations = [
    {"role": "system", "content": sys_prompt},
    {"role": "user", "content": "請問什麼是深度學習?"},  # "What is deep learning?"
]

prompt = prompt_engine.get_prompt(conversations)

output_str = _inference(prompt, llm, params)
result = prompt_engine.parse_generated_str(output_str)

print(result)  # the parsed assistant reply
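```

To continue a multi-turn conversation, append the parsed assistant message and the next user turn, then rebuild the prompt. A minimal sketch, assuming `parse_generated_str` returns a message dict that `get_prompt` accepts back into the conversation list:

```python
# ASSUMPTION: `result` is a message dict (e.g. {'role': 'assistant', 'content': '...'})
# that can be appended to `conversations` directly.
conversations.append(result)
conversations.append({"role": "user", "content": "請用一句話總結。"})  # "Summarize that in one sentence."

prompt = prompt_engine.get_prompt(conversations)
follow_up = prompt_engine.parse_generated_str(_inference(prompt, llm, params))
print(follow_up)
```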

### Function Calling

```python
from mtkresearch.llm.prompt import MRPromptV2

sys_prompt = 'You are a helpful AI assistant built by MediaTek Research. The user you are helping speaks Traditional Chinese and comes from Taiwan.'

functions = [
    {
      "name": "get_current_weather",
      "description": "Get the current weather in a given location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA"
          },
          "unit": {
            "type": "string",
            "enum": ["celsius", "fahrenheit"]
          }
        },
        "required": ["location"]
      }
    }
]

prompt_engine = MRPromptV2()

# stage 1: query
conversations = [
    {"role": "user", "content": "台北目前溫度是攝氏幾度?"},  # "What is the current temperature in Taipei, in degrees Celsius?"
]

prompt = prompt_engine.get_prompt(conversations, functions=functions)

output_str = _inference(prompt, llm, params)
result = prompt_engine.parse_generated_str(output_str)

print(result)  # the parsed result should contain the model's tool call(s)
```

After stage 1, the parsed result contains the tool calls requested by the model. Stage 2 executes the called functions, and stage 3 puts the executed results back into the conversation so the model can compose its final answer.
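
A minimal sketch of stages 2 and 3, assuming `parse_generated_str` returns OpenAI-style `tool_calls` entries (`{'id': ..., 'function': {'name': ..., 'arguments': <JSON string>}}`) and that the prompt engine accepts `role: 'tool'` messages; check the `mtkresearch` repository for the exact schema. `get_current_weather` below is a stand-in implementation:

```python
import json

# stage 2: execute called functions
# ASSUMPTION: a stand-in implementation; replace with a real weather lookup.
def get_current_weather(location, unit='celsius'):
    return {'temperature': 30, 'unit': unit}

available_functions = {'get_current_weather': get_current_weather}

if 'tool_calls' in result:  # ASSUMPTION: OpenAI-style tool-call schema
    conversations.append(result)
    for tool_call in result['tool_calls']:
        func_name = tool_call['function']['name']
        arguments = json.loads(tool_call['function']['arguments'])
        called_result = available_functions[func_name](**arguments)

        # stage 3: put executed results back into the conversation
        conversations.append({
            'role': 'tool',
            'tool_call_id': tool_call['id'],
            'name': func_name,
            'content': json.dumps(called_result),
        })

    prompt = prompt_engine.get_prompt(conversations, functions=functions)
    final_result = prompt_engine.parse_generated_str(_inference(prompt, llm, params))
    print(final_result)  # the model's natural-language answer using the tool output
```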