YC-Chen committed on
Commit d26125e
1 Parent(s): c9b58fa

Update README.md

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -17,7 +17,7 @@ license: apache-2.0
 
 **Evaluate function calling on EN benchmark**
 
-[Berkeley function-calling leaderboard](https://gorilla.cs.berkeley.edu/blogs/8_berkeley_function_calling_leaderboard.html)
+We evaluate the performance of function calling in English with the benchmark [Berkeley function-calling leaderboard](https://gorilla.cs.berkeley.edu/blogs/8_berkeley_function_calling_leaderboard.html).
 
 | Models | ↑ Overall | Irrelevance<br/>Detection | AST/<br/>Simple | AST/<br/>Multiple | AST/<br/>Parallel | AST/<br/>Parallel-Multiple | Exec/<br/>Simple | Exec/<br/>Multiple | Exec/<br/>Parallel | Exec/<br/>Parallel-Multiple |
 |-----------------------------------|----------|---------------------|------------|--------------|--------------|------------------------|--------------|---------------------|---------------------|-------------------------------|
@@ -31,7 +31,7 @@ license: apache-2.0
 
 **Evaluate function calling on ZHTW benchmark**
 
-[function-calling-leaderboard-for-zhtw](https://github.com/mtkresearch/function-calling-leaderboard-for-zhtw)
+We evaluate the performance of function calling in Traditional Chinese with the benchmark [function-calling-leaderboard-for-zhtw](https://github.com/mtkresearch/function-calling-leaderboard-for-zhtw).
 
 | Models | ↑ Overall | Irrelevance<br/>Detection | AST/<br/>Simple | AST/<br/>Multiple | AST/<br/>Parallel | AST/<br/>Parallel-Multiple | Exec/<br/>Simple | Exec/<br/>Multiple | Exec/<br/>Parallel | Exec/<br/>Parallel-Multiple |
 |-----------------------------------|----------|---------------------|------------|--------------|--------------|------------------------|--------------|---------------------|---------------------|-------------------------------|
@@ -46,7 +46,7 @@ license: apache-2.0
 
 **Evaluate instruction following on EN benchmark**
 
-MT-Bench
+We evaluate the performance of instruction following in English with the benchmark [MT-Bench](https://github.com/lm-sys/FastChat/blob/main/fastchat/llm_judge/README.md).
 
 | | Win | Tie | Lose |
 |---|---|---|---|
@@ -55,7 +55,7 @@ MT-Bench
 
 **Evaluate instruction following on ZHTW benchmark**
 
-MT-Bench-TC
+We evaluate the performance of instruction following in Traditional Chinese with the benchmark [MT-Bench-TC](https://github.com/mtkresearch/TCEval).
 
 | | Win | Tie | Lose |
 |---|---|---|---|