---
language:
- en
library_name: transformers
pipeline_tag: text-generation
datasets:
- jondurbin/airoboros-2.2
- Open-Orca/OpenOrca
- garage-bAInd/Open-Platypus
- WizardLM/WizardLM_evol_instruct_V2_196k
- TokenBender/python_eval_instruct_51k
tags:
- code
license: apache-2.0
model-index:
- name: SpeechlessCoder
  results:
  - task:
      type: text-generation
    dataset:
      type: openai_humaneval
      name: HumanEval
    metrics:
    - name: pass@1
      type: pass@1
      value: 51.21951219512195
      verified: false
---

# speechless-code-mistral-7b-v1.0

* [AWQ model(s) for GPU inference.](https://huggingface.co/TheBloke/speechless-code-mistral-7B-v1.0-AWQ)
* [GPTQ models for GPU inference, with multiple quantisation parameter options.](https://huggingface.co/TheBloke/speechless-code-mistral-7B-v1.0-GPTQ)
* [2, 3, 4, 5, 6 and 8-bit GGUF models for CPU+GPU inference](https://huggingface.co/TheBloke/speechless-code-mistral-7B-v1.0-GGUF)

Code: https://github.com/uukuguy/speechless

The following datasets were used to fine-tune mistralai/Mistral-7B-v0.1 to improve the model's reasoning and planning abilities.

Total: 201,981 samples.
- jondurbin/airoboros-2.2: filtered to categories related to coding, reasoning, and planning; 23,462 samples (see the filtering sketch after this list).
- Open-Orca/OpenOrca: filtered to the 'cot' category of the 1M GPT-4 subset; 74,440 samples.
- garage-bAInd/Open-Platypus: used in full; 24,926 samples.
- WizardLM/WizardLM_evol_instruct_V2_196k: the coding-conversation portion; 30,185 samples.
- TokenBender/python_eval_instruct_51k: samples with "python" in the output; 40,309 samples.
- Spider: 8,659 samples.
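
A minimal sketch of the kind of category filtering described above, using the `datasets` library; the exact category names and filter logic used by the authors are assumptions:

```python
from datasets import load_dataset

# Load the instruction dataset and keep only coding/reasoning/planning
# categories. The category names below are hypothetical.
ds = load_dataset("jondurbin/airoboros-2.2", split="train")
keep = {"coding", "reasoning", "plan"}
filtered = ds.filter(lambda ex: ex["category"] in keep)
print(f"kept {len(filtered)} of {len(ds)} samples")
```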

## How to Prompt the Model
This model accepts the Alpaca instruction format.

For example:
```
You are an intelligent programming assistant.

### Instruction:
Implement a linked list in C++

### Response:
```
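
A minimal generation sketch with `transformers`, assuming the model id `uukuguy/speechless-code-mistral-7b-v1.0`; the decoding parameters are illustrative, not the authors' settings:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "uukuguy/speechless-code-mistral-7b-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Alpaca-style prompt, as shown above.
prompt = (
    "You are an intelligent programming assistant.\n\n"
    "### Instruction:\n"
    "Implement a linked list in C++\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Strip the prompt tokens and print only the completion.
completion = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(completion, skip_special_tokens=True))
```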

## HumanEval

| Metric | Value |
| --- | --- |
| humaneval-python | 51.21951219512195 |
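
For reference, HumanEval pass@1 (and the pass@10 numbers below) are conventionally computed with the unbiased estimator from the Codex paper; a minimal sketch, which may differ from the exact evaluation code used here:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k),
    where n = samples generated per problem, c = samples that pass."""
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Example: 20 samples per problem, 11 passing -> pass@1 estimate of 0.55.
print(pass_at_k(n=20, c=11, k=1))
```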

## Big Code Evaluation

| | HumanEval | Java | JavaScript | C++ | PHP | Rust | Swift | R | Lua | D | Racket | Julia |
| ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ |
| pass@1 |  0.4260 | 0.3165 | 0.4241 | 0.3467 | 0.3548 | 0.2454 | 0.0000 | 0.1735 | 0.2942 | 0.1087 | 0.0000 | 0.3081 |
| pass@10 | 0.5784 | 0.4506 | 0.5891 | 0.4845 | 0.4997 | 0.3858 | 0.0000 | 0.2516 | 0.4126 | 0.2018 | 0.0000 | 0.4427 |

For comparison, HumanEval pass@1 scores of CodeLlama models from the [Big Code Models Leaderboard](https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard):

| Model | pass@1 |
| --- | --- |
| CodeLlama-34B-Python | 53.29 |
| CodeLlama-34B-Instruct | 50.79 |
| CodeLlama-13B-Instruct | 50.6 |
| CodeLlama-34B | 45.11 |
| CodeLlama-13B-Python | 42.89 |
| CodeLlama-13B | 35.07 |

## lm-evaluation-harness

```json
{
  "ARC (acc_norm)": 0.6109215017064846,
  "HellaSwag (acc_norm)": 0.8358892650866361,
  "MMLU (acc)": 0.6325456394049195,
  "TruthfulQA (mc2)": 0.4746745250371087,
  "Winogrande (acc)": 0.7829518547750592,
  "GSM8K (acc)": 0.467778620166793,
  "DROP (f1)": 0.49585675335570545,
  "Open LLM Score": 0.61437428571428571
}
```

[Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

| Metric | Value |
| --- | --- |
| ARC | 60.58 |
| HellaSwag | 83.47 |
| MMLU | 62.98 |
| TruthfulQA | 47.9 |
| Winogrande | 78.69 |
| GSM8K | 19.18 |
| Average | 58.85 |

## Parameters

| | |
|------ | ------ |
| lr | 2e-4 |
| lr_scheduler_type | cosine |
| weight_decay | 0.0 |
| optim | paged_adamw_8bit |
| flash_attention | True |
| rerope | False |
| max_new_tokens | 4096 |
| num_train_epochs | 2 |
| bits | 4 |
| lora_r | 64 |
| lora_alpha | 16 |
| lora_dropout | 0.05 |
| double_quant | True |
| quant_type | nf4 |
| dataset_format | airoboros |
| mini_batch_size | 2 |
| gradient_accumulation_steps | 32 |
| bf16 | True |

Trained on 2 x A40 (48 GB) GPUs.
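
These hyperparameters describe a standard QLoRA setup; a minimal sketch of an equivalent configuration with `transformers`/`peft`/`bitsandbytes`, where target modules and other unlisted arguments are assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 base weights with double quantization (bits=4, quant_type=nf4,
# double_quant=True in the table above), bf16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adapter matching lora_r=64, lora_alpha=16, lora_dropout=0.05.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```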

| Metric | Value |
|------ | ------ |
| epoch | 2.0 |
| train_loss | 0.5 |
| train_runtime | 1 day, 10:25:26.77 |
| train_samples_per_second | 3.194 |
| train_steps_per_second | 0.025 |
| eval_loss | 0.5146 |
| eval_runtime | 0:00:25.04 |
| eval_samples_per_second | 7.985 |
| eval_steps_per_second | |

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_uukuguy__speechless-code-mistral-7b-v1.0)

| Metric                | Value                     |
|-----------------------|---------------------------|
| Avg.                  | 53.47   |
| ARC (25-shot)         | 60.58          |
| HellaSwag (10-shot)   | 83.75    |
| MMLU (5-shot)         | 62.98         |
| TruthfulQA (0-shot)   | 47.9   |
| Winogrande (5-shot)   | 78.69   |
| GSM8K (5-shot)        | 19.18        |
| DROP (3-shot)         | 21.19         |