Quantization made by Richard Erkhov.
[Github](https://github.com/RichardErkhov)
[Discord](https://discord.gg/pvy7H8DZMG)
[Request more models](https://github.com/RichardErkhov/quant_request)
Mistral-10.7B-Instruct-v0.3-depth-upscaling - GGUF
- Model creator: https://huggingface.co/giannisan/
- Original model: https://huggingface.co/giannisan/Mistral-10.7B-Instruct-v0.3-depth-upscaling/
| Name | Quant method | Size |
| ---- | ---- | ---- |
| [Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q2_K.gguf](https://huggingface.co/RichardErkhov/giannisan_-_Mistral-10.7B-Instruct-v0.3-depth-upscaling-gguf/blob/main/Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q2_K.gguf) | Q2_K | 3.73GB |
| [Mistral-10.7B-Instruct-v0.3-depth-upscaling.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/giannisan_-_Mistral-10.7B-Instruct-v0.3-depth-upscaling-gguf/blob/main/Mistral-10.7B-Instruct-v0.3-depth-upscaling.IQ3_XS.gguf) | IQ3_XS | 4.14GB |
| [Mistral-10.7B-Instruct-v0.3-depth-upscaling.IQ3_S.gguf](https://huggingface.co/RichardErkhov/giannisan_-_Mistral-10.7B-Instruct-v0.3-depth-upscaling-gguf/blob/main/Mistral-10.7B-Instruct-v0.3-depth-upscaling.IQ3_S.gguf) | IQ3_S | 4.37GB |
| [Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/giannisan_-_Mistral-10.7B-Instruct-v0.3-depth-upscaling-gguf/blob/main/Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q3_K_S.gguf) | Q3_K_S | 4.35GB |
| [Mistral-10.7B-Instruct-v0.3-depth-upscaling.IQ3_M.gguf](https://huggingface.co/RichardErkhov/giannisan_-_Mistral-10.7B-Instruct-v0.3-depth-upscaling-gguf/blob/main/Mistral-10.7B-Instruct-v0.3-depth-upscaling.IQ3_M.gguf) | IQ3_M | 4.52GB |
| [Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q3_K.gguf](https://huggingface.co/RichardErkhov/giannisan_-_Mistral-10.7B-Instruct-v0.3-depth-upscaling-gguf/blob/main/Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q3_K.gguf) | Q3_K | 4.84GB |
| [Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/giannisan_-_Mistral-10.7B-Instruct-v0.3-depth-upscaling-gguf/blob/main/Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q3_K_M.gguf) | Q3_K_M | 4.84GB |
| [Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/giannisan_-_Mistral-10.7B-Instruct-v0.3-depth-upscaling-gguf/blob/main/Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q3_K_L.gguf) | Q3_K_L | 5.27GB |
| [Mistral-10.7B-Instruct-v0.3-depth-upscaling.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/giannisan_-_Mistral-10.7B-Instruct-v0.3-depth-upscaling-gguf/blob/main/Mistral-10.7B-Instruct-v0.3-depth-upscaling.IQ4_XS.gguf) | IQ4_XS | 5.43GB |
| [Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q4_0.gguf](https://huggingface.co/RichardErkhov/giannisan_-_Mistral-10.7B-Instruct-v0.3-depth-upscaling-gguf/blob/main/Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q4_0.gguf) | Q4_0 | 5.66GB |
| [Mistral-10.7B-Instruct-v0.3-depth-upscaling.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/giannisan_-_Mistral-10.7B-Instruct-v0.3-depth-upscaling-gguf/blob/main/Mistral-10.7B-Instruct-v0.3-depth-upscaling.IQ4_NL.gguf) | IQ4_NL | 5.72GB |
| [Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/giannisan_-_Mistral-10.7B-Instruct-v0.3-depth-upscaling-gguf/blob/main/Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q4_K_S.gguf) | Q4_K_S | 5.7GB |
| [Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q4_K.gguf](https://huggingface.co/RichardErkhov/giannisan_-_Mistral-10.7B-Instruct-v0.3-depth-upscaling-gguf/blob/main/Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q4_K.gguf) | Q4_K | 6.02GB |
| [Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/giannisan_-_Mistral-10.7B-Instruct-v0.3-depth-upscaling-gguf/blob/main/Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q4_K_M.gguf) | Q4_K_M | 6.02GB |
| [Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q4_1.gguf](https://huggingface.co/RichardErkhov/giannisan_-_Mistral-10.7B-Instruct-v0.3-depth-upscaling-gguf/blob/main/Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q4_1.gguf) | Q4_1 | 6.28GB |
| [Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q5_0.gguf](https://huggingface.co/RichardErkhov/giannisan_-_Mistral-10.7B-Instruct-v0.3-depth-upscaling-gguf/blob/main/Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q5_0.gguf) | Q5_0 | 6.89GB |
| [Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/giannisan_-_Mistral-10.7B-Instruct-v0.3-depth-upscaling-gguf/blob/main/Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q5_K_S.gguf) | Q5_K_S | 6.89GB |
| [Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q5_K.gguf](https://huggingface.co/RichardErkhov/giannisan_-_Mistral-10.7B-Instruct-v0.3-depth-upscaling-gguf/blob/main/Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q5_K.gguf) | Q5_K | 7.08GB |
| [Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/giannisan_-_Mistral-10.7B-Instruct-v0.3-depth-upscaling-gguf/blob/main/Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q5_K_M.gguf) | Q5_K_M | 7.08GB |
| [Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q5_1.gguf](https://huggingface.co/RichardErkhov/giannisan_-_Mistral-10.7B-Instruct-v0.3-depth-upscaling-gguf/blob/main/Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q5_1.gguf) | Q5_1 | 7.51GB |
| [Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q6_K.gguf](https://huggingface.co/RichardErkhov/giannisan_-_Mistral-10.7B-Instruct-v0.3-depth-upscaling-gguf/blob/main/Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q6_K.gguf) | Q6_K | 8.21GB |
| [Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q8_0.gguf](https://huggingface.co/RichardErkhov/giannisan_-_Mistral-10.7B-Instruct-v0.3-depth-upscaling-gguf/blob/main/Mistral-10.7B-Instruct-v0.3-depth-upscaling.Q8_0.gguf) | Q8_0 | 10.63GB |
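As a rough aid for choosing among the files above, the sketch below picks the largest quant whose file size fits a given budget. The sizes are copied from the table; the helper names `pick_quant` and `gguf_filename` are hypothetical, and file size is only a lower bound on the memory needed at runtime (context and KV cache come on top), so treat the result as a starting point rather than a guarantee.

```python
from typing import Optional

# File sizes in GB, copied from the table above.
QUANT_SIZES_GB = {
    "Q2_K": 3.73, "IQ3_XS": 4.14, "IQ3_S": 4.37, "Q3_K_S": 4.35,
    "IQ3_M": 4.52, "Q3_K": 4.84, "Q3_K_M": 4.84, "Q3_K_L": 5.27,
    "IQ4_XS": 5.43, "Q4_0": 5.66, "IQ4_NL": 5.72, "Q4_K_S": 5.7,
    "Q4_K": 6.02, "Q4_K_M": 6.02, "Q4_1": 6.28, "Q5_0": 6.89,
    "Q5_K_S": 6.89, "Q5_K": 7.08, "Q5_K_M": 7.08, "Q5_1": 7.51,
    "Q6_K": 8.21, "Q8_0": 10.63,
}

MODEL_NAME = "Mistral-10.7B-Instruct-v0.3-depth-upscaling"


def pick_quant(budget_gb: float) -> Optional[str]:
    """Return the largest quant whose file fits in budget_gb, or None."""
    fitting = {q: s for q, s in QUANT_SIZES_GB.items() if s <= budget_gb}
    if not fitting:
        return None
    # On equal sizes, the first entry in table order wins.
    return max(fitting, key=fitting.get)


def gguf_filename(quant: str) -> str:
    """Build the file name used in the table for a given quant method."""
    return f"{MODEL_NAME}.{quant}.gguf"


choice = pick_quant(6.0)
print(choice, gguf_filename(choice))
```

With a 6 GB budget this selects `IQ4_NL`, the largest file under that limit in the table.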
Original model description:
---
base_model:
- mistralai/Mistral-7B-Instruct-v0.3
library_name: transformers
license: apache-2.0
language:
- en
---
# mistral-7b-instruct-v0.3-depth-upscaling
![image/webp](https://cdn-uploads.huggingface.co/production/uploads/643eab4f05a395e2b1c727e3/qwYq9q2PpTfYwb1nsym9u.webp)
Mistral: a strong, cold northwesterly wind that blows through the Rhône valley and southern France into the Mediterranean, mainly in winter.
![image/png](https://cdn-uploads.huggingface.co/production/uploads/643eab4f05a395e2b1c727e3/elcrExK_Q5MQjcdAjYi9V.png)
This is an attempt at depth upscaling, based on the paper [SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling](https://arxiv.org/abs/2312.15166), which describes a technique for scaling large language models efficiently. The process begins with structural depthwise scaling, which may initially reduce performance, but this is rapidly restored during a crucial continued pretraining phase. This phase optimizes the expanded model's parameters for the new depth configuration, significantly enhancing performance.
Note that this represents only the initial phase of the model's development; the next critical step is fine-tuning. As expected, and in line with the paper, the initial upscaled model from phase one (without fine-tuning) scores lower than the base model. Performance is expected to recover and surpass the base model once fine-tuning is finished.
Feel free to fine-tune on your own dataset.
## Merge Details
### Merge Method
This model was merged using the passthrough merge method. The first 24 layers of one copy of the model are stitched to the last 24 layers of another copy, resulting in a total of 48 layers with 10.7B parameters.
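The 10.7B figure can be sanity-checked with a back-of-the-envelope parameter count. The sketch below assumes the architecture values published in the base model's `config.json` (hidden size 4096, MLP size 14336, 32 attention heads, 8 key/value heads, vocabulary 32768, untied output head); none of these numbers appear in this card, so verify them against the base model before relying on the result.

```python
# Assumed architecture values for mistralai/Mistral-7B-Instruct-v0.3.
# They are NOT stated in this card; check the base model's config.json.
HIDDEN = 4096
INTERMEDIATE = 14336
N_HEADS = 32
N_KV_HEADS = 8
VOCAB = 32768


def params_per_layer() -> int:
    """Parameters in one decoder layer (grouped-query attention + gated MLP)."""
    head_dim = HIDDEN // N_HEADS
    kv_dim = N_KV_HEADS * head_dim
    attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * kv_dim  # q, o and k, v
    mlp = 3 * HIDDEN * INTERMEDIATE                        # gate, up, down
    norms = 2 * HIDDEN                                     # two RMSNorm weights
    return attention + mlp + norms


def total_params(n_layers: int) -> int:
    """Parameters of the whole model for a given number of decoder layers."""
    embeddings = 2 * VOCAB * HIDDEN  # input embeddings + untied output head
    final_norm = HIDDEN
    return n_layers * params_per_layer() + embeddings + final_norm


base = total_params(32)      # original depth
upscaled = total_params(48)  # depth after the passthrough merge
print(f"{base / 1e9:.2f}B -> {upscaled / 1e9:.2f}B")
```

Under these assumptions the count grows from roughly 7.25B to roughly 10.74B parameters, consistent with the model's name. Only the per-layer weights are multiplied; the embeddings and output head are counted once.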
### Models Merged
The following models were included in the merge:
* [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) merged with itself.
### Configuration
The following configuration was used to produce this model:
```yaml
slices:
  - sources:
      - model: mistralai/Mistral-7B-Instruct-v0.3
        layer_range: [0, 24]
  - sources:
      - model: mistralai/Mistral-7B-Instruct-v0.3
        layer_range: [8, 32]
merge_method: passthrough
dtype: bfloat16
```
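To make the slice arithmetic concrete, the sketch below expands the two `layer_range` entries into the order of base-model layers in the merged stack. It assumes each range is half-open (start included, end excluded), which is what yields 24 layers per slice; the helper `stacked_layers` is hypothetical and not part of any merge tool.

```python
from collections import Counter

# Layer ranges from the configuration above, read as half-open [start, end).
SLICES = [(0, 24), (8, 32)]


def stacked_layers(slices):
    """Return the base-model layer index used at each position of the merged model."""
    order = []
    for start, end in slices:
        order.extend(range(start, end))
    return order


layer_map = stacked_layers(SLICES)
counts = Counter(layer_map)
duplicated = sorted(i for i, n in counts.items() if n > 1)

print(len(layer_map))                # total depth of the merged model
print(duplicated[0], duplicated[-1])  # first and last base layer that appears twice
```

Read this way, the merged model has 48 layers, and base layers 8 through 23 appear twice: once near the end of the first slice and once at the start of the second.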
Eval results:
| Metric | Value |
|----------------------|-------|
| **Avg.** | 64.04 |
| **ARC (25-shot)** | 63.14 |
| **HellaSwag (10-shot)** | 83.29 |
| **MMLU (5-shot)** | 62.31 |
| **TruthfulQA (0-shot)** | 60.65 |
| **Winogrande (5-shot)** | 78.45 |
| **GSM8K (5-shot)** | 36.39 |
Full results [here](https://huggingface.co/datasets/open-llm-leaderboard/details_giannisan__Mistral-10.7B-Instruct-v0.3-depth-upscaling/blob/main/results_2024-05-30T06-01-17.134852.json)
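The reported average is consistent with an unweighted mean of the six benchmark scores, as this short check shows (scores copied from the table above):

```python
# Benchmark scores copied from the eval table above.
scores = {
    "ARC (25-shot)": 63.14,
    "HellaSwag (10-shot)": 83.29,
    "MMLU (5-shot)": 62.31,
    "TruthfulQA (0-shot)": 60.65,
    "Winogrande (5-shot)": 78.45,
    "GSM8K (5-shot)": 36.39,
}

# Unweighted mean, rounded to two decimals like the table.
average = round(sum(scores.values()) / len(scores), 2)
print(average)
```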