aberrio committed on
Commit
acf1cdf
1 Parent(s): 878afd5

Update README.md

README.md CHANGED
tags:
  - nlp
  - code
  - gguf
  - causal-lm
  - instruction-tuned
  - direct-preference-optimization
  - synthetic-data
---

# StableLM 2 Zephyr 1.6B

## Model Details

**Model Name**: Zephyr 1.6B (GGUF Format)

**Quantization Options**:
- `F16` (16-bit float)
- `Q8_0` (8-bit integer)

This repository hosts quantized versions of the Stability AI `Zephyr 1.6B` model for efficient inference with the `llama.cpp` library. The quantized models are optimized for both speed and memory usage, making them suitable for a variety of platforms, including constrained hardware such as ARM64 and low-memory x86 machines.

## Core Libraries

- **Core Library**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
- **Model Format**: GGUF (`f16` and `q8_0`)
- **Original Model Source**: [stabilityai/stablelm-2-zephyr-1_6b](https://huggingface.co/stabilityai/stablelm-2-zephyr-1_6b)

The original `Zephyr 1.6B` model has been converted to `gguf` and quantized for `llama.cpp`, providing seamless integration with a wide range of inference tools.
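
A typical conversion workflow with `llama.cpp`'s own tooling looks like the sketch below. This is an illustration, not the exact commands used for this repository: the script and binary names (`convert-hf-to-gguf.py`, `quantize`) vary across `llama.cpp` versions, and running it requires the original checkpoint on disk.

```shell
# Convert the Hugging Face checkpoint to GGUF at 16-bit float precision
python convert-hf-to-gguf.py stablelm-2-zephyr-1_6b --outtype f16 --outfile ggml-model-f16.gguf

# Quantize the f16 GGUF down to 8-bit (Q8_0)
./quantize ggml-model-f16.gguf ggml-model-q8_0.gguf Q8_0
```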

## Quantized Model Files

| Format | File Name              | Size    | Description                                            |
|--------|------------------------|---------|--------------------------------------------------------|
| F16    | `ggml-model-f16.gguf`  | ~3.2 GB | 16-bit float precision for balanced speed and accuracy |
| Q8_0   | `ggml-model-q8_0.gguf` | ~1.8 GB | 8-bit integer precision for reduced memory usage       |

The `F16` format provides high precision and is ideal when maintaining output quality is crucial. The `Q8_0` format significantly reduces the model size, making it suitable for deployments where memory is the key constraint.
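
The sizes in the table are consistent with a back-of-the-envelope estimate: roughly 16 bits per parameter for `F16`, and about 8.5 bits per weight for `Q8_0` (8-bit values plus one 16-bit scale per 32-weight block). A quick sketch, assuming a ~1.6B parameter count:

```python
def gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough GGUF file size in GB: parameter count times bits per weight."""
    return n_params * bits_per_weight / 8 / 1e9

N = 1.6e9  # StableLM 2 Zephyr has roughly 1.6B parameters

f16_gb = gguf_size_gb(N, 16.0)  # plain 16-bit floats
q8_gb = gguf_size_gb(N, 8.5)    # Q8_0: 8-bit ints + one fp16 scale per 32-weight block

print(f"F16  ~ {f16_gb:.1f} GB")  # in line with the ~3.2 GB in the table
print(f"Q8_0 ~ {q8_gb:.1f} GB")   # in line with the ~1.8 GB in the table
```

The small remaining gap comes from non-quantized tensors (e.g. embeddings) and file metadata, which this estimate ignores.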

## Hardware Recommendations

| Format | Minimum RAM | Recommended GPU                 |
|--------|-------------|---------------------------------|
| F16    | 8 GB        | 16 GB (with offloading)         |
| Q8_0   | 4 GB        | 8 GB (CPU-only use recommended) |

For best performance with the `F16` format, use GPU offloading. The `Q8_0` variant runs well on CPU alone and has modest RAM requirements.
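
These figures can be sanity-checked with simple arithmetic: the model file must fit in memory alongside the KV cache and runtime overhead. A rough feasibility check (the 1 GB overhead figure is an assumption, not a measured value):

```python
def fits_in_ram(model_gb: float, ram_gb: float, overhead_gb: float = 1.0) -> bool:
    """True if the model file plus an assumed runtime overhead fits in RAM."""
    return model_gb + overhead_gb <= ram_gb

# Compare against the table's minimum-RAM figures
print(fits_in_ram(3.2, 8))  # F16 on 8 GB  -> True
print(fits_in_ram(1.8, 4))  # Q8_0 on 4 GB -> True
print(fits_in_ram(3.2, 4))  # F16 on 4 GB  -> False
```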

## Usage Example

You can use the following commands to run the Zephyr 1.6B models with `llama.cpp`:

### Running with the `f16` Format

```bash
./main -m ggml-model-f16.gguf --n-predict -1 --prompt "Write a Python function that computes the Fibonacci sequence."
```

### Running with the `q8_0` Format

```bash
./main -m ggml-model-q8_0.gguf --n-predict -1 --prompt "Explain the concept of machine learning in simple terms."
```

### GPU Offloading (for `f16` Models)

To offload layers to the GPU for the `f16` model, use the `--n-gpu-layers` flag:

```bash
./main -m ggml-model-f16.gguf --n-predict -1 --prompt "Summarize the impact of quantum computing on cryptography." --n-gpu-layers 32
```
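
Note that Zephyr 1.6B is instruction-tuned, and the original model card documents a `<|user|>` / `<|assistant|>` chat template; prompts that follow it tend to give better results than raw text. A small Python helper (the function name is ours) that builds such a prompt:

```python
def build_zephyr_prompt(user_message: str) -> str:
    """Wrap a user message in the StableLM 2 Zephyr chat template."""
    return f"<|user|>\n{user_message}<|endoftext|>\n<|assistant|>\n"

prompt = build_zephyr_prompt("Explain the concept of machine learning in simple terms.")
print(prompt)
```

The resulting string can then be passed to `./main` via `--prompt`, or written to a file and supplied with `--file`.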

## Safety and Responsible Use

The Zephyr 1.6B model was trained with a combination of instruction tuning and synthetic data to enhance safety and ensure coherent responses. However, as with any large language model, its outputs may not always align with user expectations. Always supervise outputs, particularly in sensitive applications.

For more details, refer to the original [model card](https://huggingface.co/stabilityai/stablelm-2-zephyr-1_6b).

## License

The quantized models in this repository are released under the `CC-BY-NC-SA-4.0` license.
For details, see the [license file](https://huggingface.co/stabilityai/stablelm-2-zephyr-1_6b/blob/main/LICENSE).

## Citation

If you use the Zephyr 1.6B models in your research or applications, please cite the original authors:

```
@misc{stabilitylm2023,
  title={StableLM 2: Zephyr 1.6B},
  author={{Stability AI}},
  year={2023}
}
```