Update README.md
README.md CHANGED
@@ -12,4 +12,93 @@ tags:
- nlp
- code
- gguf
- causal-lm
- instruction-tuned
- direct-preference-optimization
- synthetic-data
---

# StableLM 2 Zephyr 1.6B

## Model Details

**Model Name**: Zephyr 1.6B (GGUF Format)

**Quantization Options**:
- `F16` (16-bit float)
- `Q8_0` (8-bit integer)

This repository hosts quantized versions of the StabilityAI `Zephyr 1.6B` model for efficient inference with the `llama.cpp` library. The quantized models are optimized for both performance and memory usage, making them suitable for a variety of platforms, including constrained hardware such as ARM64 devices and low-memory x86 machines.

## Core Libraries

- **Core Library**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
- **Model Format**: GGUF (`f16` and `q8_0`)
- **Original Model Source**: [stabilityai/stablelm-2-zephyr-1_6b](https://huggingface.co/stabilityai/stablelm-2-zephyr-1_6b)

The original `Zephyr 1.6B` model has been converted to GGUF and quantized for `llama.cpp`, giving it access to a wide range of compatible inference tools.
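
You can fetch individual GGUF files with the `huggingface-cli` tool from `huggingface_hub`. A minimal sketch; the repository id below is a placeholder, so substitute this repo's actual id:

```bash
# Download only the 8-bit file (placeholder repo id -- replace it).
huggingface-cli download <user>/stablelm-2-zephyr-1_6b-gguf \
  ggml-model-q8_0.gguf --local-dir .
```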

## Quantized Model Files

| Format | File Name | Size | Description |
|--------|------------------------|---------|--------------------------------------------------------|
| F16 | `ggml-model-f16.gguf` | ~3.2 GB | 16-bit float precision for balanced speed and accuracy |
| Q8_0 | `ggml-model-q8_0.gguf` | ~1.8 GB | 8-bit integer precision for reduced memory usage |

The `F16` format provides high precision and is the better choice when output quality is crucial. The `Q8_0` format significantly reduces the model size, making it suitable for deployments where memory is a key constraint.
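
If you only have the `F16` file, the `Q8_0` variant can be reproduced locally with llama.cpp's quantization tool. A sketch, assuming an older build where the binary is named `quantize` (newer builds ship it as `llama-quantize`):

```bash
# Re-quantize the 16-bit model down to 8-bit (q8_0).
./quantize ggml-model-f16.gguf ggml-model-q8_0.gguf q8_0
```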

## Hardware Recommendations

| Format | Minimum RAM | Recommended GPU VRAM |
|--------|-------------|-----------------------------------|
| F16 | 8 GB | 16 GB (with layer offloading) |
| Q8_0 | 4 GB | 8 GB (optional; CPU-only also works well) |

For optimal performance with the `F16` model, use GPU offloading. The `Q8_0` variant works well on CPUs and has low RAM requirements.
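
For CPU-only runs, matching the thread count to your core count usually improves throughput. A sketch using `llama.cpp`'s standard `--threads` flag:

```bash
# Use as many threads as nproc reports; tune downward if the machine is shared.
./main -m ggml-model-q8_0.gguf --threads $(nproc) --n-predict 256 --prompt "Hello"
```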

## Usage Example

You can run the Zephyr 1.6B models with `llama.cpp` using the following commands:

### Running with `f16` Format

```bash
./main -m ggml-model-f16.gguf --n-predict -1 --prompt "Write a Python function that computes the Fibonacci sequence."
```

### Running with `q8_0` Format

```bash
./main -m ggml-model-q8_0.gguf --n-predict -1 --prompt "Explain the concept of machine learning in simple terms."
```

### GPU Offloading (for `f16` Models)

To offload layers to the GPU when running the `f16` model (requires a GPU-enabled `llama.cpp` build), use:

```bash
./main -m ggml-model-f16.gguf --n-predict -1 --prompt "Summarize the impact of quantum computing on cryptography." --n-gpu-layers 32
```
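
The `--prompt` flag above sends raw text. Because this model is instruction-tuned, wrapping requests in the chat format from the upstream model card may produce better answers. A hedged sketch, assuming the `<|user|>`/`<|assistant|>` template described on the original StableLM 2 Zephyr card (verify the exact tokens there before relying on this):

```bash
# Bash $'...' quoting embeds real newlines in the chat-formatted prompt.
./main -m ggml-model-q8_0.gguf --n-predict 256 \
  --prompt $'<|user|>\nWrite a haiku about compilers.<|endoftext|>\n<|assistant|>\n'
```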

## Safety and Responsible Use

The Zephyr 1.6B model was trained with a combination of instruction tuning and synthetic data to improve safety and response coherence. As with any large language model, however, some outputs may not align with user expectations, so supervise outputs whenever the model is used in sensitive applications.

For more details, refer to the original [model card](https://huggingface.co/stabilityai/stablelm-2-zephyr-1_6b).

## License

The quantized models in this repository are released under the `CC-BY-NC-SA-4.0` license. For details, see the [license file](https://huggingface.co/stabilityai/stablelm-2-zephyr-1_6b/blob/main/LICENSE).

## Citation

If you use the Zephyr 1.6B models in your research or applications, please cite the original authors:

```bibtex
@misc{stabilitylm2023,
  title={StableLM 2: Zephyr 1.6B},
  author={Stability AI},
  year={2023}
}
```