royleibov committed
Commit 2e0bd6e
1 Parent(s): 5c653a0

Clarify this is a clone and correct the use of ZipNN

Files changed (1)
  1. README.md +28 -19
README.md CHANGED
@@ -11,6 +11,25 @@ language:
 - en
 base_model: ibm/granite-7b-base
 ---
+# Disclaimer and Requirements
+
+This model is a clone of [ibm-granite/granite-7b-instruct](https://huggingface.co/ibm-granite/granite-7b-instruct) compressed using ZipNN. Compressed losslessly to 67% of its original size, ZipNN saved ~5GB in storage and potentially ~30TB in data transfer **monthly**.
+
+## Requirement
+
+In order to use the model, ZipNN is necessary:
+```bash
+pip install zipnn
+```
+
+Then simply add the following at the beginning of the file:
+```python
+from zipnn import zipnn_hf_patch
+
+zipnn_hf_patch()
+```
+And continue as usual. The patch will take care of decompressing the model correctly and safely.
+
 # Model Card for Granite-7b-lab [Paper](https://arxiv.org/abs/2403.01081)
 
 ### Overview
@@ -78,39 +97,29 @@ Importantly, we use a set of hyper-parameters for training that are very differe
 - **Base model:** [ibm/granite-7b-base](https://huggingface.co/ibm/granite-7b-base)
 - **Teacher Model:** [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)
 
-## Usage
-This fork is compressed using ZipNN. To use the model, decompress the model tensors as discribed below and load the local weights.
-
-You need to [clone this repository](https://huggingface.co/royleibov/granite-7b-instruct-ZipNN-Compressed?clone=true) to decompress the model.
-
-Then:
-```bash
-cd granite-7b-instruct-ZipNN-Compressed
-```
-
-First decompress the model weights:
-```bash
-python3 zipnn_decompress_path.py --path .
-```
-
-Now just run the local version of the model.
 ### Use a pipeline as a high-level helper
 ```python
 from transformers import pipeline
+from zipnn import zipnn_hf_patch
+
+zipnn_hf_patch()
 
 messages = [
     {"role": "user", "content": "Who are you?"},
 ]
-pipe = pipeline("text-generation", model="PATH_TO_MODEL") # "." if in directory
+pipe = pipeline("text-generation", model="royleibov/granite-7b-instruct-ZipNN-Compressed")
 pipe(messages)
 ```
 
 ### Load model directly
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
+from zipnn import zipnn_hf_patch
+
+zipnn_hf_patch()
 
-tokenizer = AutoTokenizer.from_pretrained("PATH_TO_MODEL") # "." if in directory
-model = AutoModelForCausalLM.from_pretrained("PATH_TO_MODEL") # "." if in directory
+tokenizer = AutoTokenizer.from_pretrained("royleibov/granite-7b-instruct-ZipNN-Compressed")
+model = AutoModelForCausalLM.from_pretrained("royleibov/granite-7b-instruct-ZipNN-Compressed")
 ```
 
 ## Prompt Template
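
For readers who want to go one step past the snippets added in this commit, below is a minimal generation sketch. It is not part of the diff: the model ID simply reuses the repository name from the README, the chat-template call assumes the tokenizer defines a chat template (as the prompt-template section of the card suggests), and `max_new_tokens=128` is an illustrative value.
```python
# Minimal sketch: load the ZipNN-compressed checkpoint and generate one reply.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from zipnn import zipnn_hf_patch

zipnn_hf_patch()  # run before from_pretrained so the compressed weights are decompressed on load

model_id = "royleibov/granite-7b-instruct-ZipNN-Compressed"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "Who are you?"}]
# Assumes the tokenizer ships a chat template for the system/user/assistant format.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```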