This repository contains [**AWS Inferentia2**](https://aws.amazon.com/ec2/instance-types/inf2/) and [`neuronx`](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/) compatible checkpoints for [upstage/SOLAR-10.7B-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-v1.0).
You can find detailed information about the base model on its [Model Card](https://huggingface.co/upstage/SOLAR-10.7B-v1.0).

This model card also includes instructions for how to compile other SOLAR models with other settings if this combination isn't what you are looking for.

This model has been exported to the `neuron` format using specific `input_shapes` and `compiler` parameters detailed in the paragraphs below.
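
The exact export settings are recorded with the checkpoint; one way to inspect them (the repo id is a placeholder, and this assumes `optimum-neuron` stores its export settings under a `neuron` key in `config.json`):

```
import json

from huggingface_hub import hf_hub_download

# Fetch only the config file and print the recorded Neuron export settings, if any.
cfg_path = hf_hub_download(repo_id="<this-repo-id>", filename="config.json")
with open(cfg_path) as f:
    config = json.load(f)
print(config.get("neuron", "no `neuron` section found"))
```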

## Set up the environment

First, use the [DLAMI image from Hugging Face](https://aws.amazon.com/marketplace/pp/prodview-gr3e6yiscria2). It has most of the utilities and drivers preinstalled, but it hasn't been updated to 2.16 as of 1/13/24.

However, you will need version 2.16 to use these binaries; 2.16 shows a significant performance increase over 2.15 for Llama-based models.

The commands below will update your 2.15 libraries to 2.16.

```
sudo apt-get update -y \
```
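
After running the update, confirm that the 2.16 packages are active; a minimal check (package names as published on PyPI):

```
# Print installed Neuron-related package versions to confirm the update took effect.
from importlib.metadata import PackageNotFoundError, version

for pkg in ("neuronx-cc", "torch-neuronx", "optimum-neuron"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")
```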

```
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
[{'generated_text': 'Hi, my name is ***** ***** I am calling from ***** ***** and I am calling to see if you have any questions about your ***** ***** account.\nHi, my name is ***** ***** I am calling from ***** ***** and I am calling to see if you have any questions about your ***** ***** account.\nHi, my name is ***** ***** I am calling from ***** ***** and I am calling to see if you have any questions about your ***** ***** account.\nHi, my name is ***** ***** I am calling from ***** ***** and I am calling to see if you have any questions about your ***** ***** account.\nHi, my name is ***** ***** I am calling from ***** ***** and I am calling to see if'}]
```
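
Output like the above comes from an ordinary generation call; a minimal sketch, assuming the standard `optimum-neuron` generation API and loading from the local export directory (substitute this repo's id to load from the Hub):

```
from optimum.neuron import NeuronModelForCausalLM
from transformers import AutoTokenizer

# Load the precompiled checkpoint and tokenizer from the export directory.
path = "SOLAR-10.7B-v1.0-neuron-24xlarge-2.16-8core-4096"
model = NeuronModelForCausalLM.from_pretrained(path)
tokenizer = AutoTokenizer.from_pretrained(path)

inputs = tokenizer("Hi, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```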

## Compiling for different instances or settings

If this repository doesn't have the exact version or settings, you can compile your own.

The export follows the standard `optimum-neuron` flow; the `compiler_args` values below are inferred from this checkpoint's name, so adjust `num_cores` and `auto_cast_type` for your instance.

```
from optimum.neuron import NeuronModelForCausalLM

# num_cores should be changed based on the instance. inf2.24xlarge has 6 Neuron
# processors (two cores each), so 12 cores in total.
input_shapes = {"batch_size": 1, "sequence_length": 4096}
# Assumed values: 8 cores per the "8core" checkpoint name; pick the cast type you need.
compiler_args = {"num_cores": 8, "auto_cast_type": "bf16"}
model = NeuronModelForCausalLM.from_pretrained(
    "upstage/SOLAR-10.7B-v1.0", export=True, **compiler_args, **input_shapes
)
model.save_pretrained("SOLAR-10.7B-v1.0-neuron-24xlarge-2.16-8core-4096")

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("upstage/SOLAR-10.7B-v1.0")
tokenizer.save_pretrained("SOLAR-10.7B-v1.0-neuron-24xlarge-2.16-8core-4096")
```
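
To share the compiled artifacts, you can upload the saved folder to the Hub; a sketch using `huggingface_hub` (the destination repo id is a placeholder):

```
from huggingface_hub import HfApi

# Create (or reuse) a placeholder Hub repo and upload the export directory to it.
api = HfApi()
api.create_repo("<your-username>/SOLAR-10.7B-v1.0-neuron", exist_ok=True)
api.upload_folder(
    folder_path="SOLAR-10.7B-v1.0-neuron-24xlarge-2.16-8core-4096",
    repo_id="<your-username>/SOLAR-10.7B-v1.0-neuron",
)
```
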
This repository contains tags specific to versions of `neuronx`. When using it with 🤗 `optimum-neuron`, select the repo revision that matches the version of `neuronx` you are running so that the correct serialized checkpoints are loaded.
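
For example, to pin a revision when loading (a sketch; the repo id and tag name are placeholders, so check this repository's tags page for the actual revisions):

```
from optimum.neuron import NeuronModelForCausalLM

# Placeholder repo id and tag; use the real repo id and one of the tags
# listed on the repository page.
model = NeuronModelForCausalLM.from_pretrained("<repo-id>", revision="2.16")
```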