---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
inference: false
tags:
- pytorch
- inferentia2
- neuron
---
# Neuronx model for [upstage/SOLAR-10.7B-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-v1.0)

This repository contains [**AWS Inferentia2**](https://aws.amazon.com/ec2/instance-types/inf2/) and [`neuronx`](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/) compatible checkpoints for [upstage/SOLAR-10.7B-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-v1.0).
You can find detailed information about the base model on its [Model Card](https://huggingface.co/upstage/SOLAR-10.7B-v1.0).

This model card also includes instructions for compiling other SOLAR models with other settings if this combination isn't quite what you are looking for.

This model has been exported to the `neuron` format using specific `input_shapes` and `compiler` parameters detailed in the paragraphs below.

It has been compiled to run on an inf2.24xlarge instance on AWS. Note that while the inf2.24xlarge has 12 NeuronCores, this compilation uses only 8: for this model and configuration, the number of cores has to be a power of 2.

**This has been compiled using version 2.16 of the Neuron SDK. Make sure your environment has version 2.16 installed.**

Please refer to the 🤗 `optimum-neuron` [documentation](https://huggingface.co/docs/optimum-neuron/main/en/guides/models#configuring-the-export-of-a-generative-model) for an explanation of these parameters.

## Set up the environment

First, use the [DLAMI image from Hugging Face](https://aws.amazon.com/marketplace/pp/prodview-gr3e6yiscria2). It has most of the utilities and drivers preinstalled. However, you may need to update to version 2.16 to use these binaries.

```shell
sudo apt-get update -y \
 && sudo apt-get install -y --no-install-recommends \
    aws-neuronx-dkms=2.15.9.0 \
    aws-neuronx-collectives=2.19.7.0-530fb3064 \
    aws-neuronx-runtime-lib=2.19.5.0-97e2d271b \
    aws-neuronx-tools=2.16.1.0

pip3 install --upgrade \
    neuronx-cc==2.12.54.0 \
    torch-neuronx==1.13.1.1.13.0 \
    transformers-neuronx==0.9.474 \
    --extra-index-url=https://pip.repos.neuron.amazonaws.com

pip3 install git+https://github.com/huggingface/optimum-neuron.git
```
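
After installing, it can be worth confirming that the pinned versions actually resolved. A minimal check, assuming the pip package names used above:

```python
# Sanity check: print the installed versions of the Neuron Python packages.
from importlib.metadata import version, PackageNotFoundError

for pkg in ("neuronx-cc", "torch-neuronx", "transformers-neuronx", "optimum-neuron"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```
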
## Running inference from this repository

```python
from optimum.neuron import pipeline

p = pipeline('text-generation', 'jburtoft/SOLAR-10.7B-v1.0-neuron-24xlarge-2.16-8core-4096')
p("Hi, my name is ",
  do_sample=True,
  top_k=10,
  temperature=0.1,
  top_p=0.95,
  num_return_sequences=1,
  max_length=200,
)
```
Sample output:
```
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
2024-Jan-13 04:48:45.0857 15117:15313 [6] nccl_net_ofi_init:1415 CCOM WARN NET/OFI aws-ofi-nccl initialization failed
2024-Jan-13 04:48:45.0857 15117:15313 [6] init.cc:137 CCOM WARN OFI plugin initNet() failed is EFA enabled?
[{'generated_text': 'Hi, my name is ***** ***** I am calling from ***** ***** and I am calling to see if you have any questions about your ***** ***** account.\nHi, my name is ***** ***** I am calling from ***** ***** and I am calling to see if you have any questions about your ***** ***** account.\nHi, my name is ***** ***** I am calling from ***** ***** and I am calling to see if you have any questions about your ***** ***** account.\nHi, my name is ***** ***** I am calling from ***** ***** and I am calling to see if you have any questions about your ***** ***** account.\nHi, my name is ***** ***** I am calling from ***** ***** and I am calling to see if'}]
```
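
If you prefer the lower-level API over `pipeline`, a roughly equivalent sketch (same repository, same sampling settings) is:

```python
from optimum.neuron import NeuronModelForCausalLM
from transformers import AutoTokenizer

repo = "jburtoft/SOLAR-10.7B-v1.0-neuron-24xlarge-2.16-8core-4096"
model = NeuronModelForCausalLM.from_pretrained(repo)  # already exported, so no export=True
tokenizer = AutoTokenizer.from_pretrained(repo)

inputs = tokenizer("Hi, my name is ", return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,
    top_k=10,
    temperature=0.1,
    top_p=0.95,
    max_length=200,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```
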
## Compiling for different instances or settings

If this repository doesn't have the exact version or settings you need, you can compile your own.

```python
from optimum.neuron import NeuronModelForCausalLM

# num_cores should be changed based on the instance: an inf2.24xlarge has 6 Neuron
# processors (two cores each), so 12 cores total. For this model, the number of
# cores has to be a power of 2.
input_shapes = {"batch_size": 1, "sequence_length": 4096}
compiler_args = {"num_cores": 8, "auto_cast_type": 'fp16'}
model = NeuronModelForCausalLM.from_pretrained("upstage/SOLAR-10.7B-v1.0", export=True, **compiler_args, **input_shapes)
model.save_pretrained("SOLAR-10.7B-v1.0-neuron-24xlarge-2.16-8core-4096")

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("upstage/SOLAR-10.7B-v1.0")
tokenizer.save_pretrained("SOLAR-10.7B-v1.0-neuron-24xlarge-2.16-8core-4096")
```
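
If you want to share the compiled artifacts, one option is to upload the saved folder with `huggingface_hub`. A sketch, where the `your-username/...` repo id is a hypothetical placeholder:

```python
from huggingface_hub import HfApi

api = HfApi()
repo_id = "your-username/SOLAR-10.7B-v1.0-neuron-24xlarge-2.16-8core-4096"  # hypothetical target repo
api.create_repo(repo_id, exist_ok=True)
api.upload_folder(
    folder_path="SOLAR-10.7B-v1.0-neuron-24xlarge-2.16-8core-4096",
    repo_id=repo_id,
)
```
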
This repository contains tags specific to versions of `neuronx`. When using this model with 🤗 `optimum-neuron`, load the repo revision that matches the version of `neuronx` you are using, so that you get the right serialized checkpoints.
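
For example, to pin a revision when loading (the revision label below is hypothetical; check the repository's branches and tags for the actual names):

```python
from optimum.neuron import NeuronModelForCausalLM

# "2.16" is a hypothetical revision label; use the tag matching your neuronx version.
model = NeuronModelForCausalLM.from_pretrained(
    "jburtoft/SOLAR-10.7B-v1.0-neuron-24xlarge-2.16-8core-4096",
    revision="2.16",
)
```
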

## Arguments passed during export

**input_shapes**

```json
{
  "batch_size": 1,
  "sequence_length": 4096
}
```

**compiler_args**

```json
{
  "auto_cast_type": "fp16",
  "num_cores": 8
}
```
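
To double-check what a given revision was exported with, you can inspect its `config.json`. A sketch; the exact schema, including the `neuron` key used below, may vary by `optimum-neuron` version:

```python
import json
from huggingface_hub import hf_hub_download

path = hf_hub_download("jburtoft/SOLAR-10.7B-v1.0-neuron-24xlarge-2.16-8core-4096", "config.json")
with open(path) as f:
    config = json.load(f)
# The export arguments are typically recorded under a "neuron" key (assumption; schema varies).
print(json.dumps(config.get("neuron", {}), indent=2))
```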