yinsong1986 committed
Commit 114c6dc • 1 Parent(s): 52d2d5a
Update README.md
README.md CHANGED
````diff
@@ -81,7 +81,7 @@ there were some limitations on its performance on longer context. Motivated by i
 - **Contact:** [GitHub issues](https://github.com/awslabs/extending-the-context-length-of-open-source-llms/issues)
 - **Inference Code** [Github Repo](https://github.com/awslabs/extending-the-context-length-of-open-source-llms/blob/main/MistralLite/)
 
-## How to Use
+## How to Use MistralLite from Python Code (HuggingFace transformers) ##
 
 **Important** - For an end-to-end example Jupyter notebook, please refer to [this link](https://github.com/awslabs/extending-the-context-length-of-open-source-llms/blob/main/MistralLite/huggingface-transformers/example_usage.ipynb).
 
````
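The renamed section points to an end-to-end notebook for using MistralLite from Python. As a quick orientation, the sketch below shows a minimal version of that workflow with HuggingFace transformers; the model id `amazon/MistralLite`, the generation parameters, and the hardware assumptions (a CUDA GPU with `accelerate` installed) are assumptions here, and the linked notebook remains the authoritative example.

```python
# Illustrative sketch only; the linked example_usage.ipynb is the authoritative reference.
# Assumes the model id "amazon/MistralLite", a CUDA GPU, and `accelerate` installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "amazon/MistralLite"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

# MistralLite prompt template, as shown in the README's example prompt.
prompt = "<|prompter|>What are the main challenges to support a long context for LLM?</s><|assistant|>"

sequences = generator(
    prompt,
    max_new_tokens=400,
    do_sample=False,
    return_full_text=False,
)
for seq in sequences:
    print(seq["generated_text"])
```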
````diff
@@ -132,7 +132,7 @@ for seq in sequences:
 <|prompter|>What are the main challenges to support a long context for LLM?</s><|assistant|>
 ```
 
-## How to Serve
+## How to Serve MistralLite on TGI ##
 **Important:**
 - For an end-to-end example Jupyter notebook using the native TGI container, please refer to [this link](https://github.com/awslabs/extending-the-context-length-of-open-source-llms/blob/main/MistralLite/tgi/example_usage.ipynb).
 - If the **input context length is greater than 12K tokens**, it is recommended to use a custom TGI container; please refer to [this link](https://github.com/awslabs/extending-the-context-length-of-open-source-llms/blob/main/MistralLite/tgi-custom/example_usage.ipynb).
````
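For context on what the TGI notebooks cover, a typical pattern is to start a TGI container for the model and then query it with the `text_generation` Python client. The container image tag, flags, port, and model id below are illustrative assumptions, not the tested values from the linked notebooks.

```python
# Illustrative TGI client call; see the linked notebooks for the tested setup.
# Assumes a TGI server is already running for the model, started with something like:
#   docker run --gpus all -p 8080:80 ghcr.io/huggingface/text-generation-inference:latest \
#       --model-id amazon/MistralLite --max-input-length 16000 --max-total-tokens 16384
# (image tag, flags, and model id are assumptions, not the notebooks' exact values)
from text_generation import Client  # pip install text-generation

client = Client("http://127.0.0.1:8080", timeout=60)

prompt = "<|prompter|>What are the main challenges to support a long context for LLM?</s><|assistant|>"
response = client.generate(prompt, max_new_tokens=400)
print(response.generated_text)
```

As the hunk above notes, inputs beyond roughly 12K tokens call for the custom TGI container rather than the native one.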
````diff
@@ -199,7 +199,7 @@ result = invoke_tgi(prompt)
 **Important** - When using MistralLite for inference for the first time, it may require a brief 'warm-up' period that can take tens of seconds. However, subsequent inferences should be faster and return results in a more timely manner. This warm-up period is normal and should not affect the overall performance of the system once the initialisation period has been completed.
 
 
-## How to Deploy
+## How to Deploy MistralLite on Amazon SageMaker ##
 **Important:**
 - For an end-to-end example Jupyter notebook using the SageMaker built-in container, please refer to [this link](https://github.com/awslabs/extending-the-context-length-of-open-source-llms/blob/main/MistralLite/sagemaker-tgi/example_usage.ipynb).
 - If the **input context length is greater than 12K tokens**, it is recommended to use a custom Docker container; please refer to [this link](https://github.com/awslabs/extending-the-context-length-of-open-source-llms/blob/main/MistralLite/sagemaker-tgi-custom/example_usage.ipynb).
````
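As a rough sketch of the SageMaker path, the built-in TGI container can be deployed with the SageMaker Python SDK roughly as follows; the image version, instance type, environment values, and model id are assumptions here, and the linked notebooks give the tested configuration.

```python
# Illustrative SageMaker deployment sketch; the linked notebooks show the tested configuration.
# The image version, instance type, environment values, and model id below are assumptions.
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()                     # SageMaker execution role
image_uri = get_huggingface_llm_image_uri("huggingface")  # built-in TGI (LLM) container

model = HuggingFaceModel(
    image_uri=image_uri,
    role=role,
    env={
        "HF_MODEL_ID": "amazon/MistralLite",  # assumed Hugging Face model id
        "MAX_INPUT_LENGTH": "16000",
        "MAX_TOTAL_TOKENS": "16384",
    },
)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",            # illustrative GPU instance type
)

prompt = "<|prompter|>What are the main challenges to support a long context for LLM?</s><|assistant|>"
result = predictor.predict({"inputs": prompt, "parameters": {"max_new_tokens": 400}})
print(result)
```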
````diff
@@ -307,7 +307,7 @@ print(result)
 ```
 
 
-## How to Serve
+## How to Serve MistralLite on vLLM ##
 Documentation on installing and using vLLM [can be found here](https://vllm.readthedocs.io/en/latest/).
 
 **Important** - For an end-to-end example Jupyter notebook, please refer to [this link](https://github.com/awslabs/extending-the-context-length-of-open-source-llms/blob/main/MistralLite/vllm/example_usage.ipynb).
````
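For a sense of the vLLM route, a minimal offline-inference sketch looks like the following; the model id and sampling settings are assumptions, and the vLLM documentation and the linked notebook cover the recommended server-based deployment.

```python
# Illustrative offline-inference sketch with vLLM; the model id and sampling
# settings are assumptions, and the linked notebook covers the server-based setup.
from vllm import LLM, SamplingParams

llm = LLM(model="amazon/MistralLite")
sampling_params = SamplingParams(temperature=0, max_tokens=400)

prompt = "<|prompter|>What are the main challenges to support a long context for LLM?</s><|assistant|>"
outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```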