Update README.md
# ChatGLM-6B + ONNX
This model is exported from [ChatGLM-6b](https://huggingface.co/THUDM/chatglm-6b) with int8 quantization and optimized for [ONNXRuntime](https://onnxruntime.ai/) inference. The export code is available in [this repo](https://github.com/K024/chatglm-q).
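The actual export pipeline lives in the chatglm-q repo linked above. Purely as an illustration of what u8s8 dynamic quantization looks like with ONNXRuntime's stock tooling, here is a generic sketch; the file names are placeholders, not files from this repo:

```python
# Rough sketch only: generic ONNXRuntime dynamic quantization (u8s8).
# The real export logic is in https://github.com/K024/chatglm-q;
# "chatglm-6b-fp32.onnx" / "chatglm-6b-u8s8.onnx" are placeholder names.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="chatglm-6b-fp32.onnx",   # float32 export of the model
    model_output="chatglm-6b-u8s8.onnx",  # int8-quantized result
    weight_type=QuantType.QInt8,          # s8 weights; activations default to u8
)
```

With `QInt8` weights and the default uint8 activations, the quantizer emits exactly the `MatMulInteger` (u8s8) and `DynamicQuantizeLinear` operators discussed below.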
Inference code with ONNXRuntime is uploaded with the model. Install the requirements and run `streamlit run web-ui.py` to start chatting. The `MatMulInteger` (for the u8s8 data type) and `DynamicQuantizeLinear` operators are currently supported only on CPU, so inference runs on CPU only.
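Outside the bundled web UI, the exported model can in principle be opened directly with ONNXRuntime. A minimal loading sketch, assuming a placeholder model file name (check the repository contents for the actual path):

```python
# Minimal sketch: open the exported model with ONNXRuntime on CPU.
# "chatglm-6b-u8s8.onnx" is a placeholder; use the actual file from this repo.
import onnxruntime as ort

session = ort.InferenceSession(
    "chatglm-6b-u8s8.onnx",
    providers=["CPUExecutionProvider"],  # the quantized ops above are CPU-only
)
print([(i.name, i.shape) for i in session.get_inputs()])
```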
## Usage
```sh
git lfs clone https://huggingface.co/K024/ChatGLM-6b-onnx-u8s8
cd ChatGLM-6b-onnx-u8s8
pip install -r requirements.txt
streamlit run web-ui.py
```
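As an alternative to `git lfs clone` (which recent Git LFS versions mark as deprecated in favor of a plain `git clone`), the files can also be fetched with the `huggingface_hub` client. This is a sketch, assuming the repo is served by a Hugging Face-compatible hub:

```python
# Alternative download sketch using huggingface_hub (assumption: the repo is
# reachable through a Hugging Face-compatible hub; set HF_ENDPOINT for mirrors).
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="K024/ChatGLM-6b-onnx-u8s8")
print(local_dir)  # local path of the downloaded snapshot
```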
The code is released under the MIT license.
Model weights are released under the same license as ChatGLM-6b; see [MODEL LICENSE](https://huggingface.co/THUDM/chatglm-6b/blob/main/MODEL_LICENSE).