Update README

README.md

## <a id="models"></a> Usage

To use these models, we highly recommend installing the OpenChat package by following the [installation guide](https://github.com/imoneoi/openchat/#installation) and using the OpenChat OpenAI-compatible API server by running the serving command from the table below. The server is optimized for high-throughput deployment using [vLLM](https://github.com/vllm-project/vllm) and can run on a GPU with at least 48GB RAM or two consumer GPUs with tensor parallelism. To enable tensor parallelism, append `--tensor-parallel-size 2` to the serving command.

When started, the server listens at `localhost:18888` for requests and is compatible with the [OpenAI ChatCompletion API specifications](https://platform.openai.com/docs/api-reference/chat). See the example request below for reference. Additionally, you can access the [OpenChat Web UI](#web-ui) for a user-friendly experience.

To deploy the server as an online service, use `--api-keys sk-KEY1 sk-KEY2 ...` to specify allowed API keys and `--disable-log-requests --disable-log-stats --log-file openchat.log` for logging only to a file. We recommend using a [HTTPS gateway](https://fastapi.tiangolo.com/es/deployment/concepts/#security-https) in front of the server for security purposes.
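
Combined with a serving command from the table below, a full online-service invocation might look like this (a hypothetical sketch for OpenChat 3.2; the API keys are placeholders):

```shell
# Serve OpenChat 3.2 with API-key authentication and file-only logging
python -m ochat.serving.openai_api_server \
    --model-type openchat_v3.2 --model openchat/openchat_v3.2 \
    --engine-use-ray --worker-use-ray --max-num-batched-tokens 5120 \
    --api-keys sk-KEY1 sk-KEY2 \
    --disable-log-requests --disable-log-stats --log-file openchat.log
```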

<details>
<summary>Example request (click to expand)</summary>

```bash
curl http://localhost:18888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openchat_v3.2",
    "messages": [{"role": "user", "content": "You are a large language model named OpenChat. Write a poem to describe yourself"}]
  }'
```

</details>
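
The same request can be built programmatically. Below is a minimal Python sketch using only the standard library; `build_request` is an illustrative helper (not part of the OpenChat package) that constructs, but does not send, the request:

```python
import json
import urllib.request

API_URL = "http://localhost:18888/v1/chat/completions"


def build_request(prompt: str, model: str = "openchat_v3.2") -> urllib.request.Request:
    """Build an OpenAI-compatible chat-completion request for the local server."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request(
    "You are a large language model named OpenChat. Write a poem to describe yourself"
)
# urllib.request.urlopen(req) would send it once the server is running.
print(req.full_url)
```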

| Model | Size | Context | Weights | Serving |
|---------------|------|---------|-------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------|
| OpenChat 3.1 | 13B | 4096 | [Huggingface](https://huggingface.co/openchat/openchat_v3.1) | `python -m ochat.serving.openai_api_server --model-type openchat_v3.1_llama2 --model openchat/openchat_v3.1 --engine-use-ray --worker-use-ray --max-num-batched-tokens 5120` |
| OpenChat 3.2 | 13B | 4096 | [Huggingface](https://huggingface.co/openchat/openchat_v3.2) | `python -m ochat.serving.openai_api_server --model-type openchat_v3.2 --model openchat/openchat_v3.2 --engine-use-ray --worker-use-ray --max-num-batched-tokens 5120` |
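
As noted above, running on two consumer GPUs only requires appending the tensor-parallelism flag to a serving command from the table; a hypothetical invocation for OpenChat 3.2:

```shell
# Shard the model across two GPUs with vLLM tensor parallelism
python -m ochat.serving.openai_api_server \
    --model-type openchat_v3.2 --model openchat/openchat_v3.2 \
    --engine-use-ray --worker-use-ray --max-num-batched-tokens 5120 \
    --tensor-parallel-size 2
```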

For inference with Huggingface Transformers (slow and not recommended), follow the conversation template provided below:

<details>
<summary>Conversation templates (click to expand)</summary>