Update README.md
README.md
vLLM can be deployed as a server that implements the OpenAI API protocol. This allows vLLM to be used as a drop-in replacement for applications that use the OpenAI API.

```shell
python3 -m vllm.entrypoints.openai.api_server --model Copycats/EEVE-Korean-Instruct-10.8B-v1.0-AWQ --quantization awq --dtype half
```

- `--model`: Hugging Face model path
- `--quantization`: `"awq"`
- `--dtype`: `"half"` for FP16; recommended for AWQ quantization.

#### Querying the model using OpenAI Chat API:

You can use the create chat completion endpoint to communicate with the model in a chat-like interface: