hf-llm-api / README.md
Hansimov's picture
:gem: [Feature] New model enabled: gemma-7b
77b5a47
|
raw
history blame
4.18 kB
---
title: HF LLM API
emoji: ☯️
colorFrom: gray
colorTo: gray
sdk: docker
app_port: 23333
---
## HF-LLM-API
Huggingface LLM Inference API in OpenAI message format.
Project link: https://github.com/Hansimov/hf-llm-api
## Features
- Available Models (2024/01/22): [#5](https://github.com/Hansimov/hf-llm-api/issues/5)
- `mistral-7b`, `mixtral-8x7b`, `nous-mixtral-8x7b`, `gemma-7b`
- Adaptive prompt templates for different models
- Support OpenAI API format
- Enable api endpoint via official `openai-python` package
- Support both stream and no-stream response
- Support API Key via both HTTP auth header and env varible [#4](https://github.com/Hansimov/hf-llm-api/issues/4)
- Docker deployment
## Run API service
### Run in Command Line
**Install dependencies:**
```bash
# pipreqs . --force --mode no-pin
pip install -r requirements.txt
```
**Run API:**
```bash
python -m apis.chat_api
```
## Run via Docker
**Docker build:**
```bash
sudo docker build -t hf-llm-api:1.0 . --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy
```
**Docker run:**
```bash
# no proxy
sudo docker run -p 23333:23333 hf-llm-api:1.0
# with proxy
sudo docker run -p 23333:23333 --env http_proxy="http://<server>:<port>" hf-llm-api:1.0
```
## API Usage
### Using `openai-python`
See: [`examples/chat_with_openai.py`](https://github.com/Hansimov/hf-llm-api/blob/main/examples/chat_with_openai.py)
```py
from openai import OpenAI
# If runnning this service with proxy, you might need to unset `http(s)_proxy`.
base_url = "http://127.0.0.1:23333"
# Your own HF_TOKEN
api_key = "hf_xxxxxxxxxxxxxxxx"
# use below as non-auth user
# api_key = "sk-xxx"
client = OpenAI(base_url=base_url, api_key=api_key)
response = client.chat.completions.create(
model="mixtral-8x7b",
messages=[
{
"role": "user",
"content": "what is your model",
}
],
stream=True,
)
for chunk in response:
if chunk.choices[0].delta.content is not None:
print(chunk.choices[0].delta.content, end="", flush=True)
elif chunk.choices[0].finish_reason == "stop":
print()
else:
pass
```
### Using post requests
See: [`examples/chat_with_post.py`](https://github.com/Hansimov/hf-llm-api/blob/main/examples/chat_with_post.py)
```py
import ast
import httpx
import json
import re
# If runnning this service with proxy, you might need to unset `http(s)_proxy`.
chat_api = "http://127.0.0.1:23333"
# Your own HF_TOKEN
api_key = "hf_xxxxxxxxxxxxxxxx"
# use below as non-auth user
# api_key = "sk-xxx"
requests_headers = {}
requests_payload = {
"model": "mixtral-8x7b",
"messages": [
{
"role": "user",
"content": "what is your model",
}
],
"stream": True,
}
with httpx.stream(
"POST",
chat_api + "/chat/completions",
headers=requests_headers,
json=requests_payload,
timeout=httpx.Timeout(connect=20, read=60, write=20, pool=None),
) as response:
# https://docs.aiohttp.org/en/stable/streams.html
# https://github.com/openai/openai-cookbook/blob/main/examples/How_to_stream_completions.ipynb
response_content = ""
for line in response.iter_lines():
remove_patterns = [r"^\s*data:\s*", r"^\s*\[DONE\]\s*"]
for pattern in remove_patterns:
line = re.sub(pattern, "", line).strip()
if line:
try:
line_data = json.loads(line)
except Exception as e:
try:
line_data = ast.literal_eval(line)
except:
print(f"Error: {line}")
raise e
# print(f"line: {line_data}")
delta_data = line_data["choices"][0]["delta"]
finish_reason = line_data["choices"][0]["finish_reason"]
if "role" in delta_data:
role = delta_data["role"]
if "content" in delta_data:
delta_content = delta_data["content"]
response_content += delta_content
print(delta_content, end="", flush=True)
if finish_reason == "stop":
print()
```