Using FastAPI to query the model
#99 opened by moc1pher
How do I use FastAPI to query the model? Is there an example I can use?
tabbyAPI is based on FastAPI, so you can probably learn a lot from it:
https://github.com/theroyallab/tabbyAPI
Is there an option that is pure FastAPI?
Maybe this is better for your use case: https://github.com/c0sogi/LLMChat
It's based on FastAPI, but without the OpenAI API layer.
I just like tabbyAPI for running local models with exl2 quants and consuming them the way you would the OpenAI API.
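For example, here is a minimal sketch of querying it with the official openai Python client. It assumes a tabbyAPI instance running on its default local port (5000) with an API key set in its config; adjust both to your setup:

from openai import OpenAI

# Point the official OpenAI client at the local tabbyAPI server
# (port 5000 and the API key are assumptions; match your tabbyAPI config).
client = OpenAI(base_url="http://127.0.0.1:5000/v1", api_key="your-api-key")

completion = client.chat.completions.create(
    model="local-model",  # tabbyAPI serves whichever model it has loaded
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)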
I wrote the following example that you can use:
from typing import Optional

import torch
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer

app = FastAPI()

# Load the model and tokenizer once at startup
# ("your-model-name" is a placeholder for whatever checkpoint you serve)
model_name = "your-model-name"
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)

# In-memory store for conversation history, keyed by session_id
session_history = {}

class ChatInput(BaseModel):
    session_id: str
    prompt: str
    clear_history: bool = False
    tone: Optional[str] = None

class ChatOutput(BaseModel):
    response: str

@app.post("/bot", response_model=ChatOutput)
def chat_bot(input_data: ChatInput):
    try:
        session_id = input_data.session_id
        prompt = input_data.prompt
        clear_history = input_data.clear_history
        tone = input_data.tone

        # Optionally reset the conversation before handling this request
        if clear_history:
            session_history.pop(session_id, None)
        if session_id not in session_history:
            session_history[session_id] = []
        chat_history = session_history[session_id]

        if prompt:
            if tone:
                prompt = f"[Tone: {tone}] {prompt}"
            chat_history.append({"role": "user", "content": prompt})

            # Build the model input from the full conversation so far
            input_ids = tokenizer.apply_chat_template(
                chat_history, add_generation_prompt=True, return_tensors="pt"
            ).to(device)
            input_length = input_ids.shape[1]

            # Generate output from the model
            outputs = model.generate(
                input_ids,
                max_new_tokens=200,
                do_sample=True,
            )

            # Decode only the newly generated tokens, not the echoed prompt
            generated_tokens = outputs[0][input_length:]
            assistant_response = tokenizer.decode(generated_tokens, skip_special_tokens=True)
            chat_history.append({"role": "assistant", "content": assistant_response})

            # Return the response to the client
            return {"response": assistant_response}
        else:
            return {"response": "Chat history cleared." if clear_history else "No prompt provided."}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
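To try it out, save the app (as main.py here, just an assumed filename), start it with uvicorn, and POST a JSON body to /bot. The session_id ties successive requests to one conversation:

# Start the server first:
#   uvicorn main:app --host 0.0.0.0 --port 8000

import requests

resp = requests.post(
    "http://127.0.0.1:8000/bot",
    json={"session_id": "demo", "prompt": "Hello!", "tone": "friendly"},
)
print(resp.json()["response"])

Note that session_history lives in process memory, so conversations are lost on restart and won't be shared across multiple uvicorn workers; for anything beyond a single-process demo you'd want an external store like Redis.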