Deploy hundreds of open source models on one GPU using LoRAX
@ decorator in ChatGPT. Once the function is selected, the model will either extract or improve your prompt (depending on how you ask).

async def query_web_scraper(url: str) -> dict:
    scraper = WebScraper(headless=False)
    return await scraper.query_page_content(url)
# First API call: send the query and function description to the model
response = ollama.chat(
    model=model,
    messages=messages,
    tools=[
        {
            'type': 'function',
            'function': {
                'name': 'query_web_scraper',
                'description': 'Scrapes the content of a web page and returns a structured JSON object with titles, articles, and associated links.',
                'parameters': {
                    'type': 'object',
                    'properties': {
                        'url': {
                            'type': 'string',
                            'description': 'The URL of the web page to scrape.',
                        },
                    },
                    'required': ['url'],
                },
            },
        },
    ],
)
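Once `ollama.chat` returns, the tool call the model chose still has to be executed locally and its result fed back in a second API call. Below is a minimal, self-contained sketch of that dispatch step, assuming the `message.tool_calls` response shape used by the ollama Python client; the simulated response and the stand-in scraper are illustrative, not from the original post.

```python
import asyncio

# Hypothetical stand-in for the scraper above, so this sketch runs on its own.
async def query_web_scraper(url: str) -> dict:
    return {"url": url, "titles": ["Example headline"]}

# Map the tool names advertised to the model onto the local callables.
AVAILABLE_TOOLS = {"query_web_scraper": query_web_scraper}

async def dispatch_tool_calls(response: dict) -> list:
    """Run each tool call the model requested and collect the results."""
    results = []
    for call in response.get("message", {}).get("tool_calls", []):
        fn = AVAILABLE_TOOLS[call["function"]["name"]]
        results.append(await fn(**call["function"]["arguments"]))
    return results

# Simulated response in the shape returned by the ollama Python client.
fake_response = {
    "message": {
        "role": "assistant",
        "tool_calls": [
            {"function": {"name": "query_web_scraper",
                          "arguments": {"url": "https://example.com"}}}
        ],
    }
}

results = asyncio.run(dispatch_tool_calls(fake_response))
```

Each entry of `results` would then be appended to `messages` as a `tool` role message before calling `ollama.chat` again so the model can compose its final answer.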
docker pull apostacyh/vllm:lmcache-0.1.0

# The first vLLM instance listens at port 8000
model=mistralai/Mistral-7B-Instruct-v0.2  # Replace with your model name
sudo docker run --runtime nvidia --gpus '"device=0"' \
-v <Huggingface cache dir on your local machine>:/root/.cache/huggingface \
-p 8000:8000 \
--env "HF_TOKEN=<Your huggingface access token>" \
--ipc=host \
--network=host \
apostacyh/vllm:lmcache-0.1.0 \
--model $model --gpu-memory-utilization 0.6 --port 8000 \
--lmcache-config-file /lmcache/LMCache/examples/example-local.yaml
# The second vLLM instance listens at port 8001
model=mistralai/Mistral-7B-Instruct-v0.2 # Replace with your model name
sudo docker run --runtime nvidia --gpus '"device=1"' \
-v <Huggingface cache dir on your local machine>:/root/.cache/huggingface \
-p 8001:8001 \
--env "HF_TOKEN=<Your huggingface token>" \
--ipc=host \
--network=host \
apostacyh/vllm:lmcache-0.1.0 \
--model $model --gpu-memory-utilization 0.7 --port 8001 \
--lmcache-config-file /lmcache/LMCache/examples/example.yaml
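With both containers up, each instance exposes vLLM's OpenAI-compatible completions API on its port. A small sketch of spreading requests across the two servers round-robin; the endpoint URLs simply mirror the ports in the docker commands above, and the actual HTTP call is left to whatever client you prefer.

```python
from itertools import cycle

# The two LMCache-enabled vLLM servers started above; adjust to your deployment.
ENDPOINTS = [
    "http://localhost:8000/v1/completions",
    "http://localhost:8001/v1/completions",
]
_endpoint_iter = cycle(ENDPOINTS)

def build_request(prompt: str, model: str = "mistralai/Mistral-7B-Instruct-v0.2"):
    """Pick the next server round-robin and build an OpenAI-style payload."""
    url = next(_endpoint_iter)
    payload = {"model": model, "prompt": prompt, "max_tokens": 128}
    return url, payload

# Successive calls alternate between the two instances; POST the payload with
# any HTTP client, e.g. requests.post(url, json=payload).
url1, payload1 = build_request("What does LMCache store?")
url2, payload2 = build_request("What does LMCache store?")
```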
def formatting_prompts_func(examples):
    convos = examples["conversations"]
    texts = []
    # NOTE: the chat-template special tokens were stripped when the original
    # post was rendered as HTML; they are restored here assuming ChatML-style
    # <|im_start|> / <|im_end|> markers.
    mapper = {"system": "<|im_start|>system\n", "human": "\n<|im_start|>user\n", "gpt": "\n<|im_start|>assistant\n"}
    end_mapper = {"system": "<|im_end|>", "human": "<|im_end|>", "gpt": "<|im_end|>"}
    for convo in convos:
        text = "".join(f"{mapper[(turn := x['from'])]} {x['value']}\n{end_mapper[turn]}" for x in convo)
        texts.append(f"{text}{EOS_TOKEN}")
    return {"text": texts}

dataset = dataset.map(formatting_prompts_func, batched=True)
print(dataset['text'][8])
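Before mapping the whole dataset, the join logic can be sanity-checked on a single toy conversation. The markers and `EOS_TOKEN` below are illustrative stand-ins for the tokenizer-specific values used in the original script.

```python
# Illustrative stand-in for the tokenizer's real EOS token.
EOS_TOKEN = "</s>"

def format_convo(convo):
    """Join one ShareGPT-style conversation into a single training string."""
    mapper = {"system": "<|im_start|>system\n", "human": "\n<|im_start|>user\n", "gpt": "\n<|im_start|>assistant\n"}
    end_mapper = {"system": "<|im_end|>", "human": "<|im_end|>", "gpt": "<|im_end|>"}
    text = "".join(f"{mapper[(turn := x['from'])]} {x['value']}\n{end_mapper[turn]}" for x in convo)
    return f"{text}{EOS_TOKEN}"

convo = [
    {"from": "system", "value": "You are a helpful assistant."},
    {"from": "human", "value": "Say hi."},
    {"from": "gpt", "value": "Hi!"},
]
formatted = format_convo(convo)
print(formatted)
```

Each turn is wrapped in its start/end markers, and the EOS token is appended once at the end of the conversation, which is what the trainer uses to learn where a sample stops.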
Thanks for letting me know. I have updated the post with the correct link: https://colab.research.google.com/drive/1l9zh_VX0X4ylbzpGckCjH5yEflFsLW04?usp=sharing
TLDR:
BioLORD-2023 is a series of semantic language models for the biomedical domain, capable of representing clinical concepts and sentences in a semantic space aligned with human preferences. Our new multilingual version supports 50+ languages and is further finetuned on 7 European languages. These models were trained contrastively and through distillation, using a corpus that unifies the names of biomedical concepts and their descriptions in the same latent space. For concepts that did not have a human-written description in UMLS, we used information contained in the SnomedCT knowledge graph and the capabilities of ChatGPT to generate synthetic data and improve our results.

Thank you! 🤗
Thank you! I will try it again and let you know if there are any issues.