gemma-2b-orpo-GGUF
This is a GGUF quantized version of the gemma-2b-orpo model: an ORPO fine-tune of google/gemma-2b.
You can find more information, including evaluation and a training/usage notebook, in the gemma-2b-orpo model card.
Model in action
The model can run with all the libraries that are part of the Llama.cpp ecosystem.
If you need to apply the prompt template manually, take a look at the tokenizer_config.json of the original model.
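As a rough sketch of what applying the template by hand can look like, the helper below reproduces the single-turn prompt string used in the text generation example in this card. The name `build_prompt` is hypothetical, and the layout (including the space after the first `<|im_start|>`) is copied from that example rather than checked against tokenizer_config.json, so treat it as an assumption.

```python
def build_prompt(user_message: str) -> str:
    """Format a single user turn like the example prompt in this card.

    Assumption: the layout mirrors the card's example string; the
    authoritative template lives in the original model's tokenizer_config.json.
    """
    return (
        "<bos><|im_start|> user\n"
        f"{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


# Build the same prompt that the text generation example passes to the model.
prompt = build_prompt("Name the planets in the solar system")
print(repr(prompt))
```

The resulting string can be passed to the model in place of a hand-written prompt. Multi-turn conversations are not covered by this sketch.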
Run the model on a budget smartphone -> see my recent post
Here is a simple example with llama-cpp-python:
# huggingface-hub is needed by Llama.from_pretrained to download the file
! pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="anakin87/gemma-2b-orpo-GGUF",
    filename="gemma-2b-orpo.Q5_K_M.gguf",
    verbose=True  # due to a known bug, verbose must be True
)

# text generation - prompt template applied manually
llm("<bos><|im_start|> user\nName the planets in the solar system<|im_end|>\n<|im_start|>assistant\n", max_tokens=75)

# chat completion - prompt template automatically applied
llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "Please list some places to visit in Italy"
        }
    ]
)
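Both calls return OpenAI-style dictionaries rather than plain strings. Below is a minimal sketch for pulling the generated text out, assuming the usual llama-cpp-python response layout (`choices[0]["text"]` for plain text generation, `choices[0]["message"]["content"]` for chat completion). `extract_text` is a hypothetical helper, and the sample dictionaries are hand-written stand-ins, not real model output.

```python
def extract_text(response: dict) -> str:
    """Return the generated text from a completion or chat completion response.

    Assumption: the response follows the OpenAI-compatible layout that
    llama-cpp-python returns for non-streaming calls.
    """
    first_choice = response["choices"][0]
    if "message" in first_choice:
        # Chat completion: the text sits under message -> content.
        return first_choice["message"]["content"]
    # Plain text generation: the text sits directly under "text".
    return first_choice["text"]


# Hand-written stand-ins that only mimic the shape of real responses.
completion_like = {"choices": [{"text": "placeholder completion"}]}
chat_like = {
    "choices": [
        {"message": {"role": "assistant", "content": "placeholder reply"}}
    ]
}

print(extract_text(completion_like))
print(extract_text(chat_like))
```

With a loaded model, the same helper could wrap either call, for example `extract_text(llm.create_chat_completion(messages=[...]))`.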