Usage

This model uses 4-bit quantization, so you need to install bitsandbytes to use it.

pip install bitsandbytes
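
The streaming example below loads the model with its saved quantization settings. If you want to control the 4-bit settings explicitly, you can pass a BitsAndBytesConfig yourself. This is a minimal sketch; the exact values (compute dtype, device map) are assumptions, not settings taken from the model card:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Arthur-LAGACHERIE/Reflection-Gemma-2-2b"

# Assumed 4-bit settings; bitsandbytes requires a CUDA GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)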

For inference (streaming):

from threading import Thread

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

device = 'cuda' if torch.cuda.is_available() else 'cpu'

model_id = "Arthur-LAGACHERIE/Reflection-Gemma-2-2b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# place the model on the same device as the tokenized inputs
model = AutoModelForCausalLM.from_pretrained(model_id, device_map=device)

prompt = """
### System
You are a world-class AI system, capable of complex reasoning and reflection. 
Reason through the query inside <thinking> tags, and then provide your final response inside <output> tags. 
If you detect that you made a mistake in your reasoning at any point, correct yourself inside <reflection> tags.
Try an answer and see if it's correct before generating the output.
But don't forget to think very carefully.

### Question
The question here.
"""

chat = [
    { "role": "user", "content": prompt},
]
# apply the chat template, tokenize, and set up a streamer for token-by-token output
question = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(question, return_tensors="pt").to(device)
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)
generation_kwargs = dict(inputs, streamer=streamer, max_new_tokens=4000)
thread = Thread(target=model.generate, kwargs=generation_kwargs)

# generate in a background thread and print tokens as they are streamed
thread.start()
for new_text in streamer:
    print(new_text, end="")
thread.join()

Some info

If you want to know how I fine-tuned it, what datasets I used, and the training code, see here.

Model Trained Using AutoTrain

This model was trained using AutoTrain. For more information, please visit AutoTrain.
