---
tags:
- autotrain
- text-generation-inference
- text-generation
- peft
library_name: transformers
base_model: Arthur-LAGACHERIE/Gemma-2-2b-4bit
widget:
- messages:
  - role: user
    content: What is your favorite condiment?
license: other
---

# Usage

This model uses 4-bit quantization, so you need to install bitsandbytes to use it.

```bash
pip install bitsandbytes
```

For inference (streaming):

```python
from threading import Thread

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

device = "cuda" if torch.cuda.is_available() else "cpu"

model_id = "Arthur-LAGACHERIE/Reflection-Gemma-2-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# The checkpoint ships with a bitsandbytes 4-bit quantization config, so it is loaded quantized.
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = """
### System
You are a world-class AI system, capable of complex reasoning and reflection.
Reason through the query inside <thinking> tags, and then provide your final response inside <output> tags.
If you detect that you made a mistake in your reasoning at any point, correct yourself inside <reflection> tags.
Try an answer and see if it's correct before generating the output. But don't forget to think very carefully.

### Question
The question here.
"""

chat = [
    {"role": "user", "content": prompt},
]
question = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
question = tokenizer(question, return_tensors="pt").to(device)

# Stream tokens as they are generated, running generation in a background thread.
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)
generation_kwargs = dict(question, streamer=streamer, max_new_tokens=4000)
thread = Thread(target=model.generate, kwargs=generation_kwargs)

# generate
thread.start()
for new_text in streamer:
    print(new_text, end="")
```

# Some info

If you want to know how I fine-tuned it, which datasets I used, and the training code, [see here]().

# Model Trained Using AutoTrain

This model was trained using AutoTrain. For more information, please visit [AutoTrain](https://hf.co/docs/autotrain).
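
# Extracting the final answer

Since the system prompt above asks the model to wrap its reasoning in `<thinking>`/`<reflection>` tags and its final response in `<output>` tags, you may want to keep only the final answer. Below is a minimal sketch of how that could look; the `extract_output` helper is illustrative and not part of this repository.

```python
import re


def extract_output(generated_text: str) -> str:
    """Return the text inside the last <output>...</output> block, or the full text if no block is found."""
    matches = re.findall(r"<output>(.*?)</output>", generated_text, flags=re.DOTALL)
    return matches[-1].strip() if matches else generated_text.strip()


# Example usage with the streaming loop above: collect the chunks, then parse.
# full_text = "".join(chunk for chunk in streamer)
# print(extract_output(full_text))
```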