---
language:
- en
license: llama3
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
- sft
base_model: unsloth/llama-3-8b-bnb-4bit
---

### Model Description

- **Developed by:** [Aadarsh Unni Wilson (waadarsh)](https://huggingface.co/waadarsh)
- **License:** https://llama.meta.com/llama3/license/
- **Finetuned from model:** unsloth/llama-3-8b-bnb-4bit

This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

### Inference

```python
# Notebook (e.g. Colab) install command; in a terminal, drop the leading "!".
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

from unsloth import FastLanguageModel
import torch

max_seq_length = 2048
dtype = None          # None lets Unsloth pick the dtype automatically
load_in_4bit = False  # load the 16-bit weights without 4-bit quantization

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "waadarsh/llama3-8b-nissan-magnite-16bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

# Prompt format: the first {} is the question, the second {} is the response slot.
prompt_template_1 = """
You are a helpful assistant for customers of nissan magnite. You are given the following input. Please complete the response in a clear and comprehensive way.
## Question:
{}

## Response:
{}"""
```

```python
# Switch the model to Unsloth's inference mode.
FastLanguageModel.for_inference(model)

inputs = tokenizer(
[
    prompt_template_1.format(
        "Tell me about different variants of nissan magnite", # input
        "" # response (left empty so the model completes it)
    )
], return_tensors = "pt").to("cuda")

# temperature only matters when sampling is enabled (do_sample=True).
with torch.autocast(device_type="cuda"):
    outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.5, repetition_penalty=1.2, use_cache=False)

# Decode the outputs
tokenizer.batch_decode(outputs)
```

```shell
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
['\nYou are a helpful assistant for customers of nissan magnite. You are given the following input. Please complete the response in a clear and comprehensive way.\n## Question:\nTell me about different variants of nissan magnite\n\n## Response:\nThe Nissan Magnite comes in multiple variants: XE, XL, XV and XV Premium. Each variant has unique features and specifications suited for different needs.<|end_of_text|>']
```

```python
from transformers import TextStreamer

inputs = tokenizer(
[
    prompt_template_1.format(
        "What type of infotainment system is available in the Nissan Magnite?", # input
        "" # response (left empty so the model completes it)
    )
], return_tensors = "pt").to("cuda")

# Stream tokens to stdout as they are generated.
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)
```

```shell
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.

You are a helpful assistant for customers of nissan magnite. You are given the following input. Please complete the response in a clear and comprehensive way.
## Question:
What type of infotainment system is available in the Nissan Magnite?

## Response:
The Nissan Magnite features an 8-inch touchscreen infotainment system with Android Auto and Apple CarPlay compatibility. It is designed with a user-friendly interface and provides both entertainment and navigation solutions.<|end_of_text|>
```
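
The decoded outputs above contain the full prompt as well as the `<|end_of_text|>` token. If only the answer is needed, a small post-processing step can help. The following is a minimal sketch, not part of the original model card: `build_prompt` and `extract_response` are hypothetical helper names, and the sketch assumes the prompt format shown above, in which the generated text follows the first `## Response:` marker.

```python
# Hypothetical helpers (not part of the model or of Unsloth) for the prompt
# format used on this card.
PROMPT_TEMPLATE = """
You are a helpful assistant for customers of nissan magnite. You are given the following input. Please complete the response in a clear and comprehensive way.
## Question:
{}

## Response:
{}"""

RESPONSE_MARKER = "## Response:"
EOS_TOKEN = "<|end_of_text|>"


def build_prompt(question: str) -> str:
    """Fill the template with a question and leave the response slot empty."""
    return PROMPT_TEMPLATE.format(question, "")


def extract_response(decoded: str) -> str:
    """Return only the text that follows the first response marker."""
    head, marker, answer = decoded.partition(RESPONSE_MARKER)
    if not marker:
        # Marker not found: fall back to the whole decoded string.
        answer = head
    # Drop the end-of-text token and surrounding whitespace.
    return answer.replace(EOS_TOKEN, "").strip()


# Example with a decoded string shaped like the output shown on this card.
decoded = (
    build_prompt("Tell me about different variants of nissan magnite")
    + "The Nissan Magnite comes in multiple variants: XE, XL, XV and XV Premium."
    + EOS_TOKEN
)
print(extract_response(decoded))
```

With a loaded model, the same helper could be applied to each decoded string, for example `extract_response(tokenizer.batch_decode(outputs)[0])`. Passing `skip_special_tokens=True` to `batch_decode` is an alternative way to drop the end-of-text token.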