---
license: mit
datasets:
  - nvidia/OpenMathInstruct-2
language:
  - en
pipeline_tag: text-generation
tags:
  - education
  - maths
  - art
library_name: transformers
---

# Llama-3.1 8B - OpenMathInstruct-2

This model is a fine-tuned version of Llama-3.1 8B designed specifically for solving mathematical problems. Fine-tuned on the nvidia/OpenMathInstruct-2 dataset, it generates accurate, step-by-step mathematical solutions from instructional prompts.

## Table of Contents

- [Model Description](#model-description)
- [Usage](#usage)
  - [Installation](#installation)
  - [Loading the Model](#loading-the-model)
  - [Inference](#inference)
    - [Normal Inference](#normal-inference)
    - [Streaming Inference](#streaming-inference)
- [Benefits](#benefits)
- [License](#license)

## Model Description

The Llama-3.1 8B model has been fine-tuned on the OpenMathInstruct-2 dataset, which strengthens its ability to interpret and solve mathematical problems. It is particularly adept at following instructions and producing worked solutions.

## Usage

### Installation

To use this model, ensure you have the required libraries installed:

```bash
pip install torch transformers unsloth
```

### Loading the Model

You can load the model as follows. Note that `FastLanguageModel.from_pretrained` returns the model and tokenizer together, so a single call is enough:

```python
from unsloth import FastLanguageModel

model_name = "shivvamm/llama-3.18B-OpenMathInstruct-2"

# from_pretrained returns a (model, tokenizer) pair
model, tokenizer = FastLanguageModel.from_pretrained(model_name)
FastLanguageModel.for_inference(model)  # enable Unsloth's faster inference path
```

### Inference

#### Normal Inference

For standard inference, you can use the following code snippet:

```python
input_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Continue the Fibonacci sequence.

### Input:
1, 1, 2, 3, 5, 8

### Response:
"""

inputs = tokenizer(input_prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
response = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(response)
```

#### Streaming Inference

For a more interactive experience, you can use streaming inference, which prints tokens as they are generated:

```python
from transformers import TextStreamer

input_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Continue the Fibonacci sequence.

### Input:
1, 1, 2, 3, 5, 8

### Response:
"""

inputs = tokenizer(input_prompt, return_tensors="pt").to("cuda")
text_streamer = TextStreamer(tokenizer)
model.generate(**inputs, streamer=text_streamer, max_new_tokens=1000)
```

## Benefits

- **Fast Inference:** The model is optimized for speed, allowing efficient generation of responses.
- **High Accuracy:** Fine-tuned specifically on mathematical instruction data, improving its problem-solving ability.
- **Low Memory Usage:** Loading with 4-bit quantization allows the model to run on lower-end GPUs without running out of memory (a sketch is shown in the appendix below).

## License

This model is licensed under the MIT License. See the LICENSE file for more information.
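
## Appendix: Loading in 4-bit

The low-memory benefit above comes from 4-bit quantization. The sketch below shows one way to request it through Unsloth's `load_in_4bit` flag; the `max_seq_length` and `dtype` values are illustrative assumptions, not requirements of this model:

```python
from unsloth import FastLanguageModel

# A minimal 4-bit loading sketch. The settings below are assumptions
# chosen for illustration; adjust them to your hardware and use case.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="shivvamm/llama-3.18B-OpenMathInstruct-2",
    max_seq_length=2048,  # assumed context window for this sketch
    dtype=None,           # auto-detect (bfloat16 on recent GPUs)
    load_in_4bit=True,    # quantize weights to 4-bit to reduce GPU memory
)
```

With 4-bit weights, an 8B-parameter model typically fits on a single consumer GPU, though the exact footprint depends on sequence length and batch size.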