---
license: mit
datasets:
  - nvidia/OpenMathInstruct-2
language:
  - en
pipeline_tag: text-generation
tags:
  - education
  - maths
  - art
library_name: transformers
---

# Llama-3.1 8B - OpenMathInstruct-2

This model is a fine-tuned version of Llama-3.1 8B designed specifically for solving mathematical problems. Fine-tuned on the nvidia/OpenMathInstruct-2 dataset, it generates accurate, step-by-step mathematical solutions from instructional prompts.

## Table of Contents

- [Model Description](#model-description)
- [Usage](#usage)
  - [Installation](#installation)
  - [Loading the Model](#loading-the-model)
  - [Inference](#inference)
    - [Normal Inference](#normal-inference)
    - [Streaming Inference](#streaming-inference)
- [Benefits](#benefits)
- [License](#license)

## Model Description

The Llama-3.1 8B model has been fine-tuned on the OpenMathInstruct-2 dataset, which strengthens its ability to interpret and solve mathematical problems. It is particularly adept at following instructions and producing worked solutions.

## Usage

### Installation

To use this model, ensure you have the required libraries installed:

```bash
pip install torch transformers unsloth
```

### Loading the Model

You can load the model as follows. Note that `FastLanguageModel.from_pretrained` returns the model and tokenizer together, so a single call is enough:

```python
from unsloth import FastLanguageModel

model_name = "shivvamm/llama-3.18B-OpenMathInstruct-2"

# from_pretrained returns a (model, tokenizer) pair
model, tokenizer = FastLanguageModel.from_pretrained(model_name)
FastLanguageModel.for_inference(model)  # enable Unsloth's faster inference path
```

### Inference

#### Normal Inference

For standard inference, you can use the following code snippet:

```python
input_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Continue the Fibonacci sequence.

### Input:
1, 1, 2, 3, 5, 8

### Response:
"""

inputs = tokenizer(input_prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
response = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(response)
```

#### Streaming Inference

For a more interactive experience, you can use streaming inference, which prints tokens as they are generated:

```python
from transformers import TextStreamer

input_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Continue the Fibonacci sequence.

### Input:
1, 1, 2, 3, 5, 8

### Response:
"""

inputs = tokenizer(input_prompt, return_tensors="pt").to("cuda")
text_streamer = TextStreamer(tokenizer)
model.generate(**inputs, streamer=text_streamer, max_new_tokens=1000)
```

## Benefits

- **Fast Inference:** The model is optimized for speed, allowing efficient generation of responses.
- **High Accuracy:** Fine-tuned specifically on mathematical instruction data, improving its problem-solving ability.
- **Low Memory Usage:** Loading with 4-bit quantization allows the model to run on lower-end GPUs without running out of memory (a sketch is shown in the appendix below).

## License

This model is licensed under the MIT License. See the LICENSE file for more information.
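
## Appendix: Loading in 4-bit

The low-memory benefit above comes from 4-bit quantization. The sketch below shows one way to request it through Unsloth's `load_in_4bit` flag; the `max_seq_length` and `dtype` values are illustrative assumptions, not requirements of this model:

```python
from unsloth import FastLanguageModel

# A minimal 4-bit loading sketch. The settings below are assumptions
# chosen for illustration; adjust them to your hardware and use case.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="shivvamm/llama-3.18B-OpenMathInstruct-2",
    max_seq_length=2048,  # assumed context window for this sketch
    dtype=None,           # auto-detect (bfloat16 on recent GPUs)
    load_in_4bit=True,    # quantize weights to 4-bit to reduce GPU memory
)
```

With 4-bit weights, an 8B-parameter model typically fits on a single consumer GPU, though the exact footprint depends on sequence length and batch size.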