---
license: gemma
datasets:
- flytech/python-codes-25k
widget:
- text: "write a simple python function"
  example_title: "Example 1"
- text: "write a python program using flask"
  example_title: "Example 2"
- text: "make a todo list using python"
  example_title: "Example 3"
- text: "print current date and time using python"
  example_title: "Example 4"
language:
- en
pipeline_tag: text-generation
---

# Gemma-2b-it-finetuned-python-codes

This model card describes a fine-tuned version of Gemma-2b-it, the 2B instruction-tuned Gemma model. For the base model, see the [2B Gemma Instruct](https://huggingface.co/google/gemma-2b-it) model card.

**Author**: Dishank Shah

### Description

GifPC-2b (Gemma-2b-it-finetuned-python-codes) is a version of Gemma-2b-it fine-tuned on a dataset of Python code snippets.
The fine-tuning aims to improve Gemma-2b-it's handling of Python syntax, semantics, and common programming patterns.
The resulting model is intended to help with both understanding Python code and debugging it.
Users can ask it for guidance on Python-related issues, for explanations of code logic, and for help identifying potential errors in their programs.

### Usage

Below are some code snippets to help you get started quickly with running the model. First run `pip install -U transformers`, then copy the snippet from the section that matches your use case.

#### Running the model on Google Colab CPU

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "shahdishank/gemma-2b-it-finetune-python-codes"
HUGGING_FACE_TOKEN = "YOUR_TOKEN"  # replace with your Hugging Face access token

# Load the tokenizer and the model, authenticating with the token above
tokenizer = AutoTokenizer.from_pretrained(model_name, token=HUGGING_FACE_TOKEN)
model = AutoModelForCausalLM.from_pretrained(model_name, token=HUGGING_FACE_TOKEN)

# Prompt format: a "user:" turn followed by an "assistant:" turn for the model to complete
prompt_template = """\
  user:\n{query} \n\n assistant:\n
  """
prompt = prompt_template.format(query="write a simple python function") # write your query here

# Tokenize the prompt and sample up to 2000 new tokens
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=True)
outputs = model.generate(**inputs, max_new_tokens=2000, do_sample=True, pad_token_id=tokenizer.eos_token_id)

# The decoded text contains the prompt followed by the generated answer
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
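
Because the decoded `response` in the snippet above includes the prompt as well as the generated text, it can be handy to separate the two. The following is a minimal sketch, not part of the model's API: `build_prompt`, `extract_answer`, and `ASSISTANT_MARKER` are hypothetical helper names, and splitting on the first `assistant:` marker is an assumption about what the decoded text looks like.

```python
# Same template string as in the snippet above, written as a single-line literal.
PROMPT_TEMPLATE = "  user:\n{query} \n\n assistant:\n\n  "
ASSISTANT_MARKER = "assistant:"  # hypothetical constant, taken from the template text


def build_prompt(query: str) -> str:
    """Fill the user query into the prompt template."""
    return PROMPT_TEMPLATE.format(query=query)


def extract_answer(decoded: str) -> str:
    """Return the text after the first 'assistant:' marker.

    Falls back to the whole (stripped) text if the marker is missing.
    """
    _, marker, tail = decoded.partition(ASSISTANT_MARKER)
    return tail.strip() if marker else decoded.strip()


if __name__ == "__main__":
    prompt = build_prompt("write a simple python function")
    # Stand-in for a decoded model output: the prompt followed by generated code.
    fake_output = prompt + "def add(a, b):\n    return a + b"
    print(extract_answer(fake_output))
```

In practice, `extract_answer(response)` would be applied to the string printed by the snippet above. If the user query itself contains the text `assistant:`, this simple split would cut in the wrong place, so treat it as a starting point only.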

## Model Data

The model was trained on the [python-codes-25k](https://huggingface.co/datasets/flytech/python-codes-25k) dataset.

### Training Dataset

This model was fine-tuned on a text dataset that includes a wide variety
of Python code. Each record has the following fields:

* Instruction: The instructional task to be performed / User input.
* Input: A very short introductory part of the AI response, or empty.
* Output: Python code that accomplishes the task.
* Text: All fields combined together.

The variety of tasks in this dataset is intended to help the
model handle a wide range of Python requests.
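
To make the record layout above concrete, here is a minimal sketch of how one record could be turned into a prompt and target pair in the `user:` / `assistant:` format used in the Usage snippet. This is an illustration under assumptions, not the author's actual preprocessing, which this card does not describe: the lowercase key names, the `to_prompt_and_target` helper, and the sample record are all hypothetical.

```python
from typing import Tuple, TypedDict


class CodeRecord(TypedDict):
    """One dataset record; the lowercase key names are an assumption."""
    instruction: str  # the task to perform / user input
    input: str        # short introductory part of the AI response, or empty
    output: str       # Python code that accomplishes the task
    text: str         # all fields combined together


def to_prompt_and_target(record: CodeRecord) -> Tuple[str, str]:
    """Build a (prompt, target) pair from a record.

    Assumption: the instruction fills the user turn, and the optional
    intro plus the code form the assistant's reply.
    """
    prompt = f"user:\n{record['instruction']} \n\n assistant:\n"
    target = (record["input"] + "\n" + record["output"]).strip()
    return prompt, target


if __name__ == "__main__":
    # Made-up sample record, for illustration only (not taken from the dataset).
    sample: CodeRecord = {
        "instruction": "Add two numbers",
        "input": "",
        "output": "print(1 + 2)",
        "text": "Add two numbers print(1 + 2)",
    }
    prompt, target = to_prompt_and_target(sample)
    print(prompt + target)
```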

### Usage

This LLM can be used for:
* Code generation
* Debugging
* Learning and understanding various Python coding styles