---
license: gemma
datasets:
- flytech/python-codes-25k
widget:
- text: "write a simple python function"
  example_title: "Example 1"
- text: "write a python program using flask"
  example_title: "Example 2"
- text: "make a todo list using python"
  example_title: "Example 3"
- text: "print current date and time using python"
  example_title: "Example 4"
language:
- en
pipeline_tag: text-generation
---

# Gemma-2b-it-finetuned-python-codes

This model card describes a finetuned version of the 2B Gemma-2b-it model. For details on the base model, see the [2B Gemma Instruct](https://huggingface.co/google/gemma-2b-it) model card.

**Author**: Dishank Shah

### Description

GifPC-2b (Gemma-2b-it-finetuned-python-codes) is Gemma-2b-it finetuned on a dataset of Python code snippets. The finetuning aims to improve the model's grasp of Python syntax, semantics, and common programming patterns, so that it can both explain Python code and help with debugging.

Users can ask it for guidance on Python-related issues, for explanations of code logic, and for help identifying potential errors in their programs. This makes it a useful assistant for programmers working on Python programming and debugging tasks.

### Usage

Below are some code snippets to help you get started quickly with running the model. First run `pip install -U transformers`, then copy the snippet from the section relevant to your use case.
#### Running the model on Google Colab CPU

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "shahdishank/gemma-2b-it-finetune-python-codes"
HUGGING_FACE_TOKEN = "YOUR_TOKEN"  # your Hugging Face access token

# Pass the token variable itself, not the literal string "HUGGING_FACE_TOKEN"
tokenizer = AutoTokenizer.from_pretrained(model_name, token=HUGGING_FACE_TOKEN)
model = AutoModelForCausalLM.from_pretrained(model_name, token=HUGGING_FACE_TOKEN)

# Prompt format: a user turn followed by an empty assistant turn
prompt_template = """\
user:\n{query} \n\n assistant:\n """
prompt = prompt_template.format(query="write a simple python function")  # write your query here

# Tokenize the prompt; the result holds input_ids and attention_mask
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=True)

# Sample up to 2000 new tokens, using EOS as the padding token id
outputs = model.generate(**inputs, max_new_tokens=2000, do_sample=True, pad_token_id=tokenizer.eos_token_id)

# The decoded text contains the prompt followed by the generated answer
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

## Model Data

The model was trained on the [python-codes-25k](https://huggingface.co/datasets/flytech/python-codes-25k) dataset.

### Training Dataset

This model was trained on a text dataset that includes a wide variety of Python code. Its key fields are:

* Instruction: The instructional task to be performed / User input.
* Input: A very short introductory part of the AI response, or empty.
* Output: Python code that accomplishes the task.
* Text: All of the above fields combined.

Covering many different tasks in the training data is what lets the model handle a broad range of Python programming requests.

### Intended Uses

This LLM can be used for:

* Code generation
* Debugging
* Learning and understanding various Python coding styles
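For the debugging use case, a query can include the failing code and its error message. The sketch below is a minimal, hypothetical example built around the prompt template from the Usage snippet: `build_prompt`, `build_debug_query`, and `extract_reply` are illustrative helpers, not part of `transformers` or an official API for this model, and the debugging wording is an assumption rather than a format the model is known to have been trained on. It runs offline; the `decoded` string stands in for the real model output.

```python
# Same template as the Usage snippet: a user turn, then an empty assistant turn.
PROMPT_TEMPLATE = "user:\n{query} \n\n assistant:\n "
MARKER = "assistant:"


def build_prompt(query: str) -> str:
    """Fill the user/assistant template used in the Usage snippet."""
    return PROMPT_TEMPLATE.format(query=query)


def build_debug_query(code: str, error: str) -> str:
    """Hypothetical wording for a debugging request (not a required format)."""
    return f"fix the bug in this python code:\n{code}\nerror message: {error}"


def extract_reply(decoded: str) -> str:
    """Return only the model's answer from the decoded output.

    The decoded sequence starts with the prompt, so the reply is everything
    after the first 'assistant:' marker. Assumes the query itself does not
    contain that marker; if no marker is found, the whole text is returned.
    """
    _, found, reply = decoded.partition(MARKER)
    return reply.strip() if found else decoded.strip()


# Offline demo: `decoded` stands in for tokenizer.decode(outputs[0], skip_special_tokens=True)
buggy = "def add(a, b):\n    return a - b"
prompt = build_prompt(build_debug_query(buggy, "add(2, 3) returned -1"))
decoded = prompt + "def add(a, b):\n    return a + b"
print(extract_reply(decoded))
```

With the real model, you would pass `prompt` to the tokenizer and call `extract_reply` on the decoded text instead of printing it whole. An alternative that avoids string matching is to drop the prompt's token ids from `outputs[0]` before calling `tokenizer.decode`.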