Model Card for YanSte/t5_large_fine_tuning_question_answering_hc3_chatgpt_prompts

Learning

This card describes the fine-tuning of a T5 model with Low-Rank Adaptation (LoRA). The model is based on google/flan-t5-large and was trained with the PEFT library to specialize it for question-answering (QA) tasks. The complete fine-tuning code is available on Kaggle: Kaggle code link.

The exploration focuses on the fine-tuning methodology. LoRA keeps the pretrained weights frozen and trains small low-rank matrices whose product is added to selected weight matrices, so only a small fraction of the parameters is updated. This keeps fine-tuning a large model affordable in memory and storage while adapting it to generate answers to questions.
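To make the mechanism concrete, here is a toy NumPy sketch of a single LoRA-adapted linear layer. It is not PEFT's implementation: the class name `LoRALinear`, the rank `r=8`, `alpha=16`, the small Gaussian initialization of `A` and the 1024 x 1024 weight are illustrative assumptions. It shows the two properties that matter: with `B` initialized to zero, the layer initially reproduces the frozen base layer exactly, and only `r * (d_in + d_out)` parameters are trainable.

```python
import numpy as np


class LoRALinear:
    """Toy LoRA layer: frozen weight W plus a trainable update (alpha / r) * B @ A."""

    def __init__(self, weight: np.ndarray, r: int = 8, alpha: float = 16.0, seed: int = 0):
        d_out, d_in = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight                              # pretrained weight, kept frozen
        self.A = rng.normal(0.0, 0.01, size=(r, d_in))    # trainable, small random init (assumed)
        self.B = np.zeros((d_out, r))                     # trainable, zero init
        self.scaling = alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, d_in) -> (batch, d_out)
        base = x @ self.weight.T
        update = (x @ self.A.T) @ self.B.T
        return base + self.scaling * update

    def trainable_params(self) -> int:
        # Only A and B would receive gradients during fine-tuning
        return self.A.size + self.B.size

    def merged_weight(self) -> np.ndarray:
        # Fold the low-rank update into W to get a plain linear layer for inference
        return self.weight + self.scaling * (self.B @ self.A)


# Hypothetical 1024 x 1024 projection, for illustration only
W = np.random.default_rng(1).normal(size=(1024, 1024))
layer = LoRALinear(W, r=8)
x = np.ones((2, 1024))

print(np.allclose(layer(x), x @ W.T))      # True: B starts at zero
print(layer.trainable_params(), W.size)    # 16384 1048576
```

Because the update `B @ A` has the same shape as `W`, it can be folded back into the weight after training (`merged_weight`), so a merged LoRA model adds no extra cost at inference time.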

The training datasets, including MohamedRashad/ChatGPT-prompts and Hello-SimpleAI/HC3 (the Human ChatGPT Comparison Corpus, which pairs questions with human and ChatGPT answers), add diversity and complexity to the linguistic interactions seen in training, which is intended to help the model adapt to varied conversational contexts.

The resulting model can be used directly for text generation or fine-tuned further for specific tasks. Its performance is assessed with accuracy and ROUGE.
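The card names ROUGE and accuracy without specifying the variant or implementation. As a hedged illustration, the sketch below computes a simplified ROUGE-1 F1 (unigram overlap, lowercased, whitespace-tokenized, no stemming) and an exact-match score, which is one common reading of "accuracy" for generated answers. Both function names are hypothetical, and standard packages such as `rouge_score` will give somewhat different numbers because they tokenize and optionally stem differently.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap between prediction and reference."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Counter intersection keeps the minimum count of each shared token
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the answers match after trimming and lowercasing, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())


# Shared tokens: "is", "a", "detective" -> precision = recall = 3/5
print(round(rouge1_f1("Sherlock Holmes is a detective", "He is a consulting detective"), 2))  # 0.6
print(exact_match("Detective", " detective "))  # 1.0
```

Corpus-level scores would typically be the mean of these per-example values over a held-out set.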

The Kaggle notebook is meant as a practical, transparent resource for natural language processing (NLP) practitioners who want to reproduce or adapt this fine-tuning.

Model Details

Model Description

The model is based on the T5 architecture (google/flan-t5-large) and was fine-tuned as a LoRA adapter with the PEFT library. It generates text responses in a question-answering format. The model is released under the CreativeML Open RAIL-M license (creativeml-openrail-m).

  • Developed by: YanSte
  • Model type: LoRA adapter (PEFT) for google/flan-t5-large

Uses

Direct Use

The model can be directly employed for text-to-text generation tasks, with a focus on generating responses to questions in a conversational format.

How to Get Started with the Model

Use the code below to get started with the model.

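# Importing necessary libraries
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration, pipeline

# Hub repository of this model
hub_repo_name = "YanSte/t5_large_fine_tuning_question_answering_hc3_chatgpt_prompts"

# Illustrative generation settings (assumed values, not the notebook's own settings)
pipeline_task = "text2text-generation"
pipeline_max_length = 256
pipeline_min_length = 16
pipeline_temperature = 0.7

# Load the tokenizer and fine-tuned model from the hub repository.
# The repository hosts a PEFT (LoRA) adapter: install the `peft` package so that
# transformers can load it on top of its base model, google/flan-t5-large.
tokenizer = AutoTokenizer.from_pretrained(hub_repo_name)
finetuned_model = T5ForConditionalGeneration.from_pretrained(hub_repo_name)

# Create a text-to-text generation pipeline using the fine-tuned model
text_generation_pipeline = pipeline(
    task=pipeline_task,
    model=finetuned_model,
    tokenizer=tokenizer,
    truncation=True,
    max_length=pipeline_max_length,
    min_length=pipeline_min_length,
    temperature=pipeline_temperature,
    device=0 if torch.cuda.is_available() else -1,  # first GPU if available, else CPU
)

# Define a list of questions for text generation
questions = ["What is Sherlock Holmes' job?"]

# Prefix each question with the task prefix
prefix = "Answer this question: "
transformed_questions = [prefix + question for question in questions]

# Generate answers; do_sample=True is required for the temperature setting to take effect
generated_texts = text_generation_pipeline(transformed_questions, do_sample=True)

# Each output is a dict with a "generated_text" key
for question, output in zip(questions, generated_texts):
    print(question, "->", output["generated_text"])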

Training Details

Training Data

The model was fine-tuned on datasets including MohamedRashad/ChatGPT-prompts and Hello-SimpleAI/HC3. Links to the Dataset Cards and details of the preprocessing have not yet been documented.
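Because the preprocessing is not documented, the following is only a hedged sketch of how records from these datasets could be turned into (input, target) text pairs, using the same "Answer this question: " prefix as the usage example. The field names (`question` and `chatgpt_answers` for HC3; `human_prompt` and `chatgpt_response` for ChatGPT-prompts), the choice of ChatGPT answers as targets, the function names and the sample record are all assumptions for illustration; check the dataset cards before reusing them.

```python
# Task prefix, assumed to match the one used at inference time
PREFIX = "Answer this question: "


def hc3_to_pairs(record: dict) -> list[tuple[str, str]]:
    """Turn one HC3-style record into (input, target) pairs, one per non-empty answer."""
    # Assumed fields: "question" (str) and "chatgpt_answers" (list of str)
    source = PREFIX + record["question"].strip()
    return [(source, answer.strip()) for answer in record["chatgpt_answers"] if answer.strip()]


def chatgpt_prompts_to_pair(record: dict) -> tuple[str, str]:
    """Turn one ChatGPT-prompts-style record into a single (input, target) pair."""
    # Assumed fields: "human_prompt" and "chatgpt_response"
    return PREFIX + record["human_prompt"].strip(), record["chatgpt_response"].strip()


# Hypothetical record, for illustration only
sample = {
    "question": "What does a LoRA adapter train?",
    "human_answers": ["Only small low-rank matrices."],
    "chatgpt_answers": ["Low-rank update matrices added to frozen weights.", "   "],
}
for source, target in hc3_to_pairs(sample):
    print(source, "=>", target)
```

The resulting pairs would then be tokenized as encoder inputs and labels for a sequence-to-sequence model such as flan-t5-large.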

Framework versions

  • PEFT 0.7.1

Model Tree

YanSte/t5_large_fine_tuning_question_answering_hc3_chatgpt_prompts is a PEFT adapter for the base model google/flan-t5-large.