---
thumbnail: https://huggingface.co/front/thumbnails/dialogpt.png
language:
  - en
license: cc-by-4.0
tags:
  - text classification
  - transformers
datasets:
  - PCL
metrics:
  - F1
inference: false
---

# T5Base-PCL

This is a fine-tuned T5 (base) model trained on the Patronizing and Condescending Language (PCL) dataset by Pérez-Almendros et al. (2020), used for Task 4 of SemEval-2022. It is intended to be used as a classification model for identifying PCL (0 - negative; 1 - positive). The task prefix used for the T5 model is 'classification: '.

The dataset it is trained on is limited in scope, as it covers only news texts from about 20 English-speaking countries. The macro F1 score achieved on the test set, based on the official evaluation, is 0.5452. More information about the original pre-trained model can be found here.

* Classification examples:

| Prediction | Input |
|------------|-------|
| 0 | selective kindness : in europe , some refugees are more equal than others |
| 1 | he said their efforts should not stop only at creating many graduates but also extended to students from poor families so that they could break away from the cycle of poverty |

## How to use

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

model = T5ForConditionalGeneration.from_pretrained("tosin/pcl_22")
tokenizer = T5Tokenizer.from_pretrained("t5-base")  # use the source tokenizer because the fine-tuned T5 tokenizer breaks
tokenizer.pad_token = tokenizer.eos_token

# prepend the 'classification: ' task prefix the model was fine-tuned with (see above)
text = "classification: he said their efforts should not stop only at creating many graduates but also extended to students from poor families so that they could break away from the cycle of poverty"
input_ids = tokenizer(text, padding=True, truncation=True, return_tensors='pt').input_ids
outputs = model.generate(input_ids)
pred = tokenizer.decode(outputs[0], skip_special_tokens=True)  # expected: 1 (PCL), per the example above
print(pred)
```
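
To classify several texts at once, a minimal batched sketch along the same lines may be useful. The `classify_pcl` helper name and the wrapping below are illustrative assumptions, not part of the released model card; only the model ID, tokenizer choice, and task prefix come from the text above.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

model = T5ForConditionalGeneration.from_pretrained("tosin/pcl_22")
tokenizer = T5Tokenizer.from_pretrained("t5-base")  # source tokenizer, as above
tokenizer.pad_token = tokenizer.eos_token

def classify_pcl(texts):
    """Return the model's decoded label string ('0' or '1') for each input text."""
    # prepend the task prefix used during fine-tuning
    prefixed = ["classification: " + t for t in texts]
    enc = tokenizer(prefixed, padding=True, truncation=True, return_tensors="pt")
    outputs = model.generate(input_ids=enc.input_ids, attention_mask=enc.attention_mask)
    return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

examples = [
    "selective kindness : in europe , some refugees are more equal than others",
    "he said their efforts should not stop only at creating many graduates but also extended to students from poor families so that they could break away from the cycle of poverty",
]
print(classify_pcl(examples))  # expected: ['0', '1'], per the classification examples above
```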