This is a SetFit model that can be used for Text Classification. This SetFit model uses BAAI/bge-base-en-v1.5 as the Sentence Transformer embedding model. A SetFitHead instance is used for classification.
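The model has been trained using an efficient few-shot learning technique that involves:
1. Fine-tuning a Sentence Transformer with contrastive learning.
2. Training a classification head with features from the fine-tuned Sentence Transformer.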
The model performs multi-label classification: for each input text it predicts three labels simultaneously (GHGLabel, NetzeroLabel, and NonGHGLabel), and a single text can carry more than one of them.
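SetFit's contrastive stage works on sentence pairs rather than on single labeled examples. The sketch below illustrates one plausible way to build such pairs for multi-label data. It assumes that two texts count as a positive pair when they share at least one active label. The example sentences, their label vectors, the label order, and the `make_pairs` helper are all hypothetical, and the code is not taken from this model's training setup.

```python
from itertools import combinations

# Hypothetical toy examples. Label vectors follow the assumed order
# [GHGLabel, NetzeroLabel, NonGHGLabel].
examples = [
    ("Cut GHG emissions 40% by 2030.", (1, 0, 0)),
    ("Reach net-zero emissions by 2050.", (1, 1, 0)),
    ("Install 5 GW of solar capacity.", (0, 0, 1)),
]


def make_pairs(data):
    """Return (text_a, text_b, target) triples for contrastive fine-tuning.

    Assumed rule: target is 1.0 when the two label vectors share at least
    one active label, otherwise 0.0.
    """
    pairs = []
    for (text_a, labels_a), (text_b, labels_b) in combinations(data, 2):
        shares_label = any(a and b for a, b in zip(labels_a, labels_b))
        pairs.append((text_a, text_b, 1.0 if shares_label else 0.0))
    return pairs


pairs = make_pairs(examples)
for text_a, text_b, target in pairs:
    print(f"{target:.1f}  {text_a!r} <-> {text_b!r}")
```

Because every pair of examples yields one training instance, a few hundred labeled texts expand into a much larger set of pairs, which is what makes the approach workable with few labels.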
First install the SetFit library:

```bash
pip install setfit
```

Then you can load this model and run inference:

```python
from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("GIZ/SUBTARGET_multilabel_bge")
# Run inference
preds = model("This document enfolds Iceland’s first communication on its long-term strategy (LTS), to be updated when further analysis and policy documents are published on the matter. Iceland is committed to reducing its overall greenhouse gas emissions and reaching climate neutrality no later than 2040 and become fossil fuel free in 2050, which should set Iceland on a path to net negative emissions.")
```
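A multi-label prediction is easier to read once it is mapped back to label names. The sketch below assumes the prediction for one text arrives as a sequence of three 0/1 flags, in the order GHGLabel, NetzeroLabel, NonGHGLabel. Both the output shape and the order are assumptions to verify against the actual model output, and `decode` and `example_row` are hypothetical names.

```python
# Assumed output order; verify against the model before relying on it.
LABEL_NAMES = ["GHGLabel", "NetzeroLabel", "NonGHGLabel"]


def decode(pred_row, names=LABEL_NAMES):
    """Return the names of the labels whose flag is set in pred_row."""
    return [name for name, flag in zip(names, pred_row) if int(flag) == 1]


# Hypothetical prediction row standing in for a real model output.
example_row = [1, 1, 0]
active = decode(example_row)
print(active)
```

If the model returns a tensor, converting it first (for example with `preds.tolist()`) yields plain Python numbers that `decode` can consume.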
Training set | Min | Median | Max |
---|---|---|---|
Word count | 19 | 78.5467 | 173 |
Training dataset: 728 examples
Class | Positive Count of Class |
---|---|
GHGLabel | 440 |
NetzeroLabel | 120 |
NonGHGLabel | 259 |
Validation dataset: 80 examples
Class | Positive Count of Class |
---|---|
GHGLabel | 49 |
NetzeroLabel | 11 |
NonGHGLabel | 30 |
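The positive counts in both splits add up to more than the number of examples, so some texts must carry more than one label. The sketch below turns the counts from the tables above into per-label prevalence and the average number of positive labels per example. The `splits` structure and the `summarize` helper are illustrative names only.

```python
# Counts copied from the dataset tables above.
splits = {
    "train": {
        "size": 728,
        "positives": {"GHGLabel": 440, "NetzeroLabel": 120, "NonGHGLabel": 259},
    },
    "validation": {
        "size": 80,
        "positives": {"GHGLabel": 49, "NetzeroLabel": 11, "NonGHGLabel": 30},
    },
}


def summarize(split):
    """Return (per-label prevalence, average positive labels per example)."""
    size = split["size"]
    prevalence = {label: count / size for label, count in split["positives"].items()}
    labels_per_example = sum(split["positives"].values()) / size
    return prevalence, labels_per_example


for name, split in splits.items():
    prevalence, labels_per_example = summarize(split)
    shares = ", ".join(f"{label}: {share:.1%}" for label, share in prevalence.items())
    print(f"{name}: {shares} | labels per example: {labels_per_example:.3f}")
```

The prevalence figures also make the class imbalance visible: NetzeroLabel is the rarest class in both splits.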
Epoch | Step | Training Loss | Validation Loss |
---|---|---|---|
0.0000 | 1 | 0.2227 | - |
0.1519 | 5000 | 0.015 | 0.0831 |
0.3038 | 10000 | 0.0146 | 0.0924 |
0.4557 | 15000 | 0.0197 | 0.0827 |
0.6076 | 20000 | 0.0031 | 0.0883 |
0.7595 | 25000 | 0.0439 | 0.0865 |
0.9114 | 30000 | 0.0029 | 0.0914 |
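The training log can be read programmatically, for instance to find the logged step with the lowest validation loss or to estimate how many steps make up one epoch. The sketch below uses the rows from the table above and skips the first row, which has no validation loss. The variable names are illustrative.

```python
# (epoch, step, training_loss, validation_loss) rows from the table above.
log = [
    (0.1519, 5000, 0.0150, 0.0831),
    (0.3038, 10000, 0.0146, 0.0924),
    (0.4557, 15000, 0.0197, 0.0827),
    (0.6076, 20000, 0.0031, 0.0883),
    (0.7595, 25000, 0.0439, 0.0865),
    (0.9114, 30000, 0.0029, 0.0914),
]

# Logged step with the lowest validation loss.
best_epoch, best_step, _, best_val = min(log, key=lambda row: row[3])

# Each row gives an estimate of steps per epoch; average them.
estimates = [step / epoch for epoch, step, _, _ in log]
steps_per_epoch = sum(estimates) / len(estimates)

print(f"lowest validation loss {best_val} at step {best_step}")
print(f"approx. steps per epoch: {steps_per_epoch:.0f}")
```

In this log the validation loss stays within a narrow band after step 5000 while the training loss fluctuates, so the lowest logged validation loss is only a rough guide to which checkpoint generalizes best.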
label | precision | recall | f1-score | support |
---|---|---|---|---|
GHG | 0.884 | 0.938 | 0.910 | 49.0 |
Netzero | 0.846 | 1.000 | 0.916 | 11.0 |
NonGHG | 0.903 | 0.933 | 0.918 | 30.0 |
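As a consistency check, the F1 scores in the table can be recomputed from precision and recall with F1 = 2PR / (P + R). The sketch below does this for each label and also computes an unweighted (macro) average of the reported F1 scores. Small gaps are expected because the table values are limited to three decimals. The `report` dictionary and helper names are illustrative.

```python
# label: (precision, recall, reported_f1, support), copied from the table above.
report = {
    "GHG": (0.884, 0.938, 0.910, 49),
    "Netzero": (0.846, 1.000, 0.916, 11),
    "NonGHG": (0.903, 0.933, 0.918, 30),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


recomputed = {label: f1_score(p, r) for label, (p, r, _, _) in report.items()}
max_gap = max(abs(recomputed[label] - row[2]) for label, row in report.items())
macro_f1 = sum(row[2] for row in report.values()) / len(report)

for label, value in recomputed.items():
    print(f"{label}: recomputed F1 = {value:.4f}, reported = {report[label][2]:.3f}")
print(f"largest gap: {max_gap:.4f}, macro F1 of reported scores: {macro_f1:.4f}")
```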
Carbon emissions were measured using CodeCarbon.
@article{https://doi.org/10.48550/arxiv.2209.11055,
doi = {10.48550/ARXIV.2209.11055},
url = {https://arxiv.org/abs/2209.11055},
author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
title = {Efficient Few-Shot Learning Without Prompts},
publisher = {arXiv},
year = {2022},
copyright = {Creative Commons Attribution 4.0 International}
}