license: apache-2.0
base_model: sentence-transformers/all-mpnet-base-v2
tags:
  - generated_from_trainer
metrics:
  - accuracy
model-index:
  - name: IKT_classifier_transport_ghg_best
    results: []
widget:
  - text: >-
      Unconditional Contribution In the unconditional scenario, GHG emissions
      would be reduced by 27.56 Mt CO2e (6.73%) below BAU in 2030 in the
      respective sectors. 26.3 Mt CO2e (95.4%) of this emission reduction will
      be from the Energy sector while 0.64 (2.3%) and 0.6 (2.2%) Mt CO2e
      reduction will be from AFOLU (agriculture) and waste sector respectively.
      There will be no reduction in the IPPU sector. Conditional Contribution In
      the conditional scenario, GHG emissions would be reduced by 61.9 Mt CO2e
      (15.12%) below BAU in 2030 in the respective sectors.
    example_title: GHG
  - text: >-
      Key Long-Term Climate Actions Cleaner and greener vehicles on our roads
      Singapore is working to enhance the overall carbon efficiency of our land
      transport system through the large-scale adoption of green vehicles. By
      2040, we aim to phase out internal combustion engine vehicles and have all
      vehicles running on cleaner energy. We will introduce policies and
      initiatives to encourage the adoption of EVs. The public sector itself
      will take the lead and progressively procure and use cleaner vehicles.
    example_title: NOT_GHG
  - text: >-
      This includes installation of rooftop PV panels for electricity
      generation, 5,300 solar water heaters, and expand the use of LED lighting
      in residential sector by 2030. • Expanding on energy efficiency labels and
      specifications for appliances programme, elimination of non-energy
      efficient equipment, and raising awareness among consumers on purchasing
      alternative energy efficient home appliances.
    example_title: NEGATIVE

IKT_classifier_transport_ghg_best

This model is a fine-tuned version of sentence-transformers/all-mpnet-base-v2 on the GIZ/policy_qa_v0_1 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4963
  • Precision Macro: 0.9175
  • Precision Weighted: 0.8942
  • Recall Macro: 0.9156
  • Recall Weighted: 0.8936
  • F1-score: 0.9162
  • Accuracy: 0.8936
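The macro scores above are unweighted means of the per-class scores. The weighted scores weight each class by its number of true examples (support) in the evaluation set.

The pure-Python sketch below illustrates this distinction. It assumes the reported F1-score is macro-averaged, which the card does not state explicitly. It also shows why Recall Weighted always equals Accuracy for single-label classification, as in the figures above (0.8936 for both).

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Macro- and support-weighted precision/recall, macro F1, and accuracy."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)      # true examples per class
    predicted = Counter(y_pred)    # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives per class
    n = len(y_true)

    per_class = {}
    for c in labels:
        precision = correct[c] / predicted[c] if predicted[c] else 0.0
        recall = correct[c] / support[c] if support[c] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[c] = (precision, recall, f1)

    def macro(i):
        # Unweighted mean over classes.
        return sum(scores[i] for scores in per_class.values()) / len(labels)

    def weighted(i):
        # Mean over classes, weighted by each class's share of the true labels.
        return sum(per_class[c][i] * support[c] for c in labels) / n

    return {
        "precision_macro": macro(0),
        "precision_weighted": weighted(0),
        "recall_macro": macro(1),
        "recall_weighted": weighted(1),  # sum of true positives / n, i.e. accuracy
        "f1_macro": macro(2),
        "accuracy": sum(correct.values()) / n,
    }


# Toy example using the card's three labels (not real evaluation data).
y_true = ["GHG", "GHG", "NOT_GHG", "NEGATIVE", "NEGATIVE", "NEGATIVE"]
y_pred = ["GHG", "NOT_GHG", "NOT_GHG", "NEGATIVE", "NEGATIVE", "GHG"]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Libraries such as scikit-learn provide the same averages through an `average` argument. The plain version simply makes each step visible.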

Model description

The model is a multi-class text classifier based on sentence-transformers/all-mpnet-base-v2 and fine-tuned on text sourced from national climate policy documents.

Intended uses & limitations

The classifier assigns one of three classes to passages extracted from the documents:

- 'GHG' for passages aligned with GHG-related transport targets.
- 'NOT_GHG' for passages aligned with other targets (adaptation targets and other transport targets).
- 'NEGATIVE' for negative samples that do not align with any of these targets.

The intended use is for climate policy researchers and analysts seeking to automate the process of reviewing lengthy, non-standardized PDF documents to produce summaries and reports.

The performance of the classifier is middle of the road. On the evaluation set used during training, it performed very well overall (F1 ~ 0.9). That performance was evenly balanced between precision (~0.9), meaning most predicted positives were correct, and recall (~0.9), meaning most true positives were captured. By contrast, on unseen real-world test data its performance was mediocre (F1 ~ 0.6). However, that test used a very small out-of-sample dataset, so classification performance in the wild may differ from both figures.

Training and evaluation data

The training dataset consists of labelled passages from two sources: ClimateWatch and IKITracs.

The combined dataset GIZ/policy_qa_v0_1 contains ~85k rows. Each passage appears three times, once per sequence-length strategy. The 'strategy' column records which version a row is, using the values 'small', 'medium', and 'large' (sequence lengths of 60, 85, and 150, respectively). The useful size of the dataset is therefore one third of its row count, and the 'strategy' value should be chosen to suit the use case. This model was trained on the 'medium' samples. Furthermore, in each row the 'context' column contains 3 samples of varying quality. The approach used to assess quality and select samples is described below.
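The layout described above can be illustrated with a toy pandas DataFrame. The 'strategy' and 'context' column names come from this card. The rows, the passage text, and the 'id' column are invented for illustration, and the real dataset has further columns.

Keeping one 'strategy' value retains a third of the rows. Exploding the 'context' lists then yields one row per candidate sample.

```python
import pandas as pd

# Toy stand-in for GIZ/policy_qa_v0_1: two source passages, each stored once per
# sequence-length strategy, with a list of three candidate samples in 'context'.
rows = []
for passage_id in (1, 2):  # hypothetical identifier column
    for strategy in ("small", "medium", "large"):
        rows.append({
            "id": passage_id,
            "strategy": strategy,
            "context": [f"passage {passage_id} sample {k} ({strategy})" for k in range(3)],
        })
df = pd.DataFrame(rows)

# Keep a single sequence-length variant, as done for this model ('medium' = 85).
medium = df[df["strategy"] == "medium"]
print(len(medium) / len(df))  # fraction of rows retained

# One row per candidate sample; the other columns are repeated for each sample.
exploded = medium.explode("context", ignore_index=True)
print(exploded)
```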

The pre-processing operations used to produce the final training dataset were as follows:

  1. The dataset is filtered to rows with the value 'medium' in the 'strategy' column (sequence length = 85).
  2. All ClimateWatch rows are removed, as they were assessed to have no taxonomic alignment with the IKITracs labels in the dataset.
  3. For IKITracs, labels are assigned based on 'parameter' values, which correspond to human annotators' assessments of transport-related GHG targets. The specific assignments are as follows:
    • 'GHG': target_labels_ghg_yes = ['T_Transport_Unc','T_Transport_C']
    • 'NOT_GHG': target_labels_ghg_no = ['T_Adaptation_Unc', 'T_Adaptation_C', 'T_Transport_O_Unc', 'T_Transport_O_C']
    • 'NEGATIVE': a random sample of other labelled data, excluding the labels above
  4. If 'context_translated' is available and the 'language' is not English, 'context' is replaced with 'context_translated'.
  5. The dataset is "exploded": each list of text samples in the 'context' column is expanded into separate rows, and labels are merged to align with the associated samples.
  6. The 'match_onanswer' and 'answerWordcount' columns are used conditionally to select high-quality samples. A high percentage of word matches in 'match_onanswer' is preferred, but a lower percentage is accepted when 'answerWordcount' is high.
  7. The data is then augmented using sentence shuffling from the albumentations library and NLP-based insertions using nlpaug. This doubles the number of training samples for the GHG class from 42 to 84. The resulting per-class breakdown is more balanced:
    • GHG: 84
    • NOT_GHG: 191
    • NEGATIVE: 190
  8. To address the remaining class imbalance, inverse-frequency class weights are computed and passed to a custom single-label trainer, which is used during both hyperparameter tuning and final model training.
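Steps 3, 4, and 8 can be sketched in plain Python as follows. The label lists are copied from step 3, and the field names mirror the columns named above.

Two details are illustrative assumptions, since the card does not state them:

- English is assumed to be encoded as 'en' in the 'language' field.
- The weights use the common n_samples / (n_classes × class_count) formula for inverse-frequency weighting.

```python
from collections import Counter

# Parameter-to-class mapping copied from step 3.
TARGET_LABELS_GHG_YES = ["T_Transport_Unc", "T_Transport_C"]
TARGET_LABELS_GHG_NO = ["T_Adaptation_Unc", "T_Adaptation_C",
                        "T_Transport_O_Unc", "T_Transport_O_C"]


def assign_label(parameter):
    """Map an IKITracs 'parameter' value to a class (step 3).

    Anything outside the two lists is a NEGATIVE candidate; per the card,
    NEGATIVE rows are then a random sample of these.
    """
    if parameter in TARGET_LABELS_GHG_YES:
        return "GHG"
    if parameter in TARGET_LABELS_GHG_NO:
        return "NOT_GHG"
    return "NEGATIVE"


def select_context(row):
    """Step 4: use the translation for non-English rows when one exists.

    Assumes English is encoded as 'en' in the 'language' field (hypothetical).
    """
    if row.get("context_translated") and row.get("language") != "en":
        return row["context_translated"]
    return row["context"]


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count) (step 8)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


# Class counts after augmentation, taken from step 7.
train_labels = ["GHG"] * 84 + ["NOT_GHG"] * 191 + ["NEGATIVE"] * 190
weights = inverse_frequency_weights(train_labels)
print(weights)  # the minority GHG class receives the largest weight
```

In PyTorch, one common way to apply such weights is to pass them, in a fixed label order, as the `weight` argument of `torch.nn.CrossEntropyLoss` inside a `Trainer` subclass that overrides `compute_loss`. This may not match the exact custom trainer used here.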

Training procedure

The model hyperparameters were tuned using optuna over 10 trials on a truncated training and validation dataset. The model was then trained over 5 epochs using the best hyperparameters identified.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 6.900299287565753e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100.0
  • num_epochs: 5
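For reference, the hyperparameters above could be expressed as Hugging Face `TrainingArguments` roughly as shown below. This is a sketch, not the author's training script. `output_dir` is a placeholder, the batch sizes assume training on a single device, and any argument not listed in the card is left at its default.

```python
from transformers import TrainingArguments

# Values copied from the hyperparameter list above.
training_args = TrainingArguments(
    output_dir="IKT_classifier_transport_ghg_best",  # placeholder path
    learning_rate=6.900299287565753e-05,
    per_device_train_batch_size=8,  # assumes one device
    per_device_eval_batch_size=8,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    warmup_steps=100,
    num_train_epochs=5,
)
```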

Training results

| Training Loss | Epoch | Step | Validation Loss | Precision Macro | Precision Weighted | Recall Macro | Recall Weighted | F1-score | Accuracy |
|---|---|---|---|---|---|---|---|---|---|
| No log | 1.0 | 53 | 0.3979 | 0.8806 | 0.8800 | 0.8964 | 0.8723 | 0.8819 | 0.8723 |
| No log | 2.0 | 106 | 0.7787 | 0.8428 | 0.8005 | 0.7377 | 0.7872 | 0.7695 | 0.7872 |
| No log | 3.0 | 159 | 0.4507 | 0.9028 | 0.8747 | 0.8981 | 0.8723 | 0.8990 | 0.8723 |
| No log | 4.0 | 212 | 0.7270 | 0.9019 | 0.8752 | 0.8680 | 0.8723 | 0.8830 | 0.8723 |
| No log | 5.0 | 265 | 0.4963 | 0.9175 | 0.8942 | 0.9156 | 0.8936 | 0.9162 | 0.8936 |

Framework versions

  • Transformers 4.31.0
  • Pytorch 2.0.1+cu118
  • Datasets 2.13.1
  • Tokenizers 0.13.3