This is a Llama 3.1 fine-tune using the RL algorithm and benchmark data proposed in the paper "Deal or no deal (or who knows)", published in ACL Findings 2024. Models from this paper are designed to predict the outcome of an unfolding conversation, specifically estimating the probability that a given outcome will occur. For instance, these models can estimate the probability that a deal will be reached before the end of a negotiation.
The "Direct Forecaster" (the model in this repo) is trained with RL to output the probability in it's sampled tokens. In the paper, this model seemed to handle out-of-distribution data the best. Based off experiments, we expect lower, non-zero temperatures to be best for sampling.
The "Implicit Forecaster" (available here) is trained with SFT to output the estimated probability using the logit for the token " Yes". In the paper, this model performed best overall . Temperature should be the default value (i.e., 1).
Here's a comparison of these models with some previous runs of GPT-4 (no fine-tuning). We use data priors and temperature scaling for both of our models (see the paper for details).
| Model | Alg | Dataset | Brier Score |
|---|---|---|---|
| Llama-3.1-8B-Instruct | DF RL interp | awry | 0.255467 |
| Llama-3.1-8B-Instruct | DF RL interp | casino | 0.216955 |
| Llama-3.1-8B-Instruct | DF RL interp | cmv | 0.261726 |
| Llama-3.1-8B-Instruct | DF RL interp | deals | 0.174899 |
| Llama-3.1-8B-Instruct | DF RL interp | deleted | 0.255129 |
| Llama-3.1-8B-Instruct | DF RL interp | donations | 0.251880 |
| Llama-3.1-8B-Instruct | DF RL interp | supreme | 0.231955 |
| Llama-3.1-8B-Instruct | IF SFT | awry | 0.220083 |
| Llama-3.1-8B-Instruct | IF SFT | casino | 0.196558 |
| Llama-3.1-8B-Instruct | IF SFT | cmv | 0.207542 |
| Llama-3.1-8B-Instruct | IF SFT | deals | 0.118853 |
| Llama-3.1-8B-Instruct | IF SFT | deleted | 0.114553 |
| Llama-3.1-8B-Instruct | IF SFT | donations | 0.238121 |
| Llama-3.1-8B-Instruct | IF SFT | supreme | 0.223060 |
| OpenAI GPT-4 | None | awry | 0.247775 |
| OpenAI GPT-4 | None | casino | 0.204828 |
| OpenAI GPT-4 | None | cmv | 0.230229 |
| OpenAI GPT-4 | None | deals | 0.132760 |
| OpenAI GPT-4 | None | deleted | 0.169750 |
| OpenAI GPT-4 | None | donations | 0.262453 |
| OpenAI GPT-4 | None | supreme | 0.230321 |
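As a reminder, the Brier score is the mean squared error between forecast probabilities and binary outcomes (lower is better), and temperature scaling rescales a forecast before scoring. Below is a minimal sketch, assuming log-odds scaling; the exact form used in the paper may differ.

```python
import numpy as np

def brier_score(probs, outcomes):
    """Mean squared error between forecasts and binary outcomes (lower is better)."""
    probs, outcomes = np.asarray(probs), np.asarray(outcomes)
    return np.mean((probs - outcomes) ** 2)

def temperature_scale(prob, T):
    """Rescale a probability by dividing its log-odds by temperature T (assumed form)."""
    logit = np.log(prob) - np.log1p(-prob)
    return 1.0 / (1.0 + np.exp(-logit / T))

print(brier_score([0.9, 0.2, 0.7], [1, 0, 1]))  # -> 0.04666...
```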
Note that for best performance, certain prompt-engineering and post-processing procedures should be used (details in the paper).
The GitHub repo (here) is also available if you wish to train new models with similar training algorithms. The repo also contains plenty of examples of how to use these models for inference and how to load them from a local directory; a minimal loading sketch is given below.
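The sketch assumes this repo ships a PEFT adapter; the base-model id and adapter directory are placeholders, and the GitHub repo has the maintained examples.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-3.1-8B-Instruct"  # assumed base model
adapter_dir = "./path/to/local/adapter"       # placeholder local directory

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)

# Attach the fine-tuned adapter weights to the base model.
model = PeftModel.from_pretrained(base, adapter_dir)
model.eval()
```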
For any questions, please feel free to reach out!
Some quantization details are given below:
Training procedure
The following `bitsandbytes` quantization config was used during training:
- quant_method: QuantizationMethod.BITS_AND_BYTES
- _load_in_8bit: False
- _load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: False
- bnb_4bit_compute_dtype: float16
- bnb_4bit_quant_storage: uint8
- load_in_4bit: True
- load_in_8bit: False
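For reference, the same settings can be expressed with transformers' `BitsAndBytesConfig` when loading the base model. This is a sketch mirroring the dump above; `bnb_4bit_quant_storage` is left at its `uint8` default.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    load_in_8bit=False,
    llm_int8_threshold=6.0,
    llm_int8_skip_modules=None,
    llm_int8_enable_fp32_cpu_offload=False,
    llm_int8_has_fp16_weight=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype=torch.float16,
)

# Pass the config when loading the base model ("path/to/base" is a placeholder):
# model = AutoModelForCausalLM.from_pretrained("path/to/base", quantization_config=bnb_config)
```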
Framework versions
PEFT 0.5.0