---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
datasets:
- emnlp2023/Calc-gsm8k
- emnlp2023/Calc-aqua_rat
- emnlp2023/Calc-math_qa
- emnlp2023/Calc-ape210k
metrics:
- exact_match
- rouge
model-index:
- name: calc-t5-lm-xl
  results:
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      type: gsm8k
      name: GSM8K
      split: validation
    metrics:
    - type: exact_match
      value: 0.420
    - type: rouge
      value: 0.627
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      type: aqua_rat
      name: AQUA-RAT
      split: validation
    metrics:
    - type: exact_match
      value: 0.06
    - type: rouge
      value: 0.323
license: apache-2.0
language:
- en
---

# Model Card for calc-t5-lm-xl

<!-- Provide a quick summary of what the model is/does. -->

This model generates reasoning chains over mathematical questions while **using an external tool: a sympy calculator**.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

With the aim of offloading symbolic reasoning from the stochastic language model, 
we train this model to use a calculator **for all applicable numeric operations**.
This is achieved by training the model to construct calls to the tool's API in the following format:

```html
<gadget id="calculator">100/2</gadget> <output>50</output>
```

where the `<gadget>` segment triggers a call to the tool, 
which is then served by extending the model's decoder input context with the tool's output enclosed in the `<output>` segment.
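
The calculator itself is implemented in the attached **gadget.py**; the snippet below is only a minimal sketch (not the repository's code) of how such a call can be served, assuming the expression is evaluated with sympy:

```python
# Minimal sketch (assumption: the expression is evaluated with sympy;
# the repository's actual implementation lives in gadget.py).
import sympy

def evaluate_gadget_call(expression: str) -> str:
    """Evaluate an arithmetic expression and wrap the result in an <output> segment."""
    result = sympy.sympify(expression)  # e.g. "100/2" -> 50
    return f"<output>{result}</output>"

# The decoder context generated so far ends with a closed <gadget> call ...
generated = '<gadget id="calculator">100/2</gadget>'
expression = generated.split(">", 1)[1].rsplit("</gadget>", 1)[0]  # "100/2"

# ... and decoding continues from the context extended with the tool's output.
extended_context = generated + " " + evaluate_gadget_call(expression)
print(extended_context)
# <gadget id="calculator">100/2</gadget> <output>50</output>
```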

- **Developed by:** Anonymous
- **Model type:** Autoregressive Encoder-Decoder
- **Language(s):** en
- **Finetuned from:** google/calc-t5-lm-xl

### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** https://github.com/emnlp2023/gadgets
- **Paper:** Stay tuned!

## Usage

In addition to conventional generation, tool-augmented generation requires 
(1) an implementation of the tool(s) and 
(2) a customization of the generate() method that augments the input context on demand with the outputs of the tools.
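
Conceptually, the customized generation loop alternates between decoding and tool execution. The sketch below illustrates the idea only; the `generate_step` and `calculator` callables are placeholders, not the repository's API:

```python
import re

def tool_augmented_generate(generate_step, calculator, max_rounds=16):
    """Illustrative only: `generate_step(context)` is assumed to return the next
    generated chunk, stopping either after a closing </gadget> tag or at the end
    of the sequence; `calculator(expr)` evaluates the arithmetic expression."""
    context = ""
    for _ in range(max_rounds):
        context += generate_step(context)
        # Is the context currently waiting for a tool output?
        call = re.search(r'<gadget id="calculator">(.*?)</gadget>\s*$', context, re.DOTALL)
        if call is None:
            return context  # no pending tool call -> generation finished
        # Serve the call and extend the decoder context with its output.
        context += f" <output>{calculator(call.group(1))}</output>"
    return context
```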

You can find both components implemented in **gadget_assisted_model.py** and **gadget.py** attached to this model's repo 
and in the project's [home repo](https://github.com/emnlp2023/gadgets).

After adding these two scripts to your directory, you can use the model as follows:

```python
from gadget_assisted_model import GadgetAssistedModel
from gadget import Calculator

from transformers import T5ForConditionalGeneration, T5Tokenizer


class GadgetAssistedT5(GadgetAssistedModel, T5ForConditionalGeneration):
    # GadgetAssistedModel overrides the standard generate() from transformers
    pass


model = GadgetAssistedT5.from_pretrained("emnlp2023/calc-t5-lm-xl")
tokenizer = T5Tokenizer.from_pretrained("emnlp2023/calc-t5-lm-xl")

model.prepare_for_generate(tokenizer, 
                           enabled_gadgets=[Calculator()], 
                           default_max_tokens=512)
query = """
    The profit from a business transaction is shared among 2 business partners, 
    Mike and Johnson in the ratio 2:5 respectively. 
    If Johnson got $2500, how much will Mike have 
    after spending some of his share on a shirt that costs $200?
"""

inputs = tokenizer(query, return_tensors="pt")
output_ids = model.generate(**inputs)
tokenizer.decode(output_ids[0], spaces_between_special_tokens=False)
```
This returns:
```html
According to the ratio, Mike got 2/5*$2500 = $<gadget id="calculator">2/5*2500</gadget><output>1_000</output> 1000 
Mike will have $1000-$200 = $<gadget id="calculator">1000-200</gadget><output>800</output> 800 after buying a shirt. 
Final result is<result>800</result></s>
```
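
The final answer can then be read from the `<result>` segment, e.g. with a small helper like this (not part of the repository's API):

```python
import re

def extract_result(decoded: str) -> str:
    """Return the content of the <result> tag, or an empty string if absent."""
    match = re.search(r"<result>(.*?)</result>", decoded, re.DOTALL)
    return match.group(1).strip() if match else ""

extract_result("Final result is<result>800</result></s>")  # '800'
```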

### Out-of-Scope Usage

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

Note that given the limited complexity of the exercises seen in training, this model will not work well for tasks requiring 
more complex algebraic operations, including equations, variables, and operations outside the scope of (+, -, *, /).

## Training Details

### Training Data

<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

This model was trained on our calculator-augmented versions of the [ape210k dataset (GitHub)](https://github.com/Chenny0808/ape210k), 
the [math_qa HF dataset](https://huggingface.co/datasets/math_qa),
the [gsm8k HF dataset](https://huggingface.co/datasets/gsm8k),
and the [aqua_rat HF dataset](https://huggingface.co/datasets/aqua_rat),
in a standard autoregressive setup, i.e., conditional next-token prediction with a teacher-forced prefix.
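
For illustration, a minimal sketch of this setup with the `transformers` Seq2Seq utilities might look as follows; the column names (`question`, `chain`) and all hyperparameters here are assumptions made for the sketch, not the values used to train this model:

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, T5ForConditionalGeneration,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("emnlp2023/calc-t5-lm-xl")
model = T5ForConditionalGeneration.from_pretrained("emnlp2023/calc-t5-lm-xl")

dataset = load_dataset("emnlp2023/Calc-gsm8k")

def preprocess(example):
    # Question is the encoder input; the calculator-augmented chain is the target.
    model_inputs = tokenizer(example["question"], truncation=True)
    labels = tokenizer(text_target=example["chain"], truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, remove_columns=dataset["train"].column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="checkpoints",
                                  per_device_train_batch_size=1,
                                  learning_rate=5e-5),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```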

### Training Procedure 

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

The model was fine-tuned from [google/calc-t5-lm-xl](https://huggingface.co/google/calc-t5-lm-xl) for TODO steps, 
aiming to maximise the exact-match ratio on a validation split of the questions from the [gsm8k dataset](https://huggingface.co/datasets/gsm8k).
We fine-tune only TODO of the parameters, finding that this circumvents overfitting to the relatively small training dataset.

The full training configuration can be identified from the [training script](https://github.com/emnlp2023/gadgets/blob/9185d1fc4b4812321179f8e5cad3e2f2a764f1df/examples/train_gsm8k_flan-t5-slice.py).
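
The exact trainable subset and all remaining hyperparameters are defined in the linked script; the snippet below only sketches the general pattern of freezing most parameters, with a purely illustrative (hypothetical) choice of trainable modules:

```python
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("emnlp2023/calc-t5-lm-xl")

# Hypothetical choice of trainable modules, for illustration only.
trainable_substrings = ("lm_head", "decoder.block.23")

for name, param in model.named_parameters():
    param.requires_grad = any(s in name for s in trainable_substrings)

n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {n_trainable:,}")
```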