---
datasets:
- MU-NLPC/Calc-gsm8k
- MU-NLPC/Calc-aqua_rat
- MU-NLPC/Calc-math_qa
- MU-NLPC/Calc-ape210k
metrics:
- exact_match
- rouge
license: apache-2.0
language:
- en
---

# Model Card for calcformer-t5-xl

This model generates reasoning chains over mathematical questions while **using an external tool: a SymPy calculator**.


## Model Description

To offload symbolic computation from the stochastic language model, 
we train this model to use a calculator **for all applicable numeric operations**.
The model learns to construct calls to the tool's API in this format:

```html
<gadget id="calculator">100/2</gadget> <output>50</output>
```

where the `<gadget>` segment triggers a call to the tool. 
The call is served by appending the tool's output, wrapped in an `<output>` segment, to the model's decoder input context.
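
The call-and-response cycle described above can be illustrated with a minimal, standard-library-only sketch. It is not the code shipped in this repo: `evaluate` and `serve_pending_call` are hypothetical names, and the small `ast`-based evaluator is only a stand-in for the SymPy calculator, so its output formatting may differ from the real tool's.

```python
import ast
import operator
import re

# Binary operators supported by this stand-in evaluator (the real tool uses SymPy).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expression: str) -> str:
    """Evaluate a +-*/ expression without eval(); hypothetical stand-in for the calculator."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported expression: {expression!r}")

    value = walk(ast.parse(expression, mode="eval"))
    # Render whole numbers without a trailing ".0".
    if isinstance(value, float) and value.is_integer():
        value = int(value)
    return str(value)


# Matches a calculator call only when it is the last thing in the generated text.
GADGET_CALL = re.compile(r'<gadget id="calculator">([^<]*)</gadget>\s*$')


def serve_pending_call(generated: str) -> str:
    """If the text ends with a gadget call, append the tool's <output> segment."""
    match = GADGET_CALL.search(generated)
    if match is None:
        return generated
    return f"{generated} <output>{evaluate(match.group(1))}</output>"


print(serve_pending_call('<gadget id="calculator">100/2</gadget>'))
```

In this sketch, the extended string would be fed back to the decoder so that generation continues with the tool's output already in context.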

- **Developed by:** Calcformer team
- **Model type:** Autoregressive Encoder-Decoder
- **Language(s):** en
- **Finetuned from:** t5-xl


## Sources

- **Repository:** <https://github.com/prompteus/calc-x>
- **Paper:** <https://arxiv.org/abs/2305.15017>
- [**Calcformer model family on HF**](https://huggingface.co/collections/MU-NLPC/calcformers-65367392badc497807b3caf5)
- [**Calc-X dataset collection on HF**](https://huggingface.co/collections/MU-NLPC/calc-x-652fee9a6b838fd820055483)


## Usage

In addition to conventional generation, tool-augmented generation requires 
(1) an implementation of the tool(s) and 
(2) a customized `generate()` method that augments the input context on demand with the tools' outputs.

Both components are implemented in **gadgets/model.py** and **gadgets/gadget.py**, which are attached in this model's repo 
and also available in the project's [home repo](https://github.com/prompteus/calc-x).

After adding these two scripts to your directory, you can use the model as follows:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

from gadgets.model import gadget_assisted_model
from gadgets.gadget import Calculator

GadgetAssistedT5 = gadget_assisted_model(T5ForConditionalGeneration)
model_name = "MU-NLPC/calcformer-t5-xl"
model = GadgetAssistedT5.from_pretrained(model_name)
tokenizer = T5Tokenizer.from_pretrained(model_name)

model.prepare_for_generate(tokenizer, 
                           enabled_gadgets=[Calculator()], 
                           default_max_tokens=512)
query = """
    The profit from a business transaction is shared among 2 business partners, 
    Mike and Johnson in the ratio 2:5 respectively. 
    If Johnson got $2500, how much will Mike have 
    after spending some of his share on a shirt that costs $200?
"""

inputs = tokenizer(query, return_tensors="pt")
output_ids = model.generate(**inputs)
tokenizer.decode(output_ids[0], spaces_between_special_tokens=False)
```

This returns:

```html
According to the ratio, for every 5 parts that Johnson gets, Mike gets 2 parts Since Johnson got $2500,
each part is therefore $2500/5 = $<gadget id="calculator">2500/5</gadget><output>500</output> 500
Mike will get 2*$500 = $<gadget id="calculator">2*500</gadget><output>1_000</output> 1000
After buying the shirt he will have $1000-$200 = $<gadget id="calculator">1000-200</gadget><output>800</output> 800 left.
Final result is<result>800</result></s>
```
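
To use the answer programmatically, the decoded string can be post-processed. The following is a minimal sketch that assumes the tag layout shown in the output above; `parse_chain` is a hypothetical helper, not part of the `gadgets` package.

```python
import re

# One calculator interaction: the call followed by the tool's output.
GADGET_STEP = re.compile(
    r'<gadget id="(?P<tool>[^"]+)">(?P<call>[^<]*)</gadget>\s*'
    r"<output>(?P<output>[^<]*)</output>"
)
# The final answer segment.
RESULT = re.compile(r"<result>(?P<value>[^<]*)</result>")


def parse_chain(decoded: str):
    """Return the list of (tool, call, output) steps and the final result, if any."""
    steps = [(m["tool"], m["call"], m["output"]) for m in GADGET_STEP.finditer(decoded)]
    result = RESULT.search(decoded)
    return steps, (result["value"].strip() if result else None)


# Part of the example output shown above.
decoded = (
    'each part is therefore $2500/5 = $<gadget id="calculator">2500/5</gadget><output>500</output> 500\n'
    'Mike will get 2*$500 = $<gadget id="calculator">2*500</gadget><output>1_000</output> 1000\n'
    'After buying the shirt he will have $1000-$200 = $<gadget id="calculator">1000-200</gadget><output>800</output> 800 left.\n'
    "Final result is<result>800</result></s>"
)

steps, result = parse_chain(decoded)
for tool, call, output in steps:
    print(f"{tool}: {call} -> {output}")
print("result:", result)
```

Note that the tool's outputs are kept as strings here (for example `1_000`), since their formatting is decided by the calculator implementation.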

## Out-of-Scope Usage

Because the training exercises cover a limited range of complexity, this model will not work well for tasks requiring 
more complex algebraic operations, including equations, variables, and operations other than the four basic ones (`+`, `-`, `*`, `/`).


## Training

This model was trained on [Calc-X](https://huggingface.co/collections/MU-NLPC/calc-x-652fee9a6b838fd820055483), a collection of math problem datasets that we converted into chain-of-thought (CoT) format with calculator interactions.
We used standard autoregressive transformer training, i.e. conditional next-token prediction with cross-entropy loss. For more details about the data, training, or evaluation, see the [Calc-X and Calcformers paper](https://arxiv.org/abs/2305.15017).
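
As a generic illustration of the objective named above (not the project's training code), next-token cross-entropy is the mean negative log-probability that the model assigns to each reference token. The function name and the toy probabilities below are assumptions made for this sketch.

```python
import math


def next_token_cross_entropy(target_probs):
    """Mean negative log-likelihood of the reference tokens.

    target_probs[t] is the probability the model assigned to the reference
    token at position t, given the input and all previous reference tokens.
    """
    return -sum(math.log(p) for p in target_probs) / len(target_probs)


# Toy probabilities for a three-token target sequence.
loss = next_token_cross_entropy([0.9, 0.5, 0.99])
print(f"{loss:.4f}")
```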


## Cite

Please cite the [Calcformers paper](https://arxiv.org/abs/2305.15017) as follows:

```bibtex
@inproceedings{kadlcik-etal-2023-soft,
    title = "Calc-X and Calcformers: Empowering Arithmetical Chain-of-Thought through Interaction with Symbolic Systems",
    author = "Marek Kadlčík and Michal Štefánik and Ondřej Sotolář and Vlastimil Martinek",
    booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Main track",
    month = dec,
    year = "2023",
    address = "Singapore, Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2305.15017",
}
```