On compositional task structure
In the current version of the benchmark, we implement a selection of separate tasks such as counting and simple arithmetic. An example prompt for arithmetic is “Show 4 cups and half as many plates”. To solve this prompt, an algorithm needs more than the arithmetic meaning of “half”: it also has to count (4 cups) and to know the constituent objects (cups, plates). Our current implementation therefore introduces one layer of task compositionality: every task has linked subtasks that cover the separate objects represented in a prompt (cup and plate for the prompt above). Our prompt database already decomposes prompts into these sub-representations and records the links, so that linked prompts are presented together.
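As a rough illustration of this single layer of links, a prompt entry could be modelled as below. This is a minimal sketch and not the benchmark's actual schema: the `Prompt` class, its field names, the single-object prompt texts, and the helper functions are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Prompt:
    """One prompt entry (hypothetical schema, for illustration only)."""
    prompt_id: int
    text: str
    task: str
    # IDs of the single-object subtask prompts linked to this prompt.
    linked_ids: List[int] = field(default_factory=list)


def build_example_database() -> Dict[int, Prompt]:
    """Build a tiny database holding the arithmetic example and its object prompts."""
    cup = Prompt(1, "A cup", "single object")
    plate = Prompt(2, "A plate", "single object")
    arithmetic = Prompt(
        3, "Show 4 cups and half as many plates", "arithmetic", linked_ids=[1, 2]
    )
    return {p.prompt_id: p for p in (cup, plate, arithmetic)}


def prompts_to_present(database: Dict[int, Prompt], prompt_id: int) -> List[Prompt]:
    """Return a prompt followed by its linked subtask prompts, in link order."""
    main = database[prompt_id]
    return [main] + [database[i] for i in main.linked_ids]
```

Calling `prompts_to_present(build_example_database(), 3)` returns the arithmetic prompt followed by the cup and plate prompts. Note that only object links are stored here; nothing records that the arithmetic prompt also involves counting.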
This implementation ignores that the arithmetic prompt mentioned above also includes counting as a subtask. To fully delineate the capabilities of algorithms on subtasks, it might be desirable to record all of these hierarchical links of task compositionality in the prompt database. We are considering adding this to a future version of the benchmark, in a way that still allows users to easily add their own tasks and does not overcomplicate the task category names. Currently the task categories are just the top-level tasks (such as arithmetic and counting) and the subtask (single object representation). If we move to a fully hierarchical representation of tasks, we will need to carry this complexity into the task name of each prompt.
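To make the trade-off concrete, here is one possible way a fully hierarchical representation could look. It is only a sketch under assumptions: the `SUBTASKS` mapping, the function names, and the `" > "` separator are hypothetical and not part of the benchmark.

```python
from typing import Dict, List

# Hypothetical hierarchy: each task maps to its direct subtasks.
SUBTASKS: Dict[str, List[str]] = {
    "arithmetic": ["counting"],
    "counting": ["single object"],
    "single object": [],
}


def all_subtasks(task: str, subtasks: Dict[str, List[str]] = SUBTASKS) -> List[str]:
    """Collect every direct and indirect subtask of a task, without duplicates."""
    found: List[str] = []
    for child in subtasks.get(task, []):
        if child not in found:
            found.append(child)
        for descendant in all_subtasks(child, subtasks):
            if descendant not in found:
                found.append(descendant)
    return found


def hierarchical_name(task: str, subtasks: Dict[str, List[str]] = SUBTASKS) -> str:
    """Build a task name that spells out the full subtask chain."""
    return " > ".join([task] + all_subtasks(task, subtasks))
```

With this mapping, `hierarchical_name("arithmetic")` yields `"arithmetic > counting > single object"`, which shows how quickly task names grow once every level is carried into them.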
If you have an opinion on whether this level of detail would be a valuable feature, we would love to hear it below.
This hierarchical structure could also be used to account for common versus uncommon relationships, a distinction that is currently not neatly delineated across all the prompts we use.