GPT-3.5Turbo HumanEval Contamination based on "Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models"
## What are you reporting:
- [ ] Evaluation dataset(s) found in a pre-training corpus. (e.g. COPA found in ThePile)
- [x] Evaluation dataset(s) found in a pre-trained model. (e.g. FLAN T5 has been trained on ANLI)
**Evaluation dataset(s)**: Name(s) of the evaluation dataset(s). If available in the HuggingFace Hub please write the path (e.g. `uonlp/CulturaX`), otherwise provide a link to a paper, GitHub or dataset-card.
`openai_humaneval`
**Contaminated model(s)**: Name of the model(s) (if any) that have been contaminated with the evaluation dataset. If available in the HuggingFace Hub please list the corresponding paths (e.g. `allenai/OLMo-7B`).
`GPT-3.5Turbo0613, GPT-3.5Turbo1106`
## Briefly describe your method to detect data contamination
- [ ] Data-based approach
- [x] Model-based approach
The [paper](https://arxiv.org/pdf/2402.15938) introduces CDD (Contamination Detection via output Distribution), which detects contamination by measuring the peakedness of an LLM's output distribution, under the assumption that exposure to the data during training alters the shape of that distribution. The paper reports contamination detection results for HumanEval and, as a non-contaminated reference, for a custom subset of the CodeForces dataset.
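To make the idea of "peakedness" concrete, here is a minimal sketch, not the paper's implementation. It assumes peakedness is measured as the fraction of sampled outputs that lie within a small edit distance of a reference (greedy) output. The whitespace tokenization, the function names `edit_distance` and `peakedness`, and the default `alpha` are illustrative assumptions; the paper's exact definition and thresholds may differ.

```python
from typing import List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Token-level Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def peakedness(greedy: str, samples: List[str], alpha: float = 0.05) -> float:
    """Fraction of samples whose edit distance to the greedy output is at
    most alpha * (length of the longest sample).

    Assumption: whitespace tokenization and alpha=0.05 are illustrative
    choices, not values confirmed by the source.
    """
    if not samples:
        raise ValueError("at least one sampled output is required")
    ref = greedy.split()
    tokenized = [s.split() for s in samples]
    max_len = max(len(t) for t in tokenized)
    threshold = alpha * max_len
    close = sum(1 for t in tokenized if edit_distance(t, ref) <= threshold)
    return close / len(tokenized)
```

A model that reproduces nearly the same completion on every sample yields a value near 1, while a model whose samples vary widely yields a value near 0. Under the stated assumption, a high value on a benchmark prompt is treated as a sign of memorization.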
## Citation
Is there a paper that reports the data contamination or describes the method used to detect data contamination?
URL: [https://arxiv.org/pdf/2402.15938](https://arxiv.org/pdf/2402.15938)
Citation:
`@article{dong2024generalization,
title={Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models},
author={Dong, Yihong and Jiang, Xue and Liu, Huanyu and Jin, Zhi and Li, Ge},
journal={arXiv preprint arXiv:2402.15938},
year={2024}
}`
*Important!* If you wish to be listed as an author in the final report, please complete this information for all the authors of this Pull Request.
- Full name: Kateryna Solonko
- Email: [email protected]
- contamination_report.csv +2 -0
    @@ -3,6 +3,8 @@ Evaluation Dataset;Subset;Contaminated Source;Model or corpus;Train Split;Develo
     gsm8k;;GPT-4;model;79.00;;;model-based;https://arxiv.org/abs/2311.06233;8
     ucinlp/drop;;GPT-4;model;;44.00;;model-based;https://arxiv.org/abs/2311.06233;8
     openai_humaneval;;GPT-4;model;;;56.71;model-based;https://arxiv.org/abs/2311.06233;8
    +openai_humaneval;;GPT-3.5Turbo0613;model;;;23.79;model-based;https://arxiv.org/pdf/2402.15938;
    +openai_humaneval;;GPT-3.5Turbo1106;model;;;41.47;model-based;https://arxiv.org/pdf/2402.15938;
     imdb;;GPT-4;model;;;82.00;model-based;https://arxiv.org/abs/2311.06233;8
     imdb;;GPT-3.5;model;;;55.00;model-based;https://arxiv.org/abs/2311.06233;8
     ag_news;;GPT-4;model;;;91.00;model-based;https://arxiv.org/abs/2311.06233;8
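As a quick sanity check on the two added rows, the following sketch parses them as semicolon-delimited records. The header is truncated in the diff, so the field meanings used here are assumptions based on position: index 2 is read as the contaminated model, index 6 as the test-split percentage, and the empty last field as a missing PR number. The helper name `parse_rows` is hypothetical.

```python
import csv
import io

# The two rows added by this pull request, copied verbatim from the diff.
ROWS = """\
openai_humaneval;;GPT-3.5Turbo0613;model;;;23.79;model-based;https://arxiv.org/pdf/2402.15938;
openai_humaneval;;GPT-3.5Turbo1106;model;;;41.47;model-based;https://arxiv.org/pdf/2402.15938;
"""


def parse_rows(text):
    """Parse semicolon-delimited report rows into dictionaries.

    Assumption: column positions are inferred from the visible rows,
    because the header line is truncated in the diff.
    """
    parsed = []
    for fields in csv.reader(io.StringIO(text), delimiter=";"):
        if not fields:
            continue
        parsed.append({
            "dataset": fields[0],
            "model": fields[2],
            "test_split_pct": float(fields[6]),  # assumed to be the test split
            "approach": fields[7],
            "reference": fields[8],
            "pr": fields[9],  # empty in both added rows
        })
    return parsed


for row in parse_rows(ROWS):
    print(row["model"], row["test_split_pct"])
```

Both rows parse into ten fields, matching the existing entries in the file, with the last field left empty.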