Update contamination_report.csv
## What are you reporting:
- [ ] Evaluation dataset(s) found in a pre-trained model. (e.g. FLAN T5 has been trained on ANLI)
**Evaluation dataset(s)**: openai_humaneval
**Contaminated model(s)**: gpt-3.5-turbo-1106, gpt-3.5-turbo-0613
**Contaminated split(s)**: 41.47% (gpt-3.5-turbo-1106), 23.79% (gpt-3.5-turbo-0613)
## Briefly describe your method to detect data contamination
- [ ] Model-based approach
#### Model-based approaches
The cited paper reports high contamination levels for ChatGPT on the HumanEval dataset. The evidence is its high Average Peak and Leak Ratio values, particularly in comparison with the clean CodeForces2305 dataset, on which ChatGPT's performance drops. The paper's TED method is reported to identify and mitigate this contamination. The values in this report can be verified against Table 5 of the cited paper.
## Citation
Is there a paper that reports the data contamination or describes the method used to detect data contamination?
URL: `https://arxiv.org/pdf/2402.15938`
Citation: `@misc{dong2024generalization,
title={Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models},
author={Yihong Dong and Xue Jiang and Huanyu Liu and Zhi Jin and Ge Li},
year={2024},
eprint={2402.15938},
archivePrefix={arXiv},
primaryClass={cs.CL}
}`
*Important!* If you wish to be listed as an author in the final report, please complete this information for all the authors of this Pull Request.
- Full name: Suryansh Sharma
- Institution: Indian Institute of Technology Kharagpur
- Email: [email protected]
- contamination_report.csv +3 -0
@@ -707,3 +707,6 @@ zest;;EleutherAI/pile;;corpus;;;0.0;data-based;https://arxiv.org/abs/2310.20707;
 zest;;allenai/c4;;corpus;;;0.0;data-based;https://arxiv.org/abs/2310.20707;2
 zest;;oscar-corpus/OSCAR-2301;;corpus;;;0.0;data-based;https://arxiv.org/abs/2310.20707;2
 zest;;togethercomputer/RedPajama-Data-V2;;corpus;;;0.0;data-based;https://arxiv.org/abs/2310.20707;2
+
+openai_humaneval;;GPT-3.5;turbo-0613;model;;;23.79;model-based;https://arxiv.org/pdf/2402.15938;
+openai_humaneval;;GPT-3.5;turbo-1106;model;;;41.47;model-based;https://arxiv.org/pdf/2402.15938;
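A reviewer may want to sanity-check the two added rows before merging. The following is a minimal sketch, not part of the repository's tooling. The column names in `ASSUMED_COLUMNS` are hypothetical labels inferred only from field positions in the diff (the header row is not shown here), and `parse_rows` and `check_row` are illustrative helpers.

```python
import csv
import io

# Rows added by this PR, copied verbatim from the diff.
ADDED_ROWS = """\
openai_humaneval;;GPT-3.5;turbo-0613;model;;;23.79;model-based;https://arxiv.org/pdf/2402.15938;
openai_humaneval;;GPT-3.5;turbo-1106;model;;;41.47;model-based;https://arxiv.org/pdf/2402.15938;
"""

# ASSUMPTION: hypothetical column labels guessed from field positions;
# the real header of contamination_report.csv may use different names.
ASSUMED_COLUMNS = [
    "dataset", "subset", "source", "version", "source_type",
    "train", "dev", "test", "approach", "reference", "pr",
]
SPLITS = ("train", "dev", "test")


def parse_rows(text):
    """Parse semicolon-delimited rows into dicts keyed by the assumed columns."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter=";"):
        if not fields:  # skip blank lines such as the empty added line
            continue
        if len(fields) != len(ASSUMED_COLUMNS):
            raise ValueError(f"expected {len(ASSUMED_COLUMNS)} fields, got {len(fields)}")
        rows.append(dict(zip(ASSUMED_COLUMNS, fields)))
    return rows


def check_row(row):
    """Return a list of human-readable problems found in one parsed row."""
    problems = []
    filled = [s for s in SPLITS if row[s] != ""]
    if not filled:
        problems.append("no split carries a contamination value")
    for split in filled:
        try:
            pct = float(row[split])
        except ValueError:
            problems.append(f"{split} is not numeric: {row[split]!r}")
            continue
        if not 0.0 <= pct <= 100.0:
            problems.append(f"{split} is outside 0-100: {pct}")
    if not row["reference"].startswith("https://"):
        problems.append("reference is not an https URL")
    return problems


if __name__ == "__main__":
    for row in parse_rows(ADDED_ROWS):
        print(row["source"], row["version"], check_row(row) or "ok")
```

Under these assumptions, both added rows have 11 fields, matching the existing `zest` rows, and each carries a single percentage in the eighth field.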