Add info about the changes in the markdown.
markdown.py +2 -1
markdown.py
CHANGED
@@ -60,8 +60,9 @@ Citation: `@inproceedings{...`

 The [contamination_report.csv](https://huggingface.co/spaces/CONDA-Workshop/Data-Contamination-Database/blob/main/contamination_report.csv) file is a CSV file with `;` delimiters. You will need to update the following columns:
 - **Evaluation Dataset**: Name of the evaluation dataset that has (or has not) been compromised. If it is available on the HuggingFace Hub, please write its path (e.g. `uonlp/CulturaX`); otherwise, provide the name of the dataset.
-- **Subset**: Many HuggingFace datasets have several subsets or splits within a single dataset. This field specifies a particular subset of the given dataset, for example the `qnli` subset of `glue`.
+- **Subset**: (Optional) Many HuggingFace datasets have several subsets or splits within a single dataset. This field specifies a particular subset of the given dataset, for example the `qnli` subset of `glue`.
 - **Contaminated Source**: Name of the model that has been trained on the evaluation dataset, or name of the pre-training corpus that contains the evaluation dataset. If it is available on the HuggingFace Hub, please write its path (e.g. `allenai/OLMo-7B`); otherwise, provide the name of the model or dataset.
+- **Version**: (Optional) Any information relevant to identifying the version of the model or dataset. This information will be shown in parentheses in the Contaminated Source column.
 - **Train split**: Percentage of the train split that is contaminated. 0 means no contamination; 100 means the split has been fully compromised. If the dataset has no splits, treat the full dataset as either the train or the test split.
 - **Development split**: Percentage of the development split that is contaminated. 0 means no contamination; 100 means the split has been fully compromised.
 - **Test split**: Percentage of the test split that is contaminated. 0 means no contamination; 100 means the split has been fully compromised. If the dataset has no splits, treat the full dataset as either the train or the test split.
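As a rough illustration of the `;`-delimited format described above, the Python sketch below appends one row using the standard-library `csv` module, checks that each split percentage lies between 0 and 100, and renders the optional Version in parentheses after the Contaminated Source. It is not the Space's own code: the exact header strings and their order, the helper names `validate_row`, `append_row` and `display_source`, the convention of leaving a split the dataset lacks empty, and the `my-org/...` placeholder names are all hypothetical assumptions.

```python
import csv
import io

# ASSUMPTION: header names and order are inferred from the column list above;
# check the real header row of contamination_report.csv before relying on them.
COLUMNS = [
    "Evaluation Dataset",
    "Subset",
    "Contaminated Source",
    "Version",
    "Train split",
    "Development split",
    "Test split",
]
REQUIRED = ("Evaluation Dataset", "Contaminated Source")
SPLITS = ("Train split", "Development split", "Test split")


def validate_row(row: dict) -> None:
    """Raise ValueError if a required field is empty or a split is outside 0-100."""
    for name in REQUIRED:
        if not row.get(name):
            raise ValueError(f"{name!r} is required")
    for name in SPLITS:
        value = row.get(name)
        # ASSUMPTION: a split the dataset does not have is simply left empty.
        if value is None or value == "":
            continue
        pct = float(value)
        if not 0 <= pct <= 100:
            raise ValueError(f"{name!r} must be between 0 and 100, got {pct}")


def append_row(out, row: dict) -> None:
    """Write one ';'-delimited row; omitted optional columns become empty fields."""
    validate_row(row)
    # extrasaction="raise" (the default) rejects misspelled column names.
    writer = csv.DictWriter(out, fieldnames=COLUMNS, delimiter=";", restval="")
    writer.writerow(row)


def display_source(row: dict) -> str:
    """Show the optional Version in parentheses after the Contaminated Source."""
    version = row.get("Version")
    source = row["Contaminated Source"]
    return f"{source} ({version})" if version else source


# Usage with hypothetical placeholder names (not real contamination evidence):
row = {
    "Evaluation Dataset": "my-org/my-eval-dataset",
    "Contaminated Source": "my-org/my-model",
    "Version": "v1.0",
    "Train split": 0,
    "Test split": 100,
}
buf = io.StringIO()
append_row(buf, row)
print(buf.getvalue().strip())  # -> my-org/my-eval-dataset;;my-org/my-model;v1.0;0;;100
print(display_source(row))     # -> my-org/my-model (v1.0)
```

Writing rows through `csv.DictWriter` lets optional columns such as Subset be omitted (they are written as empty fields via `restval`), while a misspelled column name raises a `ValueError` instead of silently shifting values into the wrong column. Reading the file back needs the same `delimiter=";"`.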