Fix format & Add BigCodeBench
@lvwerra
@loubnabnl
Could you please help merge this PR? Thanks!
README.md
CHANGED
@@ -41,6 +41,7 @@ BigCode is an open scientific collaboration working on responsible training of l
 - [StarCoder2 Search](https://huggingface.co/spaces/bigcode/search-v2): Full-text search code in the pretraining dataset.
 - [StarCoder2 Membership Test](https://stack-v2.dataportraits.org/): Blazing fast test if code was present in pretraining dataset.
 </details>
+
 ---
 <details>
 <summary>
@@ -52,6 +53,7 @@ BigCode is an open scientific collaboration working on responsible training of l
 - [The Stack v2 dedup](https://huggingface.co/datasets/bigcode/the-stack-v2-dedup): Near deduplicated version of The Stack v2 (recommended for training).
 - [Am I in the Stack](https://huggingface.co/spaces/bigcode/in-the-stack): Check if your data is in The Stack and request opt-out.
 </details>
+
 ---
 <details>
 <summary>
@@ -82,17 +84,34 @@ BigCode is an open scientific collaboration working on responsible training of l
 - [StarCoder Search](https://huggingface.co/spaces/bigcode/search): Full-text search code in the pretraining dataset.
 - [StarCoder Membership Test](https://stack.dataportraits.org/): Blazing fast test if code was present in pretraining dataset.
 </details>
+
 ---
 <details>
 <summary>
 <b><font size="+1">📑The Stack</font></b>
 </summary>
 The Stack v1 is a 6.4TB dataset of source code in 358 programming languages from permissive licenses.
-
+
 - [The Stack](https://huggingface.co/datasets/bigcode/the-stack): Exact deduplicated version of The Stack.
 - [The Stack dedup](https://huggingface.co/datasets/bigcode/the-stack-dedup): Near deduplicated version of The Stack (recommended for training).
-- [
+- [Paper](https://huggingface.co/spaces/bigcode/in-the-stack): Check if your data is in The Stack and request opt-out.
+</details>
+
+---
+<details>
+<summary>
+<b><font size="+1">🌸BigCodeBench</font></b>
+</summary>
+BigCodeBench is the next generation of HumanEval, benchmarking code generation with diverse function calls and complex instructions.
+
+- [Github](https://github.com/bigcode-project/bigcodebench): Evaluation tool designed for BigCodeBench.
+- [HF Leaderboard](https://huggingface.co/spaces/bigcode/bigcodebench-leaderboard): BigCodeBench leaderboard hosted on Hugging Face.
+- [GP Leaderboard](https://huggingface.co/spaces/bigcode/bigcodebench-leaderboard): BigCodeBench leaderboard hosted on GitHub Pages.
+- [Dataset](https://huggingface.co/datasets/bigcode/bigcodebench): BigCodeBench dataset.
+- [Data Viewer](https://huggingface.co/spaces/bigcode/bigcodebench-viewer): Explore BigCodeBench data in an interactive demo.
+- [Paper](https://github.com/bigcode-bench/bigcode-bench.github.io/blob/main/paper.pdf): Research paper with details about BigCodeBench.
 </details>
+
 ---
 <details>
 <summary>
@@ -111,6 +130,7 @@ BigCode is an open scientific collaboration working on responsible training of l
 - [OctoCoder Demo](https://huggingface.co/spaces/bigcode/OctoCoder-Demo): Play with OctoCoder.
 - [OctoGeeX](https://huggingface.co/bigcode/octogeex): Instruction tuned model of CodeGeeX2 by training on CommitPackFT.
 </details>
+
 ---
 <details>
 <summary>
@@ -126,6 +146,7 @@ BigCode is an open scientific collaboration working on responsible training of l
 - [Astraios-7B](https://huggingface.co/collections/bigcode/astraios-7b-65788b509c5c26f96c08d576): Collection of StarCoderBase-7B models instruction tuned on CommitPackFT + OASST with 7 method.
 - [Astraios-15B](https://huggingface.co/collections/bigcode/astraios-15b-65788b7476b6de79781054cc): Collection of StarCoderBase-15B models instruction tuned on CommitPackFT + OASST with 7 method.
 </details>
+
 ---
 <details>
 <summary>