muhtasham committed
Commit 3d3a04f
1 Parent(s): 87eab44

Update README.md

Files changed (1):
  1. README.md +14 -4
README.md CHANGED
@@ -2,9 +2,17 @@
 license: openrail
 tags:
 - generated_from_trainer
+- code
+- codegen
+- assembly
 model-index:
 - name: santacoder-finetuned-the-stack-assembly
   results: []
+datasets:
+- bigcode/the-stack-dedup
+language:
+- code
+pipeline_tag: text-generation
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -12,7 +20,7 @@ should probably proofread and complete it, then remove this comment. -->
 
 # santacoder-finetuned-the-stack-assembly
 
-This model is a fine-tuned version of [bigcode/santacoder](https://huggingface.co/bigcode/santacoder) on an unknown dataset.
+This model is a fine-tuned version of [bigcode/santacoder](https://huggingface.co/bigcode/santacoder) on the [assembly](https://huggingface.co/datasets/bigcode/the-stack-dedup) subset of The Stack.
 It achieves the following results on the evaluation set:
 - eval_loss: 0.7423
 - eval_runtime: 14042.2321
@@ -23,15 +31,17 @@ It achieves the following results on the evaluation set:
 
 ## Model description
 
-More information needed
+The [SantaCoder](https://huggingface.co/bigcode/santacoder) models are a series of 1.1B-parameter models trained on the Python, Java, and JavaScript subset of [The Stack (v1.1)](https://huggingface.co/datasets/bigcode/the-stack) (which excluded opt-out requests).
+The main model uses [Multi Query Attention](https://arxiv.org/abs/1911.02150), was trained with near-deduplication and comment-to-code ratio as filtering criteria, and uses the [Fill-in-the-Middle objective](https://arxiv.org/abs/2207.14255).
+In addition, several models were trained on datasets with different filter parameters and with architecture and objective variations.
 
 ## Intended uses & limitations
 
-More information needed
+The predominant natural language in the source code is English, although other languages are also present. The model can generate code snippets given some context, but the generated code is not guaranteed to work as intended; it can be inefficient and may contain bugs or exploits.
 
 ## Training and evaluation data
 
-More information needed
+The Stack contains over 6TB of permissively licensed source code files covering 358 programming languages. The dataset was created as part of the [BigCode Project](https://www.bigcode-project.org/), an open scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs). The Stack serves as a pre-training dataset for Code LLMs, i.e., code-generating AI systems that enable the synthesis of programs from natural language descriptions as well as from other code snippets. **This is the near-deduplicated version, with 3TB of data.**
 
 ## Training procedure
 
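Since the base model was trained with the Fill-in-the-Middle objective (see the model description in the diff above), infilling can be prompted with special tokens. A minimal sketch, assuming this fine-tuned checkpoint keeps SantaCoder's `<fim-prefix>`/`<fim-suffix>`/`<fim-middle>` tokens and assuming the checkpoint id `muhtasham/santacoder-finetuned-the-stack-assembly`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint id; adjust to the actual repo path.
checkpoint = "muhtasham/santacoder-finetuned-the-stack-assembly"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# SantaCoder ships custom modeling code, hence trust_remote_code=True.
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True)

# Fill-in-the-Middle: the model generates the span between prefix and suffix.
input_text = "<fim-prefix>; add two numbers\nadd:\n<fim-suffix>\n    ret<fim-middle>"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(inputs.input_ids, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
```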
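As a plain left-to-right usage illustration for the limitations noted above (same assumed checkpoint id as in the previous sketch; generated code should be reviewed before use):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "muhtasham/santacoder-finetuned-the-stack-assembly"  # assumed id
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True).to(device)

# Give the model some assembly context and let it continue.
prompt = "; x86-64: return the larger of edi and esi\nmax:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(outputs[0]))
```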
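To inspect the kind of data this model was fine-tuned on, the assembly subset can be streamed from the near-deduplicated Stack. A sketch assuming the dataset keeps The Stack's per-language `data/<language>` directory layout:

```python
from datasets import load_dataset

# Stream to avoid downloading the full 3TB near-deduplicated dataset.
ds = load_dataset(
    "bigcode/the-stack-dedup",
    data_dir="data/assembly",  # assumed language directory name
    split="train",
    streaming=True,
)
print(next(iter(ds))["content"][:200])  # peek at one source file
```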