File size: 3,568 Bytes
2f92d85 01666d6 2f92d85 01666d6 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 |
---
license: bsd-3-clause
---
# starcoder-toolbench
<!-- Provide a quick summary of what the model is/does. -->
starcoder-toolbench is a 15 billion parameter model used for api based action generation. It is instruction tuned from [starcoder](https://huggingface.co/bigcode/starcoder) on api based action generation datasets.
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
- **Developed by:** [SambaNova Systems](https://sambanova.ai/)
- **Model type:** Language Model
- **Language(s):** English
- **License:**
- **Finetuned from model:** [starcoder](https://huggingface.co/bigcode/starcoder)
### Basic Information
<!-- Provide the basic links for the model. -->
- **Paper**: [Link]
- **Github**: [Link]
### Licensing
TBD
## Uses
<details>
<summary>Click to expand</summary>
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
### Direct Use
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
This model is intended for commercial and research use.
### Out-of-Scope Use
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
starcoder-toolbench should NOT be used for purpose other than API based action generation.
### Recommendations
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
Users should be made aware of the risks, biases, limitations, and restrictions of the model, which are listed down at the bottom of the page.
</details>
---
## How to Get Started with the Model
<details>
<summary>Click to expand</summary>
### Loading in model with Huggingface
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("sambanovasystems/starcoder-toolbench")
model = AutoModelForCausalLM.from_pretrained("sambanovasystems/starcoder-toolbench", device_map="auto", torch_dtype="auto")
```
</details>
---
## Training Details
<details>
<summary>Click to expand</summary>
### Training Data
<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
- [Fenglu to add](https://huggingface.co/datasets/laion/OIG)
### Training Procedure
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
We trained starcoder-toolbench on 4 80GB A100 gpu's. We started from [starcoder](https://huggingface.co/bigcode/starcoder). We finetuned it on XXX dataset.
All of the code used to prepare the datasets and the scripts to run training and inference are open-sourced and freely available at [githublink here](dummy link)
### Prompting Style Used For Training
```
```
### Hyperparameters
- Hardware: A100 GPU
- Optimizer: AdamW
- Grad accumulation: 1
- Epochs: 8
- Global Batch size: 16
- Batch tokens: 16 * 2048 = 32,768 tokens
- Learning Rate: 1e-5
- Learning Rate Scheduler: Fixed LR
- Weight decay: 0.1
</details>
## Acknowledgment
## Cite starcoder-toolbench
```
@software{bloomchat,
title = {{BLOOMChat: a New Open Multilingual Chat LLM}},
author = {SambaNova Systems, Together Computer},
url = {https://huggingface.co/sambanovasystems/BLOOMChat-176B-v1}
month = {5},
year = {2023},
version = {1.0},
}
``` |