mmularczyk committed daacd76 (parent: 4b97874)

Update model card

README.md (changed)
|
|
Many consumer computers are equipped with good-quality graphics cards that can be used to train a model at home. This is why we decided to use a top consumer graphics card: Nvidia’s RTX 4090 with 24 GB of VRAM.

All currently available language models have been trained mainly on English corpora, with only a small share of other languages, including Polish. As a result, these models do not handle Polish texts particularly well. Even the popular GPT models from OpenAI and Bard from Google often struggle with correct grammatical forms. Therefore, we decided to prepare a model based only on a Polish corpus. An additional advantage of using only a Polish corpus relates to model size: in the case of smaller models, it is better to focus on a single language.

It is important to remember that models are only as good as the data they are trained on. Given the small size of the model, we trained it on carefully selected texts and instructions. In close collaboration with, and with advice from, the [Speakleash](https://speakleash.org) team, we prepared a Polish-language text corpus of over 285 GB and 2.5 million instructions, which were then processed and used to train the model. Additionally, a unique feature of our model is that it has been trained on the largest amount of text among all available models for the Polish language.
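The choice of a single 24 GB RTX 4090 for a model of this size can be sanity-checked with a back-of-the-envelope memory estimate. This is an illustration, not the authors' documented training setup: it assumes roughly 1 billion parameters (inferred from the "1B" in the model name) and the common rule of thumb of 16 bytes per parameter for mixed-precision training with the Adam optimizer, and it ignores activations and framework overhead.

```python
# Back-of-the-envelope GPU memory estimate for a ~1B-parameter model.
# Assumptions: ~1e9 parameters (inferred from "1B" in the model name);
# activations, batch size and framework overhead are ignored.

N_PARAMS = 1e9   # assumed parameter count
VRAM_GB = 24     # memory of the RTX 4090 mentioned above

# Rule-of-thumb bytes per parameter.
INFERENCE_BYTES = 2  # fp16/bf16 weights only
TRAINING_BYTES = (
    2    # fp16 weights
    + 2  # fp16 gradients
    + 4  # fp32 master copy of the weights
    + 4  # fp32 Adam first moment
    + 4  # fp32 Adam second moment
)


def memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Memory in (decimal) gigabytes for the given bytes per parameter."""
    return n_params * bytes_per_param / 1e9


inference = memory_gb(N_PARAMS, INFERENCE_BYTES)
training = memory_gb(N_PARAMS, TRAINING_BYTES)

print(f"inference weights: {inference:.1f} GB")
print(f"training state:    {training:.1f} GB ({training / inference:.0f}x inference)")
print(f"left for activations on a {VRAM_GB} GB card: {VRAM_GB - training:.1f} GB")
```

Under these assumptions the training state alone takes 16 GB, eight times the 2 GB needed for fp16 inference weights, leaving about 8 GB for activations; in practice, batch size, sequence length, and techniques such as gradient checkpointing decide whether training fits.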
## Model
APT3-1B-Instruct-v1 is an autoregressive language model based on the transformer architecture. It has been fine-tuned for two epochs on 2.5 million instructions, over 1 billion tokens in total.

The training dataset (instructions in Polish) was created by combining 1.2 million instructions from [Speakleash](https://speakleash.org) and 1.3 million of our own private instructions.
### Model description:
## Instruction format
To take advantage of instruction fine-tuning, surround your prompt with the `[INST]` and `[/INST]` tokens. The very first instruction should start with the beginning-of-sentence token (`<s>`). The generated completion will end with the end-of-sentence token (`</s>`).

For example:
```python
# "What seasons do we have?"
prompt = "<s>[INST] Jakie mamy pory roku? [/INST]"
# "In Poland we have 4 seasons: spring, summer, autumn and winter."
completion = "W Polsce mamy 4 pory roku: wiosna, lato, jesień i zima.</s>"
```
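The same format can be produced programmatically. The sketch below uses two hypothetical helpers, `build_prompt` and `clean_completion`, which are not part of any library or of this repository. The single-turn case reproduces the example above exactly; the multi-turn layout (each earlier answer placed after its `[/INST]` tag and closed with `</s>`, with only the very first instruction preceded by `<s>`) is an assumption modelled on the Mistral-style convention, since the model card documents only a single turn.

```python
# Hypothetical helpers for the [INST] ... [/INST] prompt format.
# Assumption: in multi-turn chats, each previous answer is placed after its
# [/INST] tag and closed with </s> (Mistral-style); the model card only shows
# a single turn.

BOS = "<s>"   # beginning-of-sentence token
EOS = "</s>"  # end-of-sentence token


def build_prompt(instructions: list[str], answers: list[str]) -> str:
    """Build a prompt asking the model to answer the last instruction.

    `answers` holds the model's earlier replies, so it must contain exactly
    one element fewer than `instructions`.
    """
    if len(answers) != len(instructions) - 1:
        raise ValueError("answers must have one element fewer than instructions")
    parts = [BOS]  # only the very first instruction is preceded by <s>
    for i, instruction in enumerate(instructions):
        parts.append(f"[INST] {instruction} [/INST]")
        if i < len(answers):
            # Assumed layout for earlier turns: " answer</s>"
            parts.append(f" {answers[i]}{EOS}")
    return "".join(parts)


def clean_completion(completion: str) -> str:
    """Drop everything from the first end-of-sentence token onwards."""
    return completion.split(EOS, 1)[0].strip()


if __name__ == "__main__":
    # Single turn: matches the documented example.
    print(build_prompt(["Jakie mamy pory roku?"], []))
    print(clean_completion("W Polsce mamy 4 pory roku: wiosna, lato, jesień i zima.</s>"))
```

`clean_completion` simply cuts the generated text at the first `</s>`, since completions are documented to end with that token.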
### Quickstart
## Limitations and Biases
The APT3-1B-Instruct-v1 model is a quick demonstration that the base model can easily be fine-tuned to achieve the desired performance. It does not have any moderation mechanisms and should not be used for human-facing interactions without further guardrails and user consent.

APT3-1B-Instruct-v1 can produce factually incorrect output and should not be relied on to produce factually accurate information. APT3-1B-Base and APT3-1B-Instruct-v1 were trained on various public datasets. Although great efforts have been made to clean the pretraining data, these models may still generate lewd, biased, or otherwise offensive output.
## Disclaimer
The license of this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model.
## Citation
Please cite this model using the following format: