# TinyLlama-1.1B Intermediate Step Model
This repository contains `TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T` fine-tuned on the `augmxnt/shisa-pretrain-en-ja-v1` dataset. The model has been trained on 5.5 billion tokens and offers robust performance on a variety of natural language processing (NLP) tasks.
## Model Overview
- **Base Model**: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
- **Training Dataset**: augmxnt/shisa-pretrain-en-ja-v1
- **Training Tokens**: 5.5 billion
This model is designed for a range of NLP tasks, including but not limited to language translation, text generation, and sentiment analysis. It is particularly effective in handling bilingual content in English and Japanese.
## Usage
### Installation
To use this model, you'll need to install the `transformers` library from Hugging Face:
```bash
pip install transformers
```
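The examples below also assume PyTorch is installed as the backend for `transformers`; if it is not already in your environment:
```bash
pip install torch
```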
### Loading the Model
You can load the model using the `transformers` library as follows:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
# Note: this ID points to the base model. To use the fine-tuned checkpoint
# described in this README, replace it with this repository's model ID.
model_name = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```
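If a GPU is available, you can optionally load the weights in half precision and place the model on the device to reduce memory usage. This is a minimal sketch assuming a CUDA device; adjust the dtype and device for your setup:
```python
import torch

# Optional: load the weights in float16 and move the model to a GPU.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
)
model = model.to("cuda")
model.eval()
```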
### Generating Text
Here is an example of how to generate text using the loaded model:
```python
input_text = "Translate the following English text to Japanese: Hello, how are you?"
input_ids = tokenizer.encode(input_text, return_tensors="pt")

# Generate up to 50 new tokens beyond the prompt
outputs = model.generate(input_ids, max_new_tokens=50, num_return_sequences=1)

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
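Alternatively, the `pipeline` API from `transformers` wraps tokenization, generation, and decoding in a single call. A minimal sketch using the model and tokenizer loaded above:
```python
from transformers import pipeline

# Build a text-generation pipeline from the already-loaded model and tokenizer.
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

result = generator(
    "Translate the following English text to Japanese: Hello, how are you?",
    max_new_tokens=50,
    do_sample=False,
)
print(result[0]["generated_text"])
```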
## Model Performance
This model has been trained on a diverse bilingual dataset to perform well across a range of tasks. Qualitative observations:
- **Language Translation**: Achieves high accuracy in translating between English and Japanese.
- **Text Generation**: Produces coherent and contextually relevant text for prompts in both languages.
- **Sentiment Analysis**: Effectively classifies sentiments with a high degree of accuracy.
## Fine-Tuning
For users interested in fine-tuning this model on their own datasets, the following code snippet provides a starting point:
```python
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",          # where checkpoints are written
    num_train_epochs=3,
    per_device_train_batch_size=4,
    save_steps=10_000,
    save_total_limit=2,              # keep only the two most recent checkpoints
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=my_train_dataset,
    eval_dataset=my_eval_dataset,
)

trainer.train()
```
Replace `my_train_dataset` and `my_eval_dataset` with your own dataset objects.
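As a concrete starting point, the sketch below shows one way to build `my_train_dataset` and `my_eval_dataset` with the Hugging Face `datasets` library. The dataset name and the `text` column are illustrative assumptions; substitute your own data, and pass the collator to the `Trainer` via `data_collator=data_collator` so that causal-language-modeling labels are created automatically.
```python
from datasets import load_dataset
from transformers import DataCollatorForLanguageModeling

# Illustrative placeholder dataset; replace with your own corpus.
raw = load_dataset("wikitext", "wikitext-2-raw-v1")

def tokenize(batch):
    # Truncate long lines; padding is handled per batch by the collator.
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

# For causal language modeling, the collator copies input_ids into labels.
data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

my_train_dataset = tokenized["train"]
my_eval_dataset = tokenized["validation"]
```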
## Acknowledgements
This model was built upon the work of the TinyLlama project and trained using the `augmxnt/shisa-pretrain-en-ja-v1` dataset. We acknowledge their contributions to the NLP community.
## License
This model is released under the [MIT License](LICENSE).
## Contact
For questions or feedback, please open an issue in this repository or contact us at [[email protected]].