|
|
|
# TinyLlama-1.1B Intermediate Step Model |
|
|
|
This repository contains the `TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T` model, fine-tuned (continued pre-training) on the `augmxnt/shisa-pretrain-en-ja-v1` dataset. The model has been trained on 5.5 billion tokens, offering robust performance across a variety of natural language processing (NLP) tasks.
|
|
|
## Model Overview |
|
|
|
- **Base Model**: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T |
|
- **Training Dataset**: augmxnt/shisa-pretrain-en-ja-v1 |
|
- **Training Tokens**: 5.5 billion |
|
|
|
This model is designed for a range of NLP tasks, including but not limited to language translation, text generation, and sentiment analysis. It is particularly effective in handling bilingual content in English and Japanese. |
|
|
|
## Usage |
|
|
|
### Installation |
|
|
|
To use this model, you'll need the `transformers` library from Hugging Face along with PyTorch:
|
|
|
```bash
pip install transformers torch
```
|
|
|
### Loading the Model |
|
|
|
You can load the model using the `transformers` library as follows: |
|
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# The TinyLlama base checkpoint; swap in this repository's model ID
# to load the fine-tuned weights instead.
model_name = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```
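
If a GPU is available, the model can also be loaded in half precision to reduce memory use. This is a minimal sketch, assuming `torch` is installed and a CUDA device is present:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the weights in float16 and move the model to the GPU
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model = model.to("cuda")
```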
|
|
|
### Generating Text |
|
|
|
Here is an example of how to generate text using the loaded model: |
|
|
|
```python
input_text = "Translate the following English text to Japanese: Hello, how are you?"
input_ids = tokenizer.encode(input_text, return_tensors="pt")

# Generate up to 50 tokens (prompt included) and decode the result
outputs = model.generate(input_ids, max_length=50, num_return_sequences=1)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(generated_text)
```
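
Greedy decoding as above can produce repetitive text; sampling usually gives more varied output. The snippet below is a sketch with illustrative values for `temperature` and `top_p`:

```python
# Sampling-based generation; temperature and top_p are illustrative values
outputs = model.generate(
    input_ids,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```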
|
|
|
## Model Performance |
|
|
|
This model has been trained on a diverse bilingual dataset to ensure strong performance across various tasks. Qualitatively, it performs as follows:
|
|
|
- **Language Translation**: Achieves high accuracy in translating between English and Japanese. |
|
- **Text Generation**: Produces coherent and contextually relevant text for prompts in both languages. |
|
- **Sentiment Analysis**: Effectively classifies sentiment with a high degree of accuracy (see the prompting sketch below).
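
Because this is a causal language model, classification tasks such as sentiment analysis are typically handled by prompting. The sketch below is purely illustrative; the prompt wording and example review are assumptions, not an evaluated setup:

```python
prompt = (
    "Classify the sentiment of the following review as positive or negative.\n"
    "Review: この映画は素晴らしかった。\n"  # "This movie was wonderful."
    "Sentiment:"
)
inputs = tokenizer(prompt, return_tensors="pt")

# Generate a few tokens and decode only the continuation after the prompt
outputs = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```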
|
|
|
## Fine-Tuning |
|
|
|
For users interested in fine-tuning this model on their own datasets, the following code snippet provides a starting point: |
|
|
|
```python
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    save_steps=10_000,
    save_total_limit=2,
)

# my_train_dataset and my_eval_dataset must be tokenized datasets that
# provide input_ids (and labels) suitable for causal language modeling.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=my_train_dataset,
    eval_dataset=my_eval_dataset,
)

trainer.train()
```
|
|
|
Replace `my_train_dataset` and `my_eval_dataset` with your own tokenized dataset objects; one way to prepare them is sketched below.
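
As an example, the sketch below tokenizes a raw-text corpus with the `datasets` library; the file names and `max_length` are illustrative assumptions. The collator should then be passed to the `Trainer` via `data_collator=data_collator` so labels are built from `input_ids` automatically:

```python
from datasets import load_dataset
from transformers import DataCollatorForLanguageModeling

# Illustrative file names; replace with your own raw-text corpus
raw = load_dataset("text", data_files={"train": "train.txt", "validation": "valid.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

# mlm=False makes the collator create causal-LM labels from input_ids
data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

my_train_dataset = tokenized["train"]
my_eval_dataset = tokenized["validation"]
```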
|
|
|
## Acknowledgements |
|
|
|
This model was built upon the work of the TinyLlama project and trained using the `augmxnt/shisa-pretrain-en-ja-v1` dataset. We acknowledge their contributions to the NLP community. |
|
|
|
## License |
|
|
|
This model is released under the [MIT License](LICENSE). |
|
|
|
## Contact |
|
|
|
For questions or feedback, please open an issue in this repository or contact us at [[email protected]]. |
|
|