# TinyLlama-1.1B Intermediate Step Model
This repository contains a fine-tuned version of `TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T`, trained on the `augmxnt/shisa-pretrain-en-ja-v1` dataset. The model has been trained on 5.5 billion tokens, offering robust performance on a variety of natural language processing (NLP) tasks.
## Model Overview

- Base Model: `TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T`
- Training Dataset: `augmxnt/shisa-pretrain-en-ja-v1`
- Training Tokens: 5.5 billion
This model is designed for a range of NLP tasks, including but not limited to language translation, text generation, and sentiment analysis. It is particularly effective in handling bilingual content in English and Japanese.
## Usage

### Installation

To use this model, you'll need to install the `transformers` library from Hugging Face:

```bash
pip install transformers
```
### Loading the Model

You can load the model and tokenizer with the `transformers` library as follows:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Note: this is the base model ID; to load this repository's fine-tuned
# weights, use this repository's model ID instead.
model_name = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```
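For faster inference on a GPU, you may prefer to load the weights in half precision. This optional sketch assumes PyTorch and a CUDA device are available; it is not required for the examples below.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# Load the weights in float16 and move them to the GPU to roughly halve
# memory use and speed up generation.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model = model.to("cuda")
model.eval()
```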
### Generating Text

Here is an example of how to generate text with the loaded model:

```python
input_text = "Translate the following English text to Japanese: Hello, how are you?"
inputs = tokenizer(input_text, return_tensors="pt")

# Generate up to 50 new tokens for the prompt
outputs = model.generate(**inputs, max_new_tokens=50, num_return_sequences=1)

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
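Alternatively, the `transformers` text-generation pipeline wraps tokenization, generation, and decoding in a single call. This sketch reuses the model and tokenizer loaded above; the sampling parameters are illustrative, not tuned.

```python
from transformers import pipeline

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

outputs = generator(
    "Translate the following English text to Japanese: Hello, how are you?",
    max_new_tokens=50,   # cap the length of the generated continuation
    do_sample=True,      # sample instead of greedy decoding
    temperature=0.7,     # illustrative value; adjust for your use case
)
print(outputs[0]["generated_text"])
```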
## Model Performance

The model was trained on a diverse bilingual dataset to support strong performance across a range of tasks. Its observed strengths include:

- Language Translation: translates accurately between English and Japanese.
- Text Generation: produces coherent and contextually relevant text for prompts in both languages.
- Sentiment Analysis: classifies sentiment reliably.

A lightweight way to sanity-check bilingual modelling quality is sketched below.
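One quick check is to compare the model's perplexity on held-out English and Japanese text. The sketch below assumes the model and tokenizer loaded earlier and PyTorch installed; the sample sentences are placeholders, and lower perplexity is (loosely) better.

```python
import torch

def perplexity(text: str) -> float:
    """Return the model's perplexity on a single piece of text."""
    enc = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        # With labels supplied, the model returns the mean cross-entropy
        # loss over the sequence; exp(loss) is the perplexity.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

# Placeholder sentences; swap in your own held-out text.
print(perplexity("The quick brown fox jumps over the lazy dog."))
print(perplexity("吾輩は猫である。名前はまだ無い。"))
```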
## Fine-Tuning
For users interested in fine-tuning this model on their own datasets, the following code snippet provides a starting point:
```python
from transformers import Trainer, TrainingArguments

# Basic hyperparameters; adjust to your data and hardware.
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=4,
    save_steps=10_000,
    save_total_limit=2,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=my_train_dataset,
    eval_dataset=my_eval_dataset,
)

trainer.train()
```
Replace `my_train_dataset` and `my_eval_dataset` with your own dataset objects.
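The snippet above assumes those dataset objects already exist. One common way to build them, sketched here under the assumption that you have plain-text files and the `datasets` library installed (the file names and the 512-token limit are placeholders), is to tokenize the text and let a causal-LM data collator handle padding and labels:

```python
from datasets import load_dataset
from transformers import DataCollatorForLanguageModeling

# Llama-style tokenizers often ship without a pad token.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Placeholder file names; point these at your own data.
raw = load_dataset("text", data_files={"train": "train.txt", "eval": "eval.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

# mlm=False produces standard causal-language-modeling labels.
data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

my_train_dataset = tokenized["train"]
my_eval_dataset = tokenized["eval"]
```

If you use this approach, also pass `data_collator=data_collator` to the `Trainer` so variable-length examples are padded and labelled correctly.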
## Acknowledgements

This model was built upon the work of the TinyLlama project and trained using the `augmxnt/shisa-pretrain-en-ja-v1` dataset. We acknowledge their contributions to the NLP community.
## License
This model is released under the MIT License.
## Contact
For questions or feedback, please open an issue in this repository or contact us at [[email protected]].