# TinyLlama-1.1B Intermediate Step Model
This repository contains a fine-tuned version of `TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T`, trained on the `augmxnt/shisa-pretrain-en-ja-v1` dataset. The model has been trained on 5.5 billion tokens, offering robust performance on a variety of natural language processing (NLP) tasks.
## Model Overview

- Base Model: `TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T`
- Training Dataset: `augmxnt/shisa-pretrain-en-ja-v1`
- Training Tokens: 5.5 billion
This model is designed for a range of NLP tasks, including but not limited to language translation, text generation, and sentiment analysis. It is particularly effective in handling bilingual content in English and Japanese.
## Usage

### Installation

To use this model, you'll need to install the `transformers` library from Hugging Face:

```bash
pip install transformers
```
### Loading the Model

You can load the model and tokenizer with the `transformers` library as follows:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Note: this is the base model ID; to load this repository's fine-tuned
# weights, use this repository's model ID instead.
model_name = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```
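For faster inference on a GPU, you may prefer to load the weights in half precision. This optional sketch assumes PyTorch and a CUDA device are available; it is not required for the examples below.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# Load the weights in float16 and move them to the GPU to roughly halve
# memory use and speed up generation.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model = model.to("cuda")
model.eval()
```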
### Generating Text

Here is an example of how to generate text with the loaded model:

```python
input_text = "Translate the following English text to Japanese: Hello, how are you?"
inputs = tokenizer(input_text, return_tensors="pt")

# Generate up to 50 new tokens for the prompt
outputs = model.generate(**inputs, max_new_tokens=50, num_return_sequences=1)

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
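Alternatively, the `transformers` text-generation pipeline wraps tokenization, generation, and decoding in a single call. This sketch reuses the model and tokenizer loaded above; the sampling parameters are illustrative, not tuned.

```python
from transformers import pipeline

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

outputs = generator(
    "Translate the following English text to Japanese: Hello, how are you?",
    max_new_tokens=50,   # cap the length of the generated continuation
    do_sample=True,      # sample instead of greedy decoding
    temperature=0.7,     # illustrative value; adjust for your use case
)
print(outputs[0]["generated_text"])
```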
## Model Performance

The model was trained on a diverse bilingual dataset to support strong performance across a range of tasks. Its observed strengths include:

- Language Translation: translates accurately between English and Japanese.
- Text Generation: produces coherent and contextually relevant text for prompts in both languages.
- Sentiment Analysis: classifies sentiment reliably.

A lightweight way to sanity-check bilingual modelling quality is sketched below.
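One quick check is to compare the model's perplexity on held-out English and Japanese text. The sketch below assumes the model and tokenizer loaded earlier and PyTorch installed; the sample sentences are placeholders, and lower perplexity is (loosely) better.

```python
import torch

def perplexity(text: str) -> float:
    """Return the model's perplexity on a single piece of text."""
    enc = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        # With labels supplied, the model returns the mean cross-entropy
        # loss over the sequence; exp(loss) is the perplexity.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

# Placeholder sentences; swap in your own held-out text.
print(perplexity("The quick brown fox jumps over the lazy dog."))
print(perplexity("吾輩は猫である。名前はまだ無い。"))
```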
## Fine-Tuning
For users interested in fine-tuning this model on their own datasets, the following code snippet provides a starting point:
```python
from transformers import Trainer, TrainingArguments

# Basic hyperparameters; adjust to your data and hardware.
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=4,
    save_steps=10_000,
    save_total_limit=2,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=my_train_dataset,
    eval_dataset=my_eval_dataset,
)

trainer.train()
```
Replace `my_train_dataset` and `my_eval_dataset` with your own dataset objects.
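The snippet above assumes those dataset objects already exist. One common way to build them, sketched here under the assumption that you have plain-text files and the `datasets` library installed (the file names and the 512-token limit are placeholders), is to tokenize the text and let a causal-LM data collator handle padding and labels:

```python
from datasets import load_dataset
from transformers import DataCollatorForLanguageModeling

# Llama-style tokenizers often ship without a pad token.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Placeholder file names; point these at your own data.
raw = load_dataset("text", data_files={"train": "train.txt", "eval": "eval.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

# mlm=False produces standard causal-language-modeling labels.
data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

my_train_dataset = tokenized["train"]
my_eval_dataset = tokenized["eval"]
```

If you use this approach, also pass `data_collator=data_collator` to the `Trainer` so variable-length examples are padded and labelled correctly.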
## Acknowledgements

This model was built upon the work of the TinyLlama project and trained using the `augmxnt/shisa-pretrain-en-ja-v1` dataset. We acknowledge their contributions to the NLP community.
## License
This model is released under the MIT License.
## Contact
For questions or feedback, please open an issue in this repository or contact us at [[email protected]].