Formal Language T5 Model

This model is T5-base fine-tuned to rewrite informal English text in a formal register and to correct grammar.

Model Description

Model Type: T5-base fine-tuned
Language: English
Task: Text Formalization and Grammar Correction
License: Apache 2.0
Base Model: t5-base

Intended Uses & Limitations

Intended Uses

Converting informal text to formal language
Improving text professionalism
Grammar correction
Business communication enhancement
Academic writing improvement

Limitations

Works best with English text
Maximum input length: 128 tokens
May not preserve specific domain terminology
Best suited for business and academic contexts
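
Because of the 128-token input limit, longer passages need to be split before they are sent to the model. The following is a minimal sketch of one way to do that, not part of the model's own code: it packs whole sentences into chunks under a token budget. The names `split_into_chunks`, `whitespace_token_count`, and `MAX_INPUT_TOKENS` are hypothetical, and the default token counter is only a rough word-based approximation of the real tokenizer.

```python
import re
from typing import Callable, List

MAX_INPUT_TOKENS = 128  # limit stated in this model card


def whitespace_token_count(text: str) -> int:
    """Rough stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def split_into_chunks(
    text: str,
    max_tokens: int = MAX_INPUT_TOKENS,
    count_tokens: Callable[[str], int] = whitespace_token_count,
) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens tokens.

    A single sentence longer than max_tokens is kept as its own chunk and
    would still need truncation by the tokenizer.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and count_tokens(candidate) > max_tokens:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

For a tighter budget, a counter based on the model's tokenizer can be passed instead, for example `count_tokens=lambda s: len(tokenizer(s)["input_ids"])`. The "make formal: " prefix also consumes tokens, so a slightly smaller `max_tokens` is likely safer in practice.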

Usage

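from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model = AutoModelForSeq2SeqLM.from_pretrained("renix-codex/formal-lang-rxcx-model")
tokenizer = AutoTokenizer.from_pretrained("renix-codex/formal-lang-rxcx-model")

# Example usage: the input must start with the "make formal: " prefix
text = "make formal: hey whats up"
# Truncate to the 128-token maximum input length
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
# max_new_tokens=128 is an assumed cap matching the training sequence length
outputs = model.generate(**inputs, max_new_tokens=128)
formal_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(formal_text)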
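
To formalize several sentences at once, the same steps can be batched. This is a hedged sketch rather than an official helper: `formalize_batch` is a hypothetical name, it assumes `model` and `tokenizer` are loaded as in the snippet above, and the output cap of 128 new tokens is an assumption based on the training sequence length.

```python
from typing import List


def formalize_batch(texts: List[str], model, tokenizer, max_length: int = 128) -> List[str]:
    """Formalize several sentences in one generate() call."""
    # Prepend the task prefix the model was trained with.
    prompts = [f"make formal: {t}" for t in texts]
    # Pad to the longest prompt in the batch and truncate to the input limit.
    inputs = tokenizer(
        prompts,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=max_length,
    )
    outputs = model.generate(**inputs, max_new_tokens=max_length)
    # Decode every generated sequence, dropping padding and end-of-sequence tokens.
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

A call such as `formalize_batch(["hey whats up", "gonna be late for meeting"], model, tokenizer)` returns one formalized string per input, in the same order.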

Example Inputs and Outputs

Informal Input	Formal Output
"hey whats up"	"Hello, how are you?"
"gonna be late for meeting"	"I will be late for the meeting."
"this is kinda cool"	"This is quite impressive."

Training

The model was trained on the Grammarly/COEDIT dataset with the following specifications:

Base Model: T5-base
Training Hardware: A100 GPU
Sequence Length: 128 tokens
Input Format: "make formal: [informal text]"
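
Since the model was trained on inputs of the form "make formal: [informal text]", inference inputs should follow the same format. A small helper along these lines can build the prompt consistently; `build_prompt` and `PREFIX` are hypothetical names, and the whitespace normalization is an added convenience, not something the training setup is known to require.

```python
PREFIX = "make formal: "  # input format stated in the training specifications


def build_prompt(informal_text: str) -> str:
    """Collapse whitespace and prepend the task prefix exactly once."""
    cleaned = " ".join(informal_text.split())
    bare_prefix = PREFIX.strip()
    # Avoid doubling the prefix if the caller already supplied it.
    if cleaned.lower().startswith(bare_prefix.lower()):
        cleaned = cleaned[len(bare_prefix):].lstrip()
    return PREFIX + cleaned
```

The resulting string can be passed directly to the tokenizer in place of a hand-written prompt.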

License

Apache License 2.0

Citation

@misc{formal-lang-rxcx-model,
    author = {renix-codex},
    title = {Formal Language T5 Model},
    year = {2024},
    publisher = {HuggingFace},
    journal = {HuggingFace Model Hub},
    url = {https://huggingface.co/renix-codex/formal-lang-rxcx-model}
}

Developer

Model developed by renix-codex

Ethical Considerations

This model is intended to assist in formal writing while maintaining the original meaning of the text. Users should be aware that:

The model may alter the tone of personal or culturally specific expressions
It should be used as a writing aid rather than a replacement for human judgment
The output should be reviewed for accuracy and appropriateness

Updates and Versions

Initial Release - February 2024

Base implementation with T5-base
Trained on Grammarly/COEDIT dataset
Optimized for formal language conversion
