BERT Fine-tuned on MRPC
This model is a fine-tuned version of bert-base-uncased on the MRPC (Microsoft Research Paraphrase Corpus) dataset from the GLUE benchmark. It is designed to determine whether two given sentences are semantically equivalent.
Model description
The model uses the BERT base architecture (12 layers, 768 hidden dimensions, 12 attention heads) and has been fine-tuned specifically for the paraphrase identification task. The output layer predicts whether the input sentence pair expresses the same meaning.
Key specifications (see the configuration sketch after this list):
- Base model: bert-base-uncased
- Task type: Binary classification (paraphrase/not paraphrase)
- Training method: Fine-tuning all layers
- Language: English
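The architecture details above can be checked directly from the published checkpoint's configuration. The snippet below is a minimal sketch using the Transformers AutoConfig API; the values in the comments are the ones this card describes.

from transformers import AutoConfig

config = AutoConfig.from_pretrained("real-jiakai/bert-base-uncased-finetuned-mrpc")
print(config.num_hidden_layers)    # 12 transformer layers
print(config.hidden_size)          # 768 hidden dimensions
print(config.num_attention_heads)  # 12 attention heads
print(config.num_labels)           # 2 labels: not paraphrase / paraphrase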
Intended uses & limitations
Intended uses
- Paraphrase detection
- Semantic similarity assessment
- Duplicate question detection
- Content matching
- Automated text comparison
Limitations
- Only works with English text
- Performance may degrade on out-of-domain text
- May struggle with complex or nuanced semantic relationships
- Limited to comparing pairs of sentences (not longer texts)
Training and evaluation data
The model was trained on the Microsoft Research Paraphrase Corpus (MRPC) from the GLUE benchmark (a loading sketch follows the list):
- Training set: 3,667 sentence pairs
- Validation set: 408 sentence pairs
- Each pair is labeled as either paraphrase (1) or non-paraphrase (0)
- Class distribution: approximately 67.4% positive (paraphrase) and 32.6% negative (non-paraphrase)
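For reference, the same splits can be loaded with the Datasets library listed under "Framework versions" below. This is a hedged sketch, not the author's preprocessing code.

from datasets import load_dataset

# Load the GLUE MRPC splits used for fine-tuning and evaluation
mrpc = load_dataset("glue", "mrpc")
print(mrpc)              # DatasetDict with train / validation / test splits
print(mrpc["train"][0])  # fields: sentence1, sentence2, label (1 = paraphrase), idx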
Training procedure
Training hyperparameters
The following hyperparameters were used during training (see the Trainer sketch after the list):
- Learning rate: 3e-05
- Batch size: 8 (train and eval)
- Optimizer: AdamW (betas=(0.9,0.999), epsilon=1e-08)
- LR scheduler: Linear decay
- Number of epochs: 3
- Max sequence length: 512
- Weight decay: 0.01
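The training script itself is not part of this card. As an illustration only, the hyperparameters above would map onto the Transformers Trainer API roughly as follows; the output directory name is a placeholder, and the max sequence length of 512 is applied at tokenization time rather than here.

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="bert-base-uncased-finetuned-mrpc",  # placeholder path
    learning_rate=3e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
    lr_scheduler_type="linear",  # linear decay
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    eval_strategy="epoch",       # evaluate once per epoch, as in the table below
)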
Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1     |
|---------------|-------|------|-----------------|----------|--------|
| No log        | 1.0   | 459  | 0.3905          | 0.8382   | 0.8878 |
| 0.5385        | 2.0   | 918  | 0.4275          | 0.8505   | 0.8961 |
| 0.3054        | 3.0   | 1377 | 0.5471          | 0.8652   | 0.9057 |
Framework versions
- Transformers 4.46.2
- PyTorch 2.5.1+cu121
- Datasets 3.1.0
- Tokenizers 0.20.3
Performance analysis
The model achieves strong performance on the MRPC validation set:
- Accuracy: 86.52%
- F1 Score: 90.57%
These metrics indicate that the model is effective at identifying paraphrases while maintaining a good balance between precision and recall.
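These numbers can be reproduced with a short loop over the MRPC validation split. The sketch below assumes the Hugging Face evaluate library for the GLUE metric and is not the author's original evaluation script.

import torch
import evaluate
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "real-jiakai/bert-base-uncased-finetuned-mrpc"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

validation = load_dataset("glue", "mrpc", split="validation")
metric = evaluate.load("glue", "mrpc")  # reports both accuracy and F1

for example in validation:
    inputs = tokenizer(example["sentence1"], example["sentence2"],
                       return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    metric.add(prediction=logits.argmax(dim=-1).item(), reference=example["label"])

print(metric.compute())  # expected to be close to {'accuracy': 0.865, 'f1': 0.906}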
Example usage
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("real-jiakai/bert-base-uncased-finetuned-mrpc")
model = AutoModelForSequenceClassification.from_pretrained("real-jiakai/bert-base-uncased-finetuned-mrpc")
model.eval()

# Example function
def check_paraphrase(sentence1, sentence2):
    # Tokenize the sentence pair and run it through the classifier
    inputs = tokenizer(sentence1, sentence2, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # Label 1 = paraphrase, label 0 = not paraphrase
    prediction = outputs.logits.argmax(dim=-1).item()
    return "Paraphrase" if prediction == 1 else "Not paraphrase"

# Example usage
sentence1 = "The cat sat on the mat."
sentence2 = "A cat was sitting on the mat."
result = check_paraphrase(sentence1, sentence2)
print(f"Result: {result}")
Model tree for real-jiakai/bert-base-uncased-finetuned-mrpc
- Base model: google-bert/bert-base-uncased
- Dataset used to train: GLUE MRPC
Evaluation results
- Accuracy on GLUE MRPC (self-reported): 0.865
- F1 on GLUE MRPC (self-reported): 0.906