# GATE-AraBert-v0
GATE-AraBert-v0 is a General Arabic Text Embedding model trained with Sentence Transformers in a multi-task setup on the AllNLI and STS datasets.
## Model Details

### Model Description
## Usage

### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference:

```python
from sentence_transformers import SentenceTransformer

# Download the model from the Hugging Face Hub
model = SentenceTransformer("Omartificial-Intelligence-Space/GATE-AraBert-v0")

# Arabic example sentences to embed
sentences = [
    'الكلب البني مستلقي على جانبه على سجادة بيج، مع جسم أخضر في المقدمة.',  # "The brown dog is lying on its side on a beige rug, with a green object in the foreground."
    'لقد مات الكلب',  # "The dog has died"
    'شخص طويل القامة',  # "A tall person"
]

# Compute sentence embeddings
embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, embedding_dim)

# Compute pairwise similarity scores between the embeddings (cosine by default)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # (3, 3)
```
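The same embeddings can be used for simple semantic search. The minimal sketch below ranks candidate sentences against a query by similarity score; the query, candidates, and their translations are illustrative placeholders, not examples from this card:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Omartificial-Intelligence-Space/GATE-AraBert-v0")

# Illustrative query and candidate sentences (placeholders)
query = "ما هي عاصمة فرنسا؟"  # "What is the capital of France?"
candidates = [
    "باريس هي عاصمة فرنسا.",  # "Paris is the capital of France."
    "القطط تحب النوم خلال النهار.",  # "Cats like to sleep during the day."
]

# Encode the query and the candidates
query_embedding = model.encode([query])          # shape (1, embedding_dim)
candidate_embeddings = model.encode(candidates)  # shape (len(candidates), embedding_dim)

# similarity() returns a (1, len(candidates)) matrix of scores
scores = model.similarity(query_embedding, candidate_embeddings)

# Rank candidates from most to least similar to the query
ranked = sorted(zip(candidates, scores[0].tolist()), key=lambda pair: pair[1], reverse=True)
for sentence, score in ranked:
    print(f"{score:.4f}  {sentence}")
```

`model.similarity` uses the similarity function configured for the model, which is cosine similarity by default.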
## Evaluation

### Metrics

#### Semantic Similarity
| Metric              | Value  |
|:--------------------|:-------|
| pearson_cosine      | 0.8384 |
| spearman_cosine     | 0.8389 |
| pearson_manhattan   | 0.8248 |
| spearman_manhattan  | 0.8329 |
| pearson_euclidean   | 0.825  |
| spearman_euclidean  | 0.8337 |
| pearson_dot         | 0.8072 |
| spearman_dot        | 0.8098 |
| pearson_max         | 0.8384 |
| spearman_max        | 0.8389 |
#### Semantic Similarity
| Metric              | Value  |
|:--------------------|:-------|
| pearson_cosine      | 0.7908 |
| spearman_cosine     | 0.7893 |
| pearson_manhattan   | 0.7923 |
| spearman_manhattan  | 0.7947 |
| pearson_euclidean   | 0.7904 |
| spearman_euclidean  | 0.7934 |
| pearson_dot         | 0.7404 |
| spearman_dot        | 0.7354 |
| pearson_max         | 0.7923 |
| spearman_max        | 0.7947 |
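These correlation metrics (Pearson/Spearman over cosine, Manhattan, Euclidean, and dot-product similarities) are the kind reported by Sentence Transformers' `EmbeddingSimilarityEvaluator` on STS-style data. The sketch below shows how comparable numbers can be computed on your own labeled sentence pairs; the Arabic pairs, gold scores, and evaluator name are placeholders, not the evaluation data behind the tables above:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("Omartificial-Intelligence-Space/GATE-AraBert-v0")

# Placeholder STS-style data: sentence pairs with gold similarity scores in [0, 1]
sentences1 = ["الجو مشمس اليوم", "القط يجلس على السجادة"]   # "The weather is sunny today", "The cat sits on the rug"
sentences2 = ["الطقس صحو اليوم", "الكلب يركض في الحديقة"]   # "The weather is clear today", "The dog runs in the garden"
gold_scores = [0.9, 0.1]

evaluator = EmbeddingSimilarityEvaluator(sentences1, sentences2, gold_scores, name="sts-sample")
results = evaluator(model)  # recent sentence-transformers versions return a dict of Pearson/Spearman metrics
print(results)
```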