jinaai/jina-embeddings-v2-base-es · A practical use case of your great work for the Spanish language

Model

In this project, I developed a fast, efficient user intent classification system built on an ensemble of logistic regression, SVM, and k-NN classifiers. The ensemble uses text embeddings from the jinaai/jina-embeddings-v2-base-es model to achieve high accuracy while being significantly more resource-efficient than large language models (LLMs).
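For readers who want to try something similar, here is a minimal sketch of how such an ensemble could be wired up with scikit-learn. It is not the exact training code: the hard-voting scheme, the hyperparameters, the helper name build_intent_ensemble, the 768-dimensional vector size, and the random stand-in vectors are all assumptions made for illustration. In practice the rows of X would be embeddings of Spanish utterances.

```python
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

EMBEDDING_DIM = 768  # assumed embedding size, used only for the stand-in data


def build_intent_ensemble() -> VotingClassifier:
    """Majority-vote ensemble over precomputed sentence embeddings.

    Hypothetical helper: voting scheme and hyperparameters are assumptions.
    """
    return VotingClassifier(
        estimators=[
            ("logreg", LogisticRegression(max_iter=1000)),
            ("svm", SVC(kernel="linear")),
            ("knn", KNeighborsClassifier(n_neighbors=3)),
        ],
        voting="hard",  # each classifier casts one vote per sample
    )


# Stand-in data: three well-separated clusters of random vectors that play
# the role of embedded utterances for three of the intents.
rng = np.random.default_rng(0)
labels = np.array(["lead", "contacto", "direcciones"])
centers = rng.normal(size=(len(labels), EMBEDDING_DIM))
X = np.vstack([c + 0.1 * rng.normal(size=(20, EMBEDDING_DIM)) for c in centers])
y = np.repeat(labels, 20)

clf = build_intent_ensemble().fit(X, y)
predicted = clf.predict(centers)  # one prediction per cluster center
print(list(predicted))
```

To use real text instead of the stand-in vectors, the embeddings could be produced with the sentence-transformers library, for example by loading jinaai/jina-embeddings-v2-base-es with SentenceTransformer(..., trust_remote_code=True) and calling encode on a list of sentences. Treat that as one possible setup, not necessarily the one used here.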

Motivation

Detecting user intent is crucial for retrieval-augmented generation (RAG) pipelines in conversational AI. These pipelines often require multiple LLM calls and sophisticated prompt engineering, which can be both time-consuming and costly. Our approach seeks to drastically reduce latency and the number of LLM calls, providing a fast and cost-effective solution without compromising accuracy. This model focuses on classifying requests and questions in Spanish, supporting nine intents: censorship, others, lead, contact, directions, meet, negation, affirmation, and casual chat.

Results

Intent	Precision	Recall	F1-Score	Support
Afirmación	1.00	1.00	1.00	14
Censura	0.99	1.00	0.99	539
Charla	1.00	0.67	0.80	15
Contacto	0.97	1.00	0.99	38
Direcciones	1.00	1.00	1.00	71
Lead	0.99	0.99	0.99	140
Meet	0.97	1.00	0.98	29
Negación	1.00	0.94	0.97	18
Otros	0.98	0.97	0.98	171
Micro Avg	0.99	0.99	0.99	1035
Macro Avg	0.99	0.95	0.97	1035
Weighted Avg	0.99	0.99	0.99	1035
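
As a sanity check on how the summary rows relate to the per-intent rows, the short script below recomputes the macro and weighted averages from the table. The macro average treats every intent equally, while the weighted average scales each intent by its support. Because the inputs are the rounded two-decimal values from the table, the results are approximate, and the variable and function names are purely illustrative.

```python
# Per-intent rows copied from the table: (precision, recall, f1, support)
rows = {
    "Afirmación": (1.00, 1.00, 1.00, 14),
    "Censura": (0.99, 1.00, 0.99, 539),
    "Charla": (1.00, 0.67, 0.80, 15),
    "Contacto": (0.97, 1.00, 0.99, 38),
    "Direcciones": (1.00, 1.00, 1.00, 71),
    "Lead": (0.99, 0.99, 0.99, 140),
    "Meet": (0.97, 1.00, 0.98, 29),
    "Negación": (1.00, 0.94, 0.97, 18),
    "Otros": (0.98, 0.97, 0.98, 171),
}

PRECISION, RECALL, F1, SUPPORT = range(4)
total_support = sum(r[SUPPORT] for r in rows.values())


def macro_avg(metric: int) -> float:
    """Unweighted mean over intents: every intent counts the same."""
    return sum(r[metric] for r in rows.values()) / len(rows)


def weighted_avg(metric: int) -> float:
    """Mean over intents, weighted by the number of test examples."""
    return sum(r[metric] * r[SUPPORT] for r in rows.values()) / total_support


for name, metric in (("precision", PRECISION), ("recall", RECALL), ("f1", F1)):
    print(f"{name}: macro={macro_avg(metric):.2f} weighted={weighted_avg(metric):.2f}")
```

The macro recall comes out lower than the weighted recall because Charla, with a recall of 0.67 on only 15 examples, pulls the unweighted mean down while barely affecting the support-weighted one.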

HF Model link

Great job, Jina AI, your embeddings Rockz!