# intel-optimized-model-for-embeddings-int8-v1
This is a text embedding model: it maps sentences and paragraphs to a 512-dimensional dense vector space and can be used for tasks like clustering or semantic search. For sample code that serves this model in a TorchServe container, see Intel-Optimized-Container-for-Embeddings. The model was quantized using static quantization from the Intel Neural Compressor library.
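For reference, static post-training quantization with Intel Neural Compressor follows the pattern sketched below. This is a minimal illustration of the INC 2.x API, not the recipe used to produce this model: the FP32 model and calibration dataloader (`fp32_model`, `calib_loader`) are placeholders you would supply, and the actual calibration data and configuration are not published here.

```python
# Minimal sketch of static post-training quantization with Intel Neural
# Compressor (INC 2.x API). Illustrative only; not the exact recipe used
# to produce this model.
from neural_compressor import PostTrainingQuantConfig, quantization

# `fp32_model` is the FP32 PyTorch model to quantize and `calib_loader` is a
# small dataloader of representative inputs -- both assumed defined by you.
conf = PostTrainingQuantConfig(approach="static")  # static quantization with calibration
q_model = quantization.fit(fp32_model, conf, calib_dataloader=calib_loader)
q_model.save("./int8-model")
```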
## Usage
Install the required packages:
```bash
pip install -U torch==2.3.1+cpu --extra-index-url https://download.pytorch.org/whl/cpu
pip install -U transformers==4.42.4 intel-extension-for-pytorch==2.3.100
```
Use the example below to load the model with the transformers library, tokenize the text, run the model, and apply mean pooling to the output.
```python
import torch
from transformers import AutoTokenizer
import intel_extension_for_pytorch as ipex
from huggingface_hub import hf_hub_download


def mean_pooling(model_output, attention_mask):
    # Average the token embeddings, masking out padding positions.
    token_embeddings = model_output[0]
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(
        input_mask_expanded.sum(1), min=1e-9)


# Load the tokenizer and download the TorchScript INT8 model file from the Hub.
tokenizer = AutoTokenizer.from_pretrained('Intel/intel-optimized-model-for-embeddings-int8-v1')
model_file_path = hf_hub_download('Intel/intel-optimized-model-for-embeddings-int8-v1',
                                  'pytorch_model.bin')
model = torch.jit.load(model_file_path)
model = ipex.optimize(model, level="O1", auto_kernel_selection=True,
                      conv_bn_folding=False, dtype=torch.int8)
model = torch.jit.freeze(model.eval())

text = ["This is a test."]
with torch.no_grad(), torch.autocast(device_type='cpu', cache_enabled=False, dtype=torch.int8):
    tokenized_text = tokenizer(text, padding=True, truncation=True, return_tensors='pt')
    model_output = model(**tokenized_text)
    # The TorchScript model returns a dict; wrap the tensor in a tuple so
    # mean_pooling can index it positionally.
    sentence_embeddings = mean_pooling((model_output["last_hidden_state"],),
                                       tokenized_text['attention_mask'])
embeddings = sentence_embeddings[0].tolist()

# Embeddings output
print(embeddings)
```
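Once sentence embeddings are available, semantic search reduces to comparing vectors, typically with cosine similarity. The helper below is a minimal sketch, not part of the model's API; it reuses `model`, `tokenizer`, and `mean_pooling` from the snippet above, and the query and corpus sentences are illustrative.

```python
import torch.nn.functional as F

def embed(texts):
    # Reuses `model`, `tokenizer`, and `mean_pooling` from the snippet above.
    with torch.no_grad():
        tokens = tokenizer(texts, padding=True, truncation=True, return_tensors='pt')
        output = model(**tokens)
        return mean_pooling((output["last_hidden_state"],), tokens['attention_mask'])

query = embed(["What is the capital of France?"])
corpus = embed(["Paris is the capital of France.",
                "The weather is nice today."])

# Cosine similarity of the query against each corpus sentence;
# the highest-scoring sentence is the best semantic match.
scores = F.cosine_similarity(query, corpus)
print(scores)
```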
## Model Details

### Model Description
This model was fine-tuned with the sentence-transformers library, starting from the BERT-Medium (L-8, H-512, A-8) checkpoint and using UAE-Large-V1 as a teacher.
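For context, embedding distillation in sentence-transformers is typically set up as a regression task: the student is trained with an MSE loss to reproduce the teacher's sentence embeddings. The sketch below shows only that general pattern; the training data, the reduction of the 1024-dimensional teacher embeddings to the student's 512 dimensions, and all hyperparameters are assumptions, not the published recipe for this model.

```python
# Illustrative distillation pattern with sentence-transformers; not the
# actual training recipe for this model.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses, models

# Student: BERT-Medium (8 layers, hidden size 512) with mean pooling.
word_emb = models.Transformer("google/bert_uncased_L-8_H-512_A-8")
pooling = models.Pooling(word_emb.get_word_embedding_dimension(), pooling_mode="mean")
student = SentenceTransformer(modules=[word_emb, pooling])

# Teacher embeddings serve as regression targets. UAE-Large-V1 outputs
# 1024-dim vectors, so in practice they are reduced (e.g. via PCA) to the
# student's 512 dims; plain truncation here is only a placeholder.
teacher = SentenceTransformer("WhereIsAI/UAE-Large-V1")
sentences = ["An illustrative training sentence.", "Another one."]
targets = teacher.encode(sentences)[:, :512]

examples = [InputExample(texts=[s], label=t) for s, t in zip(sentences, targets)]
loader = DataLoader(examples, batch_size=2)
student.fit(train_objectives=[(loader, losses.MSELoss(model=student))], epochs=1)
```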
### Training Datasets
| Dataset | Description | License |
|---|---|---|
| beir/dbpedia-entity | DBpedia-Entity is a standard test collection for entity search over the DBpedia knowledge base. | CC BY-SA 3.0 |
| beir/nq | The Natural Questions (NQ) corpus was created to spur development in open-domain question answering, along with a challenge website based on the data. | CC BY-SA 3.0 |
| beir/scidocs | SciDocs is an evaluation benchmark consisting of seven document-level tasks ranging from citation prediction to document classification and recommendation. | CC BY-SA 4.0 |
| beir/trec-covid | TREC-COVID followed the TREC model for building IR test collections through community evaluations of search systems. | CC BY-SA 4.0 |
| beir/touche2020 | Given a question on a controversial topic, retrieve relevant arguments from a focused crawl of online debate portals. | CC BY 4.0 |
| WikiAnswers | The WikiAnswers corpus contains clusters of questions tagged by WikiAnswers users as paraphrases. | MIT |
| Cohere/wikipedia-22-12-en-embeddings | The Cohere/Wikipedia dataset is a processed version of the wikipedia-22-12 dataset: English only, with articles broken up into paragraphs. | Apache 2.0 |
| MNLI | MNLI is part of GLUE, the General Language Understanding Evaluation benchmark (https://gluebenchmark.com/), a collection of resources for training, evaluating, and analyzing natural language understanding systems. | MIT |