Typesense Built-in Embedding Models

This repository holds all the built-in ML models that Typesense currently supports for semantic search.

To add a model to the supported list, convert it to the ONNX format and create a Pull Request (PR) to include it. See the Contributing section below for instructions.

Usage

Here's an example of how to specify the model to use for auto-embedding generation when creating a collection in Typesense:

curl -X POST \
  'http://localhost:8108/collections' \
  -H 'Content-Type: application/json' \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  -d '{
        "name": "products",
        "fields": [
          {
            "name": "product_name",
            "type": "string"
          },
          {
            "name": "embedding",
            "type": "float[]",
            "embed": {
              "from": [
                "product_name"
              ],
              "model_config": {
                "model_name": "ts/all-MiniLM-L12-v2"
              }
            }
          }
        ]
      }'

Replace all-MiniLM-L12-v2 with the name of any model in this repository, keeping the ts/ prefix.
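
To make the substitution explicit, here is a minimal sketch that prints the same schema for an arbitrary model name. The helper name collection_schema is hypothetical and not part of Typesense; it only templates the JSON shown above.

```shell
# Hypothetical helper: print the collection schema from the example above
# for a given built-in model name (passed without the ts/ prefix).
collection_schema() {
  model_name="$1"
  cat <<EOF
{
  "name": "products",
  "fields": [
    { "name": "product_name", "type": "string" },
    {
      "name": "embedding",
      "type": "float[]",
      "embed": {
        "from": ["product_name"],
        "model_config": { "model_name": "ts/${model_name}" }
      }
    }
  ]
}
EOF
}

# Possible usage against a local server (same endpoint and headers as above):
#   collection_schema all-MiniLM-L12-v2 | curl -X POST \
#     'http://localhost:8108/collections' \
#     -H 'Content-Type: application/json' \
#     -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
#     -d @-
```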

Here's a detailed step-by-step article with more information: https://typesense.org/docs/guide/semantic-search.html

Contributing

If you have a model that you would like to add to our supported list, you can convert it to the ONNX format and create a Pull Request (PR) to include it, following the instructions below.

Convert a model to ONNX format

Converting a Hugging Face Transformers Model

To convert a model from Hugging Face to the ONNX format, you can use optimum-cli, following the Hugging Face documentation on exporting models to ONNX.
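
As a rough sketch, an export with optimum-cli could look like the following. It assumes the Hugging Face optimum package is installed with ONNX export support; the wrapper name export_to_onnx is hypothetical, and the model ID and output folder in the usage comment are only examples.

```shell
# Hypothetical wrapper around the optimum-cli ONNX exporter.
# Assumes `optimum-cli` is on PATH (installed with the optimum package).
export_to_onnx() {
  if [ "$#" -ne 2 ]; then
    echo "usage: export_to_onnx <hub_model_id> <output_dir>" >&2
    return 2
  fi
  # Downloads the model from the Hugging Face Hub and writes the
  # exported ONNX files into the output directory.
  optimum-cli export onnx --model "$1" "$2"
}

# Example (model ID chosen for illustration only):
#   export_to_onnx sentence-transformers/all-MiniLM-L12-v2 all-MiniLM-L12-v2/
```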

Converting a PyTorch Model

If you have a PyTorch model, you can use the torch.onnx APIs to convert it to the ONNX format. The PyTorch documentation for torch.onnx describes the conversion process in more detail.

Converting a TensorFlow Model

For TensorFlow models, you can use the tf2onnx tool to convert them to the ONNX format. The tf2onnx documentation gives detailed guidance on this conversion.
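
A conversion from a TensorFlow SavedModel might be sketched as below. It assumes the tf2onnx package is installed in the active Python environment; the wrapper name savedmodel_to_onnx is hypothetical, and the paths in the usage comment are placeholders.

```shell
# Hypothetical wrapper around the tf2onnx command-line converter.
# Assumes `python` can import tf2onnx and TensorFlow.
savedmodel_to_onnx() {
  if [ "$#" -ne 2 ]; then
    echo "usage: savedmodel_to_onnx <saved_model_dir> <output.onnx>" >&2
    return 2
  fi
  # Reads a TensorFlow SavedModel directory and writes one ONNX file.
  python -m tf2onnx.convert --saved-model "$1" --output "$2"
}

# Example (paths are placeholders):
#   savedmodel_to_onnx ./my-saved-model ./my-model/model.onnx
```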

Creating model config

Before submitting your ONNX model through a PR, you need to organize the necessary files under a folder with the model's name. Ensure that your model configuration adheres to the following structure:

Model File: The ONNX model file.
Vocab File: The vocabulary file required for the model.
Model Config File: A file named config.json that contains the following keys:

Key	Description	Optional
model_md5	MD5 checksum of model file as string	No
vocab_md5	MD5 checksum of vocab file as string	No
model_type	Model type (currently only `bert` and `xlm_roberta` are supported)	No
vocab_file_name	File name of vocab file	No
indexing_prefix	Prefix to be added before embedding documents	Yes
query_prefix	Prefix to be added before embedding queries	Yes
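
The required keys in the table above can be filled in with a short script. The following is a minimal sketch, not an official tool: the helper name write_model_config and the file names in the usage comment are hypothetical, and it assumes GNU coreutils md5sum is available. The optional indexing_prefix and query_prefix keys are left out and can be added by hand if the model needs them.

```shell
# Hypothetical helper: compute MD5 checksums and write config.json
# into the model folder. Assumes GNU coreutils `md5sum`; on macOS,
# `md5 -q <file>` is the usual substitute.
write_model_config() {
  model_dir="$1"    # folder named after the model
  model_file="$2"   # ONNX file name inside model_dir
  vocab_file="$3"   # vocab file name inside model_dir
  model_type="$4"   # `bert` or `xlm_roberta`

  # md5sum prints "<hash>  <path>"; keep only the hash.
  model_md5=$(md5sum "${model_dir}/${model_file}" | cut -d ' ' -f 1)
  vocab_md5=$(md5sum "${model_dir}/${vocab_file}" | cut -d ' ' -f 1)

  cat > "${model_dir}/config.json" <<EOF
{
  "model_md5": "${model_md5}",
  "vocab_md5": "${vocab_md5}",
  "model_type": "${model_type}",
  "vocab_file_name": "${vocab_file}"
}
EOF
}

# Example (folder and file names are placeholders):
#   write_model_config ./my-model model.onnx vocab.txt bert
```

Regenerating config.json after any change to the model or vocab file keeps the checksums in step with the files submitted in the PR.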

Please make sure that the information in the configuration file is accurate and complete before submitting your PR.

We appreciate your contributions to expand our collection of supported embedding models!