mpjan/msmarco-distilbert-base-tas-b-mmarco-pt-300k


mpjan committed on Nov 5, 2022

Commit 8cabd9a (1 parent: 2e47f8b)

Update README.md

Files changed (1)

README.md +10 -4


@@ -1,17 +1,23 @@
 ---
 pipeline_tag: sentence-similarity
 tags:
 - sentence-transformers
 - feature-extraction
 - sentence-similarity
 - transformers
 ---
-# {MODEL_NAME}
 This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search.
 <!--- Describe your model here -->
 ## Usage (Sentence-Transformers)
@@ -28,7 +34,7 @@ Then you can use the model like this:
 from sentence_transformers import SentenceTransformer
 sentences = ["This is an example sentence", "Each sentence is converted"]
-model = SentenceTransformer('{MODEL_NAME}')
 embeddings = model.encode(sentences)
 print(embeddings)
 ```
@@ -51,8 +57,8 @@ def cls_pooling(model_output, attention_mask):
 sentences = ['This is an example sentence', 'Each sentence is converted']
 # Load model from HuggingFace Hub
-tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
-model = AutoModel.from_pretrained('{MODEL_NAME}')
 # Tokenize sentences
 encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

 ---
 pipeline_tag: sentence-similarity
+language:
+  - 'pt'
 tags:
 - sentence-transformers
 - feature-extraction
 - sentence-similarity
 - transformers
+datasets:
+- 'unicamp-dl/mmarco'
 ---
+# mpjan/msmarco-distilbert-base-tas-b-mmarco-pt-300k
 This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search.
+It is a fine-tuning of [sentence-transformers/msmarco-distilbert-base-tas-b](https://huggingface.co/sentence-transformers/msmarco-distilbert-base-tas-b) on the first 300k triplets of the Portuguese subset in [unicamp-dl/mmarco](https://huggingface.co/datasets/unicamp-dl/mmarco).
 <!--- Describe your model here -->
 ## Usage (Sentence-Transformers)
 from sentence_transformers import SentenceTransformer
 sentences = ["This is an example sentence", "Each sentence is converted"]
+model = SentenceTransformer('mpjan/msmarco-distilbert-base-tas-b-mmarco-pt-300k')
 embeddings = model.encode(sentences)
 print(embeddings)
 ```
 sentences = ['This is an example sentence', 'Each sentence is converted']
 # Load model from HuggingFace Hub
+tokenizer = AutoTokenizer.from_pretrained('mpjan/msmarco-distilbert-base-tas-b-mmarco-pt-300k')
+model = AutoModel.from_pretrained('mpjan/msmarco-distilbert-base-tas-b-mmarco-pt-300k')
 # Tokenize sentences
 encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
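
The Transformers usage path touched by this commit pools token outputs with `cls_pooling` (named in the last hunk header), and the README describes the embeddings as suitable for semantic search. The sketch below illustrates those two steps on toy data, without downloading the model. It is not part of the commit: the helper names `cls_pool` and `rank_by_dot`, the 3-dimensional vectors (the real model produces 768 dimensions), and the choice of dot-product scoring are all assumptions made for illustration.

```python
# Toy stand-in for a model's token-level output, shaped [batch][tokens][hidden].
# Assumption: real embeddings from this model are 768-dimensional; 3 dimensions
# are used here so the arithmetic is easy to follow.

def cls_pool(token_embeddings):
    """Hypothetical helper: keep the first-token ([CLS]) vector of each sequence."""
    return [sequence[0] for sequence in token_embeddings]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_by_dot(query_vec, doc_vecs):
    """Hypothetical helper: document indices sorted by descending dot-product score.

    Dot-product scoring is an assumption of this sketch, not stated in the diff.
    """
    scores = [dot(query_vec, doc_vec) for doc_vec in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


token_embeddings = [
    [[1.0, 0.0, 0.0], [9.0, 9.0, 9.0]],  # query: only the first token is kept
    [[0.9, 0.1, 0.0], [0.0, 0.0, 0.0]],  # document 0
    [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0]],  # document 1
]

pooled = cls_pool(token_embeddings)
query_vec, doc_vecs = pooled[0], pooled[1:]
ranking = rank_by_dot(query_vec, doc_vecs)
print(ranking)
```

With real model output, the same indexing would apply to the first token of the last hidden state, and the toy lists would be replaced by tensors.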