---
datasets:
- rotten_tomatoes
- sst2
- amazon_polarity
- imdb
- yelp_polarity
language:
- en
tags:
- sentiment
pipeline_tag: text-classification
---

# SentiCSE

This is a RoBERTa-base model trained on the MR dataset and fine-tuned for sentiment analysis tasks.

The model is intended for English text.

+ Reference paper: SentiCSE (COLING 2024, main conference).
+ Git repo: https://github.com/nayohan/SentiCSE

```python
import torch
from scipy.spatial.distance import cosine
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("DILAB-HYU/SentiCSE")
model = AutoModel.from_pretrained("DILAB-HYU/SentiCSE")

# Tokenize input texts
texts = [
    "The food is delicious.",
    "The atmosphere of the restaurant is good.",
    "The food at the restaurant is devoid of flavor.",
    "The restaurant lacks a good ambiance."
]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# Get the embeddings
with torch.no_grad():
    embeddings = model(**inputs, output_hidden_states=True, return_dict=True).pooler_output

# Calculate cosine similarities
# Cosine similarities are in [-1, 1]. Higher means more similar
cosine_sim_0_1 = 1 - cosine(embeddings[0], embeddings[1])
cosine_sim_0_2 = 1 - cosine(embeddings[0], embeddings[2])
cosine_sim_0_3 = 1 - cosine(embeddings[0], embeddings[3])

print("Cosine similarity between \"%s\" and \"%s\" is: %.3f" % (texts[0], texts[1], cosine_sim_0_1))
print("Cosine similarity between \"%s\" and \"%s\" is: %.3f" % (texts[0], texts[2], cosine_sim_0_2))
print("Cosine similarity between \"%s\" and \"%s\" is: %.3f" % (texts[0], texts[3], cosine_sim_0_3))
```

Output:

```
Cosine similarity between "The food is delicious." and "The atmosphere of the restaurant is good." is: 0.942
Cosine similarity between "The food is delicious." and "The food at the restaurant is devoid of flavor." is: 0.703
Cosine similarity between "The food is delicious." and "The restaurant lacks a good ambiance." is: 0.656
```
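
The three comparisons above all measure the first sentence against one other sentence. To compare every sentence with every other, the same cosine formula can be applied to all pairs. The following is a minimal sketch, not part of the SentiCSE repository: `cosine_similarity` and `pairwise_cosine` are hypothetical helper names, and the toy 2-D vectors are stand-ins for real embeddings (which could be passed in as `embeddings.tolist()`).

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def pairwise_cosine(vectors):
    """Return the full n x n matrix of cosine similarities as nested lists."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Toy 2-D vectors standing in for real sentence embeddings (illustrative only).
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sim = pairwise_cosine(toy_embeddings)

# Print one row per sentence; entry [i][j] compares sentence i with sentence j.
for row in sim:
    print(" ".join(f"{value:.3f}" for value in row))
```

With real embeddings, row `i` of the matrix lists how close sentence `i` is to each of the others, so sentences with the same polarity would be expected to score higher against each other than against sentences of the opposite polarity.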

## BibTeX entry and citation info

Please cite the reference paper if you use this model.

```
@article{2024SentiCSE,
  title={SentiCSE: A Sentiment-aware Contrastive Sentence Embedding Framework with Sentiment-guided Textual Similarity},
  author={Kim, Jaemin and Na, Yohan and Kim, Kangmin and Lee, Sangrak and Chae, Dong-Kyu},
  journal={Proceedings of the 30th International Conference on Computational Linguistics (COLING)},
  year={2024},
}
```