---
language: ru
tags:
- spam-detection
- text-classification
- russian
license: mit
datasets:
- RUSpam/spam_dataset_v4
metrics:
- F1
model-index:
- name: spam_deberta_v4
  results:
  - task:
      name: Text classification
      type: text-classification
    dataset:
      name: RUSpam/russian_spam_dataset
      type: RUSpam/russian_spam_dataset
    metrics:
    - name: F1
      type: F1
      value: 0.9897
---

# RUSpam/spam_deberta_v4

## Description

This is a spam detection model based on the DeBERTa architecture and fine-tuned on Russian-language spam data. It classifies a text as spam or not spam.

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_path = "RUSpam/spam_deberta_v4"

# Load the tokenizer and the fine-tuned classifier from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(model_path)


def predict(text):
    # Inputs longer than 256 tokens are truncated.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    logits = outputs.logits
    # Class index 1 means spam, any other index means not spam.
    predicted_class = torch.argmax(logits, dim=1).item()
    return "Spam" if predicted_class == 1 else "Not spam"


text = "Your text to check goes here"
result = predict(text)
print(f"Result: {result}")
```

## Citation

```
@MISC{RUSpam/spam_deberta_v4,
  author = {Denis Petrov and Kirill Fedko (Neurospacex) and Sergey Yalovegin},
  title = {Russian Spam Classification Model},
  url = {https://huggingface.co/RUSpam/spam_deberta_v4/},
  year = 2024
}
```
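
## Spam probability instead of a hard label

The `predict` function in the usage example returns only a hard label taken from the `argmax` of the logits. If a confidence score is useful, the two logits can be turned into a spam probability with a softmax. The sketch below is a minimal illustration, not part of the model's published API: the helper names `spam_probability` and `label_with_threshold` and the `threshold` parameter are hypothetical, and it assumes, as in `predict`, that index 1 is the spam class.

```python
import math


def spam_probability(logits):
    """Return P(spam) from a pair of logits [not_spam, spam].

    Assumes index 1 is the spam class, as in the usage example.
    """
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[1] / sum(exps)


def label_with_threshold(logits, threshold=0.5):
    """Return (label, probability); threshold is an illustrative parameter."""
    p = spam_probability(logits)
    # Strict comparison, so equal logits fall back to "Not spam".
    label = "Spam" if p > threshold else "Not spam"
    return label, p
```

With the model loaded as in the usage example, the logits for a single text can be passed in as a plain list, for instance `label_with_threshold(outputs.logits[0].tolist())`. Raising `threshold` above 0.5 trades missed spam for fewer false positives; a suitable value would have to be chosen on your own validation data.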