Model forked from ru-mbart-large, which is a smaller version of facebook/mbart-large-50 with only Russian and English embeddings.
All 'train' subsets were concatenated and shuffled with seed 1000 - 7. The resulting train subset contains 155,678 rows. Evaluation was done on 10% of the concatenated 'validation' subsets (1,458 rows).
See WandB logs.
See the report at REPORT (WIP).
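The preparation code itself is not shown on the card; below is a minimal sketch of how such concatenation, shuffling, and the 10% validation cut could look with the datasets library. The dataset names are hypothetical placeholders (the source datasets are not listed in this excerpt), and 1000 - 7 evaluates to seed 993.

from datasets import concatenate_datasets, load_dataset

# Hypothetical placeholder names -- the actual source datasets are not listed here.
names = ['some/summ-dataset-a', 'some/summ-dataset-b']

# Concatenate all 'train' subsets and shuffle with seed 1000 - 7 = 993.
train = concatenate_datasets(
    [load_dataset(name, split='train') for name in names]
).shuffle(seed=1000 - 7)

# Evaluate on 10% of the concatenated 'validation' subsets.
validation = concatenate_datasets(
    [load_dataset(name, split='validation') for name in names]
)
validation = validation.shuffle(seed=1000 - 7).select(range(len(validation) // 10))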
from transformers import pipeline

pipe = pipeline('summarization', model='d0rj/ru-mbart-large-summ')
text = '...'  # Russian source text to summarize (placeholder)
pipe(text)
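The pipeline forwards generation keyword arguments to model.generate(), so summary length and decoding strategy can be controlled per call. The values below are illustrative assumptions, not settings recommended by the card:

pipe(text, max_length=128, min_length=16, do_sample=False)  # illustrative generation settings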
import torch
from transformers import AutoTokenizer, MBartModel

tokenizer = AutoTokenizer.from_pretrained('d0rj/ru-mbart-large-summ')
model = MBartModel.from_pretrained('d0rj/ru-mbart-large-summ')

# Encode a Russian example sentence and run the bare encoder-decoder model.
inputs = tokenizer('Всё в порядке, мимо двигал Утром прозвенел будильник', return_tensors='pt')
with torch.no_grad():
    outputs = model(**inputs)

last_hidden_states = outputs.last_hidden_state
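MBartModel exposes only the raw hidden states; producing an actual summary without the pipeline requires the generation head. A minimal sketch assuming MBartForConditionalGeneration, with illustrative generation settings rather than values recommended by the card:

import torch
from transformers import AutoTokenizer, MBartForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained('d0rj/ru-mbart-large-summ')
model = MBartForConditionalGeneration.from_pretrained('d0rj/ru-mbart-large-summ')

text = '...'  # long Russian source text (placeholder)
inputs = tokenizer(text, return_tensors='pt', truncation=True)
with torch.no_grad():
    summary_ids = model.generate(**inputs, max_length=128, num_beams=4)  # illustrative settings
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0])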