davda54 committed on
Commit
cc51a30
1 Parent(s): a4479da

Create README.md

Files changed (1)
  1. README.md +61 -0
README.md ADDED
@@ -0,0 +1,61 @@
---
language:
- 'no'
- nb
- nn
inference: false
tags:
- T5
- NorT5
- Norwegian
- encoder-decoder
license: cc-by-4.0
---

# NorT5 x-small

## Other sizes:
- [NorT5 xs](https://huggingface.co/ltg/nort5-xs)
- [NorT5 small](https://huggingface.co/ltg/nort5-small)
- [NorT5 base](https://huggingface.co/ltg/nort5-base)
- [NorT5 large](https://huggingface.co/ltg/nort5-large)


## Example usage

This model currently needs a custom wrapper from `modeling_nort5.py`. You can then use it like this:

```python
import torch
from transformers import AutoTokenizer
from modeling_nort5 import NorT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("path/to/folder")
model = NorT5ForConditionalGeneration.from_pretrained("path/to/folder")


# MASKED LANGUAGE MODELING

sentence = "Brukseksempel: Elektrisk oppvarming. Definisjonen på ordet oppvarming er[MASK_0]."
encoding = tokenizer(sentence)

input_tensor = torch.tensor([encoding.input_ids])
output_tensor = model.generate(input_tensor, decoder_start_token_id=7, eos_token_id=8)
tokenizer.decode(output_tensor.squeeze(), skip_special_tokens=True)

# should output: å varme opp ("to warm up")


# PREFIX LANGUAGE MODELING
# you need to finetune this model first, or use a `nort5-{size}-lm` model, which is finetuned on prefix language modeling

sentence = "Brukseksempel: Elektrisk oppvarming. Definisjonen på ordet oppvarming er (Wikipedia) "
encoding = tokenizer(sentence)

input_tensor = torch.tensor([encoding.input_ids])
output_tensor = model.generate(input_tensor, max_new_tokens=50, num_beams=4, do_sample=False)
tokenizer.decode(output_tensor.squeeze())

# should output: [BOS]ˈoppvarming, det vil si at det skjer en endring i temperaturen i et medium, f.eks. en ovn eller en radiator, slik at den blir varmere eller kaldere, eller at den blir varmere eller kaldere, eller at den blir
```
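
As a complement to the prefix-LM comment in the example above, here is a minimal sketch of how generation with one of the `nort5-{size}-lm` checkpoints might look. It assumes the finetuned `-lm` model has been downloaded to a local folder and uses the same `modeling_nort5.py` wrapper; the placeholder path and the Norwegian prompt are illustrative and not taken from this repository:

```python
import torch
from transformers import AutoTokenizer
from modeling_nort5 import NorT5ForConditionalGeneration

# assumption: a nort5-{size}-lm checkpoint (finetuned on prefix language modeling)
# has been downloaded locally; replace the placeholder with its actual path
lm_path = "path/to/lm-folder"

tokenizer = AutoTokenizer.from_pretrained(lm_path)
lm_model = NorT5ForConditionalGeneration.from_pretrained(lm_path)

# continue a Norwegian prefix ("Oslo is the capital of Norway and ")
prefix = "Oslo er hovedstaden i Norge og "
encoding = tokenizer(prefix)

input_tensor = torch.tensor([encoding.input_ids])
output_tensor = lm_model.generate(input_tensor, max_new_tokens=50, num_beams=4, do_sample=False)
print(tokenizer.decode(output_tensor.squeeze()))
```

Beam search with `do_sample=False` keeps the continuation deterministic, mirroring the example above; sampling arguments can be swapped in for more varied output.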