---
language: fr
pipeline_tag: "token-classification"
widget:
- text: "je voudrais réserver une chambre à paris pour demain et lundi"
- text: "d'accord pour l'hôtel à quatre vingt dix euros la nuit"
- text: "deux nuits s'il vous plait"
- text: "dans un hôtel avec piscine à marseille"
tags:
- bert
- flaubert
- natural language understanding
- NLU
- spoken language understanding
- SLU
- understanding
- MEDIA
---

# vpelloin/MEDIA_NLU-flaubert_oral_mixed

This is a Natural Language Understanding (NLU) model for the French [MEDIA benchmark](https://catalogue.elra.info/en-us/repository/browse/ELRA-S0272/).
It maps each input word to an output concept tag (76 tags are available).

This model was trained with [`flaubert-oral-mixed`](https://huggingface.co/nherve/flaubert-oral-mixed) as its initial checkpoint.
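
Since the tag inventory ships with the model configuration, you can list it without downloading the full model weights. A minimal sketch (printed output omitted here):

```python
from transformers import AutoConfig

# The label mapping lives in the config, so this is a lightweight download.
config = AutoConfig.from_pretrained("vpelloin/MEDIA_NLU-flaubert_oral_mixed")
print(len(config.id2label))     # number of concept tags
print(sorted(config.label2id))  # the tags themselves
```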

Available MEDIA NLU models:
- [MEDIA_NLU-flaubert_base_cased](https://huggingface.co/vpelloin/MEDIA_NLU-flaubert_base_cased): trained with [`flaubert_base_cased`](https://huggingface.co/flaubert/flaubert_base_cased) as its initial checkpoint
- [MEDIA_NLU-flaubert_base_uncased](https://huggingface.co/vpelloin/MEDIA_NLU-flaubert_base_uncased): trained with [`flaubert_base_uncased`](https://huggingface.co/flaubert/flaubert_base_uncased) as its initial checkpoint
- [MEDIA_NLU-flaubert_oral_ft](https://huggingface.co/vpelloin/MEDIA_NLU-flaubert_oral_ft): trained with [`flaubert-oral-ft`](https://huggingface.co/nherve/flaubert-oral-ft) as its initial checkpoint
- [MEDIA_NLU-flaubert_oral_mixed](https://huggingface.co/vpelloin/MEDIA_NLU-flaubert_oral_mixed): trained with [`flaubert-oral-mixed`](https://huggingface.co/nherve/flaubert-oral-mixed) as its initial checkpoint
- [MEDIA_NLU-flaubert_oral_asr](https://huggingface.co/vpelloin/MEDIA_NLU-flaubert_oral_asr): trained with [`flaubert-oral-asr`](https://huggingface.co/nherve/flaubert-oral-asr) as its initial checkpoint
- [MEDIA_NLU-flaubert_oral_asr_nb](https://huggingface.co/vpelloin/MEDIA_NLU-flaubert_oral_asr_nb): trained with [`flaubert-oral-asr_nb`](https://huggingface.co/nherve/flaubert-oral-asr_nb) as its initial checkpoint

## Usage with Pipeline
```python
from transformers import pipeline

generator = pipeline(model="vpelloin/MEDIA_NLU-flaubert_oral_mixed", task="token-classification")
sentences = [
    "je voudrais réserver une chambre à paris pour demain et lundi",
    "d'accord pour l'hôtel à quatre vingt dix euros la nuit",
    "deux nuits s'il vous plait",
    "dans un hôtel avec piscine à marseille"
]

for sentence in sentences:
    print([(tok['word'], tok['entity']) for tok in generator(sentence)])
```
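
The pipeline also accepts the whole list of sentences in one call and returns one list of token predictions per input. A minimal sketch reusing `generator` and `sentences` from above:

```python
# Batch all sentences in a single pipeline call.
results = generator(sentences)
for sentence, toks in zip(sentences, results):
    print(sentence)
    print([(tok['word'], tok['entity']) for tok in toks])
```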

## Usage with AutoTokenizer/AutoModel
```python
from transformers import (
    AutoTokenizer,
    AutoModelForTokenClassification
)
tokenizer = AutoTokenizer.from_pretrained("vpelloin/MEDIA_NLU-flaubert_oral_mixed")
model = AutoModelForTokenClassification.from_pretrained("vpelloin/MEDIA_NLU-flaubert_oral_mixed")

sentences = [
    "je voudrais réserver une chambre à paris pour demain et lundi",
    "d'accord pour l'hôtel à quatre vingt dix euros la nuit",
    "deux nuits s'il vous plait",
    "dans un hôtel avec piscine à marseille"
]
inputs = tokenizer(sentences, padding=True, return_tensors='pt')
outputs = model(**inputs).logits
print([[model.config.id2label[i] for i in b] for b in outputs.argmax(dim=-1).tolist()])
```
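
Note that the raw argmax above also emits labels for padding positions. A minimal sketch that drops them using the attention mask (variable names reused from the block above):

```python
import torch

# Keep only non-padded positions when decoding labels.
with torch.no_grad():
    logits = model(**inputs).logits
predictions = logits.argmax(dim=-1)

for i, sentence in enumerate(sentences):
    keep = inputs["attention_mask"][i].bool()
    labels = [model.config.id2label[p] for p in predictions[i][keep].tolist()]
    print(sentence, labels)
```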

## Reference

If you use this model for your scientific publication, or if you find the resources in this repository useful, please cite the [following paper](http://doi.org/10.21437/Interspeech.2022-352):
```
@inproceedings{pelloin22_interspeech,
  author={Valentin Pelloin and Franck Dary and Nicolas Hervé and Benoit Favre and Nathalie Camelin and Antoine Laurent and Laurent Besacier},
  title={ASR-Generated Text for Language Model Pre-training Applied to Speech Tasks},
  year=2022,
  booktitle={Proc. Interspeech 2022},
  pages={3453--3457},
  doi={10.21437/Interspeech.2022-352}
}
```