---
language: fr
pipeline_tag: "token-classification"
widget:
- text: "je voudrais réserver une chambre à paris pour demain et lundi"
- text: "d'accord pour l'hôtel à quatre vingt dix euros la nuit"
- text: "deux nuits s'il vous plait"
- text: "dans un hôtel avec piscine à marseille"
tags:
- bert
- flaubert
- natural language understanding
- NLU
- spoken language understanding
- SLU
- understanding
- MEDIA
---

# vpelloin/MEDIA_NLU-flaubert_oral_mixed

This is a Natural Language Understanding (NLU) model for the French [MEDIA benchmark](https://catalogue.elra.info/en-us/repository/browse/ELRA-S0272/).
It maps each input word to an output concept tag (76 tags are available).

This model was trained with [`flaubert-oral-mixed`](https://huggingface.co/nherve/flaubert-oral-mixed) as its initial checkpoint.
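
Since the tag inventory ships with the model configuration, you can list it without downloading the full model weights. A minimal sketch (printed output omitted here):

```python
from transformers import AutoConfig

# The label mapping lives in the config, so this is a lightweight download.
config = AutoConfig.from_pretrained("vpelloin/MEDIA_NLU-flaubert_oral_mixed")
print(len(config.id2label))     # number of concept tags
print(sorted(config.label2id))  # the tags themselves
```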

Available MEDIA NLU models:
- [MEDIA_NLU-flaubert_base_cased](https://huggingface.co/vpelloin/MEDIA_NLU-flaubert_base_cased): trained with [`flaubert_base_cased`](https://huggingface.co/flaubert/flaubert_base_cased) as its initial checkpoint
- [MEDIA_NLU-flaubert_base_uncased](https://huggingface.co/vpelloin/MEDIA_NLU-flaubert_base_uncased): trained with [`flaubert_base_uncased`](https://huggingface.co/flaubert/flaubert_base_uncased) as its initial checkpoint
- [MEDIA_NLU-flaubert_oral_ft](https://huggingface.co/vpelloin/MEDIA_NLU-flaubert_oral_ft): trained with [`flaubert-oral-ft`](https://huggingface.co/nherve/flaubert-oral-ft) as its initial checkpoint
- [MEDIA_NLU-flaubert_oral_mixed](https://huggingface.co/vpelloin/MEDIA_NLU-flaubert_oral_mixed): trained with [`flaubert-oral-mixed`](https://huggingface.co/nherve/flaubert-oral-mixed) as its initial checkpoint
- [MEDIA_NLU-flaubert_oral_asr](https://huggingface.co/vpelloin/MEDIA_NLU-flaubert_oral_asr): trained with [`flaubert-oral-asr`](https://huggingface.co/nherve/flaubert-oral-asr) as its initial checkpoint
- [MEDIA_NLU-flaubert_oral_asr_nb](https://huggingface.co/vpelloin/MEDIA_NLU-flaubert_oral_asr_nb): trained with [`flaubert-oral-asr_nb`](https://huggingface.co/nherve/flaubert-oral-asr_nb) as its initial checkpoint

## Usage with Pipeline
```python
from transformers import pipeline

generator = pipeline(model="vpelloin/MEDIA_NLU-flaubert_oral_mixed", task="token-classification")
sentences = [
    "je voudrais réserver une chambre à paris pour demain et lundi",
    "d'accord pour l'hôtel à quatre vingt dix euros la nuit",
    "deux nuits s'il vous plait",
    "dans un hôtel avec piscine à marseille"
]

for sentence in sentences:
    print([(tok['word'], tok['entity']) for tok in generator(sentence)])
```
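
The pipeline also accepts the whole list of sentences in one call and returns one list of token predictions per input. A minimal sketch reusing `generator` and `sentences` from above:

```python
# Batch all sentences in a single pipeline call.
results = generator(sentences)
for sentence, toks in zip(sentences, results):
    print(sentence)
    print([(tok['word'], tok['entity']) for tok in toks])
```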

## Usage with AutoTokenizer/AutoModel
```python
from transformers import (
    AutoTokenizer,
    AutoModelForTokenClassification
)
tokenizer = AutoTokenizer.from_pretrained("vpelloin/MEDIA_NLU-flaubert_oral_mixed")
model = AutoModelForTokenClassification.from_pretrained("vpelloin/MEDIA_NLU-flaubert_oral_mixed")

sentences = [
    "je voudrais réserver une chambre à paris pour demain et lundi",
    "d'accord pour l'hôtel à quatre vingt dix euros la nuit",
    "deux nuits s'il vous plait",
    "dans un hôtel avec piscine à marseille"
]
inputs = tokenizer(sentences, padding=True, return_tensors='pt')
outputs = model(**inputs).logits
print([[model.config.id2label[i] for i in b] for b in outputs.argmax(dim=-1).tolist()])
```
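
Note that the raw argmax above also emits labels for padding positions. A minimal sketch that drops them using the attention mask (variable names reused from the block above):

```python
import torch

# Keep only non-padded positions when decoding labels.
with torch.no_grad():
    logits = model(**inputs).logits
predictions = logits.argmax(dim=-1)

for i, sentence in enumerate(sentences):
    keep = inputs["attention_mask"][i].bool()
    labels = [model.config.id2label[p] for p in predictions[i][keep].tolist()]
    print(sentence, labels)
```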

## Reference

If you use this model for your scientific publication, or if you find the resources in this repository useful, please cite the [following paper](http://doi.org/10.21437/Interspeech.2022-352):
```
@inproceedings{pelloin22_interspeech,
  author={Valentin Pelloin and Franck Dary and Nicolas Hervé and Benoit Favre and Nathalie Camelin and Antoine Laurent and Laurent Besacier},
  title={ASR-Generated Text for Language Model Pre-training Applied to Speech Tasks},
  year=2022,
  booktitle={Proc. Interspeech 2022},
  pages={3453--3457},
  doi={10.21437/Interspeech.2022-352}
}
```