Add description to card metadata #1
by julien-c · opened
README.md CHANGED

@@ -1,6 +1,6 @@
 ---
 title: BLEU
-emoji: 🤗
+emoji: 🤗
 colorFrom: blue
 colorTo: red
 sdk: gradio
@@ -8,10 +8,44 @@ sdk_version: 3.0.2
 app_file: app.py
 pinned: false
 tags:
-- evaluate
-- metric
----
-
+- evaluate
+- metric
+description: >-
+  BLEU (bilingual evaluation understudy) is an algorithm for evaluating the
+  quality of text which has been machine-translated from one natural language to
+  another.
+
+  Quality is considered to be the correspondence between a machine's output and
+  that of a human: "the closer a machine translation is to a professional human
+  translation,
+
+  the better it is" – this is the central idea behind BLEU. BLEU was one of the
+  first metrics to claim a high correlation with human judgements of quality,
+  and
+
+  remains one of the most popular automated and inexpensive metrics.
+
+
+  Scores are calculated for individual translated segments—generally
+  sentences—by comparing them with a set of good quality reference translations.
+
+  Those scores are then averaged over the whole corpus to reach an estimate of
+  the translation's overall quality. Intelligibility or grammatical correctness
+
+  are not taken into account[citation needed].
+
+
+  BLEU's output is always a number between 0 and 1. This value indicates how
+  similar the candidate text is to the reference texts, with values closer to 1
+
+  representing more similar texts. Few human translations will attain a score of
+  1, since this would indicate that the candidate is identical to one of the
+
+  reference translations. For this reason, it is not necessary to attain a score
+  of 1. Because there are more opportunities to match, adding additional
+
+  reference translations will increase the BLEU score.
+---
 # Metric Card for BLEU
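As context for the description being added: this Space wraps the BLEU metric from the `evaluate` library, and the behavior the text describes (scores between 0 and 1, with additional references tending to raise the score) can be reproduced directly. A minimal sketch, with invented example sentences:

```python
# Minimal sketch: computing BLEU via the `evaluate` library that backs this Space.
# The example prediction/reference sentences are made up for illustration.
import evaluate

bleu = evaluate.load("bleu")

predictions = ["the cat sat on the mat"]
references = [
    # Several references per prediction: more references give n-grams
    # more chances to match, which tends to raise the score.
    ["the cat sat on the mat", "there is a cat on the mat"]
]

results = bleu.compute(predictions=predictions, references=references)
print(results["bleu"])  # 1.0 here, since the candidate matches one reference exactly
```

Besides the aggregate `bleu` value, `compute` also returns the per-n-gram `precisions` and the `brevity_penalty` that went into it.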