PetraAI committed on
Commit 4ecd964
1 Parent(s): 2a80237

Update README.md

Files changed (1)
  1. README.md +132 -1
README.md CHANGED
@@ -7,4 +7,135 @@ sdk: static
  pinned: false
  ---

- Edit this `README.md` markdown file to author your organization card.
---
# PETRA

## Overview

PETRA is a multilingual dataset for training and evaluating AI systems on a diverse range of tasks across multiple modalities. It contains data in Arabic and English for tasks including translation, summarization, question answering, and more.

## Dataset Structure

- Data is separated by language into `/ar` and `/en` directories.
- Within each language directory, data is separated by task into subdirectories.
- Tasks include:
  - Translation
  - Summarization
  - Conversational
  - Feature extraction
  - Zero-shot classification
  - Text generation
  - Fill mask
  - Sentence similarity
  - Text-to-speech
  - Automatic speech recognition
  - Text classification
  - Token classification
  - Table question answering
  - Question answering
  - Text2text generation
  - Audio-to-audio
  - Audio classification
  - Voice activity detection
  - Depth estimation
  - Image classification
  - Object detection
  - Image segmentation
  - Text-to-image
  - Image-to-text
  - Image-to-image
  - Unconditional image generation
  - Reinforcement learning
  - Video classification
  - Robotics
  - Tabular classification
  - Tabular regression
  - Table-to-text
  - Multiple choice
  - Text retrieval
  - Tabular-to-text
  - Text-to-video
  - Time series forecasting
  - Visual question answering
  - Zero-shot image classification
  - Graph ML
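
The language/task layout described above can be checked on a local copy of the data. The sketch below is a minimal example, assuming the dataset has been downloaded to a local directory with `ar` and `en` at its root. The function name `list_task_dirs` is hypothetical, and so are the task folder names used in the demo (`translation`, `summarization`), since the card does not state the exact subdirectory names.

```python
import tempfile
from pathlib import Path

# Top-level language directories named in the card (/ar and /en).
LANGUAGES = ("ar", "en")


def list_task_dirs(root: str) -> dict[str, list[str]]:
    """Map each language directory to the sorted names of its task subdirectories.

    Plain files are ignored; a language directory missing under `root` maps to [].
    """
    root_path = Path(root)
    layout: dict[str, list[str]] = {}
    for lang in LANGUAGES:
        lang_dir = root_path / lang
        if lang_dir.is_dir():
            layout[lang] = sorted(p.name for p in lang_dir.iterdir() if p.is_dir())
        else:
            layout[lang] = []
    return layout


if __name__ == "__main__":
    # Build a throwaway tree with hypothetical task folder names and inspect it.
    with tempfile.TemporaryDirectory() as tmp:
        for rel in ("ar/translation", "en/translation", "en/summarization"):
            (Path(tmp) / rel).mkdir(parents=True)
        print(list_task_dirs(tmp))
        # {'ar': ['translation'], 'en': ['summarization', 'translation']}
```

Returning an empty list for a missing language keeps the result shape stable, which makes it easy to spot a download that only fetched one language.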

## Dataset Tags

- code
- art
- chemistry
- biology
- finance
- legal
- music
- climate
- medical

## Dataset Size

1M < n < 10M samples
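
In Hub card metadata, size is usually recorded as a `size_categories` bucket rather than an exact count, and `1M<n<10M` is the bucket matching the range above. The helper below is a hypothetical sketch, not a Hub API: `size_category` is an illustrative name, and it assumes each power-of-ten bin is half-open (lower bound included, upper bound excluded).

```python
# Power-of-ten size buckets, spelled the way card metadata writes them.
# Assumption: each bin is half-open, i.e. low <= n < high.
_BUCKETS = [
    (1_000, "n<1K"),
    (10_000, "1K<n<10K"),
    (100_000, "10K<n<100K"),
    (1_000_000, "100K<n<1M"),
    (10_000_000, "1M<n<10M"),
    (100_000_000, "10M<n<100M"),
    (1_000_000_000, "100M<n<1B"),
    (10_000_000_000, "1B<n<10B"),
    (100_000_000_000, "10B<n<100B"),
    (1_000_000_000_000, "100B<n<1T"),
]


def size_category(n_samples: int) -> str:
    """Return the size bucket label for a dataset with `n_samples` rows."""
    if n_samples < 0:
        raise ValueError("sample count cannot be negative")
    for upper, label in _BUCKETS:
        if n_samples < upper:
            return label
    return "n>1T"


print(size_category(2_500_000))  # 1M<n<10M
```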

## License

Apache 2.0

## Citation

If you use this dataset, please cite it as:

[cite paper, arXiv, etc]

```bibtex
@article{PetraAI2022PetraAI,
  title={PetraAI: A Massive Multilingual Dataset for Machine Learning},
  author={First Last and First Last},
  journal={arXiv},
  year={2022},
  url={https://huggingface.co/datasets/PetraAI/PetraAI}
}
```

## Contact

For any questions, please reach out to [[email protected]]

# Dataset Cards

## What are Dataset Cards?

Each dataset may be documented by the `README.md` file in the repository. This file is called a **dataset card**, and the Hugging Face Hub will render its contents on the dataset’s main page. To inform users about how to responsibly use the data, it’s a good idea to include information about any potential biases within the dataset. Generally, dataset cards help users understand the contents of the dataset and give context for how the dataset should be used.

You can also add dataset metadata to your card. The metadata describes important information about a dataset such as its license, language, and size. It also contains tags to help users discover a dataset on the Hub. Tags are defined in a YAML metadata section at the top of the `README.md` file.

## Dataset card metadata

A dataset repo will render its `README.md` as a dataset card. To control how the Hub displays the card, create a YAML section at the top of the README file that defines the metadata. Start the file with a line of three dashes (`---`), add all of the relevant metadata, and close the section with another `---` line.
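
The delimiter convention described above can be illustrated with a short sketch that splits a README into its YAML metadata block and its Markdown body. The function `split_card` and the sample card are hypothetical: the keys (`license`, `language`, `size_categories`, `tags`) are standard card metadata fields, and the values are taken from the details stated in this card. The Hub's own parser is not reproduced here, and the extracted block would still need a YAML library to be turned into a dictionary.

```python
# Sample card: a YAML metadata block fenced by --- lines, followed by Markdown.
# Values are illustrative, based on details stated in this card.
SAMPLE_CARD = """---
license: apache-2.0
language:
- ar
- en
size_categories:
- 1M<n<10M
tags:
- code
- medical
---
# PETRA
"""


def split_card(text: str) -> tuple[str, str]:
    """Split README text into (yaml_metadata, markdown_body).

    Returns empty metadata when the file does not start with a '---' line.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            # Everything between the two delimiters is metadata; the rest is the body.
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    raise ValueError("metadata section starts with '---' but is never closed")


metadata, body = split_card(SAMPLE_CARD)
print(metadata.splitlines()[0])  # license: apache-2.0
print(body)  # prints: # PETRA
```

Stopping at the first closing `---` means a horizontal rule later in the body is left untouched.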

The metadata that you add to the dataset card enables certain interactions on the Hub. For example:

- It allows users to filter and discover datasets at https://huggingface.co/datasets.
- If you specify the license using one of the license identifiers supported by the Hub, the license will be displayed on the dataset page.

When creating a `README.md` file in a dataset repository on the Hub, you can use the Metadata UI to fill in the main metadata fields.

For the full list of metadata fields, see the detailed dataset card metadata specification.

### Dataset card creation guide

For a step-by-step guide on creating a dataset card, check out the Create a dataset card guide.

Reading through existing dataset cards, such as the ELI5 dataset card, is a great way to familiarize yourself with the common conventions.

### Linking a Paper

If the dataset card includes a link to a paper on arXiv, the Hub will extract the arXiv ID and include it in the dataset tags with the format `arxiv:<PAPER ID>`. Clicking on the tag will let you:

- Visit the Paper page.
- Filter for other models on the Hub that cite the same paper.

Read more about paper pages at https://huggingface.co/docs/hub/paper-pages.
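
To illustrate the `arxiv:<PAPER ID>` tag format, the sketch below pulls arXiv IDs out of `arxiv.org/abs/...` and `arxiv.org/pdf/...` links in card text. It is an approximation under stated assumptions, not the Hub's implementation: `arxiv_tags` is a hypothetical name, only new-style IDs such as `2106.09685` are recognized, any version suffix like `v2` is dropped, and old-style IDs are ignored.

```python
import re

# Assumption: only new-style arXiv IDs (YYMM.NNNN or YYMM.NNNNN) inside abs/pdf links.
# An optional version suffix such as "v2" is matched but not kept in the tag.
_ARXIV_LINK = re.compile(
    r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})(?:v\d+)?", re.IGNORECASE
)


def arxiv_tags(card_text: str) -> list[str]:
    """Return unique `arxiv:<id>` tags in the order the links first appear."""
    tags: list[str] = []
    for paper_id in _ARXIV_LINK.findall(card_text):
        tag = f"arxiv:{paper_id}"
        if tag not in tags:
            tags.append(tag)
    return tags


# Example text: two links to the same paper collapse into a single tag.
card = "See https://arxiv.org/abs/2106.09685v2 and https://arxiv.org/pdf/2106.09685."
print(arxiv_tags(card))  # ['arxiv:2106.09685']
```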