Update README.md
README.md
CHANGED
@@ -73,7 +73,7 @@ We pre-process with face-blurring.
 - We used "[depicts](https://www.wikidata.org/wiki/Property:P180)" and "[made from material](https://www.wikidata.org/wiki/Property:P186)" property for training.
 - To check if it is in the public domain, Wikimedia Commons category tags and Wikidata artist property were used together.
 - Only images that were in the public domain in at least all of the source of origin country, Japan, EU, and the United States were used.
-- Finally 267,573 images are used for training. [All attributions are found here](
+- Finally 267,573 images are used for training. [All attributions are found here](wikimedia_commons_pd_attribution.csv).
 
 * Even if the dataset itself is CC-licensed, we did not use it if the image contained in the dataset is not properly licensed, is based on unauthorized use of copyrighted works, or is based on the synthetic data output of other pretrained models.
 * English captions are translated into Japanese using [ElanMT](https://huggingface.co/Mitsua/elan-mt-bt-en-ja) model which is trained solely on openly licensed corpus.