Commit 80176fd by RomanShnurov (1 parent: af67224)
init commit
- README.md +117 -1
- __init__.py +0 -0
- config.json +12 -0
- images/1.jpg +0 -0
- images/2.jpg +0 -0
- images/3.jpg +0 -0
- images/4.webp +0 -0
- pipeline.py +53 -0
- requirements.txt +4 -0
- synthetic.pt +3 -0
- synthetic.py +22 -0
README.md
CHANGED
@@ -1,3 +1,119 @@
 ---
-
+library_name: generic
+license: cc-by-sa-3.0
+pipeline_tag: image-classification
+tags:
+- ai_or_not
+- sumsub
+- image_classification
+- sumsubaiornot
+- aiornot
+- deepfake
+- synthetic
+- generated
+- pytorch
+metrics:
+- accuracy
 ---
+
+# For Fake's Sake: a set of models for detecting generated and synthetic images
+
+Many people on the internet have recently been tricked by fake images of Pope Francis wearing a coat or of Donald Trump's arrest.
+To help combat this issue, we provide detectors for such images generated by popular tools like Midjourney and Stable Diffusion.
+
+| ![Image1](images/3.jpg) | ![Image2](images/2.jpg) | ![Image3](images/4.webp) |
+|-------------------------|-------------------------|--------------------------|
+
+## Model Details
+
+### Model Description
+
+- **Developed by:** [Sumsub AI team](https://sumsub.com/)
+- **Model type:** Image classification
+- **License:** CC-BY-SA-3.0
+- **Types:**
+- **Finetuned from model:** *convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_384*
+
+## Demo
+
+The demo page can be found [here](https://huggingface.co/spaces/Sumsub/Sumsub-ffs-demo).
+
+## How to Get Started with the Model & Model Sources
+
+Use the code below to get started with the model:
+
+```bash
+git lfs install
+git clone https://huggingface.co/Sumsub/Sumsub-ffs-synthetic-2.0 sumsub-ffs-synthetic-v2
+```
+
+```python
+import sys
+sys.path.append("sumsub-ffs-synthetic-v2")  # the hyphenated directory is not importable as a package
+from pipeline import PreTrainedPipeline
+from PIL import Image
+
+pipe = PreTrainedPipeline("sumsub-ffs-synthetic-v2/")
+img = Image.open("sumsub-ffs-synthetic-v2/images/2.jpg")
+result = pipe(img)
+print(result)
+```
+
+You may need these prerequisites installed:
+
+```bash
+pip install -r sumsub-ffs-synthetic-v2/requirements.txt
+pip install "git+https://github.com/rwightman/pytorch-image-models"
+pip install "git+https://github.com/huggingface/huggingface_hub"
+```
+
+## Training Details
+
+### Training Data
+
+The models were trained on the following datasets:
+
+TBD
+
+### Training Procedure
+
+TBD
+
+## Evaluation
+
+TBD
+
+## Metrics
+
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+TBD
+
+## Limitations
+
+- No detector reaches 100% accuracy, so the model output should be treated only as an indication that an image may have been artificially generated, not as proof.
+- Our models may misclassify real-world photos that are extremely vibrant and of exceptionally high quality. Rich colors and fine detail can lead the model to focus on visual features that are not indicative of the true class.
+
+![Image1](images/1.jpg)
+
+## Citation
+
+If you find this useful, please cite as:
+
+```text
+@misc{sumsubaiornot,
+  publisher = {Sumsub},
+  url = {https://huggingface.co/Sumsub/Sumsub-ffs-synthetic-2.0},
+  year = {2023},
+  author = {Savelyev, Alexander and Toropov, Alexey and Goldman-Kalaydin, Pavel and Samarin, Alexey},
+  title = {For Fake's Sake: a set of models for detecting deepfakes, generated images and synthetic images}
+}
+```
+
+## References
+
+- Stöckl, Andreas. (2022). Evaluating a Synthetic Image Dataset Generated with Stable Diffusion. 10.48550/arXiv.2211.01777.
+- Lin, Tsung-Yi & Maire, Michael & Belongie, Serge & Hays, James & Perona, Pietro & Ramanan, Deva & Dollár, Piotr & Zitnick, C. (2014). Microsoft COCO: Common Objects in Context.
+- Howard, Andrew & Zhu, Menglong & Chen, Bo & Kalenichenko, Dmitry & Wang, Weijun & Weyand, Tobias & Andreetto, Marco & Adam, Hartwig. (2017). MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications.
+- Liu, Zhuang & Mao, Hanzi & Wu, Chao-Yuan & Feichtenhofer, Christoph & Darrell, Trevor & Xie, Saining. (2022). A ConvNet for the 2020s.
+- Wang, Zijie & Montoya, Evan & Munechika, David & Yang, Haoyang & Hoover, Benjamin & Chau, Polo. (2022). DiffusionDB: A Large-scale Prompt Gallery Dataset for Text-to-Image Generative Models. 10.48550/arXiv.2210.14896.
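The Limitations section of the README says the output is only an indication, not proof. A minimal sketch of how a caller might act on that, assuming the result has the `[{"label": ..., "score": ...}]` shape documented in `pipeline.py`. The function name `summarize` and the 0.9 threshold are hypothetical and not part of the repository:

```python
from typing import Any, Dict, List


def summarize(result: List[Dict[str, Any]], threshold: float = 0.9) -> str:
    """Turn a pipeline result into a cautious, human-readable verdict.

    `threshold` is an illustrative value, not one recommended by the authors.
    """
    # Pick the entry with the highest score, whatever order the list came in.
    top = max(result, key=lambda item: item["score"])
    if top["score"] < threshold:
        return f"inconclusive (best guess: {top['label']}, score {top['score']:.2f})"
    return f"likely {top['label']} (score {top['score']:.2f})"


# Hand-written example input in the documented shape, not real model output.
example = [{"label": "by AI", "score": 0.97}, {"label": "by human", "score": 0.03}]
print(summarize(example))
```

Reporting low-confidence results as inconclusive keeps a borderline score from being read as a verdict.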
__init__.py
ADDED
File without changes
config.json
ADDED
@@ -0,0 +1,12 @@
+{
+  "id2label": {
+    "0": "by AI",
+    "1": "by human"
+  },
+  "label2id": {
+    "by AI": "0",
+    "by human": "1"
+  },
+  "pretrained": false,
+  "timm_model": "convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_384"
+}
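Both label maps in `config.json` use strings for the class ids, so a lookup by integer index needs a conversion. A small sketch, assuming an inline copy of the two maps so it runs on its own. The helpers `label_for` and `index_for` are hypothetical names, not part of the repository:

```python
import json

# Inline copy of the label maps from config.json, so the sketch runs on its own.
CONFIG_TEXT = """
{
  "id2label": {"0": "by AI", "1": "by human"},
  "label2id": {"by AI": "0", "by human": "1"}
}
"""

config = json.loads(CONFIG_TEXT)


def label_for(class_index: int) -> str:
    """Map an integer class index to its label; JSON object keys are always strings."""
    return config["id2label"][str(class_index)]


def index_for(label: str) -> int:
    """Map a label back to an integer index; the stored ids are strings too."""
    return int(config["label2id"][label])


print(label_for(0), index_for("by human"))
```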
images/1.jpg
ADDED
images/2.jpg
ADDED
images/3.jpg
ADDED
images/4.webp
ADDED
pipeline.py
ADDED
@@ -0,0 +1,53 @@
+from typing import Dict, List, Any
+from PIL import Image
+
+import os
+import json
+import torch
+from torch.nn import functional as F
+import timm
+
+from synthetic import SyntheticModel
+
+class PreTrainedPipeline():
+    def __init__(self, path=""):
+        self.model = SyntheticModel()
+        ckpt = torch.load(os.path.join(path, "synthetic.pt"), map_location=torch.device('cpu'))
+        self.model.load_state_dict(ckpt)
+        self.model.eval()
+
+        with open(os.path.join(path, "config.json")) as f:
+            config = json.load(f)
+        self.id2label = config["id2label"]
+
+        transform_config = {'input_size': (3, 384, 384),
+                            'interpolation': 'bicubic',
+                            'mean': (0.48145466, 0.4578275, 0.40821073),
+                            'std': (0.26862954, 0.26130258, 0.27577711),
+                            'crop_pct': 1.0,
+                            'crop_mode': 'squash'}
+
+        self.transform_m = timm.data.create_transform(**transform_config, is_training=False)
+
+    def __call__(self, inputs: "Image.Image") -> List[Dict[str, Any]]:
+        """
+        Args:
+            inputs (:obj:`PIL.Image`):
+                The raw image representation as PIL.
+                No transformation made whatsoever from the input. Make all necessary transformations here.
+        Return:
+            A :obj:`list` of dicts like {"label": "XXX", "score": 0.82}.
+                It is preferred if the returned list is in decreasing `score` order
+        """
+        img = self.transform_m(inputs)
+        return self.predict_from_model(img)
+
+    def predict_from_model(self, img):
+        y = self.model.forward(img.unsqueeze(0).to('cpu'))
+        # Note: output index 1 is reported under id2label["0"] and index 0 under id2label["1"].
+        probs = F.softmax(y, dim=1)[0].detach().cpu().tolist()
+        labels = [
+            {"label": str(self.id2label["0"]), "score": probs[1]},
+            {"label": str(self.id2label["1"]), "score": probs[0]},
+        ]
+        return labels
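The `__call__` docstring prefers a list in decreasing `score` order, while `predict_from_model` returns a fixed order. A minimal sketch of the softmax-and-sort step in plain Python (no torch), assuming two raw logits. `scores_from_logits` is a hypothetical helper, and the labels are passed in explicitly rather than guessing which output index means which class:

```python
import math
from typing import Dict, List, Sequence, Union


def scores_from_logits(
    logits: Sequence[float], labels: Sequence[str]
) -> List[Dict[str, Union[str, float]]]:
    """Softmax the logits and return label/score dicts, highest score first."""
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    items = [{"label": lab, "score": e / total} for lab, e in zip(labels, exps)]
    return sorted(items, key=lambda item: item["score"], reverse=True)


# Made-up logits for illustration only.
ranked = scores_from_logits([0.0, 2.0], ["first", "second"])
print(ranked)
```

Sorting on the caller side would work just as well; the point is only that the order should not be relied on unless the code enforces it.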
requirements.txt
ADDED
@@ -0,0 +1,4 @@
+timm==0.9.5
+torch==2.0.1
+torchvision==0.15.2
+pytorch-lightning==2.0.9
synthetic.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:89a955ec54bddab759228757e437d300b6b86bbba9f45cfd5ecd0e3d7dec83a2
+size 795263437
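`synthetic.pt` is stored through Git LFS, so the diff shows a three-line pointer file rather than the weights. A clone made without Git LFS set up would contain this text instead of a loadable checkpoint. A small sketch that reads the pointer fields shown above; `parse_lfs_pointer` is a hypothetical helper, not part of the repository or of Git LFS:

```python
from typing import Dict

# The pointer text as committed in this diff.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:89a955ec54bddab759228757e437d300b6b86bbba9f45cfd5ecd0e3d7dec83a2
size 795263437
"""


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    fields = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(" ")
            fields[key] = value
    return fields


pointer = parse_lfs_pointer(POINTER)
size_mib = int(pointer["size"]) / 2**20  # bytes to MiB
print(f"{pointer['oid'].split(':')[0]} object, {size_mib:.1f} MiB")
```

Comparing the on-disk size of `synthetic.pt` against the `size` field is a quick way to tell whether the real file was fetched.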
synthetic.py
ADDED
@@ -0,0 +1,22 @@
+import timm
+from torch import nn
+from torch.nn import functional as F
+import pytorch_lightning as pl
+from pytorch_lightning.core.mixins import HyperparametersMixin
+
+
+class SyntheticModel(pl.LightningModule, HyperparametersMixin):
+    def __init__(self):
+        super().__init__()
+        self.model = timm.create_model('convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_384',
+                                       pretrained=False,
+                                       num_classes=0)
+
+        self.clf = nn.Sequential(
+            nn.Linear(1536, 128),
+            nn.ReLU(inplace=True),
+            nn.Linear(128, 2))
+
+    def forward(self, image):
+        image_features = self.model(image)
+        return self.clf(image_features)
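With `num_classes=0` the timm backbone returns pooled features instead of class logits, and the `clf` head maps those 1536 features to two outputs. As a rough sanity check on the head's size, the sketch below counts its trainable parameters in plain Python, assuming the default `bias=True` of `nn.Linear`. `HEAD_LAYERS` and `linear_params` are hypothetical names used only for illustration:

```python
from typing import List, Tuple

# (in_features, out_features) of the two nn.Linear layers in `self.clf`.
HEAD_LAYERS: List[Tuple[int, int]] = [(1536, 128), (128, 2)]


def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of one fully connected layer with bias enabled."""
    return in_features * out_features + out_features


# The ReLU between the two layers has no parameters.
head_params = sum(linear_params(i, o) for i, o in HEAD_LAYERS)
print(head_params)
```

The head is tiny compared with the roughly 795 MB checkpoint, so almost all of the stored weights belong to the ConvNeXt backbone.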