guu980-dev committed 89df510 (parent 760ce89): Update Readme.md

---
license: apache-2.0
---

<p align="center">
<img src="https://i.ibb.co/68ZBkzQ/narrow.png" alt="Description of the image" height="300" width="400">
</p>

# **🖼️** Model description

GemmArte is a PaliGemma-based model fine-tuned with LoRA to specialize in art criticism. It was trained on a diverse set of artworks, from traditional paintings to contemporary art, with a focus on 11 aesthetic elements such as form, composition, and symbolism. By combining PaliGemma's image and language understanding with LoRA fine-tuning, the model helps users interpret a work's visual features and artistic intentions from multiple perspectives.

# Intended Uses & Limitations

**GemmArte** is designed for:

- Analyzing artworks from various perspectives (formal, historical, emotional, technical, cultural, comparative)
- Providing detailed interpretations of visual elements in paintings
- Assisting in art education, appreciation, and research

**Limitations**:

- The model's knowledge is limited to its training data and the base PaliGemma model
- It may not accurately analyze very recent artworks or obscure art styles
- Performance may vary depending on the quality and clarity of the input image

# Training dataset

The model was trained on the Hugging Face [Painting-Form dataset](https://huggingface.co/datasets/steven16/Painting-Form), which pairs images of artworks with questions and answers about each image, drawn from a pool of 215 questions. Training on this data enables the model to produce detailed critiques and to interpret a wide range of artistic elements.

**Dataset key features**

- High-quality images of paintings
- Detailed descriptions and analyses of these paintings

<p align="center">
<img src="https://i.postimg.cc/fLBvbQZJ/image.png" alt="Example painting from the Painting-Form dataset" width="250">
</p>

| Field | Example |
| --- | --- |
| question | `Please construct a formal analysis paragraph of approximately 175 words that is coherent and focuses solely on visual characteristics.\n<image>` |
| answer | `The painting depicts a young shepherdess, her gaze directed towards the viewer. Her figure is rendered with meticulous detail, her skin glowing with a soft, porcelain-like quality. The folds of her dress are depicted with a delicate touch, creating a sense of movement and fluidity...` |

## Training Dataset example

The Painting-Form dataset was preprocessed by grouping its 215 questions into 11 categories and representing each category with a single new, representative question (instruction). This produces a more consistent training set and helps the model provide structured critiques focused on key aesthetic and interpretive elements of art.

[Preprocessed dataset](https://www.kaggle.com/datasets/baeksj01/gemmaarte-dataset/data)

| **id** | **image** | **category** | **question** | **answer** |
| --- | --- | --- | --- | --- |
| identity_0 | 2505.png | General Visual Analysis | Create a detailed and cohesive analysis paragr... | The painting depicts a young shepherdess, her ... |
| identity_1 | 13963.png | General Visual Analysis | Create a detailed and cohesive analysis paragr... | The artwork presents a captivating visual symp... |
| identity_2 | 6686.png | General Visual Analysis | Create a detailed and cohesive analysis paragr... | The painting features a rich and vibrant color... |
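The preprocessing described above can be sketched as below. This is a hypothetical illustration, not the project's actual code: `QUESTION_TO_CATEGORY` and `CATEGORY_INSTRUCTION` hold placeholder entries (the real tables would cover all 215 questions and 11 categories), and `consolidate` is an illustrative name. Only the output columns and the `identity_N` id format follow the example table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QARecord:
    """One original Painting-Form question-answer pair for an image."""
    image: str
    question: str
    answer: str


# Text of the dataset example question above, without the <image> marker.
EXAMPLE_QUESTION = (
    "Please construct a formal analysis paragraph of approximately 175 words "
    "that is coherent and focuses solely on visual characteristics."
)

# Hypothetical mapping: original question -> category (real table: 215 entries).
QUESTION_TO_CATEGORY = {EXAMPLE_QUESTION: "General Visual Analysis"}

# Hypothetical representative instruction per category (real table: 11 entries).
CATEGORY_INSTRUCTION = {
    "General Visual Analysis": "<representative instruction for General Visual Analysis>",
}


def consolidate(records: list[QARecord]) -> list[dict]:
    """Swap each original question for its category's representative instruction.

    Every source question is expected to be mapped, so an unmapped question
    raises KeyError instead of being silently dropped.
    """
    rows = []
    for index, record in enumerate(records):
        category = QUESTION_TO_CATEGORY[record.question]
        rows.append({
            "id": f"identity_{index}",
            "image": record.image,
            "category": category,
            "question": CATEGORY_INSTRUCTION[category],
            "answer": record.answer,
        })
    return rows


rows = consolidate([
    QARecord("2505.png", EXAMPLE_QUESTION, "The painting depicts a young shepherdess, ..."),
])
print(rows[0]["id"], rows[0]["category"])  # identity_0 General Visual Analysis
```

Each row keeps its image and answer; only the question text is replaced, which matches the repeated instruction in the example table.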

## Training Dataset statistics

| Statistic | Value |
| :---: | :---: |
| Images | 13,812 |
| Question-answer pairs | 224,850 |
| Question categories | 11 |
| Dataset size | 7.5 GB |
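The statistics imply an average of about 16 question-answer pairs per image. The arithmetic is shown below as a small Python sketch; the constant and function names are illustrative.

```python
# Figures from the statistics table above.
NUM_IMAGES = 13_812
NUM_QA_PAIRS = 224_850


def mean_pairs_per_image(num_pairs: int, num_images: int) -> float:
    """Average number of question-answer pairs per image."""
    if num_images <= 0:
        raise ValueError("num_images must be positive")
    return num_pairs / num_images


average = mean_pairs_per_image(NUM_QA_PAIRS, NUM_IMAGES)
print(f"{average:.2f} question-answer pairs per image")  # 16.28 question-answer pairs per image
```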

## Training Procedure

### **🔧** Training Hyperparameters

- Learning rate: 2e-5
- Batch size: 4
- Weight decay: 1e-6
- Number of epochs: 2
- Optimizer: paged_adamw_8bit
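Assuming the training script passes these values to a Hugging Face `transformers.TrainingArguments` object (the card does not say so), they could be collected as in the sketch below. The `Hyperparameters` class and `to_training_kwargs` helper are hypothetical, and treating the batch size as a per-device value is an assumption. The sketch only builds a keyword dictionary, so it runs without `transformers` installed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperparameters:
    """Hypothetical container for the values listed above."""
    learning_rate: float = 2e-5
    batch_size: int = 4
    weight_decay: float = 1e-6
    num_epochs: int = 2
    optimizer: str = "paged_adamw_8bit"


def to_training_kwargs(hp: Hyperparameters) -> dict:
    """Map the values onto transformers.TrainingArguments keyword names.

    Assumes the batch size is per device. The "paged_adamw_8bit" optimizer
    also needs the bitsandbytes package at training time.
    """
    return {
        "learning_rate": hp.learning_rate,
        "per_device_train_batch_size": hp.batch_size,
        "weight_decay": hp.weight_decay,
        "num_train_epochs": hp.num_epochs,
        "optim": hp.optimizer,
    }


kwargs = to_training_kwargs(Hyperparameters())
# With transformers installed, the dictionary could be used as:
#   from transformers import TrainingArguments
#   args = TrainingArguments(output_dir="/path/to/your/output", **kwargs)
```

Keeping the values in one frozen dataclass makes it harder for a run to silently diverge from the documented settings.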

### **🖥️** Training Code

To train the model, use the following command:

```bash
# --use_qlora takes true or false; --metadata_type takes csv or parquet
python article_base_train.py \
  --dataset_dir "/path/to/your/dataset" \
  --model_id "google/paligemma-3b-pt-224" \
  --output_dir "/path/to/your/output" \
  --use_qlora true \
  --metadata_type csv
```

Note: Replace `/path/to/your/dataset` and `/path/to/your/output` with your actual dataset and output directory paths.
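The card does not show how `article_base_train.py` parses its flags. As a hypothetical sketch using the standard-library `argparse` module, the flags in the training command could be declared as follows. One detail is worth handling explicitly: `type=bool` would turn any non-empty string, including `"false"`, into `True`, so the sketch converts `true`/`false` with a small helper (`str_to_bool`, an illustrative name).

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Parse 'true'/'false' (case-insensitive) into a bool."""
    lowered = value.strip().lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags in the training command."""
    parser = argparse.ArgumentParser(description="GemmArte training (sketch)")
    parser.add_argument("--dataset_dir", required=True)
    parser.add_argument("--model_id", required=True)
    parser.add_argument("--output_dir", required=True)
    parser.add_argument("--use_qlora", type=str_to_bool, required=True)
    parser.add_argument("--metadata_type", choices=["csv", "parquet"], required=True)
    return parser


args = build_parser().parse_args([
    "--dataset_dir", "/path/to/your/dataset",
    "--model_id", "google/paligemma-3b-pt-224",
    "--output_dir", "/path/to/your/output",
    "--use_qlora", "false",
    "--metadata_type", "parquet",
])
print(args.use_qlora, args.metadata_type)  # False parquet
```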

[article_base_train.py script](https://github.com/guu980-dev/GemmArte)

This script handles the entire training process for the GemmArte model, including data loading, model initialization, and training loop setup.
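As a hypothetical sketch of the data-loading step (the script's internals are not documented here), the snippet below reads CSV metadata with the columns shown in the preprocessed-dataset table and pairs each row with an image path. It assumes the image files sit directly under `--dataset_dir`; the `load_examples` name and the output field names are illustrative. Parquet metadata would need a library such as pandas or pyarrow and is not covered.

```python
import csv
import io
from pathlib import Path
from typing import TextIO


def load_examples(metadata: TextIO, dataset_dir: str) -> list[dict]:
    """Turn CSV metadata rows (id, image, category, question, answer) into examples.

    Assumes images are stored directly under dataset_dir (hypothetical layout).
    """
    root = Path(dataset_dir)
    examples = []
    for row in csv.DictReader(metadata):
        examples.append({
            "id": row["id"],
            "image_path": root / row["image"],
            "category": row["category"],
            "prompt": row["question"],
            "target": row["answer"],
        })
    return examples


# Illustrative metadata; a real run would open the metadata file from dataset_dir.
sample = io.StringIO(
    "id,image,category,question,answer\n"
    "identity_0,2505.png,General Visual Analysis,Analyze this painting.,"
    "\"The painting depicts a young shepherdess, her gaze directed towards the viewer.\"\n"
)
examples = load_examples(sample, "/path/to/your/dataset")
print(examples[0]["image_path"])
```

The answer field is quoted because it contains commas; `csv.DictReader` handles that quoting without extra code.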

## Inference

To run inference with the trained model, use the following command:

```bash
python test_inference.py
```

[test_inference.py script](https://github.com/guu980-dev/GemmArte)

<p align="center">
<img src="https://i.postimg.cc/D0NdqQSh/starry-night.jpg" alt="The Starry Night" width="500" height="500">
</p>

```python
# Prompt 1: step-by-step formal analysis
prompt = "Provide a step-by-step formal analysis of this image, covering: 1) Composition, 2) Color, 3) Line, 4) Texture, 5) Space, and 6) Form."
# Example model output for prompt 1
example_output = """
The composition of this artwork is striking, with a strong emphasis on the vertical and horizontal lines. The starry night sky dominates the canvas, filling the majority of the space with swirling patterns of light and dark. The use of bold, contrasting colors further enhances the dramatic effect of the scene, with the bright yellow stars against the deep blue sky creating a sense of tension and energy. The forms are simplified and stylized, with the only recognizable features being the church steeple and the distant village. The overall effect is one of a dynamic and expressive composition that captures the viewer's attention and draws them into the world of the painting.
"""

# Prompt 2: single formal-analysis paragraph
prompt = "Please construct a formal analysis paragraph that is coherent and focuses solely on visual characteristic."
# Example model output for prompt 2
example_output = """
The painting under consideration is a masterful example of the artist's use of color and brushwork to convey a sense of emotion and atmosphere. The dominant hues of blue and green create a striking contrast against the dark, swirling sky, which is rendered in a variety of shades of blue and white. The artist's use of thick, impasto brushstrokes adds texture and depth to the painting, particularly in the depiction of the swirling clouds and the distant mountains. The artist's attention to detail is evident in the intricate rendering of the stars and the church steeple, which are depicted with a combination of delicate, almost ethereal brushstrokes and bold, expressive strokes. The overall effect is a visually striking and emotionally charged work of art that captures the viewer's attention and invites them to contemplate the scene before them.
"""
```

The model generates an analysis from the given prompt and input image. Structured prompts like these guide the model toward more detailed and comprehensive analyses of artworks.
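The first inference prompt lists formal elements in a fixed, numbered pattern. The hypothetical helper below (`build_stepwise_prompt`, not part of the GemmArte scripts) reproduces that wording from a list of aspects, which makes it easy to ask for a different subset of elements.

```python
def build_stepwise_prompt(aspects: list[str]) -> str:
    """Build a numbered formal-analysis prompt, e.g. '1) Composition, 2) Color, and 3) Line'."""
    if not aspects:
        raise ValueError("at least one aspect is required")
    items = [f"{number}) {aspect}" for number, aspect in enumerate(aspects, start=1)]
    if len(items) == 1:
        listed = items[0]
    elif len(items) == 2:
        listed = f"{items[0]} and {items[1]}"
    else:
        # Serial comma before the final item, matching the example prompt.
        listed = ", ".join(items[:-1]) + ", and " + items[-1]
    return f"Provide a step-by-step formal analysis of this image, covering: {listed}."


prompt = build_stepwise_prompt(["Composition", "Color", "Line", "Texture", "Space", "Form"])
print(prompt)
```

With the six aspects above, the helper returns exactly the first example prompt; passing `["Color", "Line"]` yields a shorter prompt covering `1) Color and 2) Line`.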