guu980-dev committed
Commit 89df510 • 1 Parent(s): 760ce89

Update Readme.md

Files changed (1)
  1. Readme.md +117 -19
Readme.md CHANGED
@@ -1,19 +1,117 @@
- # Dataset Structure
-
- /custom_vqa_project/
- │
- ├── /dataset/
- │   ├── /images/
- │   │   ├── train/
- │   │   │   ├── image1.jpg
- │   │   │   ├── image2.jpg
- │   │   └── val/
- │   │       ├── image3.jpg
- │   │       └── image4.jpg
- │   ├── train.json # Metadata for the training set
- │   └── val.json # Metadata for the validation set
- │
- ├── /scripts/
- │   └── train.py # Your fine-tuning script
- │
- └── README.md
+ ---
+ license: apache-2.0
+ ---
+
+ <p align="center">
+ <img src="https://i.ibb.co/68ZBkzQ/narrow.png" alt="Description of the image" height="300" width="400">
+ </p>
+
+ # **🖼️** Model description
+
+ This model is PaliGemma fine-tuned with LoRA to specialize in art criticism. It is trained on a diverse dataset of artworks, ranging from traditional paintings to contemporary art, and focuses on 11 detailed aesthetic elements such as form, composition, and symbolism. Combining PaliGemma's image and language processing with LoRA fine-tuning lets the model describe a work's visual features and artistic intentions from multiple perspectives.
+
+ # **🔍** Intended Uses & Limitations
+
+ **GemmArte** is designed for:
+
+ - Analyzing artworks from various perspectives (formal, historical, emotional, technical, cultural, comparative)
+ - Providing detailed interpretations of visual elements in paintings
+ - Assisting in art education, appreciation, and research
+
+ **Limitations**:
+
+ - The model's knowledge is limited to its training data and the base PaliGemma model
+ - It may not accurately analyze very recent artworks or obscure art styles
+ - Performance may vary depending on the quality and clarity of the input image
+
+ # **📊** Training dataset
+
+ The model was trained on the Hugging Face [🔗 Painting-Form dataset](https://huggingface.co/datasets/steven16/Painting-Form). The dataset pairs images of artworks with questions and answers about each image, drawn from a set of 215 questions. These pairs give the model examples of detailed critique that covers the individual elements of an artwork.
+
+ **Dataset key features**
+ - High-quality images of paintings
+ - Detailed descriptions and analyses of these paintings
+
+ <p align="center">
+ <img src="https://i.postimg.cc/fLBvbQZJ/image.png" alt="Description of the image" width="250">
+ </p>
+
+ | question | `Please construct a formal analysis paragraph of approximately 175 words that is coherent and focuses solely on visual characteristics.\n<image>` |
+ | --- | --- |
+ | answer | `The painting depicts a young shepherdess, her gaze directed towards the viewer. Her figure is rendered with meticulous detail, her skin glowing with a soft, porcelain-like quality. The folds of her dress are depicted with a delicate touch, creating a sense of movement and fluidity...` |
+
+ ## **🔄** Training Dataset example
+
+ The Painting-Form dataset was preprocessed by grouping its 215 questions into 11 categories. Each category is represented by one new, representative question (instruction), and these form the refined training dataset. This helps the model produce structured critiques focused on key aesthetic and interpretative elements of art.
+
+ [🔗 Preprocessed dataset](https://www.kaggle.com/datasets/baeksj01/gemmaarte-dataset/data)
+
+ | **id** | **image** | **category** | **question** | **answer** |
+ | --- | --- | --- | --- | --- |
+ | identity_0 | 2505.png | General Visual Analysis | Create a detailed and cohesive analysis paragr... | The painting depicts a young shepherdess, her ... |
+ | identity_1 | 13963.png | General Visual Analysis | Create a detailed and cohesive analysis paragr... | The artwork presents a captivating visual symp... |
+ | identity_2 | 6686.png | General Visual Analysis | Create a detailed and cohesive analysis paragr... | The painting features a rich and vibrant color... |
+
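As a rough illustration of the preprocessing described above, the sketch below relabels one raw record with its category and swaps in that category's representative question. The mapping contents, the placeholder instruction text, and the names `QUESTION_TO_CATEGORY`, `REPRESENTATIVE_QUESTION`, and `refine_record` are hypothetical; the real assignments live in the preprocessed dataset.

```python
# One original Painting-Form question, copied from the example table above.
ORIGINAL_QUESTION = (
    "Please construct a formal analysis paragraph of approximately 175 words "
    "that is coherent and focuses solely on visual characteristics.\n<image>"
)

# Hypothetical mapping from original question to category. Only one entry is
# shown; a full mapping would cover all 215 questions and 11 categories.
QUESTION_TO_CATEGORY = {
    ORIGINAL_QUESTION: "General Visual Analysis",
}

# Hypothetical representative instruction per category (placeholder wording).
REPRESENTATIVE_QUESTION = {
    "General Visual Analysis": "<representative instruction for this category>",
}


def refine_record(record):
    """Return a copy of `record` with its category and representative question."""
    category = QUESTION_TO_CATEGORY[record["question"]]
    return {
        **record,
        "category": category,
        "question": REPRESENTATIVE_QUESTION[category],
    }


raw = {
    "id": "identity_0",
    "image": "2505.png",
    "question": ORIGINAL_QUESTION,
    "answer": "The painting depicts a young shepherdess, her ...",
}
refined = refine_record(raw)
print(refined["category"])
```

Keeping the category as its own column, as in the table above, makes it easy to filter or balance the training set per category later.
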
+ ## **📈** Training Dataset statistics
+ | Statistic | Value |
+ | :---: | :---: |
+ | images | 13,812 |
+ | question-answer pairs | 224,850 |
+ | question categories | 11 |
+ | dataset size | 7.5 GB |
+
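To put the counts in the statistics table in proportion, here is a small calculation of the average number of question-answer pairs per image. It uses only the numbers in the table; the variable names are illustrative.

```python
# Counts copied from the statistics table above.
num_images = 13_812
num_qa_pairs = 224_850

# Average number of question-answer pairs attached to each image.
avg_pairs_per_image = num_qa_pairs / num_images
print(f"{avg_pairs_per_image:.2f} question-answer pairs per image on average")
```

This works out to roughly 16 pairs per image.
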
+ ## **⚙️** Training Procedure
+
+ ### **🔧** Training Hyperparameters
+
+ - Learning rate: 2e-5
+ - Batch size: 4
+ - Weight decay: 1e-6
+ - Number of epochs: 2
+ - Optimizer: paged_adamw_8bit
+
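The hyperparameters above could be written as keyword arguments in the style of `transformers.TrainingArguments`, as sketched below. This is an assumption about how the training script is organized, not a copy of it. In particular, reading "batch size" as a per-device value, and the single-device, no-gradient-accumulation step estimate, are guesses.

```python
import math

# Keyword names follow transformers.TrainingArguments; values are from the list above.
training_kwargs = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 4,  # assumption: "batch size" means per device
    "weight_decay": 1e-6,
    "num_train_epochs": 2,
    "optim": "paged_adamw_8bit",  # this optimizer needs the bitsandbytes package
}

# Rough step count, assuming one device, no gradient accumulation, and that
# all 224,850 question-answer pairs are used for training.
num_examples = 224_850
steps_per_epoch = math.ceil(num_examples / training_kwargs["per_device_train_batch_size"])
total_steps = steps_per_epoch * training_kwargs["num_train_epochs"]
print(steps_per_epoch, total_steps)
```

If the script uses the Hugging Face `Trainer`, a dict like this could be unpacked into `TrainingArguments` together with an `output_dir`.
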
+ ### **🖥️** Training Code
+
+ To train the model, use the following command:
+
+ ```bash
+ python article_base_train.py \
+   --dataset_dir "/path/to/your/dataset" \
+   --model_id "google/paligemma-3b-pt-224" \
+   --output_dir "/path/to/your/output" \
+   --use_qlora [true|false] \
+   --metadata_type [csv|parquet]
+ ```
+
+ Note: Replace `/path/to/your/dataset` and `/path/to/your/output` with your actual dataset and output directory paths. For `--use_qlora` and `--metadata_type`, pass exactly one of the bracketed values (for example `--use_qlora true`).
+
+ [🔗 article_base_train.py script](https://github.com/guu980-dev/GemmArte)
+
+ This script handles the entire training process for the GemmArte model, including data loading, model initialization, and training loop setup.
+
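For readers who want to see how flags like the ones in the command above might be parsed, here is a minimal `argparse` sketch. It is not taken from `article_base_train.py`: the `str2bool` helper, the `build_parser` name, and all default values are assumptions made for illustration.

```python
import argparse


def str2bool(value):
    """Parse 'true' or 'false' (case-insensitive) into a bool."""
    lowered = value.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    raise argparse.ArgumentTypeError(f"expected true or false, got {value!r}")


def build_parser():
    # Flag names mirror the training command; defaults are hypothetical.
    parser = argparse.ArgumentParser(description="Illustrative CLI sketch")
    parser.add_argument("--dataset_dir", required=True)
    parser.add_argument("--model_id", default="google/paligemma-3b-pt-224")
    parser.add_argument("--output_dir", required=True)
    parser.add_argument("--use_qlora", type=str2bool, default=False)
    parser.add_argument("--metadata_type", choices=["csv", "parquet"], default="csv")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is self-contained.
args = build_parser().parse_args([
    "--dataset_dir", "/path/to/your/dataset",
    "--output_dir", "/path/to/your/output",
    "--use_qlora", "true",
    "--metadata_type", "parquet",
])
print(args.use_qlora, args.metadata_type)
```

Restricting `--metadata_type` with `choices` makes a mistyped value fail at startup instead of partway through data loading.
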
+ ## **🚀** Inference
+
+ To run inference using the trained model, use the following command:
+
+ ```bash
+ python test_inference.py
+ ```
+
+ [🔗 test_inference.py script](https://github.com/guu980-dev/GemmArte)
+
+ <p align="center">
+ <img src="https://i.postimg.cc/D0NdqQSh/starry-night.jpg" alt="Description of the image" width="500" height="500">
+ </p>
+
+ ```python
+ # Example prompt 1, followed by the model's output for the image above
+ prompt = "Provide a step-by-step formal analysis of this image, covering: 1) Composition, 2) Color, 3) Line, 4) Texture, 5) Space, and 6) Form."
+ '''
+ The composition of this artwork is striking, with a strong emphasis on the vertical and horizontal lines. The starry night sky dominates the canvas, filling the majority of the space with swirling patterns of light and dark. The use of bold, contrasting colors further enhances the dramatic effect of the scene, with the bright yellow stars against the deep blue sky creating a sense of tension and energy. The forms are simplified and stylized, with the only recognizable features being the church steeple and the distant village. The overall effect is one of a dynamic and expressive composition that captures the viewer's attention and draws them into the world of the painting.
+ '''
+
+ # Example prompt 2, followed by the model's output for the same image
+ prompt = "Please construct a formal analysis paragraph that is coherent and focuses solely on visual characteristic."
+ '''
+ The painting under consideration is a masterful example of the artist's use of color and brushwork to convey a sense of emotion and atmosphere. The dominant hues of blue and green create a striking contrast against the dark, swirling sky, which is rendered in a variety of shades of blue and white. The artist's use of thick, impasto brushstrokes adds texture and depth to the painting, particularly in the depiction of the swirling clouds and the distant mountains. The artist's attention to detail is evident in the intricate rendering of the stars and the church steeple, which are depicted with a combination of delicate, almost ethereal brushstrokes and bold, expressive strokes. The overall effect is a visually striking and emotionally charged work of art that captures the viewer's attention and invites them to contemplate the scene before them.
+ '''
+ ```
+
+ The model generates an analysis from the given prompt and image. Structured prompts like these guide the model toward more detailed and comprehensive analyses of artworks.
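
For orientation, inference with a LoRA adapter on PaliGemma might look roughly like the sketch below, using the `transformers` and `peft` libraries. It is not the contents of `test_inference.py`. The `ADAPTER_ID` value is a hypothetical placeholder, the `generate_critique` name is illustrative, and loading in default precision on CPU is an assumption chosen for simplicity.

```python
BASE_MODEL_ID = "google/paligemma-3b-pt-224"  # base model named in the training command
ADAPTER_ID = "path/to/gemmarte-lora-adapter"  # hypothetical placeholder, replace with the real adapter


def generate_critique(image_path, prompt, max_new_tokens=256):
    """Load the base model plus the LoRA adapter and generate text for one image."""
    # Heavy dependencies are imported lazily so importing this module stays cheap.
    import torch
    from peft import PeftModel
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(BASE_MODEL_ID)
    base_model = PaliGemmaForConditionalGeneration.from_pretrained(BASE_MODEL_ID)
    model = PeftModel.from_pretrained(base_model, ADAPTER_ID).eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    prompt_length = inputs["input_ids"].shape[-1]

    with torch.inference_mode():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # Drop the prompt tokens so only the newly generated text is returned.
    return processor.decode(output_ids[0][prompt_length:], skip_special_tokens=True)
```

Calling `generate_critique("starry-night.jpg", prompt)` with one of the prompts above would then return the generated analysis as a string, assuming the adapter path points at real weights.
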