Update README.md
README.md
CHANGED
@@ -9,7 +9,7 @@ pipeline_tag: image-to-text
# PG-InstructBLIP model

-Finetuned version of InstructBLIP with Flan-T5-

## Model description
@@ -20,6 +20,8 @@ PG-InstructBLIP is finetuned using the [PhysObjects dataset](https://drive.googl
This model is designed to be used with the LAVIS library. Please install [salesforce-lavis](https://pypi.org/project/salesforce-lavis/) and download this model through git-lfs or direct downloading.

```
import torch
from PIL import Image
@@ -41,6 +43,8 @@ vlm = load_model(
    device="cuda" if torch.cuda.is_available() else "cpu"
)

model_cls = registry.get_model_class('blip2_t5_instruct')
model_type = 'flant5xxl'
preprocess_cfg = OmegaConf.load(model_cls.default_config_path(model_type)).preprocess

# PG-InstructBLIP model

+Finetuned version of InstructBLIP with Flan-T5-XXL as the language model. PG-InstructBLIP was introduced in the paper [Physically Grounded Vision-Language Models for Robotic Manipulation](https://iliad.stanford.edu/pg-vlm/) by Gao et al.

## Model description

This model is designed to be used with the LAVIS library. Please install [salesforce-lavis](https://pypi.org/project/salesforce-lavis/) and download this model through git-lfs or direct downloading.

+After loading the model, you can disable the qformer text input to follow the same configuration we used for fine-tuning. However, the model still works well with it enabled, so we recommend users to experiment with both and choose the optimal configuration on a case-by-case basis.
+
```
import torch
from PIL import Image

    device="cuda" if torch.cuda.is_available() else "cpu"
)

+vlm.qformer_text_input = False # Optionally disable qformer text
+
model_cls = registry.get_model_class('blip2_t5_instruct')
model_type = 'flant5xxl'
preprocess_cfg = OmegaConf.load(model_cls.default_config_path(model_type)).preprocess