Upload README.md with huggingface_hub
README.md
CHANGED
@@ -31,6 +31,8 @@ Disclaimer: The team releasing X-CLIP did not write a model card for this model
 
 X-CLIP is a minimal extension of [CLIP](https://huggingface.co/docs/transformers/model_doc/clip) for general video-language understanding. The model is trained in a contrastive way on (video, text) pairs.
 
+![X-CLIP architecture](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/xclip_architecture.png)
+
 This allows the model to be used for tasks like zero-shot, few-shot or fully supervised video classification and video-text retrieval.
 
 ## Intended uses & limitations
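
Reviewer note: the README text in this hunk says the model is trained contrastively on (video, text) pairs and can therefore do zero-shot video classification. A minimal sketch of that idea follows, assuming a video embedding and one text embedding per candidate label have already been computed. The vectors, the labels, the `zero_shot_probs` helper, and the temperature value are all made up for illustration; this is not X-CLIP's actual code or the `transformers` API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_probs(video_emb, text_embs, temperature=0.07):
    """Softmax over temperature-scaled video-text similarities.

    `temperature` is an illustrative assumption, not a value taken
    from the X-CLIP model card.
    """
    logits = [cosine(video_emb, t) / temperature for t in text_embs]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy embeddings standing in for encoder outputs.
labels = ["playing guitar", "riding a bike", "cooking"]
video_embedding = [0.9, 0.1, 0.2]
text_embeddings = [
    [1.0, 0.0, 0.1],
    [0.0, 1.0, 0.0],
    [0.1, 0.2, 1.0],
]

probs = zero_shot_probs(video_embedding, text_embeddings)
best = labels[probs.index(max(probs))]
print(best, [round(p, 3) for p in probs])
```

The label whose text embedding is most similar to the video embedding gets the highest probability, so no label-specific training is needed. Video-text retrieval uses the same similarity scores, ranked across candidates instead of passed through a softmax.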