Tonic/video-swin-transformer

TF-Keras


Tonic committed on Apr 23

Commit 0f456cd, 1 parent: 286e65f

Create README.md


Files changed (1)

README.md +62 -0

README.md ADDED

	@@ -0,0 +1,62 @@

+---
+license: other
+library_name: keras
+---
+# Collection shoaib6174/video_swin_transformer/1
+A collection of Video Swin Transformer feature-extractor models.
+<!-- task: video-feature-extraction -->
+## Overview
+This collection contains different Video Swin Transformer [1] models. The original model weights are taken from [2]. They were ported to Keras models
+(`tf.keras.Model`) and then serialized as TensorFlow SavedModels. The porting steps are available in [3].
+## About the models
+These models can be used directly to extract features from videos. They are accompanied by
+Colab notebooks with fine-tuning steps for action recognition and video classification.
+The table below provides a performance summary:
+| model_name                                     |   pre-train dataset |   fine-tune dataset   |   acc@1(%) |  acc@5(%) |
+|:----------------------------------------------:|:-------------------:|:---------------------:|:----------:|----------:|
+| swin_tiny_patch244_window877_kinetics400_1k    |    ImageNet-1K      | Kinetics 400(1k)      |       78.8 |      93.6 |
+| swin_small_patch244_window877_kinetics400_1k   |    ImageNet-1K      | Kinetics 400(1k)      |       80.6 |      94.5 |
+| swin_base_patch244_window877_kinetics400_1k    |    ImageNet-1K      | Kinetics 400(1k)      |       80.6 |      94.6 |
+| swin_base_patch244_window877_kinetics400_22k   |    ImageNet-22K     | Kinetics 400(1k)      |       82.7 |      95.5 |
+| swin_base_patch244_window877_kinetics600_22k   |    ImageNet-22K     | Kinetics 600(1k)      |       84.0 |      96.5 |
+| swin_base_patch244_window1677_sthv2            |    Kinetics 400     | Something-Something V2|       69.6 |      92.7 |
+All scores are taken from [2].
+### Video Swin Transformer feature extractor models
+* [swin_tiny_patch244_window877_kinetics400_1k](https://tfhub.dev/shoaib6174/swin_tiny_patch244_window877_kinetics400_1k)
+* [swin_small_patch244_window877_kinetics400_1k](https://tfhub.dev/shoaib6174/swin_small_patch244_window877_kinetics400_1k)
+* [swin_base_patch244_window877_kinetics400_1k](https://tfhub.dev/shoaib6174/swin_base_patch244_window877_kinetics400_1k)
+* [swin_base_patch244_window877_kinetics400_22k](https://tfhub.dev/shoaib6174/swin_base_patch244_window877_kinetics400_22k)
+* [swin_base_patch244_window877_kinetics600_22k](https://tfhub.dev/shoaib6174/swin_base_patch244_window877_kinetics600_22k)
+* [swin_base_patch244_window1677_sthv2](https://tfhub.dev/shoaib6174/swin_base_patch244_window1677_sthv2)
+## Notes
+The input shape for these models is `[None, 3, 32, 224, 224]`, representing `[batch_size, channels, frames, height, width]`. To create models with a different input shape, use [this notebook](https://colab.research.google.com/drive/1sZIM7_OV1__CFV-WSQguOOZ8VyOsDaGM).
+## References
+[1] [Video Swin Transformer, Liu et al.](https://arxiv.org/abs/2106.13230)
+[2] [Video Swin Transformer GitHub](https://github.com/SwinTransformer/Video-Swin-Transformer)
+[3] [GSOC-22-Video-Swin-Transformers GitHub](https://github.com/shoaib6174/GSOC-22-Video-Swin-Transformers)
+## Acknowledgements
+* [Google Summer of Code 2022](https://summerofcode.withgoogle.com/)
+* [Luiz Gustavo Martins](https://www.linkedin.com/in/luiz-gustavo-martins-64ab5891/)
+* [Sayak Paul](https://www.linkedin.com/in/sayak-paul/)
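
The Notes section of the README above gives the expected input layout as `[batch_size, channels, frames, height, width]`. Video frames are more commonly stored channels-last, so a clip usually needs to be rearranged before it is fed to one of these models. The following is a minimal sketch of that step using NumPy. The helper name `to_model_layout` and the assumption that the clip arrives as `[frames, height, width, channels]` are illustrative and not part of the README. Any pixel normalization the models may expect is not stated in the README and is left out here.

```python
import numpy as np


def to_model_layout(frames: np.ndarray) -> np.ndarray:
    """Rearrange one clip into a single-item, channels-first batch.

    Assumed input layout (hypothetical):  [frames, height, width, channels]
    Output layout (from the README):      [1, channels, frames, height, width]
    """
    if frames.ndim != 4:
        raise ValueError(f"expected a 4-D clip, got shape {frames.shape}")
    # Move the channel axis to the front: [channels, frames, height, width].
    clip = np.transpose(frames, (3, 0, 1, 2))
    # Add the leading batch axis and cast to float32.
    return clip[np.newaxis, ...].astype(np.float32)


# A dummy 32-frame, 224x224 RGB clip matching the documented input shape.
dummy_clip = np.zeros((32, 224, 224, 3), dtype=np.uint8)
batch = to_model_layout(dummy_clip)
print(batch.shape)  # (1, 3, 32, 224, 224)
```

The resulting array matches the documented `[None, 3, 32, 224, 224]` shape with a batch size of 1. A clip with a different frame count or resolution would not fit the serialized models as-is; the README points to a notebook for creating models with a different input shape.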