Tonic committed on
Commit
0f456cd
1 Parent(s): 286e65f

Create README.md

  1. README.md +62 -0
README.md ADDED
@@ -0,0 +1,62 @@
+ ---
+ license: other
+ library_name: keras
+ ---
+ # Collection shoaib6174/video_swin_transformer/1
+
+ A collection of Video Swin Transformer feature extractor models.
+
+
+ <!-- task: video-feature-extraction -->
+
+ ## Overview
+
+ This collection contains several Video Swin Transformer [1] models. The original model weights come from [2]. They were ported to Keras models
+ (`tf.keras.Model`) and then serialized as TensorFlow SavedModels. The porting steps are available in [3].
+
+
+ ## About the models
+
+ These models can be used directly to extract features from videos. They are accompanied by
+ Colab notebooks with fine-tuning steps for action recognition and video classification.
+
+ The table below provides a performance summary:
+
+ | model_name | pre-train dataset | fine-tune dataset | acc@1 (%) | acc@5 (%) |
+ |:----------------------------------------------:|:-------------------:|:---------------------:|:----------:|:----------:|
+ | swin_tiny_patch244_window877_kinetics400_1k | ImageNet-1K | Kinetics 400(1k) | 78.8 | 93.6 |
+ | swin_small_patch244_window877_kinetics400_1k | ImageNet-1K | Kinetics 400(1k) | 80.6 | 94.5 |
+ | swin_base_patch244_window877_kinetics400_1k | ImageNet-1K | Kinetics 400(1k) | 80.6 | 96.6 |
+ | swin_base_patch244_window877_kinetics400_22k | ImageNet-22K | Kinetics 400(1k) | 82.7 | 95.5 |
+ | swin_base_patch244_window877_kinetics600_22k | ImageNet-22K | Kinetics 600(1k) | 84.0 | 96.5 |
+ | swin_base_patch244_window1677_sthv2 | Kinetics 400 | Something-Something V2 | 69.6 | 92.7 |
+
+
+ The scores for all models are taken from [2].
+
+
+
+ ### Video Swin Transformer feature extractor models
+
+ * [swin_tiny_patch244_window877_kinetics400_1k](https://tfhub.dev/shoaib6174/swin_tiny_patch244_window877_kinetics400_1k)
+ * [swin_small_patch244_window877_kinetics400_1k](https://tfhub.dev/shoaib6174/swin_small_patch244_window877_kinetics400_1k)
+ * [swin_base_patch244_window877_kinetics400_1k](https://tfhub.dev/shoaib6174/swin_base_patch244_window877_kinetics400_1k)
+ * [swin_base_patch244_window877_kinetics400_22k](https://tfhub.dev/shoaib6174/swin_base_patch244_window877_kinetics400_22k)
+ * [swin_base_patch244_window877_kinetics600_22k](https://tfhub.dev/shoaib6174/swin_base_patch244_window877_kinetics600_22k)
+ * [swin_base_patch244_window1677_sthv2](https://tfhub.dev/shoaib6174/swin_base_patch244_window1677_sthv2)
+
+
+
+ ## Notes
+
+ The input shape for these models is `[None, 3, 32, 224, 224]`, representing `[batch_size, channels, frames, height, width]`. To create models with a different input shape, use [this notebook](https://colab.research.google.com/drive/1sZIM7_OV1__CFV-WSQguOOZ8VyOsDaGM).
+
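The channels-first layout described in the Notes differs from the frames-first, channels-last layout that many video decoders return, so a clip usually needs to be rearranged before it is passed to one of these models. The snippet below is a minimal sketch of that rearrangement using NumPy only. The helper name `to_model_input`, the assumed source layout `[frames, height, width, channels]`, and the cast to `float32` are illustrative assumptions, not part of the released models. Resizing, frame sampling, and pixel normalization are not shown.

```python
import numpy as np

# Per-clip shape from the Notes section: (channels, frames, height, width).
EXPECTED_CLIP_SHAPE = (3, 32, 224, 224)


def to_model_input(frames: np.ndarray) -> np.ndarray:
    """Rearrange a clip from (frames, height, width, channels) to
    (1, channels, frames, height, width).

    Hypothetical helper: the source layout is an assumption about how
    the frames were decoded.
    """
    if frames.ndim != 4:
        raise ValueError(f"expected a 4-D clip, got shape {frames.shape}")
    # Move the channel axis to the front: (T, H, W, C) -> (C, T, H, W).
    clip = np.transpose(frames, (3, 0, 1, 2))
    if clip.shape != EXPECTED_CLIP_SHAPE:
        raise ValueError(
            f"expected {EXPECTED_CLIP_SHAPE} after transposing, got {clip.shape}"
        )
    # Add the batch axis. float32 is an assumption, not stated in the README.
    return clip[np.newaxis, ...].astype(np.float32)


# Stand-in for 32 decoded RGB frames of size 224x224.
dummy_frames = np.zeros((32, 224, 224, 3), dtype=np.uint8)
batch = to_model_input(dummy_frames)
print(batch.shape)  # (1, 3, 32, 224, 224)
```

The shape check fails early if a clip has a different number of frames or a different resolution, which is the case the linked notebook is meant to handle.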
+ ## References
+ [1] [Video Swin Transformer, Liu et al.](https://arxiv.org/abs/2106.13230)
+ [2] [Video Swin Transformer GitHub](https://github.com/SwinTransformer/Video-Swin-Transformer)
+ [3] [GSOC-22-Video-Swin-Transformers GitHub](https://github.com/shoaib6174/GSOC-22-Video-Swin-Transformers)
+
+ ## Acknowledgements
+ * [Google Summer of Code 2022](https://summerofcode.withgoogle.com/)
+ * [Luiz Gustavo Martins](https://www.linkedin.com/in/luiz-gustavo-martins-64ab5891/)
+ * [Sayak Paul](https://www.linkedin.com/in/sayak-paul/)