roman-bachmann committed a5acadc (1 parent: ecd7d61)

Init

Files changed:
- LICENSE +10 -0
- README.md +49 -0
- config.json +1 -0
- pytorch_model.bin +3 -0
LICENSE
ADDED
@@ -0,0 +1,10 @@
Sample Code License
Version: 1.1

IMPORTANT: This software is supplied to you by École Polytechnique Fédérale de Lausanne (“EPFL”) and Apple Inc. ("Apple") in consideration of your agreement to the following terms, and your use, installation, modification or redistribution of this software constitutes acceptance of these terms. If you do not agree with these terms, please do not use, install, modify or redistribute this software.

In consideration of your agreement to abide by the following terms, and subject to these terms, EPFL and Apple (collectively, “Licensor”) grant you a personal, non-exclusive license, under Licensor’s copyrights in this original software (the " Software"), to use, reproduce, modify and redistribute the Software, with or without modifications, in source and/or binary forms for non-commercial use; provided that if you redistribute the Software in its entirety and without modifications, you must retain this notice and the following text and disclaimers in all such redistributions of the Software. Neither the name, trademarks, service marks or logos of Licensor may be used to endorse or promote products derived from the Software without specific prior written permission from Licensor. Except as expressly stated in this notice, no other rights or licenses, express or implied, are granted by Licensor herein, including but not limited to any patent rights that may be infringed by your derivative works or by other works in which the Software may be incorporated.

The Software is provided by Licensor on an "AS IS" basis. LICENSOR MAKES NO WARRANTIES, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, REGARDING THE SOFTWARE OR ITS USE AND OPERATION ALONE OR IN COMBINATION WITH YOUR PRODUCTS. IN NO EVENT SHALL LICENSOR BE LIABLE FOR ANY SPECIAL, INDIRECT, INCIDENTAL OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) ARISING IN ANY WAY OUT OF THE USE, REPRODUCTION, MODIFICATION AND/OR DISTRIBUTION OF THE SOFTWARE, HOWEVER CAUSED AND WHETHER UNDER THEORY OF CONTRACT, TORT (INCLUDING NEGLIGENCE), STRICT LIABILITY OR OTHERWISE, EVEN IF LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Copyright (C) 2024. All Rights Reserved.
README.md
ADDED
@@ -0,0 +1,49 @@
---
license: other
license_name: sample-code-license
license_link: LICENSE
library_name: ml-4m
---

# 4M: Massively Multimodal Masked Modeling

*David Mizrahi\*, Roman Bachmann\*, Oguzhan Fatih Kar, Teresa Yeo, Mingfei Gao, Afshin Dehghan, Amir Zamir*

Official implementation and pre-trained models for "4M: Massively Multimodal Masked Modeling" (NeurIPS 2023).

[`Website`](https://4m.epfl.ch) | [`Paper`](https://arxiv.org/abs/2312.06647) | [`GitHub`](https://github.com/apple/ml-4m)

4M is a framework for training "any-to-any" foundation models, using tokenization and masking to scale to many diverse modalities.
Models trained using 4M can perform a wide range of vision tasks, transfer well to unseen tasks and modalities, and are flexible and steerable multimodal generative models.


## Installation
For install instructions, please see https://github.com/apple/ml-4m.


## Usage

The COCO semantic segmentation map tokenizer can be loaded from Hugging Face Hub as follows:
```python
from fourm.vq.vqvae import VQVAE

# Load the pre-trained semantic segmentation tokenizer from the Hugging Face Hub.
tok_semseg = VQVAE.from_pretrained('EPFL-VILAB/4M_tokenizers_semseg_4k_224-448')
```

Please see https://github.com/apple/ml-4m/blob/main/README_TOKENIZATION.md for more detailed instructions and https://github.com/apple/ml-4m for other tokenizer and 4M model checkpoints.


## Citation

If you find this repository helpful, please consider citing our work:
```
@inproceedings{mizrahi2023fourm,
    title={{4M}: Massively Multimodal Masked Modeling},
    author={David Mizrahi and Roman Bachmann and Oguzhan Fatih Kar and Teresa Yeo and Mingfei Gao and Afshin Dehghan and Amir Zamir},
    booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
    year={2023},
}
```

## License

The model weights in this repository are released under the Sample Code license as found in the [LICENSE](LICENSE) file.
config.json
ADDED
@@ -0,0 +1 @@
{"image_size": 448, "image_size_enc": null, "n_channels": 64, "n_labels": 134, "enc_type": "vit_b_enc", "patch_proj": true, "post_mlp": true, "patch_size": 16, "quant_type": "lucid", "codebook_size": 4096, "num_codebooks": 1, "latent_dim": 32, "norm_codes": true, "norm_latents": false, "sync_codebook": false, "undo_std": false, "dec_type": "vit_b_dec", "out_conv": false, "image_size_dec": null, "patch_size_dec": null}
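The `config.json` fields determine the tokenizer's geometry. The sketch below works out the token grid and the bits per token from them. It assumes the ViT encoder emits one token per `patch_size` x `patch_size` patch, which is an inference from the `vit_b_enc` and `patch_size` fields rather than something this commit states. `token_grid` is a hypothetical helper, and the 224 resolution is taken from the `224-448` suffix of the repository name.

```python
import json
import math

# Subset of the config.json fields from this commit.
CONFIG_JSON = (
    '{"image_size": 448, "patch_size": 16, "codebook_size": 4096, '
    '"num_codebooks": 1, "n_labels": 134}'
)


def token_grid(image_size: int, patch_size: int) -> tuple[int, int]:
    """Hypothetical helper: (tokens per side, total tokens).

    Assumes one token per patch, which is not confirmed by the commit itself.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    side = image_size // patch_size
    return side, side * side


config = json.loads(CONFIG_JSON)

# Each token indexes one of `codebook_size` entries per codebook.
bits_per_token = math.log2(config["codebook_size"]) * config["num_codebooks"]

for size in (224, config["image_size"]):
    side, total = token_grid(size, config["patch_size"])
    print(f"{size}px -> {side}x{side} grid, {total} tokens, "
          f"{total * bits_per_token:.0f} bits")
```

Under that assumption, a 448-pixel map becomes a 28x28 grid of 784 tokens and a 224-pixel map a 14x14 grid of 196 tokens, each token carrying 12 bits for the 4096-entry codebook.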
pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f58fd98f05aa38bb7ae63ca0f900d622cb35b62a4fb7b7a1d99f9a6a89b87b28
size 879788274
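`pytorch_model.bin` is stored with Git LFS, so the diff contains only a pointer file: the spec version, the SHA-256 object ID, and the size in bytes (about 839 MiB here). The following is a minimal sketch of checking downloaded bytes against such a pointer using only the standard library. `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the demonstration uses a small in-memory payload rather than the real checkpoint.

```python
import hashlib

# Pointer file contents from this commit.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:f58fd98f05aa38bb7ae63ca0f900d622cb35b62a4fb7b7a1d99f9a6a89b87b28
size 879788274
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Return True if `data` has the size and digest recorded in the pointer."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])

# Demonstration on a tiny payload with a pointer built for it.
payload = b"hello"
demo_pointer = {
    "algo": "sha256",
    "digest": hashlib.sha256(payload).hexdigest(),
    "size": len(payload),
}
print(matches_pointer(payload, demo_pointer))   # intact payload
print(matches_pointer(b"hellp", demo_pointer))  # corrupted payload
```

For a file as large as the real checkpoint, hashing in chunks (reading the file in blocks and calling `update` on a single hash object) avoids loading everything into memory at once.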