RDNet Collection DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs [ECCV 2024] • 9 items • Updated Oct 16 • 3
timm tiny test models Collection A collection of very small (~300-500k parameter) models at 160x160 resolution, for testing purposes. Trained on ImageNet-1k. • 13 items • Updated Oct 2 • 3
Article LAVE: Zero-shot VQA Evaluation on Docmatix with LLMs - Do We Still Need Fine-Tuning? Jul 25 • 18
MINT-1T Collection Data for "MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens" • 13 items • Updated Jul 24 • 54
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs Paper • 2406.16860 • Published Jun 24 • 57
An Image is Worth 32 Tokens for Reconstruction and Generation Paper • 2406.07550 • Published Jun 11 • 55
MobileCLIP Models + DataCompDR Data Collection MobileCLIP: Mobile-friendly image-text models with SOTA zero-shot capabilities. DataCompDR: Improved datasets for training image-text SOTA models. • 22 items • Updated Oct 4 • 24
MobileNetV4 pretrained weights Collection Weights for MobileNet-V4 pretrained in timm • 17 items • Updated Sep 22 • 17
MobileNetV4 -- Universal Models for the Mobile Ecosystem Paper • 2404.10518 • Published Apr 16 • 2
Article Multimodal Augmentation for Documents: Recovering “Comprehension” in “Reading and Comprehension” task By danaaubakirova • May 16 • 17
PaliGemma Release Collection Pretrained and mix checkpoints for PaliGemma • 16 items • Updated Jul 31 • 137
Searching for Better ViT Baselines Collection Exploring ViT hparams and model shapes for the GPU poor (between tiny and base). • 25 items • Updated Aug 21 • 13
PDF Document / OCR Datasets Collection Document datasets with .pdf files that are usable with pixparse libraries and tools. • 2 items • Updated Mar 30 • 47