Ross Wightman

rwightman

AI & ML interests

Computer vision, transfer learning, semi-/self-supervised learning, robotics.

Recent Activity

updated a collection about 9 hours ago

All the ImageNets

posted an update about 15 hours ago

updated a dataset about 16 hours ago

timm/mini-imagenet

Articles

Trick or ResNet Treat

Mamba Out

Tiny Test Models

Searching for better (Full) ImageNet ViT Baselines

MobileNet Baselines

MobileNet-V4 (now in timm)

rwightman's activity

upvoted 2 collections about 2 months ago

RDNet

DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs [ECCV 2024] • 9 items • Updated Oct 16 • 3

timm tiny test models

A collection of very small (~300-500k parameter) models at 160x160 resolution, for testing purposes. Trained on ImageNet-1k. • 13 items • Updated Oct 2 • 3

upvoted 2 articles 4 months ago

Article

MobileNet Baselines

Jul 26 • 23

Article

LAVE: Zero-shot VQA Evaluation on Docmatix with LLMs - Do We Still Need Fine-Tuning?

Jul 25 • 18

upvoted a collection 4 months ago

🍃 MINT-1T

Data for "MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens" • 13 items • Updated Jul 24 • 54

upvoted 2 papers 4 months ago

PaliGemma: A versatile 3B VLM for transfer

Paper • 2407.07726 • Published Jul 10 • 67

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

Paper • 2406.16860 • Published Jun 24 • 57

upvoted a collection 4 months ago

Cambrian Data

3 items • Updated Jun 25 • 9

upvoted a paper 5 months ago

An Image is Worth 32 Tokens for Reconstruction and Generation

Paper • 2406.07550 • Published Jun 11 • 55

upvoted 2 collections 5 months ago

MobileCLIP Models + DataCompDR Data

MobileCLIP: Mobile-friendly image-text models with SOTA zero-shot capabilities. DataCompDR: Improved datasets for training image-text SOTA models. • 22 items • Updated Oct 4 • 24

MobileNetV4 pretrained weights

Weights for MobileNet-V4 pretrained in timm • 17 items • Updated Sep 22 • 17

upvoted 2 papers 6 months ago

MobileNetV4 -- Universal Models for the Mobile Ecosystem

Paper • 2404.10518 • Published Apr 16 • 2

On the Efficiency of Convolutional Neural Networks

Paper • 2404.03617 • Published Apr 4 • 4

upvoted 3 articles 6 months ago

Article

MobileNet-V4 (now in timm)

Jun 17 • 39

Article

Multimodal Augmentation for Documents: Recovering “Comprehension” in “Reading and Comprehension” task

May 16 • 17

Article

PaliGemma – Google's Cutting-Edge Open Vision Language Model

May 14 • 210

upvoted 3 collections 6 months ago

PaliGemma Release

Pretrained and mix checkpoints for PaliGemma • 16 items • Updated Jul 31 • 137

PaliGemma FT Models

108 items • Updated Jul 31 • 27

Searching for Better ViT Baselines

Exploring ViT hparams and model shapes for the GPU poor (between tiny and base). • 25 items • Updated Aug 21 • 13

upvoted a collection 8 months ago

PDF Document / OCR Datasets

Document datasets with .pdf files that are usable with pixparse libraries and tools. • 2 items • Updated Mar 30 • 47