multilingual vision models
Papers I read to understand vision-language models and how to add multilingual capabilities to them.
An Introduction to Vision-Language Modeling
Paper • 2405.17247 • Published • 85
Note: Great overview of vision-language modelling approaches
Visual Instruction Tuning
Paper • 2304.08485 • Published • 13
Note:
- Among the first works to incorporate instruction fine-tuning into vision-language models to improve multimodal chat capabilities
- Generated 158K synthetic visual instruction-following samples using GPT-4
- The original LLaVA model combined a pretrained Vicuna LM with a pretrained CLIP vision encoder and was fine-tuned end-to-end on the generated vision-language instruction-following data (rough architecture sketch below)
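To make the LLaVA-style wiring concrete, here is a minimal PyTorch sketch, not the authors' code: a vision encoder produces patch features, a projector maps them into the LM embedding space, and the projected visual tokens are prepended to the text embeddings. The `vision_encoder` and `language_model` arguments are placeholders for HF-style modules, and the 1024/4096 dimensions are assumed values (CLIP ViT-L feature width, a 7B-scale LM hidden size), not figures from the paper.

```python
import torch
import torch.nn as nn

class LlavaStyleModel(nn.Module):
    """Sketch of a LLaVA-style VLM: vision tower -> projector -> causal LM."""

    def __init__(self, vision_encoder, language_model, vision_dim=1024, lm_dim=4096):
        super().__init__()
        self.vision_encoder = vision_encoder   # placeholder: returns [batch, num_patches, vision_dim]
        self.language_model = language_model   # placeholder: HF-style causal LM accepting inputs_embeds
        # The original LLaVA uses a single linear layer as the vision-language projector
        self.projector = nn.Linear(vision_dim, lm_dim)

    def forward(self, pixel_values, input_ids):
        # Patch features from the (typically frozen) vision tower
        image_feats = self.vision_encoder(pixel_values)
        # Map visual features into the LM's embedding space
        visual_tokens = self.projector(image_feats)
        # Embed the text prompt with the LM's own embedding table
        text_embeds = self.language_model.get_input_embeddings()(input_ids)
        # Prepend visual tokens to the text embeddings and run the LM
        inputs_embeds = torch.cat([visual_tokens, text_embeds], dim=1)
        return self.language_model(inputs_embeds=inputs_embeds)
```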
Improved Baselines with Visual Instruction Tuning
Paper • 2310.03744 • Published • 37
PALO: A Polyglot Large Multimodal Model for 5B People
Paper • 2402.14818 • Published • 23
Note:
- Develops a multilingual large multimodal model covering 10 languages, using an architecture similar to LLaVA's
- Uses a pretrained CLIP vision encoder and Vicuna LM, connected by a two-layer MLP with GELU as the projector between the modalities (see the sketch after this entry)
- Multilingual dataset curated with a semi-automated translation pipeline that translates the LLaVA instruction dataset
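Since the note above pins down the projector design, here is a minimal PyTorch sketch of a two-layer MLP with GELU of the kind used between the CLIP features and the LM embedding space; the 1024/4096 dimensions are assumed placeholders, not values taken from the paper.

```python
import torch.nn as nn

class MLPProjector(nn.Module):
    """Two-layer MLP projector with GELU, mapping vision features to LM width."""

    def __init__(self, vision_dim=1024, lm_dim=4096):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vision_dim, lm_dim),  # project into the LM's hidden size
            nn.GELU(),                      # non-linearity between the two layers
            nn.Linear(lm_dim, lm_dim),      # second layer keeps the LM embedding size
        )

    def forward(self, image_features):
        # image_features: [batch, num_patches, vision_dim] from the vision encoder
        return self.net(image_features)
```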
Aya 23: Open Weight Releases to Further Multilingual Progress
Paper • 2405.15032 • Published • 26
Note:
- Introduces Aya 23, a family of multilingual (text-only) language models supporting 23 languages, based on Cohere's "Command" model; pre-trained on a data mixture that includes text from 23 languages and fine-tuned on the Aya multilingual instruction data
- Available in 8B and 35B sizes
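For quickly trying the instruction-tuned model, here is a hedged transformers sketch; the Hub repo ID "CohereForAI/aya-23-8B" and its chat-template support are assumptions based on Cohere's usual release pattern, so check the model card before relying on them.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/aya-23-8B"  # assumed Hub repo ID; verify on the model card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Multilingual instruction-following via the tokenizer's chat template
messages = [{"role": "user", "content": "Translate to Turkish: How are you today?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=64)
# Decode only the newly generated tokens
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```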
Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model
Paper • 2402.07827 • Published • 45
Parrot: Multilingual Visual Instruction Tuning
Paper • 2406.02539 • Published • 35
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
Paper • 2401.15947 • Published • 48
PaLI: A Jointly-Scaled Multilingual Language-Image Model
Paper • 2209.06794 • Published • 2