Leo Tronchon

Leyo

AI & ML interests

Multimodal, Self-Supervised Learning

Articles

Introducing Idefics2: A Powerful 8B Vision-Language Model for the community

Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset

Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model

Putting ethical principles at the core of research lifecycle

Leyo's activity

upvoted an article 4 months ago

Article

Docmatix - a huge dataset for Document Visual Question Answering

Jul 18

• 66

upvoted an article 5 months ago

Article

Introducing Idefics2: A Powerful 8B Vision-Language Model for the community

Apr 15

• 164

upvoted an article 6 months ago

Article

Multimodal Augmentation for Documents: Recovering “Comprehension” in “Reading and Comprehension” task

May 16

• 17

upvoted a paper 6 months ago

What matters when building vision-language models?

Paper • 2405.02246 • Published May 3 • 98

upvoted a collection 7 months ago

Idefics2 🐶

Idefics2-8B is a foundation vision-language model. In this collection, you will find the models, datasets and demo related to its creation. • 11 items • Updated May 6 • 88

upvoted a paper 8 months ago

Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset

Paper • 2403.09029 • Published Mar 14 • 54

upvoted a paper 9 months ago

PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter

Paper • 2402.10896 • Published Feb 16 • 14

upvoted a paper 10 months ago

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

Paper • 2312.14238 • Published Dec 21, 2023 • 14

upvoted 6 papers about 1 year ago

ConvNets Match Vision Transformers at Scale

Paper • 2310.16764 • Published Oct 25, 2023 • 20

FP8-LM: Training FP8 Large Language Models

Paper • 2310.18313 • Published Oct 27, 2023 • 31

Octopus: Embodied Vision-Language Programmer from Environmental Feedback

Paper • 2310.08588 • Published Oct 12, 2023 • 34

Vision Transformers Need Registers

Paper • 2309.16588 • Published Sep 28, 2023 • 77

Small-scale proxies for large-scale Transformer training instabilities

Paper • 2309.14322 • Published Sep 25, 2023 • 19

Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning

Paper • 2309.02591 • Published Sep 5, 2023 • 14

upvoted 4 papers over 1 year ago

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World

Paper • 2308.01907 • Published Aug 3, 2023 • 11

Retentive Network: A Successor to Transformer for Large Language Models

Paper • 2307.08621 • Published Jul 17, 2023 • 170

Generative Pretraining in Multimodality

Paper • 2307.05222 • Published Jul 11, 2023 • 21

OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents

Paper • 2306.16527 • Published Jun 21, 2023 • 47